Maximilian Bode

70 posts

@mxpbode

Associate Partner @tngtech

Munich, Bavaria · Joined June 2018
148 Following · 87 Followers
Maximilian Bode retweeted
TNG Technology Consulting GmbH
G1PO, our @UnitreeRobotics humanoid robot, was in a playful showrace against the Munich @Motorworld_de's #GT3RS. Of course, he still got helping hands from his human friends to steer him and thus the kart. @Porsche cars and TNG managers narrowly escaped, and the oil drums got only slightly bumped (yt link in reply ;-).
Maximilian Bode retweeted
TNG Technology Consulting GmbH
News from the Aider discord regarding DeepSeek-TNG R1T2 Chimera's performance in the Aider Polyglot benchmark, courtesy of benchmark wizard neolithic5452 and the magic @UnslothAI quantizations:
- 2 bit UD-IQ2_M: 60.0%
- 4 bit Q4_K_XL: 62.7%
- 8 bit: 64.4%
This seems to be the second-highest open-weights result, after @deepseek_ai's R1-0528, which scored 71.4%. It appears to be ahead of Kimi K2 (59.1%) and also @Alibaba_Qwen's Coder-480B, which scored 60.9% in the 4 bit UD-Q4_K_XL version. On @openrouter, R1T2 is currently the 11th most popular model for Aider. Across all applications, R1T2 processed 3.12B tokens yesterday on OR. On @chutes_ai, as of today it is the tenth most popular model.
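To put those quantization levels in context, here is a minimal sketch of the rough weight-file size a 671B-parameter model implies at each bit width. The function name is mine, and the formula is a lower bound that ignores per-block quantization scales and file metadata:

```python
def quantized_size_gib(n_params, bits_per_param):
    # Approximate weight-file size for a quantized model;
    # ignores per-block quantization scales and file metadata.
    return n_params * bits_per_param / 8 / 2**30

# A 671B-parameter model at different bit widths (rough lower bounds):
for bits in (2, 4, 8):
    print(f"{bits} bit: ~{quantized_size_gib(671e9, bits):.0f} GiB")
```

This is why the 2-bit UD-IQ2_M result matters: it shrinks the weights by a factor of four versus 8 bit while, per the numbers above, giving up only a few benchmark points.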
Maximilian Bode retweeted
TNG Technology Consulting GmbH
DeepSeek-TNG-R1T2-Chimera is currently the #1 trending model on @openrouter, the platform which @karpathy called the "transfer switch of AI". Our Assembly-of-Experts method pushes the Pareto frontier between model intelligence and inference cost. Thanks again to the Open Weights community: @deepseek_ai, @huggingface, @AIatMeta, @openrouter, @chutes_ai, @UnslothAI, and @jon_durbin, @ping_toven, @xlr8harder, @reach_vb, @alexatallah, to name a few.
Deedy @deedydas

Karpathy at YC startup school calls this the transfer switch of AI. It's hard to keep up with all the LLMs out there. OpenRouter is the go-to place for 2.5M developers to choose from 400+ models with one API. They're serving 100T tokens/yr! Excited to back this special company!

Maximilian Bode retweeted
TNG Technology Consulting GmbH
Today we release DeepSeek-TNG R1T2 Chimera. This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1 and V3-0324. R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...
* about 20% faster than R1, and more than twice as fast as R1-0528
* significantly more intelligent than R1 in benchmarks such as GPQA Diamond and AIME-24/25, albeit not quite on R1-0528 level
* much more intelligent than our first R1T Chimera, and also think-token consistent, which is a major improvement
We perceive it as generally well-behaved and a nice persona to talk to. The weights are on @huggingface under the MIT licence. We are looking forward to your experiments and feedback! Thanks to @deepseek_ai for giving their models to the world, to @chutes_ai and @openrouter for hosting R1T, to @WolframRvnwlf for benchmarking it, to @xlr8harder for beta-testing the new Chimera, and to @natolambert for constructive discussions at @aiDotEngineer.
Maximilian Bode retweeted
TNG Technology Consulting GmbH
We are posting our new paper "Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors" on @huggingface while waiting for @arxiv. We explain how we constructed the 671B R1T Chimera child model from the great @deepseek_ai V3-0324 and R1 parent models (thank you!) in less than one hour of CPU time. The Chimera research prototype is currently the 4th most popular LLM on @chutes_ai with about 3.5B tokens/day, 0.9-1.0B of which flow through @openrouter. Over 160B tokens have been processed since release on April 26th.
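The "linear-time construction" claim can be illustrated with a toy sketch of weight-space merging: each child tensor is built as a direct interpolation of the corresponding parent tensors, so cost scales linearly with model size and needs no training. This is my own simplified illustration, not TNG's actual recipe; the tensor names, merge ratio, and filter are all hypothetical:

```python
# Toy sketch of weight-space model merging: a "child" state dict is
# built by linearly interpolating two parent state dicts tensor by
# tensor, so construction cost is linear in the number of parameters.
def assemble(parent_a, parent_b, ratio=0.5, merge_filter=None):
    """Interpolate two state dicts (name -> list of floats).
    Tensors whose names fail merge_filter are copied from parent_a."""
    child = {}
    for name, ta in parent_a.items():
        tb = parent_b[name]
        if merge_filter is None or merge_filter(name):
            # Elementwise interpolation: (1 - ratio) * a + ratio * b
            child[name] = [(1 - ratio) * a + ratio * b for a, b in zip(ta, tb)]
        else:
            child[name] = list(ta)
    return child

# Hypothetical example: merge only "expert" tensors, keep the rest
# from parent_a (illustrative names, not the real DeepSeek layout).
v3 = {"expert.w": [0.0, 2.0], "embed.w": [1.0, 1.0]}
r1 = {"expert.w": [2.0, 4.0], "embed.w": [9.0, 9.0]}
chimera = assemble(v3, r1, ratio=0.5, merge_filter=lambda n: "expert" in n)
```

The filter is the interesting design lever: selectively merging only some tensor groups is what lets different "Chimera" variants trade off behaviors inherited from each parent.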
Maximilian Bode retweeted
TNG Technology Consulting GmbH
Assembly of Experts: Our linear-time 671B Chimera LLM construction paper should soon appear on arxiv.org. We are at the @aiDotEngineer fair in SFO until tomorrow, so for a chat -> DM ;-)
Maximilian Bode retweeted
TNG Technology Consulting GmbH
More evidence for the effectiveness of the Chimera construction method: Taking DeepSeek's R1-0528 release, we started benchmarking new Chimera variants on AIME-24 and SimpleQA.
R1-0528 significantly improves AIME performance from 79.8 to 91.4 while doubling the amount of output tokens compared to R1. It appears to be a great model overall.
Our Chimera variants of R1-0528 and V3-0324 seem to improve math performance too, with results up to 83.3. Positive aspect: more compact reasoning, with an output-token count below that of R1. This could be an interesting trade-off for real-world applications. On SimpleQA, R1T-0528-Chimera is close to the V3-0324 results and seems to fare better than R1-0528.
Caveat: these numbers are just a preliminary indicator, since they stem from single benchmark runs and are subject to statistical fluctuation. We'll continue searching for variants with interesting features and beneficial behavior combinations.
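The caveat about single-run fluctuation can be quantified with a standard binomial error bar on a pass rate. This is a generic sketch of mine, assuming a small, 30-problem AIME-24-style benchmark set; it is not a statement about TNG's actual evaluation setup:

```python
import math

def single_run_stderr(accuracy, n_items):
    # Binomial standard error of a pass rate measured from a single
    # run over n_items independent problems.
    return math.sqrt(accuracy * (1 - accuracy) / n_items)

# Assuming a 30-problem AIME-24-style set at 83.3% accuracy:
# stderr ≈ 0.068, i.e. roughly ±7 percentage points per run.
err = single_run_stderr(0.833, 30)
```

At that error bar, single-run scores like 79.8 vs. 83.3 overlap substantially, which is exactly why the tweet calls these numbers a preliminary indicator.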