Today we release DeepSeek-TNG R1T2 Chimera.
This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1 and V3-0324.
R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...
* about 20% faster than R1, and more than twice as fast as R1-0528
* significantly more intelligent than R1 in benchmarks such as GPQA Diamond and AIME-24/25, albeit not quite at the level of R1-0528
* much more intelligent than our first R1T Chimera, and also think-token consistent, which is a major improvement
We find it generally well-behaved, with a nice persona to talk to. The weights are on @huggingface under the MIT licence. We are looking forward to your experiments and feedback!
Thanks to @deepseek_ai for giving their models to the world, to @chutes_ai and @openrouter for hosting R1T, to @WolframRvnwlf for benchmarking it, to @xlr8harder for beta-testing the new Chimera, and to @natolambert for constructive discussions at @aiDotEngineer.
