
Cody Blakeney

@code_star
Data Dawg @datologyai | Formerly Data Research Lead @DbrxMosaicAI | Visiting Researcher @ Facebook | Ph.D | #TXSTFOOTBALL fan | https://t.co/4G6Jf3at5w

Today, we’re releasing the first weights from Trinity Large, our first frontier-scale model in the Trinity MoE family.

I am now up to 5 GPU providers that are completely sold out, without even a single node of 8xH100s available. I don’t think people understand the gravity of what is about to come.


"Massive investment in AI contributed basically zero to US economic growth last year," per Goldman Sachs






I’m not going to say frontier models with good harnesses can’t solve incredibly difficult problems. I will say many people who tried fine-tuning circa 2022-2024 were just way too early. The base models at the time just weren’t good enough to be meaningfully improved. The methods for generating training data were terrible, using models as judges was neither economical nor reliable, and paying for human annotations was too far out of reach. Fast forward to 2026 and many people have developed sophisticated, realistic evaluation environments which, if you squint, look a lot like what you want for RL. The cost of tokens is way down and the intelligence per token is much higher. Training infrastructure is much simpler to set up, and algorithms and data are better. It’s more possible than it has ever been to adapt models to solve real-world problems with a small team.
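
The "models as judges" and "evaluation environments that look like RL" point above is the technical core of that argument. Here is a minimal sketch of what a judge-based scoring loop can look like; `call_model`, the rubric, and the 1-5 scale are hypothetical placeholders for illustration, not anything from the thread or from a specific provider's API.

```python
# Sketch: scoring candidate answers with a judge model. The same signal can
# drive best-of-n rejection sampling (below) or act as a reward in an RL loop.
import json
import re

def call_model(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to whatever judge model you host
    and return its text completion. Swap in your provider's client here."""
    raise NotImplementedError

JUDGE_TEMPLATE = """You are grading an answer to a task.
Task: {task}
Answer: {answer}
Reply with JSON: {{"score": <integer 1-5>, "reason": "<one sentence>"}}"""

def judge_score(task: str, answer: str) -> float:
    """Ask the judge for a 1-5 score and normalize it to [0, 1]."""
    raw = call_model(JUDGE_TEMPLATE.format(task=task, answer=answer))
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # tolerate chatter around the JSON
    if not match:
        return 0.0
    score = int(json.loads(match.group(0)).get("score", 1))
    return (min(max(score, 1), 5) - 1) / 4.0

def best_of_n(task: str, candidates: list[str]) -> str:
    """Rejection sampling: keep the candidate the judge scores highest."""
    return max(candidates, key=lambda ans: judge_score(task, ans))
```

The cheaper tokens and higher intelligence per token mentioned above are what make a loop like this economical today: the judge calls that were prohibitive in 2022-2024 are now a rounding error next to the training run itself.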

Mistral launches Forge for fine-tuning. 95% of companies don't need it. A frontier model + RAG + tools beats custom fine-tuning. Every time. Fine-tuning = 2023. Orchestration = 2026. No? mistral.ai/news/forge

If all of us are contributing training data to OpenAI/Anthropic, aren't we all "Members of Technical Staff" in our own way?

> dario buys @bunjavascript > 3 months later > sama buys @astral_sh > 3 months later > google panic buys @linuxfoundation



Qwen is irreplaceable and has been going from strength to strength in recent times. Things will be different going forward, but I'm hopeful we can find a group of other models to fill the void. RIP

NVIDIA is hosting a Kaggle competition: how can you train a Nemotron Nano model to solve scientific questions? I hope you'll enjoy it! For this competition, @kaggle secured NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs from Google Cloud. These GPUs are much more powerful than the usual Kaggle GPUs. Come and try these beasts! kaggle.com/competitions/n…

We were able to significantly improve model quality and reduce the cost to serve. These quality improvements come from our first continued pretraining run, which provides a far stronger base for scaling our reinforcement learning.

I can do this all day @Dorialexander Your move
