Cody Blakeney

20.2K posts

@code_star

Data Dawg @datologyai | Formerly Data Research Lead @DbrxMosaicAI | Visiting Researcher @ Facebook | Ph.D | #TXSTFOOTBALL fan | https://t.co/4G6Jf3at5w

Redwood City, CA · Joined August 2011
1.7K Following · 6.4K Followers
Pinned Tweet
Cody Blakeney @code_star ·
Excited to announce the return of American OSS with Arcee Trinity Large. This model wouldn't have been possible without the awesome collaboration of Modeling @arcee_ai, Infra @PrimeIntellect, and Data @datologyai. I can't say enough about how talented the whole team at Arcee is, scaling from their first MoE to a big boy like this in such a short time.

Since the last data mix we have been in the lab pushing our midtraining and synthetic data to the limits. For Trinity Large we generated over 800B tokens of high-quality synthetic code and 6.5T(!!!) tokens overall. We also added multilingual curation.

This was a massive effort from the whole Datology family: scaling up the rephrasing workflows to support heterogeneous clusters so they scale efficiently (@isabelle226ku, @JackUrbs, @parthjdoshi, @haakonmongstad, @alvind319), pushing out midtraining and mixing (@_BrettLarsen), innovating on new code synthetic data (@amrokamal1997) and new math synthetic data (David Schwab), multilingual curation (@KaleighMentzer, @agcrnz, @RicardoMonti9), and of course building on our great foundation of synthetic data (@pratyushmaini, Vineeth Dorna).
Arcee.ai @arcee_ai

Today, we’re releasing the first weights from Trinity Large, our first frontier-scale model in the Trinity MoE family.

9 replies · 19 reposts · 123 likes · 12.4K views
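The "rephrasing workflows" mentioned in the pinned tweet follow the general pattern of rephrasing-style synthetic data: feed source documents to an instruction-tuned model and keep the rewrites as extra pretraining tokens. The sketch below is a minimal, hypothetical illustration of that pattern; the OpenAI-compatible client, model name, and prompt are assumptions for the example, not Datology's actual pipeline.

```python
# Minimal sketch of a document-rephrasing workflow for synthetic pretraining data.
# Hypothetical illustration: endpoint, model name, and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible inference endpoint

REPHRASE_PROMPT = (
    "Rewrite the following passage in a clear, textbook-like style, "
    "preserving all facts and technical detail:\n\n{doc}"
)

def rephrase(doc: str, model: str = "gpt-4o-mini") -> str:
    """Return one rephrased variant of a source document."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": REPHRASE_PROMPT.format(doc=doc)}],
        temperature=0.7,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    corpus = ["Mixture-of-experts layers route each token to a small subset of expert MLPs."]
    synthetic = [rephrase(doc) for doc in corpus]  # rewrites become extra training tokens
    print(synthetic[0])
```

At scale, the same loop would be sharded across a cluster and the outputs filtered and mixed before they enter the training data.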
Cody Blakeney @code_star ·
@birdabo Well … H100 demand and prices are actually going up right now
2 replies · 0 reposts · 8 likes · 1.2K views
sui ☄️ @birdabo ·
🚨Goldman Sachs just confirmed something insane and nobody's talking about it: companies spent $450 billion on AI and contributed zero to US economic growth. not “minimal.” actually zero. these companies fired a total of 30,000+ human workers to bet on AI infrastructure. it failed. so where did the money go?
> Nvidia took $130B. Jensen got rich selling GPUs - not from AI working, from everyone buying hardware.
> corporate buybacks got the rest. cut jobs, slash costs, pump stock, buy it back. money went to shareholders, not productivity.
> the bubble: H100 clusters sitting underutilized. enterprise contracts collecting dust. same hype cycle.
the damage is done. Meta fired 21K. Amazon 16K. Atlassian 1.6K. those jobs aren't coming back. they bet the economy on AI productivity. Goldman confirmed there is no productivity. lmao either this pays off in 18 months or we just witnessed the largest capital misallocation in tech history and we can't undo it 💀
unusual_whales @unusual_whales

"Massive investment in AI contributed basically zero to US economic growth last year," per Goldman Sachs

133 replies · 1.4K reposts · 10.2K likes · 561.6K views
Cody Blakeney @code_star ·
Found another great midtraining paper. I haven't seen it on my TL, so I thought I would share. Super excited to dig into it later, but it looks really promising. (ty @lukemerrick_) I love seeing more work unifying understanding of midtraining -> RL
[four images attached]
3 replies · 25 reposts · 235 likes · 12.3K views
Cody Blakeney retweeted
Cody Blakeney @code_star ·
I'm not going to say frontier models with good harnesses can't solve incredibly difficult problems. I will say many people who tried fine-tuning circa 2022-2024 were just way too early. The base models at the time just weren't good enough to be meaningfully improved. The methods for generating training data were terrible, using models as judges was not economical or reliable, and paying for human annotations was too far out of reach.

Fast forward to 2026, and many people have developed sophisticated and realistic evaluation environments which, if you squint, look a lot like what you want for RL. Cost of tokens is way down and the intelligence per token is much higher. Training infrastructure is much simpler to set up, and algorithms and data are better. It's more possible than it has ever been to adapt models to solve real-world problems with a small team.
Michel Levy Provençal @mikiane

Mistral launches Forge for fine-tuning. 95% of companies don't need it. A frontier model + RAG + tools beats custom fine-tuning. Every time. Fine-tuning = 2023. Orchestration = 2026. No? mistral.ai/news/forge

2 replies · 6 reposts · 61 likes · 7.7K views
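The point about evaluation environments resembling RL setups can be made concrete: a programmatic eval harness already pairs a prompt with a pass/fail check, and that check can be reused directly as a verifiable reward. The sketch below is a hypothetical illustration under that assumption; the names (EvalCase, reward_fn) are invented for this example and do not come from any specific library.

```python
# Sketch: an evaluation harness reused as an RL reward signal.
# Hypothetical names; illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # programmatic verifier, e.g. unit test or exact match

def reward_fn(case: EvalCase, completion: str) -> float:
    """The same pass/fail check used for evaluation becomes a binary RL reward."""
    return 1.0 if case.check(completion) else 0.0

# Usage: an eval case for "extract the integer answer" doubles as a reward model.
case = EvalCase(
    prompt="What is 17 * 3? Answer with a number.",
    check=lambda out: out.strip() == "51",
)
print(reward_fn(case, "51"))  # 1.0
print(reward_fn(case, "50"))  # 0.0
```

A policy-gradient trainer would then sample completions from the model for each prompt and maximize this reward, which is why a good eval suite, if you squint, is most of an RL environment.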
Cody Blakeney retweeted
Dimitris Papailiopoulos @DimitrisPapail ·
The entire NSF research budget is ~$9B/year. This is literally the funding for every awarded PI in every field and at every institution. But we've decided that all of basic science is a rounding error in comparison to venture bets. Please consider funding basic science more.
10 replies · 47 reposts · 469 likes · 33K views
Cody Blakeney retweeted
Samip @industriaalist ·
Announcing 10x data efficiency on NanoGPT Slowrun! There are two macro trends worth highlighting:
- pretraining is nowhere close to done, and
- 100x looks feasible.
Writeup on all the core ideas: qlabs.sh/10x
6 replies · 35 reposts · 233 likes · 17.7K views
Ariel @redtachyon ·
For the next few months I'm finding myself temporarily unemployed. The good part is that I can do anything OSS and not worry about IP issues. The bad part is that my GPU resources are very limited. Anyways, I'm planning to expand gyllm a bit - I think the core API is solid, but it could use some more features, and there's only so much I can do on a single Spark. Sooooo anyone got any GPU credits for some good OSS RL?
6 replies · 0 reposts · 57 likes · 4.1K views
Cody Blakeney retweeted
N8 Programs @N8Programs ·
This is why @allen_ai is so important - it's an organization philosophically committed to complete open-source. It doesn't open-source to generate traction/investor money and then go closed source later (or farm community goodwill w/ OSS releases in order to then offer premium, closed-source models) - it open-sources because that's its ideological commitment. Even if its models aren't SOTA, the fact is that its current models are OSS, and as long as it has the backing, it will keep making OSS models.

And this is important, as OSS today doesn't guarantee OSS tmrw. It's unclear if Minimax 2.7 will be OSS. If Qwen4 will be OSS. GLM-5 Turbo is a closed-source beta. Minimax 2.5, Qwen3.5, and GLM-5 may be open-source, but they are ultimately made by commercial companies with commercial goals who take their models closed-source when they wish.

So while I may use the above models now, I can rest assured that Olmo 4, and Olmo 5, and Olmo N, will all be open-source if they exist. There will not be a more-capable Olmo locked behind an API. We will always know how Olmo models are made, have the data they were trained on, and the intermediate checkpoints. And that's incredibly, incredibly valuable.
Nathan Lambert @natolambert

Qwen is irreplaceable. Has been going from strength to strength in recent times. Things will always be different, I'm hopeful we can find groups of other models to fill the void. RIP

5 replies · 8 reposts · 96 likes · 6.9K views
Cody Blakeney retweeted
jason liu @jxnlco ·
[image attached]
9 replies · 3 reposts · 167 likes · 4.5K views
Cody Blakeney retweeted
Eric W. Tramel @fujikanaeda ·
a new competition from JF and the grandmasters?? you know it’s going to be something special. you don’t want to miss out on this one, kagglers :)
JFPuget 🇺🇦🇨🇦🇬🇱 @JFPuget

NVIDIA is hosting a Kaggle competition. How can you train a nemotron nano model to solve scientific questions? I hope you'll enjoy it! For this competition @kaggle secured NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs from Google Cloud. These GPUs are much more powerful than the usual Kaggle GPUs. Come and try these beasts! kaggle.com/competitions/n…

0 replies · 1 repost · 7 likes · 877 views