gabriel teston
@GabrielTeston

Solving language @Google Search

Joined December 2023
82 Following · 57 Followers

40 posts

Pinned Tweet
gabriel teston@GabrielTeston·
Training LLMs across multiple datacenters is hard. 🛑 Synchronization demands often cause massive slowdowns as we scale up. If you're at @NeurIPSConf, come see how we tackle this! Our work, "Scaling Laws for DiLoCo," shows how DiLoCo relaxes synchronization without compromising model quality, allowing training to scale incredibly well. Come chat with me and @NovaFallen8: 🗓️ Thu, Dec 4 ⏰ 11 AM – 2 PM PST 📍 Exhibit Hall C,D,E, #811 #NeurIPS2025 #LLMs #DistributedTraining #ScalingLaws
Zachary Charles@MatharyCharles

We just put out a key step for making distributed training work for larger and larger models: Scaling Laws for DiLoCo. TL;DR: We can do LLM training across datacenters in a way that scales incredibly well to larger and larger models!

0 replies · 3 reposts · 10 likes · 6.3K views
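For anyone who hasn't seen DiLoCo before, here is a minimal numpy sketch of the idea behind the poster: each replica trains independently for many inner steps, and only an averaged "outer" delta is synchronized across datacenters. This is an illustration only, not code from the paper; the toy model, the `local_grad` stand-in, and every hyperparameter (K, H, INNER_LR, OUTER_LR, MOMENTUM) are assumptions made up for the example.

```python
# Minimal DiLoCo-style training sketch (illustrative, not the paper's code).
# Assumptions: K replicas, H local steps per round, SGD inner optimizer,
# Nesterov-momentum outer optimizer, all simulated in one process with numpy.
import numpy as np

K, H, ROUNDS = 4, 50, 10                 # replicas, inner steps per round, outer rounds
INNER_LR, OUTER_LR, MOMENTUM = 0.1, 0.7, 0.9

rng = np.random.default_rng(0)
global_params = rng.normal(size=8)       # toy "model": a single weight vector
outer_velocity = np.zeros_like(global_params)

def local_grad(params, rng):
    # Stand-in for a minibatch gradient: pulls params toward zero, plus noise.
    return params + 0.1 * rng.normal(size=params.shape)

for r in range(ROUNDS):
    deltas = []
    for k in range(K):                   # each replica trains with no communication
        params = global_params.copy()
        for _ in range(H):               # H cheap inner steps, fully local
            params -= INNER_LR * local_grad(params, rng)
        deltas.append(global_params - params)   # replica k's "outer gradient"
    outer_grad = np.mean(deltas, axis=0)        # the ONLY synchronization point
    # Nesterov momentum on the outer gradient, applied to the global params.
    outer_velocity = MOMENTUM * outer_velocity + outer_grad
    global_params -= OUTER_LR * (outer_grad + MOMENTUM * outer_velocity)
    print(f"round {r}: |params| = {np.linalg.norm(global_params):.4f}")
```

Note that the only cross-datacenter traffic in this loop is the averaged delta, once per round of H steps, which is why relaxing synchronization this way tolerates slow inter-datacenter links.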
gabriel teston@GabrielTeston·
It is today! ⏰ 11 AM – 2 PM PST 📍 Exhibit Hall C,D,E, #811
gabriel teston@GabrielTeston

[quoted tweet: the pinned tweet above]

0 replies · 0 reposts · 0 likes · 46 views
gabriel teston@GabrielTeston·
Heading to @NeurIPSConf in San Diego. I’ve got some DiLoCo stickers to give away! 👾 ❤️ Come check out our poster. 🗓️ Thu, Dec 4 ⏰ 11 AM – 2 PM PST 📍 Exhibit Hall C,D,E, #811 #NeurIPS2025
[image]
0 replies · 1 repost · 5 likes · 603 views
gabriel teston reposted
Sam Lehman@SPLehman·
Attending @NeurIPSConf and interested in distributed, modular, and/or open AI? I hadn't seen anyone put together a list of poster presentations in this area, so I took it upon myself to thread out who I'm excited to talk to next week 🧵
5 replies · 5 reposts · 48 likes · 4.7K views
Yacine Mahdid@yacinelearning·
Hope it helps! Also, for poster sessions, try to find the ones doing RL on your favorite game and ask them very precise questions about the game (not their work) to test out their skillset.
3 replies · 0 reposts · 16 likes · 811 views
gabriel teston reposted
Dan Advantage@DanAdvantage·
ultimate flex thread; now's your chance to brag without being judged.

> 5 things you accomplished this week

i'll go first:
1) lost ~1 pound, no muscle loss
2) ran 35 miles
3) upped father game, paying extra attention to kids and engaging them
4) @yacinelearning interview
5) $$$$
9 replies · 0 reposts · 33 likes · 1.2K views
Arthur Douillard@Ar_Douillard·
I’ve been promoted to Staff RS. Vain title etc. but feels good to see appreciation for distributed learning in DeepMind ☺️
39 replies · 10 reposts · 439 likes · 40.3K views
Yacine Mahdid@yacinelearning·
what are your 5 year goals folks
[image]
42 replies · 7 reposts · 270 likes · 83.9K views
Arthur Douillard@Ar_Douillard·
Learned today that a startup is using Streaming DiLoCo to train a distributed AlphaFold-like model. Happy :)
2 replies · 0 reposts · 24 likes · 1.7K views
Immalittle.bit@riseofreh·
@VictorTaelin My role was to dig: to hunt for timestamps, snapshots, mirror videos, and the bizarre things that stay hidden from most people. You navigate along one critical curve... I along another...
1 reply · 0 reposts · 0 likes · 248 views
Taelin@VictorTaelin·
At this point everyone, even OpenAI, is annoyed at me posting Codex stuff non-stop. You get it. I know you do. But I'm genuinely having so much fun and progress building HVM4 that I have to contain myself not to post every small thing.

So many things that were previously just not viable due to time constraints are now within reach. For example, HVM1 needed a major refactor of the GC before we could fix the parallel bug. That would take days. I never had that time. HVM4 often needs major refactors to push things forward too, except, now, I can do that by writing a 30-minute prompt, rather than 2 days of coding.

In the last 2 days, I iterated through 5+ completely different approaches to task parallelism on HVM4's runtime. Most attempts went nowhere, but that means I lost minutes, not days, as it would have taken previously.

Writing concurrent code is HARD. So many subtle errors. Should I put a fence here? Should this CAS use seq_rel? Should this field exist? Add a subtle bug and there goes an evening debugging... Now, I can just draft a half-baked prompt that explains what is on my mind in a way that at least makes some sense, and the AI will fill the holes and implement my idea as I browse X. 99% of the time, it just works.

Sure, it is most likely just recalling details it learned from human literature. "Okay, to write an MPMC, I need this atomic here, then this fence here." But applying it to my specific setup in 10 minutes is nothing short of miraculous.

I'm just thinking about the last 2 days and there is absolutely no way I'd have been able to try so many different things in so little time. Now HVM4 has a parallel runtime that works, scales well with cores, all tests pass, near 0 error rate. 2 days ago I had none of that. HVM3 spent a whole year without one!
[image]
33 replies · 19 reposts · 705 likes · 63.8K views
elie@eliebakouch·
Open source release of smol merch coming soon at colm 🤏
[image]
4 replies · 4 reposts · 55 likes · 4.3K views
gabriel teston@GabrielTeston·
Want to learn how to train models across the world, with 400x fewer bits exchanged and a huge latency tolerance? 🌎 I'll be presenting our work on how to efficiently scale distributed training at @COLM_conf. 🗓️ TODAY: Tuesday, 11:00 - 13:00 📍 Room 710 #COLM2025
0 replies · 2 reposts · 10 likes · 4.1K views
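Since "400x fewer bits" naturally raises the question of where the savings come from, here is a back-of-the-envelope sketch of the kinds of tricks that stack up in streamed, low-precision outer synchronization: sync only every H steps, stream one fragment of the model per sync, and quantize the outer delta to a few bits. This is my illustration, not the paper's method or numbers; H, FRAGMENTS, BITS, and the uniform quantizer are assumptions chosen for the example.

```python
# Illustrative bandwidth arithmetic for relaxed-synchronization training
# (assumed numbers, not the paper's). Also shows a simple uniform quantizer
# for an "outer delta" to make the low-bit sync concrete.
import numpy as np

H, FRAGMENTS, BITS = 50, 2, 4            # steps per sync, model split, quant width

def quantize(delta, bits=BITS):
    # Uniform symmetric quantization of a float32 delta to `bits` bits.
    scale = np.abs(delta).max() / (2 ** (bits - 1) - 1)
    q = np.round(delta / scale).astype(np.int8)   # values fit in [-7, 7] for 4 bits
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

delta = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize(delta)
restored = dequantize(q, scale)
print("max quantization error:", np.abs(delta - restored).max())

# Naive data-parallel sync: 32 bits per parameter, every step.
# Streamed + quantized outer sync: BITS bits per parameter, for 1/FRAGMENTS
# of the model, once every H steps.
reduction = (32 * H * FRAGMENTS) / BITS
print(f"bits exchanged reduced by ~{reduction:.0f}x")
```

With numbers in this ballpark, multiplying the three savings (infrequent syncs, partial streaming, low-bit deltas) easily reaches the hundreds.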