
gabriel teston

@GabrielTeston
Solving language @Google Search

We just put out a key step toward making distributed training work for larger and larger models: Scaling Laws for DiLoCo. TL;DR: We can train LLMs across datacenters in a way that scales incredibly well as models grow!

Training LLMs across multiple datacenters is hard. 🛑 Synchronization demands often cause massive slowdowns as we scale up. If you're at @NeurIPSConf, come see how we tackle this! Our work, "Scaling Laws for DiLoCo," shows how DiLoCo relaxes synchronization demands without compromising model quality, allowing training to scale incredibly well. Come chat with me and @NovaFallen8: 🗓️ Thu, Dec 4 ⏰ 11 AM – 2 PM PST 📍 Exhibit Hall C,D,E, #811 #NeurIPS2025 #LLMs #DistributedTraining #ScalingLaws
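The core DiLoCo idea behind "relaxing synchronization" is that each datacenter runs many cheap local optimizer steps and workers communicate only once per outer round, instead of every step. Here is a toy sketch of that communication pattern on a quadratic objective. Everything concrete below (the objective, worker count, step counts, learning rates) is an illustrative assumption, not the paper's setup; DiLoCo itself uses AdamW as the inner optimizer and Nesterov-momentum SGD as the outer one, which this sketch only mimics in simplified form.

```python
import numpy as np

# Toy DiLoCo-style loop (illustrative assumptions throughout):
# K workers each take H local "inner" steps on their own noisy gradients,
# then synchronize ONCE by averaging parameter deltas ("outer gradients").
# Communication happens every H steps instead of every step.

rng = np.random.default_rng(0)
dim, K, H, outer_rounds = 4, 3, 10, 50
inner_lr, outer_lr, momentum = 0.05, 0.3, 0.6

target = rng.normal(size=dim)      # toy objective: ||theta - target||^2
theta = np.zeros(dim)              # shared (global) parameters
velocity = np.zeros(dim)           # outer Nesterov-momentum buffer

def inner_grad(params, worker_rng):
    # Noisy gradient of the toy quadratic loss (stands in for worker-local data).
    return 2.0 * (params - target) + 0.1 * worker_rng.normal(size=dim)

for _ in range(outer_rounds):
    deltas = []
    for k in range(K):
        local = theta.copy()
        worker_rng = np.random.default_rng(rng.integers(1 << 31))
        for _ in range(H):                  # H cheap local steps, no comms
            local -= inner_lr * inner_grad(local, worker_rng)
        deltas.append(theta - local)        # "outer gradient" for worker k
    outer_grad = np.mean(deltas, axis=0)    # the ONE sync per round
    velocity = momentum * velocity + outer_grad
    theta -= outer_lr * (outer_grad + momentum * velocity)  # Nesterov-style

print("final loss:", float(np.sum((theta - target) ** 2)))
```

The point of the sketch is the shape of the loop, not the numbers: the averaging step is the only place workers exchange data, so cross-datacenter bandwidth and latency costs are amortized over H inner steps.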

How to attend AI research conferences the smart way: NeurIPS is in two weeks, and several people have asked me how to make the most of big AI conferences. Here are 7 tips I've developed after attending many of them, to maximize your time and actually extract value!


Our TPUs are headed to space! Inspired by our history of moonshots, from quantum computing to autonomous driving, Project Suncatcher is exploring how we could one day build scalable ML compute systems in space, harnessing more of the sun's power (the sun emits more than 100 trillion times humanity's total electricity production). Like any moonshot, it's going to require us to solve a lot of complex engineering challenges. Early research shows our Trillium-generation TPUs (our tensor processing units, purpose-built for AI) survived without damage when tested in a particle accelerator that simulated low-earth-orbit levels of radiation. However, significant challenges remain, like thermal management and on-orbit system reliability. More testing and breakthroughs will be needed as we count down to launching two prototype satellites with @planet by early 2027, our next milestone of many. Excited for us to be a part of all the innovation happening in (this) space!

what are your 5-year goals, folks?
