Reece Shuttleworth
@ReeceShuttle
Inception | MIT '25
Palo Alto, CA · Joined July 2022
83 Following · 380 Followers
17 posts
Reece Shuttleworth retweeted
Stefano Ermon @StefanoErmon
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.
Reece Shuttleworth retweeted
Stefano Ermon @StefanoErmon
@beffjezos It already ate images and videos. Language just took a little longer to chew.
Reece Shuttleworth retweeted
Inception @_inception_ai
The more structure a language has, the faster diffusion can run, and code fits that profile: it has plenty of structure. Listen to @justkharbanda on how diffusion unlocks speed for real-world coding workloads. #Diffusion #AIInfrastructure #DeveloperTools
Reece Shuttleworth retweeted
Julia Turc @juliarturc
Diffusion clicked for me when I read about score-based models, a line of work pioneered by @StefanoErmon (et al.) at Stanford. So it was a full-circle moment to collab with him and @_inception_ai on a video about training & sampling techniques for making diffusion LLMs faster.
Reece Shuttleworth retweeted
Inception @_inception_ai
"It's now or never." Our CEO @stefanoermon on why he started Inception – featured this week in @WSJ. We're building dLLMs that generate tokens in parallel. Faster. More efficient. More controllable. This is the moment. Thanks for the coverage @KateClarkTweets wsj.com/tech/ai/these-…
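The core idea in the tweet above, generating tokens in parallel rather than strictly left to right, can be sketched as an iterative unmasking loop. This is a toy illustration only: the vocabulary, the `predict_all` stand-in, and the confidence scores are all invented for the example, and a real diffusion LLM would use a transformer forward pass to propose every masked token at once.

```python
import random

# Toy sketch of parallel (diffusion-style) decoding, as opposed to
# left-to-right autoregressive generation. Names and the scoring
# function are made up for illustration; a real dLLM would run a
# transformer to predict every masked position simultaneously.

MASK = "_"
TARGET = ["the", "cat", "sat", "on", "the", "mat"]  # stand-in "model knowledge"

def predict_all(tokens):
    """Stand-in for one parallel forward pass: propose a token and a
    confidence score for every masked position at once."""
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            proposals[i] = (TARGET[i], random.random())  # (token, confidence)
    return proposals

def diffusion_decode(length, steps=3):
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        proposals = predict_all(tokens)
        # Commit the highest-confidence positions this step; the rest
        # are re-predicted next iteration with more context filled in.
        best = sorted(proposals, key=lambda i: -proposals[i][1])[:per_step]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens

print(diffusion_decode(len(TARGET)))
```

Each loop iteration fills several positions at once, which is where the speed claim comes from: the number of model calls scales with the number of denoising steps, not the sequence length.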
Reece Shuttleworth retweeted
Elon Musk @elonmusk
Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and generation, so good chance diffusion is the biggest winner overall. Also means that the ratio of compute to memory bandwidth will increase.
Reece Shuttleworth @ReeceShuttle
Really cool to see @thinkymachines exploring similar ideas around LoRA recently! Check out our paper for our other detailed investigations: How do LoRA initialization and learning rate impact learning? What roles do LoRA’s alpha parameter and the product-of-matrices parameterization play in the observed training dynamics? Plus mathematical explanations of this phenomenon and more!
Reece Shuttleworth @ReeceShuttle
🧵 LoRA vs full fine-tuning: same performance ≠ same solution. Our NeurIPS ‘25 paper 🎉 shows that LoRA and full fine-tuning, even when equally well fit, learn structurally different solutions, and that LoRA forgets less and can be improved further (even less forgetting) by a simple intervention! Read on for behavioral differences (forgetting, continual learning) and other analysis! Paper: arxiv.org/pdf/2410.21228 (1/7)
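For readers unfamiliar with the terms in the thread above, the "alpha parameter" and "product-of-matrices parameterization" refer to the standard LoRA setup: the frozen weight W is adapted as W + (alpha / r) · B A, with only the low-rank factors A and B trained. A minimal sketch, with illustrative shapes and the common zero-init convention for B (this is the generic LoRA formulation, not code from the paper):

```python
import numpy as np

# Minimal sketch of the LoRA parameterization discussed in the thread:
# W is adapted as W + (alpha / r) * B @ A, training only A and B.
# Dimensions and hyperparameters here are illustrative.

d_out, d_in, r, alpha = 8, 8, 2, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init

def lora_forward(x):
    # Product-of-matrices parameterization: the effective update is
    # the rank-r matrix (alpha / r) * B @ A added on top of W.
    return (W + (alpha / r) * B @ A) @ x

x = rng.normal(size=d_in)
# Because B is zero at init, the adapter is a no-op at the start of
# training: LoRA output equals the base model's output.
assert np.allclose(lora_forward(x), W @ x)
```

The update B @ A has rank at most r, which is one reason LoRA can end up at a structurally different solution than full fine-tuning even when both fit the task equally well.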
Reece Shuttleworth retweeted
Vedang Lad @vedanglad
1/7 Wondered what happens when you permute the layers of a language model? In our recent paper with @tegmark, we swap and delete entire layers to understand how models perform inference - in doing so we see signs of four universal stages of inference!
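The swap-and-delete probe described in the tweet above can be illustrated with a toy model: treat the network as an ordered list of layer functions, rerun it with layers permuted or removed, and compare outputs. The "layers" below are simple affine maps chosen purely for illustration; the actual paper applies this to transformer blocks.

```python
# Toy sketch of layer-swap / layer-delete probes: a network as an
# ordered list of layer functions, rerun under different orderings.
# These toy affine "layers" are illustrative only.

layers = [
    lambda h: h * 2 + 1,   # layer 0
    lambda h: h - 3,       # layer 1
    lambda h: h * 0.5,     # layer 2
]

def run(layer_order, h=1.0):
    """Apply the chosen layers, in the chosen order, to input h."""
    for i in layer_order:
        h = layers[i](h)
    return h

baseline = run([0, 1, 2])   # normal inference -> 0.0
swapped  = run([1, 0, 2])   # swap layers 0 and 1 -> -1.5
deleted  = run([0, 2])      # delete layer 1 -> 1.5

print(baseline, swapped, deleted)
```

Because the layers do not commute, order and presence both change the output; measuring how much the output degrades under each perturbation, at each depth, is the kind of signal used to distinguish stages of inference.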