
J.Nathan Yan
@NathanYan2012
Research Scientist @GoogleDeepMind. Ph.D. from @CornellCIS/@cornell_tech. I bake my own opinions.

We're releasing a technical report describing how Composer 2 was trained.

🚀Excited to introduce our recent work @ AppleMLR -- DART: Denoising AutoRegressive Transformer for Scalable Text-to-Image Generation! A transformer-based model that unifies Autoregressive and Diffusion with a non-Markovian diffusion framework: 🔗 arxiv.org/abs/2410.08159 (1/n)
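For skimmers: one way to read "non-Markovian diffusion" here is that each denoising step conditions on the whole noisier trajectory rather than only the previous step, which is exactly the shape an autoregressive transformer models. A rough sketch of that factorization (notation mine; see the linked arXiv paper for the actual formulation):

```latex
\begin{align*}
% Markovian reverse diffusion: each denoising step sees only x_t
p_\theta(x_{0:T}) &= p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \\
% Non-Markovian variant: each step conditions on the full noisier
% trajectory, so one transformer can model the chain autoregressively
p_\theta(x_{0:T}) &= p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_{t:T})
\end{align*}
```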

probably obvious at this point, but if you work in ai you should seriously consider learning chinese

Linear Attention and Beyond: Interactive Tutorial with Songlin Yang (@SonglinYang4, MIT / Flash Linear Attention). I didn’t follow some of the recent results, so I zoomed Songlin and she explained it all to me for two hours 😂 youtu.be/d0HJvGSWw8A
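If you want the one-line version of "linear attention" before watching: replace softmax with a feature map phi, then regroup the matrix products so cost grows linearly in sequence length. A minimal non-causal sketch (the function name and feature map are illustrative choices of mine; the causal and chunked forms covered in Flash Linear Attention are more involved):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Non-causal linear attention via the kernel trick.

    softmax(Q K^T) V costs O(N^2 d); with a feature map phi,
    (phi(Q) phi(K)^T) V regroups to phi(Q) (phi(K)^T V) -- O(N d^2).
    """
    Qf, Kf = phi(Q), phi(K)           # (N, d) feature-mapped queries/keys
    KV = Kf.T @ V                     # (d, d_v) key-value summary
    Z = Qf @ Kf.sum(axis=0)[:, None]  # (N, 1) per-query normalizer
    return (Qf @ KV) / Z              # (N, d_v)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```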

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time computation!