Soham De
@sohamde_
190 posts
Research Scientist at DeepMind. Previously PhD at the University of Maryland.

London, England · Joined May 2011
1.1K Following · 2.3K Followers
Pinned Tweet
Soham De @sohamde_
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! arxiv.org/abs/2402.19427 My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
12 replies · 65 reposts · 306 likes · 48.5K views
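For readers curious what "mixing RNN layers with Local Attention" looks like concretely, here is a minimal NumPy sketch of one hybrid block: a gated linear recurrence followed by causal sliding-window attention, each behind a residual connection. The gating form, single attention head, and 1:1 interleaving are illustrative assumptions, not the exact Griffin recipe (see the paper for the real block).

import numpy as np

def gated_linear_recurrence(x, w_a, w_g):
    # x: (seq_len, dim). An input-dependent decay gate a_t in (0, 1)
    # blends the running state with the gated input:
    #   h_t = a_t * h_{t-1} + (1 - a_t) * (g_t * x_t)
    a = 1.0 / (1.0 + np.exp(-(x @ w_a)))   # decay gate per step/channel
    g = 1.0 / (1.0 + np.exp(-(x @ w_g)))   # input gate per step/channel
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a[t] * h + (1.0 - a[t]) * (g[t] * x[t])
        out[t] = h
    return out

def local_attention(x, w_q, w_k, w_v, window=4):
    # Causal attention where position t attends only to the last
    # `window` positions (single head for brevity).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    dim = q.shape[1]
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out

rng = np.random.default_rng(0)
seq_len, dim = 16, 8
x = rng.normal(size=(seq_len, dim))
w_a, w_g, w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim))
                           for _ in range(5))

h = x
h = h + gated_linear_recurrence(h, w_a, w_g)   # recurrent temporal mixing
h = h + local_attention(h, w_q, w_k, w_v)      # local-attention temporal mixing
print(h.shape)   # (16, 8)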
Soham De retweeted
Samuel L Smith @SamuelMLSmith
The Training team @OpenAI is hiring researchers in London 🚀 Our twin missions are to train better LLMs and serve them more cheaply. Get in touch if you are excited to collaborate on architecture design, reliable scaling, and faster optimization.
11 replies · 38 reposts · 490 likes · 88.7K views
Soham De retweeted
Jun Cheng @s6juncheng
Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6
14 replies · 209 reposts · 913 likes · 87.2K views
Soham De retweeted
Antonio Orvieto @orvieto_antonio
We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising how much one can delve into, and how beautiful it can become. With (and only thanks to) the amazing Alexandre and @BachFrancis arxiv.org/pdf/2502.09287
2 replies · 42 reposts · 171 likes · 11.1K views
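For context, such recall analyses start from the standard linear RNN (state-space) form; in the usual notation (the paper's own notation may differ):

    h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t
    \quad\Longrightarrow\quad
    y_t = \sum_{k=0}^{t} C A^{k} B \, x_{t-k}

so the ability to recall an input x_{t-k} after k steps is governed by how much the matrix powers A^k preserve about it.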
Soham De retweeted
Vaishnavh Nagarajan @_vaishnavh
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
1 reply · 42 reposts · 168 likes · 29.4K views
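To make the objective contrast concrete: standard training minimizes cross-entropy on the single next token, while a multi-token objective trains extra heads to predict several future tokens at each position. A toy NumPy sketch of the two losses (the k-head layout and uniform weighting are illustrative assumptions, not the paper's exact setup):

import numpy as np

def cross_entropy(logits, target):
    # logits: (vocab,), target: token id; returns -log softmax(logits)[target]
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

def next_token_loss(head_logits, tokens, t):
    # Standard objective: a single head predicts only token t+1.
    return cross_entropy(head_logits[0], tokens[t + 1])

def multi_token_loss(head_logits, tokens, t):
    # Multi-token objective: k heads at position t predict tokens t+1..t+k,
    # forcing the representation at t to look further ahead.
    k = len(head_logits)
    return sum(cross_entropy(head_logits[i], tokens[t + 1 + i])
               for i in range(k)) / k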
Soham De retweeted
Brendan O'Donoghue @bodonoghue85
Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec, including overheads like tokenization, prefill, safety filters etc.
Google DeepMind @GoogleDeepMind

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO

94 replies · 250 reposts · 2.7K likes · 576.2K views
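Gemini Diffusion's internals are not public, but the quoted description ("generate outputs by refining noise, step-by-step") matches the general shape of discrete text diffusion: start from mask/noise tokens and repeatedly commit the model's most confident predictions. A toy sketch of that loop, where the denoiser stub, the confidence heuristic, and the schedule are all placeholders:

import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len, steps = 100, 12, 4
MASK = vocab   # reserve one extra id as the noise/mask token

def denoiser(tokens):
    # Stand-in for a trained model: per-position logits over the vocab.
    # A real model would condition on the prompt and the current tokens.
    return rng.normal(size=(seq_len, vocab))

x = np.full(seq_len, MASK)              # start from pure "noise"
for step in range(steps):
    logits = denoiser(x)
    conf = logits.max(axis=1)           # confidence proxy per position
    pred = logits.argmax(axis=1)
    masked = x == MASK
    # Commit the most confident still-masked positions this step,
    # leaving the rest to be refined in later steps.
    n_commit = int(np.ceil(masked.sum() / (steps - step)))
    order = np.argsort(np.where(masked, -conf, np.inf))
    x[order[:n_commit]] = pred[order[:n_commit]]
print(x)   # all positions filled after `steps` refinement steps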
Soham De @sohamde_
Our new paper sheds light on the process of knowledge acquisition in language models, with implications for data curricula, the challenges of learning new knowledge when fine-tuning, and the emergence of hallucinations. Nicolas did a great job on the project! See his thread👇
Nicolas Zucchet @NicolasZucchet

Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! arxiv.org/abs/2503.21676

1 reply · 6 reposts · 36 likes · 3.5K views
Soham De retweeted
Google DeepMind @GoogleDeepMind
Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → goo.gle/40apGQh
27 replies · 212 reposts · 946 likes · 407.4K views
Soham De retweeted
Preetum Nakkiran @PreetumNakkiran
We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, and David Berthelot on scientific aspects of diffusion & generative models. 1/
4 replies · 37 reposts · 205 likes · 48K views
Soham De retweeted
Google DeepMind @GoogleDeepMind
We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 dpmd.ai/3XuMqbX
67 replies · 794 reposts · 2.9K likes · 1.1M views
Soham De @sohamde_
@champydaku The data efficiency comes primarily from better tuning. We did a lot of work to establish hyperparameter scaling rules for Griffin so we can scale efficiently - we might write this up at some point. We compare different capabilities in the Griffin paper: arxiv.org/abs/2402.19427
0 replies · 0 reposts · 2 likes · 85 views
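The Griffin scaling rules mentioned here have not been written up, but as a flavor of what such a rule looks like: a common muP-style heuristic transfers a learning rate tuned on a small model by shrinking it as 1/width (purely illustrative, not Griffin's actual rule):

def scaled_lr(base_lr, base_width, width):
    # Transfer a learning rate tuned at base_width to a wider model
    # via a 1/width rule (illustrative, muP-style; not Griffin's rule).
    return base_lr * base_width / width

print(scaled_lr(3e-3, 512, 4096))   # 0.000375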
Avi Dhaliwal @dhaliwalavis
@sohamde_ Could you elaborate on how the Griffin architecture enables this data efficiency? Are there specific tasks where you've observed RecurrentGemma excelling or lagging compared to transformer-based models?
1 reply · 0 reposts · 2 likes · 110 views
Soham De @sohamde_
Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens! I finally updated arXiv with some of our results: arxiv.org/abs/2404.07839 Link to weights and code for our models in thread!
5 replies · 30 reposts · 225 likes · 23.9K views
Soham De retweeted
Armand Joulin @armandjoulin
Are small models still undertrained? We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model. Distillation is the future of LLMs with the growing availability of large and efficient open models!
10 replies · 39 reposts · 366 likes · 62.6K views
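For readers new to distillation: instead of training only on hard next-token targets, the student matches the teacher's full output distribution at every position. A minimal NumPy sketch of the standard KL objective (temperature and reduction are illustrative choices, not Gemma's recipe):

import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) averaged over positions; the temperature T
    # softens both distributions (standard Hinton-style distillation).
    log_p = log_softmax(teacher_logits / T)
    log_q = log_softmax(student_logits / T)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
print(distill_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32))))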
Soham De retweeted
Surya Bhupatiraju @suryabhupa
I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: goo.gle/gemma2report and it's live *right now* on aistudio.google.com.
21 replies · 47 reposts · 231 likes · 26.3K views
Soham De retweeted
Vaibhav (VB) Srivastav @reach_vb
Welcome RecurrentGemma 9B 🔥 > Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡ > Base (9B) and Instruct (9B-IT) models released. > MMLU 60.5, CommonSenseQA 73.2, AGIEval 39.3 - a pretty strong base model to fine-tune further. > Based on the Griffin architecture > Achieves faster inference on long sequences by replacing global attention with local attention and linear recurrences. > Available in Transformers! 🤗 Massive kudos to Google for continuing open research into alternative architectures! GG!
8 replies · 45 reposts · 212 likes · 37.5K views
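The latency and throughput gains quoted above come largely from memory: global attention's key/value cache grows with sequence length, while local attention caps it at the window size (and a linear recurrence needs only a fixed-size state). A back-of-envelope sketch, with an illustrative window:

def kv_cache_entries(seq_len, window=None):
    # Cached key/value timesteps per layer during generation:
    # unbounded for global attention, capped at `window` for local.
    return seq_len if window is None else min(seq_len, window)

for t in (1_000, 10_000, 100_000):
    print(f"seq={t}: global={kv_cache_entries(t)}, "
          f"local={kv_cache_entries(t, window=2048)}")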