Soham De
@sohamde_
190 posts
Research Scientist at DeepMind. Previously PhD at the University of Maryland.

London, England · Joined May 2011
1.1K Following · 2.3K Followers
Pinned Tweet
Soham De @sohamde_
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! arxiv.org/abs/2402.19427 My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
12 replies · 65 reposts · 306 likes · 48.5K views
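For readers curious what "mixing RNN layers with Local Attention" looks like concretely, here is a minimal NumPy sketch of one hybrid block: a gated linear recurrence followed by causal sliding-window attention, each behind a residual connection. The gating form, single attention head, and 1:1 interleaving are illustrative assumptions, not the exact Griffin recipe (see the paper for the real block).

import numpy as np

def gated_linear_recurrence(x, w_a, w_g):
    # x: (seq_len, dim). An input-dependent decay gate a_t in (0, 1)
    # blends the running state with the gated input:
    #   h_t = a_t * h_{t-1} + (1 - a_t) * (g_t * x_t)
    a = 1.0 / (1.0 + np.exp(-(x @ w_a)))   # decay gate per step/channel
    g = 1.0 / (1.0 + np.exp(-(x @ w_g)))   # input gate per step/channel
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a[t] * h + (1.0 - a[t]) * (g[t] * x[t])
        out[t] = h
    return out

def local_attention(x, w_q, w_k, w_v, window=4):
    # Causal attention where position t attends only to the last
    # `window` positions (single head for brevity).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    dim = q.shape[1]
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out

rng = np.random.default_rng(0)
seq_len, dim = 16, 8
x = rng.normal(size=(seq_len, dim))
w_a, w_g, w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim))
                           for _ in range(5))

h = x
h = h + gated_linear_recurrence(h, w_a, w_g)   # recurrent temporal mixing
h = h + local_attention(h, w_q, w_k, w_v)      # local-attention temporal mixing
print(h.shape)   # (16, 8)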
Soham De retweeted
Samuel L Smith @SamuelMLSmith
The Training team @OpenAI is hiring researchers in London 🚀 Our twin missions are to train better LLMs and serve them more cheaply. Get in touch if you are excited to collaborate on architecture design, reliable scaling, and faster optimization.
11 replies · 38 reposts · 490 likes · 88.7K views
Soham De retweeted
Jun Cheng @s6juncheng
Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6
14 replies · 209 reposts · 913 likes · 87.2K views
Soham De retweeted
Antonio Orvieto @orvieto_antonio
We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. It's surprising how much one can delve into, and how beautiful it can become. With (and only thanks to) the amazing Alexandre and @BachFrancis arxiv.org/pdf/2502.09287
2 replies · 42 reposts · 171 likes · 11.1K views
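For context, such recall analyses start from the standard linear RNN (state-space) form; in the usual notation (the paper's own notation may differ):

    h_t = A h_{t-1} + B x_t, \qquad y_t = C h_t
    \quad\Longrightarrow\quad
    y_t = \sum_{k=0}^{t} C A^{k} B \, x_{t-k}

so the ability to recall an input x_{t-k} after k steps is governed by how much the matrix powers A^k preserve about it.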
Soham De retweeted
Vaishnavh Nagarajan @_vaishnavh
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
1 reply · 42 reposts · 168 likes · 29.4K views
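To make the objective contrast concrete: standard training minimizes cross-entropy on the single next token, while a multi-token objective trains extra heads to predict several future tokens at each position. A toy NumPy sketch of the two losses (the k-head layout and uniform weighting are illustrative assumptions, not the paper's exact setup):

import numpy as np

def cross_entropy(logits, target):
    # logits: (vocab,), target: token id; returns -log softmax(logits)[target]
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

def next_token_loss(head_logits, tokens, t):
    # Standard objective: a single head predicts only token t+1.
    return cross_entropy(head_logits[0], tokens[t + 1])

def multi_token_loss(head_logits, tokens, t):
    # Multi-token objective: k heads at position t predict tokens t+1..t+k,
    # forcing the representation at t to look further ahead.
    k = len(head_logits)
    return sum(cross_entropy(head_logits[i], tokens[t + 1 + i])
               for i in range(k)) / k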
Soham De retweeted
Brendan O'Donoghue @bodonoghue85
Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec, including overheads like tokenization, prefill, safety filters etc.
Google DeepMind @GoogleDeepMind

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO

94 replies · 250 reposts · 2.7K likes · 576.2K views
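Gemini Diffusion's internals are not public, but the quoted description ("generate outputs by refining noise, step-by-step") matches the general shape of discrete text diffusion: start from mask/noise tokens and repeatedly commit the model's most confident predictions. A toy sketch of that loop, where the denoiser stub, the confidence heuristic, and the schedule are all placeholders:

import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len, steps = 100, 12, 4
MASK = vocab   # reserve one extra id as the noise/mask token

def denoiser(tokens):
    # Stand-in for a trained model: per-position logits over the vocab.
    # A real model would condition on the prompt and the current tokens.
    return rng.normal(size=(seq_len, vocab))

x = np.full(seq_len, MASK)              # start from pure "noise"
for step in range(steps):
    logits = denoiser(x)
    conf = logits.max(axis=1)           # confidence proxy per position
    pred = logits.argmax(axis=1)
    masked = x == MASK
    # Commit the most confident still-masked positions this step,
    # leaving the rest to be refined in later steps.
    n_commit = int(np.ceil(masked.sum() / (steps - step)))
    order = np.argsort(np.where(masked, -conf, np.inf))
    x[order[:n_commit]] = pred[order[:n_commit]]
print(x)   # all positions filled after `steps` refinement steps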
Soham De @sohamde_
Our new paper sheds light on the process of knowledge acquisition in language models, with implications for data curricula, the challenges of learning new knowledge when fine-tuning, and the emergence of hallucinations. Nicolas did a great job on the project! See his thread👇
Nicolas Zucchet @NicolasZucchet

Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! arxiv.org/abs/2503.21676

1 reply · 6 reposts · 36 likes · 3.5K views
Soham De retweeted
Google DeepMind @GoogleDeepMind
Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → goo.gle/40apGQh
27 replies · 212 reposts · 946 likes · 407.4K views
Soham De retweeted
Preetum Nakkiran @PreetumNakkiran
We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, and David Berthelot on scientific aspects of diffusion & generative models. 1/
4 replies · 37 reposts · 205 likes · 48K views
Soham De retweeted
Google DeepMind @GoogleDeepMind
We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 dpmd.ai/3XuMqbX
67 replies · 794 reposts · 2.9K likes · 1.1M views
Soham De @sohamde_
@champydaku The data efficiency comes primarily from better tuning. We did a lot of work to establish hyperparameter scaling rules for Griffin so we can scale efficiently - we might write this up at some point. We compare different capabilities in the Griffin paper: arxiv.org/abs/2402.19427
0 replies · 0 reposts · 2 likes · 85 views
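The Griffin scaling rules mentioned here have not been written up, but as a flavor of what such a rule looks like: a common muP-style heuristic transfers a learning rate tuned on a small model by shrinking it as 1/width (purely illustrative, not Griffin's actual rule):

def scaled_lr(base_lr, base_width, width):
    # Transfer a learning rate tuned at base_width to a wider model
    # via a 1/width rule (illustrative, muP-style; not Griffin's rule).
    return base_lr * base_width / width

print(scaled_lr(3e-3, 512, 4096))   # 0.000375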
Avi Dhaliwal @dhaliwalavis
@sohamde_ Could you elaborate on how the Griffin architecture enables this data efficiency? Are there specific tasks where you've observed RecurrentGemma excelling or lagging compared to transformer-based models?
1 reply · 0 reposts · 2 likes · 110 views
Soham De @sohamde_
Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens! I finally updated arXiv with some of our results: arxiv.org/abs/2404.07839 Link to weights and code for our models in thread!
5 replies · 30 reposts · 225 likes · 23.9K views
Soham De retweeted
Armand Joulin @armandjoulin
Are small models still undertrained? We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model. Distillation is the future of LLMs with the growing availability of large and efficient open models!
10 replies · 39 reposts · 366 likes · 62.6K views
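For readers new to distillation: instead of training only on hard next-token targets, the student matches the teacher's full output distribution at every position. A minimal NumPy sketch of the standard KL objective (temperature and reduction are illustrative choices, not Gemma's recipe):

import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) averaged over positions; the temperature T
    # softens both distributions (standard Hinton-style distillation).
    log_p = log_softmax(teacher_logits / T)
    log_q = log_softmax(student_logits / T)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
print(distill_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32))))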
Soham De retweeted
Surya Bhupatiraju @suryabhupa
I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: goo.gle/gemma2report and it's live *right now* on aistudio.google.com.
21 replies · 47 reposts · 231 likes · 26.3K views
Soham De retweeted
Vaibhav (VB) Srivastav @reach_vb
Welcome RecurrentGemma 9B 🔥 > Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡ > Base (9B) and Instruct (9B-IT) models released. > MMLU 60.5, CommonSenseQA 73.2, AGIEval 39.3 - a pretty strong base model to fine-tune further. > Based on the Griffin architecture > Achieves faster inference on long sequences by replacing global attention with local attention and linear recurrences. > Available in Transformers! 🤗 Massive kudos to Google for continuing open research into alternative architectures! GG!
8 replies · 45 reposts · 212 likes · 37.5K views
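The latency and throughput gains quoted above come largely from memory: global attention's key/value cache grows with sequence length, while local attention caps it at the window size (and a linear recurrence needs only a fixed-size state). A back-of-envelope sketch, with an illustrative window:

def kv_cache_entries(seq_len, window=None):
    # Cached key/value timesteps per layer during generation:
    # unbounded for global attention, capped at `window` for local.
    return seq_len if window is None else min(seq_len, window)

for t in (1_000, 10_000, 100_000):
    print(f"seq={t}: global={kv_cache_entries(t)}, "
          f"local={kv_cache_entries(t, window=2048)}")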