Pradeep Dasigi

479 posts

@pdasigi

Research Scientist @allen_ai; #NLProc, Post-training for OLMo

Seattle, WA · Joined February 2009
525 Following · 1.5K Followers
Pradeep Dasigi@pdasigi·
@HannaHajishirzi Working with you at Ai2 was a wonderful learning experience. Thank you for your leadership on so many impactful projects!
Hanna Hajishirzi@HannaHajishirzi·
Life update here: Last week marked the end of my time at Ai2. Proud to have built releases like Olmo, Tülu, FlexOlmo, DRTulu, OLMoTrace, OlmoE, and datasets including Dolma and Dolci—and of how strongly we pushed for open models and open science. Our artifacts reached 33M+ downloads, including ~4M for Olmo 3. I believe Olmo has empowered researchers to push the boundaries of AI. I’ll always be cheering on Ai2 and will continue to strongly support open-source, open-science AI. I’m deeply grateful for this chapter and excited for what comes next.
Hanna Hajishirzi tweet media
Pradeep Dasigi retweeted
Nathan Lambert@natolambert·
Excited to share the latest Olmo model: Olmo Hybrid. This is a model with gated delta net (GDN) layers in a 3:1 ratio with full attention. It follows lots of other developments like Qwen 3.5 and Kimi Linear. It's incredible timing to release a fully open model so people can study how these architecture changes impact the full stack.

Personally, I learned a lot in making the post-training work. Even with the pretraining data being identical, post-training is very different! In particular, the OSS tooling for these new architectures is really limited. New architectures are much slower than standard transformers or popular models like DeepSeek MoEs. This is work that we can do together to keep pushing the frontier of efficient, open models.

This work was led by @lambdaviking @tyleraromero and others. I got to play a smaller part in making post-training work, super fun project!

I've written up a blog post that explains why this matters and why hybrid models didn't work a few years ago when Mamba was super popular. Plus, this paper is a great entry point for modern deep learning / language modeling scaling theory. Enjoy and send feedback!
Nathan Lambert tweet media
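The 3:1 interleaving described above can be sketched in a few lines. A minimal illustration of the layer-type pattern only; the names "gdn" and "attention" are placeholders, not Olmo's actual module names, and the real model's exact layer placement may differ:

```python
# Sketch of a hybrid layer stack: 3 gated-delta-net (linear RNN) layers
# for every 1 full-attention layer, per the 3:1 ratio described above.
def hybrid_layer_pattern(n_layers: int, gdn_per_attention: int = 3) -> list[str]:
    """Return the layer-type sequence for a hybrid stack."""
    pattern = []
    for i in range(n_layers):
        # Every (gdn_per_attention + 1)-th layer is full attention;
        # the rest are gated delta net (GDN) layers.
        if (i + 1) % (gdn_per_attention + 1) == 0:
            pattern.append("attention")
        else:
            pattern.append("gdn")
    return pattern

layers = hybrid_layer_pattern(32)
print(layers.count("gdn"), layers.count("attention"))  # 24 GDN, 8 attention
```

The appeal of this layout is that the (quadratic-cost) full-attention layers become a small fraction of the stack while the linear-time GDN layers handle the rest.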
Pradeep Dasigi retweeted
Ai2@allen_ai·
Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵
Ai2 tweet media
Akari Asai@AkariAsai·
Thrilled to share: OpenScholar - our work on scientific deep research agents for reliable literature synthesis - has been accepted to Nature! 🎉 Huge thanks to collaborators across institutions who made this possible!
Akari Asai tweet media
Pradeep Dasigi retweeted
Ai2@allen_ai·
Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵
Ai2 tweet media
Saurabh Shah@saurabh_shah2·
I’ve joined humans&! My last blog post explains why I think a human-centric approach is the missing piece in modern AI systems. I’m super psyched about the technical direction of the company. Perhaps even more important, though, is the team; the humans at humans&. My coworkers are completely and wholly wonderful. They’re brilliant, yes, but they’re also kind, funny, focused, and just about every other good adjective I can think of. Put simply: vibes are goooood. We’re bringing together wonderful people united by a much-needed mission to build something truly different. If that excites you, I’d love to chat.
humans&@humansand

Today we introduce humans&, a human-centric frontier AI lab. We believe AI can be reimagined, centering around people and their relationships with each other. At its best, AI should serve as a deeper connective tissue that strengthens organizations and communities.

Pradeep Dasigi@pdasigi·
I have been trying out Telugu queries on the Indic LLM Arena over the last few days and most of the responses are surprisingly bad, with lots of hallucinations and sometimes even grammatical errors, even from strong (in English) models. Clearly there is a huge gap between English and Indian language capabilities, and evaluating this is very important. Do contribute if you care about making LLMs work for Indian languages.
AI4Bharat@ai4bharat

For AI to be truly inclusive, it must understand more than just grammar—it must understand context. @AI4Bharat at @iitmadras has launched the Indic LLM Arena. This isn't just another leaderboard; it’s a public utility for: ✅ Developers: Test your models against real-world Indian use cases. ✅ Enterprises: Find out which LLM actually resonates with your customers in rural India. ✅ Sovereignty: Building AI that respects our social fabric and safety norms. Be a part of this movement. Try the Arena today and help us rank the models that will power India's digital future. 👉 ai4bharat.iitm.ac.in/blog/indic-llm… #GenerativeAI #DigitalIndia #IITMadras #IndicLLM #indiaaiimpactsummit2026 @MiteshKhapra @anoopk @prajdabre @ravi_iitm @partha_p_t @ManishGuptaMG1 @meghtweets @dineshteewari1 @abapna @WSAI_IITM @OfficialINDIAai @EkStep_Org @PeoplePlusAI

Pradeep Dasigi retweeted
Wenting Zhao@wzhao_nlp·
🌶️ Some (perhaps) spicy thoughts. It’s been a while since my last tweet, but I wanted to write about how disorienting the move from academia to an LLM lab has been 😅

The kind of research I was trained to do during my PhD almost doesn’t exist here. The obsession with mathematical elegance and novelty is mostly gone. Everything is about scaling data and compute. For a while, that really got to me. At my lowest point, I felt like I’d lost interest in building LLMs altogether. I didn’t feel intellectually challenged anymore.

What made this even stranger was that, at a technical level, things worked. If there was a capability I wanted to teach a model, scaling the right data and compute always got me there, no exception (so far).

But recently, I found a way to reconcile with myself. I realized the real competition isn’t in the ML recipe anymore. Most teams do roughly the same thing. What actually matters is how fast you can iterate, test ideas, and recover from mistakes. And that speed is mostly backed by infrastructure 🏗️ Faster loops, fewer bugs, better tooling.

Seeing this made me excited again! Infra is its own deep, hard, and intellectually fun problem space. In 2026, I want to become an ML researcher who’s really good at infra. And I'll come back to ML problems with that edge, and will be excited to share what I find 😌
Pradeep Dasigi retweeted
Ai2@allen_ai·
SciArena update: our Olmo 3.1 32B Instruct scores 963.6 Elo overall at just $0.17/100 calls—ahead of OpenAI’s GPT-OSS-20B. In Engineering, it hits 1039.2 Elo, only 2.5 behind GPT-OSS-120B—a model ~4× its size. 🧵
Pradeep Dasigi retweeted
Partha Talukdar@partha_p_t·
Indic LLM Arena needs you! 🇮🇳 Try out which LLM works best for your Indic language queries and vote for the winner! arena.ai4bharat.org
AI4Bharat@ai4bharat

For AI to be truly inclusive, it must understand more than just grammar—it must understand context. @AI4Bharat at @iitmadras has launched the Indic LLM Arena. This isn't just another leaderboard; it’s a public utility for: ✅ Developers: Test your models against real-world Indian use cases. ✅ Enterprises: Find out which LLM actually resonates with your customers in rural India. ✅ Sovereignty: Building AI that respects our social fabric and safety norms. Be a part of this movement. Try the Arena today and help us rank the models that will power India's digital future. 👉 ai4bharat.iitm.ac.in/blog/indic-llm… #GenerativeAI #DigitalIndia #IITMadras #IndicLLM #indiaaiimpactsummit2026 @MiteshKhapra @anoopk @prajdabre @ravi_iitm @partha_p_t @ManishGuptaMG1 @meghtweets @dineshteewari1 @abapna @WSAI_IITM @OfficialINDIAai @EkStep_Org @PeoplePlusAI

Pradeep Dasigi retweeted
Ai2@allen_ai·
Olmo 3.1 32B Instruct is now on @openrouter, hosted by @DeepInfra. Built for real-world use: reliable instruction following & function calling for agentic workflows + research. Fully open & leading benchmark performance, ready to plug into your stack. 👇
Ai2 tweet media
Pradeep Dasigi retweeted
DeepInfra@DeepInfra·
Now hosting @allen_ai Olmo-3.1-32B-Instruct on DeepInfra. Designed for solid reasoning and clean instruction following - great for research workflows. $0.20 in / $0.60 out per Mtoken
DeepInfra tweet media
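At the rates quoted above ($0.20 per million input tokens, $0.60 per million output tokens), per-request cost is simple arithmetic. A quick sketch, with token counts chosen purely for illustration:

```python
# Cost estimate at the quoted Olmo-3.1-32B-Instruct pricing:
# $0.20 per 1M input tokens, $0.60 per 1M output tokens.
PRICE_IN_PER_MTOK = 0.20
PRICE_OUT_PER_MTOK = 0.60

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion costs
# 2000 * 0.20/1e6 + 500 * 0.60/1e6 = $0.0004 + $0.0003 = $0.0007.
print(f"${request_cost_usd(2000, 500):.4f}")
```

Note that output tokens cost 3x input tokens here, so long completions dominate the bill for short-prompt workloads.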
Pradeep Dasigi retweeted
Ai2@allen_ai·
Now you can use our most powerful models via API. Olmo 3.1 32B Think, our reasoning model for complex problems, is on @openrouter—free through 12/22. And Olmo 3.1 32B Instruct, our flagship chat model with tool use, is available through @huggingface Inference Providers. 👇
Ai2 tweet media
Pradeep Dasigi retweeted
Kyle Lo@kylelostat·
olmo 3 paper finally on arxiv 🫡
thx to our teammates esp folks who chased additional baselines
thx to arxiv-latex-cleaner and overleaf feature for chasing latex bugs
thx for all the helpful discussions after our Nov release, best part of open science is progressing together!
Kyle Lo tweet media
Pradeep Dasigi retweeted
Kyle Lo@kylelostat·
lol so during neurips, we kept the RL run going and the model kept getting better 😂
Olmo 3.1 is a..
🐡 32B Thinking, still best fully-open model to-date
🐠 32B Instruct, for ppl who hate long yapping, as good as qwen3
we added like 10 more pages to the paper too! thx for community feedback from convos at neurips:
🐟 more on our eval ideology
🦈 more baselines
🍣 more about RL Zero etc
we picked final model (internally called moonlit surfer 🌛🏄) not just on bench scores but good vibes 🥰
Kyle Lo tweet media
Ai2@allen_ai

Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
