Prasann Singhal (@prasann_singhal) - Twitter Profile

Prasann Singhal retweeted

Imagine you fully post-trained "YourModel v1". Then, you've got better data — math, code, tool use, safety — and you want to improve it. Today, that usually means retraining the whole model. But what if new data could be added modularly, with a fixed cost each time?

Ai2 @allen_ai

Last year, we introduced FlexOlmo, a novel way to train parts of a model independently then combine them later. BAR builds on that idea for a harder problem: how to keep improving a model without having to retrain each time. 🧵

Prasann Singhal retweeted

Amanda Bertsch @abertsch72 · 1 May

New paper! allenai.org/papers/olmpool This tackles a puzzle we found during the training of Olmo 3: how could two models with nearly identical short-context performance (and trained on the same data!) behave completely differently after long context extension?

Ai2 @allen_ai

Recipes for teaching language models to handle long inputs don't work equally well across model families. We wanted to know why—is it the architecture, the training data, or both? 🧵

Prasann Singhal @prasann_singhal · 8 May

I love this paper so much! The results in Figures 3 and 5 and in Appendix B.2 are my particular favorites. "Emergent modularity" is such a cool concept. I think this paper represents a really fresh, cool style of pre-training work, and I'm hoping to see a lot more of it in the coming years!

Ryan Yixiang Wang @RyanYixiang

MoEs are everywhere in frontier models, and they are deployed as a monolithic system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Prasann Singhal retweeted

Ryan Yixiang Wang @RyanYixiang · 8 May

MoEs are everywhere in frontier models, and they are deployed as a monolithic system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Ai2 @allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

Prasann Singhal retweeted

Zayne Sprague ✈️ ICLR Rio @ZayneSprague · 15 Apr

SkillFactory will be at ICLR this year! We study how self-distillation can create synthetic data that primes models for RLVR through SFT. We built a recipe for teaching your model new capabilities it doesn’t have yet, matching and sometimes outperforming teacher distillation.

Zayne Sprague ✈️ ICLR Rio @ZayneSprague

RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 6 Apr

Great tool for experiment management with Claude Code. Give it a try! Zayne's post also shares insights about how this accelerates the research process. Autoresearch doesn't replace the researcher, but lets them focus more on the actual research and less on the annoying stuff!

Zayne Sprague ✈️ ICLR Rio @ZayneSprague

x.com/i/article/2039…

Prasann Singhal retweeted

Mohit Iyyer @MohitIyyer · 27 Mar

I spent the last week reading novels generated by Claude Code & Codex on our new AutoFiction platform and I... actually like some of them!
For context, I've worked on creative language generation for ~10 years (😅), time mostly spent on getting models to produce a few relevant paragraphs that didn't repeat the same word over and over again. I def did not expect coherent 100K-word novels this soon!!
My current favorites:
- "The Wrong Mouth" (dark fantasy about a conman who "eats" a paladin and an assassin and gains their skills)
- "Keeper League" (comic murder mystery set at a fantasy football draft)
- "Thank You for Your Patience" (horror/thriller about 30 post-training researchers locked in with a very polite rogue AI)
The novels aren't perfect: mid-book pacing needs work, character descriptions sometimes repeat, and character voices flatten. But they're structurally coherent and fun to read, and I'm curious to see how they improve with model/harness advances!

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 25 Mar

ICML reviews have you considering this? Please look at our final submission instructions for COLM below!

Conference on Language Modeling @COLM_conf

~45 hours until the abstract deadline! Submit abstracts on OpenReview by 3/26 11:59pm AOE, full papers 3/31. Final reminders & instructions for COLM are below (link in thread). Note that as of the March 31 deadline, papers must not be under review for ICML or committed to ACL.

Prasann Singhal retweeted

Manya Wadhwa @ManyaWadhwa1 · 13 Mar

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 13 Mar

Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!

Manya Wadhwa @ManyaWadhwa1

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇

Prasann Singhal retweeted

Jacob Steinhardt @JacobSteinhardt · 18 Feb

New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

Prasann Singhal retweeted

Jiacheng Liu @liujc1998 · 26 Jan

Calling on behalf of infini-gram: does anyone know where I can get / apply for AWS credits? 💸💸 Keeping infini-gram alive costs quite some money, mostly SSD rental. If you're a fan of keeping open LLM training data readily inspectable, please reply / DM me some pointers! 🧵1/4

Prasann Singhal retweeted

Zayne Sprague ✈️ ICLR Rio @ZayneSprague · 4 Dec

RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 4 Dec

Check out SkillFactory! Priming LLMs with SFT before RL is pretty cheap and lets models learn cognitive skills from RL more effectively. And adding this inductive bias via SFT data is nicely compatible with the bitter lesson!

Zayne Sprague ✈️ ICLR Rio @ZayneSprague

RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 12 Jan

Still accepting applications for this postdoc position in my lab at NYU! Applications due Feb 1. Links to posting and Interfolio application in thread.

Greg Durrett @gregd_nlp

📢 Postdoc position 📢 I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1. (Different from NYU Faculty Fellows, which are also great but less connected to my lab.) Link in 🧵

Prasann Singhal retweeted

Lisa Dunlap @lisabdunlap · 14 Jan

🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/

Prasann Singhal retweeted

Jiacheng Liu @liujc1998 · 16 Dec

Belated update: I defended my PhD last month! I am tremendously grateful to my advisors, @HannaHajishirzi and @YejinChoinka. Without their incredible support, I wouldn’t have had so much fun exploring bold ideas, like taking a journey into the ocean of LLM pretraining data. 🥰🥰

Prasann Singhal retweeted

Greg Durrett @gregd_nlp · 16 Dec

Submit to COLM! Deadline of March 31. This llama gets to enjoy his holidays and isn't stressed out just yet...

Conference on Language Modeling @COLM_conf

COLM 2026 is just around the corner! Mark your calendars for: 💡Abstract deadline: Thursday, March 26, 2026 📄Full paper submission deadline: Tuesday, March 31, 2026 Call for papers in thread (website coming soon).

Prasann Singhal retweeted

Lisa Dunlap @lisabdunlap · 15 Dec

🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com

Prasann Singhal retweeted

Sewon Min @sewon__min · 12 Dec

Really excited about this work!! As a retrieval person, having a pre-training-scale retrieval index in an academic setting has long been a dream, and I thought it would be too difficult / infeasible. Collaborating with systems experts made it possible much earlier than I expected. Huge thanks to the students driving this: @YichuanM and @jinjianliuu !

Yichuan Wang @YichuanM

(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve Work from UC Berkeley ( @BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!
