Prasann Singhal

133 posts

@prasann_singhal

1st-year #NLProc PhD at UC Berkeley working with @sewon__min / @JacobSteinhardt, formerly advised by @gregd_nlp

Berkeley, California · Joined January 2021
780 Following · 325 Followers
Prasann Singhal reposted
Sanjay Adhikesaven @sadhikesaven
Imagine you fully post-trained "YourModel v1". Then, you've got better data — math, code, tool use, safety — and you want to improve it. Today, that usually means retraining the whole model. But what if new data could be added modularly, with a fixed cost each time?
Ai2 @allen_ai

Last year, we introduced FlexOlmo, a novel way to train parts of a model independently then combine them later. BAR builds on that idea for a harder problem: how to keep improving a model without having to retrain each time. 🧵

Prasann Singhal reposted
Amanda Bertsch @abertsch72
New paper! allenai.org/papers/olmpool This tackles a puzzle we found during the training of Olmo 3: how could two models with nearly identical short-context performance (and trained on the same data!) behave completely differently after long context extension?
Ai2 @allen_ai

Recipes for teaching language models to handle long inputs don't work equally well across model families. We wanted to know why—is it the architecture, the training data, or both? 🧵

Prasann Singhal @prasann_singhal
I love this paper so much! Results in Figure 3 / 5, and Appendix B.2 are in particular my favorites. "Emergent modularity" is such a cool concept. I think this paper represents a really fresh / cool style of pre-training work which I'm hoping to see a lot more of in coming years!
Ryan Yixiang Wang @RyanYixiang

MoEs are everywhere in frontier models, and they are deployed as a monolithic system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Prasann Singhal reposted
Ryan Yixiang Wang @RyanYixiang
MoEs are everywhere in frontier models, and they are deployed as a monolithic system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!
Ai2 @allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

Prasann Singhal reposted
Zayne Sprague ✈️ ICLR Rio @ZayneSprague
SkillFactory will be at ICLR this year! We study how self-distillation can create synthetic data that primes models for RLVR through SFT. We built a recipe for teaching your model new capabilities it doesn’t have yet, matching and sometimes outperforming teacher distillation.
Zayne Sprague ✈️ ICLR Rio @ZayneSprague

RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵

Prasann Singhal reposted
Greg Durrett @gregd_nlp
Great tool for experiment management with Claude Code. Give it a try! Zayne's post also shares insights about how this accelerates the research process. Autoresearch doesn't replace the researcher, but lets them focus more on the actual research and less on the annoying stuff!
Zayne Sprague ✈️ ICLR Rio @ZayneSprague

x.com/i/article/2039…

Prasann Singhal reposted
Mohit Iyyer @MohitIyyer
I spent the last week reading novels generated by Claude Code & Codex on our new AutoFiction platform and I... actually like some of them! For context, I've worked on creative language generation for ~10 years (😅), time mostly spent on getting models to produce a few relevant paragraphs that didn't repeat the same word over and over again. I def did not expect coherent 100K-word novels this soon!!

My current favorites:
- "The Wrong Mouth" (dark fantasy about a conman who "eats" a paladin and assassin and gains their skills)
- "Keeper League" (comic murder mystery set at a fantasy football draft)
- "Thank You for Your Patience" (horror/thriller about 30 post-training researchers locked in with a very polite rogue AI)

The novels aren't perfect: mid-book pacing needs work, character descriptions sometimes repeat, and character voices flatten. But they're structurally coherent and fun to read, and I'm curious to see how they improve with model/harness advances!
Prasann Singhal reposted
Manya Wadhwa @ManyaWadhwa1
⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇
Prasann Singhal reposted
Greg Durrett @gregd_nlp
Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!
Manya Wadhwa @ManyaWadhwa1

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs. Making novel, meaningful connections is key for scientific & creative works. We objectively measure how well LLMs can do this. 🧵👇

Prasann Singhal reposted
Jacob Steinhardt @JacobSteinhardt
New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Prasann Singhal reposted
Jiacheng Liu @liujc1998
Calling on behalf of infini-gram: does anyone know where I can get / apply for AWS credits? 💸💸 Keeping infini-gram alive costs quite some money, mostly SSD rental. If you're a fan of keeping open LLM training data readily inspectable, please reply / DM me some pointers! 🧵1/4
Prasann Singhal reposted
Zayne Sprague ✈️ ICLR Rio @ZayneSprague
RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵
Prasann Singhal reposted
Greg Durrett @gregd_nlp
Check out SkillFactory! Priming LLMs with SFT before RL is pretty cheap and lets models learn cognitive skills from RL more effectively. And adding this inductive bias via SFT data is nicely compatible with the bitter lesson!
Zayne Sprague ✈️ ICLR Rio @ZayneSprague

RL amplifies existing behaviors. Let’s prime models w/ good behaviors for better RL! Introducing SkillFactory: ✂️Rearrange model traces on a problem to demo verification + retry ⚙️SFT on those traces 🦾RL Result: Learn robust explicit verification + retry across domains 🧵

Prasann Singhal reposted
Lisa Dunlap @lisabdunlap
🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/
Prasann Singhal reposted
Jiacheng Liu @liujc1998
Belated update: I defended my PhD last month! I am tremendously grateful to my advisors, @HannaHajishirzi and @YejinChoinka. Without their incredible support, I wouldn’t have had so much fun exploring bold ideas, like taking a journey into the ocean of LLM pretraining data. 🥰🥰
Prasann Singhal reposted
Lisa Dunlap @lisabdunlap
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com
Prasann Singhal reposted
Sewon Min @sewon__min
Really excited about this work!! As a retrieval person, having a pre-training-scale retrieval index in an academic setting has long been a dream, and I thought it would be too difficult / infeasible. Collaborating with systems experts made it possible much earlier than I expected. Huge thanks to the students driving this: @YichuanM and @jinjianliuu !
Yichuan Wang @YichuanM

(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve Work from UC Berkeley ( @BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!
