Sewon Min

1.1K posts

@sewon__min

Assistant professor @Berkeley_EECS @berkeley_ai || Research scientist at @allen_ai || PhD from @uwcse @uwnlp

Seattle, WA · Joined November 2017
862 Following · 15.1K Followers
Pinned Tweet
Sewon Min
Sewon Min@sewon__min·
It has been great working on the project with support from @allen_ai! I believe there are many meaningful ways different people and orgs can work together to build strong shared models, and data collaboration might be the most impactful form of it. 📄Paper: allenai.org/papers/flexolmo Big thanks to @WeijiaShi2, @AkshitaB93, @notkevinfarhat and the team for making it all work!
Ai2@allen_ai

Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵

Sewon Min retweeted
Jelani Nelson
Jelani Nelson@minilek·
Today is "Big Give", Berkeley's 24-hour online fundraiser. Last year @Berkeley_EECS graduated 1,029 Computer Science majors. Next year it will be ~350. Enrollments were slashed primarily due to the high cost of instruction, with undergrad teaching assistants now costing the department $73-82/hr (equivalent pre-tax comp. rate ~$89-103/hr; see tinyurl.com/ugradtas). If you want to help us maintain our excellence in this world of high costs, you can donate to the Berkeley EECS Excellence Fund at givingday.berkeley.edu/amb/supporteecs
Sewon Min retweeted
Oscar Yinn
Oscar Yinn@yinn_oscar·
Many people are using RL to make models smarter. We used RL to pull training data out of the models themselves. Our results show that models know a lot more about their training data than most people think. We develop the Active Data Reconstruction Attack (ADRA), a data detection method that uses RL to induce models to reconstruct data seen during training. ADRA beats existing methods by an average of >10% across pre-training, post-training, and distillation. Our paper, with @uwnlp, @Cornell, and @BerkeleyNLP @Berkeleyai, is now available. arXiv: arxiv.org/pdf/2602.19020 Joint work with @jxmnop @shmatikov @sewon__min @HannaHajishirzi
Sewon Min retweeted
Tal Linzen
Tal Linzen@tallinzen·
My take on the substance of the matter: if you want to study how humans use a new technology in a high-stakes social context like medicine, you need to study it carefully, in a controlled human study. These questions are too important to leave to substackers who spend a couple of hours on each post or to the big labs' comms teams.

I didn't read the study any more than the two famous journalists who tweeted or retweeted about it did, but as far as I can tell the authors of this particular study did everything right. They put a preprint on arXiv as soon as the study was concluded, and then also submitted it for publication. Submitting it seems like a good move: they got a few more people to evaluate their methodology, and, I suppose, the snazzy Nature Group typesetting got them the attention of two famous journalists who prior to this publication didn't seem to be interested in this topic (or aware of the preprint that was released about a year ago).

Of course it would be nice to apply the evaluation methodology the authors propose to every weekly update from every big lab, but I don't think that's a reasonable expectation of academia. Maybe the action editor should have asked for one last replication with 2025 models before accepting this for publication. But the important thing is that this article points out a gap between models' performance on medical questions (which was already high in 2024) and the outcomes of the models' interactions with humans, and it advocates for more realistic evaluations that include the human component of the equation. Now it's up to companies and policymakers to decide what to do with this information.
Tal Linzen@tallinzen

I tried to find the tweet from yesterday where @mattyglesias expressed an opinion about academic publishing and had to scroll past pages and pages of tweets where he had equally strong opinions about literally dozens of unrelated topics

Sewon Min retweeted
Shannon Shen
Shannon Shen@shannonzshen·
Super excited to share our open interactive demo for DR Tulu-8B! It supports web and literature search with full transparency — you can see the model's thinking traces and tool outputs as it reasons through your query. 🔗 dr-tulu.org 📝 arxiv.org/abs/2511.19399
Sewon Min retweeted
Yuezhou Hu
Yuezhou Hu@yuezhouhu·
Take a look at Residual Context Diffusion (RCD): a simple idea to boost diffusion LLMs—stop wasting “remasked” tokens!!! arxiv.org/abs/2601.22954 (Example on AIME24. RCD increases parallelism by 4x while reaching the baseline's peak accuracy.) #DiffusionLLM #LLM #Reasoning #GenAI
Sewon Min retweeted
Zirui "Colin" Wang
Zirui "Colin" Wang@zwcolin·
🎮 We release VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents (w/ @junyi42 @aomaru_21490) 🌐 With 17 environments across multiple domains, we show systematically the brittleness of VLMs in visual interaction, and what training leads to. 🧵[1/8]
Sewon Min retweeted
Junyang Lin
Junyang Lin@JustinLin610·
i do agree that sometimes boring stuff with a solid impl makes miracles happen, but i still believe that great ideas can change the world and it will become the era of research. it is not a difference between academia and industry. it is what it is, for always. there is always a low prob that elegant ideas work well outside a toy setting, but we still need more compute for this nondeterministic stuff. for the deterministic stuff you are right about it: good infra is the key to fast iteration and fast success.
Jiacheng Liu
Jiacheng Liu@liujc1998·
Belated update: I defended my PhD last month! I am tremendously grateful to my advisors, @HannaHajishirzi and @YejinChoinka. Without their incredible support, I wouldn’t have had so much fun exploring bold ideas, like taking a journey into the ocean of LLM pretraining data. 🥰🥰
Sewon Min retweeted
Rulin Shao
Rulin Shao@RulinShao·
Please check out DS-Serve for pretraining-scale index serving! Huge congrats to the amazing @YichuanM and @jinjianliuu who made this possible! 🥳 I've been wishing for an online serving version of MassiveDS since its release in 2024. Here's the roadmap I've witnessed with wonderful mentors @sewon__min @Tim_Dettmers :
1. We initially developed CPU-based serving code that performs distributed search over MassiveDS. This code has been used in some Meta and AI2 projects, but it was painful to monitor: we needed to relaunch the server frequently and grab unnecessarily expensive nodes for serving.
2. Another aspect is data quality. It was unclear how to filter high-quality data for a retrieval datastore to cut costs. This problem was explored in CompactDS, led by @XinxiLyu & @micdun8 and advised by @sewon__min. They did a scary amount of careful ablations on how data composition, vector compression, and reranking impact model performance. As a result, they built CompactDS which, as its name indicates, is a higher-quality version of MassiveDS.
3. Although CompactDS made it possible to serve one giant index on one node with 1TB of CPU memory, the latency and throughput were still not satisfying. @YichuanM and @jinjianliuu, experts in efficient IR systems, built DS-Serve, which finally made the large-scale index usable in online serving applications. In case it's not obvious how significant this achievement is: my previous distributed serving index could only compress latency to a few seconds, or even minutes, depending on index type, but they got it under 100ms with diskANN & other techniques, which is shockingly fast. I believe it will enable many important applications such as Deep Research training with an in-house datastore.
4. Hardware improvement. This is not published yet, but @Tim_Dettmers and I built an in-house SSD machine in my first year of PhD that is specially designed for large-scale serving. DS-Serve is currently running on this machine, showing its great capacity at low cost.
Time to work again on software-hardware co-design for index serving 😃 As RL with tool use and context management becomes more popular, I believe there will be more use cases that require a larger-scale in-house datastore 😍
Yichuan Wang@YichuanM

(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve Work from UC Berkeley ( @BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!

Sewon Min retweeted
Yichuan Wang
Yichuan Wang@YichuanM·
(1/N) 🚀 DS-Serve announcement: full text quoted in @RulinShao's retweet above. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve
Sewon Min
Sewon Min@sewon__min·
This is also part of a longer effort on pre-training-scale retrieval:
* MassiveDS, led by @RulinShao : retrieval over trillion-token pre-training data brings substantial, consistent gains across the board (reproducing RETRO). arxiv.org/abs/2407.12854
* CompactDS, led by @XinxiLyu & @micdun8 : you can actually get the same or better gains from a smaller subset (0.4T tokens) through careful, high-quality data curation, with benefits extending beyond classic RAG to reasoning-heavy tasks. arxiv.org/abs/2507.01297
* DS-Serve, led by @jinjianliuu & @YichuanM : now this can be super-efficient, low-latency, modest-memory, high-accuracy serving -- publicly deployable in an academic setting (and you can now use it freely via our API!). berkeley-large-rag.github.io/RAG-DS-Serve
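At toy scale, the lookup these datastores serve can be sketched as nearest-neighbor search over embedded passages. This is an illustrative sketch only: the passages, embeddings, and function names below are invented, and real systems like DS-Serve use approximate disk-based indexes (e.g. diskANN) over billions of vectors rather than the brute-force scan shown here.

```python
# Toy dense retrieval: return the top-k datastore passages whose
# embeddings are most similar to the query embedding (cosine similarity).
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_top_k(query_vec, datastore, k=2):
    """datastore: list of (passage_text, embedding) pairs."""
    ranked = sorted(datastore, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hand-made 3-d "embeddings" standing in for a real encoder's output.
datastore = [
    ("passage about retrieval", [1.0, 0.1, 0.0]),
    ("passage about cooking",   [0.0, 1.0, 0.2]),
    ("passage about indexing",  [0.9, 0.2, 0.1]),
]
print(retrieve_top_k([1.0, 0.0, 0.0], datastore, k=2))
```

A production deployment replaces the exhaustive scan with an approximate index so that latency stays low (the tweets above report <100ms) even at billions of vectors, and the retrieved passages are then fed to the language model as context.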
Sewon Min
Sewon Min@sewon__min·
Really excited about this work!! As a retrieval person, having a pre-training-scale retrieval index in an academic setting has long been a dream, and I thought it would be too difficult / infeasible. Collaborating with systems experts made it possible much earlier than I expected. Huge thanks to the students driving this: @YichuanM and @jinjianliuu !
Yichuan Wang@YichuanM

(Quoted tweet: @YichuanM's DS-Serve announcement, quoted in full above.)

Sewon Min retweeted
Saining Xie
Saining Xie@sainingxie·
Please don’t call it a shitshow.
1) It’s an OpenReview bug, so it isn’t really any organizer’s fault. The ICLR chairs worked through the holiday to find a solution.
2) There isn’t a perfect fix. In the extreme, they could either redo all the reviews or just leave everything as it is, and neither is possible.
3) Reverting the ratings and letting the ACs make the call seems like a pretty reasonable compromise. (I understand they won’t erase the discussions.)