BerkeleyNLP

117 posts

@BerkeleyNLP

We work on natural language processing, machine learning, linguistics, and deep learning. PIs: Dan Klein, @alsuhr, @sewon__min

Berkeley, California · Joined September 2019
37 Following · 6.3K Followers
BerkeleyNLP retweeted
Sewon Min @sewon__min
Really excited about this work!! As a retrieval person, having a pre-training-scale retrieval index in an academic setting has long been a dream, and I thought it would be too difficult / infeasible. Collaborating with systems experts made it possible much earlier than I expected. Huge thanks to the students driving this: @YichuanM and @jinjianliuu !
Yichuan Wang@YichuanM

(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput.
Try it out: api.ds-serve.org:30888/ui
Blog: berkeley-large-rag.github.io/RAG-DS-Serve
Work from UC Berkeley (@BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!
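The thread names a public endpoint but does not document the API itself; as a minimal sketch of what a client request to such a service could look like (the /search route, parameter names, and JSON body below are assumptions for illustration, not the documented API):

```python
import json

# Hypothetical client sketch for a DS-Serve-style retrieval endpoint.
# Only the base address comes from the announcement; everything else
# (route, fields) is an assumption.
DS_SERVE_BASE = "http://api.ds-serve.org:30888"

def build_search_request(query: str, k: int = 10) -> tuple[str, bytes]:
    """Build a (url, body) pair for an assumed JSON search route."""
    url = f"{DS_SERVE_BASE}/search"
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return url, body

url, body = build_search_request("retrieval-augmented pretraining", k=5)
print(url)
print(json.loads(body))
```

A real client would POST this body and read back ranked passages; consult the web UI at api.ds-serve.org:30888/ui for the actual interface.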

BerkeleyNLP retweeted
Jiaxin Ge @aomaru_21490
✨Introducing ECHO, the newest in-the-wild image generation benchmark! You’ve seen new image models and new use cases discussed on social media, but old benchmarks don’t test them! We distilled this qualitative discussion into a structured benchmark. 🔗 echo-bench.github.io
BerkeleyNLP retweeted
Wenjie Ma @wenjie_ma
LLMs solving math benchmarks with verifiable answers like AIME? ✅
LLMs solving math proofs? ❌ Still an open problem.
RL works great for final-answer problems, but proofs are different:
- Often no single checkable answer
- Correct answers can hide flawed reasoning
The key bottleneck: reliable proof evaluation. Without a good evaluator, we can't automatically evaluate or train better "provers." Our new work tackles this challenge step by step. 🧵
📄 Paper: arxiv.org/pdf/2510.13888
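The contrast can be made concrete: a final-answer task admits a one-line checkable reward, while a proof does not (a sketch under that framing, not the paper's code):

```python
# Why RL is straightforward for final-answer math but not proofs:
# verifiable-answer tasks admit a trivially checkable binary reward.
def final_answer_reward(model_answer: str, gold: str) -> float:
    """Binary reward: exact match against a checkable gold answer."""
    return 1.0 if model_answer.strip() == gold.strip() else 0.0

print(final_answer_reward(" 42 ", "42"))  # 1.0
print(final_answer_reward("41", "42"))    # 0.0

# A proof has no single gold string to compare against, and a correct
# final claim can rest on flawed intermediate steps -- so the reward
# signal itself must be a reliable proof evaluator, which is exactly
# the bottleneck the thread identifies.
```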
BerkeleyNLP retweeted
Kayo Yin @kayo_yin
Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🧠🎉 How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach? 🌐 sites.google.com/berkeley.edu/p… 📅 Submit by June 23rd
BerkeleyNLP retweeted
Ruiqi Zhong @NeurIPS 25 (@ZhongRuiqi)
Last day of PhD! I pioneered using LLMs to explain datasets & models. It's used by the interpretability team at @OpenAI and the societal impacts team at @AnthropicAI. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P
BerkeleyNLP retweeted
Nicholas Tomlin @NickATomlin
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision
Vivek Verma@vcubingx

🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: arxiv.org/abs/2505.07215
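As a toy sketch of the generate-then-evaluate idea (the benchmark's real games and harness are not shown in the tweet; the game, players, and interfaces below are stand-ins): a generator emits a game spec, and the evaluated model is then scored as a player under the generated rules.

```python
import random

# gg-bench-style loop, stand-in version: the "generator" produces a
# guess-the-number game, and players are scored against its rules.

def generate_game(seed: int) -> dict:
    """Stand-in for an LLM generating a game spec."""
    rng = random.Random(seed)
    return {"target": rng.randint(1, 100), "max_turns": 7}

def play(game: dict, guesser) -> bool:
    """The guesser wins iff it names the target within max_turns."""
    lo, hi = 1, 100
    for _ in range(game["max_turns"]):
        guess = guesser(lo, hi)
        if guess == game["target"]:
            return True
        if guess < game["target"]:
            lo = guess + 1
        else:
            hi = guess - 1
    return False

def model_player(lo: int, hi: int) -> int:
    return (lo + hi) // 2          # stand-in "model": binary search

def random_player(lo: int, hi: int) -> int:
    return random.randint(lo, hi)  # baseline opponent

games = [generate_game(s) for s in range(20)]
model_score = sum(play(g, model_player) for g in games) / len(games)
random_score = sum(play(g, random_player) for g in games) / len(games)
print(f"model {model_score:.2f} vs random {random_score:.2f}")
```

The point of the real benchmark is that the generator and the player are the same LLM, and generation turns out to be easier than play, so the gap is measurable.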

BerkeleyNLP retweeted
Nicholas Tomlin @NickATomlin
I'm incredibly excited to share that I'll be joining @TTIC_Connect as an assistant professor in Fall 2026! Until then, I'm wrapping up my PhD at Berkeley, and after that I'll be a faculty fellow at @NYUDataScience
BerkeleyNLP retweeted
Ruiqi Zhong @NeurIPS 25 (@ZhongRuiqi)
Finished my dissertation!!! (scalable oversight, link below) Very fortunate to have @JacobSteinhardt and Dan Klein as my advisors! Words can't describe my gratitude, so I used a pic of Frieren w/ her advisor :) Thanks for developing my research mission, and teaching me magic
BerkeleyNLP retweeted
Lakshya A Agrawal @LakshyAAAgrawal
🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
BerkeleyNLP retweeted
Kayo Yin @kayo_yin
Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale? We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary drivers of few-shot ICL. arxiv.org/abs/2502.14010 🧵
BerkeleyNLP retweeted
Charlie Snell @sea_snell
Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
BerkeleyNLP retweeted
Kayo Yin @kayo_yin
Cool new dataset for translation ambiguity in 9 language pairs (7 low-resource), and we found LLM-generated descriptions help weaker models resolve ambiguity! @BaruaJosh will be presenting this at the 2-3:30pm poster session today, come talk to us about multilinguality in LLMs!
Josh Barua@BaruaJosh

Do LLMs encode knowledge of concept variation across languages? Can they use this knowledge to resolve ambiguity in translation? Our #EMNLP2024 paper finds a big performance gap between closed- and open-weight LLMs, but lexical rules can help transfer knowledge across models! 🧵

BerkeleyNLP retweeted
Kayo Yin @kayo_yin
🚨New dataset + challenge #EMNLP2024🚨 We release ASL STEM Wiki: the first signing dataset of STEM articles!
📰 254 Wikipedia articles
📹 ~300 hours of ASL interpretations
👋 New task: automatic sign suggestion to make STEM education more accessible microsoft.com/en-us/research… 🧵
BerkeleyNLP retweeted
Ruiqi Zhong @NeurIPS 25 (@ZhongRuiqi)
Given the rapid progress of LLMs, I feel compelled to present this topic (even if it's not the main focus of my Ph.D. work). I will cover concrete ML problems related to "AI deception" -- undesirable behaviors of AI systems that are hard to catch -- and how to study this "Sci-Fi" topic scientifically.
Stanford NLP Group@stanfordnlp

For this week’s NLP Seminar, we are thrilled to host @ZhongRuiqi to talk about Concrete Problems in AI Deception: From Evaluation Gaming to Cyber Attack! When: 10/3 Thurs 11am PT Non-Stanford affiliates registration form: forms.gle/ez4PtVWVATbnT2… (closed at 9am PT on the talk day)

BerkeleyNLP retweeted
Ruiqi Zhong @NeurIPS 25 (@ZhongRuiqi)
Graphical models struggle to explain patterns in text & images 😭 LLMs can do this but hallucinate. 👿 It's time to combine their strengths! We define models with natural language parameters, unlocking opportunities in science, business, ML, etc.