Berkeley AI Research

1.4K posts

Berkeley AI Research

@berkeley_ai

We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.

Berkeley, CA 加入时间 Temmuz 2017

448 关注263.9K 粉丝

Berkeley AI Research 已转推

Negar Arabzadeh@NegarEmpr·5h

1/ "Can QPP Choose the Right Query Variant?" has been accepted at #SIGIR2026!🇦🇺 You can easily over-generate multiple query variants at low cost, but running RAG for all of them is expensive! Can we pick the winner query before paying the generation cost? arxiv.org/abs/2604.22661

English

1.8K

Berkeley AI Research 已转推

Kevin Zakka@kevin_zakka·2d

Gave my PhD dissertation talk on Friday! It's been an incredible journey made possible by the best advisor who believed in me and gave me the freedom and support to explore. Thank you @pabbeel! And thank you to everyone who came to support and share this milestone with me 🙏

English

641

27.8K

Berkeley AI Research 已转推

Marwa Abdulhai@marwaabdulhai·2d

I am in Rio for #ICLR 2026! I am excited to be presenting 3 posters at the following workshops: ⚖️ Hierarchical Agenda Reasoning for Strategic Multi-Turn Dialogue Agents 📍 04/26, 11:15 AM–12 PM or 4:10–5 PM | DATA-FM, Room 203 A+B 📍 04/27, 11:35 AM–12:20 PM & 2:30–3:10 PM | SPOT Workshop ✏️ How LLMs Distort Our Written Language 📍 04/26, 3–3:50 PM | AIWILD Workshop, Room 204 A/B 🤖 Evaluating and Reducing Deceptive Dialogue from Language Models with Multi-Turn RL 📍 04/27, 10–11 AM | Trustworthy AI Workshop, Room 204 A+B If you're interested in dialogue agents, multi-turn RL, AI deception, and/or preserving human agency in AI-interactions, I'd love to chat. Feel free to reach out!

English

Berkeley AI Research 已转推

Sergey Levine@svlevine·1d

Tomorrow in the Workshop on World Models at ICLR in Rio (10:30 am) I’ll talk about a… different take on what might make for a good world model. Come find out, 10:30 in Room 202A at ICLR sites.google.com/view/iclr-2026…

English

296

23.4K

Berkeley AI Research 已转推

Sewon Min@sewon__min·2d

I will give two talks at ICLR workshops!! 🇧🇷 Sunday 9:40-10:10: "LLMs for Distributed Data Use" @ Workshop on Data Problems in Foundation Models (Room 203 A/B) Monday 15:30–16:05 : "Are Mixture-of-Experts Modular? Why It Matters and How to Fix It" @ ICBINB Workshop (Room 201 C) Both happened to be related to MoEs, but tackle two completely different questions → some say hi!

English

128

11.9K

Berkeley AI Research 已转推

Sergey Levine@svlevine·2d

I'll give a talk about lifelong learning tmrw (Sun), 9:30 am (Brazil time) in the lifelong agents workshop, about how we can get robot foundation models to improve with RL: lifelongagent.github.io Then at 11:30 am, I'll talk about how generative models can drive self-improvement here: recursive-workshop.github.io

English

338

33.8K

Berkeley AI Research 已转推

Seohong Park@seohong_park·4d

What if we represent a state as a "list" of similarities to all other states? In our recent ICLR paper, we studied this "dual" representation. Come visit our poster at #4608 10:30a-1p on Fri (morning, 2nd day)! Paper: arxiv.org/abs/2510.06714 Blog post: seohong.me/blog/dual-repr…

Seohong Park@seohong_park

Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: seohong.me/blog/dual-repr… Paper: arxiv.org/abs/2510.06714 ↓

English

295

39.9K

Berkeley AI Research 已转推

Lakshya A Agrawal@LakshyAAAgrawal·5d

Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷 Apr 24 Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater Poster Session 4, 3:15 PM, Pavilion 3 x.com/LakshyAAAgrawa… Let's recap what's happened since we released GEPA last year 🧵

Lakshya A Agrawal@LakshyAAAgrawal

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵

English

216

39.3K

Berkeley AI Research 已转推

Yajat Yadav@YajatYadav314·5d

Excited to be in Rio this week to present RETAIN (w/ @zhiyuan_zhou_ , @ajwagenmaker, @KarlPertsch, and @svlevine) at #ICLR2026! 🇧🇷 Saturday 10:30 AM – 1:00 PM at P3-#1208. Project Page: retain.yajatyadav.com x.com/zhiyuan_zhou_/….

Paul Zhou@zhiyuan_zhou_

Do you ever find finetuning VLA overfits to the target task, to the point where generalist ability is lost and even minor deviations beyond the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policy in weight space 🤯 👇🧵

English

17.3K

Berkeley AI Research 已转推

Qiyang (Colin) Lil@qiyang_li·6d

Excited to be in Rio attending ICLR this week to present some papers! 🧵(1/3) (1) Decoupled Q-Chunking w/ @seohong_park and @svlevine on Fri 3:15-5:45 (#4504) x.com/qiyang_li/stat…

Qiyang (Colin) Lil@qiyang_li

Action chunking is drawing growing interest in RL, yet its theoretical properties are still understudied. We are excited to share some insights on when we should use action chunking in Q-learning + a new algo (DQC) to tackle hard long-horizon tasks!colinqiyangli.github.io/dqc🧵1/N

English

101

21.3K

Berkeley AI Research 已转推

Serina Chang@serinachang5·21 Nis

🎉 Thrilled to have two papers accepted to ACL 2026 main! 1. Graph-based models match LLMs on close-ended human simulation tasks with far less compute & greater transparency 2. (oral) How to allocate human samples towards fine-tuning vs post-hoc rectification in simulation

English

134

13.8K

Berkeley AI Research 已转推

Angjoo Kanazawa@akanazawa·21 Nis

Very excited to share this work @davidrmcall did with the fantastic NVIDIA Finland team last year. We have a surprisingly simple, but sample efficient way to post-train a flow model with RL.

David McAllister@davidrmcall

We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.

English

15.8K

Berkeley AI Research 已转推

Dawn Song@dawnsongtweets·21 Nis

🎉 The Agents in the Wild: Safety, Security, and Beyond workshop @ICLR2026 is less than a week away! Join us April 26 in Room 204 A/B, Riocentro, Rio de Janeiro! 🌴 Safety and security for AI agents — both foundational and emerging challenges — demand serious attention. Researchers and practitioners are mobilizing: ▪️ 151 papers accepted ▪️ 161 reviewers (58% industry, 42% academia) ▪️ Up to 800 participants expected ▪️ Incredible engagement on a topic that clearly matters. The schedule: 👇

English

7.2K

Berkeley AI Research 已转推

Kuba@kuba_AI·21 Nis

Optimize materials yourself! 🔥 No cloning, no install — just open our interactive Google Colab demo and start optimizing crystals for formation energy or band gap in < 2 minutes. → Try it here: colab.research.google.com/drive/1usUg7ze… → Or grab the repo + open weights: github.com/znowu/CliqueFl… huggingface.co/iamkuba/Clique…

English

12.6K

Berkeley AI Research 已转推

C.K. Wolfe@ckwolfeofficial·21 Nis

Released on @berkeley_ai blog, recent work by @michaelpsenka M. Rabbat @ask1729 @ylecun @_amirbar — long horizons in visual world models punish naive gradients, GRASP reshapes them (lifted virtual states, noised state iterates, action-friendly descent) so planning stays stable when rollouts get ill-conditioned … 🐻📄 bair.berkeley.edu/blog/2026/04/2…

GIF

English

14.5K

Berkeley AI Research 已转推

David McAllister@davidrmcall·17 Nis

English

367

59.5K

Berkeley AI Research 已转推

A. Sophia Koepke@ASophiaKoepke·17 Nis

New paper: Back into Plato’s Cave Are vision and language models converging to the same representation of reality? The Platonic Representation Hypothesis says yes. BUT we find the evidence for this is more fragile than it looks. Project page: akoepke.github.io/cave_umwelten/ 1/9

English

228

44.7K

Berkeley AI Research 已转推

Dawn Song@dawnsongtweets·18 Nis

The 'why' matters more than the benchmark score. We decompose long-horizon failures into diagnosable patterns — check out our benchmark and analysis to see exactly where and why agents break.

Xinyu Jessica Wang@xwang2775

Do you know how your OpenClaw agent fails? The Long-Horizon Task Mirage? LLM agents seem capable… until tasks get long. Even extending a few steps can break them. In embodied tasks, 3–4 steps already fail. Real-world failures are happening. But we still don’t understand why.🤔

English

25.2K

Berkeley AI Research 已转推

Dawn Song@dawnsongtweets·16 Nis

We broke 8 major agent eval benchmarks. Now we're open-sourcing the tool that did it — so you can break yours first. Try out BenchJack: Pen-testing, but for evals.

Hao Wang@MogicianTony

Benchmarks are often easier to game than they look. We build BenchJack to audit benchmarks for hidden shortcuts and reward hacks — before they evaluate your agent. Now in preview. Fully open source, with support for auditing your own benchmarks too. github.com/benchjack/benc… Issues and PRs welcome.

English

25.8K

Berkeley AI Research 已转推

Hao Wang@MogicianTony·15 Nis

Hao Wang@MogicianTony

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

English

36K

发现

@pabbeel @zhiyuan_zhou_ @ajwagenmaker @KarlPertsch @svlevine @seohong_park @davidrmcall @michaelpsenka