Berkeley AI Research

1.5K posts


@berkeley_ai

We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.

Berkeley, CA · Joined July 2017
452 Following · 269.5K Followers
Berkeley AI Research reposted
Angjoo Kanazawa @akanazawa
Babies learn by being naturally curious. How do we get autonomous agents to do the same? We revisited curiosity in 3D exploration and found that memory is key. This project taught me a lot about what kinds of functions an agent and a "world model" need to have for this direction.
Lily Goli @lily_goli

🚀 🚀 🚀 Excited to share our new paper: Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration. What does it take for an agent to stay curious in a 3D world? The answer is memory.
🌐 Project: recuriosity.github.io
📄 Paper: arxiv.org/abs/2605.22814
💻 Code: github.com/recuriosity/re…

Replies: 2 · Reposts: 19 · Likes: 147 · Views: 31.7K
Berkeley AI Research reposted
Lakshya A Agrawal @LakshyAAAgrawal
Our paper on optimize_anything has been accepted to CAIS 2026 and is out on arXiv with expanded experiments and details! A unified API to optimize agents (with architecture), CUDA kernels, cloud scheduling policies, or even graphics! x.com/LakshyAAAgrawa…
Lakshya A Agrawal @LakshyAAAgrawal

Excited to release @gepa_ai's optimize_anything: a universal API for optimizing any text parameter. It consistently matches or outperforms domain-specific tools optimizing code, prompts, agent harnesses, cloud policies, even visuals! If you can measure it, you can optimize it.

Replies: 4 · Reposts: 22 · Likes: 176 · Views: 22.3K
Berkeley AI Research reposted
Dawn Song @dawnsongtweets
1/ Can AI agents turn security vulnerabilities into real attacks? This is one of the most critical tasks for measuring the impact of frontier AI on cybersecurity. In ExploitGym, we find that autonomous exploitation is no longer hypothetical, even on complex targets such as browser engines and the Linux kernel. How we measured this⬇️
Replies: 6 · Reposts: 34 · Likes: 115 · Views: 18.1K
Berkeley AI Research reposted
Giuseppe Loianno @loiannog
RAPTOR, our new tiny foundation policy for quadrotors, has just appeared in @SciRobotics! A single compact policy that adapts in milliseconds across different quadrotors and autopilots, flies zero-shot with no fine-tuning, and has been tested simultaneously on multiple platforms!
Science Robotics @SciRobotics

A new Science #Robotics study highlights an open-source, computationally light policy that can adapt to control unfamiliar quadrotors and stabilize against external perturbations like strong winds. @loiannog @jonas_eschmann scim.ag/4wrIQih

Replies: 2 · Reposts: 3 · Likes: 29 · Views: 11.2K
Berkeley AI Research reposted
Alison Gopnik @AlisonGopnik
Here is the specific link to our paper with Eunice Yiu, Shiry Ginosar and Kelsey Allen on how to construct causal models through intrinsically motivated action, something kids do and LLMs don't. The whole issue on world models is very much worth reading. royalsocietypublishing.org/rsta/article/3…
Replies: 3 · Reposts: 14 · Likes: 53 · Views: 9K
Berkeley AI Research reposted
Andrew Wagenmaker @ajwagenmaker
Come check out our workshop on post-training robot foundation models at RSS 2026! Also consider participating in our real-world RL challenge!
Shiduo Zhang @Joey_zh_

#RSS2026 Call for participants 📢 Excited to announce our RSS 2026 Workshop: Post-Training for Robotics Foundation Models, together with the first Real-World Reinforcement Learning Challenge! The workshop is held on July 13 in Sydney. posttraining-for-robotics.github.io

Replies: 5 · Reposts: 5 · Likes: 21 · Views: 14K
Berkeley AI Research reposted
Dawn Song @dawnsongtweets
Excited to share DecodingTrust-Agent Platform (DTap), the first controllable, full-stack simulation platform for advanced AI agent red-teaming across 50+ realistic environments. DTap supports multiple attack vectors, including environment-, tool-, skill-, and prompt-level injections, as well as their compositions.

We also build DTap-Bench, a ~7K-task benchmark with complex workflows and sophisticated attacks for evaluating agent security and utility under realistic threat scenarios.

Through DTap, we uncover systematic vulnerabilities and zero-day failure modes in popular agents such as OpenClaw and Claude Code, and provide insights on how to improve harness design, tool execution, and trust calibration for more robust agentic systems.

Read our paper to learn more 👇
Paper link: arxiv.org/pdf/2605.04808
Platform + benchmark + code: decodingtrust-agent.com
Great work by the team!
Zhaorun Chen @ZRChen_AISafety

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢

🔥👽 After spending 20 months and $120K in API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming!!

🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings.

🔥 We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, and prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment.

Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents!

Paper link: arxiv.org/pdf/2605.04808
Platform + benchmark + code: decodingtrust-agent.com
Join our Discord: discord.gg/V4fG6NcVc
Read more below 👇

Replies: 4 · Reposts: 6 · Likes: 50 · Views: 16.7K
Berkeley AI Research reposted
Ken Goldberg @Ken_Goldberg
My students and I are very excited about the potential of agentic coding for robotics. Looking forward to presenting new results in a plenary talk on Tues 2 June, the first day of @IEEEorg #ICRA2026 in Vienna: invt.io/1txbkmc73b7
Replies: 3 · Reposts: 12 · Likes: 106 · Views: 22.3K
Berkeley AI Research reposted
Ahmed Alaa @_ahmedmalaa
New work on ML application to plasma proteomics! Proteomic studies often don't replicate when conducted on different measurement platforms. We trained ML models on paired SomaScan and Olink data to bridge that gap and improve cross-platform replicability.
Zhi Yu @ZhiYu_ACGT

🎉 Check out our preprint led by @LinkeLi_MGH @_ahmedmalaa with @pnatarajanmd : ) One of the biggest hurdles in proteomics is the non-replication of associations due to limited cross-platform correlation. We tackle this with ML models that bridge the two major ones, SomaScan and Olink, plus a tiered protein reliability system to guide which signals to trust. We validate against gold-standard measurements, AlphaFold 3 predicted structures, and 3 real-world applications, including replication of published dementia 🧠 and heart failure 🫀 associations, showing markedly improved cross-platform replicability. biorxiv.org/content/10.648…

Replies: 1 · Reposts: 7 · Likes: 23 · Views: 14.4K
Berkeley AI Research reposted
Marwa Abdulhai @marwaabdulhai
Really excited about our new paper: LLM user simulators may sound human, but the actions they take may not be human-like. We tested 24 models as user simulators and found the distribution of their behaviors to be very different from that of real users in WildChat. What does this mean for user sim research?
Shuhaib Mehri @shuhaibmehri

What happens when you compare the distributions of real and simulated user behaviors? 🔍 The gap is large. We introduce a method to measure this gap and evaluate 24 LLM-based user simulators across coding and writing tasks. @convai_uiuc @MSFTResearch @berkeley_ai 🧵 1/N

Replies: 5 · Reposts: 5 · Likes: 36 · Views: 12.7K
Berkeley AI Research reposted
Serina Chang @serinachang5
User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator? We reframe the problem as what creates downstream value for humans. Our new simulator test: how an LLM assistant trained with the simulator performs with human users 🧵
Replies: 6 · Reposts: 23 · Likes: 131 · Views: 14.6K
Berkeley AI Research reposted
Joseph Jeesung Suh @JosephJSSuh
We use LLMs to role-play "users" to train, evaluate, and improve AI assistants. How do you know if your user simulator is any good? We argue: rather than measuring how realistic it sounds, start measuring how the assistants it trains perform with real humans. 🧵👇
Replies: 8 · Reposts: 8 · Likes: 66 · Views: 9K
Berkeley AI Research reposted
Rishabh Tiwari @rish2k1
Very excited about this line of research on fast-slow learning:
1) It has the potential to solve a lot of issues with current RL (e.g. entropy collapse, sparse rewards).
2) It is an intuitive way of incorporating rich feedback with RL.
3) It provides a way to transfer knowledge from text-only learning into the model.
4) It is a great candidate for model-harness co-evolution; there has been a lot of discussion on X lately about future models developing their own harness.
5) Most importantly, I can imagine these kinds of algorithms being more suitable candidates for discovery that requires both extreme exploration and, at the same time, improving the underlying model capabilities.
And much more ...
Kusha Sareen @KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context.
FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

Replies: 3 · Reposts: 25 · Likes: 170 · Views: 27.2K
Berkeley AI Research reposted
Yi Ma @YiMaTweets
My Berkeley EECS Colloquium talk last week on Pursuing the Nature of Intelligence was recorded and is now available on YouTube: youtube.com/watch?v=Az9sfy… You may view it as an overview of an endeavor to establish the study of Intelligence as a scientific and theoretical subject.
Replies: 12 · Reposts: 24 · Likes: 192 · Views: 1M
Berkeley AI Research reposted
Rachel Freedman @FreedmanRach
Active Teacher Selection for Reward Learning: now published in TMLR! Most RLHF systems assume feedback comes from one canonical teacher — but annotators can disagree over 30% of the time. So who should the agent ask for feedback? Paper: arxiv.org/abs/2310.15288…
Replies: 3 · Reposts: 15 · Likes: 41 · Views: 5.8K
Berkeley AI Research reposted
Shreya Shankar @sh_reya
I'm joining Carnegie Mellon's CS Department (and HCII by courtesy) as an assistant professor in Fall 2027! I'll be recruiting PhD students next cycle. If you're interested in AI systems or human-AI collaboration, list me in your application. Stay tuned for more about my new lab!
Replies: 120 · Reposts: 108 · Likes: 2K · Views: 210K