Berkeley AI Research

1.5K posts

@berkeley_ai

We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.

Berkeley, CA · Joined July 2017

452 Following · 269.5K Followers

Berkeley AI Research retweeted

Angjoo Kanazawa@akanazawa·3d

Babies learn by being naturally curious. How do we get autonomous agents to do the same? We revisited curiosity in 3D exploration and found that memory is key. This project taught me a lot about what kind of functions an agent and a "world model" need to have for this direction.

Lily Goli@lily_goli

🚀 🚀 🚀 Excited to share our new paper: Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration What does it take for an agent to stay curious in a 3D world? The answer is memory. 🌐 Project: recuriosity.github.io 📄 Paper: arxiv.org/abs/2605.22814 💻 Code: github.com/recuriosity/re…

Berkeley AI Research retweeted

Lily Goli@lily_goli·4d

Berkeley AI Research retweeted

Lakshya A Agrawal@LakshyAAAgrawal·6d

Our paper on optimize_anything has been accepted to CAIS 2026, and is out on arXiv with expanded experiments and details! A unified API to optimize agents (with architecture), CUDA kernels, cloud scheduling policies, or even graphics! x.com/LakshyAAAgrawa…

Lakshya A Agrawal@LakshyAAAgrawal

Excited to release @gepa_ai's optimize_anything: a universal API for optimizing any text parameter. It consistently matches or outperforms domain-specific tools optimizing code, prompts, agent harnesses, cloud policies, even visuals! If you can measure it, you can optimize it.

Berkeley AI Research retweeted

Dawn Song@dawnsongtweets·5d

1/ Can AI agents turn security vulnerabilities into real attacks? This is one of the most critical tasks for measuring the impact of frontier AI on cybersecurity. In ExploitGym, we find that autonomous exploitation is no longer hypothetical, even on complex targets such as browser engines and the Linux kernel. How we measured this⬇️

Berkeley AI Research retweeted

Giuseppe Loianno@loiannog·6d

RAPTOR, our new tiny foundation policy for quadrotors, has just appeared in @SciRobotics! A single compact policy that adapts in milliseconds across different quadrotors and autopilots, flies zero-shot with no fine-tuning, and was tested simultaneously on multiple platforms!

Science Robotics@SciRobotics

A new Science #Robotics study highlights an open-source, computationally light policy that can adapt to control unfamiliar quadrotors and stabilize against external perturbations like strong winds. @loiannog @jonas_eschmann scim.ag/4wrIQih

Berkeley AI Research retweeted

Yichuan Wang@YichuanM·18 May

LEANN just won the Best Paper Award at #MLSys26 🥹 still processing this. paper: arxiv.org/abs/2506.08276 repo: github.com/yichuan-w/LEANN huge thanks to all the amazing collaborators, advisors, and open-source contributors who made this possible ❤️

Berkeley AI Research retweeted

Alison Gopnik@AlisonGopnik·17 May

Here is the specific link to our paper with Eunice Yiu, Shiry Ginosar and Kelsey Allen, how to construct causal models through intrinsically motivated action, something kids do and LLMs don't. The whole issue on world models is very much worth reading. royalsocietypublishing.org/rsta/article/3…

Berkeley AI Research retweeted

Andrew Wagenmaker@ajwagenmaker·15 May

Come check out our workshop on post-training robot foundation models at RSS 2026! Also consider participating in our real-world RL challenge!

Shiduo Zhang@Joey_zh_

#RSS2026 Call for participants 📢 Excited to announce our RSS 2026 Workshop: Post-Training for Robotics Foundation Models, together with the first Real-World Reinforcement Learning Challenge! The workshop will be held on July 13 in Sydney. posttraining-for-robotics.github.io

Berkeley AI Research retweeted

Dawn Song@dawnsongtweets·14 May

Excited to share DecodingTrust-Agent Platform (DTap), the first controllable, full-stack simulation platform for advanced AI agent red-teaming across 50+ realistic environments. DTap supports multiple attack vectors, including environment-, tool-, skill-, and prompt-level injections, as well as their compositions. We also build DTap-Bench, a ~7K-task benchmark with complex workflows and sophisticated attacks for evaluating agent security and utility under realistic threat scenarios. Through DTap, we uncover systematic vulnerabilities and zero-day failure modes in popular agents such as OpenClaw and Claude Code, and provide insights on how to improve harness design, tool execution, and trust calibration for more robust agentic systems. Read our paper to learn more 👇 Paper link: arxiv.org/pdf/2605.04808 Platform + benchmark + code: decodingtrust-agent.com Great work by the team!

Zhaorun Chen@ZRChen_AISafety

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢 🔥👽 After spending 20 months and $120K API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming !! 🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings. 🔥We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment. Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents! Paper link: arxiv.org/pdf/2605.04808 Platform + benchmark + code: decodingtrust-agent.com Join our Discord: discord.gg/V4fG6NcVc Read more below 👇

Berkeley AI Research retweeted

Ken Goldberg@Ken_Goldberg·14 May

My students and I are very excited about the potential of agentic coding for robotics. Looking fwd to presenting new results in plenary talk on Tues 2 June, the first day of @IEEEorg #ICRA2026 in Vienna: invt.io/1txbkmc73b7

Berkeley AI Research retweeted

Ahmed Alaa@_ahmedmalaa·14 May

New work on ML application to plasma proteomics! Proteomic studies often don't replicate when conducted on different measurement platforms. We trained ML models on paired SomaScan and Olink data to bridge that gap and improve cross-platform replicability.

Zhi Yu@ZhiYu_ACGT

🎉Check out our preprint led by @LinkeLi_MGH @_ahmedmalaa with @pnatarajanmd : ) One of the biggest hurdles in proteomics is the non-replication of associations due to limited cross-platform correlation. We tackle this with ML models that bridge the two major ones, SomaScan and Olink, plus a tiered protein reliability system to guide which signals to trust. We validate against gold-standard measurements, AlphaFold 3 predicted structures, and 3 real-world applications, including replication of published dementia🧠and heart failure🫀associations, showing markedly improved cross-platform replicability. biorxiv.org/content/10.648…

Berkeley AI Research retweeted

Marwa Abdulhai@marwaabdulhai·13 May

Really excited about our new paper: LLM user simulators may sound human, but the actions they take may not be. We tested 24 models as user simulators and found the distribution of their behaviors to be very different from real users in WildChat. What does this mean for user sim research?

Shuhaib Mehri@shuhaibmehri

What happens when you compare the distributions of real and simulated user behaviors? 🔍 The gap is large. We introduce a method to measure this gap and evaluate 24 LLM-based user simulators across coding and writing tasks. @convai_uiuc @MSFTResearch @berkeley_ai 🧵 1/N

Berkeley AI Research retweeted

Serina Chang@serinachang5·13 May

User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator? We reframe the problem as what creates downstream value for humans. Our new simulator test: how an LLM assistant trained with the simulator performs with human users🧵

Berkeley AI Research retweeted

Joseph Jeesung Suh@JosephJSSuh·13 May

We use LLMs to role-play "users" to train, evaluate, and improve AI assistants. How do you know if your user simulator is any good? We argue: rather than measuring how realistic it sounds, start measuring how the assistants it trains perform with real humans. 🧵👇

Berkeley AI Research retweeted

Matei Zaharia@matei_zaharia·13 May

Really excited about this work that combines GEPA with RL! You get some of the advantages of both, with reflection on rich feedback leading to better weight updates.

Kusha Sareen@KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL: • 3x more sample-efficient • Higher performance ceiling • Less KL drift (better plasticity) • Continual learning: succeeds where RL stalls

Berkeley AI Research retweeted

Rishabh Tiwari@rish2k1·13 May

Very excited about this line of research on fast-slow learning:
1) the potential to solve many issues with current RL (e.g., entropy collapse, sparse rewards)
2) an intuitive way of incorporating rich feedback into RL
3) a way to transfer knowledge from text-only learning into the model
4) a great candidate for model-harness co-evolution, given the recent discussion on X about future models developing their own harness
5) most importantly, I can imagine these kinds of algorithms being better suited to discovery, which requires extreme exploration while also improving the underlying model's capabilities
and much more ...

Kusha Sareen@KushaSareen

Berkeley AI Research retweeted

Ken Goldberg@Ken_Goldberg·12 May

Thank you for summarizing 30 years of art and robotics @UCBerkeley. Go Bears!

Berkeley Engineering@Cal_Engineer

The Summer ’26 issue of Berkeley Engineer magazine is here! Our cover story centers on how Professor @Ken_Goldberg is redefining robot manipulation with creativity and code. Read the issue: engineering.berkeley.edu/magazine

Berkeley AI Research retweeted

Yi Ma@YiMaTweets·12 May

My Berkeley EECS Colloquium talk last week, "Pursuing the Nature of Intelligence," was recorded and is now available on YouTube: youtube.com/watch?v=Az9sfy… You may view it as an overview of an endeavor to establish the study of intelligence as a scientific and theoretical subject.

Berkeley AI Research retweeted

Rachel Freedman@FreedmanRach·13 May

Active Teacher Selection for Reward Learning: now published in TMLR! Most RLHF systems assume feedback comes from one canonical teacher — but annotators can disagree over 30% of the time. So who should the agent ask for feedback? Paper: arxiv.org/abs/2310.15288…

Berkeley AI Research retweeted

Shreya Shankar@sh_reya·12 May

I'm joining Carnegie Mellon's CS Department (and HCII by courtesy) as an assistant professor in Fall 2027! I'll be recruiting PhD students next cycle. If you're interested in AI systems or human-AI collaboration, list me in your application. Stay tuned for more about my new lab!
