Brett Barkley
@bebark99
25 posts

ECE PhD at UT Austin. Lapsed aerospace engineer.
Joined December 2024
140 Following · 46 Followers

Pinned Tweet
Brett Barkley @bebark99
(1/n) With over 1,300 citations, MBPO is often cited as proof that model-based RL beats model-free methods. In arxiv.org/pdf/2412.14312 we showed it often completely fails in DeepMind Control. In our new work, Fixing That Free Lunch (FTFL), we explain why and make it succeed.
[image attached]
2 replies · 4 reposts · 19 likes · 5K views
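For context on what the thread is testing: MBPO is a Dyna-style algorithm. It trains a dynamics model on real transitions, branches short synthetic rollouts from real states, and feeds that synthetic data to an off-policy learner (SAC). The sketch below is a minimal, self-contained illustration of that loop on a toy linear system; all names (`env_step`, `model_step`, `rollout_len`) are illustrative, and a least-squares model stands in for MBPO's probabilistic ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy stochastic environment standing in for a MuJoCo task."""
    s_next = 0.9 * s + 0.1 * a + 0.01 * rng.normal(size=s.shape)
    return s_next, -float(s @ s)

def policy(s):
    """Stand-in for the SAC actor; random exploration for this sketch."""
    return rng.normal(size=s.shape)

real_buffer, model_buffer = [], []
s = np.zeros(4)

# 1) Collect real transitions in the true environment.
for _ in range(500):
    a = policy(s)
    s_next, r = env_step(s, a)
    real_buffer.append((s, a, r, s_next))
    s = s_next

# 2) Fit a one-step dynamics model on [s, a] -> s_next. (MBPO trains an
#    ensemble of probabilistic networks; least squares keeps this short.)
X = np.array([np.concatenate([si, ai]) for si, ai, _, _ in real_buffer])
Y = np.array([si1 for _, _, _, si1 in real_buffer])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def model_step(s, a):
    return np.concatenate([s, a]) @ W

# 3) Branch short synthetic rollouts from real states; the short horizon
#    is MBPO's way of bounding compounding model error.
rollout_len = 5
for i in rng.choice(len(real_buffer), size=100, replace=False):
    s_m = real_buffer[i][0]
    for _ in range(rollout_len):
        a_m = policy(s_m)
        s_m_next = model_step(s_m, a_m)
        model_buffer.append((s_m, a_m, -float(s_m @ s_m), s_m_next))
        s_m = s_m_next

# 4) An off-policy learner (SAC in MBPO) would now train on minibatches
#    drawn mostly from model_buffer.
print(f"real: {len(real_buffer)}, synthetic: {len(model_buffer)}")
```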
Auriel @aurielws
if you are:
- doing a PhD in ML / CV / RL
- comfy with Python + training, and
- published @ NeurIPS / ICLR / ICML
- curious about agents + self-improvement
- maybe even JAX-fluent (bonus!)
- AND you like boba 🧋
then: Come work with us!!! Drop your info below and also come say hi to some of us at @NeurIPSConf
Roberta Raileanu @robertarail

Our Open-Endedness team at @GoogleDeepMind is hiring student researchers. Come work on some cool research with the super talented @cong_ml 🤖🧪 🚀

21 replies · 32 reposts · 639 likes · 128.5K views
Brett Barkley @bebark99
(12/12) In summary, FTFL turns MBPO’s synthetic-data failures into successes and shows how even seemingly similar environment structure can shape algorithmic reliability. Full paper: arxiv.org/abs/2510.01457
0 replies · 0 reposts · 2 likes · 167 views
Brett Barkley @bebark99
(11/n) FTFL shows that understanding when and why algorithms fail is as important as improving their average performance. We hope this motivates the RL community to build mappings between environment structure and algorithmic choices as a step toward more generally reliable methods.
1 reply · 0 reposts · 1 like · 199 views
Brett Barkley @bebark99
@GuanyaShi @Caltech @lschmidt3 This totally resonates with our work (arXiv:2412.14312): we show that Dyna-style tweaks, dominant in Gym, consistently hurt performance in DMC despite both benchmarks using MuJoCo. Adding them to off-policy methods makes things worse, not better. Maybe we've overfit to Gym more than we realized.
[image attached]
0 replies · 0 reposts · 5 likes · 112 views
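Concretely, the "Dyna-style tweak" applied to an off-policy learner usually means that each gradient step samples a minibatch mixing real and model-generated transitions. A hedged sketch of that mixing knob follows; the function name, signature, and defaults are illustrative, not MBPO's actual API.

```python
import random

def sample_minibatch(real_buffer, model_buffer, batch_size=256, real_ratio=0.05):
    """Mix real and model-generated transitions for one gradient step.

    A tiny real fraction (MBPO-style) means the critic trains almost
    entirely on synthetic data; real_ratio=1.0 recovers the plain
    model-free learner.
    """
    n_real = max(1, int(batch_size * real_ratio))
    n_synth = batch_size - n_real
    batch = random.choices(real_buffer, k=n_real)
    # Fall back to real data if no model rollouts exist yet.
    batch += random.choices(model_buffer or real_buffer, k=n_synth)
    random.shuffle(batch)
    return batch
```

With real_ratio near zero, the learner trains almost entirely on synthetic transitions, which is the regime the tweet argues degrades performance in DMC.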
Guanya Shi @GuanyaShi
When I was a Ph.D. student at @Caltech, @lschmidt3 discussed the paper "Do ImageNet Classifiers Generalize to ImageNet?" in his job talk, and it left a deep impression on me that lasts to this day. Basically, they recreated the ImageNet test set and found that the SOTA models of circa 2019 degraded by 11-14%. Even in supervised learning, models and algorithms can easily overfit to a dataset. I cannot help but ask myself how many conclusions in RL/control algorithm papers, which run MuJoCo-Walker-style examples, can actually translate to real robotic systems.

Another related question I keep asking myself these days is: what is the future of control and RL theory? Most existing theory rests on assumptions that are hard or impossible to verify in real systems. My thoughts:

1. Similar to "Physics of LLM", maybe we should switch from "math-style" constructive theory to "physics-style" theory based on large-scale empirical observations, for "computational" control problems. E.g., arxiv.org/pdf/2205.05787 shows that the closed-loop dynamics of a well-trained RL locomotion policy are approximately linear.
2. Control/RL theory should be used to guide new paradigm designs rather than to explain existing paradigms.
3. We should pay more attention to lower-bound-style theoretical results, which show the fundamental hardness of estimation or control for certain systems (e.g., balancing a very short pole in your hand). They can guide system designs and filter out the "wrong problem."
4. We need another DARPA Humanoid Challenge or a Humanoid Arena to systematically test and compare different approaches.
C Zhang @ChongZitaZhang

Don't trust any RL conclusions from MuJoCo benchmarks anymore, especially about exploration. Many results hold simply because of: 1) not using massively parallel simulation for exploration, 2) not correctly randomizing environments and goals, 3) not reasonably setting up partial observability.

11 replies · 28 reposts · 273 likes · 41.8K views
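Point 1 above is easy to probe empirically: collect closed-loop states from a rollout, fit a one-step linear map by least squares, and check how much variance it explains. Below is a toy sketch of that check using a synthetic trajectory; this is an assumed illustration, not the cited paper's exact procedure, and with real policies the states would come from rolling out the trained controller in its environment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "closed-loop trajectory": a stable linear system plus noise.
A_true = np.array([[0.95, 0.05],
                   [-0.05, 0.95]])
xs = [np.array([1.0, 0.0])]
for _ in range(300):
    xs.append(A_true @ xs[-1] + 0.01 * rng.normal(size=2))
X, Y = np.array(xs[:-1]), np.array(xs[1:])

# Least-squares fit of a one-step linear map x_{t+1} ≈ x_t @ A_hat,
# then the fraction of variance the linear model explains.
A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ A_hat
r2 = 1.0 - resid.var() / Y.var()
print(f"variance explained by the linear fit: {r2:.3f}")
```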
Brett Barkley @bebark99
(10/10) In summary, OpenAI Gym and DMC are equally conventional testbeds that share a common physics backend (MuJoCo). There is no 'good' reason for MBPO and ALM to largely fail in DMC, but they do. We encourage readers to check out our paper for more: arxiv.org/pdf/2412.14312
0 replies · 0 reposts · 2 likes · 178 views
Brett Barkley @bebark99
(9/n) Not only that, but at the time of this post MBPO has over 1,000 citations and a reproducibility study at NeurIPS. Despite this, only one paper has noted this performance gap, and only across Hopper tasks in Gym and DMC.
1 reply · 0 reposts · 3 likes · 188 views
Brett Barkley @bebark99
You might be surprised to learn that modern RL favors Dyna-style model-based algorithms for their sample efficiency, yet they can both require up to 40x more wall-clock time to train and significantly underperform simple model-free methods across diverse benchmarks.
[image attached]
2 replies · 1 repost · 20 likes · 1.3K views