Eric Rosen

1.1K posts


@_ericrosen

Robotics Research Scientist @ Robotics and AI Institute (RAI) | CS PhD from @BrownUniversity 🤖

Boston, MA · Joined July 2019
753 Following · 1.3K Followers
Pinned Tweet
Eric Rosen
Eric Rosen@_ericrosen·
What’s the easiest way to improve your pretrained diffusion policy? Swap your Gaussian with a single noise vector that maximizes your downstream reward function!
✅ Keeps original policy weights frozen!
✅ No training new neural networks!
✅ No RL infrastructure needed!
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

2 replies · 6 retweets · 25 likes · 4K views
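The recipe in the pinned thread is simple enough to sketch. Below is a minimal, hypothetical illustration of the idea (not the paper's actual codebase): sample a pool of candidate initial noise vectors, roll out the frozen diffusion policy while always denoising from each candidate, and keep the one with the best downstream reward. `policy.act`, `run_episode`, and the env interface are assumed placeholders.

```python
import numpy as np

def find_golden_ticket(policy, env, noise_shape, n_candidates=64, n_rollouts=5, seed=0):
    """Search for a fixed initial noise vector (a "golden ticket") that
    maximizes downstream reward, with the policy weights left frozen."""
    rng = np.random.default_rng(seed)
    candidates = rng.standard_normal((n_candidates, *noise_shape))
    best_noise, best_reward = None, -np.inf
    for z in candidates:
        # Evaluate the frozen policy, always denoising from this same z.
        returns = [run_episode(policy, env, initial_noise=z) for _ in range(n_rollouts)]
        if np.mean(returns) > best_reward:
            best_noise, best_reward = z, float(np.mean(returns))
    return best_noise, best_reward

def run_episode(policy, env, initial_noise):
    # Hypothetical rollout loop; `policy.act(obs, initial_noise=...)` is an
    # assumed interface where denoising starts from the given vector
    # instead of a fresh Gaussian sample each step.
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy.act(obs, initial_noise=initial_noise))
        total += reward
    return total
```

No gradients, no new networks, no RL machinery: just evaluation of the frozen policy under different fixed noise vectors, which matches the three checkmarks above.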
Eric Rosen retweeted
Thomas Weng
Thomas Weng@thomas_weng·
The Golden Ticket Hypothesis🤝Generative Robot Policies Improve your policy by playing the lotto with initial noise vectors! Check out the original post for the paper and more videos.
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

0 replies · 2 retweets · 5 likes · 698 views
Eric Rosen retweeted
Chetan
Chetan@chetan_·
these results always make me chuckle. makes you realize how little we know about how these things work still
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

1 reply · 3 retweets · 7 likes · 1.4K views
Eric Rosen
Eric Rosen@_ericrosen·
@chris_j_paxton Thank you! 😊 I’m really excited for more work exploring latent steering with reinforcement learning (I ❤️ DSRL). There are benefits that complement direct finetuning / residual approaches, and it’s really interesting how simple it can be to do policy improvement using noise!
0 replies · 0 retweets · 5 likes · 115 views
Eric Rosen
Eric Rosen@_ericrosen·
Check out our open-source codebase to try it yourself! 🤗 It includes results for @huggingface @LeRobotHF SmolVLA, where we improve the open-source checkpoint’s performance on LIBERO across 30 tasks!
0 replies · 0 retweets · 1 like · 101 views
Eric Rosen
Eric Rosen@_ericrosen·
@GiseopK This is beautiful! One comment: it looks like the rainbow gradient is used both for (all) link positions and for conveying the positional history of just the last link. Maybe keep the links rainbow, and have the end-effector history be light-to-dark purple, to make it obvious it’s all from that link?
0 replies · 0 retweets · 2 likes · 105 views
Giseop Kim
Giseop Kim@GiseopK·
Preparing for tomorrow’s Advanced Mobile System class... (I'm sure Claude will make learning robotics more enjoyable 😗 )
1 reply · 0 retweets · 14 likes · 712 views
Eric Rosen
Eric Rosen@_ericrosen·
@cbames Nice! I’ve studied causal calculus for fun (ty @yudapearl), and I love do-calculus. Graphical models are just great 😊 Causal RL is something I’ve been meaning to dig into. I’m very familiar with model-based RL, so it seems related. I’m curious about applications to robotics.
0 replies · 0 retweets · 2 likes · 39 views
Barrett Ames
Barrett Ames@cbames·
I've exited from BotBuilt. The company is continuing on a SaaS trajectory. I wish them all the best. I'm working on the next idea, Rick Rubin style. I'd appreciate factory tours and smart people to argue with about ideas.
Barrett Ames@cbames

Big changes coming soon!

13 replies · 1 retweet · 55 likes · 5K views
Eric Rosen
Eric Rosen@_ericrosen·
🤖 Reflected inertia plays an important role for safety (among other things) in robotics. ⚙️ I wrote an introductory post on reflected inertia and how it is determined by gear ratios. ✍️ Check out the full blog post for details! #ContinuallyLearningBlog
1 reply · 1 retweet · 43 likes · 2.2K views
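For readers skimming past the blog link: the standard relation is that a motor rotor’s inertia appears at the gearbox output scaled by the gear ratio squared, J_reflected = N² · J_motor. A tiny sketch (illustrative numbers of my choosing, not figures from the blog post):

```python
def reflected_inertia(motor_inertia_kgm2: float, gear_ratio: float) -> float:
    """Rotor inertia as seen at the joint output: J_reflected = N^2 * J_motor."""
    return gear_ratio ** 2 * motor_inertia_kgm2

# A small rotor (1e-5 kg·m^2) behind a 100:1 gearbox looks 10,000x more
# inertial at the output. This quadratic scaling is why high gear ratios
# hurt backdrivability and matter for impact safety.
print(reflected_inertia(1e-5, 100.0))  # 0.1 kg·m^2
```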
Eric Rosen
Eric Rosen@_ericrosen·
Fantastic talk from @KyleMorgenstein on deploying RL and sim2real! Definitely recommend if you’re interested in how a strong background in controls helps guide choices in RL!
David Bar@observie

Lots of great stuff in here by @KyleMorgenstein
> Low Kp = feedforward torque, not position tracking. Enables full exploration
> Kp = max_torque / joint_RoM, D ≈ Kp/20
> High Kp pushes policy to torque limits, kills exploration
> Train 2-5x past apparent convergence for smooth deployable policies
> noise_std super important, must decrease and stabilize
> Start with perfect sim, add randomization one factor at a time
thehumanoid.ai/deployment-rea…

0 replies · 2 retweets · 25 likes · 3.7K views
Eric Rosen
Eric Rosen@_ericrosen·
@KyleMorgenstein @maedmatt @observie Why not also log the individual reward components separately, and see when the regularization rewards converge as a heuristic for a stopping point?
1 reply · 0 retweets · 2 likes · 38 views
Kyle🤖🚀🦭
Kyle🤖🚀🦭@KyleMorgenstein·
@maedmatt @observie regularization rewards are usually orders of magnitude smaller than task rewards. you only really smooth out torques and accelerations after the task rewards have converged. the reward is still changing even if it looks mostly flat because the changes are <<1
1 reply · 0 retweets · 2 likes · 87 views
David Bar
David Bar@observie·
Lots of great stuff in here by @KyleMorgenstein
> Low Kp = feedforward torque, not position tracking. Enables full exploration
> Kp = max_torque / joint_RoM, D ≈ Kp/20
> High Kp pushes policy to torque limits, kills exploration
> Train 2-5x past apparent convergence for smooth deployable policies
> noise_std super important, must decrease and stabilize
> Start with perfect sim, add randomization one factor at a time
thehumanoid.ai/deployment-rea…
4 replies · 8 retweets · 95 likes · 8.6K views
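The gain heuristics in the notes above translate directly to code. A minimal sketch, assuming torque in N·m and joint range of motion in radians (the function name and the example numbers are my assumptions, not from the talk):

```python
import math

def pd_gains_from_heuristic(max_torque_nm: float, joint_rom_rad: float):
    """PD gains per the notes above: Kp = max_torque / joint_RoM, Kd ≈ Kp / 20."""
    kp = max_torque_nm / joint_rom_rad
    kd = kp / 20.0
    return kp, kd

# Example: a 30 N·m actuator on a joint with a 150° range of motion.
kp, kd = pd_gains_from_heuristic(30.0, math.radians(150))
print(f"Kp ≈ {kp:.1f} N·m/rad, Kd ≈ {kd:.2f} N·m·s/rad")  # Kp ≈ 11.5, Kd ≈ 0.57
```

The point of sizing Kp this way is that full-range position errors saturate at, but never exceed, the actuator's torque limit, leaving room for exploration.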
Eric Rosen
Eric Rosen@_ericrosen·
Open question for robot learning: Should simulators / world models be used mostly during training, or inference? For example:
- sim2real methods use them mostly during training.
- TAMP / planning methods use them mostly during inference.
What’s the best way to balance both?
4 replies · 0 retweets · 13 likes · 1.9K views
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan @1x_tech @GoogleDeepMind Thanks for the other references! I’d be open to that, although I’d be most convinced if an online RL policy trained in the world model worked zero-shot on hardware (sim2real style). That would demonstrate the world model doesn’t have issues with reward hacking and generalizes well.
1 reply · 0 retweets · 1 like · 40 views
Vai Viswanathan
Vai Viswanathan@vai_viswanathan·
@_ericrosen @1x_tech @GoogleDeepMind DreamDojo from Nvidia and WorldGym both show high correlation with reality! If there were a world model that showed high success-rate correlation with reality, would you use it?
1 reply · 0 retweets · 1 like · 54 views
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan No, we have a sim setup for our hardware / software stack. I know groups are pursuing world models for robot eval (@1x_tech, @GoogleDeepMind ), but I think we’re still in the very early days of it. But as the saying goes, today’s models are the worst they’ll ever be!
1 reply · 0 retweets · 0 likes · 42 views
Eric Rosen
Eric Rosen@_ericrosen·
Cool zero-shot reward model approach: ask the VLM if the task is complete, and use the log prob of the “True” token to compute a reward! They test it using advantage-weighted regression, although I’m curious whether reward hacking happens in the online learning setting (we see that a lot).
Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

0 replies · 1 retweet · 18 likes · 2.5K views
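The token-probability reward is easy to picture with a sketch. Hedged heavily: `vlm.next_token_logits` and the token-id arguments below are hypothetical placeholders, not the TOPReward codebase's actual API. The idea is to prompt the video VLM with a completion question and renormalize the logits of the "True"/"False" answer tokens into a scalar reward.

```python
import numpy as np

def completion_reward(vlm, frames, true_id, false_id,
                      prompt="Is the task complete? Answer True or False."):
    """Zero-shot reward: probability mass on the "True" token, renormalized
    over the {True, False} answer pair (a softmax over two logits)."""
    logits = vlm.next_token_logits(frames, prompt)  # assumed: (vocab_size,) array
    pair = np.array([logits[true_id], logits[false_id]])
    pair -= pair.max()                       # subtract max for numerical stability
    probs = np.exp(pair) / np.exp(pair).sum()
    return float(probs[0])                   # P("True" | frames, prompt)
```

No in-context learning, no fine-tuning, no reward training, matching the framing in the quoted thread.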
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan Definitely! We use sims to evaluate our behavior cloning policies against task metrics, which really helps with hyperparameter tuning for hardware policies 😊
1 reply · 0 retweets · 1 like · 42 views
Eric Rosen
Eric Rosen@_ericrosen·
@JieWang_ZJUI Yeah, I love the spectrum of sims for eval:
- larger, diverse sim benchmarks that aren’t as realistic / relevant, but provide broad signal on how general a method is.
- smaller, targeted sims that are realistic and relevant to your task/robot.
Useful overall, especially for sim2real.
0 replies · 0 retweets · 1 like · 117 views