Eric Rosen

1.1K posts


@_ericrosen

Robotics Research Scientist @ Robotics and AI Institute (RAI) | CS PhD from @BrownUniversity 🤖

Boston, MA · Joined July 2019
753 Following · 1.3K Followers
Pinned Tweet
Eric Rosen
Eric Rosen@_ericrosen·
What’s the easiest way to improve your pretrained diffusion policy? Swap your Gaussian with a single noise vector that maximizes your downstream reward function!
✅ Keeps original policy weights frozen!
✅ No training new neural networks!
✅ No RL infrastructure needed!
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

2 replies · 6 retweets · 25 likes · 4K views
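The recipe in the pinned thread is simple enough to sketch. Below is a minimal, hypothetical illustration of the idea (not the paper's actual codebase): sample a pool of candidate initial noise vectors, roll out the frozen diffusion policy while always denoising from each candidate, and keep the one with the best downstream reward. `policy.act`, `run_episode`, and the env interface are assumed placeholders.

```python
import numpy as np

def find_golden_ticket(policy, env, noise_shape, n_candidates=64, n_rollouts=5, seed=0):
    """Search for a fixed initial noise vector (a "golden ticket") that
    maximizes downstream reward, with the policy weights left frozen."""
    rng = np.random.default_rng(seed)
    candidates = rng.standard_normal((n_candidates, *noise_shape))
    best_noise, best_reward = None, -np.inf
    for z in candidates:
        # Evaluate the frozen policy, always denoising from this same z.
        returns = [run_episode(policy, env, initial_noise=z) for _ in range(n_rollouts)]
        if np.mean(returns) > best_reward:
            best_noise, best_reward = z, float(np.mean(returns))
    return best_noise, best_reward

def run_episode(policy, env, initial_noise):
    # Hypothetical rollout loop; `policy.act(obs, initial_noise=...)` is an
    # assumed interface where denoising starts from the given vector
    # instead of a fresh Gaussian sample each step.
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy.act(obs, initial_noise=initial_noise))
        total += reward
    return total
```

No gradients, no new networks, no RL machinery: just evaluation of the frozen policy under different fixed noise vectors, which matches the three checkmarks above.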
Eric Rosen retweeted
Thomas Weng
Thomas Weng@thomas_weng·
The Golden Ticket Hypothesis🤝Generative Robot Policies Improve your policy by playing the lotto with initial noise vectors! Check out the original post for the paper and more videos.
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

0 replies · 2 retweets · 5 likes · 698 views
Eric Rosen retweeted
Chetan
Chetan@chetan_·
these results always make me chuckle. makes you realize how little we know about how these things work still
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket ✨, leads to consistent performance improvements! 🧵...

1 reply · 3 retweets · 7 likes · 1.4K views
Eric Rosen
Eric Rosen@_ericrosen·
@chris_j_paxton Thank you! 😊 I’m really excited for more work exploring latent steering with reinforcement learning (I ❤️ DSRL). There are benefits that complement direct finetuning / residual approaches, and it’s really interesting how simple it can be to do policy improvement using noise!
0 replies · 0 retweets · 5 likes · 115 views
Eric Rosen
Eric Rosen@_ericrosen·
Check out our open-source codebase to try it yourself! 🤗 It includes results for @huggingface @LeRobotHF SmolVLA, where we improve the open-source checkpoint’s performance on LIBERO across 30 tasks!
0 replies · 0 retweets · 1 like · 101 views
Eric Rosen
Eric Rosen@_ericrosen·
@GiseopK This is beautiful! One comment: it looks like the rainbow gradient is used both for (all) link positions and for conveying the positional history of just the last link. Maybe keep the links rainbow, and have the end-effector history be light-to-dark purple, to make it obvious it’s all from that link?
0 replies · 0 retweets · 2 likes · 105 views
Giseop Kim
Giseop Kim@GiseopK·
Preparing for tomorrow’s Advanced Mobile System class... (I'm sure Claude will make learning robotics more enjoyable 😗 )
1 reply · 0 retweets · 14 likes · 712 views
Eric Rosen
Eric Rosen@_ericrosen·
@cbames Nice! I’ve studied causal calculus for fun (ty @yudapearl), and I love do-calculus. Graphical models are just great 😊 Causal RL is something I’ve been meaning to dig into. I’m very familiar with model-based RL, so it seems related. I’m curious about applications to robotics.
0 replies · 0 retweets · 2 likes · 39 views
Barrett Ames
Barrett Ames@cbames·
I've exited from BotBuilt. The company is continuing on a SaaS trajectory. I wish them all the best. I'm working on the next idea, Rick Rubin style. I'd appreciate factory tours and smart people to argue with about ideas.
Barrett Ames@cbames

Big changes coming soon!

13 replies · 1 retweet · 55 likes · 5K views
Eric Rosen
Eric Rosen@_ericrosen·
🤖 Reflected inertia plays an important role for safety (among other things) in robotics. ⚙️ I wrote an introductory post on reflected inertia and how it is determined by gear ratios. ✍️ Check out the full blog post for details! #ContinuallyLearningBlog
1 reply · 1 retweet · 43 likes · 2.2K views
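For readers skimming past the blog link: the standard relation is that a motor rotor’s inertia appears at the gearbox output scaled by the gear ratio squared, J_reflected = N² · J_motor. A tiny sketch (illustrative numbers of my choosing, not figures from the blog post):

```python
def reflected_inertia(motor_inertia_kgm2: float, gear_ratio: float) -> float:
    """Rotor inertia as seen at the joint output: J_reflected = N^2 * J_motor."""
    return gear_ratio ** 2 * motor_inertia_kgm2

# A small rotor (1e-5 kg·m^2) behind a 100:1 gearbox looks 10,000x more
# inertial at the output. This quadratic scaling is why high gear ratios
# hurt backdrivability and matter for impact safety.
print(reflected_inertia(1e-5, 100.0))  # 0.1 kg·m^2
```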
Eric Rosen
Eric Rosen@_ericrosen·
Fantastic talk from @KyleMorgenstein on deploying RL and sim2real! Definitely recommend if you’re interested in how a strong background in controls helps guide choices in RL!
David Bar@observie

Lots of great stuff in here by @KyleMorgenstein
> Low Kp = feedforward torque, not position tracking. Enables full exploration
> Kp = max_torque / joint_RoM, D ≈ Kp/20
> High Kp pushes policy to torque limits, kills exploration
> Train 2-5x past apparent convergence for smooth deployable policies
> noise_std super important, must decrease and stabilize
> Start with perfect sim, add randomization one factor at a time
thehumanoid.ai/deployment-rea…

0 replies · 2 retweets · 25 likes · 3.7K views
Eric Rosen
Eric Rosen@_ericrosen·
@KyleMorgenstein @maedmatt @observie Why not also log the individual reward components separately, and see when the regularization rewards converge as a heuristic for a stopping point?
1 reply · 0 retweets · 2 likes · 38 views
Kyle🤖🚀🦭
Kyle🤖🚀🦭@KyleMorgenstein·
@maedmatt @observie regularization rewards are usually orders of magnitude smaller than task rewards. you only really smooth out torques and accelerations after the task rewards have converged. the reward is still changing even if it looks mostly flat because the changes are <<1
1 reply · 0 retweets · 2 likes · 87 views
David Bar
David Bar@observie·
Lots of great stuff in here by @KyleMorgenstein
> Low Kp = feedforward torque, not position tracking. Enables full exploration
> Kp = max_torque / joint_RoM, D ≈ Kp/20
> High Kp pushes policy to torque limits, kills exploration
> Train 2-5x past apparent convergence for smooth deployable policies
> noise_std super important, must decrease and stabilize
> Start with perfect sim, add randomization one factor at a time
thehumanoid.ai/deployment-rea…
4 replies · 8 retweets · 95 likes · 8.6K views
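The gain heuristics in the notes above translate directly to code. A minimal sketch, assuming torque in N·m and joint range of motion in radians (the function name and the example numbers are my assumptions, not from the talk):

```python
import math

def pd_gains_from_heuristic(max_torque_nm: float, joint_rom_rad: float):
    """PD gains per the notes above: Kp = max_torque / joint_RoM, Kd ≈ Kp / 20."""
    kp = max_torque_nm / joint_rom_rad
    kd = kp / 20.0
    return kp, kd

# Example: a 30 N·m actuator on a joint with a 150° range of motion.
kp, kd = pd_gains_from_heuristic(30.0, math.radians(150))
print(f"Kp ≈ {kp:.1f} N·m/rad, Kd ≈ {kd:.2f} N·m·s/rad")  # Kp ≈ 11.5, Kd ≈ 0.57
```

The point of sizing Kp this way is that full-range position errors saturate at, but never exceed, the actuator's torque limit, leaving room for exploration.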
Eric Rosen
Eric Rosen@_ericrosen·
Open question for robot learning: Should simulators / world models be used mostly during training, or inference? For example:
- sim2real methods use them mostly during training.
- TAMP / planning methods use them mostly during inference.
What’s the best way to balance both?
4 replies · 0 retweets · 13 likes · 1.9K views
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan @1x_tech @GoogleDeepMind Thanks for the other references! I’d be open to that, although I’d be most convinced if an online RL policy trained in the world model worked zero-shot on hardware (sim2real style). That would demonstrate the world model doesn’t have issues with reward hacking and generalizes well.
1 reply · 0 retweets · 1 like · 40 views
Vai Viswanathan
Vai Viswanathan@vai_viswanathan·
@_ericrosen @1x_tech @GoogleDeepMind DreamDojo from Nvidia and WorldGym both show high correlation with reality! If there were a world model that showed high success-rate correlation with reality, would you use it?
1 reply · 0 retweets · 1 like · 54 views
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan No, we have a sim setup for our hardware / software stack. I know groups are pursuing world models for robot eval (@1x_tech, @GoogleDeepMind ), but I think we’re still in the very early days of it. But as the saying goes, today’s models are the worst they’ll ever be!
1 reply · 0 retweets · 0 likes · 42 views
Eric Rosen
Eric Rosen@_ericrosen·
Cool zero-shot reward model approach: ask the VLM if the task is complete, and use the log prob of the “True” token to compute a reward! They test it using advantage-weighted regression, although I’m curious whether reward hacking happens in the online learning setting (we see that a lot).
Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

0 replies · 1 retweet · 18 likes · 2.5K views
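The token-probability reward is easy to picture with a sketch. Hedged heavily: `vlm.next_token_logits` and the token-id arguments below are hypothetical placeholders, not the TOPReward codebase's actual API. The idea is to prompt the video VLM with a completion question and renormalize the logits of the "True"/"False" answer tokens into a scalar reward.

```python
import numpy as np

def completion_reward(vlm, frames, true_id, false_id,
                      prompt="Is the task complete? Answer True or False."):
    """Zero-shot reward: probability mass on the "True" token, renormalized
    over the {True, False} answer pair (a softmax over two logits)."""
    logits = vlm.next_token_logits(frames, prompt)  # assumed: (vocab_size,) array
    pair = np.array([logits[true_id], logits[false_id]])
    pair -= pair.max()                       # subtract max for numerical stability
    probs = np.exp(pair) / np.exp(pair).sum()
    return float(probs[0])                   # P("True" | frames, prompt)
```

No in-context learning, no fine-tuning, no reward training, matching the framing in the quoted thread.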
Eric Rosen
Eric Rosen@_ericrosen·
@vai_viswanathan Definitely! We use sims to evaluate our behavior cloning policies against task metrics, which really helps with hyperparameter tuning for hardware policies 😊
1 reply · 0 retweets · 1 like · 42 views
Eric Rosen
Eric Rosen@_ericrosen·
@JieWang_ZJUI Yeah, I love the spectrum of sims for eval:
- larger, diverse sim benchmarks that aren’t as realistic / relevant, but provide broad signal on how general a method is.
- smaller, targeted sims that are realistic and relevant to your task/robot.
Useful overall, especially for sim2real.
0 replies · 0 retweets · 1 like · 117 views