Marius Memmel

130 posts

@memmelma

Robotics PhD student @UW, previously intern @NVIDIA, intern @Bosch_AI, @EPFL, @TUDarmstadt, @DHBW

Seattle, WA · Joined April 2021
566 Following · 498 Followers
Pinned Tweet
Marius Memmel @memmelma
There’s a discussion going on rn about two recent robotic reward models: TOPReward⛰️ and Robometer🌡️. Which one is better? It depends entirely on your objective! Here is a deep dive into the conceptual differences, strengths, and weaknesses of both. 🧵👇
Marius Memmel retweeted
Abhishek Gupta @abhishekunique7
Excited to share the project that has surprised me the most in the last year! Large-scale RL in simulation, with no demos and no reward engineering, can solve dynamic, dexterous, and contact-rich tasks. The learned behaviors are reactive, forceful, and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate! You can play with the policies yourself to see: weirdlabuw.github.io/omnireset/ And the learned behavior transfers to real-world robots from RGB camera inputs! So what’s the trick? Using simulator resets carefully! Let’s unpack (1/10)
Marius Memmel retweeted
Patrick Yin @patrickhyin
We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation. OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations. Try the policies yourself in our interactive simulator! weirdlabuw.github.io/omnireset/ (1/N 🧵)
Marius Memmel @memmelma
Hi Shirui, thanks for the clarification. The plot above shows that TOPReward fails to detect task success ("put the pear on the red plate") and hallucinates progress for the unrelated language instructions. So even with the raw logits (which we use for the trajectory ranking and confusion matrix evaluations), TOPReward performs poorly at comparing progress across multiple trajectories.

To be useful as a reward model for RL, there must be a clear separation between successful and unsuccessful/unrelated trajectories. Basically, the problem is that during RL exploration, the agent often fails to solve the task or solves an unrelated task (e.g., "put the strawberry on the red plate"). If these trajectories get rewards comparable to a successful one, the Q/V values will be similar for all of these trajectories. As such, the policy will likely extract behaviors that solve the wrong task.
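To make the Q/V-value argument concrete, here is a minimal sketch with made-up numbers (my own illustration; the reward curves are not actual TOPReward or Robometer outputs) of how poorly separated per-step rewards collapse the discounted returns that Q/V functions would fit:

```python
import numpy as np

GAMMA = 0.99
T = 50  # trajectory length

def discounted_return(rewards, gamma=GAMMA):
    # Return of a trajectory: sum_t gamma^t * r_t.
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Three rollouts typical of RL exploration: one solves the task, one fails,
# one solves an unrelated task ("put the strawberry on the red plate").
# Reward model A hallucinates progress, so all three look similar.
model_a = {
    "success":   np.linspace(0.0, 1.0, T),
    "failure":   np.linspace(0.0, 0.8, T),  # hallucinated progress
    "unrelated": np.linspace(0.0, 0.9, T),  # hallucinated progress
}
# Reward model B clearly separates success from everything else.
model_b = {
    "success":   np.linspace(0.0, 1.0, T),
    "failure":   np.full(T, 0.05),
    "unrelated": np.full(T, 0.05),
}

for name, model in [("hallucinating", model_a), ("well-separated", model_b)]:
    returns = {k: round(discounted_return(v), 1) for k, v in model.items()}
    print(name, returns)
# Under the hallucinating model the returns (and hence the fitted Q/V values)
# are nearly identical, so the policy gets little signal to prefer the
# behavior that actually solves the commanded task.
```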
Shirui @ChinSengi
@memmelma Hi Marius, author of TOPReward here. Actually, our method does allow comparison across multiple trajectories with the raw token probability; it is the model's internal belief of the progress of the task. Also, our advantage-aligned BC does not use the normalized progress metric.
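For readers following along, here is one way a cross-trajectory comparison on raw token probabilities could look. This is my illustration with made-up logits, not TOPReward's actual implementation:

```python
import numpy as np

def success_token_prob(success_logit, other_logits):
    # Softmax entry for the success/progress token, i.e. the raw
    # probability the VLM assigns to it at a given frame.
    logits = np.concatenate([[success_logit], other_logits])
    return np.exp(success_logit) / np.exp(logits).sum()

# Hypothetical per-frame success-token logits for two rollouts.
other = np.array([0.0, -1.0])  # logits of competing tokens (made up)
traj_a = [success_token_prob(s, other) for s in np.linspace(-2.0, 3.0, 10)]
traj_b = [success_token_prob(s, other) for s in np.linspace(-2.0, 0.0, 10)]

# No per-trajectory normalization is applied, so the raw probabilities
# remain directly comparable across rollouts.
print("final belief A:", round(traj_a[-1], 3), "| final belief B:", round(traj_b[-1], 3))
```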
Marius Memmel retweeted
Tyler Han @TylerHan19
Animals can’t learn by being tele-operated. But they do learn by observing and interacting with the world around them. So, why don’t robots learn this way? Excited to release “Planning from Observation and Interaction” for real-world observational learning on robots! 🧵(1/12)
Marius Memmel @memmelma
Do off-the-shelf VLMs hallucinate progress? While playing around with TOPReward⛰️, I’ve noticed some patterns that I didn't see with Robometer🌡️ and hypothesized that “off-the-shelf VLMs hallucinate progress”.

To test this hypothesis, I evaluated Robometer🌡️ and TOPReward⛰️ (Qwen3-VL-8B-Instruct) with different language instructions on the same video of the robot “placing a banana on a red plate”. While TOPReward⛰️ predicts increasing progress for the correct language instruction (”put the banana on the red plate”), it predicts oscillating progress for the always-true instruction (”put the pear on the red plate”) and the completely OOD instructions (”fold the pink towel” and “make a left turn with the car”). In contrast, the correct and OOD instructions are clearly separated in the Robometer🌡️ evaluation!

These findings support the above recommendation to use TOPReward⛰️ as a simple baseline for filtering offline trajectories and Robometer🌡️ as a reward model for reinforcement learning 🦾
[two plots: per-frame progress predictions under each instruction, for TOPReward⛰️ and Robometer🌡️]
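The probe itself is simple to reproduce. Here is a minimal sketch of the setup (the predict_progress stub is a hypothetical placeholder, not the actual TOPReward or Robometer interface): score one fixed video under the correct, wrong-object, and OOD instructions, then compare the resulting progress curves.

```python
import numpy as np

def predict_progress(frames: np.ndarray, instruction: str) -> np.ndarray:
    # Per-frame progress in [0, 1]. Placeholder body so the sketch runs;
    # swap in the actual reward model (e.g., a VLM queried frame by frame).
    return np.zeros(len(frames))

instructions = [
    "put the banana on the red plate",  # correct
    "put the pear on the red plate",    # always-true / wrong object
    "fold the pink towel",              # OOD
    "make a left turn with the car",    # OOD
]

video = np.zeros((120, 224, 224, 3), dtype=np.uint8)  # stand-in for real frames

curves = {ins: predict_progress(video, ins) for ins in instructions}

# A useful reward model should separate the correct instruction from the rest,
# e.g. via the gap between final-frame progress estimates.
correct = curves[instructions[0]][-1]
best_other = max(c[-1] for ins, c in curves.items() if ins != instructions[0])
print(f"separation at final frame: {correct - best_other:.3f}")
```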
Marius Memmel retweeted
Jesse Zhang @Jesse_Y_Zhang
A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)
Marius Memmel retweeted
Avinandan Bose @avibose22
🚨 Check out our latest work on inference-time cold-start personalization. Memory is promising, but what if a user queries a task out of domain from their chat history? In-session multi-turn interactions provide a great opportunity to surface user needs via strategic elicitation 🔁
Stella Li @StellaLisy

Personalization assumes you need history with a user. What if you don't? Cold-start is hard: each task & user has many preference dimensions, but each user only cares about a few. A few strategic questions is all you need, if u know how preferences correlate across the population👉🏻🧵

Marius Memmel retweeted
Nicholas Pfaff @NicholasEPfaff
Meet SceneSmith: An agentic system that generates entire simulation-ready environments from a single text prompt. VLM agents collaborate to build scenes with dozens of objects per room, articulated furniture, and full physics properties. We believe environment generation is no longer the bottleneck for scalable robot training and evaluation in simulation. Website: scenesmith.github.io 👇🧵(1/8)
Marius Memmel retweeted
Abhishek Gupta @abhishekunique7
Check out new work from @EntongSu on RL finetuning of pretrained flow policies with residual flow steering. The motivation is simple - steering input diffusion noise can struggle to handle higher-dexterity problems like multi-fingered hands, because the base policy may not cover the requisite behavior. In these cases, you want local adjustments to the behavior with an action-space residual, along with global adjustments to the strategy by modulating input noise for a flow policy. The solution is frustratingly simple, but works really well for high-dexterity manipulation - especially with dexterous multi-fingered hands. Check out @EntongSu's thread for more details, and our website / paper. entongsu.github.io/rfs/ arxiv.org/abs/2602.01789
Entong Su @EntongSu

Pretrained diffusion/flow policies are powerful — but brittle at deployment. We introduce RFS, a data-efficient RL framework that:
• steers latent noise for global adaptation
• applies residual actions for precise local correction
Works in sim and real-world dexterous manipulation 🖐️🤖
👉📄 Paper + videos: entongsu.github.io/rfs/

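A minimal sketch of my reading of this two-level adaptation (the names, dimensions, and the frozen base_flow interface are all assumptions, not the released RFS code): the RL-trained parts steer the noise fed into a frozen pretrained flow policy (global strategy) and add a small action-space residual (local correction).

```python
import torch

class ResidualFlowSteering(torch.nn.Module):
    # Toy two-level adapter around a frozen pretrained flow policy.
    def __init__(self, base_flow, obs_dim, act_dim, latent_dim):
        super().__init__()
        self.base_flow = base_flow  # frozen; maps (obs, noise) -> action
        # Global adaptation: learned shift of the input noise.
        self.noise_head = torch.nn.Linear(obs_dim, latent_dim)
        # Local adaptation: learned action-space residual.
        self.residual_head = torch.nn.Linear(obs_dim, act_dim)

    def forward(self, obs):
        z = torch.randn(obs.shape[0], self.noise_head.out_features)
        z_steered = z + self.noise_head(obs)   # modulate the latent noise
        with torch.no_grad():                  # base policy stays frozen
            base_action = self.base_flow(obs, z_steered)
        return base_action + 0.1 * self.residual_head(obs)  # bounded correction

# Usage with a dummy stand-in for the pretrained flow policy.
dummy_flow = lambda obs, z: torch.tanh(z[:, :8])            # act_dim = 8
policy = ResidualFlowSteering(dummy_flow, obs_dim=32, act_dim=8, latent_dim=16)
actions = policy(torch.randn(4, 32))                        # batch of 4 observations
print(actions.shape)  # torch.Size([4, 8])
```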
Marius Memmel retweeted
Entong Su @EntongSu
Pretrained diffusion/flow policies are powerful — but brittle at deployment. We introduce RFS, a data-efficient RL framework that:
• steers latent noise for global adaptation
• applies residual actions for precise local correction
Works in sim and real-world dexterous manipulation 🖐️🤖
👉📄 Paper + videos: entongsu.github.io/rfs/
Marius Memmel retweeted
Jiafei Duan @DJiafei
Why do generalist robotic models fail when a cup is moved just two inches to the left? It’s not a lack of motor skill, it’s an alignment problem. Today, we introduce VLS: Vision-Language Steering of Pretrained Robot Policies, a training-free framework that guides robot behavior in real time. Check out the project: vision-language-steering.github.io/webpage/ 👇🧵 (Watch till the end: VLS runs uncut, steering pretrained policies across long-horizon tasks.)
Marius Memmel retweeted
Avinandan Bose @avibose22
🚨 New Blog Alert: Adaptive Intelligence (and Why LLMs Lack It) Lately I've been obsessing over a simple question: why do AI assistants feel harder to use even as they get smarter? You know the feeling. You wanted something specific. Your AI didn't ask, it just "thought for 57 seconds" and confidently solved a problem you never had in mind. Now you're 10 "thought for a minute" deep, feeling unheard, trying to steer it back to what you actually meant. That feeling? It's a real capability gap. 🧵👇
Marius Memmel retweeted
Andrew Wagenmaker @ajwagenmaker
How should we pretrain a policy from demonstrations to ensure it is an effective initialization for RL finetuning, while preserving the performance of the pretrained policy itself? We propose Posterior Behavioral Cloning (PostBC)! (1/11)