Philip Schroeder

30 posts

@_pschro

PhD student at MIT. @MIT_CSAIL @MITEECS @nlp_mit

Joined November 2025
316 Following · 244 Followers
Pinned Tweet
Philip Schroeder @_pschro
Excited to share our NeurIPS 2025 paper introducing our video reasoning framework, ROVER (Reasoning Over VidEo Recursively), which improves the visual understanding of VLMs in embodied settings.

ROVER is a recursive framework that lets the model maintain a compact attention window at each timestep of the video without losing global context across the full video. It works by decomposing the video into segments corresponding to each subtask within the full task trajectory, then generating a separate line of reasoning for each subtask instead of attempting to reason across the full trajectory at once.

We evaluate on simulated and real-world robotic manipulation tasks from RoboCasa and Open X-Embodiment. Overall, ROVER significantly improves the ability of VLMs to reason about what is happening at each moment during a robot task attempt. rover-vlm.github.io
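To make the recursion concrete, here is a minimal sketch of the decomposition loop described above, assuming a generic VLM client; `vlm_call`, the prompts, and the (subtask, start, end) segmentation format are hypothetical stand-ins, not ROVER's actual interface.

```python
# Minimal sketch of ROVER-style recursive video reasoning.
# `vlm_call` and the prompt/segmentation formats are assumptions.

def vlm_call(prompt, frames):
    """Placeholder for a vision-language model query; swap in a real client."""
    raise NotImplementedError

def segment_by_subtask(frames, task):
    """Ask the VLM for (subtask, start, end) triples covering the trajectory."""
    boundaries = vlm_call(
        prompt=f"List (subtask, start_frame, end_frame) for the task: {task}",
        frames=frames,
    )
    return [(name, frames[start:end]) for name, start, end in boundaries]

def reason_over_video(frames, task, context=""):
    """One compact attention window per subtask, recursively."""
    segments = segment_by_subtask(frames, task)
    if len(segments) <= 1:
        # Base case: the segment is atomic; reason over its frames directly.
        return vlm_call(prompt=f"{context}\nDescribe progress on: {task}",
                        frames=frames)
    summaries = []
    for subtask, seg_frames in segments:
        # Each subtask gets its own line of reasoning; earlier subtask
        # summaries stand in for global context across the full video.
        summaries.append(subtask + ": " + reason_over_video(
            seg_frames, subtask, context="\n".join(summaries)))
    return "\n".join(summaries)
```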
Philip Schroeder retweeted
Dominique Paul @DominiqueCAPaul
The @huggingface team just published an incredible post on fine-tuning π0 / π0.5 for shirt folding. Key finding: algorithmic tweaks gave +5–20%, while training only on the top 20% of data gave +50%. They document 1,900 engineering hours, created intuitive method visualisations, and, best of all, included a section on what didn't work (you won't find that in an academic paper). Recommendations (the data-filtering idea is sketched below):
→ Data quality > quantity
→ DAgger-style collection
→ Relative joint positions
→ Action interpolation + RTC
→ RABC during training
Highly recommend reading the full post. @LeRobotHF
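As a rough illustration of the "data quality > quantity" point, here is one plausible way to keep only the top 20% of demonstrations; the episode format and scoring rule are assumptions for illustration, not what the post actually used.

```python
# Hypothetical sketch of "train on the top 20% of data": score each episode
# and keep only the best-scoring fifth before fine-tuning.
import numpy as np

def filter_top_fraction(episodes, score_fn, keep_frac=0.2):
    """Keep the highest-scoring fraction of demonstration episodes."""
    scores = np.array([score_fn(ep) for ep in episodes])
    cutoff = np.quantile(scores, 1.0 - keep_frac)
    return [ep for ep, s in zip(episodes, scores) if s >= cutoff]

# Example scorer (an assumption): shorter successful episodes are often
# cleaner teleoperation data than long, hesitant ones.
def score_fn(ep):
    return -len(ep["actions"]) if ep["success"] else -np.inf
```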
Philip Schroeder retweeted
Jack Vial @jackvial89
pi0.5 with RTC. I trained the expert + the last 3 layers of the backbone on an RTX 4070 Ti Super. Going to use this as the base model for RECAP. Next up is collecting a first round of rollouts so I have data to train advantage conditioning.
Philip Schroeder retweeted
pfung @philfung
Last week, @positronic_ro released a cool robot study gauging how well the latest open models perform on real-life manufacturing tasks. They compared a human, a teleoperated arm, and the models pi0.5, GR00T-1.6, ACT, and HF-SmolVLA on simple pick-and-place bin tasks (batteries, towels, spoons, etc.). The models were fine-tuned using BC on ~350 demos. KEY CONCLUSIONS:
1. Not usable in real life: Open models were essentially unusable in real-life settings, even for simple pick/place tasks. The mean time to failure/assist was only 4 minutes, and the models worked at <5% the speed of a person.
2. Teleop works but is slow: Teleoperation succeeded as often as a human (i.e., 100%), but at 25% the speed.
3. Pi ≈ GR00T: pi0.5 and GR00T-1.6 performed similarly, slightly outperforming vanilla ACT.
Visit the website to see every success and failure; it's really insightful! phail.ai Thanks, folks at @positronic_ro, for releasing such a great study for all our benefit.
Philip Schroeder retweeted
Patrick Yin @patrickhyin
We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation. OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations. Try the policies yourself in our interactive simulator! weirdlabuw.github.io/omnireset/ (1/N 🧵)
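For context, here is what a "diverse resets" training loop might look like in its simplest form; the reset fields and the `env.reset(state=...)` interface are assumptions for illustration, not the OmniReset API.

```python
# Illustration of "diverse resets": each RL episode starts from a freshly
# randomized initial state, broadening the states the policy must recover from.
import numpy as np

def sample_reset(rng):
    # Hypothetical reset distribution for a dexterous manipulation scene.
    return {
        "object_pos": rng.uniform(-0.3, 0.3, size=3),
        "object_yaw": rng.uniform(-np.pi, np.pi),
        "hand_joints": rng.uniform(-0.5, 0.5, size=16),
    }

def collect_episode(env, policy, rng):
    obs = env.reset(state=sample_reset(rng))  # new diverse reset every episode
    done, transitions = False, []
    while not done:
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions
```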
Philip Schroeder retweeted
Physical Intelligence @physical_int
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
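A minimal sketch of the idea as described: the large model stays frozen and exposes one extra token embedding, and only a tiny actor and critic read it. All names and dimensions here are assumptions; this is not the π-0.6 architecture.

```python
# Sketch (assumed interfaces): the frozen VLA emits one extra "RL token"
# embedding per step; only this tiny actor-critic pair is trained with RL,
# which is why fine-tuning can be fast.
import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    def __init__(self, token_dim=1024, action_dim=7, hidden=256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))
        self.critic = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, rl_token):
        # rl_token: (batch, token_dim) embedding from the frozen base model.
        return self.actor(rl_token), self.critic(rl_token)
```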
Philip Schroeder retweeted
Eric Rosen @_ericrosen
What’s the easiest way to improve your pretrained diffusion policy? Swap your Gaussian with a single noise vector that maximizes your downstream reward function! (A sketch follows the quoted tweet below.)
✅ Keeps original policy weights frozen!
✅ No training new neural networks!
✅ No RL infrastructure needed!
Omkar Patil@op45_indian

🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket✨, leads to consistent performance improvements! 🧵...

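Here is a minimal sketch of what such a search could look like, assuming a frozen diffusion policy with a `denoise(obs, z)` method and an `evaluate` function that scores an acting function by rollout reward; both interfaces are hypothetical, not the paper's code.

```python
# Sketch of a "Golden Ticket" search: keep the diffusion policy frozen, try
# many initial noise vectors, and keep the single one whose rollouts score
# best. No weights are updated anywhere.
import numpy as np

def find_golden_ticket(policy, evaluate, noise_dim, n_candidates=64, seed=0):
    rng = np.random.default_rng(seed)
    candidates = rng.standard_normal((n_candidates, noise_dim))
    # Score each candidate by always denoising from that one fixed vector.
    rewards = [evaluate(lambda obs, z=z: policy.denoise(obs, z))
               for z in candidates]
    return candidates[int(np.argmax(rewards))]  # reuse this vector at test time
```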
Philip Schroeder retweeted
Omkar Patil @op45_indian
🚨New paper alert 🚨 from @rai_inst! arxiv.org/abs/2603.15757 🤖Your robot policy is actually better than you think! We find that for a given policy, ALWAYS denoising a single noise vector, which we call a ✨Golden Ticket✨, leads to consistent performance improvements! 🧵...
Philip Schroeder retweeted
Lakshita Dodeja @lakshitadodeja
Residual RL is a powerful strategy for adapting a pretrained base policy, but it struggles when:
❌ Exploration is uncontrolled
❌ Base policies are stochastic
We tackle this in our new RAL paper “Accelerating Residual Reinforcement Learning with Uncertainty Estimation” (1/5)
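For readers new to the setup, this is the generic residual-RL action rule; the uncertainty gate shown is an assumption for illustration only, not the mechanism proposed in the paper.

```python
# Generic residual RL: a learned residual corrects a frozen base policy's
# action. The uncertainty gate below is an illustrative assumption.
import numpy as np

def residual_action(base_policy, residual_policy, uncertainty, obs):
    a_base = base_policy(obs)
    # Plausible use of uncertainty: permit larger corrections (and hence
    # more exploration) only where the base policy is uncertain.
    gate = float(np.clip(uncertainty(obs), 0.0, 1.0))
    return np.clip(a_base + gate * residual_policy(obs, a_base), -1.0, 1.0)
```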
Philip Schroeder retweeted
0x796F @0x796F
You can now train @physical_int style robots in 1 day for only $5k. Anvil’s devkits have all the hardware, software, controls, cameras, and more, ready to go. (1/5)
Philip Schroeder retweeted
Jiaheng Hu @JiahengHu1
VLA models are capable generalists. But can they continually self-improve? Such Continual Reinforcement Learning (CRL) problems are traditionally considered very challenging. Surprisingly, we found that with the right setup, the simplest CRL recipe can work really well! arxiv.org/abs/2603.11653
Philip Schroeder retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
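As a toy stand-in for the idea, the snippet below generates language-free token streams from an elementary 1-D cellular automaton; the blog post uses neural cellular automata, so treat this purely as an illustration of "fully synthetic, zero language" pretraining data.

```python
# Toy generator of synthetic, language-free pretraining tokens from an
# elementary 1-D cellular automaton (a stand-in for the post's neural CA).
import numpy as np

def ca_token_stream(rule=110, width=64, steps=32, seed=0):
    """Flat 0/1 token stream produced by evolving a random initial row."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> i) & 1 for i in range(8)])
    state = rng.integers(0, 2, width)
    rows = []
    for _ in range(steps):
        rows.append(state.copy())
        # Encode each cell's (left, center, right) neighborhood as 0..7.
        neighborhoods = (np.roll(state, 1) << 2) | (state << 1) | np.roll(state, -1)
        state = table[neighborhoods]
    return np.concatenate(rows)  # feed this to the transformer as tokens
```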
Philip Schroeder retweeted
Anthony Liang @aliangdw
Just released RBM-1M-OOD! We collected and annotated 1k+ robot trajectories of varying expertise across 4 universities for evaluating reward models. 📂Dataset: huggingface.co/datasets/robom… Also give Robometer a try on your own robot trajectories here: huggingface.co/spaces/robomet…
Anthony Liang@aliangdw

Super excited to share Robometer, a reward model that works zero-shot across robots, tasks, and scenes! Try fine-tuning Robometer on your own dataset! 🌐Project website: robometer.github.io 💻Code: github.com/robometer/robo…

Philip Schroeder retweeted
Nishanth Kumar @nishanthkumar23
State-of-the-art robot policies often need hundreds of hours of data. What if we needed none? Introducing TiPToP: a manipulation system that zero-shots open-world tasks from pixels and language using vision foundation models and GPU-parallelized Task and Motion Planning (TAMP).
Philip Schroeder retweeted
Adam Hung @Adamjhung
Learning from human videos often requires restrictive, carefully choreographed human motions. We propose ✨3PoinTr✨: a scalable way to pretrain from casual human videos. It bridges the embodiment gap by learning 3D scene evolution, enabling learning from natural human motions.
Philip Schroeder retweeted
Yinpei Dai @YinpeiD
Robot memory methods are growing fast, but systematic evaluation is largely lacking. 📉 Introducing RoboMME: a new benchmark for memory-augmented robotic manipulation! 🤖🧠 Featuring 16 tasks across temporal, spatial, object, and procedural memory 🔗 robomme.github.io
Philip Schroeder retweeted
Itamar Pres @PresItamar
New paper: It's time to optimize for 🔁self-consistency 🔁 We’ve pushed LLMs to the limits of available data, yet failures like sycophancy and factual inconsistency persist. We argue these stem from the same assumption: that behavior can be specified one I/O pair at a time. 🧵
Philip Schroeder retweeted
pfung @philfung
added the excellent Robometer algorithm!! Easily compare your fav robot reward functions (Robometer, TOPReward, GVL, etc) in one website: philfung.github.io/rewardscope Thanks @aliangdw, @yigitkkorkmaz, @Jesse_Y_Zhang, @JiahuiZhang__32 for amazing Robometer paper!
pfung@philfung

Inspired by the TopReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare:
1. TopReward (this paper)
2. GVL (DeepMind)
3. Brute Force (i.e., at each frame, ask an LLM to reply with a probability)
TopReward (Qwen3VL-8B) holds its own surprisingly well against the others, even if those use ChatGPT! Great work @DJiafei, UW, AllenAI, thanks for pushing @VilleKuosmanen.

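The "Brute Force" baseline above is simple enough to sketch directly: query a VLM once per frame for a scalar probability. `vlm_prob` is a hypothetical wrapper around whatever vision-language API is available, and the prompt is illustrative.

```python
# Sketch of the "Brute Force" reward baseline: one VLM query per frame, each
# returning a scalar task-completion probability in [0, 1].
def brute_force_reward(frames, task, vlm_prob):
    rewards = []
    for frame in frames:
        answer = vlm_prob(
            image=frame,
            prompt=(f"On a scale of 0 to 1, how complete is the task "
                    f"'{task}' in this image? Reply with only a number."),
        )
        rewards.append(float(answer))
    return rewards
```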
Philip Schroeder retweeted
Shivin Dass @ShivinDass
Tried out Robometer on some of the old demos from TeleMoMa and it's pretty neat! The progress tracks are quite accurate given that these videos are pretty random. Also, their visualizer is a lot of fun to play with: robometer-rewardeval-ui.hf.space
Jesse Zhang@Jesse_Y_Zhang

A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)

Philip Schroeder retweeted
Marius Memmel @memmelma
There’s a discussion going on rn about two recent robotic reward models: TOPReward⛰️ and Robometer🌡️ Which one is better? It depends entirely on your objective! Here is a deep dive into the conceptual differences, strengths, and weaknesses of both. 🧵👇