Oliver
@XOS9000

125 posts

interested in everything AI & Robotics

Joined January 2022
253 Following · 26 Followers
Oliver@XOS9000·
@zhengyiluo Any data on this release? Egoscale's data was planned to be released, no?
William Shen@WillShenSaysHi·
𝗧𝗶𝗣𝗧𝗼𝗣 𝗶𝘀 #𝟭 𝗼𝗻 𝗠𝗼𝗹𝗺𝗼𝗦𝗽𝗮𝗰𝗲𝘀! Outperforming VLAs including MolmoAct2 and π₀.₅, and WAMs like DreamZero. It's the only method that uses inference-time search and 𝙯𝙚𝙧𝙤 robot data. We didn't do any benchmark-specific tuning.
[image]
Oliver@XOS9000·
Anybody know what this j2-vla model is that seems to dominate the RoboArena leaderboards? Still only 20 evals, but very strong early results!
[image]
Oliver@XOS9000·
@JieWang_ZJUI Question whether they use any infrared depth sensors, or whether it's just using proprioception. If not, very impressive!
Oliver@XOS9000·
@benjamin_bolte So you're bullish on companies actually manufacturing robots, and then companies integrating them into factories?
Benjamin Bolte@benjamin_bolte·
I mean, I know this is just some astroturfing thing and I should just ignore it. But seriously, don't fall for it anon, you're gonna get rugged.
The day Alibaba or Minimax or whoever open-sources their video action model, Pi will fade into obscurity and everyone will collectively remember that startups are supposed to try and make money. I would be shocked if Alibaba doesn't already have a Tesla / xAI-level real-time video model release planned for the next 12 months.
I have multiple friends at Pi, they're super smart and hard-working. And they've done great with their secondaries. I still cannot fathom how someone can look at this situation and not see the glaring sectoral risk. You're taking business and machine learning advice from the geniuses behind Everyday Robots.
Advice for anyone trying to invest in robotics: just buy Unitree on Hiive, it's still a huge discount and it's an objectively great business.
Write-up on Unitree's IPO filing: therobotreport.com/unitree-ipo-sh…
Key details:
- 60% (!) gross margins
- 300% YoY growth, $250m in revenue
- Humanoids at > 50% of core revenue
Y Combinator@ycombinator

Physical Intelligence (@physical_int) is building a foundation model that can control any robot to do any task — what the team describes as the GPT moment for robotics. The company's cross-embodiment approach trains across many different robot platforms, and recent results show tasks being performed zero-shot that last year required hundreds of hours of data collection.
In this episode of the @LightconePod, co-founder Quan Vuong (@QuanVng) sat down with @garrytan, @snowmaker, @sdianahu, and @harjtaggar to talk about why robotics is finally ready for its scaling moment, how PI runs its models in the cloud rather than on-device, and the playbook for what Quan sees as a Cambrian explosion of vertical robotics companies.
00:00 — Robotics just got cheaper
00:41 — The GPT moment for robotics
02:24 — Why robots didn't work before
05:30 — The breakthrough that changed everything
09:12 — The data problem
13:33 — Robots learning without data
15:05 — Robots folding laundry (for real)
22:18 — From engineering problem → ops problem
29:12 — The startup playbook
38:46 — Thousands of robotics startups are coming

Oliver@XOS9000·
@DominiqueCAPaul Could you explain point 4 a bit better? Are these gestures for labeling episodes?
Dominique Paul@DominiqueCAPaul·
Thoughts from a teleop session today:
1/ Teleop is painful. UMI-style grippers are making more and more sense to me: shorter per-episode execution, more data, and more intuitive for the factory workers who will eventually be the ones using my product. Wondering if PI resists this because researchers aren't the ones collecting the data?
2/ The take-away of the HF shirt-folding post stuck with me: data quality matters most of all. I'm 30 minutes into this task and still making mistakes with teleop. What's the perspective for non-roboticists? Maybe a VR headset is better. Want to try that next.
3/ I'm noticeably better at teleop even when I'm just 15cm closer.
4/ Double-close gestures for re-record (left) and early episode end (right) are a game changer. Credit @neurosp1ke.
5/ Want to gamify my own collection more: thinking of a daily target dashboard.
6/ I'd like to rate each episode with a 1-5 data quality score. Don't want to throw bad data away, but still be able to filter for top quality. Maybe possible with foot pedals? (A minimal sketch of this follows below.)
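
On point 6, here is a minimal sketch of how that could work on the data side (a hypothetical schema, not any existing teleop stack): keep the score as per-episode metadata and apply the quality filter only when assembling a training set, so nothing is ever deleted.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    path: str                # where the raw teleop recording lives
    quality: int = 3         # 1-5 score, e.g. set via foot pedal at episode end
    tags: list = field(default_factory=list)

def build_training_set(episodes, min_quality=4):
    """Keep everything on disk; filter only when assembling a training set."""
    return [ep for ep in episodes if ep.quality >= min_quality]

# Example: train on top-quality episodes only; the 1-2s stay around for
# later use (failure-recovery data, quality-conditioned training, etc.).
dataset = build_training_set([
    Episode("ep_0001", quality=5),
    Episode("ep_0002", quality=2, tags=["gripper slip"]),
])
```
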
Oliver@XOS9000·
@zhaohang0124 @SeonghyeonYe But DreamZero didn't do any 'imagination' at inference time; they just denoise noisy actions (they kept the frame prediction since it didn't affect runtime much, not because it was crucial for performance). Or am I missing something?
Hang Zhao@zhaohang0124·
Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
[image]
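
To make the distinction concrete, here is a minimal sketch of the training-time-supervision idea (hypothetical module and function names, not Fast-WAM's actual code): the future-video head contributes only an auxiliary loss during training and is skipped entirely at inference, so the deployed policy path stays fast.

```python
import torch
import torch.nn as nn

class ToyWAM(nn.Module):
    """Toy world-action model: one shared backbone feeds an action head
    (the policy) and a future-video head (training-time supervision only)."""

    def __init__(self, obs_dim=512, act_dim=7, horizon=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 1024), nn.GELU())
        self.action_head = nn.Linear(1024, act_dim * horizon)  # policy path
        self.video_head = nn.Linear(1024, obs_dim)             # auxiliary path
        self.horizon, self.act_dim = horizon, act_dim

    def forward(self, obs_feat, predict_video=True):
        h = self.backbone(obs_feat)
        actions = self.action_head(h).view(-1, self.horizon, self.act_dim)
        video = self.video_head(h) if predict_video else None
        return actions, video

def training_loss(model, obs_feat, expert_actions, future_feat, aux_weight=0.5):
    # During training both heads are supervised: the future-video loss is
    # where the "world model" signal enters.
    actions, video_pred = model(obs_feat, predict_video=True)
    action_loss = nn.functional.mse_loss(actions, expert_actions)
    video_loss = nn.functional.mse_loss(video_pred, future_feat)
    return action_loss + aux_weight * video_loss

@torch.no_grad()
def act(model, obs_feat):
    # At deployment only the policy path runs: no future "imagination".
    actions, _ = model(obs_feat, predict_video=False)
    return actions
```
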
Oliver@XOS9000·
How does Claude Code stack up against all of Norway in developing AI solutions? 🇳🇴 Over 3,000 participants competed in the Norwegian AI championship for 100k USD, and my Claude bot, running fully autonomously on a single prompt, placed top 50 in two out of three problems!
[image]
Shuo Yang@ShuoYangAIR·
@AndreTI It's not a bug. It's because the VAM is very slow to infer, so the robot receives discontinuous trajectories despite the smoothing mechanism. We need to improve inference time by compressing the model.
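
For context on the "smoothing mechanism": chunked policies typically blend overlapping action chunks, e.g. ACT-style temporal ensembling (an assumption about their setup, not a confirmed detail). When inference lags, fresh chunks stop arriving in time, the overlap runs out, and the executed trajectory jumps at chunk boundaries. A minimal sketch:

```python
import numpy as np

def temporal_ensemble(chunk_history, t, m=0.1):
    """ACT-style blending of all action chunks that cover timestep t.

    chunk_history: list of (t_start, chunk) pairs, where each chunk is an
    (H, act_dim) array predicted from the observation taken at t_start.
    """
    actions, weights = [], []
    for t_start, chunk in chunk_history:
        offset = t - t_start
        if 0 <= offset < len(chunk):              # this chunk covers timestep t
            actions.append(chunk[offset])
            weights.append(np.exp(-m * offset))   # older predictions weigh less
    if not actions:        # inference lagged so far that no chunk covers t:
        return None        # this is where trajectories turn discontinuous
    w = np.array(weights) / np.sum(weights)
    return (np.asarray(actions) * w[:, None]).sum(axis=0)
```
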
Shuo Yang@ShuoYangAIR·
We're excited to share DiT4DiT, an end-to-end Video-Action Model for robot learning that unifies a video Diffusion Transformer and an action Diffusion Transformer in a single cascaded framework.
By leveraging the rich spatiotemporal and physical dynamics learned through video generation, rather than static image-text priors, DiT4DiT achieves state-of-the-art results on LIBERO (98.6%) and RoboCasa GR1 (50.8%) with far less training data, delivering over 10× better sample efficiency and up to 7× faster convergence. Real-world deployment on a humanoid robot further shows robust generalization. We believe this is a step toward making video generation a powerful backbone for robot policy learning.
This work builds upon the brilliant foundations laid by Nvidia's GR00T and Cosmos.
Project: dit4dit.github.io
Paper: arxiv.org/abs/2603.10448
Code: Coming soon. In the meantime, you can ask your coding agent to reproduce the method based on GR00T/Cosmos.
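
Reading between the lines of the announcement, the cascade presumably looks something like the sketch below: a video DiT predicts future-frame latents, and an action DiT denoises an action chunk conditioned on those latents. This is a toy reconstruction with made-up module names and a single denoising step, not the paper's actual code.

```python
import torch
import torch.nn as nn

class CascadedVideoActionModel(nn.Module):
    """Toy cascade: a video diffusion stage produces future-frame latents,
    and an action diffusion stage denoises actions conditioned on them."""

    def __init__(self, latent_dim=256, act_dim=7):
        super().__init__()
        # Stand-ins for the two Diffusion Transformers.
        self.video_dit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True), 2)
        self.action_dit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True), 2)
        self.act_in = nn.Linear(act_dim, latent_dim)
        self.act_out = nn.Linear(latent_dim, act_dim)

    def forward(self, obs_latents, noisy_actions):
        # Stage 1: predict future video latents from the observed ones.
        future_latents = self.video_dit(obs_latents)      # (B, T, latent_dim)
        # Stage 2: one denoising step on the action chunk, conditioned on
        # the predicted dynamics (here via simple sequence concatenation).
        act_tokens = self.act_in(noisy_actions)           # (B, H, latent_dim)
        ctx = torch.cat([future_latents, act_tokens], dim=1)
        denoised = self.action_dit(ctx)[:, -act_tokens.shape[1]:]
        return self.act_out(denoised)                     # (B, H, act_dim)
```
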
Vikash Kumar@Vikashplus·
❌ NOT TRUE @ChongZitaZhang
A finger ≠ a leg. In legged locomotion, low gear ratios help with impact tolerance, kinetic energy storage, and back-drivability under heavy load. Unlike legs, hand constraints & requirements are different:
- high torque density
- high positional controllability
- brutal space constraints
The real trade-offs for hands are:
- Low gear ratio → back-drivability, responsiveness
- High gear ratio → torque density, stability, compactness
🟠 So is a high ratio bad? **It depends** -- high gear ratios improve static precision & torque density, but reduce dynamic responsiveness & back-drivability.
In fact, biological hands are not low-impedance torque sources either. Like the duality of photons, human hands sometimes act as precise position sources, while at other times they act as force manipulators. Perhaps we need a "Heisenberg Uncertainty Principle", but for hands.
C. Zhang@ChongZzZhang

Before reading this, I didn't know the landscape of dex hands was so bad. In modern legged locomotion, a gear ratio of 100 would already mean [not usable] -- forceful actuation is not repeatable. I can't believe that in manipulation, where precision matters more, they do this.
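
As a back-of-the-envelope check on why both sides have a point, the standard first-order gearbox relations (textbook physics, not from either tweet) for gear ratio N and efficiency η are:

```latex
\tau_{\text{out}} = \eta \, N \, \tau_{\text{motor}}
  % output torque: linear in the gear ratio N
\qquad
J_{\text{reflected}} = N^{2} \, J_{\text{motor}}
  % inertia seen at the output: quadratic in N
```

Torque output scales linearly with N, but reflected inertia (and reflected friction) scales with N², which is what kills back-drivability and impact tolerance at leg-scale ratios; that is consistent with both the gear-ratio-of-100 complaint and the hand-specific defense above.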

Oliver@XOS9000·
@trq212 @bcherny Ok, so every time it asks for permission for some new command, the 'don't ask again for:' option is wayyy too specific; there should be an option to just grant permission to all similar commands… This is super annoying.
Thariq@trq212·
a few Friday afternoon ships to end the week: the AskUserQuestion tool can now show markdown snippets to display diagrams, code examples, etc.
Oliver@XOS9000·
@chris_j_paxton @kvablack @notmahi Yes, very; that's why I think taste is still the best gradient for robot learning research. But I'm happy that Nvidia is willing to spend such a large amount of talent and compute and open-source it all; proper evals and ablations are extremely expensive and time-consuming!
Chris Paxton@chris_j_paxton·
@XOS9000 @kvablack @notmahi Yeah, this is the core problem with all of it, right? That it's hard to put too much weight on any individual architecture decision.
Mahi Shafiullah 🏠🤖@notmahi·
MolmoSpaces leaderboard is now open for submissions! When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn't expect things to heat up so quickly. But they did, thanks to @jang_yoel and the team at GEAR toppling PI to take the crown in the task-general category. Congrats 🎉 You can evaluate and submit your model to this leaderboard: molmospaces.allen.ai/leaderboard
Joel Jang@jang_yoel

𝐃𝐫𝐞𝐚𝐦𝐙𝐞𝐫𝐨 𝐢𝐬 #𝟏 𝐨𝐧 𝐛𝐨𝐭𝐡 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 𝐚𝐧𝐝 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 🏆
𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘁𝗵𝗶𝘀 𝗻𝗼𝘁𝗮𝗯𝗹𝗲: DreamZero-DROID is trained 𝑓𝑟𝑜𝑚 𝑠𝑐𝑟𝑎𝑡𝑐ℎ using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs).
More broadly, training 𝑜𝑛𝑙𝑦 on real data and evaluating on (1) transparent, distributed benchmarks like 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 or (2) scalable sim benchmarks like 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 is an exciting step toward fairer and more reproducible evaluation of generalist policies, one that the community can hill-climb together to measure progress.
Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the set-up and getting these evaluations! Special shout-out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :)
+ We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗 (See the YAM experiments in our paper for more detail.)
++ We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint.
🌐 dreamzero0.github.io
💻 github.com/dreamzero0/dre…
RoboArena: robo-arena.github.io/leaderboard
MolmoSpaces: molmospaces.allen.ai/leaderboard

Asimov@asimovinc·
@TimLukeAnderson Would you be interested in a DIY kit to build your own humanoid robot? Like buying a wardrobe from IKEA, but a humanoid robot from us.
Asimov@asimovinc·
This is Asimov v1. We're planning to open-source the complete body design, simulation files, and a full list of actuators. Asimov v1 includes everything you need to build, modify, and train your own humanoid.
Chris Paxton@chris_j_paxton

This is looking amazing

Oliver@XOS9000·
@chris_j_paxton @kvablack @notmahi The 5B DreamZero is not that much larger than the Pi VLAs and seems to have much greater generalization with less robot data. Though these ablations were done with only 50k steps and a batch size of 32, so I wouldn't put too much weight on them.
Chris Paxton@chris_j_paxton·
Yeah, I guess that's speculation too. A couple of things made me say that:
- the Wan architecture is based around a 3D VAE; would that even make sense without the vision auxiliary loss?
- the core Wan architecture, other than the VAE, just doesn't seem interesting to me; am I missing something?
- from my own quite out-of-date research, I saw this same effect, where in low-data / overparameterized regimes you got better performance with vision generation
- they have a huge number of parameters here, and their own VLA experiments do show a very negative result (although, as you point out, it's not a very clear result)
[two images]
Oliver@XOS9000·
@alpercanbe Hmm, not seen this before; what paper is this from?
Alper Canberk@alpercanbe·
most robot learning scaling research consists of exploring approaches with varying D_F and \alpha
[image]
Oliver@XOS9000·
@theo PaperVM with X11 has way better support, no?
Sp4rq@Sp4rqDev·
@XOS9000 What model of robot is it? Open source?
Oliver@XOS9000·
Balance policy working
Alex Lieberman@businessbarista·
I want to start a community dedicated to Claude Code. It’s become the gateway drug to coding and experiencing the power of AI for tons of people. This will be a space for people to share killer use cases, agentic workflows, proven prompts, and connect with other CC obsessives. Comment “Claude” if you want to join.
Oliver@XOS9000·
@jeffreyhuber Etched, I guess? But they're locked into the transformer architecture, though…
Jeff Huber@jeffreyhuber·
Groq is cool, but I've seen a sneak peek of inference 20x faster than Groq coming soon
Oliver@XOS9000·
@ChongZitaZhang I think you're right. It must be possible to add a low-latency reactive NN component: just a quick image encoder that adds a corrective bias to the actions.
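
A minimal sketch of what that could look like (entirely hypothetical, just to make the idea concrete): a tiny encoder runs at control rate and adds a small, bounded correction on top of whatever action the slow policy last produced.

```python
import torch
import torch.nn as nn

class FastCorrector(nn.Module):
    """Small reactive net: cheap image features -> bounded action correction.
    Runs every control step, while the big VLA only refreshes its action
    chunk every K steps."""

    def __init__(self, feat_dim=128, act_dim=7, max_correction=0.05):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for a tiny CNN
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, act_dim)
        self.max_correction = max_correction     # keep the bias small and safe

    def forward(self, image, base_action):
        # tanh bounds the correction to [-1, 1] before scaling.
        delta = torch.tanh(self.head(self.encoder(image)))
        return base_action + self.max_correction * delta
```
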
C. Zhang@ChongZzZhang·
Am I wrong that these async inference methods don't really bring low latency, but just utilize the prediction of the known future? Like, if something suddenly happens and needs a reaction in 0.05s, they won't work.
Danfei Xu@danfei_xu

2/ Problem #1: Inference latency VLAs are big. Inference often takes longer than a control step. Naive synchronous execution → the robot literally stalls. Most modern solutions start with asynchronous inference.
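
For readers new to the term, here is a bare-bones sketch of the asynchronous-inference pattern being discussed (a generic illustration, not anyone's actual stack): the slow policy produces action chunks in a background thread while the fixed-rate control loop keeps executing the most recent chunk. It also shows why the objection above holds: between chunk arrivals, the robot is acting on predictions of the known future, not reacting.

```python
import threading
import queue
import time

def inference_worker(policy, obs_box, chunk_queue):
    """Background thread: runs the slow policy on the freshest observation."""
    while True:
        obs = obs_box.get()            # blocks until a new observation arrives
        chunk = policy(obs)            # slow: may take many control periods
        chunk_queue.put(chunk)         # hand the new action chunk to control

def control_loop(robot, obs_box, chunk_queue, hz=50):
    """Fixed-rate loop: never waits on inference, just drains the latest chunk."""
    chunk, i = None, 0
    while True:
        try:
            obs_box.put_nowait(robot.get_obs())     # offer a fresh observation
        except queue.Full:
            pass                                    # worker is busy; skip
        try:
            chunk, i = chunk_queue.get_nowait(), 0  # fresher chunk? swap it in
        except queue.Empty:
            pass
        if chunk is not None and i < len(chunk):
            robot.apply(chunk[i])      # keep executing the stale chunk:
            i += 1                     # the "prediction of the known future"
        time.sleep(1.0 / hz)

# Wiring (assuming `policy` and `robot` objects exist):
#   obs_box, chunks = queue.Queue(maxsize=1), queue.Queue()
#   threading.Thread(target=inference_worker, args=(policy, obs_box, chunks),
#                    daemon=True).start()
#   control_loop(robot, obs_box, chunks)
```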
