Oliver
@XOS9000

125 posts

interested in everything AI & Robotics

Joined January 2022
253 Following · 26 Followers
Oliver@XOS9000·
@zhengyiluo Any data on this release? Egoscale's data was planned to be released, no?
William Shen@WillShenSaysHi·
𝗧𝗶𝗣𝗧𝗼𝗣 𝗶𝘀 #𝟭 𝗼𝗻 𝗠𝗼𝗹𝗺𝗼𝗦𝗽𝗮𝗰𝗲𝘀! Outperforming VLAs including MolmoAct2 and π₀.₅, and WAMs like DreamZero. It's the only method that uses inference-time search and 𝙯𝙚𝙧𝙤 robot data. We didn't do any benchmark-specific tuning.
[image]
Oliver@XOS9000·
Anybody know what this j2-vla model is that seems to dominate the RoboArena leaderboards? Still only 20 evals, but very strong early results!
[image]
Oliver@XOS9000·
@JieWang_ZJUI Question whether they use any infrared depth sensors, or whether it's just using proprioception. If not, very impressive!
Oliver@XOS9000·
@benjamin_bolte So you're bullish on companies actually manufacturing robots, and then companies integrating them into factories?
Benjamin Bolte@benjamin_bolte·
I mean, I know this is just some astroturfing thing and I should just ignore it. But seriously, don't fall for it anon, you're gonna get rugged.
The day Alibaba or Minimax or whoever open-sources their video action model, Pi will fade into obscurity and everyone will collectively remember that startups are supposed to try and make money. I would be shocked if Alibaba doesn't already have a Tesla / xAI-level real-time video model release planned for the next 12 months.
I have multiple friends at Pi, they're super smart and hard-working. And they've done great with their secondaries. I still cannot fathom how someone can look at this situation and not see the glaring sectoral risk. You're taking business and machine learning advice from the geniuses behind Everyday Robots.
Advice for anyone trying to invest in robotics: just buy Unitree on Hiive, it's still a huge discount and it's an objectively great business.
Write-up on Unitree's IPO filing: therobotreport.com/unitree-ipo-sh…
Key details:
- 60% (!) gross margins
- 300% YoY growth, $250m in revenue
- Humanoids at > 50% of core revenue
Y Combinator@ycombinator

Physical Intelligence (@physical_int) is building a foundation model that can control any robot to do any task — what the team describes as the GPT moment for robotics. The company's cross-embodiment approach trains across many different robot platforms, and recent results show tasks being performed zero-shot that last year required hundreds of hours of data collection.
In this episode of the @LightconePod, co-founder Quan Vuong (@QuanVng) sat down with @garrytan, @snowmaker, @sdianahu, and @harjtaggar to talk about why robotics is finally ready for its scaling moment, how PI runs its models in the cloud rather than on-device, and the playbook for what Quan sees as a Cambrian explosion of vertical robotics companies.
00:00 — Robotics just got cheaper
00:41 — The GPT moment for robotics
02:24 — Why robots didn't work before
05:30 — The breakthrough that changed everything
09:12 — The data problem
13:33 — Robots learning without data
15:05 — Robots folding laundry (for real)
22:18 — From engineering problem → ops problem
29:12 — The startup playbook
38:46 — Thousands of robotics startups are coming

Oliver@XOS9000·
@DominiqueCAPaul Could you explain point 4 a bit better? Are these gestures for labeling episodes?
Dominique Paul@DominiqueCAPaul·
Thoughts from a teleop session today:
1/ Teleop is painful. UMI-style grippers are making more and more sense to me: shorter per-episode execution, more data, and more intuitive for the factory workers who will eventually be the ones using my product. Wondering if PI resists this because researchers aren't the ones collecting the data?
2/ The take-away of the HF shirt-folding post stuck with me: data quality matters most of all. I'm 30 minutes into this task and still making mistakes with teleop. What's the perspective for non-roboticists? Maybe a VR headset is better. Want to try that next.
3/ I'm noticeably better at teleop even when I'm just 15cm closer.
4/ Double-close gestures for re-record (left) and early episode end (right) are a game changer. Credit @neurosp1ke.
5/ Want to gamify my own collection more: thinking of a daily target dashboard.
6/ I'd like to rate each episode with a 1-5 data quality score. Don't want to throw bad data away, but still be able to filter for top quality. Maybe possible with foot pedals? (A minimal sketch of this follows below.)
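
On point 6, here is a minimal sketch of how that could work on the data side (a hypothetical schema, not any existing teleop stack): keep the score as per-episode metadata and apply the quality filter only when assembling a training set, so nothing is ever deleted.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    path: str                # where the raw teleop recording lives
    quality: int = 3         # 1-5 score, e.g. set via foot pedal at episode end
    tags: list = field(default_factory=list)

def build_training_set(episodes, min_quality=4):
    """Keep everything on disk; filter only when assembling a training set."""
    return [ep for ep in episodes if ep.quality >= min_quality]

# Example: train on top-quality episodes only; the 1-2s stay around for
# later use (failure-recovery data, quality-conditioned training, etc.).
dataset = build_training_set([
    Episode("ep_0001", quality=5),
    Episode("ep_0002", quality=2, tags=["gripper slip"]),
])
```
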
Oliver@XOS9000·
@zhaohang0124 @SeonghyeonYe But DreamZero didn't do any 'imagination' at inference time; they just denoise noisy actions (they kept the frame prediction since it didn't affect runtime much, not because it was crucial for performance). Or am I missing something?
Hang Zhao@zhaohang0124·
Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
[image]
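
To make the distinction concrete, here is a minimal sketch of the training-time-supervision idea (hypothetical module and function names, not Fast-WAM's actual code): the future-video head contributes only an auxiliary loss during training and is skipped entirely at inference, so the deployed policy path stays fast.

```python
import torch
import torch.nn as nn

class ToyWAM(nn.Module):
    """Toy world-action model: one shared backbone feeds an action head
    (the policy) and a future-video head (training-time supervision only)."""

    def __init__(self, obs_dim=512, act_dim=7, horizon=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 1024), nn.GELU())
        self.action_head = nn.Linear(1024, act_dim * horizon)  # policy path
        self.video_head = nn.Linear(1024, obs_dim)             # auxiliary path
        self.horizon, self.act_dim = horizon, act_dim

    def forward(self, obs_feat, predict_video=True):
        h = self.backbone(obs_feat)
        actions = self.action_head(h).view(-1, self.horizon, self.act_dim)
        video = self.video_head(h) if predict_video else None
        return actions, video

def training_loss(model, obs_feat, expert_actions, future_feat, aux_weight=0.5):
    # During training both heads are supervised: the future-video loss is
    # where the "world model" signal enters.
    actions, video_pred = model(obs_feat, predict_video=True)
    action_loss = nn.functional.mse_loss(actions, expert_actions)
    video_loss = nn.functional.mse_loss(video_pred, future_feat)
    return action_loss + aux_weight * video_loss

@torch.no_grad()
def act(model, obs_feat):
    # At deployment only the policy path runs: no future "imagination".
    actions, _ = model(obs_feat, predict_video=False)
    return actions
```
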
Oliver@XOS9000·
How does Claude Code stack up against all of Norway in developing AI solutions? 🇳🇴 Over 3,000 participants competed in the Norwegian AI championship for 100k USD, and my Claude bot, running fully autonomously on a single prompt, placed top 50 in two out of three problems!
[image]
Shuo Yang@ShuoYangAIR·
@AndreTI It's not a bug. It's because the VAM is very slow to infer, so the robot receives discontinuous trajectories despite the smoothing mechanism. We need to improve inference time by compressing the model.
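
For context on the "smoothing mechanism": chunked policies typically blend overlapping action chunks, e.g. ACT-style temporal ensembling (an assumption about their setup, not a confirmed detail). When inference lags, fresh chunks stop arriving in time, the overlap runs out, and the executed trajectory jumps at chunk boundaries. A minimal sketch:

```python
import numpy as np

def temporal_ensemble(chunk_history, t, m=0.1):
    """ACT-style blending of all action chunks that cover timestep t.

    chunk_history: list of (t_start, chunk) pairs, where each chunk is an
    (H, act_dim) array predicted from the observation taken at t_start.
    """
    actions, weights = [], []
    for t_start, chunk in chunk_history:
        offset = t - t_start
        if 0 <= offset < len(chunk):              # this chunk covers timestep t
            actions.append(chunk[offset])
            weights.append(np.exp(-m * offset))   # older predictions weigh less
    if not actions:        # inference lagged so far that no chunk covers t:
        return None        # this is where trajectories turn discontinuous
    w = np.array(weights) / np.sum(weights)
    return (np.asarray(actions) * w[:, None]).sum(axis=0)
```
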
Shuo Yang@ShuoYangAIR·
We're excited to share DiT4DiT, an end-to-end Video-Action Model for robot learning that unifies a video Diffusion Transformer and an action Diffusion Transformer in a single cascaded framework.
By leveraging the rich spatiotemporal and physical dynamics learned through video generation, rather than static image-text priors, DiT4DiT achieves state-of-the-art results on LIBERO (98.6%) and RoboCasa GR1 (50.8%) with far less training data, delivering over 10× better sample efficiency and up to 7× faster convergence. Real-world deployment on a humanoid robot further shows robust generalization. We believe this is a step toward making video generation a powerful backbone for robot policy learning.
This work builds upon the brilliant foundations laid by Nvidia's GR00T and Cosmos.
Project: dit4dit.github.io
Paper: arxiv.org/abs/2603.10448
Code: Coming soon. In the meantime, you can ask your coding agent to reproduce the method based on GR00T/Cosmos.
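
Reading between the lines of the announcement, the cascade presumably looks something like the sketch below: a video DiT predicts future-frame latents, and an action DiT denoises an action chunk conditioned on those latents. This is a toy reconstruction with made-up module names and a single denoising step, not the paper's actual code.

```python
import torch
import torch.nn as nn

class CascadedVideoActionModel(nn.Module):
    """Toy cascade: a video diffusion stage produces future-frame latents,
    and an action diffusion stage denoises actions conditioned on them."""

    def __init__(self, latent_dim=256, act_dim=7):
        super().__init__()
        # Stand-ins for the two Diffusion Transformers.
        self.video_dit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True), 2)
        self.action_dit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True), 2)
        self.act_in = nn.Linear(act_dim, latent_dim)
        self.act_out = nn.Linear(latent_dim, act_dim)

    def forward(self, obs_latents, noisy_actions):
        # Stage 1: predict future video latents from the observed ones.
        future_latents = self.video_dit(obs_latents)      # (B, T, latent_dim)
        # Stage 2: one denoising step on the action chunk, conditioned on
        # the predicted dynamics (here via simple sequence concatenation).
        act_tokens = self.act_in(noisy_actions)           # (B, H, latent_dim)
        ctx = torch.cat([future_latents, act_tokens], dim=1)
        denoised = self.action_dit(ctx)[:, -act_tokens.shape[1]:]
        return self.act_out(denoised)                     # (B, H, act_dim)
```
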
Vikash Kumar@Vikashplus·
❌ NOT TRUE @ChongZitaZhang
A finger ≠ a leg. In legged locomotion, low gear ratios help with impact tolerance, kinetic energy storage, and back-drivability under heavy load. Unlike legs, hand constraints & requirements are different:
- high torque density
- high positional controllability
- brutal space constraints
The real trade-offs for hands are:
- Low gear ratio → back-drivability, responsiveness
- High gear ratio → torque density, stability, compactness
🟠 So is a high ratio bad? **It depends** -- high gear ratios improve static precision & torque density, but reduce dynamic responsiveness & back-drivability.
In fact, biological hands are not low-impedance torque sources either. Like the duality of photons, human hands sometimes act as precise position sources, while at other times they act as force manipulators. Perhaps we need a "Heisenberg Uncertainty Principle", but for hands.
C. Zhang@ChongZzZhang

Before reading this, I didn't know the landscape of dex hands was so bad. In modern legged locomotion, a gear ratio of 100 would already mean [not usable] -- forceful actuation is not repeatable. I can't believe that in manipulation, where precision matters more, they do this.
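
As a back-of-the-envelope check on why both sides have a point, the standard first-order gearbox relations (textbook physics, not from either tweet) for gear ratio N and efficiency η are:

```latex
\tau_{\text{out}} = \eta \, N \, \tau_{\text{motor}}
  % output torque: linear in the gear ratio N
\qquad
J_{\text{reflected}} = N^{2} \, J_{\text{motor}}
  % inertia seen at the output: quadratic in N
```

Torque output scales linearly with N, but reflected inertia (and reflected friction) scales with N², which is what kills back-drivability and impact tolerance at leg-scale ratios; that is consistent with both the gear-ratio-of-100 complaint and the hand-specific defense above.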

Oliver@XOS9000·
@trq212 @bcherny Ok, so every time it asks for permission for some new command, the 'don't ask again for:' option is wayyy too specific; there should be an option to just grant permission to all similar commands… This is super annoying.
Thariq@trq212·
a few Friday afternoon ships to end the week: the AskUserQuestion tool can now show markdown snippets to display diagrams, code examples, etc.
Oliver@XOS9000·
@chris_j_paxton @kvablack @notmahi Yes, very; that's why I think taste is still the best gradient for robot learning research. But I'm happy that Nvidia is willing to spend such a large amount of talent and compute and open-source it all; proper evals and ablations are extremely expensive and time-consuming!
Chris Paxton@chris_j_paxton·
@XOS9000 @kvablack @notmahi Yeah, this is the core problem with all of it, right? That it's hard to put too much weight on any individual architecture decision.
Mahi Shafiullah 🏠🤖@notmahi·
MolmoSpaces leaderboard is now open for submissions! When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn't expect things to heat up so quickly. But they did, thanks to @jang_yoel and the team at GEAR toppling PI to take the crown in the task-general category. Congrats 🎉 You can evaluate and submit your model to this leaderboard: molmospaces.allen.ai/leaderboard
Joel Jang@jang_yoel

𝐃𝐫𝐞𝐚𝐦𝐙𝐞𝐫𝐨 𝐢𝐬 #𝟏 𝐨𝐧 𝐛𝐨𝐭𝐡 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 𝐚𝐧𝐝 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 🏆
𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘁𝗵𝗶𝘀 𝗻𝗼𝘁𝗮𝗯𝗹𝗲: DreamZero-DROID is trained 𝑓𝑟𝑜𝑚 𝑠𝑐𝑟𝑎𝑡𝑐ℎ using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs).
More broadly, training 𝑜𝑛𝑙𝑦 on real data and evaluating on (1) transparent, distributed benchmarks like 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 or (2) scalable sim benchmarks like 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 is an exciting step toward fairer and more reproducible evaluation of generalist policies, one that the community can hill-climb together to measure progress.
Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the set-up and getting these evaluations! Special shout-out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :)
+ We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗 (See the YAM experiments in our paper for more detail.)
++ We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint.
🌐 dreamzero0.github.io
💻 github.com/dreamzero0/dre…
RoboArena: robo-arena.github.io/leaderboard
MolmoSpaces: molmospaces.allen.ai/leaderboard

Asimov@asimovinc·
@TimLukeAnderson Would you be interested in a DIY kit to build your own humanoid robot? Like buying a wardrobe from IKEA, but a humanoid robot from us.
Asimov@asimovinc·
This is Asimov v1. We're planning to open-source the complete body design, simulation files, and a full list of actuators. Asimov v1 includes everything you need to build, modify, and train your own humanoid.
Chris Paxton@chris_j_paxton

This is looking amazing

Oliver@XOS9000·
@chris_j_paxton @kvablack @notmahi The 5B DreamZero is not that much larger than the Pi VLAs and seems to have much greater generalization with less robot data. Though these ablations were done with only 50k steps and a batch size of 32, so I wouldn't put too much weight on them.
Chris Paxton@chris_j_paxton·
Yeah, I guess that's speculation too. A couple of things made me say that:
- the Wan architecture is based around a 3D VAE; would that even make sense without the vision auxiliary loss?
- the core Wan architecture, other than the VAE, just doesn't seem interesting to me; am I missing something?
- from my own quite out-of-date research, I saw this same effect, where in low-data / overparameterized regimes you got better performance with vision generation
- they have a huge number of parameters here, and their own VLA experiments do show a very negative result (although, as you point out, it's not a very clear result)
[two images]
Oliver@XOS9000·
@alpercanbe Hmm, not seen this before; what paper is this from?
Alper Canberk@alpercanbe·
most robot learning scaling research consists of exploring approaches with varying D_F and \alpha
[image]
Oliver@XOS9000·
@theo PaperVM with X11 has way better support, no?
Sp4rq@Sp4rqDev·
@XOS9000 What model of robot is it? Open source?
Oliver@XOS9000·
Balance policy working
Alex Lieberman@businessbarista·
I want to start a community dedicated to Claude Code. It’s become the gateway drug to coding and experiencing the power of AI for tons of people. This will be a space for people to share killer use cases, agentic workflows, proven prompts, and connect with other CC obsessives. Comment “Claude” if you want to join.
Oliver@XOS9000·
@jeffreyhuber Etched, I guess? But they're locked into the transformer architecture, though…
Jeff Huber@jeffreyhuber·
Groq is cool, but I've seen a sneak peek of inference 20x faster than Groq coming soon
Oliver@XOS9000·
@ChongZitaZhang I think you're right. It must be possible to add a low-latency reactive NN component: just a quick image encoder that adds a corrective bias to the actions.
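
A minimal sketch of what that could look like (entirely hypothetical, just to make the idea concrete): a tiny encoder runs at control rate and adds a small, bounded correction on top of whatever action the slow policy last produced.

```python
import torch
import torch.nn as nn

class FastCorrector(nn.Module):
    """Small reactive net: cheap image features -> bounded action correction.
    Runs every control step, while the big VLA only refreshes its action
    chunk every K steps."""

    def __init__(self, feat_dim=128, act_dim=7, max_correction=0.05):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for a tiny CNN
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, act_dim)
        self.max_correction = max_correction     # keep the bias small and safe

    def forward(self, image, base_action):
        # tanh bounds the correction to [-1, 1] before scaling.
        delta = torch.tanh(self.head(self.encoder(image)))
        return base_action + self.max_correction * delta
```
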
C. Zhang@ChongZzZhang·
Am I wrong that these async inference methods don't really bring low latency, but just utilize the prediction of the known future? Like, if something suddenly happens and needs a reaction in 0.05s, they won't work.
Danfei Xu@danfei_xu

2/ Problem #1: Inference latency VLAs are big. Inference often takes longer than a control step. Naive synchronous execution → the robot literally stalls. Most modern solutions start with asynchronous inference.
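
For readers new to the term, here is a bare-bones sketch of the asynchronous-inference pattern being discussed (a generic illustration, not anyone's actual stack): the slow policy produces action chunks in a background thread while the fixed-rate control loop keeps executing the most recent chunk. It also shows why the objection above holds: between chunk arrivals, the robot is acting on predictions of the known future, not reacting.

```python
import threading
import queue
import time

def inference_worker(policy, obs_box, chunk_queue):
    """Background thread: runs the slow policy on the freshest observation."""
    while True:
        obs = obs_box.get()            # blocks until a new observation arrives
        chunk = policy(obs)            # slow: may take many control periods
        chunk_queue.put(chunk)         # hand the new action chunk to control

def control_loop(robot, obs_box, chunk_queue, hz=50):
    """Fixed-rate loop: never waits on inference, just drains the latest chunk."""
    chunk, i = None, 0
    while True:
        try:
            obs_box.put_nowait(robot.get_obs())     # offer a fresh observation
        except queue.Full:
            pass                                    # worker is busy; skip
        try:
            chunk, i = chunk_queue.get_nowait(), 0  # fresher chunk? swap it in
        except queue.Empty:
            pass
        if chunk is not None and i < len(chunk):
            robot.apply(chunk[i])      # keep executing the stale chunk:
            i += 1                     # the "prediction of the known future"
        time.sleep(1.0 / hz)

# Wiring (assuming `policy` and `robot` objects exist):
#   obs_box, chunks = queue.Queue(maxsize=1), queue.Queue()
#   threading.Thread(target=inference_worker, args=(policy, obs_box, chunks),
#                    daemon=True).start()
#   control_loop(robot, obs_box, chunks)
```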
