dave

29 posts

dave

@davidhe137

cs @ georgia tech

Katılım Kasım 2022

255 Takip Edilen34 Takipçiler

dave retweetledi

Woo Chul Shin@woochulshin1726·24 Nis

What if your robot could plan tasks it has never seen before without ever being retrained? Meet Compositional Visual Planning via Inference-Time Diffusion Scaling (ICLR 2026 🏆) comp-visual-planning.github.io If you are in Rio🇧🇷 visit us! Sat, 04/25/26 6:30-9:00 AM PDT Pavillion 4 #4203

English

8.4K

dave@davidhe137·21 Nis

Check out our Oral at #ICLR2026 ! We combine compositional diffusion with inference-time scaling to do long-horizon planning with only short-horizon data! Would love to chat about diffusion, robotics, or anything else really. My DMs are open 😸

Utkarsh Mishra@utkarshm0410

Our paper "Compositional Diffusion with Guided Search (CDGS)" is an Oral at #ICLR2026! Short-horizon Foundation Models + Compositional Generative Planning + Inference-time Search = CDGS for goal-conditioned long-horizon planning! More details: cdgsearch.github.io 🧵 below

English

330

dave@davidhe137·14 Nis

@Mike_A_Merrill 👋

QME

113

Mike A. Merrill@Mike_A_Merrill·13 Nis

Who’s going to ICLR?

English

12.6K

dave@davidhe137·10 Nis

@madteryx no way it's quantavious tradingson

English

1.6K

dave retweetledi

Danfei Xu@danfei_xu·23 Mar

Introducing EgoVerse: an ecosystem for robot learning from egocentric human data. Built and tested by 4 research labs + 3 industry partners, EgoVerse enables both science and scaling 1300+ hrs, 240 scenes, 2000+ tasks, and growing Dataset design, findings, and ecosystem 🧵

English

158

856

251.8K

dave@davidhe137·5 Mar

@quantbagel flowcast? arxiv.org/abs/2602.01329

English

Lucas@quantbagel·5 Mar

You can do the same for diffusion decoders as well!

Tanishq Kumar@tanishqkumar07

I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.

English

3.1K

dave@davidhe137·4 Mar

@k7agar just run them on b200s arxiv.org/abs/2602.18397…

English

atharva ☆@k7agar·4 Mar

the robotic problem rn is how do you keep your huge slow bulky neural network inference sub ~300ms for good real time control. good mix of both engineering and research problems to solve for robot inference to be as smooth as possible slow is bad, bad is not acceptable

English

147

10.2K

dave@davidhe137·4 Mar

@ScienceYael thank you! you da🐐

English

663

Ya'el Courtney, PhD@ScienceYael·4 Mar

@someraregem live here: gist.github.com/yaelccourtney1…

English

2.9K

Ya'el Courtney, PhD@ScienceYael·4 Mar

sorry to be annoying about Claude code again but today in <15 minutes I did a full apartment hunt, scoring top candidates by my personal weighted criteria (including commute time @ certain hours), and made a powerpoint with photos, stats, and live links for my top 20 options.

English

1.7K

119.5K

dave@davidhe137·4 Mar

@quantbagel how does it affect rollout performance? i.e. LIBERO should be a straightforward sanity check

English

429

Lucas@quantbagel·3 Mar

Robot action models shouldn't need 256 vision tokens per frame. Pi0.5 spends 400M parameters on SigLIP just to see. We replaced it with a 4.4M encoder that outputs 5 tokens — and action quality barely changes. 91x smaller. 51x fewer tokens. 7.3x faster inference.

English

363

21K

dave@davidhe137·24 Şub

@chris_j_paxton correlated with yesterday's standard intelligence release?

English

1.4K

Chris Paxton@chris_j_paxton·24 Şub

What is going on at MSL

Russ Salakhutdinov@rsalakhu

My time at Meta Superintelligence Lab (MSL) comes to a close. Together with @kohjingyu and @dan_fried, I joined Meta nearly two years ago to help advance and scale computer-use agents, a long-standing research focus of ours at CMU @SCSatCMU. It has been a remarkable journey working with and leading an exceptionally talented team, spanning agentic and reasoning model training across pre-training and post-training, building evals, and data pipelines. I’m deeply grateful for the opportunity to collaborate with so many amazing colleagues across GenAI and the incredible researchers at FAIR @AIatMeta. I also want to thank Meta’s leadership, especially @Ahmad_Al_Dahle and @rob_fergus, for giving our team the freedom to explore and pursue cutting-edge research. Big tech excels at large-scale engineering and the scaling of foundation models, but further progress toward superintelligence will require new breakthroughs in architectures, optimization, and the efficient use of data, including synthetic data. And I believe academia will play a pivotal role in driving these advances, particularly through open-source research.

English

164

52.8K

dave@davidhe137·22 Şub

@GenAI_is_real to make sure I understand correctly, is this about the jitter on inference latency when you roll out a WAM in the real world? My intuition is that sufficiently long action chunks i.e. 500ms + some async chunk merging (SAIL/RTC/VLASH) would make any jitter totally negligible?

English

102

Chayenne Zhao@GenAI_is_real·21 Şub

jim is spot on: 2026 is the year of world models. but the "bitter lesson" for robotics isn't just about data scale—it's about the inference tax. running a diffusion-based world model at 10fps on a 5090 is impressive, but for real-world closed-loop control, every ms of jitter in that "dream" kills the policy. this is why we’ve been obsessing over multi-modal serving kernels lately. the gap between a cool demo and a deployable foundation agent is purely a system latency problem. the infra layer for these "live teleop" dreams is where the real war will be won.

Jim Fan@DrJimFan

Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible. Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream. We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry—across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains—at a scale no robot fleet could match. The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first. Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading a new character and scene assets into Unreal Engine, but done through gradient descent and generalizes far beyond the post-training dataset. A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities: - Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090. - Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor. - Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains +17% real-world success out of the box on a fruit packing task. We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:

English

11.9K

dave@davidhe137·10 Oca

@IsaacKing314 prime hedge opportunity for budden here

English

224

Appliance Discharge@IsaacKing314·9 Oca

The saga continues...

Budden@davidmbudden

I'll upload the proofs to arXiv where they belong, once I'm satisfied with the quality. And make an appropriate donation to charity -- in addition to the two bet payouts and 5k I've already donated to Litt's preferred charity -- based on the timing and outcomes. And that is all.

English

102

8.8K

dave@davidhe137·7 Oca

@allgarbled it is more acceptable to have multiple “talking” stages. optionality abounds

English

gabe@allgarbled·7 Oca

I think the “talking” stage is new. Of course technically you have to talk to somebody before going on a date. But I don’t think this used to be an identifiable stage. I think the zoomers invented this. The question is why?

Hero Thousandfaces@1thousandfaces_

if you use "talking" to mean "going out with each other but not committed" you are retarded. talking is for when you are just talking but not doing anything

English

109

534

75.5K

dave@davidhe137·7 Oca

@felpix_ harmless elite white collar job btw

Svenska

felpix@felpix_·6 Oca

please i will work 120 hour weeks to trade prediction markets pretty please

Joel Rubano@TCK_JRubano

Just saw my first job posting for a Prediction Markets trader at an elite hedge fund.

English

805

dave@davidhe137·6 Oca

@bernhardsson there is a beauty to economies of scale

English

421

Erik Bernhardsson@bernhardsson·6 Oca

Resource pooling is truly a free lunch for infra providers, but you know what’s a free twelve-course dinner? Colocating online workloads with batch jobs in a way that makes capacity completely flat.

English

115

11.3K

dave@davidhe137·5 Oca

@felpix_ i think market makers are morally neutral

English

3.1K

felpix@felpix_·5 Oca

ivy league grads in 2026 are basically forced to take occupations in industries responsible for immense amounts of domestic and global suffering just to make a living you quite literally cannot get an "elite" level white collar job without harming people these days

English

2.5K

678.3K

dave@davidhe137·16 Eki

the problem with my twitter is that finding the 1% of pure insight jason wei threads requires sifting through the 99% of cluely abg retweets

English

156

dave@davidhe137·8 Eki

@jon_barron @joshuajohnsonAI GANs also map noise to a learned distribution though? Feels like the **gradual** loss of information in the forward process is what makes it “diffusion”

English

761

Jon Barron@jon_barron·8 Eki

@joshuajohnsonAI yeah I find this argument most persuasive.

English

3.2K

Jon Barron@jon_barron·8 Eki

If diffusion has been distilled down to a single step, is it still diffusion? Why or why not?

English

247

31.5K

dave retweetledi

Ryan Punamiya@ryan_punamiya·25 Eyl

Robots struggle to learn new skills from human videos. Why? We found that naive co-training produces disjoint distributions. Our EgoBridge (NeurIPS’25) extends Optimal Transport to align human-robot latents, improving success by 44% and generalization to human-only tasks!🧵

English

197

53.4K

Keşfet

@Mike_A_Merrill @madteryx @quantbagel @k7agar @ScienceYael @chris_j_paxton @GenAI_is_real @IsaacKing314