Valdis Gerasymiak
1.4K posts

Valdis Gerasymiak
@Valdiolus
Building humanoid robots. Ex-Founder of ROTHEM (bicycle safety hardware/AI startup)
San Francisco, CA Joined October 2011
74 Following 92 Followers
Pinned Tweet

@cixliv Need an affordable US robotics platform for boxing …and dancing?))


@JasonrShuman @a1SuperOnion I think a lot about humanoid robots in small warehouses that have no automation at all, only shelves of products and a few people.

I spent the last week in over a dozen pitches with robotics companies across Silicon Valley, NY and Europe...then I looked at the US Census Bureau Data
Turns out 88% of US manufacturing plants don't own a single robot...and that's the opportunity Founders are seeing.
Despite the endless deluge of humanoid robot demos and "AI factory" hype in our feeds, nearly 9 out of 10 American factories look exactly the same as they did 20 years ago.
Manual labor, mechanical machinery, a retiring workforce and challenges in filling roles.
The reasons why they haven't been "updated" historically break down into two clear buckets that I call:
1. The Integration Iceberg: A robot arm might cost $25,000 and has come down in price, but the custom tooling, safety cases and software integrations to make it work cost $125,000.
2. The Agility Tax: A traditional robot does one thing a million times. But the average US shop does "high-mix, low-volume" work. Reprogramming a robot for a new part has required an expensive software engineer and could take days, depending on engineer availability.
The next generation of massive robotics outcomes won't come from building shinier hardware for the 12% of factories that are already automated.
It will come from the Founders solving the integration and business model friction for the 88% that aren't.
If your GTM strategy doesn't solve the 18-month ROI math of a shop owner in Ohio who needs financing, fast onboarding and the ability for the robot to handle a variety of tasks, then you're likely going to struggle.
If you're working on a robotics business solving our country's biggest talent bottlenecks, I want to chat.
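The numbers in the post above can be put into a quick back-of-the-envelope sketch. The $25,000 arm cost, $125,000 integration cost and 18-month horizon come from the post; reading "18-month ROI math" as a simple payback period, and the function and variable names, are assumptions made here for illustration only.

```python
# Back-of-the-envelope payback sketch (assumption: "18-month ROI" = simple payback).
ARM_COST = 25_000            # robot arm price quoted in the post
INTEGRATION_COST = 125_000   # tooling, safety cases, software integration (from the post)
TARGET_MONTHS = 18           # payback horizon mentioned in the post


def payback_months(total_cost: float, monthly_savings: float) -> float:
    """Months needed for cumulative savings to cover the up-front cost."""
    return total_cost / monthly_savings


total_cost = ARM_COST + INTEGRATION_COST
integration_share = INTEGRATION_COST / total_cost
required_monthly_savings = total_cost / TARGET_MONTHS

print(f"Total deployed cost: ${total_cost:,}")
print(f"Share of cost hidden in integration: {integration_share:.0%}")
print(f"Savings needed per month for an 18-month payback: ${required_monthly_savings:,.0f}")
```

Under these assumptions the arm itself is only about a sixth of the deployed cost, which is the "iceberg" the post describes. Financing costs and downtime are ignored in this sketch.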

@__tinygrad__ Next-to-nothing CPU load is sometimes a benchmark for framework and training code quality.

tinygrad will write that C for you. Our new driver compiles all interaction with the GPU to C, so once it's running the CPU does next to nothing.
Elon Musk@elonmusk
SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible. The potential speed improvement vs JAX for large training runs is over an order of magnitude.


@Aman25m What’s up? Interested in building humanoid robots?)

Excited to share that I've joined @RoboStrategy.
I've spent the last decade building robotics startups.
In that time the category has shifted from (nearly) un-investable to one of the most consequential sectors in tech.
We're entering the physical intelligence era.
At RoboStrategy I get to apply what I've practiced as an operator and back the companies I believe can become the winners across autonomous systems.
If you're building, get in touch.
RoboStrategy@RoboStrategy
BREAKING: Jack Pearson (@JacklouisP) has signed with RoboStrategy as an Investment Principal
Valdis Gerasymiak retweeted

@oprydai Never go into debt, you can find such labs for free

After 40 years, I am retiring from the field
And moving into the front office
During my tenure, I met some of the sharpest people in the field. Yet robotics still crawled forward. Not because of them — but because we desperately needed more like them
But this is changing
The new crop of talent is amazing; hence I can now be more effective in the front office than on the playing field
This is why I joined RoboStrategy as Robotics Research Diligence Director: so I can help propel the dawn of physical intelligence through strategic investments
(For informational purposes only. Not investment advice.)
RoboStrategy@RoboStrategy
BREAKING: Robotics Research Diligence Director (R2D2) Scott Walter, PhD (@goingballistic5) has been signed by RoboStrategy

A person should validate that it’s safe to operate the robot in this particular environment, from both a people-safety and a payload-safety standpoint.
Humanoid Scott@GoingBallistic5
Despite the firm grip from below, the damage on the sides is consistent with the bulky wrist design

@junfanzhu98 @saturdayrobotic @ryan_punamiya @aurorafeng_01 @andreygizdov @babugi28 @bcristei Thanks for the recap. The event was very insightful, and I'm looking forward to implementing this!

🐝 @saturdayrobotic Robotics & World Models Reading Club 08 Recap: Embodied Human Data as the “Internet of Motion and Behavior” keynote @ryan_punamiya, hosts @junfanzhu98, @aurorafeng_01.
Great Parallel: Egocentric Human Data = Internet for Robot Foundation Models 🤖
Jim Fan’s “Robotics’ Endgame” nails it: LLM pipeline (Pre-Train “Simulating” → SFT “Aligning” → Reasoning RL “Surpassing”) perfectly mirrors robotics World Modeling → Action Fine-Tuning → Physical RL. Egocentric video + pose + language = the scalable “human experience” corpus robots desperately need.
Bottlenecks are brutal: no counterfactuals, Swiss-cheese coverage gaps, teleop skill caps, and bounded human thought (only deliberate actions; subconscious micro-adjustments and collaboration missing). Early methods relied on point tracks (MotionTracks/ATM), value functions (VIP/V-PTR on Ego4D 18k scenes + Bridge 3k clips/150k trans), and repr. learning (MVP/R3M: time-contrastive + video-lang alignment + L1 sparsity). Still brittle multi-stage pipelines.
MimicPlay (2023) fixed the bridge: multi-view human play → latent planner 𝒫 (goal g_t^h + current o_t^h) → GMM decoder → 3D hand traj l_t. Stage 2 freezes 𝒫 and adds tiny robot data (wrist/proprio) to train policy π. Result: true zero-shot testing from human goals. This lets robots learn complex play without hand-crafted labels.
EgoMimic (ICRA’25) goes further: treat human hand as “just another robot.” Unified co-trained ACT policy: masked obs + hand p_t^H + wrist + robot ^R p_t / ^R R_t → shared vision encoder + norm layers → ACT trunk → Cartesian (3D pose) + joint losses. Human/robot/shared streams in one model.
The 4× embodiment gap (kinematic DoF/morphology, kinodynamic speed, tactile sensing, visual + partial observability) explains why naïve transfer fails. Human pretraining = monocular noisy pose/occlusions; robot deployment = rich proprio/calibrated sensing.
EgoBridge solves alignment with Joint Optimal Transport on latent+action distributions. Soft supervision via cost function retains geometry and marginals. KL/MMD baselines collapse (pick/place clusters disjoint; W₂ drops 8.704 → ~0). It aligns not just representations, but controllable trajectory distributions — enabling new behaviors where everything else fails. Human data supplies semantics/diversity; small robot grounding anchors it into executable control. (Flow-matching adjacent.)
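To make the "joint optimal transport on latent + action distributions" idea concrete, here is a minimal sketch using entropic OT (Sinkhorn iterations) in NumPy. This is not EgoBridge's implementation: the Sinkhorn solver is a standard technique swapped in for illustration, and the function names, the `action_weight` parameter, the array shapes and the random data are all hypothetical.

```python
import numpy as np


def joint_cost(z_h, a_h, z_r, a_r, action_weight=1.0):
    """Pairwise cost mixing latent distance and action distance (illustrative choice)."""
    latent = ((z_h[:, None, :] - z_r[None, :, :]) ** 2).sum(-1)
    action = ((a_h[:, None, :] - a_r[None, :, :]) ** 2).sum(-1)
    return latent + action_weight * action


def sinkhorn_plan(cost, reg=0.2, n_iters=1000):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on human samples
    b = np.full(m, 1.0 / m)          # uniform mass on robot samples
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
z_h, a_h = rng.normal(size=(6, 4)), rng.normal(size=(6, 3))   # hypothetical human latents/actions
z_r, a_r = rng.normal(size=(5, 4)), rng.normal(size=(5, 3))   # hypothetical robot latents/actions

cost = joint_cost(z_h, a_h, z_r, a_r)
cost = cost / cost.max()             # normalize so exp(-cost / reg) stays well-conditioned
plan = sinkhorn_plan(cost)
ot_loss = float((plan * cost).sum()) # soft alignment loss a training loop could minimize
print(plan.shape, round(ot_loss, 4))
```

The plan softly pairs each human sample with robot samples that are close in both latent and action space while preserving both marginals, which is the property the recap contrasts with KL/MMD baselines.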
Hardware co-design closes the rest: EMMA (shared Xformer: Nav/Phase/EEF/Joints heads + Aria Glasses/ViperX Arms/AgileX Tracer/RealSense D405 + retargeting) = zero-shot mobile nav. DexUMI (hand exoskeletons) boosts throughput: 11 → 36 → 51 trajectories in 15 min (teleop vs bare hand).
H2R proves cross-embodiment works: robot primitives + human semantic composition (“big items bottom, small top”) → 8× autonomous toolbox packing.
EgoVerse scales the data flywheel: 79,692 episodes / 1,362 h / 240 scenes / 1,965 tasks. Multi-lab → EC2+Ray+EgoDB (dense language). Consortium (GT/Stanford/UCSD/ETH) shows +4× autonomous when mixed with in-domain robot data.
EgoScale delivers the recipe: 20,854 h human pre-training → 50 h human + 4 h aligned mid-training → one-shot dexterous post-training (syringe/tong/unscrew/fold). Scaling laws: operators + scene diversity >> raw demonstrations (fixed budget). Long-horizon tasks mirror LLM context scaling — subtask explosion demands exponential behavioral diversity.
DreamDojo turns human data into a behavior/motion world model: pre (In-lab/EgoDex/HV) → robot post (GR-1/G1/AgiBot/YAM) → autoregressive distillation → Student for eval/planning/teleop/unseen envs. Human data = “Internet of the physical world” — when properly grounded.
Eval crisis remains real: MSE/validation loss correlates weakly (multimodal actions invalidate single-target assumptions). Need closed-loop + dense procedural language (“right pinky rotate bottle 90° CW”) for entropy reduction. Missing: tactile, hesitation, collaboration.
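A toy illustration of why MSE can mislead when actions are multimodal. The setup is hypothetical: a 1-D action where demonstrators pass an obstacle either on the left (about -1) or on the right (about +1), with both modes equally valid. None of the numbers come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations: half go left (~ -1), half go right (~ +1).
demos = np.concatenate([
    rng.normal(-1.0, 0.05, 500),
    rng.normal(+1.0, 0.05, 500),
])


def mse(pred: float, targets: np.ndarray) -> float:
    """Mean squared error of a single constant prediction against all targets."""
    return float(np.mean((pred - targets) ** 2))


mean_action = float(demos.mean())    # lands near 0, i.e. straight into the obstacle
mse_mean = mse(mean_action, demos)   # roughly 1
mse_left = mse(-1.0, demos)          # roughly 2, although "left" is a valid behavior
mse_right = mse(+1.0, demos)         # roughly 2, although "right" is a valid behavior

print(f"mean action {mean_action:+.3f}: MSE {mse_mean:.3f}")
print(f"commit left  (-1): MSE {mse_left:.3f}")
print(f"commit right (+1): MSE {mse_right:.3f}")
```

The averaged action scores best on MSE even though it matches no demonstration, while committing to either valid mode scores worse. That is one reason closed-loop evaluation is argued for above.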
Hot takes: Diversity > repetition. We generate data faster than we can study it. Full-stack hardware+algo+data co-design is non-negotiable. Omni-models enable in-context preference. Sim2real still hard. Observability alignment is first-class.
Robotics is no longer about collecting more data — it is about aligning embodiment manifolds. Human semantics + robot grounding = executable physical intelligence. The Great Parallel is no longer theory. It’s engineering.




Junfan Zhu 朱俊帆 ✈️ CVPR@junfanzhu98