Cybernetic Labs

824 posts

@cybernetic_lab

Building a lifelong robot learning flywheel (🤖/acc)

Joined May 2021
43 Following · 997 Followers

Cybernetic Labs retweeted
Artashes Hovesyan @ArtashesHo2043
GR00T N1.5 vs N1.7 on the same dual-SO101 packing task. Same 228-episode dataset, same sim setup, same prompt, same rollout params.

Left: N1.5 fine-tuned 10k steps
Right: N1.7 fine-tuned 50k steps

Surprisingly, N1.5 is still more reliable here. N1.7 at 10k basically failed the task, so we pushed it to 50k, but it still underperforms N1.5 on this setup. #SO101 #GR00T

Cybernetic Labs @cybernetic_lab
The scale of modern LLMs flipped the inference problem on its head. Inference engines used to juggle many small models on one GPU. Now a single trillion-parameter model demands an entire GPU cluster - and the whole optimization stack has been rebuilt around serving just one giant.
Samy K (🤖/acc ) @samy_cybernetic

x.com/i/article/2066…
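
A rough back-of-the-envelope check on why one trillion-parameter model needs a cluster. This is only a sketch: the 2 bytes per parameter (FP16/BF16) and the 80 GB per GPU are assumptions, not figures from the post, and the count covers weights alone, ignoring KV cache and activations.

```python
def min_gpus_for_weights(num_params: int, bytes_per_param: int, gpu_mem_gb: int) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights alone."""
    weight_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_mem_gb * 10**9          # decimal gigabytes (assumption)
    return -(-weight_bytes // gpu_bytes)    # integer ceiling division


# 1 trillion parameters at 2 bytes each is 2 TB of weights.
print(min_gpus_for_weights(10**12, 2, 80))
```

Under these assumptions the weights alone span 25 GPUs, before any memory is reserved for serving requests.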


Cybernetic Labs retweeted
dar @radbackwards
The GPT moment of robotics is a lot closer than people suspect

Cybernetic Labs @cybernetic_lab
Tencent's Hy-Embodied-0.5-VLA argues the real unlock is co-designing the whole stack: data, representation, policy refinement, and execution.

Where π0, π0.5, GR00T N1, and OpenVLA lean on model scale or web priors, HyVLA-0.5 interlocks four pillars: 10K hours of human UMI demos, an embodiment-agnostic backbone, reward-free preference RL (FlowPRO), and a training-free real-time deployment recipe.

Pillar 1 - data. A custom fingertip UMI device plus motion-capture cage collects 10,000+ hours of egocentric, sub-millimeter human demonstrations. No teleop master-slave rig, no SLAM-only label noise. The same trajectories double as post-training data for downstream robots.

Why UMI? Hand-held human collection scales diversity in the wild and yields action labels that aren't bound to any specific robot. With reachability filtering, those demos remain usable across embodiments - the foundation of the paper's headline: transfer without any target-robot teleoperation.

Pillar 2 - architecture. A 4B Mixture-of-Transformers backbone (Hy-Embodied-0.5-MoT) keeps separate vision/text parameters with shared self-attention, plus native-resolution Hy-ViT 2.0 so cameras aren't downsampled. Embodied-native priors, not a repurposed general VLM.

On top sits a flow-matching action expert predicting continuous actions, a compact memory encoder for spatiotemporal context, and a delta-chunk action representation - incremental end-effector motion that decouples policy learning from any specific robot's kinematics.

Pillar 3 - FlowPRO. A critic-free, reward-free offline RL stage based on Proximalized Preference Optimization. Paired success/failure trajectories are harvested via teleop intervention-and-rollback, then aligned with the flow-matching objective. No reward shaping. No value network.

FlowPRO matters because reward design breaks on contact-rich manipulation, and plain DPO-style preference learning reward-hacks. A proximal regularizer anchors the implicit reward, and gradient cancellation lets SFT samples co-train safely - turning failures into a fast iteration loop.

Pillar 4 - deployment. Training-free, plug-and-play. Asynchronous inference overlaps backbone forward passes with execution, and cubic Bézier action smoothing stitches successive delta chunks with guaranteed C¹-continuous transitions. Real hardware, closed visual loop, high frequency.

Two SFT tracks tie it together. Track-A: target-robot demos and same-platform deployment. Track-B: UMI-only cross-embodiment transfer to morphologically different robots, including JAKA and Astribot S1, without any target-robot teleoperation at all.

The Results: SOTA on RoboTwin2.0 simulation, strong bimanual real-world task success, and the headline - zero-teleop transfer to unseen embodiments. FlowPRO drives long-tail tasks toward near-ceiling success rates from preference data alone.

With HyVLA, the gap between benchmark policies and deployed robots is closed by a stack where each layer absorbs a different bottleneck - data fidelity, action interface, failure correction, latency - around a stable policy core.
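
To make the "C¹-continuous" claim concrete, here is a minimal sketch of how a cubic Bézier segment can join two action chunks so that both position and velocity match at the seam. The post does not give the paper's actual construction; this uses the standard Hermite-to-Bézier control-point choice, and `bezier_blend` plus all numbers are hypothetical. It handles one scalar coordinate; a real end-effector pose would apply it per axis.

```python
def bezier_blend(p0, v0, p3, v1, duration):
    """Cubic Bezier from (p0, v0) at t=0 to (p3, v1) at t=duration.

    Inner control points are placed so the curve's end velocities equal
    v0 and v1, which is what makes the joins C1-continuous.
    """
    p1 = p0 + v0 * duration / 3.0
    p2 = p3 - v1 * duration / 3.0

    def position(t):
        s = t / duration
        return ((1 - s) ** 3 * p0 + 3 * (1 - s) ** 2 * s * p1
                + 3 * (1 - s) * s ** 2 * p2 + s ** 3 * p3)

    def velocity(t):
        s = t / duration
        d_ds = (3 * (1 - s) ** 2 * (p1 - p0) + 6 * (1 - s) * s * (p2 - p1)
                + 3 * s ** 2 * (p3 - p2))
        return d_ds / duration  # chain rule: s = t / duration

    return position, velocity


# Hypothetical seam: old chunk ends at 0.10 m moving at 0.5 m/s,
# new chunk starts at 0.20 m moving at -0.2 m/s, blended over 0.2 s.
pos, vel = bezier_blend(0.10, 0.5, 0.20, -0.2, 0.2)
```

Evaluating `pos` and `vel` at `t=0` and `t=0.2` recovers the boundary positions and velocities of the two chunks, so the stitched trajectory has no velocity jump at either end of the blend.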
ModelScope @ModelScope2022

Meet Hy-Embodied-0.5-VLA, a full-stack VLA system that covers everything from data collection to real-world deployment. Apache 2.0. 🚀

Two checkpoints released:
VLA-RoboTwin: SFT on 50 bimanual tasks, SOTA on RoboTwin 2.0 (90.9% Clean / 90.1% Randomized)
🤖 RoboTwin: modelscope.cn/models/Tencent…
VLA-UMI: pretrained base on 10,000+ hours of UMI demonstrations, ready for fine-tuning on new robot platforms
🤖 UMI: modelscope.cn/models/Tencent…

📦 10,000+ hours of high-fidelity UMI demonstrations via optical motion-capture
🌍 Cross-embodiment transfer validated on 4 real-world robot platforms
🤖 MoT backbone with flow-matching action expert and compact memory encoder for multi-frame history
⚡ FlowPRO preference optimization + asynchronous inference for continuous dexterous manipulation
📄 modelscope.ai/papers/2606.14…
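
The asynchronous inference idea can be illustrated with a toy loop: while the robot executes the current action chunk, the next chunk is already being computed in the background. This is a sketch under loud assumptions, not the released recipe: `fake_policy` is a hypothetical stand-in for the backbone forward pass, and "executing" an action just appends it to a list.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_policy(obs):
    """Hypothetical stand-in for a slow forward pass: returns a chunk of 4 actions."""
    return [obs * 10 + i for i in range(4)]


def run_async(num_chunks, policy=fake_policy):
    """Overlap inference of chunk k+1 with execution of chunk k."""
    executed = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(policy, 0)             # first chunk has nothing to overlap with
        for step in range(1, num_chunks + 1):
            chunk = pending.result()                 # wait for the chunk computed in the background
            if step < num_chunks:
                pending = pool.submit(policy, step)  # start the next inference right away
            for action in chunk:                     # meanwhile, "execute" the current chunk
                executed.append(action)
    return executed


print(run_async(2))
```

One design consequence worth noting: the observation used for the next chunk is captured before the current chunk finishes executing, so successive chunks may not line up exactly at the seam, which is why a smoothing step between chunks is useful.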


Cybernetic Labs @cybernetic_lab
The quiet trick in robot teleop: pilots wear VR headsets not for immersion, but to be deliberately blinded. This forces data collection under the robot's exact sensory limits, yielding cleaner training than full human perspective. If the human can peek around the robot's camera, the policy learns from info it will never have at runtime. Tight embodiment isn't UX, it's a data integrity constraint.

Cybernetic Labs retweeted
Litian Liang @litian_liang
Introducing Universal Manipulation Exoskeleton (UME)

A low-cost exoskeleton with real-time haptic torque feedback for learning autonomous policies that perform highly force-mediated, tightly space-constrained, visually occluded, whole-body, and long-horizon mobile manipulation tasks.

Using UME, the teleoperator can unsheathe a heavy metal sword completely blindfolded.

ume-exo.github.io 🧵1/N

Cybernetic Labs retweeted
NVIDIA Robotics @NVIDIARobotics
The next wave of robot foundation models is here. ⚡ World-Action Models are emerging as a powerful new approach for physical AI, combining pretrained scene dynamics with action generation. Learn more ➡️ nvda.ws/43C1abm

Cybernetic Labs retweeted
Samy K (🤖/acc ) @samy_cybernetic
AI has so far lived in the world of bits - chatbots, image generators, code assistants. Physical AI crosses into the world of atoms: systems that perceive their environment, reason about it, and take action in it. And it's quietly moving from research labs into factories, warehouses, and the real world.

Classical robotics = a robotic arm welding the same seam 1,000 times a day. Precise, repeatable, brittle. It only works in an environment engineered around it. Step outside the script and it breaks.

Physical AI = robotic agents equipped with embodied AI models and reinforcement learning. They carry general world understanding plus specialized skills. The same underlying idea powers smart factories, self-optimizing energy grids, and autonomous cars - not just robot arms.

Three things converged.

First: Vision Language Action (VLA) models arrived. Vision to perceive, language to reason, action to do. Before VLAs, robots could see and act - but couldn't reason about novel situations. Now they can.

Second: open foundation models trained on tens of millions of hours of robotics and driving data. General knowledge of physics and object manipulation, downloadable from HuggingFace. You don't have to build world understanding from scratch anymore.

Third: compute. Processing 20 million hours of video would have taken 3 years on previous-gen CPUs. Now it takes weeks on current-gen GPUs. More data, faster iteration, better models.

But training physical AI isn't like training an LLM. You can't just scrape text. Things have to move and react. So training starts in simulation - a virtual workbench, virtual parts, virtual robot - plus domain randomization: varying lighting, friction, part orientation.

Then reinforcement learning. The robot attempts a task. Success = reward. Failure = nothing. Across millions of trials, it figures out what works. Once it clears a success threshold in sim, it's deployed to reality.

And reality is messier than any simulation. Parts are slightly off. Surfaces behave unexpectedly. So you capture that real-world data, feed it back into the simulation, retrain, redeploy. This loop is how the sim-to-real gap actually closes.

Models are good enough. Compute is cheap enough. Simulation is realistic enough. Physical AI is crossing from bits into atoms - and the feedback loop between the two is what makes it work.

As we master this feedback loop, how will intelligent autonomous machines transform our physical world?
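
A minimal sketch of the two training ingredients named in the post: domain randomization over lighting, friction, and part orientation, and a sparse success-only reward. Everything here is illustrative: the parameter names, the ranges, and the `attempt` callable are assumptions, and the loop only measures a success rate against randomized conditions; it is not a reinforcement learning algorithm.

```python
import random


def sample_sim_params(rng):
    """Draw one randomized simulation setup (ranges are made up for illustration)."""
    return {
        "light_intensity": rng.uniform(0.5, 1.5),
        "friction": rng.uniform(0.4, 1.0),
        "part_yaw_deg": rng.uniform(-15.0, 15.0),
    }


def sparse_reward(success):
    """Success = reward, failure = nothing."""
    return 1.0 if success else 0.0


def success_rate(attempt, num_trials, seed=0):
    """Average sparse reward of `attempt` (a hypothetical policy rollout) over random setups."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        params = sample_sim_params(rng)
        total += sparse_reward(attempt(params))
    return total / num_trials
```

A deployment gate in this style would compare `success_rate(...)` against a chosen threshold before moving from simulation to hardware.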

Cybernetic Labs @cybernetic_lab
The sim-to-real gap doesn't close in simulation. It closes when a robot fails in reality; that failure gets captured, fed back into the sim, and trained against. Every weird surface and oddly shaped part is a data point. The real world is the test set you can't fake. Simulation is not a one-time prep phase. It is a permanent engine. Deploy the robot, let it hit unexpected real-world physics, feed that messy failure data back into the simulator, and retrain. Repeat forever 🔁
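
One very simplified reading of "feed that failure data back into the simulator": if each real-world failure comes with measured physical parameters, widen the simulator's randomization ranges until they cover those measurements, then retrain. This is an assumption about how the loop could work, not a description of any specific pipeline; `widen_ranges` and the parameter names are hypothetical.

```python
def widen_ranges(sim_ranges, real_failures):
    """Expand each (low, high) randomization range to include values seen in real failures.

    sim_ranges:    {"friction": (0.4, 1.0), ...}
    real_failures: list of dicts with measured values for the same parameter names
    """
    updated = dict(sim_ranges)
    for failure in real_failures:
        for name, value in failure.items():
            low, high = updated[name]
            updated[name] = (min(low, value), max(high, value))
    return updated


ranges = {"friction": (0.4, 1.0)}
failures = [{"friction": 0.25}, {"friction": 0.7}]  # one out-of-range surface, one in-range
print(widen_ranges(ranges, failures))
```

Each deploy-fail-capture cycle would call this once and retrain against the widened ranges, so the simulator keeps tracking what the real world actually produced.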

Cybernetic Labs @cybernetic_lab
The internet is basically a giant first-person dataset captured by the wrong embodiment. Run it through a world model to re-render it from a robot's POV, and suddenly every cooking vlog and GoPro hike becomes manipulation training data. Compute as a data acquisition shortcut!

Cybernetic Labs retweeted
Sergey Levine @svlevine
Flow reversal steering allows "steering" diffusion-based VLAs with high-level actions, for example from VLM reasoning. This also lets us run RL in the diffusion noise space with exploration guided by high-level reasoning: think through a task, then practice it! 👇