Sentientx

8 posts

@Sentientx_io

Embodied AI Lab. We collect real world robot data for humanoid manipulation models.

Joined January 2026
12 Following · 59 Followers

Sentientx reposted
satyam
satyam@satiyum·
More data leads to better models. And more data is not as hard to get as people claim. Don’t solve problems that don’t really exist. Scale real world data.
elvis@omarsar0

First empirical evidence that VLA models scale with massive real-world robot data.

VLA foundation models promise robots that can follow natural language instructions and adapt to new tasks quickly. However, the field has lacked comprehensive studies on how performance actually scales with real-world data.

This new research introduces LingBot-VLA, a Vision-Language-Action foundation model trained on approximately 20,000 hours of real-world manipulation data from 9 dual-arm robot configurations. Scaling pre-training data from 3,000 hours to 20,000 hours improves downstream success rates consistently, with no signs of saturation. More data still helps.

The architecture uses a Mixture-of-Transformers design that couples a pre-trained VLM (Qwen2.5-VL) with an action expert through shared self-attention. This allows high-dimensional semantic priors to guide action generation while avoiding cross-modal interference.

On the GM-100 benchmark, spanning 100 tasks across 3 robotic platforms with 22,500 evaluation trials, LingBot-VLA achieves a 17.30% success rate and a 35.41% progress score, outperforming π0.5 (13.02% SR, 27.65% PS), GR00T N1.6 (7.59% SR, 15.99% PS), and WALL-OSS (4.05% SR, 10.35% PS). In simulation on RoboTwin 2.0, the model reaches an 88.56% success rate in clean scenes and 86.68% in randomized environments, beating π0.5 by 5.82% and 9.92% respectively.

Training efficiency matters for scaling. Their optimized codebase achieves 261 samples per second per GPU on an 8-GPU setup, a 1.5-2.8× speedup over existing VLA codebases such as StarVLA, OpenPI, and DexBotic. Data efficiency is equally impressive: with only 80 demonstrations per task, LingBot-VLA outperforms π0.5 using the full 130-demonstration set.

This is the first empirical demonstration that VLA performance continues to scale with more real-world robot data without saturation, providing a clear roadmap for building more capable robotic foundation models.

Paper: arxiv.org/abs/2601.18692
Learn to build effective AI agents in our academy: dair-ai.thinkific.com
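The shared self-attention coupling described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the widths, token counts, and random features are toy placeholders, and the single attention layer stands in for the full Mixture-of-Transformers stack. The key idea it shows is modality-specific projection weights combined with one attention operation over the concatenated VLM + action token sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 16                     # toy model width
n_vlm, n_act = 6, 4        # VLM tokens, action-expert tokens

# Mixture-of-Transformers idea: each modality keeps its own Q/K/V
# projection parameters, avoiding cross-modal parameter interference.
W = {m: {k: rng.normal(scale=d ** -0.5, size=(d, d)) for k in "qkv"}
     for m in ("vlm", "act")}

vlm_tokens = rng.normal(size=(n_vlm, d))  # stands in for Qwen2.5-VL features
act_tokens = rng.normal(size=(n_act, d))  # action-expert latents

def project(tokens, params):
    return tuple(tokens @ params[k] for k in "qkv")

q_v, k_v, v_v = project(vlm_tokens, W["vlm"])
q_a, k_a, v_a = project(act_tokens, W["act"])

# ...but the attention itself is shared: queries from both modalities
# attend over the concatenated key/value sequence, so semantic priors
# from the VLM tokens flow directly into the action tokens.
Q = np.concatenate([q_v, q_a])
K = np.concatenate([k_v, k_a])
V = np.concatenate([v_v, v_a])
attn = softmax(Q @ K.T / np.sqrt(d))
out = attn @ V

act_out = out[n_vlm:]      # updated action tokens, informed by VLM context
print(act_out.shape)       # (4, 16)
```

In a real model this block would be stacked, multi-headed, and initialized from the pre-trained VLM on the vision-language side; the sketch only captures the routing pattern.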

Sentientx reposted
Sentientx
Sentientx@Sentientx_io·
CES was a hit
Sentientx
Sentientx@Sentientx_io·
Core idea: let an LLM actively steer the human during recording to induce structure that passive egocentric video systematically misses.

Most egocentric datasets are heavily biased toward a world where:
• tasks rarely fail
• failures rarely compound
• abandoning a task midway is always acceptable (low stakes)

This creates deceptively “clean” trajectories.
Sentientx
Sentientx@Sentientx_io·
We’re experimenting with LLM-controlled egocentric data collection for world models and robotics learning. The LLM assigns long-horizon objectives, interrupts with immediate sub-tasks, lets the episode drift into side tasks, and forces failure + recovery. This produces intent → action → outcome → correction loops in real time, not post-hoc labels.

Claim: diversity-aware prompting can fill data gaps passive video never will, without brute-force scale. Curious whether this actually helps world models learn controllable, general dynamics. 🧵
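The director loop described above can be sketched as a tiny state machine. Everything here is hypothetical scaffolding: `llm_policy` is a stub standing in for a real LLM call, and the directive names are invented labels for the four behaviors named in the post (long-horizon objective, sub-task interruption, side-task drift, forced failure + recovery).

```python
import random

# Directive types mirroring the thread's four intervention modes.
DIRECTIVES = ["long_horizon", "interrupt_subtask", "side_task", "force_failure"]

def llm_policy(history, rng):
    """Stub for an LLM director: a diversity-aware policy that biases
    toward directive types not yet covered in this episode."""
    seen = {d for d, _ in history}
    missing = [d for d in DIRECTIVES if d not in seen]
    return rng.choice(missing or DIRECTIVES)

def run_episode(steps=8, seed=0):
    rng = random.Random(seed)
    log = []  # (directive, outcome) pairs: the intent → action → outcome loop
    for _ in range(steps):
        directive = llm_policy(log, rng)
        # Simulated outcome. A forced failure is always followed by a
        # recovery step, producing the failure → correction structure
        # that passively recorded egocentric video rarely contains.
        outcome = "failed" if directive == "force_failure" else "done"
        log.append((directive, outcome))
        if outcome == "failed":
            log.append(("recover", "done"))
    return log

episode = run_episode()
print(episode[:3])
```

In a real recording session the "outcome" would come from the human wearer (or a perception model) rather than being simulated, and the LLM would see the full interaction transcript, but the control flow is the same.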
Sentientx
Sentientx@Sentientx_io·
The hoodie was more impactful than a booth