Intern Robotics

48 posts

@InternRobotics

Building inclusive infrastructure for Embodied AI, from Shanghai AI Lab. GitHub: https://t.co/vJITgYwCWS Website: https://t.co/bIOl4hc668

Joined July 2025
37 Following · 131 Followers
Pinned Tweet
Intern Robotics @InternRobotics
InternRobotics is open-source! 🚀 A Sim-Data-Train/Eval inclusive engine for Embodied AI:
⚙️ 1-line sim deploy
📦 Massive hybrid datasets
🧠 One-click training & eval across 50+ models
🔗 Click to explore: github.com/InternRobotics
Intern Robotics retweeted
ModelScope @ModelScope2022
🤖 Introducing InternVLA-A1, now fully open-sourced!
Many VLA models follow instructions well in static scenes but struggle in dynamic environments (conveyor belts, rotating platforms, multi-robot setups). Why? They see the present, but can't imagine the future.
InternVLA-A1's solution: unify perception, imagination, and action in one model:
✅ Scene understanding: image + text → task parsing
✅ Task imagination: predict future frames → reason about dynamics
✅ Guided control: execute actions steered by visual foresight
Powered by InternData-A1, a large-scale, high-quality simulated dataset, InternVLA-A1 stays robust under complex backgrounds, lighting, and distractions.
🔥 See it in action:
1️⃣ High-speed conveyor: track, predict, and stably grasp or flip packages
2️⃣ Rotating platform: task-aware recognition & precise pick-up of diverse items
📊 Outperforms π0 and GR00T N1.5 on general manipulation benchmarks!
✨ Model, data, and code are all open!
Models: modelscope.cn/models/InternR…
Datasets: modelscope.cn/datasets/Inter…
GitHub: github.com/InternRobotics…
Intern Robotics @InternRobotics
Meet InternVLA-A1 🤖✨ It unifies scene understanding, visual foresight, and action execution in a single framework.
🧠 The core: it couples an MLLM's semantic understanding with world-model-style dynamics prediction, letting the model "imagine" the future and guide adaptive actions.
🚀 The fuel: high-fidelity synthetic data (InternData-A1).
The result? A VLA model that handles highly dynamic scenarios with ease.
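
As a reading aid only, here is a toy PyTorch sketch of the perceive → imagine → act wiring described in these posts. Every module name, size, and the GRU standing in for the world model is invented for illustration; this is not the released InternVLA-A1 implementation. The point is the data flow: the action head consumes both the parsed scene context and a summary of imagined future latents, not the current frame alone.

```python
# Conceptual sketch only: module names, sizes, and the GRU world-model
# stand-in are invented for illustration and are NOT the released
# InternVLA-A1 code. It shows the perceive -> imagine -> act data flow,
# where action decoding is conditioned on imagined future latents.
import torch
import torch.nn as nn

class PerceiveImagineAct(nn.Module):
    def __init__(self, d=512, horizon=4, act_dim=7):
        super().__init__()
        self.perceive = nn.Linear(2 * d, d)             # stand-in for MLLM fusion of image + text features
        self.imagine = nn.GRU(d, d, batch_first=True)   # stand-in for world-model rollout of future latents
        self.act = nn.Linear(2 * d, act_dim)            # action head steered by the imagined future
        self.horizon = horizon

    def forward(self, img_feat, txt_feat):
        ctx = torch.tanh(self.perceive(torch.cat([img_feat, txt_feat], dim=-1)))  # scene + task parsing
        rollout, _ = self.imagine(ctx.unsqueeze(1).repeat(1, self.horizon, 1))    # "imagine" future frames in latent space
        foresight = rollout[:, -1]                                                # summary of predicted dynamics
        return self.act(torch.cat([ctx, foresight], dim=-1))                      # action guided by visual foresight

model = PerceiveImagineAct()
print(model(torch.randn(2, 512), torch.randn(2, 512)).shape)  # torch.Size([2, 7])
```
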
Intern Robotics @InternRobotics
Just a camera. No IMU. Low-cost setup, and it's open-sourced. 🚀
👉 Project (more demos): steinate.github.io/logoplanner.gi…
📄 Paper (please upvote): huggingface.co/papers/2512.19…
💻 Code + deployment (please star): github.com/steinate/NavDP…
Wenzhe Cai @WenzheC7616

🤖 Can robots achieve accurate navigation without any external localization feedback?
📸 We present #LoGoPlanner, which handles perception, localization, and planning in one go! Check our results on LeKiWi, G1, and Go2 robots.
🌐 Project: steinate.github.io/logoplanner.gi…
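
To make "perception, localization, and planning in one go" with only a monocular camera concrete, here is a toy sketch with invented architecture and sizes; it is not the released LoGoPlanner/NavDP code. The idea it illustrates: a recurrent hidden state carries implicit ego-motion across frames, so no IMU or external localization signal ever enters the model.

```python
# Toy sketch (invented architecture and sizes, NOT the released LoGoPlanner /
# NavDP code): perception, localization, and planning in one network.
# A recurrent hidden state carries implicit ego-motion across monocular
# frames, so no IMU or external localization feedback enters the model.
import torch
import torch.nn as nn

class MonocularNavPolicy(nn.Module):
    def __init__(self, d=256, n_waypoints=8):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.perceive = nn.Sequential(                   # perception: encode one RGB frame
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d),
        )
        self.localize = nn.GRUCell(d, d)                 # localization: implicit ego-state, updated per frame
        self.plan = nn.Linear(d + 3, n_waypoints * 2)    # planning: goal (x, y, yaw) -> local (x, y) waypoints

    def forward(self, frame, goal, state=None):
        feat = self.perceive(frame)
        state = self.localize(feat, state)               # only vision updates the state; no IMU, no odometry
        waypoints = self.plan(torch.cat([state, goal], dim=-1))
        return waypoints.view(-1, self.n_waypoints, 2), state

policy = MonocularNavPolicy()
state, goal = None, torch.tensor([[2.0, 0.0, 0.0]])      # goal: 2 m ahead in the start frame
for frame in torch.randn(5, 1, 3, 224, 224):             # a short monocular stream
    waypoints, state = policy(frame, goal, state)
print(waypoints.shape)  # torch.Size([1, 8, 2])
```
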

Intern Robotics @InternRobotics
Great release! Gallant demonstrates a clean voxel-grid pipeline for perceptive humanoid locomotion — unified policy, strong generalization across stairs, gaps, stepping stones, and cluttered spaces.
Elgce @BenQingwei

Introducing Gallant: Voxel Grid-based Humanoid Locomotion and Local-navigation across 3D Constrained Terrains 🤖
Project page: gallantloco.github.io
Arxiv: arxiv.org/abs/2511.14625
Gallant is, to our knowledge, the first system to run a single policy on a humanoid robot that handles full-space constraints, including ground-level barriers, lateral clutter, and overhead obstacles.
Instead of elevation maps or depth cameras, Gallant uses a voxel grid built directly from raw LiDAR as its perception representation, giving it inherent 3D coverage of the scene. With our custom LiDAR simulation toolkit (github.com/agent-3154/sim…), we model realistic scans, including returns from the robot's own moving links, which is crucial for sim-to-real transfer.
On the control side, we use a target-based training scheme rather than standard velocity tracking. The robot is given a goal and learns to discover its own in-path velocities and trajectories, so no external high-frequency command stream is needed during deployment.
The policy itself is intentionally lightweight: just a 3-layer CNN + 3-layer MLP (~0.3M params), running onboard on the Unitree G1's Orin NX at 50 Hz with no extra compute. Training takes about 6 hours on 8× NVIDIA RTX 4090 GPUs. The resulting policy transfers directly to the real robot and achieves a >90% success rate on most tested terrain types.
Gallant is our "half-way" step toward robust perceptive locomotion, a problem we believe remains fundamental for humanoid robots. We're now working toward closing the gap to near-100% reliability and expanding the pipeline further. Code will be fully released soon. Discussion, feedback, and collaboration are very welcome! 🙌
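
The thread pins down the controller's scale quite precisely (3-layer CNN + 3-layer MLP, roughly 0.3M parameters). Below is a minimal PyTorch sketch of a policy at that scale; the voxel-grid resolution, channel widths, and observation/action dimensions are guesses for illustration, not Gallant's released architecture. Instantiating it gives on the order of 0.33M parameters, consistent with the figure quoted above.

```python
# Illustrative only: NOT the released Gallant code. A voxel-grid locomotion
# policy at roughly the scale described above (3-layer 3D CNN + 3-layer MLP,
# ~0.3M parameters). Grid resolution, channel widths, and the proprioception /
# goal / action dimensions are assumptions.
import torch
import torch.nn as nn

class VoxelLocoPolicy(nn.Module):
    def __init__(self, grid=(32, 32, 16), proprio_dim=48, goal_dim=3, act_dim=29):
        super().__init__()
        # 3-layer 3D CNN over the LiDAR-derived occupancy voxel grid
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 1, *grid)).shape[1]
        # 3-layer MLP: voxel features + proprioception + goal -> joint targets
        # (target-based: the policy sees a goal, not a velocity command stream)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + proprio_dim + goal_dim, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, voxels, proprio, goal):
        z = self.encoder(voxels)
        return self.head(torch.cat([z, proprio, goal], dim=-1))

policy = VoxelLocoPolicy()
print(sum(p.numel() for p in policy.parameters()))  # ~0.33M, in line with the figure quoted above
```
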

Intern Robotics @InternRobotics
🎉 IROS 2025 Workshop & Challenge Highlights
On Oct 20, the Workshop on Multimodal Robot Learning in Physical Worlds, hosted by Shanghai AI Lab, successfully concluded at #IROS2025.
💡 The event gathered experts from UC Berkeley, MIT, Stanford, Tsinghua, Zhejiang University, and ShanghaiTech to explore interactive and generalizable multimodal robot learning that bridges simulation and the real world.
📺 Full talk replays are now live on the official website; check them out!
🔗 internrobotics.shlab.org.cn/workshop/2025/
#embodiedai #AIResearch
Intern Robotics @InternRobotics
🤖 Go from a task like "set the table" to a complete 3D tabletop scene, ready for robot simulation. Meet MesaTask 🚀 [NeurIPS 2025 Spotlight]
✨ 10K+ physics-verified tabletop scenes
✨ 12K+ curated 3D assets
✨ Outperforms baselines in alignment, realism & physicality
All data & code are OPEN: try it now ⚡
🔗 Dataset: huggingface.co/datasets/Inter…
🌐 Project: mesatask.github.io
💻 Code: github.com/InternRobotics…
📄 Paper: arxiv.org/abs/2509.22281
#NeurIPS2025 #AI #Robotics #EmbodiedAI #Dataset #OpenSource
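
For a concrete sense of what "task → physics-verified tabletop scene" can mean, here is a hypothetical, schema-agnostic sketch; the field names and checks are invented, and the real schema is defined by the linked dataset and code. It treats a scene as a list of posed assets and screens it with cheap geometric checks of the kind a physics-verified dataset needs before full simulation.

```python
# Hypothetical illustration of "task -> physics-verified tabletop scene";
# the actual MesaTask schema and checks live in the linked dataset and code.
# Field names and thresholds below are invented.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class PlacedAsset:
    name: str
    x: float        # meters, table frame
    y: float
    yaw: float      # radians
    radius: float   # coarse footprint radius used for overlap checks

def physically_plausible(scene, table_half_x=0.6, table_half_y=0.4):
    """Cheap geometric screening: every object rests within the table
    bounds and no two footprints interpenetrate."""
    on_table = all(abs(a.x) + a.radius <= table_half_x and
                   abs(a.y) + a.radius <= table_half_y for a in scene)
    no_overlap = all(((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 >= a.radius + b.radius
                     for a, b in combinations(scene, 2))
    return on_table and no_overlap

# A "set the table" instruction rendered as a scene record:
scene = [
    PlacedAsset("plate", 0.00, 0.00, 0.0, 0.12),
    PlacedAsset("fork", -0.20, 0.00, 1.57, 0.02),
    PlacedAsset("knife", 0.20, 0.00, 1.57, 0.02),
    PlacedAsset("glass", 0.00, 0.22, 0.0, 0.04),
]
print(physically_plausible(scene))  # True
```
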
Intern Robotics @InternRobotics
🤖 Behavior Foundation Model (BFM) for Humanoid Robots #robotics #embodiedai
We are excited to re-introduce our Behavior Foundation Model for Humanoid Robots, built on a unified perspective of diverse whole-body control (WBC) tasks: a promising step toward a foundation model for general humanoid control.
🌐 Website: bfm4humanoid.github.io
📄 Paper: arxiv.org/abs/2509.13780
Intern Robotics @InternRobotics
Shanghai AI Laboratory has launched InternVLA·A1, the first integrated "embodied manipulation model" capable of understanding, imagining, and executing ❗️ Real-world evaluations show it significantly outperforms π0 and GR00T N1.5, demonstrating strong adaptability in highly dynamic scenarios 🌟.
The model has been adapted to multiple robotic platforms, including Ark Infinity, the Guodi Qinglong humanoid robot, Zhiyuan Genie, AgileX (松灵), and Franka 🦾, enabling users to quickly adapt it to new environments and tasks.
With the open-source release of InternVLA·A1, the AI Lab has shared the complete technical framework for embodied intelligence's "thinking-acting-self-learning" closed loop:
InternVLA·M1 serves as the "brain," responsible for spatial reasoning and task planning;
InternVLA·A1 acts as the "cerebellum," enabling agile and precise motion execution;
The general reward model VLAC improves reinforcement-learning efficiency in real-world applications.
At 7:30 PM on September 19 (this Friday), Shanghai AI Laboratory will host the second live session of Open Source Week together with multiple industry experts, offering an in-depth analysis of the related technologies. Reserve your spot and join the broadcast.