Intern Robotics

48 posts

@InternRobotics

Building inclusive infrastructure for Embodied AI, from Shanghai AI Lab. GitHub: https://t.co/vJITgYwCWS Website: https://t.co/bIOl4hc668

Joined July 2025
37 Following · 131 Followers
Pinned Tweet
Intern Robotics @InternRobotics
InternRobotics is open-source! 🚀 A Sim-Data-Train/Eval inclusive engine for Embodied AI:
⚙️ 1-line sim deploy
📦 Massive hybrid datasets
🧠 One-click training & eval across 50+ models
🔗 Click to explore: github.com/InternRobotics
Intern Robotics retweeted
ModelScope @ModelScope2022
🤖 Introducing InternVLA-A1, now fully open-sourced!
Many VLA models follow instructions well in static scenes but struggle in dynamic environments (conveyor belts, rotating platforms, multi-robot setups). Why? They see the present, but can't imagine the future.
InternVLA-A1's solution: unify perception, imagination, and action in one model:
✅ Scene understanding: image + text → task parsing
✅ Task imagination: predict future frames → reason about dynamics
✅ Guided control: execute actions steered by visual foresight
Powered by InternData-A1, a large-scale, high-quality simulated dataset, InternVLA-A1 stays robust under complex backgrounds, lighting, and distractions.
🔥 See it in action:
1️⃣ High-speed conveyor: track, predict, and stably grasp or flip packages
2️⃣ Rotating platform: task-aware recognition & precise pick-up of diverse items
📊 Outperforms π0 and GR00T N1.5 on general manipulation benchmarks!
✨ Model, data, and code are all open!
Models: modelscope.cn/models/InternR…
Datasets: modelscope.cn/datasets/Inter…
GitHub: github.com/InternRobotics…
Intern Robotics @InternRobotics
Meet InternVLA-A1 🤖✨ It unifies scene understanding, visual foresight, and action execution in a single framework.
🧠 The core: it couples an MLLM's semantic understanding with world-model-style dynamics prediction, letting the model "imagine" the future and guide adaptive actions.
🚀 The fuel: high-fidelity synthetic data (InternData-A1).
The result? A VLA model that handles highly dynamic scenarios with ease.
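
As a reading aid only, here is a toy PyTorch sketch of the perceive → imagine → act wiring described in these posts. Every module name, size, and the GRU standing in for the world model is invented for illustration; this is not the released InternVLA-A1 implementation. The point is the data flow: the action head consumes both the parsed scene context and a summary of imagined future latents, not the current frame alone.

```python
# Conceptual sketch only: module names, sizes, and the GRU world-model
# stand-in are invented for illustration and are NOT the released
# InternVLA-A1 code. It shows the perceive -> imagine -> act data flow,
# where action decoding is conditioned on imagined future latents.
import torch
import torch.nn as nn

class PerceiveImagineAct(nn.Module):
    def __init__(self, d=512, horizon=4, act_dim=7):
        super().__init__()
        self.perceive = nn.Linear(2 * d, d)             # stand-in for MLLM fusion of image + text features
        self.imagine = nn.GRU(d, d, batch_first=True)   # stand-in for world-model rollout of future latents
        self.act = nn.Linear(2 * d, act_dim)            # action head steered by the imagined future
        self.horizon = horizon

    def forward(self, img_feat, txt_feat):
        ctx = torch.tanh(self.perceive(torch.cat([img_feat, txt_feat], dim=-1)))  # scene + task parsing
        rollout, _ = self.imagine(ctx.unsqueeze(1).repeat(1, self.horizon, 1))    # "imagine" future frames in latent space
        foresight = rollout[:, -1]                                                # summary of predicted dynamics
        return self.act(torch.cat([ctx, foresight], dim=-1))                      # action guided by visual foresight

model = PerceiveImagineAct()
print(model(torch.randn(2, 512), torch.randn(2, 512)).shape)  # torch.Size([2, 7])
```
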
Intern Robotics @InternRobotics
Just a camera. No IMU. Low-cost setup, and it's open-sourced. 🚀
👉 Project (more demos): steinate.github.io/logoplanner.gi…
📄 Paper (please upvote): huggingface.co/papers/2512.19…
💻 Code + deployment (please star): github.com/steinate/NavDP…
Wenzhe Cai @WenzheC7616

🤖 Can robots achieve accurate navigation without any external localization feedback?
📸 We present #LoGoPlanner, which handles perception, localization, and planning in one go! Check our results on LeKiWi, G1, and Go2 robots.
🌐 Project: steinate.github.io/logoplanner.gi…
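
To make "perception, localization, and planning in one go" with only a monocular camera concrete, here is a toy sketch with invented architecture and sizes; it is not the released LoGoPlanner/NavDP code. The idea it illustrates: a recurrent hidden state carries implicit ego-motion across frames, so no IMU or external localization signal ever enters the model.

```python
# Toy sketch (invented architecture and sizes, NOT the released LoGoPlanner /
# NavDP code): perception, localization, and planning in one network.
# A recurrent hidden state carries implicit ego-motion across monocular
# frames, so no IMU or external localization feedback enters the model.
import torch
import torch.nn as nn

class MonocularNavPolicy(nn.Module):
    def __init__(self, d=256, n_waypoints=8):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.perceive = nn.Sequential(                   # perception: encode one RGB frame
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d),
        )
        self.localize = nn.GRUCell(d, d)                 # localization: implicit ego-state, updated per frame
        self.plan = nn.Linear(d + 3, n_waypoints * 2)    # planning: goal (x, y, yaw) -> local (x, y) waypoints

    def forward(self, frame, goal, state=None):
        feat = self.perceive(frame)
        state = self.localize(feat, state)               # only vision updates the state; no IMU, no odometry
        waypoints = self.plan(torch.cat([state, goal], dim=-1))
        return waypoints.view(-1, self.n_waypoints, 2), state

policy = MonocularNavPolicy()
state, goal = None, torch.tensor([[2.0, 0.0, 0.0]])      # goal: 2 m ahead in the start frame
for frame in torch.randn(5, 1, 3, 224, 224):             # a short monocular stream
    waypoints, state = policy(frame, goal, state)
print(waypoints.shape)  # torch.Size([1, 8, 2])
```
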

Intern Robotics @InternRobotics
Great release! Gallant demonstrates a clean voxel-grid pipeline for perceptive humanoid locomotion — unified policy, strong generalization across stairs, gaps, stepping stones, and cluttered spaces.
Elgce @BenQingwei

Introducing Gallant: Voxel Grid-based Humanoid Locomotion and Local-navigation across 3D Constrained Terrains 🤖
Project page: gallantloco.github.io
Arxiv: arxiv.org/abs/2511.14625
Gallant is, to our knowledge, the first system to run a single policy on a humanoid robot that handles full-space constraints, including ground-level barriers, lateral clutter, and overhead obstacles.
Instead of elevation maps or depth cameras, Gallant uses a voxel grid built directly from raw LiDAR as its perception representation, giving it inherent 3D coverage of the scene. With our custom LiDAR simulation toolkit (github.com/agent-3154/sim…), we model realistic scans, including returns from the robot's own moving links, which is crucial for sim-to-real transfer.
On the control side, we use a target-based training scheme rather than standard velocity tracking. The robot is given a goal and learns to discover its own in-path velocities and trajectories, so no external high-frequency command stream is needed during deployment.
The policy itself is intentionally lightweight: just a 3-layer CNN + 3-layer MLP (~0.3M params), running onboard on the Unitree G1's Orin NX at 50 Hz with no extra compute. Training takes about 6 hours on 8× NVIDIA RTX 4090 GPUs. The resulting policy transfers directly to the real robot and achieves a >90% success rate on most tested terrain types.
Gallant is our "half-way" step toward robust perceptive locomotion, a problem we believe remains fundamental for humanoid robots. We're now working toward closing the gap to near-100% reliability and expanding the pipeline further. Code will be fully released soon. Discussion, feedback, and collaboration are very welcome! 🙌
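
The thread pins down the controller's scale quite precisely (3-layer CNN + 3-layer MLP, roughly 0.3M parameters). Below is a minimal PyTorch sketch of a policy at that scale; the voxel-grid resolution, channel widths, and observation/action dimensions are guesses for illustration, not Gallant's released architecture. Instantiating it gives on the order of 0.33M parameters, consistent with the figure quoted above.

```python
# Illustrative only: NOT the released Gallant code. A voxel-grid locomotion
# policy at roughly the scale described above (3-layer 3D CNN + 3-layer MLP,
# ~0.3M parameters). Grid resolution, channel widths, and the proprioception /
# goal / action dimensions are assumptions.
import torch
import torch.nn as nn

class VoxelLocoPolicy(nn.Module):
    def __init__(self, grid=(32, 32, 16), proprio_dim=48, goal_dim=3, act_dim=29):
        super().__init__()
        # 3-layer 3D CNN over the LiDAR-derived occupancy voxel grid
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 1, *grid)).shape[1]
        # 3-layer MLP: voxel features + proprioception + goal -> joint targets
        # (target-based: the policy sees a goal, not a velocity command stream)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + proprio_dim + goal_dim, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, voxels, proprio, goal):
        z = self.encoder(voxels)
        return self.head(torch.cat([z, proprio, goal], dim=-1))

policy = VoxelLocoPolicy()
print(sum(p.numel() for p in policy.parameters()))  # ~0.33M, in line with the figure quoted above
```
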

Intern Robotics @InternRobotics
🎉 IROS 2025 Workshop & Challenge Highlights
On Oct 20, the Workshop on Multimodal Robot Learning in Physical Worlds, hosted by Shanghai AI Lab, successfully concluded at #IROS2025.
💡 The event gathered experts from UC Berkeley, MIT, Stanford, Tsinghua, Zhejiang University, and ShanghaiTech to explore interactive and generalizable multimodal robot learning that bridges simulation and the real world.
📺 Full talk replays are now live on the official website; check them out!
🔗 internrobotics.shlab.org.cn/workshop/2025/
#embodiedai #AIResearch
Intern Robotics @InternRobotics
🤖 Go from a task like "set the table" to a complete 3D tabletop scene, ready for robot simulation. Meet MesaTask 🚀 [NeurIPS 2025 Spotlight]
✨ 10K+ physics-verified tabletop scenes
✨ 12K+ curated 3D assets
✨ Outperforms baselines in alignment, realism & physicality
All data & code are OPEN: try it now ⚡
🔗 Dataset: huggingface.co/datasets/Inter…
🌐 Project: mesatask.github.io
💻 Code: github.com/InternRobotics…
📄 Paper: arxiv.org/abs/2509.22281
#NeurIPS2025 #AI #Robotics #EmbodiedAI #Dataset #OpenSource
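
For a concrete sense of what "task → physics-verified tabletop scene" can mean, here is a hypothetical, schema-agnostic sketch; the field names and checks are invented, and the real schema is defined by the linked dataset and code. It treats a scene as a list of posed assets and screens it with cheap geometric checks of the kind a physics-verified dataset needs before full simulation.

```python
# Hypothetical illustration of "task -> physics-verified tabletop scene";
# the actual MesaTask schema and checks live in the linked dataset and code.
# Field names and thresholds below are invented.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class PlacedAsset:
    name: str
    x: float        # meters, table frame
    y: float
    yaw: float      # radians
    radius: float   # coarse footprint radius used for overlap checks

def physically_plausible(scene, table_half_x=0.6, table_half_y=0.4):
    """Cheap geometric screening: every object rests within the table
    bounds and no two footprints interpenetrate."""
    on_table = all(abs(a.x) + a.radius <= table_half_x and
                   abs(a.y) + a.radius <= table_half_y for a in scene)
    no_overlap = all(((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 >= a.radius + b.radius
                     for a, b in combinations(scene, 2))
    return on_table and no_overlap

# A "set the table" instruction rendered as a scene record:
scene = [
    PlacedAsset("plate", 0.00, 0.00, 0.0, 0.12),
    PlacedAsset("fork", -0.20, 0.00, 1.57, 0.02),
    PlacedAsset("knife", 0.20, 0.00, 1.57, 0.02),
    PlacedAsset("glass", 0.00, 0.22, 0.0, 0.04),
]
print(physically_plausible(scene))  # True
```
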
Intern Robotics @InternRobotics
🤖 Behavior Foundation Model (BFM) for Humanoid Robots #robotics #embodiedai
We are excited to re-introduce our Behavior Foundation Model for Humanoid Robots, built on a unified perspective of diverse whole-body control (WBC) tasks: a promising step toward a foundation model for general humanoid control.
🌐 Website: bfm4humanoid.github.io
📄 Paper: arxiv.org/abs/2509.13780
Intern Robotics @InternRobotics
Shanghai AI Laboratory has launched InternVLA·A1, the first integrated "embodied manipulation model" capable of understanding, imagining, and executing ❗️ Real-world evaluations show it significantly outperforms π0 and GR00T N1.5, demonstrating strong adaptability in highly dynamic scenarios 🌟.
The model has been adapted to multiple robotic platforms, including Ark Infinity, the Guodi Qinglong humanoid robot, Zhiyuan Genie, AgileX (松灵), and Franka 🦾, enabling users to quickly adapt it to new environments and tasks.
With the open-source release of InternVLA·A1, the AI Lab has shared the complete technical framework for embodied intelligence's "thinking-acting-self-learning" closed loop:
InternVLA·M1 serves as the "brain," responsible for spatial reasoning and task planning;
InternVLA·A1 acts as the "cerebellum," enabling agile and precise motion execution;
The general reward model VLAC improves reinforcement-learning efficiency in real-world applications.
At 7:30 PM on September 19 (this Friday), Shanghai AI Laboratory will host the second live session of Open Source Week together with multiple industry experts, offering an in-depth analysis of the related technologies. Reserve your spot and join the broadcast.