Sparsh Garg
@_sparshgarg_

86 posts

MLE Perception @ Lucid Motors | 3D Perception Researcher @ Bosch Center for AI | CMU Robotics

Newark, CA · Joined October 2023
1.1K Following · 136 Followers

Pinned Tweet
Sparsh Garg @_sparshgarg_
1/ Super excited to announce that our work, Depth Any Camera (DAC) has been accepted to CVPR 2025! A zero-shot metric monocular depth estimation framework that generalizes seamlessly to different camera types – no retraining needed! 📸 @Bosch_AI @BoschGlobal 🧵👇
Replies: 1 · Reposts: 1 · Likes: 12 · Views: 900
Sparsh Garg retweeted
Skild AI @SkildAI
Announcing Series C. We’ve raised $1.4B, valuing the company at over $14B. With this capital, we will accelerate our mission to build omni-bodied intelligence 🚀 skild.ai/blogs/series-c
Replies: 25 · Reposts: 73 · Likes: 596 · Views: 343.7K
Sparsh Garg retweeted
Skild AI @SkildAI
Humans learn by watching. Robots should too.
Replies: 32 · Reposts: 134 · Likes: 878 · Views: 1.1M
Sparsh Garg retweeted
AI at Meta @AIatMeta
🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more: go.meta.me/568e5d
Replies: 410 · Reposts: 920 · Likes: 6.4K · Views: 1.2M
Sparsh Garg retweeted
Songming Liu @songming_liu
😠💢😵‍💫 Tired of endless data collection & fine-tuning every time you try out a VLA? Meet RDT2, the first foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions. No collection. No tuning. Just plug and play 🚀 Witness a clear sign of embodied superintelligence:
- 7B one-step diffusion → 23 Hz inference ⚡
- Re-designed UMI @chichengcc @SongShuran and manufactured 100 portable devices
- Trained on 10K hours of UMI data from 100 real houses
- Zero-shot: pick, place, press, wipe… open-vocabulary
- Demos: blocks 30 m/s arrows in 500 ms 🛡️; first to play ping-pong with an end-to-end model 🏓; extinguishes burning incense by shaking quickly 🥢
Fully open source at github.com/thu-ml/RDT2
Project page: rdt-robotics.github.io/rdt2/
Thanks to awesome collaborators @bang_guo96535 @D0g4M74794 @EthanNg51931527
Replies: 86 · Reposts: 101 · Likes: 580 · Views: 100.7K
Sparsh Garg retweeted
Skild AI @SkildAI
We built a robot brain that nothing can stop. Shattered limbs? Jammed motors? If the bot can move, the Brain will move it, even if it’s an entirely new robot body. Meet the omni-bodied Skild Brain:
Replies: 506 · Reposts: 888 · Likes: 6.8K · Views: 2.4M
ICLR @iclr_conf
We’ve received A LOT OF submissions this year 🤯🤯 and are excited to see so much interest! To ensure high-quality review, we are looking for more dedicated reviewers. If you'd like to help, please sign up here docs.google.com/forms/d/e/1FAI…
Replies: 12 · Reposts: 71 · Likes: 373 · Views: 100.9K
Sparsh Garg retweeted
Lukas Ziegler @lukas_m_ziegler
A robotic ballet! 🩰

Coordinating multiple robot arms on a busy factory floor is notoriously complex. Each arm needs to move without colliding with its neighbors or the surrounding equipment, and today that planning is still mostly done by hand, a process that takes specialists hundreds of hours.

Researchers at @ucl, @GoogleDeepMind, and @intrinsic have introduced RoboBallet, a new AI system that tackles this challenge head-on. Using a combination of graph neural networks and reinforcement learning, RoboBallet learns how to coordinate many robots in real time, generating smooth, collision-free motion plans in seconds instead of days.

In tests, the system handled up to 40 tasks with eight arms working together, far beyond the limits of traditional planners. More importantly, it showed strong generalization: RoboBallet could adapt instantly to new factory layouts or recover if one robot failed, something previous methods struggled with.

For manufacturers, this could mean faster deployment of automation, less downtime, and the ability to reconfigure production lines on the fly. Applications range from automotive welding to electronics assembly, and even large-scale construction. The current system focuses on reaching tasks, but the team plans to expand it to more complex operations like pick-and-place or painting.

Here's the paper: deepmind.google/research/publi…
Replies: 6 · Reposts: 66 · Likes: 444 · Views: 26K
Sparsh Garg retweeted
Jason Liu @JasonJZLiu
Ever wish a robot could just move to any goal in any environment—avoiding all collisions and reacting in real time? 🚀Excited to share our #CoRL2025 paper, Deep Reactive Policy (DRP), a learning-based motion planner that navigates complex scenes with moving obstacles—directly from point cloud input. w/ @Jiahui_Yang6709 (1/N)
Replies: 21 · Reposts: 161 · Likes: 885 · Views: 71.8K
Stephen James @stepjamUK
I've heard this a lot recently: "We trained our robot on one object and it generalised to a novel object - these new VLA models are crazy!"

Let's talk about what's actually happening in that "A" (Action) part of your VLA model.

The Vision and Language components? They're incredible. Pre-trained on internet-scale data, they understand objects, spatial relationships, and task instructions better than ever. But the Action component? That's still learned from scratch on your specific robot demonstrations.

Here's the reality: Your VLA model has internet-scale understanding of what a screwdriver looks like and what "tighten the screw" means. But the actual motor pattern for "rotating wrist while applying downward pressure"? That comes from your 500 robot demos.

What this means for "generalisation":
• Vision generalisation: Recognises novel objects instantly (thanks to pre-training)
• Language generalisation: Understands new task instructions (thanks to pre-training)
• Action generalisation: Still limited to motor patterns seen during robot training

Ask that same robot to "unscrew the bottle cap" and it fails because:
• Vision: Recognises bottle and cap
• Language: Understands "unscrew"
• Action: Never learned the "twist while pulling" motor pattern

The hard truth about VLA models: The "VL" gives you incredible zero-shot understanding. The "A" still requires task-specific demonstrations. We've cracked the perception and reasoning problem. We haven't cracked the motor generalisation problem.
Replies: 12 · Reposts: 51 · Likes: 390 · Views: 51.3K
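The split described in the thread above is easy to see in code. Below is a toy PyTorch sketch, not any specific VLA architecture (all module names and sizes are made-up placeholders): the pre-trained vision and language towers are frozen, while the action head is a small network trained from scratch on robot demonstrations.

```python
# Toy sketch of the VLA split discussed above (illustrative, not a real model):
# "V" and "L" are pre-trained and frozen; "A" is learned only from robot demos.
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module,
                 embed_dim: int = 512, action_dim: int = 7):
        super().__init__()
        self.vision_encoder = vision_encoder  # internet-scale pre-training lives here
        self.text_encoder = text_encoder
        for p in self.vision_encoder.parameters():
            p.requires_grad = False  # frozen: recognises novel objects for free
        for p in self.text_encoder.parameters():
            p.requires_grad = False  # frozen: understands novel instructions for free
        # The "A": trained from scratch on your demos -> the generalisation bottleneck.
        self.action_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.vision_encoder(image),
                       self.text_encoder(instruction)], dim=-1)
        return self.action_head(z)  # e.g. end-effector deltas + gripper command

# Usage with stand-in linear encoders (real VLAs use large pre-trained towers):
vla = ToyVLA(vision_encoder=nn.Linear(3 * 224 * 224, 512),
             text_encoder=nn.Linear(768, 512))
print(vla(torch.randn(1, 3 * 224 * 224), torch.randn(1, 768)).shape)  # (1, 7)
```

Because only `action_head` ever sees robot data, a motor pattern absent from the demos ("twist while pulling") has nowhere to come from, which is exactly the failure mode the thread describes.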
Sparsh Garg retweeted
Lucid Motors @LucidMotors
Rugged by design. Elevated by nature. The #LucidGravityX concept redefines what a trail-ready adventure vehicle could be. Read more about our new bold concept: bit.ly/46Yu886
Replies: 80 · Reposts: 127 · Likes: 968 · Views: 143.7K
Sparsh Garg retweeted
Skild AI @SkildAI
We’ve all seen humanoid robots doing backflips and dance routines for years. But if you ask them to climb a few stairs in the real world, they stumble! We took our robot on a walk around town to environments that it hadn’t seen before. Here’s how it works🧵⬇️
Replies: 38 · Reposts: 142 · Likes: 812 · Views: 270.2K
Sparsh Garg retweeted
Deepak Pathak @pathak2206
AI that truly understands the physical world should not be limited by robot type or tasks. We tackle robotics in its full generality @SkildAI. The goal is to build a continually improving, omni-bodied brain that can control any hardware for any task. x.com/SkildAI/status…
Replies: 5 · Reposts: 8 · Likes: 65 · Views: 6.9K
Sparsh Garg retweeted
Russ Tedrake @RussTedrake
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the technology, and to share a lot of details for how we're achieving it. youtube.com/watch?v=BEXFnr…
Replies: 8 · Reposts: 105 · Likes: 487 · Views: 87.5K
Sparsh Garg retweeted
Shalev @Shalev_lif
The neural network objective function is a very complicated objective function. It's very non-convex, and there are no mathematical guarantees whatsoever about its success. And so if you were to speak to somebody who studies optimization from a theoretical point of view, they would tell you that there is no theoretical reason to believe that the optimization will succeed. And yet it does. And this is an empirical fact. -- Ilya Sutskever in 2015

Mastering deep learning = gaining intuition into why this in fact succeeds.
Replies: 30 · Reposts: 122 · Likes: 1K · Views: 204.3K
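The "empirical fact" in the quote above is easy to reproduce at toy scale. The sketch below (hyperparameters are arbitrary choices) trains a two-layer network on XOR, a textbook non-convex problem, with plain SGD; no convexity guarantee applies, yet the loss reliably drops to near zero.

```python
# Non-convex optimization succeeding in practice: fit XOR with plain SGD.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR: not linearly separable

net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.5)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.5f}")  # typically ~0: the optimization succeeded
```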
Sparsh Garg retweeted
Inbar Mosseri @inbar_mosseri
Excited to share that TokenVerse won Best Paper Award at SIGGRAPH 2025! 🎉 TokenVerse enables personalization of complex visual concepts, from objects and materials to poses and lighting, each of which can be extracted from a single image and recomposed into a coherent result. 👇
Replies: 9 · Reposts: 22 · Likes: 211 · Views: 16.2K
Sparsh Garg retweeted
Ksenia_TuringPost @TheTuringPost
Log-linear attention: a new type of attention proposed by @MIT which is
- as fast and efficient as linear attention
- as expressive as softmax
It uses a small but growing number of memory slots that grows logarithmically with the sequence length. Here's how it works:
Replies: 12 · Reposts: 213 · Likes: 1.4K · Views: 103.9K
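The mechanism hinted at in the tweet above can be sketched in a few lines. The code below is an illustrative reconstruction, not the paper's exact algorithm: it keeps one linear-attention summary state per power-of-two-sized chunk of the past (a Fenwick-tree-style decomposition), so only O(log T) states exist at any time. The feature map `phi` and the uniform mixing of bucket reads are simplifying assumptions; the paper's expressivity comes from learned per-level weights.

```python
# Illustrative log-linear attention: O(log T) memory slots instead of a KV cache.
import numpy as np

def phi(x):
    # Positive feature map (an assumption; linear-attention variants differ).
    return np.maximum(x, 0.0) + 1e-6

def log_linear_attention(Q, K, V):
    T, d_v = V.shape
    out = np.zeros_like(V)
    # Each bucket (size, S, z) summarizes `size` past tokens with the usual
    # linear-attention statistics: S = sum outer(phi(k), v), z = sum phi(k).
    buckets = []
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        # Write: add the current token as a size-1 bucket, then merge buckets
        # of equal size (binary-counter style) so at most one bucket per power
        # of two survives -> len(buckets) <= log2(t) + 1.
        S_new, z_new, size = np.outer(k, v), k.copy(), 1
        while buckets and buckets[-1][0] == size:
            _, S_old, z_old = buckets.pop()
            S_new, z_new, size = S_new + S_old, z_new + z_old, size * 2
        buckets.append((size, S_new, z_new))
        # Read: combine the O(log t) summaries. Uniform weights here; learned
        # per-level weights are what recover softmax-like expressivity.
        num, den = np.zeros(d_v), 1e-9
        for _, S, z in buckets:
            num += q @ S
            den += q @ z
        out[t] = num / den
    return out

# 1000 tokens, but only ~log2(1000) ≈ 10 summary states alive at any step.
Q, K, V = (np.random.randn(1000, 16) for _ in range(3))
print(log_linear_attention(Q, K, V).shape)  # (1000, 16)
```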
Sparsh Garg retweeted
Yuan Liu @YuanLiu41955461
I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: igl-hkust.github.io/das/ Github: github.com/IGL-HKUST/Diff…
Replies: 4 · Reposts: 79 · Likes: 321 · Views: 33.4K
Sparsh Garg retweeted
Ayush Jain @ayushjain1144
We move our eyes actively—driven by survival and efficiency—but we still don’t fully understand how. That makes supervised learning hard. In our new work, we explore how to train VLMs to reason visually using RL. ViGoRL offers a glimpse into how models like o3 might be trained.
Gabriel Sarch @GabrielSarch

How can we get VLMs to move their eyes—and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Replies: 0 · Reposts: 1 · Likes: 11 · Views: 556
Yuliang Guo @33yuliangguo
Honored to serve as co-chair of the “Robot Mapping 2” session at #ICRA2025. Truly rewarding to connect with leading researchers and exchange insights on the past, present, and future of robotics and AI. Loved the poster session right after the talks: a great setup for deep interaction with authors and even session chairs!
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 128