Sparsh Garg
@_sparshgarg_

86 posts

MLE Perception @ Lucid Motors | 3D Perception Researcher @ Bosch Center for AI | CMU Robotics

Newark, CA · Joined October 2023
1.1K Following · 136 Followers

Pinned Tweet
Sparsh Garg @_sparshgarg_
1/ Super excited to announce that our work, Depth Any Camera (DAC) has been accepted to CVPR 2025! A zero-shot metric monocular depth estimation framework that generalizes seamlessly to different camera types – no retraining needed! 📸 @Bosch_AI @BoschGlobal 🧵👇
Replies: 1 · Reposts: 1 · Likes: 12 · Views: 900
Sparsh Garg retweeted
Skild AI @SkildAI
Announcing Series C. We’ve raised $1.4B, valuing the company at over $14B. With this capital, we will accelerate our mission to build omni-bodied intelligence 🚀 skild.ai/blogs/series-c
Replies: 25 · Reposts: 73 · Likes: 596 · Views: 343.7K
Sparsh Garg retweeted
Skild AI @SkildAI
Humans learn by watching. Robots should too.
Replies: 32 · Reposts: 134 · Likes: 878 · Views: 1.1M
Sparsh Garg retweeted
AI at Meta @AIatMeta
🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more: go.meta.me/568e5d
Replies: 410 · Reposts: 920 · Likes: 6.4K · Views: 1.2M
Sparsh Garg retweeted
Songming Liu @songming_liu
😠💢😵‍💫 Tired of endless data collection & fine-tuning every time you try out a VLA? Meet RDT2, the first foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions. No collection. No tuning. Just plug and play 🚀 Witness a clear sign of embodied superintelligence:
- 7B one-step diffusion → 23 Hz inference ⚡
- Re-designed UMI @chichengcc @SongShuran and manufactured 100 portable devices
- Trained on 10K hours of UMI data from 100 real houses
- Zero-shot: pick, place, press, wipe… open-vocabulary
- Demos: blocks 30 m/s arrows in 500 ms 🛡️; first to play ping-pong with an end-to-end model 🏓; extinguishes burning incense by shaking quickly 🥢
Fully open source at github.com/thu-ml/RDT2
Project page: rdt-robotics.github.io/rdt2/
Thanks to awesome collaborators @bang_guo96535 @D0g4M74794 @EthanNg51931527
Replies: 86 · Reposts: 101 · Likes: 580 · Views: 100.7K
Sparsh Garg retweeted
Skild AI @SkildAI
We built a robot brain that nothing can stop. Shattered limbs? Jammed motors? If the bot can move, the Brain will move it, even if it’s an entirely new robot body. Meet the omni-bodied Skild Brain:
Replies: 506 · Reposts: 888 · Likes: 6.8K · Views: 2.4M
ICLR @iclr_conf
We’ve received A LOT OF submissions this year 🤯🤯 and are excited to see so much interest! To ensure high-quality review, we are looking for more dedicated reviewers. If you'd like to help, please sign up here docs.google.com/forms/d/e/1FAI…
Replies: 12 · Reposts: 71 · Likes: 373 · Views: 100.9K
Sparsh Garg retweeted
Lukas Ziegler @lukas_m_ziegler
A robotic ballet! 🩰

Coordinating multiple robot arms on a busy factory floor is notoriously complex. Each arm needs to move without colliding with its neighbors or the surrounding equipment, and today that planning is still mostly done by hand, a process that takes specialists hundreds of hours.

Researchers at @ucl, @GoogleDeepMind, and @intrinsic have introduced RoboBallet, a new AI system that tackles this challenge head-on. Using a combination of graph neural networks and reinforcement learning, RoboBallet learns how to coordinate many robots in real time, generating smooth, collision-free motion plans in seconds instead of days.

In tests, the system handled up to 40 tasks with eight arms working together, far beyond the limits of traditional planners. More importantly, it showed strong generalization: RoboBallet could adapt instantly to new factory layouts or recover if one robot failed, something previous methods struggled with.

For manufacturers, this could mean faster deployment of automation, less downtime, and the ability to reconfigure production lines on the fly. Applications range from automotive welding to electronics assembly, and even large-scale construction. The current system focuses on reaching tasks, but the team plans to expand it to more complex operations like pick-and-place or painting.

Here's the paper: deepmind.google/research/publi…
Replies: 6 · Reposts: 66 · Likes: 444 · Views: 26K
Sparsh Garg retweeted
Jason Liu @JasonJZLiu
Ever wish a robot could just move to any goal in any environment—avoiding all collisions and reacting in real time? 🚀Excited to share our #CoRL2025 paper, Deep Reactive Policy (DRP), a learning-based motion planner that navigates complex scenes with moving obstacles—directly from point cloud input. w/ @Jiahui_Yang6709 (1/N)
Replies: 21 · Reposts: 161 · Likes: 885 · Views: 71.8K
Stephen James @stepjamUK
I've heard this a lot recently: "We trained our robot on one object and it generalised to a novel object - these new VLA models are crazy!"

Let's talk about what's actually happening in that "A" (Action) part of your VLA model.

The Vision and Language components? They're incredible. Pre-trained on internet-scale data, they understand objects, spatial relationships, and task instructions better than ever. But the Action component? That's still learned from scratch on your specific robot demonstrations.

Here's the reality: Your VLA model has internet-scale understanding of what a screwdriver looks like and what "tighten the screw" means. But the actual motor pattern for "rotating wrist while applying downward pressure"? That comes from your 500 robot demos.

What this means for "generalisation":
• Vision generalisation: Recognises novel objects instantly (thanks to pre-training)
• Language generalisation: Understands new task instructions (thanks to pre-training)
• Action generalisation: Still limited to motor patterns seen during robot training

Ask that same robot to "unscrew the bottle cap" and it fails because:
• Vision: Recognises bottle and cap
• Language: Understands "unscrew"
• Action: Never learned the "twist while pulling" motor pattern

The hard truth about VLA models: The "VL" gives you incredible zero-shot understanding. The "A" still requires task-specific demonstrations. We've cracked the perception and reasoning problem. We haven't cracked the motor generalisation problem.
Replies: 12 · Reposts: 51 · Likes: 390 · Views: 51.3K
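The split described in the thread above is easy to see in code. Below is a toy PyTorch sketch, not any specific VLA architecture (all module names and sizes are made-up placeholders): the pre-trained vision and language towers are frozen, while the action head is a small network trained from scratch on robot demonstrations.

```python
# Toy sketch of the VLA split discussed above (illustrative, not a real model):
# "V" and "L" are pre-trained and frozen; "A" is learned only from robot demos.
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module,
                 embed_dim: int = 512, action_dim: int = 7):
        super().__init__()
        self.vision_encoder = vision_encoder  # internet-scale pre-training lives here
        self.text_encoder = text_encoder
        for p in self.vision_encoder.parameters():
            p.requires_grad = False  # frozen: recognises novel objects for free
        for p in self.text_encoder.parameters():
            p.requires_grad = False  # frozen: understands novel instructions for free
        # The "A": trained from scratch on your demos -> the generalisation bottleneck.
        self.action_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.vision_encoder(image),
                       self.text_encoder(instruction)], dim=-1)
        return self.action_head(z)  # e.g. end-effector deltas + gripper command

# Usage with stand-in linear encoders (real VLAs use large pre-trained towers):
vla = ToyVLA(vision_encoder=nn.Linear(3 * 224 * 224, 512),
             text_encoder=nn.Linear(768, 512))
print(vla(torch.randn(1, 3 * 224 * 224), torch.randn(1, 768)).shape)  # (1, 7)
```

Because only `action_head` ever sees robot data, a motor pattern absent from the demos ("twist while pulling") has nowhere to come from, which is exactly the failure mode the thread describes.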
Sparsh Garg retweeted
Lucid Motors @LucidMotors
Rugged by design. Elevated by nature. The #LucidGravityX concept redefines what a trail-ready adventure vehicle could be. Read more about our new bold concept: bit.ly/46Yu886
Replies: 80 · Reposts: 127 · Likes: 968 · Views: 143.7K
Sparsh Garg retweeted
Skild AI @SkildAI
We’ve all seen humanoid robots doing backflips and dance routines for years. But if you ask them to climb a few stairs in the real world, they stumble! We took our robot on a walk around town to environments that it hadn’t seen before. Here’s how it works🧵⬇️
Replies: 38 · Reposts: 142 · Likes: 812 · Views: 270.2K
Sparsh Garg retweeted
Deepak Pathak @pathak2206
AI that truly understands the physical world should not be limited by robot type or tasks. We tackle robotics in its full generality @SkildAI. The goal is to build a continually improving, omni-bodied brain that can control any hardware for any task. x.com/SkildAI/status…
Replies: 5 · Reposts: 8 · Likes: 65 · Views: 6.9K
Sparsh Garg retweeted
Russ Tedrake @RussTedrake
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the technology, and to share a lot of details for how we're achieving it. youtube.com/watch?v=BEXFnr…
Replies: 8 · Reposts: 105 · Likes: 487 · Views: 87.5K
Sparsh Garg retweeted
Shalev @Shalev_lif
The neural network objective function is a very complicated objective function. It's very non-convex, and there are no mathematical guarantees whatsoever about its success. And so if you were to speak to somebody who studies optimization from a theoretical point of view, they would tell you that there is no theoretical reason to believe that the optimization will succeed. And yet it does. And this is an empirical fact. -- Ilya Sutskever in 2015

Mastering deep learning = gaining intuition into why this in fact succeeds.
Replies: 30 · Reposts: 122 · Likes: 1K · Views: 204.3K
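The "empirical fact" in the quote above is easy to reproduce at toy scale. The sketch below (hyperparameters are arbitrary choices) trains a two-layer network on XOR, a textbook non-convex problem, with plain SGD; no convexity guarantee applies, yet the loss reliably drops to near zero.

```python
# Non-convex optimization succeeding in practice: fit XOR with plain SGD.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR: not linearly separable

net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.5)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.5f}")  # typically ~0: the optimization succeeded
```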
Sparsh Garg retweeted
Inbar Mosseri @inbar_mosseri
Excited to share that TokenVerse won Best Paper Award at SIGGRAPH 2025! 🎉 TokenVerse enables personalization of complex visual concepts, from objects and materials to poses and lighting, each of which can be extracted from a single image and recomposed into a coherent result. 👇
Replies: 9 · Reposts: 22 · Likes: 211 · Views: 16.2K
Sparsh Garg retweeted
Ksenia_TuringPost @TheTuringPost
Log-linear attention: a new type of attention proposed by @MIT which is
- as fast and efficient as linear attention
- as expressive as softmax
It uses a small but growing number of memory slots that grows logarithmically with the sequence length. Here's how it works:
Replies: 12 · Reposts: 213 · Likes: 1.4K · Views: 103.9K
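The mechanism hinted at in the tweet above can be sketched in a few lines. The code below is an illustrative reconstruction, not the paper's exact algorithm: it keeps one linear-attention summary state per power-of-two-sized chunk of the past (a Fenwick-tree-style decomposition), so only O(log T) states exist at any time. The feature map `phi` and the uniform mixing of bucket reads are simplifying assumptions; the paper's expressivity comes from learned per-level weights.

```python
# Illustrative log-linear attention: O(log T) memory slots instead of a KV cache.
import numpy as np

def phi(x):
    # Positive feature map (an assumption; linear-attention variants differ).
    return np.maximum(x, 0.0) + 1e-6

def log_linear_attention(Q, K, V):
    T, d_v = V.shape
    out = np.zeros_like(V)
    # Each bucket (size, S, z) summarizes `size` past tokens with the usual
    # linear-attention statistics: S = sum outer(phi(k), v), z = sum phi(k).
    buckets = []
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        # Write: add the current token as a size-1 bucket, then merge buckets
        # of equal size (binary-counter style) so at most one bucket per power
        # of two survives -> len(buckets) <= log2(t) + 1.
        S_new, z_new, size = np.outer(k, v), k.copy(), 1
        while buckets and buckets[-1][0] == size:
            _, S_old, z_old = buckets.pop()
            S_new, z_new, size = S_new + S_old, z_new + z_old, size * 2
        buckets.append((size, S_new, z_new))
        # Read: combine the O(log t) summaries. Uniform weights here; learned
        # per-level weights are what recover softmax-like expressivity.
        num, den = np.zeros(d_v), 1e-9
        for _, S, z in buckets:
            num += q @ S
            den += q @ z
        out[t] = num / den
    return out

# 1000 tokens, but only ~log2(1000) ≈ 10 summary states alive at any step.
Q, K, V = (np.random.randn(1000, 16) for _ in range(3))
print(log_linear_attention(Q, K, V).shape)  # (1000, 16)
```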
Sparsh Garg retweeted
Yuan Liu @YuanLiu41955461
I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: igl-hkust.github.io/das/ Github: github.com/IGL-HKUST/Diff…
Replies: 4 · Reposts: 79 · Likes: 321 · Views: 33.4K
Sparsh Garg retweeted
Ayush Jain @ayushjain1144
We move our eyes actively—driven by survival and efficiency—but we still don’t fully understand how. That makes supervised learning hard. In our new work, we explore how to train VLMs to reason visually using RL. ViGoRL offers a glimpse into how models like o3 might be trained.
Gabriel Sarch @GabrielSarch

How can we get VLMs to move their eyes—and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Replies: 0 · Reposts: 1 · Likes: 11 · Views: 556
Yuliang Guo @33yuliangguo
Honored to serve as co-chair of the “Robot Mapping 2” session at #ICRA2025. Truly rewarding to connect with leading researchers and exchange insights on the past, present, and future of robotics and AI. Loved the poster session right after the talks: a great setup for deep interaction with authors and even session chairs!
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 128