Debidatta Dwibedi

771 posts

@debidatta

Senior Research Scientist @GoogleDeepMind, Previously Robotics @CarnegieMellon, EE @IITKanpur, StudApps (https://t.co/iDVr86IjhA)

Mountain View, CA · Joined May 2009
1.9K Following · 911 Followers
Pinned Tweet
Debidatta Dwibedi @debidatta ·
Our Vision-Language-Action robot demo at #RSS2025 was eye-opening. The ultimate eval for any generalist model: new environment, new objects from audience, and new instructions. For the first time it really hit me: what if we've been underestimating what these models can do?
3 replies · 10 reposts · 84 likes · 11.7K views
Debidatta Dwibedi retweeted
Sean Kirmani @SeanKirmani ·
Introducing Veo Robotics! In this work, we show that an action-conditioned video model can be used as a general robot simulator for evaluation, safety, etc. veo-robotics.github.io
5 replies · 17 reposts · 93 likes · 17.2K views
Debidatta Dwibedi retweeted
Peng Xu @sippeyxp ·
🔥 Gemini Robotics On-Device is here! A VLA with similar generalization, instruction following, and fast adaptation to our March release, and it now fits on a 4090! More exciting: we're 🚀 launching an SDK and a model dev service (flywheel) alongside it 🎯 = democratizing model development! #DeepMind #robotics is collaborating with select Trusted Testers to refine the process. For everyone else, check the 🧵 below for videos showcasing it. We're just getting started — all suggestions and guidance are welcome! deepmind.google/discover/blog/…
4 replies · 8 reposts · 29 likes · 15.7K views
Debidatta Dwibedi @debidatta ·
What was once a dream is now real! ✨ Excited to announce Gemini Robotics On-Device: our VLA model that runs locally and shows impressive performance on 3 robot types. On-device intelligence, no internet needed!
Google DeepMind@GoogleDeepMind

We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖 It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments - without needing a constant internet connection. 🧵

0 replies · 1 repost · 24 likes · 1.4K views
Debidatta Dwibedi retweeted
Kevin Zakka @kevin_zakka ·
Booster recovery controller from last night. Sim design, training and deployment on hardware took < 1 day. With @qiayuanliao
38 replies · 83 reposts · 678 likes · 104.4K views
Debidatta Dwibedi retweeted
Carolina Parada @parada_car88104 ·
✨🤖 Today our team is so excited to bring Gemini 2.0 into the physical world with Gemini Robotics, our most advanced AI models to power the next generation of helpful robots. 🤖✨ Check it out! youtube.com/watch?v=4MvGnm… And read our blog: deepmind.google/discover/blog/… We are looking forward to seeing how robot developers will use these models to continue to advance robot performance with Gemini at the core.
Google DeepMind@GoogleDeepMind

Meet Gemini Robotics: our latest AI models designed for a new generation of helpful robots. 🤖 Based on Gemini 2.0, they bring capabilities such as better reasoning, interactivity, dexterity and generalization into the physical world. 🧵 goo.gle/gemini2-roboti…

3 replies · 12 reposts · 101 likes · 7.5K views
Debidatta Dwibedi retweeted
Yuge Shi (Jimmy) @YugeTen ·
✨New blog post✨: my attempt as a vision researcher at finally understanding RLHF -- a deep dive into PPO & DeepSeek's GRPO! No hot take, I promise. yugeten.github.io/posts/2025/01/…
25 replies · 170 reposts · 1.3K likes · 88.4K views
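The PPO-vs-GRPO distinction the post digs into comes down to how advantages are estimated: GRPO drops PPO's learned value baseline and instead normalizes each sampled completion's reward against the other completions drawn for the same prompt. A minimal sketch of that group-relative normalization (plain illustrative Python, not taken from the blog post):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each completion's reward
    by the mean and std of its group (all samples for one prompt)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a reward model;
# advantages sum to ~0 within the group by construction.
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline comes from the group itself, no separate value network needs to be trained, which is the main practical simplification over PPO.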
Debidatta Dwibedi retweeted
Kevin Zakka @kevin_zakka ·
The ultimate test of any physics simulator is its ability to deliver real-world results. With MuJoCo Playground, we’ve combined the very best: MuJoCo’s rich and thriving ecosystem, massively parallel GPU-accelerated simulation, and real-world results across a diverse range of robot platforms: quadrupeds, humanoids, dexterous hands, and arms. Best of all? You can get started today with a single command: pip install playground playground.mujoco.org
37 replies · 175 reposts · 901 likes · 153K views
Debidatta Dwibedi retweeted
Antoine Yang @AntoineYang2 ·
Gemini 2.0 Flash's video understanding is here 🚀 Think: search in videos via timecodes, extract text from moving camera footage, analyze screen recordings in real-time interactions with native audio out 🔊 Come and try it aistudio.google.com 😀 youtu.be/Mot-JEU26GQ?si…
2 replies · 10 reposts · 83 likes · 8.6K views
Debidatta Dwibedi retweeted
Vidhi Jain @viddivj ·
🧵1/8 So annoying when my 🤖 vacuum cleaner buzzes loudly during my Zoom meeting! Can we teach robots to be aware of their noise levels at home? Introducing ANAVI—a framework that uses indoor visuals to predict sound propagation! 🎶🏠
5 replies · 24 reposts · 120 likes · 17.3K views
Debidatta Dwibedi retweeted
Julen Urain @robotgradient ·
YouTube is a LARGE dataset of demonstration videos to train Generalist robot agents, but lacks action data. How can we learn DEXTEROUS skills from them? In #CoRL2024, we explore the problem of learning a Generalist Piano Playing agent from YouTube videos. pianomime.github.io
6 replies · 43 reposts · 315 likes · 42.4K views
Debidatta Dwibedi retweeted
No Context Brits @NoContextBrits ·
“The first slot machine was invented in 1894.” People in 1893:
57 replies · 1.5K reposts · 20.3K likes · 2.8M views
Debidatta Dwibedi retweeted
Michael Tschannen @mtschannen ·
We just released a big 🎁GIVT update! 📈 Larger models and improved image generation results across the board 💡 Improved GMM formulation and adapter module 💻 Code, model checkpoints, and a colab are now available at github.com/google-researc… More details below... 1/
Michael Tschannen@mtschannen

Decoder-only models only work with discrete tokens, right? 🤔 Excited to present 🎁GIVT: Generative Infinite-Vocabulary Transformers, a simple way to generate arbitrary vector sequences with real-valued entries using transformer decoder-only models! arxiv.org/abs/2312.02116 1/

5 replies · 47 reposts · 248 likes · 66.2K views
Aditya Golatkar @adityagolatkar2 ·
@debidatta Oh that's amazing, even more intrigued to read the paper now! It's akin to the time-step embedding in diffusion models, it seems.
1 reply · 0 reposts · 0 likes · 33 views
Debidatta Dwibedi @debidatta ·
Can we train a model to describe different parts of images in varying levels of detail? Introducing FlexCap, a VLM designed to output localized captions in N words where we can control N with special length tokens. flex-cap.github.io
2 replies · 9 reposts · 38 likes · 14.1K views
Debidatta Dwibedi @debidatta ·
@adityagolatkar2 The model counts implicitly because of special length tokens that we add. If we use the token length_N then the model outputs N words before outputting EOS.
1 reply · 0 reposts · 0 likes · 28 views
Aditya Golatkar @adityagolatkar2 ·
@debidatta This is cool work! Quick question: do you actually count the number of words during decoding, or is it determined beforehand and the model outputs EOS after the predetermined number?
1 reply · 0 reposts · 0 likes · 18 views