Debidatta Dwibedi

771 posts

@debidatta

Senior Research Scientist @GoogleDeepMind, Previously Robotics @CarnegieMellon, EE @IITKanpur, StudApps (https://t.co/iDVr86IjhA)

Mountain View, CA · Joined May 2009
1.9K Following · 911 Followers
Pinned Tweet
Debidatta Dwibedi @debidatta ·
Our Vision-Language-Action robot demo at #RSS2025 was eye-opening. The ultimate eval for any generalist model: new environment, new objects from audience, and new instructions. For the first time it really hit me: what if we've been underestimating what these models can do?
3 replies · 10 reposts · 84 likes · 11.7K views
Debidatta Dwibedi retweeted
Sean Kirmani @SeanKirmani ·
Introducing Veo Robotics! In this work, we show that an action-conditioned video model can be used as a general robot simulator for evaluation, safety, etc. veo-robotics.github.io
5 replies · 17 reposts · 93 likes · 17.2K views
Debidatta Dwibedi retweeted
Peng Xu @sippeyxp ·
🔥 Gemini Robotics On-Device is here! A VLA with similar generalization, instruction following, and fast adaptation to our March release, and it now fits on a 4090! More exciting: we're 🚀 launching an SDK and a model dev service (flywheel) alongside it 🎯 = democratizing model development! #DeepMind #robotics is collaborating with select Trusted Testers to refine the process. For everyone else, check the 🧵 below for videos showcasing it. We're just getting started — all suggestions and guidance are welcome! deepmind.google/discover/blog/…
4 replies · 8 reposts · 29 likes · 15.7K views
Debidatta Dwibedi @debidatta ·
What was once a dream is now real! ✨ Excited to announce Gemini Robotics On-Device: our VLA model that runs locally and shows impressive performance on 3 robot types. On-device intelligence, no internet needed!
Google DeepMind@GoogleDeepMind

We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖 It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments - without needing a constant internet connection. 🧵

0 replies · 1 repost · 24 likes · 1.4K views
Debidatta Dwibedi retweeted
Kevin Zakka @kevin_zakka ·
Booster recovery controller from last night. Sim design, training and deployment on hardware took < 1 day. With @qiayuanliao
38 replies · 83 reposts · 678 likes · 104.4K views
Debidatta Dwibedi retweeted
Carolina Parada @parada_car88104 ·
✨🤖 Today our team is so excited to bring Gemini 2.0 into the physical world with Gemini Robotics, our most advanced AI models to power the next generation of helpful robots. 🤖✨ Check it out! youtube.com/watch?v=4MvGnm… And read our blog: deepmind.google/discover/blog/… We are looking forward to seeing how robot developers will use these models to continue to advance robot performance with Gemini at the core.
Google DeepMind@GoogleDeepMind

Meet Gemini Robotics: our latest AI models designed for a new generation of helpful robots. 🤖 Based on Gemini 2.0, they bring capabilities such as better reasoning, interactivity, dexterity and generalization into the physical world. 🧵 goo.gle/gemini2-roboti…

3 replies · 12 reposts · 101 likes · 7.5K views
Debidatta Dwibedi retweeted
Yuge Shi (Jimmy) @YugeTen ·
✨New blog post✨: my attempt as a vision researcher at finally understanding RLHF -- a deep dive into PPO & DeepSeek's GRPO! No hot take, I promise. yugeten.github.io/posts/2025/01/…
25 replies · 170 reposts · 1.3K likes · 88.4K views
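The PPO-vs-GRPO distinction the post digs into comes down to how advantages are estimated: GRPO drops PPO's learned value baseline and instead normalizes each sampled completion's reward against the other completions drawn for the same prompt. A minimal sketch of that group-relative normalization (plain illustrative Python, not taken from the blog post):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each completion's reward
    by the mean and std of its group (all samples for one prompt)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a reward model;
# advantages sum to ~0 within the group by construction.
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline comes from the group itself, no separate value network needs to be trained, which is the main practical simplification over PPO.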
Debidatta Dwibedi retweeted
Kevin Zakka @kevin_zakka ·
The ultimate test of any physics simulator is its ability to deliver real-world results. With MuJoCo Playground, we’ve combined the very best: MuJoCo’s rich and thriving ecosystem, massively parallel GPU-accelerated simulation, and real-world results across a diverse range of robot platforms: quadrupeds, humanoids, dexterous hands, and arms. Best of all? You can get started today with a single command: pip install playground playground.mujoco.org
37 replies · 175 reposts · 901 likes · 153K views
Debidatta Dwibedi retweeted
Antoine Yang @AntoineYang2 ·
Gemini 2.0 Flash's video understanding is here 🚀 Think: search in videos via timecodes, extract text from moving camera footage, analyze screen recordings in real-time interactions with native audio out 🔊 Come and try it aistudio.google.com 😀 youtu.be/Mot-JEU26GQ?si…
2 replies · 10 reposts · 83 likes · 8.6K views
Debidatta Dwibedi retweeted
Vidhi Jain @viddivj ·
🧵1/8 So annoying when my 🤖 vacuum cleaner buzzes loudly during my Zoom meeting! Can we teach robots to be aware of their noise levels at home? Introducing ANAVI—a framework that uses indoor visuals to predict sound propagation! 🎶🏠
5 replies · 24 reposts · 120 likes · 17.3K views
Debidatta Dwibedi retweeted
Julen Urain @robotgradient ·
YouTube is a LARGE dataset of demonstration videos to train Generalist robot agents, but lacks action data. How can we learn DEXTEROUS skills from them? In #CoRL2024, we explore the problem of learning a Generalist Piano Playing agent from YouTube videos. pianomime.github.io
6 replies · 43 reposts · 315 likes · 42.4K views
Debidatta Dwibedi retweeted
No Context Brits @NoContextBrits ·
“The first slot machine was invented in 1894.” People in 1893:
57 replies · 1.5K reposts · 20.3K likes · 2.8M views
Debidatta Dwibedi retweeted
Michael Tschannen @mtschannen ·
We just released a big 🎁GIVT update! 📈 Larger models and improved image generation results across the board 💡 Improved GMM formulation and adapter module 💻 Code, model checkpoints, and a colab are now available at github.com/google-researc… More details below... 1/
Michael Tschannen@mtschannen

Decoder-only models only work with discrete tokens, right? 🤔 Excited to present 🎁GIVT: Generative Infinite-Vocabulary Transformers, a simple way to generate arbitrary vector sequences with real-valued entries using transformer decoder-only models! arxiv.org/abs/2312.02116 1/

5 replies · 47 reposts · 248 likes · 66.2K views
Aditya Golatkar @adityagolatkar2 ·
@debidatta Oh that's amazing, even more intrigued to read the paper now! It's akin to the time-step embedding in diffusion models, it seems.
1 reply · 0 reposts · 0 likes · 33 views
Debidatta Dwibedi @debidatta ·
Can we train a model to describe different parts of images in varying levels of detail? Introducing FlexCap, a VLM designed to output localized captions in N words where we can control N with special length tokens. flex-cap.github.io
2 replies · 9 reposts · 38 likes · 14.1K views
Debidatta Dwibedi @debidatta ·
@adityagolatkar2 The model counts implicitly because of special length tokens that we add. If we use the token length_N then the model outputs N words before outputting EOS.
1 reply · 0 reposts · 0 likes · 28 views
Aditya Golatkar @adityagolatkar2 ·
@debidatta This is cool work! Quick question: do you actually count the number of words during decoding, or is it determined beforehand and the model outputs EOS after the predetermined number?
1 reply · 0 reposts · 0 likes · 18 views