Pannag Sanketi

759 posts

@pannag_

Tech Lead Manager / Researcher @GoogleAI @GoogleDeepMind Robotics. Open X-Embodiment Co-Lead. Table Tennis Robots Lead. @UCBerkeley @iitmadras alum.

Joined September 2009
1K Following · 894 Followers
Pinned Tweet
Pannag Sanketi @pannag_
I will be at #NeurIPS2025 this week! I will be giving a talk on the Table Tennis Robotics project at Google DeepMind at the MyoSymposium on Athletic Intelligence (sites.google.com/corp/view/myos…) and discussing our two papers at NeurIPS:
Andy Zeng @andyzengineer
Back in grad school, live ML robot demos used to be a tall order: bump a camera and it's over. It was SOTA to duct-tape "DO NOT TOUCH" signs everywhere 😭 and pray the moon was in the right phase… Now? Turns out, with enough pretraining data, physical AI foundation models like GEN-0 don't care: they generalize to new environments, lighting, robots… it just works. Anywhere. Millions of tasks, thousands of settings, a lifetime of physical experience, all baked into model weights. It feels like we're entering a new era for intelligent robots, and I am SO excited for it.
Generalist @GeneralistAI

We ran a live demo at @nvidia GTC last week, but the real story is how quickly we got it running. The system was up and running in days, not weeks. This is a step toward robots that can be deployed quickly without task-by-task programming. How we made it happen👇 🧵 (1/6)

Guri Singh @heygurisingh
🚨 BREAKING: Someone just open-sourced a full pipeline that trains a humanoid robot to play table tennis by watching humans play. It's called Robomotion and it's beyond insane. Here's everything you need to know:
→ Takes raw motion-capture footage of real table tennis players
→ Filters out noisy data: only keeps smooth, physically plausible trajectories
→ Trains a Unitree G1 humanoid using PPO reinforcement learning
→ Runs millions of parallel episodes on GPU with MuJoCo MJX
→ Randomizes friction, mass, and center of mass so the policy transfers to real hardware
→ Exports to ONNX format and deploys directly on the physical robot
Most robotics teams spend months hand-tuning controllers for a single motion. This system watches a human, learns the skill, and executes it with the robot's own body dynamics. No teleoperation. No manual programming. JAX + Brax + MuJoCo. 100% open source. (Link in the comments)
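The pipeline above is concrete enough to sketch. Below is a minimal, illustrative Python/NumPy version of two of the listed steps: filtering mocap clips for physical plausibility and randomizing dynamics for sim-to-real transfer. Every function name and threshold here is hypothetical; the actual Robomotion stack, per the tweet, runs on JAX + Brax + MuJoCo MJX.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_physically_plausible(traj, dt, max_jerk=2000.0):
    """Keep only smooth mocap clips, as the thread describes.

    traj: (T, D) array of joint positions. Rejects clips whose
    finite-difference jerk (third derivative) exceeds a threshold
    anywhere; the threshold here is a made-up stand-in.
    """
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return np.abs(jerk).max() <= max_jerk

def randomize_dynamics(nominal):
    """Sample per-episode physics parameters around nominal values.

    Randomizing friction, link masses, and center-of-mass offsets
    trains the policy on a family of simulated robots so it is more
    likely to transfer to the one real robot.
    """
    return {
        "friction": nominal["friction"] * rng.uniform(0.7, 1.3),
        "mass": nominal["mass"] * rng.uniform(0.85, 1.15, size=nominal["mass"].shape),
        "com_offset_m": rng.uniform(-0.01, 0.01, size=3),  # +/- 1 cm shift
    }

# Toy usage: filter fake mocap clips, then sample one episode's dynamics.
nominal = {"friction": 1.0, "mass": np.ones(23)}  # 23 links, invented
clips = [np.cumsum(rng.normal(0, 0.001, size=(200, 29)), axis=0) for _ in range(8)]
kept = [c for c in clips if is_physically_plausible(c, dt=0.02)]
print(f"kept {len(kept)}/{len(clips)} clips")
print(randomize_dynamics(nominal)["friction"])
```

From there, per the tweet's description, the filtered clips serve as reference motions for PPO training, and the resulting policy network is exported to ONNX for on-robot inference.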
Pannag Sanketi reposted
Mohana Krishna (మోహన కృష్ణ)
Adding to this with a personal experience — I've witnessed this firsthand at Virupaksha Temple, Hampi. That glowing inverted image behind us? It's the 52-metre gopuram — projected through a tiny opening in the wall. A pinhole camera. Built centuries ago. Our ancestors weren't worshipping in spite of science. They were worshipping through it. Science. Geometry. Astronomy. Devotion. One seamless pursuit. 🇮🇳 @AnandMahindra — Hampi deserves its own visit on your list too. 🙏
[image]
Yann LeCun @ylecun
I think you missed the main ideas.
- The basic premise of JEPA is that training by reconstruction/prediction in input space is evil (or counterproductive). The details are almost always unpredictable. Hence prediction must take place in representation space, where unpredictable details are eliminated.
- The main issue with JEPA is how to prevent collapse (in the absence of a reconstruction loss). There are two classes of methods:
(1) EMA: using weights in the target encoder that are an exponential moving average (EMA) of the weights in the other encoder (I-JEPA, V-JEPA, DINO, BYOL).
(2) Infomax: using a regularizer that attempts to maximize the information content of the representation (e.g. over a batch). There are two sets of methods for that:
(2a) sample-contrastive methods, which want to make each representation vector different from the others (Siamese nets, DrLIM, SimCLR, etc). They tend not to work well in high dimension, and to require large batches and hard negative mining.
(2b) dimension-contrastive methods, which want to make each variable independent from the others (Barlow Twins, VICReg, SIGReg/LeJEPA, MMCR, MCR2...).
Bottom line:
A. SSL by reconstruction/prediction doesn't work for high-dimensional, continuous, noisy data.
B. EMA sucks: no loss function being minimized, requirement for weight sharing...
C. Sample-contrastive infomax doesn't scale to high dimension.
D. My money is on dimension-contrastive methods like SIGReg/LeJEPA.
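LeCun's option (2b) is easy to make concrete. Here is a minimal NumPy sketch of a VICReg-style dimension-contrastive regularizer: a per-dimension variance hinge that prevents collapse to a constant, plus an off-diagonal covariance penalty that decorrelates dimensions. Coefficients, the hinge target, and normalization details vary across Barlow Twins/VICReg/LeJEPA; treat this as an illustration, not any paper's reference implementation.

```python
import numpy as np

def dimension_contrastive_reg(z, gamma=1.0, eps=1e-4):
    """VICReg-style regularizer over a batch of embeddings z: (N, D).

    Both terms act per-dimension rather than per-sample:
      - variance: hinge pushing each dimension's std above gamma,
        so the encoder cannot collapse to a constant vector;
      - covariance: penalizes off-diagonal covariance so dimensions
        carry non-redundant information (the infomax idea).
    """
    z = z - z.mean(axis=0)
    std = np.sqrt(z.var(axis=0) + eps)
    var_loss = np.maximum(0.0, gamma - std).mean()
    n, d = z.shape
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    return var_loss, cov_loss

rng = np.random.default_rng(0)
collapsed = np.full((256, 32), 0.1)   # every sample identical: collapse
healthy = rng.normal(size=(256, 32))  # spread-out embeddings
print(dimension_contrastive_reg(collapsed))  # large variance penalty
print(dimension_contrastive_reg(healthy))    # both terms near zero
```

Note what is absent: no negative pairs across samples and no batch-size-dependent contrast, which is why this family scales to high dimension better than sample-contrastive losses, per the argument above.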
rohan anil @_arohan_
On a long flight, I finally decided to dive into what JEPA is all about. You can convert an encoder-decoder into JEPA as follows:
- replace the target encoder with a moving average of the encoder to avoid collapse
- use a projection to get a summary embedding, instead of a token embedding, for both input and target
- use all the clever losses to avoid scale sensitivity
If you want tokens out, slap a decoder on top of the summary representation. Feels like all of this could be an ablation.
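The first bullet, replacing the target encoder with a moving average, is only a few lines of code. A sketch assuming parameters stored as a dict of arrays; the decay value 0.996 is a typical BYOL-style choice, not something stated in the tweet.

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.996):
    """target <- tau * target + (1 - tau) * online.

    The target encoder receives no gradients; it trails the online
    encoder, which is the collapse-avoidance trick the tweet refers
    to (as in BYOL, I-JEPA, V-JEPA).
    """
    return {k: tau * target_params[k] + (1.0 - tau) * online_params[k]
            for k in target_params}

online = {"w": np.zeros((4, 4))}
target = {"w": np.ones((4, 4))}
for step in range(3):
    online["w"] = online["w"] + 0.5   # stand-in for a gradient step
    target = ema_update(target, online)
print(target["w"][0, 0])              # slowly tracks the online weights
```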
Pannag Sanketi @pannag_
@JitendraMalikCV Congrats Jitendra on the great journey at Meta! Very much looking forward to the work coming out of FAR!
Jitendra MALIK @JitendraMalikCV
1/4 For the last several years I worked part-time at the FAIR lab at Meta, in addition to being a professor at UC Berkeley. That phase is now over, and starting Jan. 5, I will be leading a robotics research effort at Amazon FAR in San Francisco, while continuing at Berkeley.
Pannag Sanketi @pannag_
@matthewsyed They will make all the talks available online soon. I will keep you posted. Thanks!
Matthew Syed @matthewsyed
@pannag_ Is this talk available to watch online? If you could follow, I’d like to DM. Your research looks v exciting!
Pannag Sanketi reposted
GDP @bookwormengr
Congrats Pieter Abbeel @pabbeel, Amazon's new AGI Head. TPOT needs no introduction to him. He is a pioneer in robotics and deep RL, with 231K citations. He is a professor at UC Berkeley and has been an Amazon Distinguished Scientist / VP / Scholar since 2024, focused on advancing AI and robotics. He has advised legendary doctoral students like Chelsea Finn (@chelseabfinn) of Physical Intelligence and John Schulman (@johnschulman2), ex-OpenAI, ex-Anthropic, and now co-founder of Thinking Machines. Looking forward.
[image]
Pannag Sanketi reposted
Demis Hassabis @demishassabis
Gemini has always had exceptionally strong multimodal capabilities. Gemini 3 Pro is an incredible vision AI model and is SOTA across all main vision & multimodal benchmarks. It’s great for document, screen, image, video & spatial understanding tasks - try now in the @GeminiApp!
[image]
Pannag Sanketi @pannag_
Looking forward to catching up with old friends and making new ones! Message me on X or LinkedIn if you would like to chat / grab a coffee together.
Tony Zhao @tonyzzhao
Today, we present a step-change in robotic AI @sundayrobotics. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->
Ted Xiao @xiao_ted
After 8 unforgettable years, I have decided to leave Google DeepMind. I feel immensely grateful to have had the opportunity to help transform the dream of general-purpose robot learning from a heretical fringe idea into a normalized technology roadmap. It has been the honor of a lifetime to work on the most challenging and important problems of our time with the brightest, kindest, and most talented colleagues I could have wished for.

Thank you to Julian and Vincent for taking a chance on me back in 2017, when a ragtag team at Google Brain began exploring the potential for end-to-end learning on arm farms in the real world. The team has always dreamed big: my “starter project” with Corey and Pierre was to work on a goal-conditioned imitation policy capable of going from any initial condition (latent embedding) to any goal state. That 3-month project turned into a 2-year endeavor! But even though research ambitions were lofty, colleagues and mentors have always been grounded and compassionate by default. Alex H, Karol, Julian, and Sergey supported my vision of concurrent control RL at scale while allowing me the space to grow into a creative researcher on my own terms.

The team’s technical progress and my own research taste began to accelerate substantially in 2020, when Kanishka and Karol inspired the whole team to bet big on one single crazy moonshot: a general robot policy that could accomplish thousands of household manipulation tasks. Such an unprecedented group effort was new to the whole team but extremely satisfying—to learn how to harmoniously navigate 0-to-1 real-world systems scaling (robot fleets, teleoperators, scaled learning stacks) alongside rigorous scientific exploration (an objective comparison of the scaling properties of imitation and reinforcement learning). I learned so much from all my comrades-in-arms during this time, and even to this day, many of my research and engineering intuitions draw from the lessons I learned from Eric, Yao, Alex I, Keerthana, and Yevgen.

The following period, starting in 2022, was absolutely magical and unique in the breadth and depth of imaginative explorations that I was privileged to contribute to and lead. Exploring the potential of foundation models for robotics changed my research outlook permanently, and projects like SayCan, RT-1, and RT-2 felt like the first magically viral moments when the world started thinking more seriously about what the promise of general and performant embodied AI might look like. When the first generalist VLAs began to reliably perform tasks that we hadn’t collected data for, it was a huge lightbulb moment for our team and the field. During this time, I was immensely inspired by what high agency, manic creativity, and blazing iteration speed can do for research, learning from extremely kind and productive colleagues like Fei, Brian, Andy, Pete, Quan, Harris, and Danny.

I applied this approach of wildly creative research to areas I cared about, such as creating better action representations, understanding robot generalization, and leveraging VLMs for data quality and augmentation. I am grateful to teammates who joined me on these adventurous explorations, such as Chelsea, Dorsa, Jonathan, Wenhao, Tianli, Montse, Sean, Austin, Kelly, and Paul. I also deeply appreciate all the academic collaborations during this time—ranging from multi-institution cross-embodiment learning to open-source VLAs to scalable offline evaluation to organizing workshops. Thank you, students, interns, and friends; in particular, Soroush, Jiayuan, Laura, Xuanlin, Kyle, Karl, Oier, Dhruv, Annie, Jensen, Priya, Suneel, Ike, Homanga, Hao, and Xuesu.

In the final chapter of my career at GDM, starting in 2024, I became enamored with the science and impact of frontier models and how to harness them properly in robotics. It always fundamentally bugged me that robot learning often looked like “classical” machine learning of just fitting simple distributions with small models, rather than the polished scaled systems and science of how frontier models are developed with pre-training, mid-training, and post-training. I wanted to learn about that world and figure out how to make AGI understand the physical world. I am proud of the progress we have made, and from where we started with Gemini 1.0 to today, the research innovations we have unlocked have placed both Gemini and Gemini Robotics clearly at the forefront of both fundamental world understanding and general VLA control.

Thank you so much to my teammates in Embodied Reasoning who make every day bright, interesting, and fun: Fei, Jacky, Laura, Wentao, Annie, Lewis, Ksenia, Mohit, Sean, and Danny. Thank you to friends in Gemini Multimodal who taught me how to frontier model: Xi, Karel, Ishita, and Xudong. Thank you to the VLA whisperers who have shown me how very far innovation and perseverance can take you: Coline, Giulia, Claudio, Alex L, Sumeet, Ashwin, Sudeep, Debi, and Ayzaan. Thank you to mentors throughout the years who have provided shining examples that velocity and impact, and compassion, are not zero-sum: Carolina, Jie, Kanishka, Nicolas, Jonathan, Pierre, Vincent, Karol, Sergey, Chelsea, and Julian. Thank you, thank you, thank you.

It has been such an unbelievable adventure, and I am so fortunate to have been part of the crazy team that started the technology breakthroughs transforming the world into one where general and helpful embodied AGI is ubiquitous in society. I will always be #1 GDM fan! As for my own journey, I will be embarking on a new adventure, both familiar and very different, and hope to have more to share soon.
[images]
Pannag Sanketi @pannag_
@danijarh @GoogleDeepMind Thanks for your awesome work at GDM, Danijar! Congrats on your fantastic achievements and journey here. Good luck on your next adventure!
Danijar Hafner @danijarh
Today is my last day at @GoogleDeepMind. After almost exactly 10 years at Google, including 12 internships and the last 2 1/2 years full time, it really feels like a chapter coming to an end. I'm grateful for all the experiences and friends I've made at Google and DeepMind.

I still remember my first Brain internship in Mountain View in 2016 with James Davidson and @V_Vanhoucke, at a time when nobody had a working PPO implementation and we were wrangling with TensorFlow graphs 😄 The moment @lukaszkaiser showed us the first plausible Wikipedia page generated by a "big" LSTM. @ashVaswani, full of excitement, explaining the compute efficiency of a new architecture that later became the Transformer and asking me to try it for RL (I did not :P)

The excitement of working on Deep RL and generative models at DeepMind during my master's in London, which turned into PlaNet with @countzerozzz and @itfische. Figuring out Karl Friston's free energy principle with Nicolas Heess and @AdaptiveAgents (which took a few more years to get right). Spending a good part of my PhD at the Brain Team in Toronto working on multiple generations of Dreamer with @mo_norouzi, various collaborations, and celebrating the Turing Award with @geoffreyhinton. And over the last few years, working from Berkeley/SF on world models with @wilson1yan, with significant resources thanks to @countzerozzz and @koraykv, and seeing video models & world models accomplish results that seemed completely out of reach just a few years ago.

With mixed feelings but also excitement, it's time to start a new chapter!
[image]
Pannag Sanketi reposted
Harsh Goenka @hvgoenka
Forget Shark Tank, forget Ideabaaz, this pitch stole my heart….
Kevin Zakka @kevin_zakka
Super happy and honored to be a 2025 Google PhD Fellow! Thank you @Googleorg for believing in my research. I'm looking forward to making humanoid robots more capable and trustworthy partners 🤗
Google.org @Googleorg

🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: goo.gle/43wJWw8

Pannag Sanketi reposted
RoboHub🤖 @XRoboHub
At the launch event, AgiBot demonstrated the ultra-low-latency teleoperation capability of the G2 robot. An operator in Beijing, more than 2,000 kilometers (1,243 miles) away, remotely shot at a balloon floating in the Shanghai studio. The first shot missed because of the balloon's constant swaying, but the second was successful, showcasing the robot's high precision and low latency.
RoboHub🤖 @XRoboHub

AgiBot has formally unveiled its G2 humanoid robot, a system designed to transition into various industries and liberate humans from repetitive labor. G2 features high-performance joints, precision torque sensors, and an advanced spatial perception system, supporting quick deployment and multi-modal voice interaction.
► Factory Floor Performance: The G2 is engineered to industrial standards. In a safety-belt lock production line, robots collaborate with human workers, performing tasks like pressing lock cores. The G2 collects production data to continuously train and iterate models (local server deployment ensures data privacy), steadily improving its operational ability.
► Mobility & Safety: The G2 navigates narrow factory aisles using dual LiDAR and full-panorama vision for environment sensing and collision detection. Its chassis is designed to overcome common obstacles (speed bumps, elevator gaps). It supports 24/7 continuous operation via autonomous return-to-charge and battery swapping.
► Humanoid Design Advantage: The G2's design includes a three-degree-of-freedom flexible waist, allowing it to mimic natural human movements like bending and side-leaning. This dramatically expands its operational workspace and enables seamless integration into existing human-centric production lines without costly modifications.
► Advanced Dexterity & Learning (Lab): The new G02 arm features high-precision joint torque sensors that allow it to precisely sense external forces and adjust stiffness, mimicking human hand compliance. Using real-machine reinforcement learning (RL), the G2 can learn complex, delicate tasks like memory-stick insertion in about one hour with minimal human intervention.
► Logistics & Grasping: In logistics sorting, the G2 uses a 19-degree-of-freedom mechanical dexterous hand (20 N maximum fingertip force; 35 kg capacity for hard objects) equipped with 3D tactile sensors to ensure it grasps securely without damaging items. Its full-body articulation (waist and legs) aids grasping and posture adjustment.
► Model & Data: G2's intelligence is powered by the Go-One Large Embodied Model (VLA architecture: Vision-Language-Latent Action) and the GE-One World Model (vision-centric predictive modeling), trained on the AgiBot World real-machine dataset (over 500k downloads).
► Service & Interaction: The G2 is deployed as a guide/receptionist in settings like art museums. It uses its high-DOF head, arms, and waist to point to exhibits, maintains eye contact while navigating difficult spaces (chassis walks forward, body faces backward), handles specialized and random queries, and uses proactive safety features (stops movement, issues warnings) when people get too close.

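A quick sanity check on the "ultra-low latency over 2,000 km" claim: signal propagation alone sets a floor on the control loop. A back-of-the-envelope calculation, assuming light in optical fiber travels at roughly two-thirds of c and an idealized straight-line route.

```python
# Minimum network round trip for Beijing -> Shanghai teleoperation.
# Assumes ~2/3 c propagation in fiber and a straight-line route; real
# routes, switching, video encoding, and control loops add on top.
distance_km = 2000
fiber_speed_km_s = 2.0e5  # ~ (2/3) * 300,000 km/s
one_way_ms = distance_km / fiber_speed_km_s * 1e3
print(f"one-way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms")
# -> ~10 ms one way, ~20 ms round trip before any processing delay.
```

So roughly 20 ms of round-trip delay is unavoidable physics; whatever latency AgiBot achieved sits on top of that floor, which is still well within what a human operator can compensate for on a slowly drifting target like a balloon.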
Pannag Sanketi reposted
Keerthana Gopalakrishnan @keerthanpg
The most fun part about being a roboticist is that you get to play with robots and call it “work”. Here’s Apollo with GR 1.5 trying to grab from my unyielding hand, like a toddler: a test of manipulation generalization. Cannot believe this is the worst humanoids will ever be!
Pannag Sanketi reposted
University of California @UofCalifornia
University of California faculty and alumni won five Nobel Prizes this week, setting a new record for the most faculty of one institution to achieve this great honor in a single year 🏅💙💛 These remarkable achievements highlight the ongoing contributions of America’s #1 public research university and the central role of federal funding in advancing world-changing scientific inquiry. bit.ly/4qn6CZV