Himanshu Gaurav Singh

140 posts

@Cinnabar233

phd @berkeley_ai, prev iitd

Berkeley, CA · Joined June 2019

921 Following · 614 Followers

Himanshu Gaurav Singh retweeted

Lucas Beyer (bl16)@giffmana·2d

In other words: their RL transfers/generalizes.

OpenAI@OpenAI

We’re talking about Goblins. openai.com/index/where-th…

668

121.4K

Himanshu Gaurav Singh retweeted

Grummz@Grummz·Apr 3

Robotics are cool. This is open source, 3d printable.

186

2.1K

118.8K

Himanshu Gaurav Singh retweeted

Neerja Thakkar@neerjathakkar·Apr 2

What’s the right representation for a world model? 3D, pixels, or something else? Excited to release our new paper “Forecasting Motion in the Wild” where we propose point tracks as tokens for generating complex non-rigid motion and behavior From @GoogleDeepmind @Berkeley_AI @TTIC_Connect

458

77.3K

Himanshu Gaurav Singh retweeted

Kevin Zakka@kevin_zakka·Mar 20

Applied to Claude Code and Codex OSS programs for my MuJoCo work (mjlab + related tools), but didn’t get in 😢. If anyone at OpenAI or Anthropic is open to taking another look, would love to share more about what I’m building and its impact on the ecosystem.

207

34.4K

Himanshu Gaurav Singh retweeted

Rahul@selfawareatom·Mar 10

Now that our 15 member llm team is infamous, time to expand for next time! If you have done one or more of the following, then please reach out.
- pretrained a model of any size, from scratch
- posttrained any base model, end to end (data curation, sft, rl)
- are a pytorch wizard
- are a cuda kernel master
- you have any other relevant skills and work to back it up
firstnamesarvamai

696

82.9K

Himanshu Gaurav Singh retweeted

Harman Singh @ ICLR 🇧🇷@Harman26Singh·Mar 6

Future directions from V1 / pairwise self-verification:
💡 Latency knob for #DeepThink-style systems by spending compute upfront on parallelizable pairwise verification
💡 Test-time scaling for agents: use or improve V1-Infer as a selection signal (@xiaochuanlee)
💡 Reward-model-free RLVR via self-signals from pairwise comparisons
💡 Rubric-based RLVR in non-verifiable domains via V1-Infer-style ranking for rewards (@vijaytarian)
💡 Analyze how V1 shifts the generation vs verification compute frontier, and how RL-for-verification changes that curve (@nishadsinghi @hbXNov)
Related work (links) below 👇

Harman Singh @ ICLR 🇧🇷@Harman26Singh

Can LLMs Self-Verify? Much better than you'd expect.
LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.
Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: Efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers
🧵👇

2.6K

Himanshu Gaurav Singh retweeted

Harman Singh @ ICLR 🇧🇷@Harman26Singh·Mar 5

383

88.5K

Himanshu Gaurav Singh retweeted

Will Liang@willjhliang·Mar 4

Introducing Tether 🪢, a fun little idea to scale data by having our robot “play” in the real world for over 24 hours, throughout the day and overnight—improving policies from zero to mastery with minimal supervision! But play is messy, with out-of-distribution scenarios that are hard to anticipate. To perform autonomous functional play in the real world, from just a handful of demos, we propose a highly robust few-shot imitation method that warps demo trajectories using visual correspondences. Then, continuously running it within a multi-task VLM-guided cycle, we generate a data stream that produces 1000+ expert-level demos. This generated data is finally funneled downstream to train imitation learning policies, which improve from zero to near-perfect success rates. We’ll be presenting Tether at #ICLR2026 in just a few weeks! But before that, deep dive with me… 🧵

272

44.5K

Himanshu Gaurav Singh retweeted

Gurusha Juneja@GurushaJuneja·Jan 25

There were hallucinated references at #NeurIPS2025 & @iclr_conf this year, so I built harcx pypi.org/project/harcx/. A Python package to verify BibTeX citations against real academic databases. It supports papers, books, and URLs.
Usage:
pip install harcx
harcx references.bib

Jitendra MALIK@JitendraMalikCV

Now that phantom citations hallucinated by LLMs have been found in NeurIPS papers, what is to be done? Develop a software tool that authors are expected to run to verify their references in Google Scholar. Next, conferences use it to screen papers, and desk reject violators.

237

40.5K

Himanshu Gaurav Singh@Cinnabar233·Nov 7

@karpathy Finally! This did not work last November (2024) 😀 Of course one can complain about contamination and what not. But that would be true for the previous generation of VLMs too.

239

Himanshu Gaurav Singh@Cinnabar233·Nov 22

I wonder when reasoning with images is going to get there. Last time I tried the task from karpathy.github.io/2012/10/22/sta… (@karpathy's blog from more than a decade ago), GPT-4o failed completely.

742

Himanshu Gaurav Singh@Cinnabar233·Nov 22

Despite being generally optimistic about the LLM trajectory, I did not expect this benchmark to get saturated this fast😱

Daman Arora@amuseddaman

It's so o'1'ver. JEE is too easy for o1. Performs close to 80-90%

4.1K

Himanshu Gaurav Singh retweeted

rishabh ranjan@_rishabhranjan_·Oct 30

Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n

151

59.9K

Himanshu Gaurav Singh retweeted

Tony Zhao@tonyzzhao·Oct 4

It's been 15 years! We've come a long way. youtube.com/watch?v=gy5g33…

27.1K

Himanshu Gaurav Singh retweeted

Danijar Hafner@danijarh·Sep 30

Excited to introduce Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model! 🌎🤖 Dreamer 4 pushes the frontier of world model accuracy, speed, and learning complex tasks from offline datasets. co-led with @wilson1yan

357

2.6K

455.1K

Himanshu Gaurav Singh retweeted

Lars Ankile@larsankile·Oct 1

How can we enable finetuning of humanoid manipulation policies, directly in the real world? In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies with ~15-75 minutes of robot interaction. By learning residual corrections on frozen BC policies using sample-efficient off-policy RL, we achieve significant improvements in sample efficiency, enabling policy finetuning directly on the hardware — to our knowledge, one of the first examples of this on a humanoid with bimanual dexterous hands. (If you know of other examples, let me know!)

263

52K

Himanshu Gaurav Singh retweeted

Ankur Handa@ankurhandos·Sep 30

Our whitepaper on Isaac Lab is out! Isaac Lab is a natural successor of Isaac Gym that pioneered GPU-accelerated simulation for robotics. It subsumes all the features of Gym and provides the latest advances in simulation technology to robotics researchers. It also supports warp-based custom sensors, actuator models, motion generation pipelines, teleoperation devices, and various ready to use environments for sim-to-real research for locomotion, manipulation, navigation and more.

375

77.2K

Himanshu Gaurav Singh retweeted

Kevin Zakka@kevin_zakka·Sep 29

I'm super excited to announce mjlab today!
mjlab = Isaac Lab's APIs + best-in-class MuJoCo physics + massively parallel GPU acceleration
Built directly on MuJoCo Warp with the abstractions you love.

142

867

91.3K

Himanshu Gaurav Singh retweeted

Nick Turley@nickaturley·Aug 19

We just launched ChatGPT Go in India, a new subscription tier that gives users in India more access to our most popular features: 10x higher message limits, 10x more image generations, 10x more file uploads, and 2x longer memory compared with our free tier. All for Rs. 399. 🇮🇳

1.2K

1.7K

25K

4.9M

Himanshu Gaurav Singh retweeted

Neeldhara 🐦|🐘@neeldhara·Aug 7

NPTEL is vastly underrated and frequently mistaken for some ill-maintained half-hearted “sarkari thing” from the early 2000s. Sighs. The effort they put into conducting the exams alone is remarkable… a ton of quiet, solid work behind the scenes.

atishayokti@atishayokti

NPTEL, which started well before Coursera, is still going strong. If it had been "founded" closer to San Jose, its founders would have by now entered "tech" mythology.

226

2.4K

81.5K

Himanshu Gaurav Singh@Cinnabar233·Aug 1

Viser made it so much easier to work with simulation engines on remote hardware. Interactive headless rendering with viser is low latency enough that the need for a local GPU for visualization goes away. Beautiful piece of software from @brenthyi, @redstone_hong and team!

Brent Yi@brenthyi

July has been a big month for Viser!
- Released v1.0.0😊
- We did some writing
Some demos👇

1.2K

Himanshu Gaurav Singh retweeted

David McAllister@davidrmcall·Jul 29

Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.

205

1.2K

150.1K
