Himanshu Gaurav Singh

140 posts

@Cinnabar233

phd @berkeley_ai, prev iitd

Berkeley, CA. Joined June 2019
921 Following, 614 Followers
Himanshu Gaurav Singh reposted
Grummz (@Grummz)
Robotics are cool. This is open source, 3d printable.
Himanshu Gaurav Singh reposted
Neerja Thakkar (@neerjathakkar)
What’s the right representation for a world model? 3D, pixels, or something else? Excited to release our new paper “Forecasting Motion in the Wild” where we propose point tracks as tokens for generating complex non-rigid motion and behavior
From @GoogleDeepmind @Berkeley_AI @TTIC_Connect
Himanshu Gaurav Singh reposted
Kevin Zakka (@kevin_zakka)
Applied to Claude Code and Codex OSS programs for my MuJoCo work (mjlab + related tools), but didn’t get in 😢. If anyone at OpenAI or Anthropic is open to taking another look, would love to share more about what I’m building and its impact on the ecosystem.
Himanshu Gaurav Singh reposted
Rahul (@selfawareatom)
Now that our 15 member llm team is infamous, time to expand for next time! If you have done one or more of the following, then please reach out.
- pretrained a model of any size, from scratch
- posttrained any base model, end to end (data curation, sft, rl)
- are a pytorch wizard
- are a cuda kernel master
- you have any other relevant skills and work to back it up
firstnamesarvamai
Himanshu Gaurav Singh reposted
Harman Singh @ ICLR 🇧🇷 (@Harman26Singh)
Future directions from V1 / pairwise self-verification:
💡 Latency knob for #DeepThink-style systems by spending compute upfront on parallelizable pairwise verification
💡 Test-time scaling for agents: use or improve V1-Infer as a selection signal (@xiaochuanlee)
💡 Reward-model-free RLVR via self-signals from pairwise comparisons
💡 Rubric-based RLVR in non-verifiable domains via V1-Infer-style ranking for rewards (@vijaytarian)
💡 Analyze how V1 shifts the generation vs verification compute frontier, and how RL-for-verification changes that curve (@nishadsinghi @hbXNov)
Related work (links) below 👇
Harman Singh @ ICLR 🇧🇷 (@Harman26Singh)

Can LLMs Self-Verify? Much better than you'd expect.
LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.
Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: Efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers
🧵👇

Himanshu Gaurav Singh reposted
Harman Singh @ ICLR 🇧🇷 (@Harman26Singh)
Can LLMs Self-Verify? Much better than you'd expect.
LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.
Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: Efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers
🧵👇
Himanshu Gaurav Singh reposted
Will Liang (@willjhliang)
Introducing Tether 🪢, a fun little idea to scale data by having our robot “play” in the real world for over 24 hours, throughout the day and overnight—improving policies from zero to mastery with minimal supervision! But play is messy, with out-of-distribution scenarios that are hard to anticipate. To perform autonomous functional play in the real world, from just a handful of demos, we propose a highly robust few-shot imitation method that warps demo trajectories using visual correspondences. Then, continuously running it within a multi-task VLM-guided cycle, we generate a data stream that produces 1000+ expert-level demos. This generated data is finally funneled downstream to train imitation learning policies, which improve from zero to near-perfect success rates. We’ll be presenting Tether at #ICLR2026 in just a few weeks! But before that, deep dive with me… 🧵
Himanshu Gaurav Singh reposted
Gurusha Juneja (@GurushaJuneja)
There were hallucinated references at #NeurIPS2025 & @iclr_conf this year, so I built harcx pypi.org/project/harcx/. A Python package to verify BibTeX citations against real academic databases. It supports papers, books, and URLs.
Usage:
pip install harcx
harcx references.bib
Jitendra MALIK (@JitendraMalikCV)

Now that phantom citations hallucinated by LLMs have been found in NeurIPS papers, what is to be done? Develop a software tool that authors are expected to run to verify their references in Google Scholar. Next, conferences use it to screen papers, and desk reject violators.

Himanshu Gaurav Singh (@Cinnabar233)
@karpathy Finally! This did not work last November (2024) 😀 Of course one can complain about contamination and what not. But that would be true for the previous generation of VLMs too.
Himanshu Gaurav Singh reposted
rishabh ranjan (@_rishabhranjan_)
Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n
Himanshu Gaurav Singh reposted
Danijar Hafner (@danijarh)
Excited to introduce Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model! 🌎🤖 Dreamer 4 pushes the frontier of world model accuracy, speed, and learning complex tasks from offline datasets. co-led with @wilson1yan
Himanshu Gaurav Singh reposted
Lars Ankile (@larsankile)
How can we enable finetuning of humanoid manipulation policies, directly in the real world? In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies with ~15-75 minutes of robot interaction. By learning residual corrections on frozen BC policies using sample-efficient off-policy RL, we achieve significant improvements in sample efficiency, enabling policy finetuning directly on the hardware — to our knowledge, one of the first examples of this on a humanoid with bimanual dexterous hands. (If you know of other examples, let me know!)
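The residual idea mentioned in the post above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the function names `frozen_bc` and `zero_residual`, the `residual_scale` value, and the action bounds are hypothetical and are not taken from the paper. It shows just the composition step, where the action of a frozen base policy is adjusted by a small, bounded correction from a second policy; the off-policy RL training of that correction is not shown.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def clip(value: float, low: float, high: float) -> float:
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def residual_action(
    obs: Vector,
    base_policy: Callable[[Vector], List[float]],
    residual_policy: Callable[[Vector, Vector], List[float]],
    residual_scale: float = 0.25,  # hypothetical scale, not from the paper
    action_low: float = -1.0,      # hypothetical action bounds
    action_high: float = 1.0,
) -> List[float]:
    """Return the base action plus a scaled residual, clipped to the bounds."""
    base = base_policy(obs)                  # frozen: never updated
    correction = residual_policy(obs, base)  # the only part that would be trained
    if len(base) != len(correction):
        raise ValueError("base action and residual must have the same length")
    return [
        clip(b + residual_scale * c, action_low, action_high)
        for b, c in zip(base, correction)
    ]


def frozen_bc(obs: Vector) -> List[float]:
    """Hypothetical stand-in for a pre-trained behavior cloning policy."""
    return [0.5 * x for x in obs]


def zero_residual(obs: Vector, base: Vector) -> List[float]:
    """Hypothetical untrained residual: outputs no correction."""
    return [0.0 for _ in base]
```

With `zero_residual`, the combined action equals the base policy's action, so under these assumptions the starting behavior is exactly that of the frozen policy.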
Himanshu Gaurav Singh reposted
Ankur Handa (@ankurhandos)
Our whitepaper on Isaac Lab is out! Isaac Lab is a natural successor of Isaac Gym that pioneered GPU-accelerated simulation for robotics. It subsumes all the features of Gym and provides the latest advances in simulation technology to robotics researchers. It also supports warp-based custom sensors, actuator models, motion generation pipelines, teleoperation devices, and various ready to use environments for sim-to-real research for locomotion, manipulation, navigation and more.
Himanshu Gaurav Singh reposted
Kevin Zakka (@kevin_zakka)
I'm super excited to announce mjlab today!
mjlab = Isaac Lab's APIs + best-in-class MuJoCo physics + massively parallel GPU acceleration
Built directly on MuJoCo Warp with the abstractions you love.
Himanshu Gaurav Singh reposted
Nick Turley (@nickaturley)
We just launched ChatGPT Go in India, a new subscription tier that gives users in India more access to our most popular features: 10x higher message limits, 10x more image generations, 10x more file uploads, and 2x longer memory compared with our free tier. All for Rs. 399. 🇮🇳
Himanshu Gaurav Singh reposted
Neeldhara 🐦|🐘 (@neeldhara)
NPTEL is vastly underrated and frequently mistaken for some ill-maintained half-hearted “sarkari thing” from the early 2000s. Sighs. The effort they put into conducting the exams alone is remarkable… a ton of quiet, solid work behind the scenes.
atishayokti (@atishayokti)

NPTEL, which started well before Coursera, is still going strong. If it had been "founded" closer to San Jose, its founders would have by now entered "tech" mythology.

Himanshu Gaurav Singh reposted
David McAllister (@davidrmcall)
Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.