Divyat Mahajan

357 posts

@divyat09

Ph.D. Candidate @Mila_Quebec | Visiting Researcher @AIatMeta | Former: @MSFTResearch @IITKanpur

Montreal · Joined August 2016
676 Following · 768 Followers
Pinned Tweet
Divyat Mahajan @divyat09
[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning.
📌 Predict a learned embedding of the future sequence, not the tokens themselves
11 replies · 46 retweets · 218 likes · 55.3K views
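The pinned tweet's idea can be sketched in a few lines. This is a toy illustration of my own reading of future summary prediction, not the paper's code: the `W_head` prediction head, the mean-pooled summarizer, and all shapes are assumptions made for the example.

```python
import numpy as np

# Toy sketch of a future-summary-prediction auxiliary loss: at each
# position t, a small head maps the model's hidden state to a predicted
# embedding of the *future* tokens t+1..t+k, and is regressed against a
# summary embedding of that window rather than against the tokens.
rng = np.random.default_rng(0)
T, d, k = 16, 32, 4
hidden = rng.normal(size=(T, d))        # stand-in for LM hidden states
tok_emb = rng.normal(size=(T, d))       # stand-in for token embeddings
W_head = rng.normal(size=(d, d)) * 0.1  # prediction head (would be learned)

def future_summary(t):
    """Summary embedding of tokens t+1..t+k (mean pooling as a stand-in
    for a learned summarizer)."""
    return tok_emb[t + 1 : t + 1 + k].mean(axis=0)

losses = []
for t in range(T - k):
    pred = hidden[t] @ W_head           # predicted future summary
    target = future_summary(t)          # target summary of the real future
    losses.append(np.mean((pred - target) ** 2))
fsp_loss = float(np.mean(losses))
print(round(fsp_loss, 3))
```

Because the target is an embedding of a whole future window, the gradient at position t does not depend on teacher-forced ground-truth tokens at t+1, which is the shortcut the tweet says FSP is designed to reduce.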
Divyat Mahajan retweeted
Amit Sharma @amt_shrma
The better LLMs get at reasoning, the longer their traces get: thousands of tokens, dozens of tool calls. But in law, medicine, and agentic AI, "usually correct" isn't good enough: answers must be verifiably correct. We built interwhen at @MSFTResearch to make that tractable. And it's now open source.
Across benchmarks, plugging interwhen into an LLM yields:
✅ 100% soundness (with full verifiers)
📈 up to 15% accuracy gain
⚡ ~1.5× compute cost
🧵
3 replies · 24 retweets · 113 likes · 12.4K views
Divyat Mahajan retweeted
Moksh Jain @JainMoksh
We have been pushing the limits of test-time scaling with RSA for single-turn reasoning problems in science and math. Check out our blog post with new results on ARC-AGI-2, ArXivMath, and FrontierScience! A lot of gains with just test-time scaling! rsa-llm.github.io/blog
0 replies · 19 retweets · 81 likes · 12.4K views
Divyat Mahajan retweeted
Arnas Uselis @a_uselis
What do the embedding spaces of models that generalize from limited data look like? We study what structure such models should exhibit. Turns out: linear and orthogonal. And modern embedding models like CLIP and SigLIP already show signs of it! 🧵 (1/n)
4 replies · 101 retweets · 709 likes · 75.7K views
Divyat Mahajan retweeted
Sharut Gupta @sharut_gupta
[1/n] Do distinct large models admit a simple map that aligns their embedding spaces? We show that across multimodal contrastive models—trained on different data and architectures—an orthogonal map aligns image embeddings. Strikingly, the same map also aligns text embeddings.
12 replies · 61 retweets · 437 likes · 35.3K views
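A minimal numerical sketch of the kind of alignment this thread describes (my illustration, not the paper's method): if two embedding sets differ by an orthogonal map, the classical orthogonal Procrustes solution, an SVD of the cross-covariance, recovers that map exactly.

```python
import numpy as np

# Synthetic stand-ins for two models' embeddings that differ by a
# hidden orthogonal transform Q_true.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(100, d))                      # embeddings from model A
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal map
Y = X @ Q_true                                     # embeddings from model B

# Orthogonal Procrustes: argmin ||X Q - Y||_F over orthogonal Q is
# Q = U V^T, where X^T Y = U S V^T is the SVD of the cross-covariance.
U, _, Vt = np.linalg.svd(X.T @ Y)
Q_hat = U @ Vt

print(np.allclose(Q_hat, Q_true))
```

In practice the two spaces are only approximately related, so the recovered map is a least-squares fit rather than exact; the striking empirical claim in the thread is that one such orthogonal map works for both image and text embeddings.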
Divyat Mahajan retweeted
Julia Kempe @KempeLab
1/ #1stProof. Our second installment, this time tackling Problem 3, with @scottnarmstrong and @MunosRemi. Also check out our takeaways, and a short “Humor from your bot” interlude, below.
4 replies · 20 retweets · 79 likes · 9.4K views
Divyat Mahajan retweeted
Benno Krojer @benno_krojer
🚨 New paper: Are visual tokens going into an LLM interpretable? 🤔 Existing methods (e.g. the logit lens) and assumptions would lead you to think “not much”... We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡 Details 🧵
3 replies · 58 retweets · 241 likes · 51.7K views
Divyat Mahajan retweeted
Sébastien Lachapelle @seblachap
I had a lot of fun meeting all the smart people at this workshop and presenting my work "On the Identifiability of Latent Action Policies" as an oral! A huge thanks to the organizers! Paper: arxiv.org/abs/2510.01337
World Modeling Workshop @worldmodel_conf

What an awesome first day! Thank you all for joining and listening to our amazing speakers: @SchmidhuberAI, @sherryyangML, @cosmo_shirley, @Yoshua_Bengio, @ylecun, @mido_assran World Models have beautiful days ahead. This is just the beginning 🫡

1 reply · 4 retweets · 25 likes · 2.3K views
Divyat Mahajan retweeted
Sheshansh Agrawal @sheshanshag
**New research: Introducing ⚡BlitzRank**
Current LLM rerankers waste tokens on information they already have. If A > B and B > C, you already know A > C; existing methods don’t track this. BlitzRank fixes this: it uses tournament graphs to extract maximal information from each LLM call.
📊 Pareto-optimal across 14 benchmarks × 5 LLMs
⚡ 25–40% fewer tokens than comparable methods
⚡ 7× cheaper than pairwise at near-identical quality
4 replies · 21 retweets · 72 likes · 17.7K views
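The transitivity idea in the tweet can be sketched independently of BlitzRank's actual algorithm. Assuming a consistent judge, cached wins form a directed graph, and any comparison implied by a path through it can skip the LLM call; `llm_compare` and every other name here is a hypothetical stand-in, not the BlitzRank API.

```python
def reachable(wins, a, b):
    """True if a is known to beat b, directly or via a chain of results."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(wins.get(node, ()))
    return False

def compare_all(pairs, llm_compare):
    """llm_compare(a, b) stands in for an LLM call; returns True if a wins."""
    wins, calls = {}, 0
    for a, b in pairs:
        if reachable(wins, a, b) or reachable(wins, b, a):
            continue                      # outcome already implied, skip the call
        calls += 1
        winner, loser = (a, b) if llm_compare(a, b) else (b, a)
        wins.setdefault(winner, set()).add(loser)
    return wins, calls

# 'A' > 'B' > 'C' > 'D' alphabetically; three calls settle all six pairs,
# because A > C, A > D, and B > D follow by transitivity.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
wins, calls = compare_all(pairs, lambda a, b: a < b)
print(calls)  # → 3
```

The token savings the tweet quotes presumably come from exactly this effect at scale: each genuine LLM call adds edges whose transitive closure settles many later comparisons for free.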
Divyat Mahajan retweeted
Sharut Gupta @sharut_gupta
1/n Can LLMs learn to reason on hard benchmarks like AIME and GPQA purely through context, without SFT, RL, or any weight updates? Turns out… yes! And they achieve strong performance while being highly efficient. Paper: arxiv.org/pdf/2602.02366 Blog: reasoncache.github.io
4 replies · 35 retweets · 208 likes · 17.2K views
Divyat Mahajan retweeted
Jason Weston @jaseweston
📈 Self-Improving Pretraining 📈
✍️: arxiv.org/abs/2601.21343
Reinvents pretraining: no more next-token prediction!
- Uses the existing LM from the last self-improvement iteration to give rewards to pretrain the new model on *sequences*
- Large gains in factuality, safety & quality
🧵 1/5
10 replies · 86 retweets · 607 likes · 50.5K views
Divyat Mahajan retweeted
Aniket Vashishtha @AniketVashisht8
Happy to share that our paper on identifying missing cognitive skills for counterfactual reasoning in LLMs via a code-based framework has been accepted at ICLR'26 🎉 We show issues with past approaches to evaluating the counterfactual reasoning of LLMs, and how RL can induce the required skills!
Aniket Vashishtha @AniketVashisht8

A lot is said about LLMs’ counterfactual reasoning, but do they truly possess the cognitive skills it needs? Introducing Executable Counterfactuals, a code framework that (1) shows frontier models lack these skills (2) offers a testbed for improvement via Reinforcement Learning

2 replies · 7 retweets · 42 likes · 4.6K views
Divyat Mahajan retweeted
rohan @rohanbanerjeee
Check out the latest (and best) open-source ECG interpretation foundation models from our team at heartwise.ai @ICMtl. Now published in @ESC_Journals.
Robert Avram @RobertAvramMD

EXCITED to share the release of two foundation models for electrocardiogram interpretation in @ehj_ed We built DeepECG-SL and DeepECG-SSL, two open-source ECG foundation models trained on >1M ECGs and validated across 11 external datasets (881K ECGs). 🔗 academic.oup.com/eurheartj/adva…

0 replies · 5 retweets · 12 likes · 1.1K views
Divyat Mahajan retweeted
Vineet Jain @thevineetjain
Bayesian methods enable online adaptation in offline RL, but most still rely on conservatism that limits generalization. How can we drop it entirely? TL;DR: Reason over plausible MDPs + history-dependent policy. Paper: arxiv.org/abs/2512.04341 Code: github.com/twni2016/neubay
Tianwei Ni @twni2016

Offline RL is dominated by conservatism: safe, but limiting generalization. In our new paper, we ask: what if we drop it and rely on the Bayesian principle for adaptive generalization? Surprisingly, long-horizon rollouts, usually avoided in model-based RL, make it work. 🧵

0 replies · 2 retweets · 10 likes · 466 views
Divyat Mahajan retweeted
Anirudh Goyal @anirudhg9119
Why do complex skills “emerge” in bigger LLMs? LLM “emergence” isn’t magic. Our work shows it’s a mathematical consequence of (1) scaling laws + (2) how real text mixes skills. We call it slingshot generalisation. Work with @prfsanjeevarora
Sholto Douglas @_sholtodouglas

One day we’ll be able to decompose the loss curve of a neural net into all of the quanta it learns along the way - this is one of my fav streams of fundamental research. Really promising line of work

12 replies · 33 retweets · 341 likes · 45.9K views
Divyat Mahajan retweeted
Vaishnavh Nagarajan @_vaishnavh
1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."
59 replies · 247 retweets · 1.5K likes · 89.5K views
Divyat Mahajan retweeted
Vedant Shah @veds_12
Lots of discourse lately about the correctness of the KL-regularization term used in RLVR fine-tuning of LLMs. Which estimator to use? Whether to add it to the reward or the loss? What’s even the difference? 🤔 In our new preprint, we evaluate these choices empirically. 🧵 1/n
7 replies · 34 retweets · 124 likes · 19.5K views
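For context on the estimator question the tweet raises, here is a toy Monte Carlo comparison (my illustration following the common k1/k3 naming, not the preprint's code) of two unbiased estimators of KL(π‖π_ref), on a pair of Bernoulli "policies" so the exact KL is known.

```python
import math
import random

random.seed(0)
# Two Bernoulli policies as a minimal stand-in for pi and pi_ref.
pi, ref = 0.7, 0.5
true_kl = pi * math.log(pi / ref) + (1 - pi) * math.log((1 - pi) / (1 - ref))

def sample_ratio():
    """Draw x ~ pi and return r = pi_ref(x) / pi(x)."""
    x = random.random() < pi
    p, q = (pi, ref) if x else (1 - pi, 1 - ref)
    return q / p

n = 50_000
k1 = k3 = 0.0
for _ in range(n):
    r = sample_ratio()
    k1 += -math.log(r)         # k1: unbiased, but high variance, can go negative
    k3 += r - 1 - math.log(r)  # k3: unbiased and per-sample nonnegative
print(round(true_kl, 4), round(k1 / n, 4), round(k3 / n, 4))
```

Both estimators converge to the same KL here; the practical differences the preprint studies (variance, nonnegativity, and where the term enters the objective) only show up once these estimates sit inside an RL gradient.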
Divyat Mahajan retweeted
Reyhane Askari @ReyhaneAskari
Super excited about our new paper. If you are working on reward models, judges or post-training pipelines for omni-models, we hope MMRB2 helps you benchmark progress. See Yushi's thread for full details. Paper: arxiv.org/abs/2512.16899 Code: github.com/facebookresear…
Yushi Hu @huyushi98

Reward models make or break post-training for multimodal omni models (e.g., nano banana), yet there’s surprisingly little research on that‼️ We’re releasing MMRB2: new reward benchmark focusing on omni models, spanning T2I, editing, interleaved, and thinking with images 🧵1/n

0 replies · 7 retweets · 21 likes · 2.5K views