Divyat Mahajan

357 posts

@divyat09

Ph.D. Candidate @Mila_Quebec | Visiting Researcher @AIatMeta | Former: @MSFTResearch @IITKanpur

Montreal · Joined August 2016
676 Following · 768 Followers
Pinned Tweet
Divyat Mahajan @divyat09
[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning.
📌 Predict a learned embedding of the future sequence, not the tokens themselves
11 replies · 46 retweets · 218 likes · 55.3K views
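The pinned tweet's idea can be sketched in a few lines. This is a toy illustration of my own reading of future summary prediction, not the paper's code: the `W_head` prediction head, the mean-pooled summarizer, and all shapes are assumptions made for the example.

```python
import numpy as np

# Toy sketch of a future-summary-prediction auxiliary loss: at each
# position t, a small head maps the model's hidden state to a predicted
# embedding of the *future* tokens t+1..t+k, and is regressed against a
# summary embedding of that window rather than against the tokens.
rng = np.random.default_rng(0)
T, d, k = 16, 32, 4
hidden = rng.normal(size=(T, d))        # stand-in for LM hidden states
tok_emb = rng.normal(size=(T, d))       # stand-in for token embeddings
W_head = rng.normal(size=(d, d)) * 0.1  # prediction head (would be learned)

def future_summary(t):
    """Summary embedding of tokens t+1..t+k (mean pooling as a stand-in
    for a learned summarizer)."""
    return tok_emb[t + 1 : t + 1 + k].mean(axis=0)

losses = []
for t in range(T - k):
    pred = hidden[t] @ W_head           # predicted future summary
    target = future_summary(t)          # target summary of the real future
    losses.append(np.mean((pred - target) ** 2))
fsp_loss = float(np.mean(losses))
print(round(fsp_loss, 3))
```

Because the target is an embedding of a whole future window, the gradient at position t does not depend on teacher-forced ground-truth tokens at t+1, which is the shortcut the tweet says FSP is designed to reduce.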
Divyat Mahajan retweeted
Amit Sharma @amt_shrma
The better LLMs get at reasoning, the longer their traces get: thousands of tokens, dozens of tool calls. But in law, medicine, and agentic AI, "usually correct" isn't good enough: answers must be verifiably correct. We built interwhen at @MSFTResearch to make that tractable. And it's now open source.
Across benchmarks, plugging interwhen into an LLM yields:
✅ 100% soundness (with full verifiers)
📈 up to 15% accuracy gain
⚡ ~1.5× compute cost
🧵
3 replies · 24 retweets · 113 likes · 12.4K views
Divyat Mahajan retweeted
Moksh Jain @JainMoksh
We have been pushing the limits of test-time scaling with RSA for single-turn reasoning problems in science and math. Check out our blog post with new results on ARC-AGI-2, ArXivMath, and FrontierScience! A lot of gains with just test-time scaling! rsa-llm.github.io/blog
0 replies · 19 retweets · 81 likes · 12.4K views
Divyat Mahajan retweeted
Arnas Uselis @a_uselis
What do the embedding spaces of models that generalize from limited data look like? We study what structure such models should exhibit. Turns out: linear and orthogonal. And modern embedding models like CLIP and SigLIP already show signs of it! 🧵 (1/n)
4 replies · 101 retweets · 709 likes · 75.7K views
Divyat Mahajan retweeted
Sharut Gupta @sharut_gupta
[1/n] Do distinct large models admit a simple map that aligns their embedding spaces? We show that across multimodal contrastive models—trained on different data and architectures—an orthogonal map aligns image embeddings. Strikingly, the same map also aligns text embeddings.
12 replies · 61 retweets · 437 likes · 35.3K views
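A minimal numerical sketch of the kind of alignment this thread describes (my illustration, not the paper's method): if two embedding sets differ by an orthogonal map, the classical orthogonal Procrustes solution, an SVD of the cross-covariance, recovers that map exactly.

```python
import numpy as np

# Synthetic stand-ins for two models' embeddings that differ by a
# hidden orthogonal transform Q_true.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(100, d))                      # embeddings from model A
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal map
Y = X @ Q_true                                     # embeddings from model B

# Orthogonal Procrustes: argmin ||X Q - Y||_F over orthogonal Q is
# Q = U V^T, where X^T Y = U S V^T is the SVD of the cross-covariance.
U, _, Vt = np.linalg.svd(X.T @ Y)
Q_hat = U @ Vt

print(np.allclose(Q_hat, Q_true))
```

In practice the two spaces are only approximately related, so the recovered map is a least-squares fit rather than exact; the striking empirical claim in the thread is that one such orthogonal map works for both image and text embeddings.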
Divyat Mahajan retweeted
Julia Kempe @KempeLab
1/ #1stProof. Our second installment, this time tackling Problem 3, with @scottnarmstrong and @MunosRemi. Also check out our takeaways, and a short “Humor from your bot” interlude, below.
4 replies · 20 retweets · 79 likes · 9.4K views
Divyat Mahajan retweeted
Benno Krojer @benno_krojer
🚨 New paper: Are visual tokens going into an LLM interpretable? 🤔 Existing methods (e.g. the logit lens) and assumptions would lead you to think “not much”... We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡 Details 🧵
3 replies · 58 retweets · 241 likes · 51.7K views
Divyat Mahajan retweeted
Sébastien Lachapelle @seblachap
I had a lot of fun meeting all the smart people at this workshop and presenting my work "On the Identifiability of Latent Action Policies" as an oral! A huge thanks to the organizers! Paper: arxiv.org/abs/2510.01337
World Modeling Workshop @worldmodel_conf

What an awesome first day! Thank you all for joining and listening to our amazing speakers: @SchmidhuberAI, @sherryyangML, @cosmo_shirley, @Yoshua_Bengio, @ylecun, @mido_assran World Models have beautiful days ahead. This is just the beginning 🫡

1 reply · 4 retweets · 25 likes · 2.3K views
Divyat Mahajan retweeted
Sheshansh Agrawal @sheshanshag
**New research: Introducing ⚡BlitzRank**
Current LLM rerankers waste tokens on information they already have. If A > B and B > C, you already know A > C; existing methods don’t track this. BlitzRank fixes this: it uses tournament graphs to extract maximal information from each LLM call.
📊 Pareto-optimal across 14 benchmarks × 5 LLMs
⚡ 25–40% fewer tokens than comparable methods
⚡ 7× cheaper than pairwise at near-identical quality
4 replies · 21 retweets · 72 likes · 17.7K views
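The transitivity idea in the tweet can be sketched independently of BlitzRank's actual algorithm. Assuming a consistent judge, cached wins form a directed graph, and any comparison implied by a path through it can skip the LLM call; `llm_compare` and every other name here is a hypothetical stand-in, not the BlitzRank API.

```python
def reachable(wins, a, b):
    """True if a is known to beat b, directly or via a chain of results."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(wins.get(node, ()))
    return False

def compare_all(pairs, llm_compare):
    """llm_compare(a, b) stands in for an LLM call; returns True if a wins."""
    wins, calls = {}, 0
    for a, b in pairs:
        if reachable(wins, a, b) or reachable(wins, b, a):
            continue                      # outcome already implied, skip the call
        calls += 1
        winner, loser = (a, b) if llm_compare(a, b) else (b, a)
        wins.setdefault(winner, set()).add(loser)
    return wins, calls

# 'A' > 'B' > 'C' > 'D' alphabetically; three calls settle all six pairs,
# because A > C, A > D, and B > D follow by transitivity.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
wins, calls = compare_all(pairs, lambda a, b: a < b)
print(calls)  # → 3
```

The token savings the tweet quotes presumably come from exactly this effect at scale: each genuine LLM call adds edges whose transitive closure settles many later comparisons for free.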
Divyat Mahajan retweeted
Sharut Gupta @sharut_gupta
1/n Can LLMs learn to reason on hard benchmarks like AIME and GPQA purely through context, without SFT, RL, or any weight updates? Turns out… yes! And they achieve strong performance while being highly efficient. Paper: arxiv.org/pdf/2602.02366 Blog: reasoncache.github.io
4 replies · 35 retweets · 208 likes · 17.2K views
Divyat Mahajan retweeted
Jason Weston @jaseweston
📈 Self-Improving Pretraining 📈
✍️: arxiv.org/abs/2601.21343
Reinvents pretraining: no more next-token prediction!
- Uses the existing LM from the last self-improvement iteration to give rewards to pretrain the new model on *sequences*
- Large gains in factuality, safety & quality
🧵 1/5
10 replies · 86 retweets · 607 likes · 50.5K views
Divyat Mahajan retweeted
Aniket Vashishtha @AniketVashisht8
Happy to share that our paper on identifying missing cognitive skills for counterfactual reasoning in LLMs via a code-based framework has been accepted at ICLR'26 🎉 We show issues with past approaches to evaluating the counterfactual reasoning of LLMs, and how RL can induce the required skills!
Aniket Vashishtha @AniketVashisht8

A lot is said about LLMs’ counterfactual reasoning, but do they truly possess the cognitive skills it needs? Introducing Executable Counterfactuals, a code framework that (1) shows frontier models lack these skills (2) offers a testbed for improvement via Reinforcement Learning

2 replies · 7 retweets · 42 likes · 4.6K views
Divyat Mahajan retweeted
rohan @rohanbanerjeee
Check out the latest (and best) open-source ECG interpretation foundation models from our team at heartwise.ai @ICMtl. Now published in @ESC_Journals.
Robert Avram @RobertAvramMD

EXCITED to share the release of two foundation models for electrocardiogram interpretation in @ehj_ed We built DeepECG-SL and DeepECG-SSL, two open-source ECG foundation models trained on >1M ECGs and validated across 11 external datasets (881K ECGs). 🔗 academic.oup.com/eurheartj/adva…

0 replies · 5 retweets · 12 likes · 1.1K views
Divyat Mahajan retweeted
Vineet Jain @thevineetjain
Bayesian methods enable online adaptation in offline RL, but most still rely on conservatism that limits generalization. How can we drop it entirely? TL;DR: Reason over plausible MDPs + history-dependent policy. Paper: arxiv.org/abs/2512.04341 Code: github.com/twni2016/neubay
Tianwei Ni @twni2016

Offline RL is dominated by conservatism: safe, but limiting generalization. In our new paper, we ask: what if we drop it and rely on the Bayesian principle for adaptive generalization? Surprisingly, long-horizon rollouts, usually avoided in model-based RL, make it work. 🧵

0 replies · 2 retweets · 10 likes · 466 views
Divyat Mahajan retweeted
Anirudh Goyal @anirudhg9119
Why do complex skills “emerge” in bigger LLMs? LLM “emergence” isn’t magic. Our work shows it’s a mathematical consequence of (1) scaling laws + (2) how real text mixes skills. We call it slingshot generalisation. Work with @prfsanjeevarora
Sholto Douglas @_sholtodouglas

One day we’ll be able to decompose the loss curve of a neural net into all of the quanta it learns along the way - this is one of my fav streams of fundamental research. Really promising line of work

12 replies · 33 retweets · 341 likes · 45.9K views
Divyat Mahajan retweeted
Vaishnavh Nagarajan @_vaishnavh
1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."
59 replies · 247 retweets · 1.5K likes · 89.5K views
Divyat Mahajan retweeted
Vedant Shah @veds_12
Lots of discourse lately about the correctness of the KL-regularization term used in RLVR fine-tuning of LLMs. Which estimator to use? Whether to add it to the reward or the loss? What’s even the difference? 🤔 In our new preprint, we evaluate these choices empirically. 🧵 1/n
7 replies · 34 retweets · 124 likes · 19.5K views
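For context on the estimator question the tweet raises, here is a toy Monte Carlo comparison (my illustration following the common k1/k3 naming, not the preprint's code) of two unbiased estimators of KL(π‖π_ref), on a pair of Bernoulli "policies" so the exact KL is known.

```python
import math
import random

random.seed(0)
# Two Bernoulli policies as a minimal stand-in for pi and pi_ref.
pi, ref = 0.7, 0.5
true_kl = pi * math.log(pi / ref) + (1 - pi) * math.log((1 - pi) / (1 - ref))

def sample_ratio():
    """Draw x ~ pi and return r = pi_ref(x) / pi(x)."""
    x = random.random() < pi
    p, q = (pi, ref) if x else (1 - pi, 1 - ref)
    return q / p

n = 50_000
k1 = k3 = 0.0
for _ in range(n):
    r = sample_ratio()
    k1 += -math.log(r)         # k1: unbiased, but high variance, can go negative
    k3 += r - 1 - math.log(r)  # k3: unbiased and per-sample nonnegative
print(round(true_kl, 4), round(k1 / n, 4), round(k3 / n, 4))
```

Both estimators converge to the same KL here; the practical differences the preprint studies (variance, nonnegativity, and where the term enters the objective) only show up once these estimates sit inside an RL gradient.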
Divyat Mahajan retweeted
Reyhane Askari @ReyhaneAskari
Super excited about our new paper. If you are working on reward models, judges or post-training pipelines for omni-models, we hope MMRB2 helps you benchmark progress. See Yushi's thread for full details. Paper: arxiv.org/abs/2512.16899 Code: github.com/facebookresear…
Yushi Hu @huyushi98

Reward models make or break post-training for multimodal omni models (e.g., nano banana), yet there’s surprisingly little research on that‼️ We’re releasing MMRB2: new reward benchmark focusing on omni models, spanning T2I, editing, interleaved, and thinking with images 🧵1/n

0 replies · 7 retweets · 21 likes · 2.5K views