Arun Iyer

897 posts

@AIonGradFlow

Researcher

Bangalore, India · Joined June 2009

263 Following · 103 Followers

Arun Iyer retweeted

Rulin Shao@RulinShao·1d

DR Tulu is now accepted for an oral presentation at #ICML2026 🙏 Updated paper: arxiv.org/abs/2511.19399 📥 We added more ablations, including using Qwen3-8B as the rubric generator and judge (showing that evolving rubrics work with a weak model too), a spurious-rewards sanity check, etc. Live demo: dr-tulu.org Code & models: github.com/rlresearch/dr-…

Rulin Shao@RulinShao

Happy to share that DR Tulu has been accepted to ICML as a ✨Spotlight✨! We believe that co-evolving the agent and its reward metric can lead to more capable intelligence. DR Tulu is a team effort. Huge thanks and congrats to all my amazing collaborators and mentors!

Arun Iyer retweeted

Alex Smola@smolix·1d

The LLM benchmark zoo keeps growing: MMLU, MTEB, HELM, BigCodeBench, AlpacaEval, LiveBench, Arena-Hard, MT-Bench... days of GPU time per release. But the columns are wildly correlated. The real question isn't "which benchmark" but "which subset."
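
One way to read the "which subset" question is as a redundancy filter over a score table. The sketch below is only an illustration of that idea, not anything proposed in the post: the benchmark names, the scores, and the 0.95 correlation cutoff are all hypothetical, and the greedy rule is an assumption chosen for brevity.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pick_subset(scores, max_corr=0.95):
    """Greedily keep a benchmark only if its |correlation| with every
    benchmark already kept is at most max_corr (an assumed cutoff)."""
    kept = []
    for name, column in scores.items():
        if all(abs(pearson(column, scores[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical data: one list per benchmark, one entry per model.
scores = {
    "bench_a": [0.50, 0.60, 0.70, 0.80],
    "bench_b": [0.52, 0.61, 0.72, 0.79],  # ranks models almost like bench_a
    "bench_c": [0.90, 0.30, 0.60, 0.40],  # ranks models differently
}
subset = pick_subset(scores)
print(subset)
```

With these made-up numbers, `bench_b` is dropped because it carries almost the same signal as `bench_a`, while `bench_c` is kept.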

Arun Iyer retweeted

Yifan Yang@Yif_Yang·2d

🚀 Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter. Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization. SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability. We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings. Train the skill, not the model. 🛠️🤖 🌐 aka.ms/skillopt 📄 huggingface.co/papers/2605.23…

Arun Iyer retweeted

Nathan Lambert@natolambert·2d

I heard people needed clarification that my book was a post-training book posttrainingbook.com

Arun Iyer retweeted

Arthur Gretton@ArthurGretton·4d

Your drifting model is secretly a fixed point for the Wasserstein gradient flow on... ...the KL? ...an approximation to the Sinkhorn? ...Is it even a Wasserstein gradient flow at all? arxiv.org/abs/2605.05118 @liwenliang @agalashov @JamesTThorn @ValentinDeBort1 @ArnaudDoucet1

Arun Iyer retweeted

Soheil Feizi@FeiziSoheil·4d

🚨 New paper alert: LLM agents increasingly need to decide when to answer directly and when to use a tool. But tool use is not one-size-fits-all.

In our new paper, “Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use,” we argue that tool necessity should depend on the model itself. A question that GPT-level models can answer reliably may still require a calculator, search engine, or database call for a weaker model. Treating tool necessity as model-agnostic misses this important reality.

We introduce a model-adaptive view of tool necessity, grounded in each model’s empirical capabilities, and compare when a tool is actually needed with when models choose to call one. Across arithmetic and factual QA settings, we find substantial mismatches: models often either call tools when they do not need them, or fail to call tools when they do.

The key finding is a knowing-doing gap in LLM tool use. Models often contain internal signals about whether a tool is needed, but those signals do not reliably translate into the final tool-call action. This suggests that improving agent reliability is not only about teaching models to recognize when tools are useful, but also about making sure that recognition is converted into action.

As LLMs become more agentic, tool-use reliability will be central to their safety, efficiency, and trustworthiness. Our work points to a more model-aware way of evaluating and improving when agents should rely on themselves versus external tools.

Paper: arxiv.org/abs/2605.14038

Code & Data: github.com/chengez/Tool-C…

Joint work with @chengez1114, Chenrui Fan, Mahdi JafariRaviz, @RezaeiKeivan
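
The comparison between "tool needed" and "tool called" can be made concrete with a small tally. The sketch below is an assumption-laden illustration, not the paper's definition: it treats a tool as necessary when the model's measured no-tool accuracy on a question falls below a hypothetical threshold of 0.9, and the records are invented.

```python
def tool_needed(no_tool_accuracy, threshold=0.9):
    """Assumed rule: a tool is necessary when the model's own
    accuracy on the question is below the threshold."""
    return no_tool_accuracy < threshold

def mismatch_rates(records, threshold=0.9):
    """Return (over_call_rate, under_call_rate).

    Each record is (no_tool_accuracy, called_tool) for one question.
    Over-call: tool called although not needed.
    Under-call: tool needed but not called.
    """
    over = sum(1 for acc, called in records
               if called and not tool_needed(acc, threshold))
    under = sum(1 for acc, called in records
                if not called and tool_needed(acc, threshold))
    n = len(records)
    return over / n, under / n

# Hypothetical per-question measurements for a single model.
records = [
    (0.98, True),   # over-call: model is reliable alone but used a tool
    (0.95, False),  # consistent: answered directly
    (0.40, False),  # under-call: unreliable alone, no tool used
    (0.20, True),   # consistent: used a tool when needed
]
over_rate, under_rate = mismatch_rates(records)
print(over_rate, under_rate)
```

Because necessity is computed from each model's own accuracy, the same question can count as "tool needed" for a weaker model and "not needed" for a stronger one.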

Arun Iyer retweeted

Zhepei Wei@weizhepei·5d

The paper and accompanying artifacts are now released — including 500+ RLVR checkpoints for studying training dynamics and extrapolation! 🥳🥳 📚 Paper: arxiv.org/abs/2605.21468 📝 Blog: weizhepei.notion.site/you-only-need-… 💻 Code: github.com/weizhepei/RELEX 🤗 Checkpoints: huggingface.co/relex-rlvr

Zhepei Wei@weizhepei

😢 RLVR is powerful but expensive. 🤯 Imagine using <20% of the RLVR training while achieving 100% of the performance. Sounds surprising? We show that minimal RLVR training is enough to know where training is going, and to predict future checkpoints at no training cost! 📃 tinyurl.com/minimal-rlvr 🧵[1/n]

Arun Iyer retweeted

Souradip Chakraborty@SOURADIPCHAKR18·15 May

🚨 Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL

Arun Iyer retweeted

Duy Nguyen@duynguyen772·5d

Sparse binary rewards bottleneck LLM RL, motivating the use of privileged information in self-distillation as dense teachers. How can we use and balance multiple types of privileged info: leveraging stable cross-view info, while preserving view-specific info?

Current on-policy self-distillation methods often condition the teacher on only one type of privileged view: full solution, partial rationale, answer-only, reference code, feedback, etc. This can be suboptimal:

1️⃣ No single privileged view consistently performs best when used as a teacher.

2️⃣ Views can introduce teacher-specific artifacts from information unavailable to the student.

🧠 Adaptive-View Self-Distillation (AVSD) considers multiple privileged views jointly as a teacher family, balancing cross-view consensus and view-specific signals through a token-level gate to construct better dense learning signals. 🧵👇

Arun Iyer retweeted

Yuxiang Huang@yxyxyyy6·5d

[1/n] Can a model learn *where* and *how much* information it should attend to, and do so efficiently? We introduce DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention! This pushes the accuracy-efficiency frontier in LLMs.

Arun Iyer retweeted

Ming Li @ UMD PhD@Ming_Liiii·18 May

Excited to share that our paper, “Schoenfeld’s Anatomy of Mathematical Reasoning by Language Models,” has been selected as an ACL 2026 Oral 🎉 @aclmeeting

Mathematical reasoning has become one of the key frontiers for evaluating and improving large language models. Yet we still lack a clear picture of how these models organize their reasoning internally through natural language traces.

In this work, we propose ThinkARM, a framework for analyzing LLM mathematical reasoning from a cognitive science perspective. Building on Schoenfeld’s episode theory of mathematical problem solving, ThinkARM segments model reasoning into interpretable functional episodes such as Reading, Analysis, Exploration, Implementation, and Verification. This allows us to ask not only whether a model reaches the right answer, but how it moves through the reasoning process.

We believe this provides a useful step toward more interpretable analysis of LLM reasoning, and toward building models that reason not only more, but better.

Congrats to the collaborators Chenrui, @chengez1114, @FeiziSoheil and @zhoutianyi at UMD @umdcs

Paper: arxiv.org/pdf/2512.19995

Repo: github.com/MingLiiii/Thin…

Arun Iyer retweeted

Emmy Liu@_emliu·20 May

Copying → morphology/translation → basic arithmetic → complex reasoning & math. Across every model family we tested, LLMs acquire skills in roughly the same order during pretraining. Can we use this to predict what a model will learn next, just from its internals? 🧵

Arun Iyer retweeted

Konstantin Mishchenko@konstmish·20 May

That's a nice paper, very neat.

Arun Iyer retweeted

Sungjin Ahn@SungjinAhn_·20 May

🧠 We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width, via parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:

• Sudoku-Extreme: 97.0% (TRM 87.4%)

• ARC-AGI-1: 52.0%

• ARC-AGI-2: 11.1%

• N-Queens coverage: 90%+

📄 Paper: arxiv.org/abs/2605.19376

🌐 Project page: ahn-ml.github.io/gram-website

w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)

Arun Iyer retweeted

Yuda Song@yus167·20 May

Exciting work! But in our February paper, "Reinforcement Learning with Text Feedback", we proposed the same methodology: predicting environment feedback on top of the RL loss. Nice to see this idea specialized to agentic terminal tasks, and the new insight this brings 💡. [1/2]

Dimitris Papailiopoulos@DimitrisPapail

x.com/i/article/2056…

Arun Iyer retweeted

Jeonghye Kim@beanie0__0·19 May

Great to see RL with self-distillation (w/ text feedback) in agent setups being scaled to a production Cursor model! If you're interested in this regime, I highly recommend "Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization" (ICLR'26). In multi-turn agents interacting with external environments, it shows how agents can distill self-generated textual tips during RL training to correct past failures and explore more efficiently, achieving up to a 128.6% performance improvement🚀 📄 Paper: arxiv.org/abs/2602.23008 📝 Blog: agent-lightning.github.io/posts/empo2/ 💻 Code: github.com/microsoft/agen…

Cursor@cursor_ai

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.

Arun Iyer retweeted

Satwik Bhattamishra@satwik1729·16 May

Given black-box access to a Transformer's output, can we efficiently recover its parameters? We analyse the learnability of attention-based models with query access in our new work. Accepted at #ICML2026 🎉 Work done with @shahkulin98, @mhahn29 and Varun Kanade. 🧵

Arun Iyer retweeted

Paria Rashidinejad@paria_rd·15 May

Looped Transformers: the dream was right. But there was trouble in paradise. The loop made them unstable, expensive, and memory-hungry, with gains hard to scale.

So we asked: Can we reap the rewards without paying the loop tax?

Introducing Attractor Models for Language and Reasoning:

• A Backbone proposes an initial “guess” output embedding;

• An Attractor refines it: a fixed-point solver lets the model “think” before each token.

Implicit differentiation trains the model stably, with constant memory and without BPTT.

Training also revealed a surprising phenomenon: Equilibrium Internalization. Over the course of training, the Backbone learns to propose latents close to the equilibrium itself, making the Attractor almost unnecessary at inference.

Results:

• Pareto improvement on language modeling: up to 46.6% lower perplexity and 19.7% better downstream accuracy. A 770M Attractor Model beats a 1.3B Transformer, despite being trained on half as many tokens.

• Significant gains on hard reasoning tasks: a 27M Attractor Model trained on only 1K examples achieves 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard, while Transformers and frontier models like Claude and GPT o3 score 0%.

📝 arxiv.org/pdf/2605.12466

🧵 1/10
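
The "guess, then refine to a fixed point" idea can be illustrated with a scalar toy. This is a minimal sketch under stated assumptions, not the paper's model: the update rule `step`, the tolerance, and the starting guesses are hypothetical, a real model would iterate over embeddings, and implicit differentiation is not shown.

```python
def refine(guess, x, step, tol=1e-6, max_iters=100):
    """Iterate z <- step(z, x) from an initial guess until successive
    iterates differ by less than tol. Returns (z, iterations_used)."""
    z = guess
    for i in range(max_iters):
        z_next = step(z, x)
        if abs(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iters

def step(z, x):
    """Hypothetical contraction map; its fixed point is z* = 2 * x."""
    return 0.5 * z + x

# A poor initial guess needs many refinement steps...
far, far_iters = refine(0.0, 1.0, step)
# ...while a guess already close to the equilibrium needs few.
near, near_iters = refine(1.999, 1.0, step)
print(far_iters, near_iters)
```

The second call mirrors the intuition behind the reported internalization effect: when the initial proposal already sits near the equilibrium, the solver has little left to do.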

Arun Iyer retweeted

Yuwei Zhang@YuweiZh49446108·13 May

On-policy self-distillation is a promising direction for learning from rich textual feedback. But can it really learn from failed trajectories? Our answer: not quite -- unless we let the model actively interpret them. 🧵1/N

Arun Iyer retweeted

Linlu Qiu@linluqiu·12 May

Language is discrete. Language models don’t have to be. 🧚Introducing ELF🧚‍♀️: Embedded Language Flows—a class of diffusion models in continuous embedding space based on continuous-time Flow Matching 🧵
