Deepak Vijaykeerthy

6.1K posts

@deepakvijayke

Research & Engineering @IBM

Bengaluru, India · Joined September 2020
1.5K Following · 254 Followers
Pinned Tweet
Deepak Vijaykeerthy@deepakvijayke·
"The pile gets soaked with data and starts to get mushy over time, so it's technically recurrent."
[image]
0 replies · 0 retweets · 8 likes · 6.5K views
Deepak Vijaykeerthy
Deepak Vijaykeerthy@deepakvijayke·
@DirhousssiAmine Is the bug in the embed/LM-head accounting? tie_word_embeddings=false on this model, so lm_head_flops = 0 if tie_word_embeddings else 2Vh evaluates to 2Vh. Incorrect total: 2Vh (wrongly counted embed) + 2Vh (correct lm_head) = 4Vh. Should be: 0 + 2Vh = 2Vh.
0 replies · 0 retweets · 0 likes · 122 views
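A minimal sketch of the corrected accounting described in this reply; V and h follow the tweet's notation (vocab size and hidden size), and the helper name is hypothetical:

```python
# Hypothetical helper illustrating the corrected per-token accounting;
# V = vocab size, h = hidden size.
def head_flops_per_token(V: int, h: int, tie_word_embeddings: bool) -> int:
    # The input embedding is a table lookup, not a matmul: ~0 FLOPs,
    # regardless of weight tying.
    embed_flops = 0
    # The LM head is an (h x V) matmul costing 2*V*h FLOPs per token.
    # Tying shares the *weights* with the embedding table, but the
    # matmul still runs, so it is counted unconditionally.
    lm_head_flops = 2 * V * h
    return embed_flops + lm_head_flops  # 2Vh, not 4Vh
```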
Deepak Vijaykeerthy retweeted
Ricardo Olmedo@rdolmedo_·
We fine-tuned Alec Radford’s 1930 vintage LLM to solve SWE-bench issues. After just ‼️250‼️ training examples, the model solves its first issue, a simple patch to the xarray library. 🧵👇
[image]
22 replies · 76 retweets · 1.1K likes · 179.1K views
Deepak Vijaykeerthy@deepakvijayke·
@willccbb I don't think people have issues with AI content (and disclosure certainly helps); people have a problem with the mindless slop that some churn out (you know the kind I am referring to :D). In many cases, it is so obvious that you don't need an AI detector to tell that it's AI slop.
0 replies · 0 retweets · 1 like · 257 views
Deepak Vijaykeerthy retweeted
Haitham Bou Ammar@hbouammar·
We found that much of LLM “reasoning” doesn’t come from RL training; it comes from how you sample the model. Building on power sampling (Karan & Du 2025), we show you can approximate global reasoning without MCMC, without training, and 10× faster. 🧠 Inference-time intelligence is real. 📝 Blog ↓ medium.com/@haitham.bouammar71/we-didnt-train-the-model-it-started-reasoning-better-anyway-118dda6f9448
[image]
29 replies · 86 retweets · 663 likes · 63.4K views
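The blog's actual construction isn't reproduced here; as a rough, hypothetical illustration, the naive token-level form of power sampling just raises each next-token distribution to a power alpha (equivalently, sampling at temperature 1/alpha), a cheap local stand-in for the sequence-level target p(y)^alpha:

```python
import torch

# Hypothetical sketch, not the authors' method: token-level power
# sampling. softmax(alpha * logits) is proportional to p(token)^alpha,
# i.e. sampling at temperature 1/alpha; no MCMC, no training.
def power_sample_step(logits: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    probs = torch.softmax(alpha * logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

next_token = power_sample_step(torch.randn(50_000))  # toy 50K vocab
```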
Phalgun@phalgooon·
this is how you realise you are now part of the 36-45 age cohort
[image]
5 replies · 1 retweet · 37 likes · 2.8K views
Deepak Vijaykeerthy retweeted
Jiaxin Pei@jiaxin_pei·
Why are AI agents so expensive? Do more tokens actually lead to better performance? Which models are more token-efficient? Can agents predict their own token costs before execution? These were the questions bugging us, so we wrote a paper to find out.

Excited to share "How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks", led by @Longju_Bai, with co-authors at Stanford, MIT, Google DeepMind, Microsoft AI, and All Hands AI.

A few findings that surprised us:

🔹 Agentic coding tasks consume ~1000× more tokens than chat or reasoning workloads. And input tokens, not output, become the dominant cost driver, because each round re-feeds the entire trajectory back into the model (see the sketch after this post).

🔹 More tokens ≠ better outcomes. Runs on the same task can vary by up to 30× in token use, and accuracy often peaks at intermediate cost. Beyond that, extra spending tends to reflect redundant exploration and does not bring further performance gains.

🔹 Models differ substantially in token efficiency. On the same successfully solved tasks, Kimi-K2 and Claude Sonnet-4.5 use roughly twice as many tokens as GPT-5.2. The gap becomes even larger when all the models fail.

🔹 Human-rated task difficulty weakly predicts actual cost. "Easy" tasks for humans can be surprisingly expensive for agents, and vice versa. The classic "Moravec's Paradox" also holds for coding agents!

🔹 Agents struggle to predict their own costs. Self-prediction correlations top out around 0.39, and every model we tested systematically underestimates what a task will cost. Result-based pricing still has a long way to go when we cannot even figure out the token cost beforehand.

Together, these results suggest that cost prediction is a genuinely challenging task for current agents. We think this opens up real research questions around self-modeling, calibrated cost estimation, and pricing mechanisms that work under residual uncertainty.

Huge thanks to my collaborators: @Longju_Bai, Zhemin Huang, @sunjiao123sun_, @xingyaow_, @radamihalcea, @erikbryn, @alex_pentland

paper: arxiv.org/abs/2604.22750
website: longjubai.github.io/agent_token_co…
[image]
1 reply · 35 retweets · 125 likes · 12.2K views
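To make the input-vs-output point concrete, a toy back-of-the-envelope sketch; every number below is assumed for illustration, not taken from the paper:

```python
# Toy model of why input tokens dominate agentic runs: each round
# re-feeds the entire trajectory so far as input. All numbers assumed.
system_prompt = 2_000        # tokens in the fixed preamble
per_round_growth = 1_500     # tool output + reply appended each round
output_per_round = 500       # tokens generated per round
rounds = 50

total_input, context = 0, system_prompt
for _ in range(rounds):
    total_input += context          # full trajectory re-sent
    context += per_round_growth

total_output = rounds * output_per_round
print(f"input {total_input:,} vs output {total_output:,} tokens")
# -> input ~1.94M vs output 25K: input grows quadratically with
#    rounds, output only linearly.
```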
Deepak Vijaykeerthy retweeted
Matthew Yglesias@mattyglesias·
Five months in, I think I've decided that I don't want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money.
183 replies · 197 retweets · 4K likes · 397.2K views
Deepak Vijaykeerthy retweeted
Lakshya A Agrawal@LakshyAAAgrawal·
🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
[image]
4 replies · 44 retweets · 161 likes · 41.2K views
Deepak Vijaykeerthy retweeted
Aran Komatsuzaki@arankomatsuzaki·
The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to OpenAI's English token count:

Hindi: OpenAI 1.37×, Anthropic 3.24×
Arabic: OpenAI 1.31×, Anthropic 2.86×
Chinese: OpenAI 1.15×, Anthropic 1.71×

Claude’s tokenizer charges a much higher linguistic tax.
[image]
94 replies · 251 retweets · 1.5K likes · 842.8K views
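A minimal sketch of how the OpenAI side of such ratios could be measured with the tiktoken library; the translated passages are left elided ("..."), and Anthropic's tokenizer isn't public, so its counts would have to come from the usage metadata its API returns:

```python
import tiktoken

# Sketch of measuring the per-language "tokenizer tax" for one vendor.
# Supply faithful translations of the same passage; "..." is elided here.
texts = {
    "English": "...",
    "Hindi": "...",
    "Arabic": "...",
    "Chinese": "...",
}

enc = tiktoken.get_encoding("o200k_base")  # an OpenAI encoding
baseline = len(enc.encode(texts["English"]))
for lang, text in texts.items():
    print(f"{lang}: {len(enc.encode(text)) / baseline:.2f}x English tokens")
```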
Deepak Vijaykeerthy@deepakvijayke·
@XingyouSong You need a step-level signal to localise where the trajectory went wrong, then a token-level correction inside the bad steps. What makes RL on world/env feedback actually interesting is that proxy rewards are abundant in agentic workflows in a way human preferences never were.
0 replies · 0 retweets · 0 likes · 28 views
Deepak Vijaykeerthy@deepakvijayke·
@XingyouSong Agree with the premise, but I think the hard part isn't the feedback. It's the credit assignment over it. :) GRPO assigns a uniform advantage within a rollout, so every token gets the same credit whether it caused the failure or not.
1 reply · 0 retweets · 0 likes · 48 views
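A minimal sketch of the credit-assignment point in these two replies, assuming a generic GRPO-style setup rather than any particular codebase: the group-normalized advantage is one scalar per rollout, broadcast identically to every token in it:

```python
import torch

# Generic GRPO-style advantages (illustrative): one scalar per rollout,
# shared by all of that rollout's tokens.
def grpo_advantages(rewards: torch.Tensor, seq_lens: list[int]) -> list[torch.Tensor]:
    # rewards: one scalar reward per rollout in the group
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Every token in rollout i gets adv[i]: no step- or token-level
    # signal says *which* step caused a failure.
    return [a.expand(n) for a, n in zip(adv, seq_lens)]

rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])     # assumed binary outcomes
per_token = grpo_advantages(rewards, seq_lens=[120, 87, 200, 64])
```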
Richard Song@XingyouSong·
The day we use RL + GRPO-like techniques on all world feedback is the day we've achieved super-intelligence. We welcome all attendees and submissions -- see you all in Korea!
Akari Asai@AkariAsai

Can RL agents learn directly from real-world feedback? Join us at the RLxF (RL from World Feedback) Workshop @ ICML 2026 (Jul 10) to find out! 📄 Submit Full papers (8p) & proof-of-concept short-papers (2–4p) by May 13 AoE. 🔗 sites.google.com/view/rlxf-icml…

1 reply · 9 retweets · 66 likes · 11.5K views
Deepak Vijaykeerthy retweeted
Jacob Bartlett@jacobtechtavern·
Can you feel the AGI yet?
[image]
13 replies · 2 retweets · 62 likes · 7.9K views
Deepak Vijaykeerthy retweeted
Ronak Malde@rronak_·
My takeaways from ICLR 2026:

1. Recursive self-improvement / continual learning is the next frontier of research. Several great papers on self-distillation, auto agent-harness optimization, learning from non-verifiable reward, and self-play are early signs of success.

2. Multimodal models and world models are attaining emergent reasoning capabilities, opening up a new door to spatial understanding that was previously locked.

3. Lots of concerns that the research community is currently too focused on benchmaxxing rather than improving the research process, and a call to action to address this, like Percy Liang’s fully open-source training community.

4. Rio is possibly even better than San Diego 🇧🇷🏄
31 replies · 134 retweets · 1.5K likes · 86K views
Deepak Vijaykeerthy retweeted
Michael Choi@michaelchchoi·
I am going to teach a graduate-level course on "Stochastic Processes and Applications" in Fall 2026. Excited about this and for now I plan to teach it based on Pierre Del Moral and Spiridon Penev's book "Stochastic Processes: From Applications to Theory"
5 replies · 6 retweets · 127 likes · 9.7K views
Deepak Vijaykeerthy retweeted
ishan@0xishand·
Agentic inference has a ton of patterns that you can exploit to get some really huge perf gains. However, inference engines today are opaque token-in/token-out machines without any knowledge of the “agentic loop”. Excited to share some of our early work on making Dynamo a world-class agentic orchestration layer. Stay tuned for a lot more soon!
NVIDIA AI@NVIDIAAI

Traditional inference wasn’t built for agentic coding. Agentic tools make hundreds of API calls per coding session, often with recomputed context, creating bottlenecks that drive up cost per token. NVIDIA Dynamo rebuilds the stack for agents with:
→ KV-aware routing
→ Agent-aware scheduling
→ Multi-tier caching
→ Unified orchestration
The result: higher cache hit rates, lower latency, and up to 7× more throughput: nvda.ws/3P1tO1N

2 replies · 5 retweets · 21 likes · 2.4K views
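Dynamo's internals aren't shown here; as a toy sketch of the general idea behind KV-aware routing, requests that share a prompt prefix can be hashed to the same worker so its KV cache is likely warm (all names and sizes below are hypothetical):

```python
import hashlib

# Toy sketch of KV-aware routing (not Dynamo's implementation):
# requests sharing a prompt prefix go to the same worker, so its
# KV cache for that prefix is likely already populated.
class PrefixRouter:
    def __init__(self, num_workers: int, prefix_chars: int = 2048):
        self.num_workers = num_workers
        self.prefix_chars = prefix_chars

    def route(self, prompt: str) -> int:
        # Hash only the leading chunk: agentic loops keep re-sending
        # the same system prompt + trajectory head, which is what caches.
        prefix = prompt[: self.prefix_chars]
        digest = hashlib.sha256(prefix.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.num_workers

router = PrefixRouter(num_workers=8)
worker_id = router.route("You are a coding agent...\n<repo context>...")
```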
Deepak Vijaykeerthy retweeted
lovish@louvishh·
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. x.com/Devvrit_Khatri…
[image]
Devvrit@Devvrit_Khatri

Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs

3 replies · 18 retweets · 93 likes · 26.5K views
Deepak Vijaykeerthy retweeted
Praveen Swami@praveenswami·
This excellent article explains why you would be very unwise to take my advice on repairing cars, growing roses or curing leprosy. For some reason, newsrooms seem to be a little slow to grasp that they need people who know what they’re writing about. pekingnology.com/p/some-trouble…
1 reply · 1 retweet · 1 like · 1.1K views
Deepak Vijaykeerthy retweeted
vik@vikhyatk·
if you want to know what it feels like to train a big model without training a big model, read this
[image]
7 replies · 26 retweets · 687 likes · 131.8K views
Dirhousssi Amine@DirhousssiAmine·
@_lewtun oh ok, it makes more sense now. I was getting confused because they mentioned mixed RL training in the previous v3.2 as a way to circumvent catastrophic forgetting
[image]
1 reply · 0 retweets · 0 likes · 39 views
Dirhousssi Amine@DirhousssiAmine·
Catastrophic forgetting in RL is a real problem. DeepSeek v4 moved from mixed RL in v3.2 to OPD. I feel like splitting the post-training into two phases is counterproductive. I expect we'll start seeing robust RL post-training paradigms across tasks in the near future
[image]
2 replies · 1 retweet · 6 likes · 542 views
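Assuming "OPD" above stands for on-policy distillation, a minimal sketch of its core loss, as an illustration rather than DeepSeek's actual recipe: the student generates its own rollouts, and a per-token reverse KL pulls its distributions toward the teacher's on those student-sampled states:

```python
import torch
import torch.nn.functional as F

# Illustrative on-policy distillation loss (assuming that is what
# "OPD" means above). Logits come from running both models over the
# *student's own* rollouts, which is what anchors old behaviors.
def opd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    student_logp = F.log_softmax(student_logits, dim=-1)   # [B, T, V]
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # Per-token reverse KL, KL(student || teacher), averaged over tokens.
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()
```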