Deepak Vijaykeerthy

6.1K posts

@deepakvijayke

Research & Engineering @IBM

Bengaluru, India · Joined September 2020
1.5K Following · 254 Followers
Pinned Tweet
Deepak Vijaykeerthy@deepakvijayke·
"The pile gets soaked with data and starts to get mushy over time, so it's technically recurrent."
[image]
0 replies · 0 retweets · 8 likes · 6.5K views
Deepak Vijaykeerthy
Deepak Vijaykeerthy@deepakvijayke·
@DirhousssiAmine Is the bug in the embed/LM-head accounting? tie_word_embeddings=false on this model, so lm_head_flops = 0 if tie_word_embeddings else 2Vh evaluates to 2Vh. Incorrect total: 2Vh (wrongly counted embed) + 2Vh (correct lm_head) = 4Vh. Should be: 0 + 2Vh = 2Vh.
0 replies · 0 retweets · 0 likes · 122 views
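A minimal sketch of the corrected accounting described in this reply; V and h follow the tweet's notation (vocab size and hidden size), and the helper name is hypothetical:

```python
# Hypothetical helper illustrating the corrected per-token accounting;
# V = vocab size, h = hidden size.
def head_flops_per_token(V: int, h: int, tie_word_embeddings: bool) -> int:
    # The input embedding is a table lookup, not a matmul: ~0 FLOPs,
    # regardless of weight tying.
    embed_flops = 0
    # The LM head is an (h x V) matmul costing 2*V*h FLOPs per token.
    # Tying shares the *weights* with the embedding table, but the
    # matmul still runs, so it is counted unconditionally.
    lm_head_flops = 2 * V * h
    return embed_flops + lm_head_flops  # 2Vh, not 4Vh
```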
Deepak Vijaykeerthy retweeted
Ricardo Olmedo@rdolmedo_·
We fine-tuned Alec Radford’s 1930 vintage LLM to solve SWE-bench issues. After just ‼️250‼️ training examples, the model solves its first issue, a simple patch to the xarray library. 🧵👇
[image]
22 replies · 76 retweets · 1.1K likes · 179.1K views
Deepak Vijaykeerthy@deepakvijayke·
@willccbb I don't think people have issues with AI content (and disclosure certainly helps); people have a problem with the mindless slop that some churn out (you know the kind I am referring to :D). In many cases, it is so obvious that you don't need an AI detector to tell that it's AI slop.
0 replies · 0 retweets · 1 like · 257 views
Deepak Vijaykeerthy retweeted
Haitham Bou Ammar@hbouammar·
We found that much of LLM “reasoning” doesn’t come from RL training; it comes from how you sample the model. Building on power sampling (Karan & Du 2025), we show you can approximate global reasoning without MCMC, without training, and 10× faster. 🧠 Inference-time intelligence is real. 📝 Blog ↓ medium.com/@haitham.bouammar71/we-didnt-train-the-model-it-started-reasoning-better-anyway-118dda6f9448
[image]
29 replies · 86 retweets · 663 likes · 63.4K views
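The blog's actual construction isn't reproduced here; as a rough, hypothetical illustration, the naive token-level form of power sampling just raises each next-token distribution to a power alpha (equivalently, sampling at temperature 1/alpha), a cheap local stand-in for the sequence-level target p(y)^alpha:

```python
import torch

# Hypothetical sketch, not the authors' method: token-level power
# sampling. softmax(alpha * logits) is proportional to p(token)^alpha,
# i.e. sampling at temperature 1/alpha; no MCMC, no training.
def power_sample_step(logits: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    probs = torch.softmax(alpha * logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

next_token = power_sample_step(torch.randn(50_000))  # toy 50K vocab
```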
Phalgun@phalgooon·
this is how you realise you are now part of the 36-45 age cohort
[image]
5 replies · 1 retweet · 37 likes · 2.8K views
Deepak Vijaykeerthy retweeted
Jiaxin Pei@jiaxin_pei·
Why are AI agents so expensive? Do more tokens actually lead to better performance? Which models are more token-efficient? Can agents predict their own token costs before execution? These were the questions bugging us, so we wrote a paper to find out.

Excited to share "How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks", led by @Longju_Bai, with co-authors at Stanford, MIT, Google DeepMind, Microsoft AI, and All Hands AI.

A few findings that surprised us:

🔹 Agentic coding tasks consume ~1000× more tokens than chat or reasoning workloads. And input tokens, not output, become the dominant cost driver, because each round re-feeds the entire trajectory back into the model (see the sketch after this post).

🔹 More tokens ≠ better outcomes. Runs on the same task can vary by up to 30× in token use, and accuracy often peaks at intermediate cost. Beyond that, extra spending tends to reflect redundant exploration and does not bring further performance gains.

🔹 Models differ substantially in token efficiency. On the same successfully solved tasks, Kimi-K2 and Claude Sonnet-4.5 use roughly twice as many tokens as GPT-5.2. The gap becomes even larger when all the models fail.

🔹 Human-rated task difficulty weakly predicts actual cost. "Easy" tasks for humans can be surprisingly expensive for agents, and vice versa. The classic "Moravec's Paradox" also holds for coding agents!

🔹 Agents struggle to predict their own costs. Self-prediction correlations top out around 0.39, and every model we tested systematically underestimates what a task will cost. Result-based pricing still has a long way to go when we cannot even figure out the token cost beforehand.

Together, these results suggest that cost prediction is a genuinely challenging task for current agents. We think this opens up real research questions around self-modeling, calibrated cost estimation, and pricing mechanisms that work under residual uncertainty.

Huge thanks to my collaborators: @Longju_Bai, Zhemin Huang, @sunjiao123sun_, @xingyaow_, @radamihalcea, @erikbryn, @alex_pentland

paper: arxiv.org/abs/2604.22750
website: longjubai.github.io/agent_token_co…
[image]
1 reply · 35 retweets · 125 likes · 12.2K views
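To make the input-vs-output point concrete, a toy back-of-the-envelope sketch; every number below is assumed for illustration, not taken from the paper:

```python
# Toy model of why input tokens dominate agentic runs: each round
# re-feeds the entire trajectory so far as input. All numbers assumed.
system_prompt = 2_000        # tokens in the fixed preamble
per_round_growth = 1_500     # tool output + reply appended each round
output_per_round = 500       # tokens generated per round
rounds = 50

total_input, context = 0, system_prompt
for _ in range(rounds):
    total_input += context          # full trajectory re-sent
    context += per_round_growth

total_output = rounds * output_per_round
print(f"input {total_input:,} vs output {total_output:,} tokens")
# -> input ~1.94M vs output 25K: input grows quadratically with
#    rounds, output only linearly.
```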
Deepak Vijaykeerthy retweeted
Matthew Yglesias@mattyglesias·
Five months in, I think I've decided that I don't want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money.
183 replies · 197 retweets · 4K likes · 397.2K views
Deepak Vijaykeerthy retweeted
Lakshya A Agrawal@LakshyAAAgrawal·
🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
[image]
4 replies · 44 retweets · 161 likes · 41.2K views
Deepak Vijaykeerthy retweeted
Aran Komatsuzaki@arankomatsuzaki·
The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to OpenAI's English token count:

Hindi: OpenAI 1.37×, Anthropic 3.24×
Arabic: OpenAI 1.31×, Anthropic 2.86×
Chinese: OpenAI 1.15×, Anthropic 1.71×

Claude’s tokenizer charges a much higher linguistic tax.
[image]
94 replies · 251 retweets · 1.5K likes · 842.8K views
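A minimal sketch of how the OpenAI side of such ratios could be measured with the tiktoken library; the translated passages are left elided ("..."), and Anthropic's tokenizer isn't public, so its counts would have to come from the usage metadata its API returns:

```python
import tiktoken

# Sketch of measuring the per-language "tokenizer tax" for one vendor.
# Supply faithful translations of the same passage; "..." is elided here.
texts = {
    "English": "...",
    "Hindi": "...",
    "Arabic": "...",
    "Chinese": "...",
}

enc = tiktoken.get_encoding("o200k_base")  # an OpenAI encoding
baseline = len(enc.encode(texts["English"]))
for lang, text in texts.items():
    print(f"{lang}: {len(enc.encode(text)) / baseline:.2f}x English tokens")
```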
Deepak Vijaykeerthy@deepakvijayke·
@XingyouSong You need a step-level signal to localise where the trajectory went wrong, then a token-level correction inside the bad steps. What makes RL on world/env feedback actually interesting is that proxy rewards are abundant in agentic workflows in a way human preferences never were.
0 replies · 0 retweets · 0 likes · 28 views
Deepak Vijaykeerthy@deepakvijayke·
@XingyouSong Agree with the premise, but I think the hard part isn't the feedback. It's the credit assignment over it. :) GRPO assigns a uniform advantage within a rollout, so every token gets the same credit whether it caused the failure or not.
1 reply · 0 retweets · 0 likes · 48 views
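A minimal sketch of the credit-assignment point in these two replies, assuming a generic GRPO-style setup rather than any particular codebase: the group-normalized advantage is one scalar per rollout, broadcast identically to every token in it:

```python
import torch

# Generic GRPO-style advantages (illustrative): one scalar per rollout,
# shared by all of that rollout's tokens.
def grpo_advantages(rewards: torch.Tensor, seq_lens: list[int]) -> list[torch.Tensor]:
    # rewards: one scalar reward per rollout in the group
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Every token in rollout i gets adv[i]: no step- or token-level
    # signal says *which* step caused a failure.
    return [a.expand(n) for a, n in zip(adv, seq_lens)]

rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])     # assumed binary outcomes
per_token = grpo_advantages(rewards, seq_lens=[120, 87, 200, 64])
```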
Richard Song@XingyouSong·
The day we use RL + GRPO-like techniques on all world feedback is the day we've achieved super-intelligence. We welcome all attendees and submissions -- see you all in Korea!
Akari Asai@AkariAsai

Can RL agents learn directly from real-world feedback? Join us at the RLxF (RL from World Feedback) Workshop @ ICML 2026 (Jul 10) to find out! 📄 Submit Full papers (8p) & proof-of-concept short-papers (2–4p) by May 13 AoE. 🔗 sites.google.com/view/rlxf-icml…

1 reply · 9 retweets · 66 likes · 11.5K views
Deepak Vijaykeerthy retweeted
Jacob Bartlett@jacobtechtavern·
Can you feel the AGI yet?
[image]
13 replies · 2 retweets · 62 likes · 7.9K views
Deepak Vijaykeerthy retweeted
Ronak Malde@rronak_·
My takeaways from ICLR 2026:

1. Recursive self-improvement / continual learning is the next frontier of research. Several great papers on self-distillation, auto agent-harness optimization, learning from non-verifiable reward, and self-play are early signs of success.

2. Multimodal models and world models are attaining emergent reasoning capabilities, opening up a new door to spatial understanding that was previously locked.

3. Lots of concerns that the research community is currently too focused on benchmaxxing rather than improving the research process, and a call to action to address this, like Percy Liang’s fully open-source training community.

4. Rio is possibly even better than San Diego 🇧🇷🏄
31 replies · 134 retweets · 1.5K likes · 86K views
Deepak Vijaykeerthy retweeted
Michael Choi@michaelchchoi·
I am going to teach a graduate-level course on "Stochastic Processes and Applications" in Fall 2026. Excited about this and for now I plan to teach it based on Pierre Del Moral and Spiridon Penev's book "Stochastic Processes: From Applications to Theory"
5 replies · 6 retweets · 127 likes · 9.7K views
Deepak Vijaykeerthy retweeted
ishan@0xishand·
Agentic inference has a ton of patterns that you can exploit to get some really huge perf gains. However, inference engines today are opaque token-in/token-out machines without any knowledge of the “agentic loop”. Excited to share some of our early work on making Dynamo a world-class agentic orchestration layer. Stay tuned for a lot more soon!
NVIDIA AI@NVIDIAAI

Traditional inference wasn’t built for agentic coding. Agentic tools make hundreds of API calls per coding session, often with recomputed context, creating bottlenecks that drive up cost per token. NVIDIA Dynamo rebuilds the stack for agents with:
→ KV-aware routing
→ Agent-aware scheduling
→ Multi-tier caching
→ Unified orchestration
The result: higher cache hit rates, lower latency, and up to 7× more throughput: nvda.ws/3P1tO1N

2 replies · 5 retweets · 21 likes · 2.4K views
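Dynamo's internals aren't shown here; as a toy sketch of the general idea behind KV-aware routing, requests that share a prompt prefix can be hashed to the same worker so its KV cache is likely warm (all names and sizes below are hypothetical):

```python
import hashlib

# Toy sketch of KV-aware routing (not Dynamo's implementation):
# requests sharing a prompt prefix go to the same worker, so its
# KV cache for that prefix is likely already populated.
class PrefixRouter:
    def __init__(self, num_workers: int, prefix_chars: int = 2048):
        self.num_workers = num_workers
        self.prefix_chars = prefix_chars

    def route(self, prompt: str) -> int:
        # Hash only the leading chunk: agentic loops keep re-sending
        # the same system prompt + trajectory head, which is what caches.
        prefix = prompt[: self.prefix_chars]
        digest = hashlib.sha256(prefix.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.num_workers

router = PrefixRouter(num_workers=8)
worker_id = router.route("You are a coding agent...\n<repo context>...")
```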
Deepak Vijaykeerthy retweeted
lovish@louvishh·
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. x.com/Devvrit_Khatri…
[image]
Devvrit@Devvrit_Khatri

Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs

3 replies · 18 retweets · 93 likes · 26.5K views
Deepak Vijaykeerthy retweeted
Praveen Swami@praveenswami·
This excellent article explains why you would be very unwise to take my advice on repairing cars, growing roses or curing leprosy. For some reason, newsrooms seem to be a little slow to grasp that they need people who know what they’re writing about. pekingnology.com/p/some-trouble…
1 reply · 1 retweet · 1 like · 1.1K views
Deepak Vijaykeerthy retweeted
vik@vikhyatk·
if you want to know what it feels like to train a big model without training a big model, read this
[image]
7 replies · 26 retweets · 687 likes · 131.8K views
Dirhousssi Amine@DirhousssiAmine·
@_lewtun oh ok, it makes more sense now. I was getting confused because they mentioned mixed RL training in the previous v3.2 as a way to circumvent catastrophic forgetting
[image]
1 reply · 0 retweets · 0 likes · 39 views
Dirhousssi Amine@DirhousssiAmine·
Catastrophic forgetting in RL is a real problem. DeepSeek v4 moved from mixed RL in v3.2 to OPD. I feel like splitting the post-training into two phases is counterproductive. I expect we'll start seeing robust RL post-training paradigms across tasks in the near future
[image]
2 replies · 1 retweet · 6 likes · 542 views
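Assuming "OPD" above stands for on-policy distillation, a minimal sketch of its core loss, as an illustration rather than DeepSeek's actual recipe: the student generates its own rollouts, and a per-token reverse KL pulls its distributions toward the teacher's on those student-sampled states:

```python
import torch
import torch.nn.functional as F

# Illustrative on-policy distillation loss (assuming that is what
# "OPD" means above). Logits come from running both models over the
# *student's own* rollouts, which is what anchors old behaviors.
def opd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    student_logp = F.log_softmax(student_logits, dim=-1)   # [B, T, V]
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # Per-token reverse KL, KL(student || teacher), averaged over tokens.
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()
```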