Abhinav Java

69 posts

@abhinav_java

navigating the sea of entropy | RF @Microsoft Research | Prev @Adobe, @dtu_delhi | Hala Madrid!

Bangalore, India Joined September 2022
624 Following 122 Followers
Pinned Tweet
Abhinav Java@abhinav_java·
Our #CVPR 2026 paper is selected for an oral presentation🌟 One of my favorite projects as a Research Fellow at MSR India w @SachdevaBhuvan, @karan_uppal3, and Vineeth NB! @SachdevaBhuvan is extremely driven and talented. If you’ve got RE roles, reach out now!
Bhuvan Sachdeva@SachdevaBhuvan

Our paper has been accepted for Oral presentation at #CVPR 🎉🎉 Kudos to the team: @karan_uppal3, @abhinav_java See you in Denver! On a side note, I am finishing my fellowship by June and looking for full-time research roles. DMs open.

0
0
12
380
Abhinav Java retweeted
Deepti Ghadiyaram@deeptigp·
[1/4] The human eye doesn't process every single pixel of a video continuously—it focuses on what changes. So why are our video AI models wasting compute on redundant frames? Introducing Swift Sampling: a test-time technique inspired by the human visual system. 🧠👇
3
21
188
19.6K
Abhinav Java@abhinav_java·
Unexpected Claude Code hack: instead of letting it write the code, ask it to generate exercises for you. Way more fun. You actually learn the codebase and you stop being a spectator to your own project. #ClaudeCode
0
0
1
73
Abhinav Java retweeted
Ashutosh Srivastava@h4shkat·
Realtime visualizer for @a1zhang’s RLMs is live! Watch recursive decomposition, sub-LLM calls, and REPL state unfold in real time. Built directly on the inference engine with interactive visualizations. Thread with details 👇
9
17
250
17.9K
Abhinav Java retweeted
Amit Sharma@amt_shrma·
🚀 Hiring Alert: Postdoc Positions at Microsoft Research India! I'm especially looking for candidates interested in trustworthy AI reasoning, agentic systems and verification. If you hold a PhD in CS or a related field and want to work on research that truly matters, consider applying. ⏳ Applications are reviewed on a rolling basis through April 15. Happy to answer any questions through replies here.
3
12
78
10.6K
Abhinav Java retweeted
Nalin Wadhwa @ ICLR 2026@nalin_wadhwa·
🚨 Can we truly prove that LLMs won't leak your data or generate unsafe code? Almost all LLM safety evals today only give estimates. Sampling. Benchmarks. Leaderboards. Zero guarantees ❌ We present BEAVER, the first system to provably bound LLM constraint satisfaction. 🧵👇
11
14
70
18.8K
Abhinav Java retweeted
Bhuvan Sachdeva@SachdevaBhuvan·
New paper: Understanding Task Transfer in Vision–Language Models How does finetuning a model on one task affect its performance on other tasks? @karan_uppal3 and @abhinav_java are presenting this work at Unireps, NeurIPS!! 📍 Ballroom 20D ⏰ 3:45 PM – 5:00 PM Come and say Hi!🧵
1
5
16
9.7K
Abhinav Java@abhinav_java·
@SandraWachter5 Same goes for writing, painting, etc. I think humans will continue to do things that bring them joy regardless of the ease of arriving at the final outcome. Or at least I do strongly hope so.
0
0
0
24
Sandra Wachter -@swachter.bsky.social@SandraWachter5·
I find it dystopian to claim the era of traditional photography is “over.” It basically says “why capture real genuine human moments if you could just generate them on your computer?” I am not sure why we should celebrate the death of art & artists & see this as a sign of progress
1
0
5
1.7K
Abhinav Java retweeted
Chirag Agarwal@_cagarwal·
Excited to share our new work on verifying LLM reasoning! Everyone loves Chain-of-Thought (CoT). LLMs can generate amazing, step-by-step solutions (called CoT). But when they make a mistake, can a human actually find it quickly? The answer is: No, not easily.
1
1
12
1.4K
VibeCodeTeddy@VibeCodeTeddy·
@ChrisGPotts Prompt optimization streamlines efficiency. While RL can be shiny, it often complicates the basics we can nail down with good prompts.
1
0
2
294
Christopher Potts@ChrisGPotts·
I suspect biases against prompt optimization derive from the community elevating RL post-training to a mythical status. The truth is that RL post-training is hard, and never effective without outstanding prompts. Prompt optimizers are cheaper and more effective in most scenarios.
Karthik Kalyan@karthikkalyan90

Shout out to @DSPyOSS GEPA (From 20:15). cc @LakshyAAAgrawal

7
13
137
50.1K
Abhinav Java retweeted
Ravid Shwartz Ziv@ziv_ravid·
The core problem with NeurIPS reviews is the submission volume. We simply don't have enough good reviewers. I'm not sure what is going to work, but there are many potential solutions that we can try: cap submissions per author, make review load proportional to submissions, or add modest submission fees. I think that at the end, we need fewer, higher-quality submissions.
7
4
90
13.1K
Abhinav Java@abhinav_java·
@corbtt This is aligned with our findings as well! For small models, prompt optimization + SFT + RL is surprisingly effective. Check out x.com/abhinav_java/s… Also we invite you to try your model on our latest challenging DR benchmark huggingface.co/datasets/micro…
Abhinav Java@abhinav_java

TL;DR 🧠 🔹 Optimized ReAct Prompting + 🔹 SFT to exploit test-time compute + 🔹 RL finetuning to learn when to stop ➡️ Only 1k training examples ⚡ Lower inference latency 📄 arxiv.org/abs/2507.07634…

0
0
0
17
Kyle Corbitt@corbtt·
The recipe is simple: 1️⃣ Use SFT to teach basic research skills 2️⃣ Apply GRPO to use them more effectively 3️⃣ Evaluate against DeepResearch Bench Result: a model that steadily climbs to frontier-level performance. 📈
3
3
56
6.7K
Kyle Corbitt@corbtt·
🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵)
38
169
1.3K
213.3K
Abhinav Java retweeted
Aaron Tay@aarontay·
[Read] Characterizing Deep Research: A Benchmark and Formal Definition arxiv.org/abs/2508.04183 (1)
2
1
5
651
Abhinav Java retweeted
Taelin@VictorTaelin·
My thoughts on how AI will automate my SWE job in 2026

(I will be plainly honest in this post, even though some people on both sides of this debate will be upset. So, please, respect that these are my predictions. I don't want to start an argument, I just want to share my thoughts with people who might be interested.)

So, through the last few days, I've been running a crazy experiment: rebuild HVM from scratch, with AI only. Turns out there is now a model (you know which - no more ads 😑) that is already capable of writing most of my code. I give it instructions, go do something else, and come back to a (possibly) working implementation. And that's really incredible.

What I wonder is: what is still consuming my time? I'm an experienced developer. Given enough time™, I could accomplish a lot. In a year or two, I could write a browser, an operating system, a game engine, an MMORPG. So, if AIs are truly automating my job, how come I didn't ship all these things yet? What would objectively prevent me from becoming a single-man massive SWE company, if I wanted to?

Well, other than the obvious bureaucratic matters, even in the pure coding sense, that is obviously not possible, for one reason: the AI can only work for so long before it needs me.

Here's how this experiment went:
1. I wrote a full spec of the "next-gen HVM"
2. I asked the AI to make the first part (parser)
3. It wrote 80% correct code, but got key points wrong
4. I corrected these points (expert intelligence injection)
5. It wrote 100% correct code
6. I asked the AI to make the second part
7. ... repeat 2-5 over and over ...

About 3 days later, I have a working prototype. I didn't write more than 1% of that code. I spent 95% of that time playing games. From one point of view, the AI automated 95% of my job, if we measure by time alone. Yet, from another point of view, it automated 0% of my job. After all, without the expert (me) stepping in every 30 minutes, the AI wouldn't be able to move past the very first module. That is, if I just said "implement the spec" and left it working alone, it would not complete the job. Not in 3 days, not in 3 years. I'd just come back to a bug-ridden, useless codebase, and a traumatized codebot.

In a way, this is kinda the best case scenario for those desperate about job loss: today AI can automate most of your job, as long as it is you doing it! How convenient, isn't it?

Of course, we can't count on that being true forever. If the "autonomous work time span" keeps increasing - as many expect - then, as soon as it hits the 3 day mark (48x from where we are now), it would have been able to complete the job without any intervention. At least in theory.

Now, when (or whether) that happens, it depends on the big AI labs, and I really have no control over it. Yet, what I can do is ask: given whatever is the SOTA model's autonomous work time window, can I implement a new language and tooling so good that it extends that time window by a constant, but substantial, factor? Can I make the Bend2→NeoGen→AI loop so good that this same model would be able to complete this very task with way fewer interventions - or, even better, fully autonomously?

I'm really excited because I think the answer is "yes", and it seems like we're really close to that point here. If that really works, that would be huge, because we'd then be able to automate the development of massive scale software, a few years before models are able to do so on their own. (I wonder if HOC should just become a SWE company at that point?)

Plus it would be a massive bump on my quality of life, as I'd finally be able to stop playing League of Legends, and go play Ragnarok Online instead, like the old times 🥳

Anyway that's what's currently on my mind

Thanks for reading and have a great day

(Today I'll let the AI keep polishing "HVM4" as I do other things😏 in the background. I really want to see how far this goes.)
91
66
1.3K
140.4K
Abhinav Java retweeted
zek@zekramu·
You guys don’t understand. Bad grammar and spelling is becoming high signal. Perfection looks too close to an LLM. Being retarded is the only way to differentiate yourself for a machine
835
1.7K
24K
1.3M