Abhinav Java

69 posts

@abhinav_java

navigating the sea of entropy | RF @Microsoft Research | Prev @Adobe, @dtu_delhi | Hala Madrid!

Bangalore, India · Joined September 2022

624 Following · 122 Followers

Pinned Tweet

Abhinav Java@abhinav_java·Apr 10

Our #CVPR 2026 paper is selected for an oral presentation🌟 One of my favorite projects as a Research Fellow at MSR India w @SachdevaBhuvan, @karan_uppal3, and Vineeth NB! @SachdevaBhuvan is extremely driven and talented. If you’ve got RE roles, reach out now!

Bhuvan Sachdeva@SachdevaBhuvan

Our paper has been accepted for Oral presentation at #CVPR 🎉🎉 Kudos to the team: @karan_uppal3, @abhinav_java See you in Denver! On a side note, I am finishing my fellowship by June and looking for full-time research roles. DMs open.

Abhinav Java retweeted

Deepti Ghadiyaram@deeptigp·2d

[1/4] The human eye doesn't process every single pixel of a video continuously—it focuses on what changes. So why are our video AI models wasting compute on redundant frames? Introducing Swift Sampling: a test-time technique inspired by the human visual system. 🧠👇

Abhinav Java@abhinav_java·3d

Unexpected Claude Code hack: instead of letting it write the code, ask it to generate exercises for you. Way more fun. You actually learn the codebase and you stop being a spectator to your own project. #ClaudeCode

Abhinav Java retweeted

Ashutosh Srivastava@h4shkat·Apr 12

Realtime visualizer for @a1zhang’s RLMs is live! Watch recursive decomposition, sub-LLM calls, and REPL state unfold in real time. Built directly on the inference engine with interactive visualizations. Thread with details 👇

Abhinav Java retweeted

Amit Sharma@amt_shrma·Apr 1

🚀 Hiring Alert: Postdoc Positions at Microsoft Research India! I'm especially looking for candidates interested in trustworthy AI reasoning, agentic systems and verification. If you hold a PhD in CS or a related field and want to work on research that truly matters, consider applying. ⏳ Applications are reviewed on a rolling basis through April 15 Happy to answer any questions through replies here.

Abhinav Java@abhinav_java·Mar 12

Legend says Fede Valverde is still running. #REALMADRID #halamadrid #ucl

Abhinav Java retweeted

Bhuvan Sachdeva@SachdevaBhuvan·Feb 21

Full paper accepted to CVPR!!😀

Bhuvan Sachdeva@SachdevaBhuvan

New paper: Understanding Task Transfer in Vision–Language Models How does finetuning a model on one task affect its performance on other tasks? @karan_uppal3 and @abhinav_java are presenting this work at Unireps, NeurIPS!! 📍 Ballroom 20D ⏰ 3:45 PM – 5:00 PM Come and say Hi!🧵

Abhinav Java retweeted

Nalin Wadhwa @ ICLR 2026@nalin_wadhwa·Feb 5

🚨 Can we truly prove that LLMs won't leak your data or generate unsafe code? Almost all LLM safety evals today only give estimates. Sampling. Benchmarks. Leaderboards. Zero guarantees ❌ We present BEAVER, the first system to provably bound LLM constraint satisfaction. 🧵👇

Abhinav Java retweeted

Bhuvan Sachdeva@SachdevaBhuvan·Dec 6

Abhinav Java@abhinav_java·Dec 2

@SandraWachter5 Same goes for writing, painting, etc. I think humans will continue to do things that bring them joy regardless of the ease of arriving at the final outcome. Or at least I do strongly hope so.

Sandra Wachter@SandraWachter5·Dec 1

I find it dystopian to claim the era of traditional photography is “over.” It basically says “why capture real genuine human moments if you could just generate them on your computer?” I am not sure why we should celebrate the death of art & artists & see this as a sign of progress

Abhinav Java@abhinav_java·Nov 29

For once I was glad that a reviewer responded within two hours of posting a rebuttal 🥲

Yongyuan Liang@cheryyun_l

After 24 hours of complete silence except for that single social media statement, ICLR has now decided to disregard all author and reviewer discussions during the two week rebuttal period 🤣 Quite a surprising way to wrap up this year’s review process

Abhinav Java retweeted

Chirag Agarwal@_cagarwal·Oct 29

Excited to share our new work on verifying LLM reasoning! Everyone loves Chain-of-Thought (CoT). LLMs can generate amazing, step-by-step solutions (called CoT). But when they make a mistake, can a human actually find it quickly? The answer is: No, not easily.

Abhinav Java@abhinav_java·Oct 10

@VibeCodeTeddy @ChrisGPotts This directly supports our thesis arxiv.org/abs/2507.07634. We find that prompt optimization can also drastically fuel RL efficiency

VibeCodeTeddy@VibeCodeTeddy·Oct 9

@ChrisGPotts Prompt optimization streamlines efficiency. While RL can be shiny, it often complicates the basics we can nail down with good prompts.

Christopher Potts@ChrisGPotts·Oct 9

I suspect biases against prompt optimization derive from the community elevating RL post-training to a mythical status. The truth is that RL post-training is hard, and never effective without outstanding prompts. Prompt optimizers are cheaper and more effective in most scenarios.

Karthik Kalyan@karthikkalyan90

Shout out to @DSPyOSS GEPA (From 20:15). cc @LakshyAAAgrawal

Abhinav Java retweeted

Tarun Menta@_tarunmenta·Sep 25

Poured a ton into this over the past few months - late nights, endless brainstorming, and lots of debugging marathons alongside @VikParuchuri and @zach_nussbaum. Couldn’t be prouder that it’s finally in the hands of users 🚀

Datalab@datalabto

Launch Day 3 of 6: Layout Model Updates 🚀 If your doc parser gets layout wrong, everything downstream breaks. ⚠️ Reading order scrambled → unusable output ⚠️ Blocks missed → footnotes, financial figures, legal text lost We just rolled out major layout upgrades at Datalab to fix this at the root.

Abhinav Java@abhinav_java·Sep 21

@ziv_ravid AI reviewers for filtering noise?

Ravid Shwartz Ziv@ziv_ravid·Sep 20

The core problem with NeurIPS reviews is the submission volume. We simply don't have enough good reviewers. I'm not sure what is going to work, but there are many potential solutions that we can try: cap submissions per author, make review load proportional to submissions, or add modest submission fees. I think that at the end, we need fewer, higher-quality submissions.

Abhinav Java@abhinav_java·Sep 12

Thrilled to share our latest work LiveDRBench at Ploutos! Catch us on Sept 12, 7PM PDT. #Research #Benchmarking #AI #DeepResearch huggingface.co/datasets/micro…

Cecile Tamura@ceciletamura

🔍 What is deep research & how can AI master it? Join us for a fireside chat on a new benchmark LiveDRBench w/: 🎙 @abhinav_java of @Microsoft Research India 🎙 @Ceciletamura of @ploutosai world.ploutos.dev/stream/crystal…

Abhinav Java@abhinav_java·Sep 3

@corbtt This is aligned with our findings as well! For small models, prompt optimization + SFT + RL is surprisingly effective. Check out x.com/abhinav_java/s… Also we invite you to try your model on our latest challenging DR benchmark huggingface.co/datasets/micro…

Abhinav Java@abhinav_java

TL;DR 🧠 🔹 Optimized ReAct Prompting + 🔹 SFT to exploit test-time compute + 🔹 RL finetuning to learn when to stop ➡️ Only 1k training examples ⚡ Lower inference latency 📄 arxiv.org/abs/2507.07634…

Kyle Corbitt@corbtt·Sep 2

The recipe is simple: 1️⃣ Use SFT to teach basic research skills 2️⃣ Apply GRPO to use them more effectively 3️⃣ Evaluate against DeepResearch Bench Result: a model that steadily climbs to frontier-level performance. 📈

Kyle Corbitt@corbtt·Sep 2

🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵)

Abhinav Java retweeted

Aaron Tay@aarontay·Aug 10

[Read] Characterizing Deep Research: A Benchmark and Formal Definition arxiv.org/abs/2508.04183 (1)

Abhinav Java retweeted

Taelin@VictorTaelin·Aug 26

My thoughts on how AI will automate my SWE job in 2026

(I will be plainly honest on this post, even though some people on both sides of this debate will be upset. So, please, respect that these are my predictions. I don't want to start an argument, I just want to share my thoughts with people who might be interested.)

So, through the last few days, I've been running a crazy experiment: rebuild HVM from scratch, with AI only. Turns out there is now a model (you know which - no more ads 😑) that is already capable of writing most of my code. I give it instructions, go do something else, and come back to a (possibly) working implementation. And that's really incredible.

What I wonder is: what is still consuming my time? I'm an experienced developer. Given enough time™, I could accomplish a lot. In a year or two, I could write a browser, an operating system, a game engine, an MMORPG. So, if AIs are truly automating my job, how come I didn't ship all these things yet? What would objectively prevent me from becoming a single-man massive SWE company, if I wanted to?

Well, other than the obvious bureaucratic matters, even in the pure coding sense, that is obviously not possible, for one reason: the AI can only work for so long before it needs me.

Here's how this experiment went:
1. I wrote a full spec of the "next-gen HVM"
2. I asked the AI to make the first part (parser)
3. It wrote 80% correct code, but got key points wrong
4. I corrected these points (expert intelligence injection)
5. It wrote 100% correct code
6. I asked the AI to make the second part
7. ... repeat 2-5 over and over ...

About 3 days later, I have a working prototype. I didn't write more than 1% of that code. I spent 95% of that time playing games. From one point of view, the AI automated 95% of my job, if we measure by time alone. Yet, from another point of view, it automated 0% of my job. After all, without the expert (me) stepping in every 30 minutes, the AI wouldn't be able to move past the very first module. That is, if I just said "implement the spec" and left it working alone, it would not complete the job. Not in 3 days, not in 3 years. I'd just come back to a bug-ridden, useless codebase, and a traumatized codebot.

In a way, this is kinda the best case scenario for those desperate about job loss: today AI can automate most of your job, as long as it is you doing it! How convenient, isn't it?

Of course, we can't count on that being true forever. If the "autonomous work time span" keeps increasing - as many expect - then, as soon as it hits the 3 day mark (48x from where we are now), it would have been able to complete the job without any intervention. At least in theory.

Now, when (or whether) that happens, it depends on the big AI labs, and I really have no control over it. Yet, what I can do is ask: given whatever is the SOTA model's autonomous work time window, can I implement a new language and tooling so good that it extends that time window by a constant, but substantial, factor? Can I make the Bend2→NeoGen→AI loop so good that this same model would be able to complete this very task with far fewer interventions - or, even better, fully autonomously?

I'm really excited because I think the answer is "yes", and it seems like we're really close to that point here. If that really works, that would be huge, because we'd then be able to automate the development of massive scale software, a few years before models are able to do so on their own. (I wonder if HOC should just become a SWE company at that point?)

Plus it would be a massive bump on my quality of life, as I'd finally be able to stop playing League of Legends, and go play Ragnarok Online instead, like the old times 🥳

Anyway that's what's currently on my mind. Thanks for reading and have a great day.

(Today I'll let the AI keep polishing "HVM4" as I do other things😏 in the background. I really want to see how far this goes.)

Abhinav Java retweeted

zek@zekramu·Aug 22

You guys don’t understand. Bad grammar and spelling is becoming high signal. Perfection looks too close to an LLM. Being retarded is the only way to differentiate yourself for a machine
