Abhinav Java
@abhinav_java
64 posts

navigating the sea of entropy | RF @Microsoft Research | Prev @Adobe, @dtu_delhi | Hala Madrid! | Looking for PhD opportunities!

Bangalore, India · Joined September 2022
608 Following · 115 Followers
Pinned Tweet
Abhinav Java @abhinav_java
🚀 Meet FrugalRAG at #ICML2025 in Vancouver 🇨🇦!
📍 July 18 – VecDB Workshop, West 208–209
📍 July 19 – ES-FoMO Workshop, East Exhibition Hall A
Come chat with me and @naga86 and learn how we're rethinking training efficiency and inference latency for RAG systems. 🧵
Abhinav Java retweeted
Nalin Wadhwa @nalin_wadhwa
🚨 Can we truly prove that LLMs won't leak your data or generate unsafe code? Almost all LLM safety evals today only give estimates. Sampling. Benchmarks. Leaderboards. Zero guarantees ❌ We present BEAVER, the first system to provably bound LLM constraint satisfaction. 🧵👇
Abhinav Java retweeted
Bhuvan Sachdeva @SachdevaBhuvan
New paper: Understanding Task Transfer in Vision–Language Models
How does finetuning a model on one task affect its performance on other tasks?
@karan_uppal3 and @abhinav_java are presenting this work at Unireps, NeurIPS!!
📍 Ballroom 20D
⏰ 3:45 PM – 5:00 PM
Come and say hi! 🧵
Abhinav Java @abhinav_java
@SandraWachter5 Same goes for writing, painting, etc. I think humans will continue to do things that bring them joy regardless of the ease of arriving at the final outcome. Or at least I strongly hope so.
Sandra Wachter @SandraWachter5 (@swachter.bsky.social)
I find it dystopian to claim the era of traditional photography is "over." It basically says "why capture real, genuine human moments if you could just generate them on your computer?" I am not sure why we should celebrate the death of art & artists & see this as a sign of progress.
Abhinav Java retweeted
Chirag Agarwal @_cagarwal
Excited to share our new work on verifying LLM reasoning! Everyone loves Chain-of-Thought (CoT): LLMs can generate amazing, step-by-step solutions. But when they make a mistake, can a human actually find it quickly? The answer is: no, not easily.
VibeCodeTeddy @VibeCodeTeddy
@ChrisGPotts Prompt optimization streamlines efficiency. While RL can be shiny, it often complicates the basics we can nail down with good prompts.
Christopher Potts @ChrisGPotts
I suspect biases against prompt optimization derive from the community elevating RL post-training to a mythical status. The truth is that RL post-training is hard, and never effective without outstanding prompts. Prompt optimizers are cheaper and more effective in most scenarios.
Karthik Kalyan @karthikkalyan90

Shout out to @DSPyOSS GEPA (From 20:15). cc @LakshyAAAgrawal
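As a concrete illustration of the loop Potts is describing, here is a minimal hill-climbing prompt optimizer. It is only a sketch under assumed interfaces (`llm`, `mutate`, and `score` are hypothetical placeholders), not GEPA's actual algorithm.

```python
def optimize_prompt(base_prompt, dev_set, llm, mutate, score, n_iters=20):
    """Minimal hill-climbing prompt optimizer (sketch, not GEPA).

    Assumed, hypothetical interfaces:
      llm(prompt, x)  -> model output for input x under `prompt`
      mutate(prompt)  -> a rewritten candidate instruction
      score(out, y)   -> float in [0, 1] comparing output to label y
    """
    def avg_score(prompt):
        return sum(score(llm(prompt, x), y) for x, y in dev_set) / len(dev_set)

    best, best_score = base_prompt, avg_score(base_prompt)
    for _ in range(n_iters):
        candidate = mutate(best)   # e.g. ask an LLM to rewrite the instruction
        s = avg_score(candidate)   # evaluate on a small labeled dev set
        if s > best_score:         # keep only improving rewrites
            best, best_score = candidate, s
    return best, best_score
```

The point of the sketch: the whole search is cheap forward passes over a dev set with no gradient updates, which is the cost argument being made above.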

Abhinav Java retweeted
Ravid Shwartz Ziv @ziv_ravid
The core problem with NeurIPS reviews is the submission volume. We simply don't have enough good reviewers. I'm not sure what is going to work, but there are many potential solutions that we can try: cap submissions per author, make review load proportional to submissions, or add modest submission fees. In the end, I think we need fewer, higher-quality submissions.
Abhinav Java @abhinav_java
@corbtt This is aligned with our findings as well! For small models, prompt optimization + SFT + RL is surprisingly effective. Check out x.com/abhinav_java/s… Also, we invite you to try your model on our latest challenging DR benchmark: huggingface.co/datasets/micro…
Abhinav Java @abhinav_java

TL;DR 🧠
🔹 Optimized ReAct Prompting +
🔹 SFT to exploit test-time compute +
🔹 RL finetuning to learn when to stop
➡️ Only 1k training examples
⚡ Lower inference latency
📄 arxiv.org/abs/2507.07634…
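To make the recipe concrete, here is a minimal sketch of a ReAct-style loop in which the model can decide to stop retrieving early. The `llm` and `retrieve` interfaces and the SEARCH/ANSWER action names are illustrative assumptions, not FrugalRAG's actual API.

```python
def frugal_react(question, llm, retrieve, max_hops=6):
    """ReAct loop with a learned stopping decision (sketch, assumed interfaces).

    Hypothetical interfaces:
      llm(question, context) -> {"action": "SEARCH" | "ANSWER", "arg": str}
      retrieve(query)        -> list of passages
    """
    context = []
    for hop in range(max_hops):
        step = llm(question=question, context=context)
        if step["action"] == "ANSWER":         # the RL-finetuned policy decided
            return step["arg"], hop            # the evidence suffices: stop early
        context.extend(retrieve(step["arg"]))  # otherwise fetch more passages
    # retrieval budget exhausted: force a final answer from what we have
    return llm(question=question, context=context)["arg"], max_hops
```

Stopping early is where the latency savings come from: each avoided hop is one fewer retrieval plus one fewer LLM call.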

Kyle Corbitt @corbtt
The recipe is simple:
1️⃣ Use SFT to teach basic research skills
2️⃣ Apply GRPO to use them more effectively
3️⃣ Evaluate against DeepResearch Bench
Result: a model that steadily climbs to frontier-level performance. 📈
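For readers unfamiliar with step 2️⃣: GRPO samples a group of rollouts per prompt and scores each rollout against its own group, so no learned value network is needed. A minimal sketch of that advantage computation (illustrative, not the exact code from the recipe):

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: normalize each rollout's reward
    by the mean and std of its own sampling group (one group per prompt)."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. four research-agent rollouts for one prompt, scored by a judge or rubric
print(grpo_advantages([0.2, 0.8, 0.5, 0.5]))  # below the group mean -> negative,
                                              # above it -> positive
```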
Kyle Corbitt @corbtt
🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵)
Abhinav Java retweeted
Aaron Tay @aarontay
[Read] Characterizing Deep Research: A Benchmark and Formal Definition arxiv.org/abs/2508.04183 (1)
Abhinav Java retweeted
Taelin @VictorTaelin
My thoughts on how AI will automate my SWE job in 2026

(I will be plainly honest in this post, even though some people on both sides of this debate will be upset. So, please, respect that these are my predictions. I don't want to start an argument, I just want to share my thoughts with people who might be interested.)

So, over the last few days, I've been running a crazy experiment: rebuild HVM from scratch, with AI only. Turns out there is now a model (you know which - no more ads 😑) that is already capable of writing most of my code. I give it instructions, go do something else, and come back to a (possibly) working implementation. And that's really incredible.

What I wonder is: what is still consuming my time? I'm an experienced developer. Given enough time™, I could accomplish a lot. In a year or two, I could write a browser, an operating system, a game engine, an MMORPG. So, if AIs are truly automating my job, how come I haven't shipped all these things yet? What would objectively prevent me from becoming a single-man massive SWE company, if I wanted to? Well, other than the obvious bureaucratic matters, even in the pure coding sense, that is obviously not possible, for one reason: the AI can only work for so long before it needs me.

Here's how this experiment went:
1. I wrote a full spec of the "next-gen HVM"
2. I asked the AI to make the first part (parser)
3. It wrote 80% correct code, but got key points wrong
4. I corrected these points (expert intelligence injection)
5. It wrote 100% correct code
6. I asked the AI to make the second part
7. ... repeat 2-5 over and over ...

About 3 days later, I have a working prototype. I didn't write more than 1% of that code. I spent 95% of that time playing games. From one point of view, the AI automated 95% of my job, if we measure by time alone. Yet, from another point of view, it automated 0% of my job. After all, without the expert (me) stepping in every 30 minutes, the AI wouldn't be able to move past the very first module. That is, if I just said "implement the spec" and left it working alone, it would not complete the job. Not in 3 days, not in 3 years. I'd just come back to a bug-ridden, useless codebase, and a traumatized codebot.

In a way, this is kinda the best-case scenario for those desperate about job loss: today AI can automate most of your job, as long as it is you doing it! How convenient, isn't it? Of course, we can't count on that being true forever. If the "autonomous work time span" keeps increasing - as many expect - then, as soon as it hits the 3-day mark (48x from where we are now), it would be able to complete the job without any intervention. At least in theory.

Now, when (or whether) that happens depends on the big AI labs, and I really have no control over it. Yet, what I can do is ask: given whatever the SOTA model's autonomous work time window is, can I implement a new language and tooling so good that it extends that window by a constant, but substantial, factor? Can I make the Bend2→NeoGen→AI loop so good that this same model would be able to complete this very task with far fewer interventions - or, even better, fully autonomously? I'm really excited because I think the answer is "yes", and it seems like we're really close to that point. If that really works, it would be huge, because we'd then be able to automate the development of massive-scale software a few years before models are able to do so on their own. (I wonder if HOC should just become a SWE company at that point?)

Plus, it would be a massive bump in my quality of life, as I'd finally be able to stop playing League of Legends and go play Ragnarok Online instead, like the old times 🥳

Anyway, that's what's currently on my mind. Thanks for reading and have a great day.

(Today I'll let the AI keep polishing "HVM4" as I do other things 😏 in the background. I really want to see how far this goes.)
Abhinav Java retweeted
zek @zekramu
You guys don’t understand. Bad grammar and spelling are becoming high signal. Perfection looks too close to an LLM. Being retarded is the only way to differentiate yourself from a machine
Abhinav Java retweeted
joowon @n0w00j
this is peter gregory vs gavin belson irl
Abhinav Java retweeted
Amit Sharma @amt_shrma
Deep research has emerged as a popular task with many recently released models. But beyond lengthy reports, what exactly defines the task? And how do we quantify progress? [New Paper!] We provide an objective definition centered on claim discovery & a 100-problem benchmark spanning scientific discovery and prior art search. 🧵
Abhinav Java retweeted
Swayam Singh @swayaminsync
Good work & impressive gains (though core RL folks critique entropy methods). Idea:
- Explore until the model's entropy nears its natural range.
- Adjust the advantage at the token level for shared/unique tokens (which is already in GRPO via token-level importance sampling).
🔗 arxiv.org/abs/2507.19849
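The "token-level importance sampling" mentioned above is the per-token policy ratio in GRPO's clipped surrogate objective; the tweet's suggestion amounts to additionally reweighting the advantage per token. A minimal sketch of the standard per-token clipped loss, with that reweighting exposed as a hypothetical `token_weights` hook:

```python
import torch

def grpo_token_loss(logp_new, logp_old, adv, token_weights=None, clip_eps=0.2):
    """Per-token clipped surrogate loss (PPO/GRPO style); sketch only.

    logp_new, logp_old: (T,) log-probs of the sampled tokens under the
        current and behavior policies.
    adv: scalar group-relative advantage for this rollout.
    token_weights: optional (T,) per-token scaling of the advantage --
        the shared/unique-token adjustment proposed above (hypothetical).
    """
    ratio = torch.exp(logp_new - logp_old)        # token-level importance ratio
    a = adv if token_weights is None else adv * token_weights
    unclipped = ratio * a
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * a
    return -torch.min(unclipped, clipped).mean()  # negate: optimizer minimizes
```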