Vijay V.

1.1K posts

Vijay V.

@vijaytarian

Grad student at CMU. I do research on ̶a̶p̶p̶l̶i̶e̶d̶ ̶N̶L̶P̶ ̶ large language models. he/him

Pittsburgh, PA Katılım Nisan 2009

504 Takip Edilen773 Takipçiler

Sabitlenmiş Tweet

Vijay V.@vijaytarian·26 Tem

RL with verifiable rewards? Works great ✨ Realistic or non-verifiable tasks? Still a mess 📉 Reward models and AI judges? Fragile and inconsistent 💔 Our proposal? RL from Checklist Feedback 📜 arxiv.org/abs/2507.18624 👇

English

234

23.6K

Vijay V. retweetledi

David Duvenaud@DavidDuvenaud·6d

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

English

197

450

3.5K

1.4M

Vijay V.@vijaytarian·23 Nis

We trained an 8B model to help coding agents ask users clarifying questions, matching GPT-5 while asking far fewer Q's! We show a concrete playbook for RL in human-AI interaction: use data analysis to find what drives good interactions, then encode it as a structured reward ⬇️🧵

Sanidhya Vijayvargiya@sanidhya903

1/ Humans often can’t state exactly what they want, making things hard for AI agents. Obvious fix: ask clarifying questions. But which ones? We studied this empirically with coding agents. Effective clarification comes down to two properties: answerability and task relevance.

English

Vijay V. retweetledi

Cas (Stephen Casper)@StephenLCasper·10 Nis

I'm pretty convinced that AI companies perversely use warnings about danger as a hype mechanism. This week, Anthropic has gotten tons of attention by announcing its Mythos model and talking a lot about its unique risks in the cyber domain. But if it's so dangerous, why exactly did they need to announce this publicly? And why are their employees vagueposting sensational stuff? Meanwhile, it looks like OpenAI is feeling left out, so they are now announcing their own internal progress toward the same type of model. It reminds me of last year when Anthropic probably made too big a deal about Claude 4 Opus and when they did not "rule out the possibility" of uplift on dangerous tasks. OpenAI and Google immediately made the same warnings for their next systems. I don't fully trust the story being constructed. AI companies seem to be crying wolf and diluting risk discourse in order to get attention. This is not to say that I don't think that Mythos is a big deal/concern. It probably is. But it's almost certainly a smaller concern than Anthropic people are selling it to be. Meanwhile, Anthropic is basking in hype/press this week. We are on such a dumb timeline.

English

118

11.1K

Vijay V. retweetledi

Anjali Kantharuban@anjali_ruban·8 Nis

Most sentence embeddings capture *what* is said, not *how* it is said. We introduce ✨IDIOLEX ✨: a training framework that captures idiolect using only weak supervision (social proximity + linguistic features), not task labels. 📄 Preprint: arxiv.org/abs/2604.04704 🧵 1/8

English

9.4K

Vijay V. retweetledi

Arvind Narayanan@random_walker·6 Mar

The real sign of AI writing is not superficial stuff like “It’s not X—it’s Y”. It’s the hollowness. Polished writing but relatively mundane ideas. The giveaway is that you’re less impressed when you read it the second time. With good writing, it should be the other way around. I’m not sure this is inherently about AI. It’s more about the fact that people tend to turn to AI when they don’t have much to say. Reading text that has the syntactic smell of AI is mildly annoying, but when I read hollow writing I feel the writer is wasting my time, which is much more frustrating. So don’t do it. People are unlikely to respond to your email or subscribe to your newsletter or whatever you’re trying to get them to do. And they’ll probably remember that you betrayed their trust as a reader.

English

247

438.7K

Vijay V. retweetledi

Harman Singh @ ICLR 🇧🇷@Harman26Singh·5 Mar

Can LLMs Self-Verify? Much better than you'd expect. LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive. Introducing V1, a framework that unifies generation and self-verification: 💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling 💡 V1-Infer: Efficient tournament-style ranking that improves self-verification 💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers 🧵👇

English

383

90.1K

Vijay V.@vijaytarian·28 Şub

If I'm understanding this tweet (and that's a big if, given how it's written), OpenAI won by trusting the gov't to interpret its terms of service freely. If a user is allowed to decide how the terms of service are interpreted, aren't they meaningless?

Senior Official Jeremy Lewin@UnderSecretaryF

For the avoidance of doubt, the OpenAI - @DeptofWar contract flows from the touchstone of “all lawful use” that DoW has rightfully insisted upon & xAI agreed to. But as Sam explained, it references certain existing legal authorities and includes certain mutually agreed upon safety mechanisms. This, again, is a compromise that Anthropic was offered, and rejected. Even if the substantive issues are the same there is a huge difference between (1) memorializing specific safety concerns by reference to particular legal and policy authorities, which are products of our constitutional and political system, and (2) insisting upon a set of prudential constraints subject to the interpretation of a private company and CEO. As we have been saying, the question is fundamental—who decides these weighty questions? Approach (1), accepted by OAI, references laws and thus appropriately vests those questions in our democratic system. Approach (2) unacceptably vests those questions in a single unaccountable CEO who would usurp sovereign control of our most sensitive systems. It is a great day for both America’s national security and AI leadership that two of our leading labs, OAI and xAI have reached the patriotic and correct answer here 🇺🇸

English

292

Vijay V. retweetledi

Gokul Swamy@g_k_swamy·27 Şub

It took a few years of deep thinking, but I'm super excited to finally share PROSPER: a beautiful, regression-based algorithm for RL from *rubric rewards* that robustly handles the *inconsistent feedback* that LLM judges provide. Let's go Back to Black(well)! 🧵(1/n)

English

271

51.2K

Vijay V. retweetledi

Chayenne Zhao@GenAI_is_real·25 Şub

At my undergrad, before this vibe coding century, having a detailed argument explanation was an indicator of carefulness, high code quality, and precision. One cannot imagine that having the codes as I attached took how many hours of Vijay, Graham, and me. @gneubig @vijaytarian Things changed! 😭 If anyone gives me a code base with a detailed argument explanation, it's 100% sure that they are vibe coding. Right now, argument explanation is an indicator of low code quality.

English

4.5K

Vijay V.@vijaytarian·24 Şub

⚠️ DeepSeek allegedly used Claude as a rubric grader to get rewards for RL. This is Against The Rules™️ 🏝️ Good news is, if your rubrics are actually of high quality, then you don't need such a big, expensive model to do the grading! See more in arxiv.org/abs/2507.18624

Anthropic@AnthropicAI

Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers. But foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems.

English

11K

Vijay V.@vijaytarian·13 Şub

Very cool work on using checklist rewards (aka rubrics) for improving multi-step to use. Use checklists, not (naive) reward models!

Zhen Zhang@zhenzhangzz

AI agents are evolving beyond simple tasks to complex, multi-turn and multi-step interactions. But how do we train them with RL when verifiable rewards don't exist for open-ended conversations and building execution environments for thousands of tools is unscalable? Introducing 🛠️CM2: RL with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use [arxiv.org/abs/2602.12268] Core Contributions: 🔄 Multi-turn and Multi-step tool use senario ✅ Checklist Rewards: Replaces vague scalar scores with fine-grained, evidence-based binary criteria. 🛠️ Scalable Tool Simulation: Trains on 5,000+ tools using a hybrid LLM simulator, removing the need for manual API engineering. 👍 SOTA Performance: Achieves +8-12 point gains on τ^2-Bench, BFCL-V4 & ToolSandbox, surpassing larger open-source models.

English

2.8K

Vijay V. retweetledi

Cameron R. Wolfe, Ph.D.@cwolferesearch·4 Şub

I've been reading a lot about rubrics-as-rewards (RaR) for RL. Some of my favorite papers (so far): 1. arxiv.org/abs/2507.17746 2. arxiv.org/abs/2508.12790 3. arxiv.org/abs/2510.07743 4. arxiv.org/abs/2511.19399 5. arxiv.org/abs/2507.18624 Most of the added technical complexity of RaR is less related to RL and more related to reward modeling. If we can get a reliable reward signal, RaR works well, but teaching a model to perform granular / instance-level evaluation is tough. Generalizing these evaluation capabilities across arbitrary domains is even tougher (especially those that are highly subjective). Our reward model also needs to avoid hacking in large-scale RL runs. In my opinion, new developments in this space are likely to come from advancing the frontier of (generative) reward models rather than RL. So much to be done.

English

413

23.3K

Vijay V. retweetledi

Akari Asai@AkariAsai·4 Şub

Thrilled to share: OpenScholar - our work on scientific deep research agents for reliable literature synthesis -has been accepted to Nature! 🎉 Huge thanks to collaborators across institutions who made this possible!

English

228

1.3K

126.4K

Vijay V.@vijaytarian·11 Oca

We never talked about Prompt2Model as an "AI research agent" when we were designing it, but in hindsight that's exactly what it is! One takeaway is that choosing the right level of abstraction for your agent's "action space" might be more important than anything else

Graham Neubig@gneubig

With the talk of automated science and agents training models nowadays, I'd like to highlight one of our older works *prompt2model* by @vijaytarian and @GenAI_is_real We had an automatic pipeline of data acquisition, data transformation, fine tuning, and evaluation.

English

Vijay V. retweetledi

Yiming Xu@yiming_xu_·5 Ara

🤔 Can we build clustering that understands your specific domain instead of just grouping similar words? Introducing ClusterFusion. By guiding LLMs with embeddings, we achieve +48% accuracy over SOTA on domain-specific data. 🚀 Check out below 👇 arxiv.org/abs/2512.04350

English

16.7K

Vijay V.@vijaytarian·4 Ara

Happening today in Exhibit Hall C,D,E, Poster #103

Vijay V.@vijaytarian

I'm at #NeurIPS2025. I'm very excited about our Spotlight paper on using rubrics to improve RL training for language models in non-verifiable settings. I can't wait to talk about it (Poster Session 4,Thursday afternoon)!

English

588

Vijay V.@vijaytarian·3 Ara

I'm hoping to connect with folks this week! A few questions I'd love to chat/debate about: 1. What makes a reward model useful for RL? 2. In the world of verifiers, do we still need RMs? 3. Why does synthetic data work? 4. What are the limits of synthetic data? DM or email me!

English

283

Vijay V.@vijaytarian·3 Ara

Vijay V.@vijaytarian

English

5.9K

Vijay V. retweetledi

Amanda Bertsch@abertsch72·7 Kas

Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

English

356

80.7K

Vijay V.@vijaytarian·1 Kas

This shouldn't be controversial. Science requires sharing with the public. If you're never sharing your research, you're not a research scientist. I don't think you have to share it via peer review, but vague musings on Twitter or on a podcast definitely don't count as science

dr. jack morris@jxmnop

my most controversial opinion is that you shouldn’t trust anyone that calls themself an “AI researcher” but has never gotten a first author paper through peer review

English

249

Keşfet

@AlecRad @status_effects @gneubig @elonmusk @BarackObama @taylorswift13 @cristiano @BillGates