Adithya Bhaskar

80 posts

Adithya Bhaskar

@AdithyaNLP

Third-year CS PhD candidate at Princeton University (@princeton_nlp @PrincetonPLI), previously CS undergrad at IIT Bombay

Princeton, NJ Joined June 2023

498 Following 462 Followers

Pinned Tweet

Adithya Bhaskar@AdithyaNLP·Sep 25

Language models that think, chat better. We used longCoT (w/ reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K ex beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8

110

27K

Adithya Bhaskar Retweeted

dr. jack morris@jxmnop·Feb 21

it always disappointed me that such a small subset of mathematical ideas matter for AI i miss doing real math

1.5K

86.1K

Adithya Bhaskar Retweeted

Yinghui He@yinghui_he_·Feb 2

STAT has been accepted to ICLR 2026! See you in Brazil 🇧🇷

Skill-Targeted Adaptive Training (STAT) is a continual learning method that squeezes out 🚨 7~10% more performance on extensively trained models like Qwen. It constructs a 🧩 Missing-Skill-Profile for each model based on what skills the model lacks in its responses, and adaptively curates post-training data accordingly.

Check out our Blog Post 👉 ying-hui-he.github.io/Skill-Targeted…
🔗 arXiv: arxiv.org/abs/2510.10023
💻 GitHub: github.com/princeton-pli/…

202

28.4K

Adithya Bhaskar Retweeted

Xindi Wu@cindy_x_wu·Jan 20

New #NVIDIA Paper We introduce Motive, a motion-centric, gradient-based data attribution method that traces which training videos help or hurt video generation. By isolating temporal dynamics from static appearance, Motive identifies which training videos shape motion in video generation. 🔗 research.nvidia.com/labs/sil/proje… 1/10

112

540

72.9K

Adithya Bhaskar@AdithyaNLP·Dec 1

@suchenzang Hey Susan, would love to chat at NeurIPS. Tried DMing you but got a popup telling me I need to be verified to do that!

152

Susan Zhang@suchenzang·Dec 1

i'm in san diego this week! dm to say hi irl if you're also around :) also, a throwback to the first NeurIPS i ever attended, and the 2007 paper that won the test of time that year:

184

19.7K

Adithya Bhaskar@AdithyaNLP·Dec 1

@LiuZuxin Hi Zuxin, would love to chat! Can’t DM you as I don’t have premium so commenting instead.

188

Zuxin Liu@LiuZuxin·Dec 1

I’ll be at #NeurIPS2025 from Dec 1–6 👋 If you’re around and want to chat about agents, RL, or reasoning models, feel free to ping me and say hi!

8.9K

Adithya Bhaskar@AdithyaNLP·Dec 1

@nikishin_evg Hi Evgenii, would love to chat at NeurIPS!

190

Evgenii Nikishin@nikishin_evg·Dec 1

Visiting San Diego for NeurIPS from Dec 3 till Dec 7. Let's grab coffee!

7.1K

Adithya Bhaskar@AdithyaNLP·Nov 30

@VincentMoens @NeurIPSConf Hey Vincent, I would love to chat at NeurIPS. It seems that I can't DM you here without a premium account, so commenting instead.

273

vmoens@VincentMoens·Nov 30

I’ll be at @NeurIPSConf in San Diego next week, dm if you want to chat!

4.1K

Adithya Bhaskar@AdithyaNLP·Nov 30

@srush_nlp @WenhuChen Thanks! I think I was a bit late, they are all booked again 😅

242

Sasha Rush@srush_nlp·Nov 30

@WenhuChen @AdithyaNLP Oh man, looks like they all got booked. I’ll add some more times

1.4K

Sasha Rush@srush_nlp·Nov 30

At NeurIPS next week. Interested in post training: long horizon, better search, agent formalisms, non-scalar rewards. calendly.com/srush-research…

284

38.8K

Adithya Bhaskar@AdithyaNLP·Nov 30

@WenhuChen Hi Wenhu, I would love to chat at NeurIPS. It appears that I cannot message you here without a premium account, so commenting instead.

380

Wenhu Chen@WenhuChen·Nov 30

I will be attending NeurIPS from Dec 2nd to Dec 4th. Happy to chat about anything related to LLM/Agent/Multimodal research and career decisions! We have 3 spotlight papers and 2 posters at the main conference.

116

12.3K

Adithya Bhaskar@AdithyaNLP·Nov 28

I will be at NeurIPS 2025 from 12/2 to 12/7. These days, I am most interested in bridging mid-training and post-training (of LLMs). Hit me up if you want to chat!

559

Adithya Bhaskar@AdithyaNLP·Nov 21

@harshit_sikchi @OpenAI Hi Harshit, would love to catch up over a coffee at NeurIPS!

127

Harshit Sikchi@harshit_sikchi·Nov 20

I'll be attending #NeurIPS2025. Say hi! Happy to chat about RL, LLMs and life at @OpenAI.

163

16.5K

Adithya Bhaskar Retweeted

William Yang@YangWilliam_·Oct 31

Text-to-image (T2I) models can generate rich supervision for visual learning, but generating subtle distinctions remains challenging. Fine-tuning helps, but too much tuning → overfitting and loss of diversity. How do we preserve fidelity without sacrificing diversity? (1/8)

23.3K

Adithya Bhaskar Retweeted

Yinghui He@yinghui_he_·Oct 20

Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago!

🌟 Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can continue to learn new tricks from Hendrycks MATH, which it had been over-trained on.

🚨 We introduce Skill-Targeted Adaptive Training (STAT), which uses a supervisor model and a skill catalog to construct a 🧩 Missing-Skill-Profile for each student model, and then modifies training to squeeze out >=7% more performance! The intervention can be as simple as reweighting existing training sets. You can also think of this as a more effective distillation method.

More in threads 🧵

📎 [arxiv]: arxiv.org/abs/2510.10023
💻 [github]: github.com/princeton-pli/…
🥳 Amazing collaborators: @Abhishek_034, @Yong18850571, @prfsanjeevarora

201

55.5K

Adithya Bhaskar@AdithyaNLP·Oct 1

@xiye_nlp and I have been using Tinker to run some experiments for our recent paper (go check it out!), and we can attest that it is really convenient!
- Don't have to worry about moving stuff to devices, OOMs, etc.
- Great conceptual modularity
- Great throughput!

Thinking Machines@thinkymachines

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker

Adithya Bhaskar Retweeted

Xi Ye@xiye_nlp·Sep 29

Check out our new work on making reasoning models think broadly! 🤔 We find a minimalist, surprisingly effective recipe to THINK for CHAT: RLVR + a strong reward model, trained on real-world prompts. This project was fun and surprised me in a few ways 👇

📌 We can run RL directly on a base model (no SFT), showing base models might already chat well. Llama-3.1-8B-Base with only 7K prompts ends up chatting well, matching Llama-3.1-8B-Instruct. This is interesting since Instruct was trained with a complex multi-stage pipeline. Also nice to see this working on Llama, while most RLVR papers only show success on Qwen.

📌 Interesting findings about rewards. Leaderboard scores of reward models aren’t always the best indicator of downstream performance. We also tested checklist-based rewards, which help on synthetic instruction-following tasks (IFEval) but didn’t generalize well to chat. I still believe in this direction, and would love to see more open-source efforts.

📌 Real user prompts (shout out to WildChat @wzhao_nlp) were the most effective. These prompts often require “thinking before answering,” which makes them fit for teaching models general thinking. The recipe is simple; we need good ingredients to cook better.

📌 Algorithms, like GRPO vs PPO, have a bigger impact when training directly from base models, but once warm-started with SFT, models are less sensitive to the choice.

Overall, my feeling is: if we start with a strong base LM, and put it in the right “chat environment” (good prompts + good rewards), simple RL training goes a long way. Thus we are quite excited to explore more on pretraining and reward design!

Adithya Bhaskar@AdithyaNLP

18.3K

Adithya Bhaskar@AdithyaNLP·Sep 29

Thanks for tweeting our paper!! 😁

Rohan Paul@rohanpaul_ai

The paper shows that making models think before answering makes them chat better.

It introduces reinforcement learning with model-rewarded thinking, RLMT, which makes the model write a private plan, then the final reply. A separate reward model, trained from human choices, scores each reply so higher-scoring answers get reinforced. Training uses group relative policy optimization, GRPO, which samples several replies per prompt and pushes the model toward the better-than-average ones.

They try 2 setups: a warm start that learns the format from teacher examples, and a zero setup that skips supervised fine-tuning. The prompts come from real chat requests, not math or code, so the model practices everyday tasks.

Across Llama-3.1-8B and Qwen-2.5-7B, the thinking versions beat non-thinking baselines on chat and writing by about 3 to 8 points. One 8B model beats GPT-4o on chat and creative writing, and is close to Claude-3.7-Sonnet.

Language_Models_that_Think_Chat…

The thinking style shifts from rigid checklists to listing constraints, grouping themes, checking edge cases, and then refining. Gains depend on a strong reward judge and a chat-heavy prompt mix, which pushes benefits beyond verifiable tasks like math and code.

----

Paper: arxiv.org/abs/2509.20357

Paper Title: "Language Models that Think, Chat Better"
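
The group-relative step mentioned in the summary above (sample several replies per prompt, then favor the better-than-average ones) can be illustrated with a small sketch. This is not the paper's code: the function name `group_relative_advantages`, the example scores, and the choice to scale by the group's standard deviation are assumptions made for illustration. The idea is only that each reply's reward is compared against the mean of its own group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of replies to the same prompt.

    Replies scoring above the group mean get a positive advantage,
    replies below the mean get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group (assumed scaling)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four sampled replies to one prompt.
scores = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(scores)

for score, adv in zip(scores, advantages):
    print(f"reward={score:.1f} advantage={adv:+.2f}")
```

In a full training loop these advantages would weight the policy-gradient update for each reply's tokens; that part is omitted here because the feed does not describe it.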

334

Adithya Bhaskar@AdithyaNLP·Sep 29

Honored to be included in the list, thanks a lot!

DAIR.AI@dair_ai

7. Language Models that Think, Chat Better

A simple recipe, RL with Model-rewarded Thinking, makes small open models “plan first, answer second” on regular chat prompts and trains them with online RL against a preference reward. x.com/omarsar0/statu…

618

Adithya Bhaskar Retweeted

DAIR.AI@dair_ai·Sep 28

Top AI Papers of The Week (September 22-28):
- ATOKEN
- LLM-JEPA
- Code World Model
- Teaching LLMs to Plan
- Agents Research Environments
- Language Models that Think, Chat Better
- Embodied AI: From LLMs to World Models

Read on for more:

287

42.4K

Adithya Bhaskar@AdithyaNLP·Sep 29

Thanks for your kind words!

Manish Kulariya@MKulria

Ever wonder why some AI chats feel robotic while others nail it? This new paper introduces a game-changer: Language Models that Think, Chat Better. They train AIs to "think" step-by-step before replying, crushing benchmarks. Mind blown? Let's dive in 👇

698
