Siddhant (Sid) Bhambri

88 posts

@sbhambr1

PhD @ Yochan Lab, ASU

Joined July 2019

219 Following · 120 Followers
Siddhant (Sid) Bhambri @sbhambr1
Paper accepted at #ACL2026! Check out the detailed post below:
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

#ACL2026 just accepted a paper led by Yochanites @sbhambr1 & @biswas_2707 that shows, in a Q&A setting, that the intermediate tokens in LRMs (1) don't necessarily need to have user-interpretable semantics and (2) that distilling models with traces having semantics doesn't necessarily improve accuracy. 1/

1 reply · 1 repost · 12 likes · 685 views
Siddhant (Sid) Bhambri @sbhambr1
💡 Are AI agents trained to solve tasks with humans in a team actually cooperating? 🔗Check out our recent work accepted at #AAAI2026 that dives deeper into this question: lnkd.in/gFAfGMwR
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

What if your cooperative AI agent is actively avoiding you? Despite significant interest in having human and AI agents team up constructively to solve problems, most work in the area focuses on the bottom-line task reward rather than on any actual cooperation between the agents. In many cases where the task can, in principle, be completed by either agent alone, albeit with additional burden (i.e., the task doesn't require cooperation), task reward itself gives no indication of whether there is any actual cooperation between the agents. In a paper to be presented at #AAAI2026, Yochanite @biswas_2707 (w/ @PalodVardh12428 and @sbhambr1) develops a novel metric to analyze inter-dependencies between human and AI agents, and uses that measure to evaluate the cooperation induced by several SOTA AI agents trained for cooperative tasks. We see that most SOTA AI agents that claim to be RL-trained for "zero-shot cooperation" actually don't induce much inter-dependence between the AI and human agents at all. This calls into question the prevalent approach of training AI agents on task reward and hoping for cooperation to emerge as a side effect!

0 replies · 0 reposts · 0 likes · 199 views
Siddhant (Sid) Bhambri @sbhambr1
💡 𝐈𝐬 𝐬𝐞𝐦𝐚𝐧𝐭𝐢𝐜 𝐜𝐨𝐫𝐫𝐞𝐜𝐭𝐧𝐞𝐬𝐬 𝐨𝐟 𝐂𝐡𝐚𝐢𝐧 𝐨𝐟 𝐓𝐡𝐨𝐮𝐠𝐡𝐭 𝐭𝐫𝐚𝐜𝐞𝐬 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐚𝐬 𝐥𝐨𝐜𝐚𝐥 𝐜𝐨𝐡𝐞𝐫𝐞𝐧𝐜𝐞? 🖇️ Check out our recent work critically looking at how trace coherence is impacted by RLVR post-training: lnkd.in/gAYq_s2b
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

Our recent research efforts have questioned the narrative that the LRM intermediate tokens have semantics (see x.com/rao2z/status/1… ). Some may counter these with "..but I read the traces, and they do seem to make sense.." and claim RLVR post-training must be making the traces correct. We analyze this disconnect in terms of local coherence vs. global validity/correctness of the trace. 1/

0 replies · 1 repost · 1 like · 614 views
Siddhant (Sid) Bhambri @sbhambr1
➡️ 𝘒𝘦𝘺 𝘱𝘢𝘵𝘩𝘸𝘢𝘺𝘴 𝘧𝘰𝘳 𝘥𝘦𝘴𝘪𝘨𝘯𝘪𝘯𝘨 𝘳𝘰𝘣𝘶𝘴𝘵 𝘢𝘯𝘥 𝘳𝘦𝘭𝘪𝘢𝘣𝘭𝘦, 𝘦𝘯𝘥 𝘶𝘴𝘦𝘳-𝘧𝘢𝘤𝘪𝘯𝘨 𝘈𝘐 𝘵𝘩𝘢𝘵 𝘣𝘢𝘭𝘢𝘯𝘤𝘦𝘴 𝘢𝘥𝘷𝘪𝘴𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘢𝘯𝘥 𝘦𝘹𝘱𝘭𝘢𝘪𝘯𝘢𝘣𝘪𝘭𝘪𝘵𝘺. #AI #MachineLearning #LLMs #HumanAI
0 replies · 0 reposts · 0 likes · 44 views
Siddhant (Sid) Bhambri @sbhambr1
➡️ 𝘞𝘩𝘢𝘵 𝘪𝘯𝘵𝘦𝘳𝘱𝘳𝘦𝘵𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘢𝘯𝘥 𝘳𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨 𝘵𝘳𝘢𝘤𝘦𝘴 𝘳𝘦𝘢𝘭𝘭𝘺 𝘮𝘦𝘢𝘯 𝘧𝘰𝘳 𝘦𝘯𝘥 𝘶𝘴𝘦𝘳𝘴 𝘴𝘦𝘦𝘬𝘪𝘯𝘨 𝘵𝘰 𝘵𝘳𝘶𝘴𝘵 𝘢𝘯𝘥 𝘶𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥 𝘈𝘐 𝘴𝘺𝘴𝘵𝘦𝘮𝘴. (x.com/rao2z/status/1…) (x.com/rao2z/status/1…) (5/n)
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

Semantics of Intermediate Tokens in Trace-based distillation in Q&A tasks: Yochanites @sbhambr1 and @biswas_2707 looked at distillation on a Q&A task, and found a disconnect between the validity of derivational traces and the correctness of the solution.. 🧵 1/

1 reply · 0 reposts · 0 likes · 108 views
Siddhant (Sid) Bhambri @sbhambr1
Recent talk at @allen_ai: "𝐑𝐨𝐥𝐞 𝐨𝐟 𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐢𝐧 𝐇𝐮𝐦𝐚𝐧-𝐀𝐈 𝐈𝐧𝐭𝐞𝐫𝐚𝐜𝐭𝐢𝐨𝐧: 𝐀 𝐂𝐫𝐢𝐭𝐢𝐜𝐚𝐥 𝐀𝐩𝐩𝐫𝐚𝐢𝐬𝐚𝐥". Link: youtube.com/watch?v=rjZUBe… Thanks to @rao2z for guiding this research & to @dsweld for hosting me! 🧵 (1/n)
1 reply · 2 reposts · 8 likes · 3K views
Siddhant (Sid) Bhambri reposted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
Delighted to share that @sbhambr1 & @v_mudit's critical evaluation and refutation of the reasoning claims of ReACT has been accepted to TMLR @TmlrOrg 👉openreview.net/forum?id=aFAMP…
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

📢 ReAct popularized the "Think 🤔" magic by claiming to help LLMs plan by "synergizing reasoning and acting." @v_mudit & @sbhambr1 investigated the claims, and have a thing or two to say about the extreme brittleness of ReAct-style prompting. 👉 arxiv.org/abs/2405.13966 1/

1 reply · 3 reposts · 16 likes · 2.1K views