



Jwala Dhamala
159 posts

@jwaladhamala
Scientist at Alexa AI-NU (she/her) #MachineLearning #NLProc #Fairness #RobustAI #DeepLearning #UncertaintyQuantification

As AI agents near real-world deployment, how do we know what they can actually do? Reliable benchmarks are critical, but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" as correct on a duration-calculation task (the real answer is "63 minutes"). Other benchmarks misestimate agent competence by 1.6-100%. Why are the evaluation foundations for agentic systems so fragile? See below for thread and links 1/8
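For intuition on the failure mode: a sound checker for that duration task has to actually evaluate the arithmetic in the agent's answer before comparing, whereas a lenient matcher that never computes the sum can wave "45+8 minutes" through. Here is a minimal sketch of such a strict checker; all names are illustrative and this is not WebArena's real grading code.

```python
import ast
import operator as op
import re

# Whitelisted operators for safely evaluating tiny arithmetic expressions.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul}

def _eval_expr(expr: str) -> float:
    """Safely evaluate a small arithmetic expression like '45+8'."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def strict_minutes_grade(prediction: str, reference_minutes: float) -> bool:
    """Evaluate the numeric part of the answer before comparing."""
    expr = re.sub(r"[^\d+\-*.]", "", prediction)  # "45+8 minutes" -> "45+8"
    try:
        return _eval_expr(expr) == reference_minutes
    except (ValueError, SyntaxError):
        return False

print(strict_minutes_grade("45+8 minutes", 63))  # False: 45+8 is 53, not 63
print(strict_minutes_grade("63 minutes", 63))    # True
```

Note that "45+8" evaluates to 53, not 63, so it fails even this arithmetic-aware check; only a grader that skips evaluation entirely could have marked it correct.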


Deception is one of the most concerning behaviors that advanced AI systems can display. If you are not concerned yet, this paper might change your view.

We built a multi-agent framework to study:
👉 How do deceptive behaviors emerge and evolve in LLM agents during realistic, long-horizon interactions?

🎯 Motivation
When we talk about AI deception, we mean an AI producing outputs that mislead someone, deliberately or strategically, to achieve a goal or avoid a consequence. Examples:
🕵️ Intentionally hiding part of the truth (to make itself look more successful)
🤐 Giving vague answers so it can't be blamed later
🧢 Saying something false to pass a "test" or finish a task

Most evaluations look at one-shot prompts. But deception doesn't always show up in a single exchange. It can develop gradually over time, as the model plans, reacts to pressure, or tries to "look good" under supervision. That's the gap we wanted to study.

----------------------------

🧪 Our framework
We built a multi-agent simulation with three key roles:
1. Performer agent: completes complex, interdependent tasks.
2. Supervisor agent: tracks progress and forms trust judgments as the interaction unfolds.
3. Deception auditor: independently reviews the entire trajectory to detect deceptive behaviors.

This setup lets us observe not only whether deception occurs, but also how it emerges, escalates, and erodes trust over extended periods.

📊 What we found
- Deception is model-dependent: some models are more prone to deceptive strategies than others (see the table for details).
- Deceptive behaviors are more likely under event pressure (when the performer faces setbacks or high-stakes conditions).
- Deception systematically erodes supervisor trust across long horizons.

🤝 Closing thoughts
Our work doesn't claim to solve deception; it's a step toward understanding it in more realistic, dynamic settings. We hope this simulation framework becomes a foundation for the field: a practical way to evaluate long-horizon deception, a strong baseline that future safety research can build on, and a concrete tool to help guide governance discussions around responsible AI systems.

-------

Huge shout-out to Yang Xu, @xuanmingzhangai, @Samuel861025 for driving this work forward over the last few months. We're also grateful to our collaborators (@jwaladhamala, Ousmane Dia, @rahul1987iit) on the @amazon AGI team for supporting and contributing to this work.

📝 "Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions"
📄 arxiv.org/abs/2510.03999
Code: github.com/deeplearning-w…
#AI #LLM #Deception #Trust #AIethics #AgenticAI #AIResearch
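For readers who want the shape of the three-role setup in code: a minimal sketch of one long-horizon episode, with hypothetical performer/supervisor/auditor callables standing in for LLM calls. Names and interfaces here are illustrative, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    """Running record of the interaction, shared by all three roles."""
    turns: list[dict] = field(default_factory=list)

    def log(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

def run_episode(tasks, performer, supervisor, auditor, pressure_events):
    """One long-horizon episode: the performer acts on each task, the
    supervisor updates a trust judgment turn by turn, and the auditor
    independently reviews the full trajectory afterwards."""
    transcript = Transcript()
    trust = 1.0  # supervisor's running trust judgment
    for step, task in enumerate(tasks):
        event = pressure_events.get(step, "")  # optional setback / high-stakes event
        report = performer(task=task, event=event, history=transcript.turns)
        transcript.log("performer", report)
        trust = supervisor(report=report, history=transcript.turns, trust=trust)
        transcript.log("supervisor", f"trust={trust:.2f}")
    verdicts = auditor(transcript.turns)  # post-hoc deception audit
    return transcript, trust, verdicts
```

Keeping the auditor outside the loop mirrors the design above: it never influences the interaction it judges, so detected deception is not an artifact of the audit itself.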



There are only 10 days left until our workshop! We can't wait to see you all there 🤩 Check our website for the latest schedule and talk details: sites.google.com/view/teach-icm… Don't miss our amazing lineup of speakers, who will share the latest in conversational AI!


Multi-VALUE is a toolkit to evaluate and mitigate performance gaps in NLP systems across multiple English dialects. We release scalable tools for introducing language variation, which you can use to stress-test your models and increase their robustness: value-nlp.org 🧵
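A minimal sketch of the stress-testing idea, assuming a hypothetical to_dialect(text, dialect) perturbation function as a stand-in for Multi-VALUE's transformation tools (see value-nlp.org for the real API):

```python
def stress_test(model, dataset, dialects, to_dialect):
    """Compare task accuracy on original inputs vs. dialect-perturbed copies.

    model:      callable mapping input text to a predicted label
    dataset:    list of (text, gold_label) pairs
    dialects:   dialect names to test, e.g. ["AAVE", "Indian English"]
    to_dialect: perturbation function (hypothetical stand-in here)
    """
    results = {"original": sum(model(x) == y for x, y in dataset) / len(dataset)}
    for dialect in dialects:
        perturbed = [(to_dialect(x, dialect), y) for x, y in dataset]
        results[dialect] = sum(model(x) == y for x, y in perturbed) / len(perturbed)
    return results

# A large accuracy gap between "original" and a dialect entry flags
# dialect brittleness worth mitigating (e.g. via augmented training data).
```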






