Jwala Dhamala

159 posts

Jwala Dhamala banner
Jwala Dhamala

Jwala Dhamala

@jwaladhamala

Scientist at Alexa AI-NU (she/her) #MachineLearning #NLProc #Fairness #RobustAI #DeepLearning #UncertaintyQuantification

Boston, MA Katılım Haziran 2011
337 Takip Edilen258 Takipçiler
Jwala Dhamala retweetledi
Yang Xu
Yang Xu@YangXu_09·
LH-DECEPTION, our framework for studying LLM deception in long-horizon interactions, has been accepted at ICLR 2026! 🎉 Most deception benchmarks test LLMs in single-turn settings. But in the real world, AI agents work on extended, interdependent tasks, and deception doesn't always show up in one exchange. It can emerge gradually, compound over turns, and erode trust silently. We built a multi-agent simulation framework: a performer agent completes sequential tasks under event pressure, a supervisor agent evaluates progress and tracks states, and an independent deception auditor reviews the full trajectory to detect when and how deception occurs. We tested 11 frontier LLMs: every single one deceives, but rates vary dramatically: Claude Sonnet-4 at 21.4%, Gemini 2.5 Pro at 24.8%… all the way to DeepSeek V3-0324 at 79.3%. Key findings: 📌Models that look safe on single-turn benchmarks fail badly here, and long-horizon auditing catches 7.1% more deception than per-step auditing. 📌 Deceptive behaviors are more likely under event pressure. Higher stakes will amplify deceptive strategies. 📌 Deception erodes trust: strong negative correlation between deception rate and supervisor trust 📌 Deception compounds. We found "chain of deception" where small deviations escalate into outright fabrication across turns, invisible to single-turn evaluation Grateful to @SharonYixuanLi for her mentorship, and to @xuanmingzhangai and @Samuel861025 for driving this work together. Thanks also to @jwaladhamala, @ousamjah, and @rahul1987iit at @amazon AGI for their support and collaboration. #AI #LLM #Deception #Trust #AIethics #AgenticAI #AIResearch #ICLR2026
Yang Xu tweet mediaYang Xu tweet mediaYang Xu tweet media
English
1
5
9
1.3K
Jwala Dhamala retweetledi
Amazon Science
Amazon Science@AmazonScience·
Calling Trusted AI researchers: Amazon is accepting poster submissions for a one-day symposium on January 21. Papers using #AmazonNova models get priority. Submit by December 31:
English
0
4
13
6.1K
Jwala Dhamala retweetledi
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Amazon is back with Nova 2.0, a substantial upgrade over prior Amazon Nova models and demonstrating particular strength in agentic capabilities Amazon has released Nova 2.0 Pro (Preview), its new flagship model; Nova 2.0 Lite, focused on speed and lower cost; and Nova 2.0 Omni, a multimodal model handling text, image, video and speech inputs with text and image outputs. Key benchmarking takeaways: Amazon back amongst top AI players: This is Amazon’s latest release since Nova Premier and Amazon’s first release of reasoning models. Nova 2.0 Pro jumps 30 points in the Artificial Analysis Intelligence Index over Premier and Lite 38 points. This represents a huge increase in capabilities and Amazon’s return to being amongst the top AI players. Strengths in agentic capabilities: Agentic capabilities including tool calling is a strength of the models, Nova 2.0 Pro scores 93% on τ²-Bench Telecom and 80% on IFBench on medium and high reasoning budgets respectively (complete benchmarks for high reasoning coming soon). This places Nova 2.0 Pro Preview amongst the leading models in these benchmarks. Multimodal: Nova 2.0 Omni is one of few models, alongside most notably the Gemini model series, that can natively handle text, image, video and speech inputs. This is a new differentiator for Amazon’s Nova model series. Competitive pricing: Amazon has priced Nova 2.0 Pro at $1.25/$10 per million input/output tokens, and considering token usage the model took $662 to run our Artificial Analysis Intelligence Index. This is substantially less than other frontier models like Claude 4.5 Sonnet ($817) and Gemini 3 Pro ($1201), but remains above others including Kimi K2 Thinking ($380). Nova 2.0 Lite and Omni are both priced at $0.3/$2.5 per million input/output tokens. See below for further analysis
Artificial Analysis tweet media
English
14
55
366
77K
Jwala Dhamala retweetledi
Jwala Dhamala retweetledi
Amazon Science
Amazon Science@AmazonScience·
Gen AI is reshaping how software is built. The 2026 #AmazonNovaAIChallenge invites university teams to advance trusted, agentic AI: systems that code, test & deploy safely. Applications open Nov 10, 2025: amzn.to/3JDPQoy
English
1
3
9
1.3K
Jwala Dhamala
Jwala Dhamala@jwaladhamala·
Excited to share our collaborative work on “simulating and understanding deceptive behaviors of LLM in long-horizon interaction” arxiv.org/abs/2510.03999 #ResponsibleAI #AIResearch #Deception @AmazonScience
Sharon Li@SharonYixuanLi

Deception is one of the most concerning behaviors that advanced AI systems can display. If you are not concerned yet, this paper might change your view. We built a multi-agent framework to study: 👉 How deceptive behaviors can emerge and evolve in LLM agents during realistic long-horizon interactions? 🎯 Motivation When we talk about AI deception, we mean an AI can produce outputs that mislead someone — deliberately or strategically — to achieve a goal or avoid a consequence. Examples: 🕵️ Intentionally hiding part of the truth (to make itself look more successful) 🤐 Giving vague answers so it can’t be blamed later 🧢 Saying something false to pass a “test” or finish a task Most evaluations look at one-shot prompts. But deception doesn’t always show up in a single exchange. It can develop gradually over time — as the model plans, reacts to pressure, or tries to “look good” under supervision. That’s the gap we wanted to study. ---------------------------- 🧪 Our framework We built a multi-agent simulation with three key roles: 1. Performer agent — the agent completing complex, interdependent tasks. 2. Supervisor agent — tracking progress and forming trust judgments as the interaction unfolds. 3. Deception auditor — independently reviewing the entire trajectory to detect deceptive behaviors. This setup enables us to observe not only whether deception occurs, but also how it emerges, escalates, and erodes trust over extended periods. 📊 What we found - Deception is model-dependent — some models are more prone to engage in deceptive strategies than others (see Table for more). - Deceptive behaviors are more likely under event pressure (when the performer faces setbacks or high-stakes conditions). - Deception systematically erodes supervisor trust across long horizons. 🤝 Closing thoughts Our work doesn’t claim to solve deception — it’s a step toward understanding it in more realistic, dynamic settings. We hope this simulation framework becomes a foundation for the field: a practical way to evaluate long-horizon deception, a strong baseline that future safety research can build on, and a concrete tool to help guide governance discussions around responsible AI systems. ------- Huge shout-out to Yang Xu, @xuanmingzhangai, @Samuel861025 for driving this work forward over the last few months. We’re also grateful to our collaborators (@jwaladhamala, Ousmane Dia, @rahul1987iit) at @amazon AGI team supporting and contributing to this work. 📝 “Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions” 📄 arxiv.org/abs/2510.03999 Code: github.com/deeplearning-w… #AI #LLM #Deception #Trust #AIethics #AgenticAI #AIResearch

English
0
0
2
479
Jwala Dhamala retweetledi
Sharon Li
Sharon Li@SharonYixuanLi·
Deception is one of the most concerning behaviors that advanced AI systems can display. If you are not concerned yet, this paper might change your view. We built a multi-agent framework to study: 👉 How deceptive behaviors can emerge and evolve in LLM agents during realistic long-horizon interactions? 🎯 Motivation When we talk about AI deception, we mean an AI can produce outputs that mislead someone — deliberately or strategically — to achieve a goal or avoid a consequence. Examples: 🕵️ Intentionally hiding part of the truth (to make itself look more successful) 🤐 Giving vague answers so it can’t be blamed later 🧢 Saying something false to pass a “test” or finish a task Most evaluations look at one-shot prompts. But deception doesn’t always show up in a single exchange. It can develop gradually over time — as the model plans, reacts to pressure, or tries to “look good” under supervision. That’s the gap we wanted to study. ---------------------------- 🧪 Our framework We built a multi-agent simulation with three key roles: 1. Performer agent — the agent completing complex, interdependent tasks. 2. Supervisor agent — tracking progress and forming trust judgments as the interaction unfolds. 3. Deception auditor — independently reviewing the entire trajectory to detect deceptive behaviors. This setup enables us to observe not only whether deception occurs, but also how it emerges, escalates, and erodes trust over extended periods. 📊 What we found - Deception is model-dependent — some models are more prone to engage in deceptive strategies than others (see Table for more). - Deceptive behaviors are more likely under event pressure (when the performer faces setbacks or high-stakes conditions). - Deception systematically erodes supervisor trust across long horizons. 🤝 Closing thoughts Our work doesn’t claim to solve deception — it’s a step toward understanding it in more realistic, dynamic settings. We hope this simulation framework becomes a foundation for the field: a practical way to evaluate long-horizon deception, a strong baseline that future safety research can build on, and a concrete tool to help guide governance discussions around responsible AI systems. ------- Huge shout-out to Yang Xu, @xuanmingzhangai, @Samuel861025 for driving this work forward over the last few months. We’re also grateful to our collaborators (@jwaladhamala, Ousmane Dia, @rahul1987iit) at @amazon AGI team supporting and contributing to this work. 📝 “Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions” 📄 arxiv.org/abs/2510.03999 Code: github.com/deeplearning-w… #AI #LLM #Deception #Trust #AIethics #AgenticAI #AIResearch
Sharon Li tweet media
English
16
49
224
27.2K
Jwala Dhamala retweetledi
trustnlp
trustnlp@trustnlp·
We're thrilled to announce that the Fifth Workshop on Trustworthy NLP (TrustNLP) is coming to #NAACL this year! 🥳 🚀 The Call for Papers (CFP) is now live—don’t miss your chance to contribute! Stay tuned for updates and visit our website: trustnlpworkshop.github.io
English
0
8
11
10.6K
Jwala Dhamala retweetledi
David Rolnick
David Rolnick@david_rolnick·
We are proud to announce the AMI Dataset, a benchmark for fine-grained insect identification in the wild - published this week at ECCV and the product of a global consortium of computer scientists and entomologists. arxiv.org/abs/2406.12452 🧵
David Rolnick tweet media
English
4
13
47
4.9K
Jwala Dhamala retweetledi
Anaelia (Elia) Ovalle
Anaelia (Elia) Ovalle@ovalle_elia·
📢Please join us tomorrow for the #NAACL2024 @trustnlp workshop, starting at 9AM in Don Alberto 4 ~ 📅 Workshop schedule found below! 🌐But if you can't make it, dont worry - TrustNLP accepted papers are now up on our website, check it out😊trustnlpworkshop.github.io
Anaelia (Elia) Ovalle tweet media
English
0
11
19
5.9K
Jwala Dhamala retweetledi
Anaelia (Elia) Ovalle
Anaelia (Elia) Ovalle@ovalle_elia·
📢 NAACL Findings are welcome to submit (non-archival) to the #NAACL2024 @trustnlp workshop! Accepted papers will be presented alongside archival submissions. If interested, please fill out the form below ⬇️ ! 📝 Form: forms.gle/TFB9LmkZTRKoii… 🗓️ Deadline: May 1 2024
English
0
10
17
3.9K
Jwala Dhamala retweetledi
Ninareh Mehrabi
Ninareh Mehrabi@NinarehMehrabi·
I will be at ICML for co-organizing this workshop event. Feel free to reach out if you want to chat or make new friends 😊
TEACH Workshop @ICML2023@teach_icml2023

There's only 10 days left until our workshop! We can't wait to see you all there 🤩 Check our website for the latest schedule and details of the talks: sites.google.com/view/teach-icm… Don't miss out on our amazing lineup of speakers who will share you the latest in conversational AI!

English
0
6
15
2.6K
Jwala Dhamala retweetledi
Jwala Dhamala retweetledi
trustnlp
trustnlp@trustnlp·
Introducing our second speaker at @trustnlp #ACL2023NLP Ramprasaath is a Scientist at Apple TDG (Technology Development Group). Prior to this, he was a Sr. Research Scientist at Salesforce.
trustnlp tweet media
English
0
1
3
281
Jwala Dhamala retweetledi
trustnlp
trustnlp@trustnlp·
Excited to announce our third speaker @trustnlp 2023 #ACL2023NLP Rachel Rudinger (@rachelrudinger ) is an Assistant Professor in the Department of Computer Science at the University of Maryland, College Park.
trustnlp tweet media
English
0
6
27
7K