Samuel Simko @ ICML 2026 (@SimkoSamuel) - Twitter Profile

Pinned Tweet

Samuel Simko @ ICML 2026@SimkoSamuel·Jul 6

Excited to present "Training with Honeypots" 🍯 at #ICML2026 tomorrow (Tuesday 10:30AM in Hall A, Poster #1601), an approach for adversarial defense that makes successful jailbreaks less useful to attackers! #AISafety 👉 icml.cc/virtual/2026/p…

3

9

30

2.1K

Samuel Simko @ ICML 2026 retweeted

Jinesis Lab (UToronto)@JinesisLab·1d

Our Jinesis Lab at the University of Toronto is bringing three exciting papers to #COLM2026! 🎉 These projects push the frontiers of #AISafety, #AgentSecurity, and #AIForScience. 🤖 Huge congratulations to all collaborators and co-authors! Papers: 🛡️ ODILE: Orthogonal Disruption of Injected Tool-Call Embeddings for Agentic Prompt Injection Defense 🔭 Stargazer: A Scalable Model-fitting Benchmark Environment for AI Agents under Astrophysical Constraints ⚠️ One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety @UofT, @VectorInst, @EuroSafeAI, @ETH_en, @MPI_IS, @UMich

0

3

17

716

Samuel Simko @ ICML 2026 retweeted

Jinesis Lab (UToronto)@JinesisLab·3d

🌟 What an incredible experience at #ICML2026! 🌟 Our team at Jinesis Lab had a fantastic time presenting research across several exciting and important areas: 🤝 Multi-agent LLMs 🧠 Causal reasoning 🛡️ AI safety We were proud to share this work alongside our collaborators and to engage with so many researchers from across the machine learning community. 💡 Just as valuable as presenting our own work was learning from the research others brought to ICML. From thought-provoking talks and innovative poster presentations to insightful discussions and spontaneous hallway conversations, the conference was full of ideas that challenged our thinking and opened new directions for future work. ✨ We are grateful to everyone who attended our presentations, visited our posters, shared feedback, and took the time to discuss their own research with us. 🙌 A huge thank-you to our lab members, collaborators, and the broader ICML community for making the experience so memorable. We are leaving inspired, energized, and excited to build on the ideas and connections formed throughout the conference. 🌍🔬🚀 #ICML #MachineLearning #ArtificialIntelligence #MultiAgentSystems #LLM #CausalReasoning #AISafety #ResponsibleAI #AIResearch

0

5

28

2.4K

Samuel Simko @ ICML 2026 retweeted

Jinesis Lab (UToronto)@JinesisLab·Jul 7

🚨Check out our #ICML2026 Spotlight Paper, "AI Poses Risks to Democratic and Social System"! 📍Poster Session: Wednesday, July 8 • 10:30 AM – 12:15 PM (KST) • Hall A, Poster #3015 What if every AI model were perfectly safe, but society still wasn't? TL;DR: We argue that safe models do not necessarily lead to safe societies. Our paper introduces sociopolitical risk, a framework for understanding risks that emerge from AI deployment at a societal scale, and argues that model-level safety should be complemented with system-level evaluations and governance. Presenting authors @ ICML: @ZhijingJin @davidguzman1120 @PepijnCobben @x_angelohuang @ChanglingXavier @SimkoSamuel @TerryJCZhang 1/2

1

4

14

1.4K

Samuel Simko @ ICML 2026 retweeted

David Guzman 📍ICML2026@davidguzman1120·Jul 7

I'm at #ICML2026 this week! Tomorrow I'm presenting our #spotlight paper, "Safe Models Do Not Guarantee Safe Societies." Even a perfectly aligned AI can quietly wear down the institutions a democracy runs on.

1

2

8

311

Samuel Simko @ ICML 2026@SimkoSamuel·Jul 7

Happening right now! Come say hi 👋 Hall A, #1601

Samuel Simko @ ICML 2026@SimkoSamuel

Excited to present "Training with Honeypots" 🍯 at #ICML2026 tomorrow (Tuesday 10:30AM in Hall A, Poster #1601), an approach for adversarial defense that makes successful jailbreaks less useful to attackers! #AISafety 👉 icml.cc/virtual/2026/p…

0

3

8

757

Samuel Simko @ ICML 2026@SimkoSamuel·Jul 6

Under strong embedding-space and RL attacks, our method reduces both how often jailbreaks succeed and how useful successful jailbreaks are to attackers. Many thanks to @psyonp @ZhijingJin @bschoelkopf @ETH_en @EuroSafeAI @MPI_IS @UofTCompSci

0

2

5

141

Samuel Simko @ ICML 2026@SimkoSamuel·Jul 6

So we introduce a second line of defense. During training, we make the model prefer honeypot responses over truly harmful ones, so that if jailbreaks still succeed, the resulting outputs are less useful to humans. Technically, we add a DPO-style regularizer during adversarial defense training. The model learns to slightly prefer honeypot responses over truly harmful ones, while both remain unlikely overall. This can be added on top of strong inner defenses like circuit breakers.

1

3

124
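
A minimal sketch of what a "DPO-style regularizer" of the kind described in the tweet above could look like. The thread does not give the exact objective, so this assumes the standard DPO preference loss with the honeypot response as the preferred completion and the truly harmful response as the rejected one. The function names, `beta`, and the weight `lam` are hypothetical and not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_style_regularizer(
    logp_honeypot: float,
    logp_harmful: float,
    ref_logp_honeypot: float,
    ref_logp_harmful: float,
    beta: float = 0.1,  # assumed temperature, not from the source
) -> float:
    """Standard DPO loss with honeypot = preferred, harmful = rejected.

    Inputs are sequence log-probabilities under the trained model and
    under a frozen reference model. Only the difference between the two
    log-ratios matters, so both responses can stay unlikely overall.
    """
    margin = beta * (
        (logp_honeypot - ref_logp_honeypot) - (logp_harmful - ref_logp_harmful)
    )
    return -log_sigmoid(margin)


def total_loss(defense_loss: float, regularizer: float, lam: float = 0.5) -> float:
    """Adds the regularizer to an existing defense loss (lam is assumed)."""
    return defense_loss + lam * regularizer
```

With equal log-ratios the regularizer equals log 2, and it decreases as the model shifts probability from the harmful response toward the honeypot relative to the reference. Because the loss depends only on relative preference, it can in principle be layered on top of another defense objective, which matches the tweet's remark about adding it to methods like circuit breakers.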

Samuel Simko @ ICML 2026@SimkoSamuel·Jul 6

When a jailbreak succeeds, not all failures are equal. Some answers are wrong, vague, or not useful, while others are actionable enough to help someone cause real-world harm. We find that unprotected harmful model outputs often skew toward the more severe and actionable side. At the same time, automated judges often struggle to tell what is actually useful for harm. In a blatant case, a nonsensical hacking answer like “tap the control key three times” can still be judged harmful by systems such as StrongREJECT.

0

1

2

94

Samuel Simko @ ICML 2026 retweeted

Thomas Bloom@thomasfbloom·Jun 23

In this new age of AI-generated papers and proofs (which can, sometimes at least, be both correct and interesting) we have a problem about where to host such papers. Here is one proposed solution that was sent to me recently: Project Diderot. projectdiderot.com/about

5

16

97

10.2K

Samuel Simko @ ICML 2026 retweeted

Rada Mihalcea@radamihalcea·Jun 10

🌍Very excited to launch AI Explorers—a global research & mentorship program for pre-doctoral students, aiming to help participants build stronger PhD applications through mentored research & academic guidance. For our 2026 cohort, we especially encourage applicants from Africa and Latin America. Apply here: theaiexplorers.org

3

33

128

11.3K

Samuel Simko @ ICML 2026 retweeted

Bernhard Schölkopf@bschoelkopf·Jun 12

If you are a PhD student or early-career researcher, apply before the deadline: June 21, 2026. Aug 31 – Sep 11 · Preliminary speaker list and applications: lnkd.in/ekrPt4m9Amazin…, this is the 50th MLSS, and it coincides with the 25th anniversary of our lab at Max Planck (2/3).

1

27

224

56.1K

Samuel Simko @ ICML 2026 retweeted

FAR.AI@farairesearch·Jun 4

Open-weight LLMs ship with safety training that can be stripped in a few hundred fine-tuning steps. Can current defenses stop this? We built and open-sourced TamperBench, the first unified framework for evaluating tamper resistance, and the answer is mostly no. 1/7

1

15

31

4.9K

Samuel Simko @ ICML 2026@SimkoSamuel·Jun 2

I’m in the Bay Area for the rest of the week 👋 Would love to connect with AI safety researchers while I’m here. Also curious about any events happening this week.

1

0

7

997

Samuel Simko @ ICML 2026@SimkoSamuel·May 10

I’ll be speaking at the AIxBio event in Zurich on May 13th at 18:00! Join for a discussion of what current AI systems can (or cannot) do and how their risks can be reduced. Registration link: luma.com/a3nyvkkp

0

9

406

Samuel Simko @ ICML 2026 retweeted

Zhijing Jin@ZhijingJin·May 3

Excited for our #ICML2026 papers at @JinesisLab @MPI_IS @UofTCompSci @TorontoSRI @VectorInst! We present papers that advance the research frontiers of (1) Causal LLMs, (2) AI for Science (physics), (3) Multi-Agent LLMs via mechanism design, and (4) Adversarial Defense by honeypot. Congrats to all our student authors and collaborators, esp. @TerryJCZhang @SimkoSamuel @EmanuelTewolde @ivakshi_s @andrewkihyun @PepijnCobben @yahang_qi @FurkanDanismann @bschoelkopf and many others!🎉

0

11

76

3.9K

Samuel Simko @ ICML 2026@SimkoSamuel·May 2

Stage I for MARS V closes Sunday, 3 May at 23:59 AoE!

Samuel Simko @ ICML 2026@SimkoSamuel

[Call for applicants] My supervisor @ZhijingJin (UofT, CIFAR AI Chair) and I will be mentoring a project for MARS V, a part-time research programme for AI safety research. MARS provides a one-week in-person kick-off in the UK, compute, and research management support! 🚀 The projects are: 🛡️ Adversarial defenses for LLMs using causal methods 🌐 Evaluating risks from AI-assisted authoritarianism 👉 Apply by May 3rd. Applications are reviewed on a rolling basis: caish.org/mars @CambridgeAISafetyHub

0

11

1.8K

Samuel Simko @ ICML 2026@SimkoSamuel·May 1

Paper accepted ✅ See you in Seoul! 👋🇰🇷 #ICML

2

3

66

2.4K

Samuel Simko @ ICML 2026@SimkoSamuel·Apr 27

[Call for applicants] My supervisor @ZhijingJin (UofT, CIFAR AI Chair) and I will be mentoring a project for MARS V, a part-time research programme for AI safety research. MARS provides a one-week in-person kick-off in the UK, compute, and research management support! 🚀 The projects are: 🛡️ Adversarial defenses for LLMs using causal methods 🌐 Evaluating risks from AI-assisted authoritarianism 👉 Apply by May 3rd. Applications are reviewed on a rolling basis: caish.org/mars @CambridgeAISafetyHub

2

9

92

18.1K
