Actionable Interpretability Workshop ICML2025
44 posts

@ActInterp
🛠️ Actionable Interpretability🔎 @icmlconf 2025 | Bridging the gap between insights and actions ✨ https://t.co/4zRMTbzwDc
Joined March 2025
13 Following, 269 Followers
Actionable Interpretability Workshop ICML2025 retweeted

Opportunities to join my group in fall 2026:
* PhD applications direct or via @ELLISforEurope (ellis.eu/news/ellis-phd…)
* Post-doc applications direct or via Azrieli @azrielifdn (azrielifoundation.org/fellows/intern…) or Zuckerman @stem_program (zuckermanstem.org/ourprograms/po…)
Actionable Interpretability Workshop ICML2025 retweeted

Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP
Actionable Interpretability Workshop ICML2025 @ActInterp
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇

1⃣ Detecting High-Stakes Interactions with Activation Probes - arxiv.org/abs/2506.10805
2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations - arxiv.org/abs/2504.05294
Actionable Interpretability Workshop ICML2025 retweeted

Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025!
If you missed us, let’s chat after the conference. Reach out here: forms.gle/AhTSBNNttA11JV…

Actionable Interpretability Workshop ICML2025 retweeted

Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI


Grab a ☕️ and join us for a keynote by @RICEric22: Explanations for Experts via Guarantees and Domain Knowledge: From Attributions to Reasoning


➡️ Join us for the keynote by @byron_c_wallace: “What (if anything) can interpretability do for healthcare?”

Actionable Interpretability Workshop ICML2025 retweeted

Come see our poster about how to predict side effects of unlearning and Fine-Tuning at @ActInterp

Actionable Interpretability Workshop ICML2025 retweeted

Crazy amount of cool work concentrated in one room
Actionable Interpretability Workshop ICML2025 @ActInterp
The first poster session is happening now!

The one and only @_beenkim on Agentic Interpretability and Neologism: What LLMs Can Offer Us!

Actionable Interpretability Workshop ICML2025 retweeted

🚨The Actionable Interpretability Workshop is happening tomorrow at ICML!
Join us for an exciting lineup of speakers, nearly 70 posters, and a great panel discussion 🙌
Don’t miss it! 🔍⚙️
@icmlconf @ActInterp

