Actionable Interpretability Workshop ICML2025
44 posts

@ActInterp
🛠️ Actionable Interpretability🔎 @icmlconf 2025 | Bridging the gap between insights and actions ✨ https://t.co/4zRMTbzwDc
Joined March 2025
13 Following | 269 Followers
Actionable Interpretability Workshop ICML2025 retweeted

Opportunities to join my group in fall 2026:
* PhD applications direct or via @ELLISforEurope (ellis.eu/news/ellis-phd…)
* Post-doc applications direct or via Azrieli @azrielifdn (azrielifoundation.org/fellows/intern…) or Zuckerman @stem_program (zuckermanstem.org/ourprograms/po…)
Actionable Interpretability Workshop ICML2025 retweeted

Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP
Actionable Interpretability Workshop ICML2025 @ActInterp
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇

1⃣Detecting High-Stakes Interactions with Activation Probes - arxiv.org/abs/2506.10805
2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations - arxiv.org/abs/2504.05294
Actionable Interpretability Workshop ICML2025 retweeted

Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025!
If you missed us, let’s chat after the conference. Reach out here: forms.gle/AhTSBNNttA11JV…

Actionable Interpretability Workshop ICML2025 retweeted

Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI


Grab a ☕️ and join us for a keynote by @RICEric22: Explanations for Experts via Guarantees and Domain Knowledge: From Attributions to Reasoning


➡️ Join us for the keynote by @byron_c_wallace: “What (if anything) can interpretability do for healthcare?”

Actionable Interpretability Workshop ICML2025 retweeted

Come see our poster about how to predict side effects of unlearning and fine-tuning at @ActInterp

Actionable Interpretability Workshop ICML2025 retweeted

Crazy amount of cool work concentrated in one room
Actionable Interpretability Workshop ICML2025 @ActInterp
The first poster session is happening now!

The one and only @_beenkim on Agentic Interpretability and Neologism: What LLMs Can Offer Us!

Actionable Interpretability Workshop ICML2025 retweeted

🚨The Actionable Interpretability Workshop is happening tomorrow at ICML!
Join us for an exciting lineup of speakers, nearly 70 posters, and a great panel discussion 🙌
Don’t miss it! 🔍⚙️
@icmlconf @ActInterp

