Samuel Simko

36 posts

@SimkoSamuel

Research Assistant @ ETH Zürich. Interested in AI Safety and AI for Science

Zürich · Joined August 2015

129 Following · 74 Followers

Pinned Tweet

Samuel Simko @SimkoSamuel · 29 Sep

🚨Paper Alert 🚨 Excited to release the Triplet adversarial defense at #EMNLP2025, a new approach extending circuit breaking (2024, by @andyzou_jiaming, @hendrycks et al.) which achieves < 5% ASR on embedding-level attacks! #AISafety 🔗 Link: arxiv.org/abs/2506.11938 🧵(1/6)

Samuel Simko retweeted

Zhijing Jin @ZhijingJin · 1d

📢We will present 5 papers at #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu
Thanks to all our collaborators and institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE! Feel free to access the papers at arxiv.org/abs/2510.04891, arxiv.org/abs/2602.07943, arxiv.org/abs/2509.00072, arxiv.org/abs/2507.13868, and arxiv.org/abs/2508.04149 🎉

Samuel Simko retweeted

Hanna Yukhymenko @a_yukh · 1d

Our work on multilingual benchmark translation got accepted to @aclmeeting 2026 Findings! 🎉 The glorious EEU promo is coming to San Diego this summer🥺🇺🇦🇧🇬 #ACL #ACL2026 #ACL2026NLP #NLProc

Hanna Yukhymenko @a_yukh

❓Can we actually trust the quality of the existing multilingual benchmarks translated from English? Turns out many of them have some simple bugs, which hurt the evaluations. We try to fix that! Introducing Recovered in Translation 🌍 ritranslation.insait.ai 🧵below

Samuel Simko @SimkoSamuel · 25 Mar

🚨 New paper: AI Poses Risks to Democratic and Social Systems. We discuss 7 failure modes showing how AI can degrade democracy and society through power concentration, narrowing how we think, or flooding institutions faster than they can keep up. We also propose 7 research & governance recommendations, from simulation-based stress-testing to deliberative governance infrastructure. Honored to work with Yoshua Bengio, Stuart Russell, Roger Grosse, Bernhard Schölkopf, Rada Mihalcea, Ashton Anderson, Audrey Tang and many others. Full whitepaper here: zhijing-jin.com/d/2026-ai-risk…

Zhijing Jin @ZhijingJin

AI is threatening our democratic society—by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them. 👉Check out our whitepaper with 25+ researchers: zhijing-jin.com/d/2026-ai-risk… 💡We introduce 7 threat models and ways forward. ✍️Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf @audreyt and @ZhijingJin Thank you to all the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI #CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI

Samuel Simko retweeted

Zhijing Jin @ZhijingJin · 23 Mar

Mech interp or representation interp? We need to decode the causal computational graph of #LLMs, not just catalogue representations (steering vectors etc). Analogy: we can’t understand biology just by studying blood composition. We need to understand how the body works. Same for LLMs.

Samuel Simko @SimkoSamuel · 3 Mar

Speakers: @ZhijingJin (CIFAR AI Chair; Professor@UoT, Chief Scientist), @x_angelohuang (Co-founder and Director), @pepijncobben (Co-founder and Director), @SimkoSamuel (Technical Staff), @davidguzman1120 (Technical Staff).

Samuel Simko @SimkoSamuel · 3 Mar

🗓️ 6 March 2026 · 6:30 PM 📍 ETH Student Project House E floor, Clausiusstrasse 16 eurosafe.ai.toronto.edu

Samuel Simko @SimkoSamuel · 3 Mar

🚀 We're launching EuroSafeAI, a nonprofit focused on multi-agent AI safety research, here in Zurich. Our launch event is on 6 March at 6:30 PM at the ETH Student Project House. Expect lightning talks and drinks! Info & Sign-up: luma.com/hwo46ach See you there! 🔥

Samuel Simko retweeted

Jinesis Lab (UToronto) @JinesisLab · 2 Mar

Check out CircuitLab🚀: a scalable Python library for training Cross-Layer Transcoders (CLTs). Visual interface & auto-interp incoming, so mark our repo: github.com/circuits-resea… Collaborative effort @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab

Samuel Simko retweeted

Hanna Yukhymenko @a_yukh · 26 Feb

Samuel Simko retweeted

Zhijing Jin @ZhijingJin · 24 Feb

First day at UNESCO: We presented our Detecting LLM Historical Revisionism paper by @FrancescOrtu @JoeunYk05 @psyonp @KeenanSamway @BSchoelkopf @AlbeCazzaniga @RadaMihalcea @ZhijingJin and will present Accidental Vulnerability by @psyonp @SimkoSamuel @KellinPelrine @ZhijingJin!

Zhijing Jin @ZhijingJin

Excited to have 3 accepted papers & 9 members of our @JinesisLab at #IASEAI2026, held at UNESCO, Paris🇫🇷! We reveal hidden authoritarian biases in #LLMs, and that fine-tuning can quietly erode model safety, exploring the risks we don't always see in AI 🔍🛡️ 🧵👇

Samuel Simko @SimkoSamuel · 22 Feb

Thrilled to be heading to Paris for #IASEAI2026! 🇫🇷👋

Zhijing Jin @ZhijingJin

Samuel Simko retweeted

Daniel Paleka @dpaleka · 20 Feb

Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

Samuel Simko retweeted

Abir Harrasse @AHarrasse1906 · 3 Feb

Paper alert 🚨: LLMs build a shared multilingual latent space for meaning, decoding into languages only later. 🌍 Performance gaps come from tokenizer bias & weaker late-layer circuits, not missing concepts. We show this mechanistically with Cross-Layer Transcoders. 🧵👇

Samuel Simko retweeted

Zhijing Jin @ZhijingJin · 17 Dec

If #LLMs have ever rejected your harmless query, the query might have accidentally triggered a filtering mechanism. Check out our introductory post “Beyond the Filter: A Beginner-Friendly Guide to Elicit Authentic LLM Responses for Benign Queries”: medium.com/@punya.pandey/beyond-the-filter-a-beginner-friendly-guide-to-elicit-authentic-llm-responses-for-benign-queries-ff60ff339326

Samuel Simko @SimkoSamuel · 11 Dec

Great week at the MARS 4.0 kick-off in Cambridge! It was a pleasure meeting so many talented researchers. Together with @ZhijingJin, we'll be leading a team on Tamper-Resistant Defenses for Open-Weight LLMs.

Samuel Simko @SimkoSamuel · 19 Nov

Very impressive, but I can’t help noticing that the color scheme is wrong (two opposite colors should be swapped)

Matthew Berman @MatthewBerman

Gemini 3 is truly a great model. It created the Rubik's cube simulation on the first try, in seconds, with ~300 lines of code. Flawless.

Samuel Simko @SimkoSamuel · 2 Nov

Thrilled to be heading to Suzhou, China 🇨🇳 for #EMNLP2025, where I'll be presenting my work! If you’ll be there and want to connect, my DMs are open 👋

Samuel Simko @SimkoSamuel

Samuel Simko retweeted

Punya Syon Pandey @psyonp · 22 Oct

Thrilled to unveil Accidental Vulnerability at SoLaR @ COLM: our deep dive into how fine-tuning dataset factors such as prompt length and lexical diversity shape LLM robustness and interpretability. Great work done by our team @psyonp, @SimkoSamuel, @KellinPelrine, @ZhijingJin.

Samuel Simko retweeted

ZurichAI @zurichnlp · 22 Oct

Thank you @SchmidhuberAI for speaking in front of a packed room at ZurichAI in the @ETH_AI_Center yesterday! It's the biggest event so far, by far. Thanks everyone for coming; we're sorry to anyone who didn't get a spot. More and bigger things are planned!

Samuel Simko @SimkoSamuel · 16 Oct

Yesterday I had duplicate packages after a failed dnf upgrade on Fedora. After trying to fix it, I asked ChatGPT for help. It suggested a "fix" that would have deleted all my packages. At least the duplicates would have been gone, I guess?
