Samuel Simko

36 posts

@SimkoSamuel

Research Assistant @ ETH Zürich. Interested in AI Safety and AI for Science

Zürich · Joined August 2015
129 Following · 74 Followers
Samuel Simko retweeted
Zhijing Jin @ZhijingJin
📢We will present 5 papers to #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu
Thanks to all our collaborators and institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE!
Feel free to access the papers at
arxiv.org/abs/2510.04891
arxiv.org/abs/2602.07943
arxiv.org/abs/2509.00072
arxiv.org/abs/2507.13868
arxiv.org/abs/2508.04149 🎉
1
11
70
4.1K
Samuel Simko retweeted
Samuel Simko @SimkoSamuel
🚨 New paper: AI Poses Risks to Democratic and Social Systems. We discuss 7 failure modes showing how AI can degrade democracy and society through power concentration, narrowing how we think, or flooding institutions faster than they can keep up. We also propose 7 research & governance recommendations, from simulation-based stress-testing to deliberative governance infrastructure. Honored to work with Yoshua Bengio, Stuart Russell, Roger Grosse, Bernhard Schölkopf, Rada Mihalcea, Ashton Anderson, Audrey Tang and many others. Full whitepaper here: zhijing-jin.com/d/2026-ai-risk…
Zhijing Jin @ZhijingJin

AI is threatening our democratic society—by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them.
👉Check out our whitepaper with 25+ researchers: zhijing-jin.com/d/2026-ai-risk…
💡We introduce 7 threat models and ways forward.
✍️Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang
Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf @audreyt and @ZhijingJin
Thank you to all the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI
#CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI

0
2
9
655
Samuel Simko retweeted
Zhijing Jin @ZhijingJin
Mech interp or representation interp? We need to decode the causal computational graph of #LLMs, not just catalogue representations (steering vectors etc). Analogy: we can’t understand biology from blood composition alone. We need to understand how the body works. Same for LLMs.
4
25
166
9.6K
Samuel Simko @SimkoSamuel
🚀 We're launching EuroSafeAI, a nonprofit focused on multi-agent AI safety research, here in Zurich. Our launch event is on 6 March at 6:30 PM at the ETH Student Project House. Expect lightning talks and drinks! Info & Sign-up: luma.com/hwo46ach See you there! 🔥
1
3
11
648
Samuel Simko retweeted
Hanna Yukhymenko @a_yukh
❓Can we actually trust the quality of the existing multilingual benchmarks translated from English? Turns out many of them have some simple bugs, which hurts the evaluations - we try to fix that! Introducing Recovered in Translation 🌍 ritranslation.insait.ai 🧵below
1
4
18
1.8K
Samuel Simko retweeted
Zhijing Jin @ZhijingJin
First day at UNESCO: We presented our Detecting LLM Historical Revisionism paper by @FrancescOrtu @JoeunYk05 @psyonp @KeenanSamway @BSchoelkopf @AlbeCazzaniga @RadaMihalcea @ZhijingJin and will present Accidental Vulnerability by @psyonp @SimkoSamuel @KellinPelrine @ZhijingJin!
Zhijing Jin @ZhijingJin

Excited to have 3 accepted papers & 9 members of our @JinesisLab at #IASEAI2026, held at UNESCO, Paris🇫🇷! We reveal hidden authoritarian biases in #LLMs, and that fine-tuning can quietly erode model safety, exploring the risks we don't always see in AI 🔍🛡️ 🧵👇

2
6
54
6.2K
Samuel Simko retweeted
Daniel Paleka @dpaleka
Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵
9
44
239
53.8K
Samuel Simko retweeted
Abir Harrasse @AHarrasse1906
Paper alert 🚨: LLMs build a shared multilingual latent space for meaning, decoding into languages only later. 🌍 Performance gaps come from tokenizer bias & weaker late-layer circuits, not missing concepts. We show this mechanistically with Cross-Layer Transcoders. 🧵👇
3
14
55
9.7K
Samuel Simko retweeted
Zhijing Jin @ZhijingJin
If you have run into the problem of #LLMs rejecting your harmless query, it might have accidentally triggered a filtering mechanism. Check out our introductory post “Beyond the Filter: A Beginner-Friendly Guide to Elicit Authentic LLM Responses for Benign Queries”: medium.com/@punya.pandey/beyond-the-filter-a-beginner-friendly-guide-to-elicit-authentic-llm-responses-for-benign-queries-ff60ff339326
1
3
16
2.6K
Samuel Simko @SimkoSamuel
Great week at the MARS 4.0 kick-off in Cambridge! It was a pleasure meeting so many talented researchers. Together with @ZhijingJin, we'll be leading a team on Tamper-Resistant Defenses for Open-Weight LLMs.
0
2
9
1.5K
Samuel Simko @SimkoSamuel
Thrilled to be heading to Suzhou, China 🇨🇳 for #EMNLP2025, where I'll be presenting my work! If you’ll be there and want to connect, my DMs are open 👋
Samuel Simko @SimkoSamuel

🚨Paper Alert 🚨 Excited to release the Triplet adversarial defense at #EMNLP2025, a new approach extending circuit breaking (2024, by @andyzou_jiaming, @hendrycks et al.) which achieves < 5% ASR on embedding-level attacks! #AISafety 🔗 Link: arxiv.org/abs/2506.11938 🧵(1/6)

0
0
7
218
Samuel Simko retweeted
Punya Syon Pandey @psyonp
Thrilled to unveil Accidental Vulnerability at SoLaR @ COLM: our deep dive into how fine-tuning dataset factors such as prompt length and lexical diversity shape LLM robustness and interpretability. Great work done by our team @psyonp, @SimkoSamuel, @KellinPelrine, @ZhijingJin.
1
3
5
1.1K
Samuel Simko retweeted
ZurichAI @zurichnlp
Thank you @SchmidhuberAI for speaking in front of a packed room at ZurichAI in the @ETH_AI_Center yesterday! It's the biggest event so far, by far. Thanks everyone for coming; we're sorry to anyone who didn't get a spot. More and bigger things are planned!
7
10
119
17.9K
Samuel Simko @SimkoSamuel
Yesterday I had duplicate packages after a failed dnf upgrade on Fedora. After trying to fix it, I asked ChatGPT for help. It suggested a "fix" that would have deleted all my packages. At least the duplicates would have been gone, I guess?
0
0
1
123