Yik Siu Chan

14 posts

@yiksiux

MS @BrownCSDept working on LM interpretability + alignment

Joined June 2022

225 Following · 94 Followers

Yik Siu Chan retweeted

Marisa Hudspeth@marisahudspeth·Nov 14

(1/2) 🎉 New preprint: "Contextual Morphologically-Guided Tokenization for Latin Encoder Models" w/ @diyclassics @brendan642

Yik Siu Chan@yiksiux·Aug 21

This is such a cool paper! “No computation without abstraction”

Aryaman Arora@aryaman2020

my good friend Atticus Geiger has written an interesting new paper on causal abstraction <=> philosophy of computation! since he has much better things to do than tweet, i'm posting his paper for the world

Yik Siu Chan@yiksiux·Aug 20

@ruochenz_ @jenniferlumeng Congrats!!!

Ruochen Zhang@ruochenz_·Aug 20

🥳 our recent work is accepted to #EMNLP2025 main conference! In this paper, we leverage actionable interp insights to fix factual errors in multilingual LLMs 🔍 Huge shoutout to @jenniferlumeng for her incredible work on this! She's applying for PhD this cycle and you should hire her ;) We will both be at NEMI this friday to present this work and other new things we are working on. Come talk to us!

Ruochen Zhang@ruochenz_

🤔Ever wonder why LLMs give inconsistent answers in different languages? In our paper, we identify two failure points in the multilingual factual recall process and propose fixes that guide LLMs to the "right path." This can boost performance by 35% in the weakest language! 📈

Yik Siu Chan@yiksiux·Aug 20

@yong_zhengxin @Cohere_Labs Congrats!!!

Yong Zheng-Xin@yong_zhengxin·Aug 20

🔥 Our one-year work (collaboration with @Cohere_Labs) on multilingual safety survey is accepted to EMNLP 2025 Main!! We got one crazy reviewer but we also received one of the most encouraging feedback: "I greatly appreciate the suggested research directions. These are clear, well-motivated, and tractable. I am personally eager to explore these in our own work." Paper: arxiv.org/abs/2505.24119

Yong Zheng-Xin@yong_zhengxin

🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with @Cohere_Labs answers this and dives deep into: - Language gap in safety research - Future priority areas Thread 👇

Yik Siu Chan retweeted

Amir Zur@AmirZur2000·Aug 7

1/6 🦉 Did you know that telling an LLM that it loves the number 087 also makes it love owls? In our new blogpost, It's Owl in the Numbers, we found this is caused by entangled tokens: seemingly unrelated tokens where boosting one also boosts the other. owls.baulab.info

Yik Siu Chan retweeted

Ryan Liu@theryanliu·Jul 24

A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! had a blast presenting this at #icml2025 🥳

Yik Siu Chan retweeted

Aryaman Arora@aryaman2020·Jul 20

maybe I will live tweet the actionable interp workshop panel

Yik Siu Chan retweeted

Yong Zheng-Xin@yong_zhengxin·Jun 20

We see so much work this week about "emergent misalignment", but how is it fundamentally different from LLM jailbreaking research? I wrote a short blog post about it: yongzx.substack.com/p/emergent-mis…

Yik Siu Chan retweeted

Narutatsu Ri@narutatsuri·Jun 19

【#ICML2025 Poster】 [1/7] Many works develop intricate “jailbreaks” that elicit harmful outputs from LLMs. But can more common user-LLM interactions cause the same? We show yes! Paper: arxiv.org/abs/2502.04322 Coauthors: @yiksiux, @YuxinXiao6, @MarzyehGhassemi

Yik Siu Chan@yiksiux·Jun 15

@sarahwiegreffe @umdcs Congratulations!!

Sarah Wiegreffe@sarahwiegreffe·Jun 13

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland @umdcs this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Yik Siu Chan@yiksiux·Jun 15

@narutatsuri @PrincetonPLI Congrats!!! Excited for you!

Narutatsu Ri@narutatsuri·Jun 10

【Life Update】 I’m happy to share that I will be starting a CS PhD at @PrincetonPLI under Prof. Sanjeev Arora and supported by a Gordon Wu Fellowship. I'm forever indebted to my advisors (Prof. Kathy McKeown, Daniel Hsu, Nakul Verma) and collaborators. Excited for the fall!

Yik Siu Chan retweeted

Aaron Mueller@amuuueller·Apr 23

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

Yik Siu Chan@yiksiux·Apr 17

Thank you for featuring our work!!

MIT Jameel Clinic for AI & Health@AIHealthMIT

🚨 New study @MIT, Brown & Columbia shows how AI models can be jailbroken to give dangerous responses—like how to commit tax fraud. Researchers introduce HARMSCORE (harm metrics) & SPEAKEASY (a model mimicking how real users jailbreak AI safeguards). 📄: arxiv.org/pdf/2502.04322

Yik Siu Chan@yiksiux·Dec 11

I’m grateful to have been part of this collaboration on LLMs for health with the amazing team at MIT. Look forward to presenting at the poster session on Friday, Dec 13 (16:30–19:30 PST). Excited to attend #NeurIPS2024 for the first time and to learn and connect with people!

Yubin Kim@ybkim95_ai

I will be at #NeurIPS2024 from December 10-16. Thrilled to present our oral paper (MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making) on Friday, December 13th (15:50-16:10 PST). 🔍 Learn more: Project page: lnkd.in/e67E7iPA
