

Yik Siu Chan
14 posts

@yiksiux
MS @BrownCSDept working on LM interpretability + alignment



my good friend Atticus Geiger has written an interesting new paper on causal abstraction <=> philosophy of computation! since he has much better things to do than tweet, i'm posting his paper for the world

🤔 Ever wonder why LLMs give inconsistent answers in different languages? In our paper, we identify two failure points in the multilingual factual recall process and propose fixes that guide LLMs to the "right path." This can boost performance by 35% in the weakest language! 📈


🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with @Cohere_Labs answers this and dives deep into:
- Language gap in safety research
- Future priority areas
Thread 👇

🚨 New study @MIT, Brown & Columbia shows how AI models can be jailbroken to give dangerous responses—like how to commit tax fraud. Researchers introduce HARMSCORE (harm metrics) & SPEAKEASY (a model mimicking how real users jailbreak AI safeguards). 📄: arxiv.org/pdf/2502.04322

I will be at #NeurIPS2024 from December 10-16. Thrilled to present our oral paper (MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making) on Friday, December 13th (15:50-16:10 PST). 🔍 Learn more: Project page: lnkd.in/e67E7iPA