Polo Data Club retweeted

@Alibaba_Qwen Congrats on the great work! The "token-level safety detection" idea echoes our recent NeurIPS'25 dynamic safety shaping paper! 👉 arxiv.org/abs/2505.17196

Polo Data Club
334 posts

@PoloDataClub
Polo Club of Data Science at @georgiatech. Scalable Interactive Data Analytics. Visit our homepage for info on club members, projects, and more! @gtcomputing @gtcse

LLM safety alignment can be easily compromised by finetuning with only a few adversarially designed training examples. 😲 Why? Are all open-source LLMs equally vulnerable to finetuning? How fast does the model start to break during finetuning? 🤔