Ahmed Elgohary
@aagohary
#NLP Researcher @MSFTResearch - AI Frontiers. Ph.D. @umdcs

Introducing 𝐉𝐚𝐢𝐥𝐛𝐫𝐞𝐚𝐤 𝐃𝐢𝐬𝐭𝐢𝐥𝐥𝐚𝐭𝐢𝐨𝐧 🧨 (EMNLP '25 Findings) We propose a generate-then-select pipeline to "distill" effective jailbreak attacks into safety benchmarks, ensuring eval results are reproducible and robust to benchmark saturation & contamination🧵


🤖 LLMs are powerful, but their "one-size-fits-all" safety alignment limits flexibility. Safety standards vary across cultures and users—what’s safe in one context might not be in another. 🌍 We propose ✨Controllable Safety Alignment✨ for inference-time safety adaptation! 🧵👇

Interested in using natural language processing 💬 to assist computer programming 💻? Consider submitting to our workshop on NLP for Programming (NLP4Prog) at @aclmeeting 2021! 🤩🤩🤩 Call for papers at nlp4prog.github.io/2021/ Deadline: April 26, 2021 🗓️