Punya Syon Pandey

19 posts

@psyonp

Joined February 2025
26 Following · 142 Followers
Punya Syon Pandey retweeted
Zhijing Jin @ZhijingJin
📢 We will present 5 papers at #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu
Thanks to all our collaborators and institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE!
Feel free to access the papers at
arxiv.org/abs/2510.04891
arxiv.org/abs/2602.07943
arxiv.org/abs/2509.00072
arxiv.org/abs/2507.13868
arxiv.org/abs/2508.04149
🎉
Punya Syon Pandey retweeted
Zhijing Jin @ZhijingJin
AI is threatening our democratic society by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them.
👉 Check out our whitepaper with 25+ researchers: zhijing-jin.com/d/2026-ai-risk…
💡 We introduce 7 threat models and ways forward.
✍️ Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang
Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf, @audreyt, and @ZhijingJin
Thank you for all the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI
#CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI
Punya Syon Pandey retweeted
Zhijing Jin @ZhijingJin
First day at UNESCO: We presented our Detecting LLM Historical Revisionism paper by @FrancescOrtu @JoeunYk05 @psyonp @KeenanSamway @BSchoelkopf @AlbeCazzaniga @RadaMihalcea @ZhijingJin and will present Accidental Vulnerability by @psyonp @SimkoSamuel @KellinPelrine @ZhijingJin!
Zhijing Jin @ZhijingJin

Excited to have 3 accepted papers & 9 members of our @JinesisLab at #IASEAI2026, held at UNESCO, Paris🇫🇷! We reveal hidden authoritarian biases in #LLMs, and that fine-tuning can quietly erode model safety, exploring the risks we don't always see in AI 🔍🛡️ 🧵👇

Zhijing Jin @ZhijingJin
Punya @psyonp is an impressive UofT undergrad student. He reached out to our @JinesisLab in the 1st year of his undergrad; now, as a 2nd-year undergrad, he has contributed to 8 papers in our lab, with 3 first-author papers at #EACL2026, #IASEAI2026, and #ICLR2026 🎉 [Stellar Student Sharing]
Punya Syon Pandey @psyonp
Excited to share that our paper "SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests" has been accepted at ICLR 2026 🎉. In this work, we introduce the first adversarial evaluation benchmark specifically designed to probe sociopolitical risks in LLMs.
Punya Syon Pandey retweeted
Zhijing Jin @ZhijingJin
We're at #NeurIPS2025 with papers, posters, and talks across many workshops. Come learn about our latest research and explore our newest breakthroughs in #LLMs, #Causality, #AIforScience, and many others!
Punya Syon Pandey @psyonp
Thrilled to unveil Accidental Vulnerability at SoLaR @ COLM: our deep dive into how fine-tuning dataset factors such as prompt length and lexical diversity shape LLM robustness and interpretability. Great work done by our team @psyonp, @SimkoSamuel, @KellinPelrine, @ZhijingJin.
Punya Syon Pandey retweeted
Zhijing Jin @ZhijingJin
[Better reasoning models are not necessarily safer] Very interesting (preliminary) finding from my students @psyonp @samuel that jailbreaking DeepSeek seems relatively easy. This suggests some caution about #OpenAI's recent "Deliberative Alignment" argument that reasoning enables safer LLMs; maybe it should not be a general claim. Stay tuned for our paper later.
Punya Syon Pandey @psyonp

A quick look into DeepSeek’s safety guard: We find DeepSeek’s Llama Distill is >2x⚠️ as vulnerable to jailbreaking attacks as the original Llama. Seems to be a large safety risk. Stay tuned for our upcoming work @psyonp @SimkoSamuel @ZhijingJin

Punya Syon Pandey @psyonp
A quick look into DeepSeek’s safety guard: We find DeepSeek’s Llama Distill is >2x⚠️ as vulnerable to jailbreaking attacks as the original Llama. Seems to be a large safety risk. Stay tuned for our upcoming work @psyonp @SimkoSamuel @ZhijingJin