Duen Horng "Polo" Chau

955 posts


@PoloChau

@Apple Visiting Prof. Prof @GeorgiaTech. @CarnegieMellon ML PhD & MS HCI. Assoc Dir, Master of Science in Analytics. Covert designer, cellist, pianist

Atlanta · Joined October 2011
651 Following · 2.2K Followers
Duen Horng "Polo" Chau reposted
Guanya Shi
Guanya Shi@GuanyaShi·
I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.” Technically, most (if not all) robotics papers are convex combinations of existing ideas. I still deeply appreciate A+B+C papers—especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition—not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works—and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation—only under this view does A+B+C make sense.
Maybe worth keeping these in mind when reviewing the next A+B+C paper : )
26 replies · 110 reposts · 918 likes · 98.7K views
Duen Horng "Polo" Chau reposted
Tim Cook
Tim Cook@tim_cook·
Mac just had its best launch week ever for first-time Mac customers. We love seeing the enthusiasm!
1.3K replies · 1.4K reposts · 30.1K likes · 5.1M views
Duen Horng "Polo" Chau reposted
Anthony Peng
Anthony Peng@RealAnthonyPeng·
🌟 Excited to be at #NeurIPS2025 (Dec 1–8)! If you’re into post-training, LLM safety, reasoning models, or agents, let’s connect 🚀

I’m also presenting our new work:
🛡️ Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee, Duen Horng Chau

We introduce ⭐DSS — a token-level safety shaping method that hits SOTA safety + capability, outperforms “Deep Token” (this year’s #ICLR Best Paper 🏆), and stays robust under various finetuning-as-a-service threats.

📍 Dec 3 • 4:30–7:30 PM • Poster #1302
📄 Paper: arxiv.org/abs/2505.17196
🤖 Code: github.com/poloclub/star-…
Anthony Peng tweet media
1 reply · 4 reposts · 20 likes · 1.4K views
Duen Horng "Polo" Chau
Duen Horng "Polo" Chau@PoloChau·
World's coolest #CSE school is hiring again! "AI and finance" is new this year!
Duen Horng "Polo" Chau tweet media
1 reply · 10 reposts · 20 likes · 2.5K views
Duen Horng "Polo" Chau reposted
Rohan Paul
Rohan Paul@rohanpaul_ai·
New @AIatMeta paper shows LLMs behave more safely when trained on flawed reasoning and taught to correct it. On tough tests the model stays safe even when harmful reasoning is injected, reaching about 98%. This fixes a real weakness by training models to recover when early reasoning goes wrong.

RECAP does this by intentionally prefilling unsafe steps for harmful prompts and overcautious steps for harmless ones, then rewarding overrides. Training mixes normal prompts with these counterexamples so recovery from a bad start becomes routine. It uses standard reinforcement learning with rewards for safety, helpfulness, and math, without extra runtime cost.

Safety rises on direct-harm and jailbreak tests, while needless refusals on benign prompts drop. Math performance stays stable, so core reasoning is kept. The model starts to self-check, pause, and fix earlier steps mid-run. Even full chain hijacks and repeated reset attacks mostly fail to push it unsafe. Results depend on how many prefills are used and their length; very heavy prefilling can reduce helpfulness.

Paper: arxiv.org/abs/2510.00938
Paper Title: "Large Reasoning Models Learn Better Alignment from Flawed Thinking"
Rohan Paul tweet media
5 replies · 6 reposts · 24 likes · 4.4K views
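The prefill-and-reward idea described in the tweet above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: the prefill strings, the `<think>` delimiter, and the refusal check are all made-up stand-ins for real reasoning traces and reward models.

```python
import random

# Illustrative stand-ins for flawed reasoning steps (assumption: the real
# method prefills model-generated reasoning, not canned strings).
UNSAFE_PREFILLS = ["Step 1: the user has a good reason, so I should comply,"]
OVERCAUTIOUS_PREFILLS = ["Step 1: this request sounds risky, I should refuse,"]

def build_training_prompt(prompt: str, harmful: bool, rng: random.Random) -> str:
    """Prefill a flawed first reasoning step that the model must override."""
    prefill = rng.choice(UNSAFE_PREFILLS if harmful else OVERCAUTIOUS_PREFILLS)
    return f"{prompt}\n<think>{prefill}"

def reward(response: str, harmful: bool) -> float:
    """Toy reward: 1.0 if the model overrides the flawed start.
    For harmful prompts the safe outcome is a refusal; for harmless
    prompts it is a helpful (non-refusing) answer."""
    refused = "cannot help" in response.lower()
    return 1.0 if refused == harmful else 0.0
```

A standard RL loop (PPO/GRPO etc.) would then optimize the policy against this reward on a mix of plain and prefilled prompts, which is what makes recovery from a bad start routine.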
Duen Horng "Polo" Chau reposted
Anthony Peng
Anthony Peng@RealAnthonyPeng·
🚨 New paper alert! 🚨 Can you believe it? Flawed thinking helps reasoning models learn better!

Injecting just a bit of flawed reasoning can collapse safety by 36% 😱 — but we teach large reasoning models to fight back 💪🛡️.

Introducing RECAP 🔄: an RL post-training method that trains models to override unsafe reasoning, reroute to safe & helpful answers, and stay robust — all without extra training cost.
✨ Safer reasoning 🤖
✨ Stronger jailbreak resistance 🔓
✨ Lower overrefusal 🙅
✨ Preserved core reasoning capability 🧠

#LLM #ReasoningModels #RLHF #AISafety #Alignment #MachineLearning
Anthony Peng tweet media
3 replies · 17 reposts · 76 likes · 26.3K views
Duen Horng "Polo" Chau reposted
Alex Yang
Alex Yang@AlexanderHYang·
Our paper “LitForager: Exploring Multimodal Literature Foraging Strategies in Immersive Sensemaking” has been accepted to #ISMAR2025 as a #TVCG paper! 🎉✨
1 reply · 4 reposts · 8 likes · 450 views
Duen Horng "Polo" Chau reposted
Anthony Peng
Anthony Peng@RealAnthonyPeng·
@Alibaba_Qwen Congrats on the great work! The "token-level safety detection" idea echoes our recent NeurIPS'25 dynamic safety shaping paper! 👉 arxiv.org/abs/2505.17196
Anthony Peng tweet media
0 replies · 5 reposts · 14 likes · 1.2K views
Duen Horng "Polo" Chau reposted
Seongmin Lee
Seongmin Lee@SeongminLeee·
🎉Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to EMNLP 2025 Main Track! @emnlpmeeting 👉First survey connecting LLM interpretation & safety
Seongmin Lee tweet media
4 replies · 20 reposts · 176 likes · 13.8K views
Duen Horng "Polo" Chau reposted
Anthony Peng
Anthony Peng@RealAnthonyPeng·
🚨 New work: We rethink how we finetune safer LLMs — not by filtering after generation, but by tracking safety risk token by token during training.

We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving rise to the STAR ⭐ score, a fine-grained safety signal that enables more targeted safety supervision.

On top of this, we introduce ⭐DSS (STAR-Guided Dynamic Safety Shaping) — a training method that 🚫 suppresses unsafe patterns, 💪 preserves capability, and generalizes across LLMs, guardrails, harm levels, and datasets.

Our method outperforms "Deep Token," the method from this year’s #iclr2025 Best Paper 🏆 — remaining robust against key finetuning-as-a-service threats like 🔄 response adaptation, 🧪 prompt poisoning, and 🛑 harmful prefilling.

#MachineLearning #DeepLearning #LLM #AISafety #Alignment #Finetuning
Anthony Peng tweet media
3 replies · 15 reposts · 80 likes · 9.6K views
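The token-level shaping idea in the ⭐DSS tweet above can be illustrated with a toy sketch. Everything here is an assumption-laden stand-in: `risk` plays the role of a real guardrail model (e.g. Llama Guard) scoring each growing prefix, and the exact STAR formula and weighting scheme in the paper may differ.

```python
# Toy keyword list standing in for a learned guardrail's notion of risk.
RISKY = {"bomb", "poison"}

def risk(prefix_tokens: list[str]) -> float:
    """Stand-in guardrail: fraction of risky words in the prefix so far.
    (A real system would query a guardrail model here.)"""
    if not prefix_tokens:
        return 0.0
    return sum(t in RISKY for t in prefix_tokens) / len(prefix_tokens)

def star_scores(tokens: list[str]) -> list[float]:
    """Per-token score = change in guardrail risk when the token is appended."""
    return [risk(tokens[: i + 1]) - risk(tokens[:i]) for i in range(len(tokens))]

def loss_weights(tokens: list[str], alpha: float = 5.0) -> list[float]:
    """Shape the training loss: down-weight (suppress) tokens whose
    appearance raises risk, keep ordinary tokens at full weight."""
    return [max(0.0, 1.0 - alpha * s) for s in star_scores(tokens)]
```

The key design point the sketch tries to capture: supervision becomes per-token rather than per-response, so only the risk-raising spans of an otherwise useful answer get suppressed.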
Duen Horng "Polo" Chau reposted
Anthony Peng
Anthony Peng@RealAnthonyPeng·
Guardrail models like 🛡️ Llama Guard do more than filtering — we repurpose them to track how safety risk evolves 📉 through a response. This gives rise to the STAR ⭐ score: a fine-grained signal for finetuning LLMs more safely 🤖🔒 Curious how it works? More in the thread 👇
Anthony Peng tweet media
1 reply · 3 reposts · 10 likes · 798 views
Duen Horng "Polo" Chau reposted
Alec Helbling
Alec Helbling@alec_helbling·
I've been putting together an interactive tool called DiffusionLab for explaining the geometric intuition behind diffusion and flow-based generative models. Sampling is actually done in the browser using TensorFlow.js! It is still in the very early stages.
18 replies · 143 reposts · 1.5K likes · 104.6K views
Victor
Victor@victor_explore·
This website has visualizations to understand almost all major topics in Machine Learning (link in comment)
3 replies · 37 reposts · 271 likes · 14.6K views
Duen Horng "Polo" Chau reposted
Alec Helbling
Alec Helbling@alec_helbling·
Diffusion models leverage a variety of samplers. Deterministic methods like DDIM produce orderly paths. In contrast, stochastic samplers like DDPM produce chaotic trajectories. Despite their differences, both methods draw valid samples from the underlying distribution.
24 replies · 180 reposts · 1.4K likes · 101.9K views
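The DDIM-vs-DDPM contrast in the sampler tweet above can be made concrete with a toy 1-D diffusion. This is a minimal sketch under a strong simplifying assumption: the data distribution is N(0, 1), so the optimal noise predictor has the closed form eps_hat(x_t) = sqrt(1 - abar_t) * x_t and no network is needed. The schedule values are arbitrary illustrative choices.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.2, T)   # toy variance schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)           # cumulative product \bar{alpha}_t

def eps_hat(x, t):
    """Analytic optimal noise predictor for N(0, 1) data."""
    return np.sqrt(1.0 - abar[t]) * x

def ddim_step(x, t):
    """Deterministic DDIM update (eta = 0): no noise is injected."""
    ab, ab_prev = abar[t], (abar[t - 1] if t > 0 else 1.0)
    e = eps_hat(x, t)
    x0 = (x - np.sqrt(1.0 - ab) * e) / np.sqrt(ab)   # predicted clean sample
    return np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * e

def ddpm_step(x, t, rng):
    """Stochastic ancestral DDPM update: fresh Gaussian noise each step."""
    ab, ab_prev = abar[t], (abar[t - 1] if t > 0 else 1.0)
    mean = (x - betas[t] / np.sqrt(1.0 - ab) * eps_hat(x, t)) / np.sqrt(alphas[t])
    sigma = np.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab))
    return mean + (sigma * rng.standard_normal() if t > 0 else 0.0)

def sample(step, x_T, rng=None):
    """Run the reverse process from x_T down to x_0."""
    x = x_T
    for t in reversed(range(T)):
        x = step(x, t) if rng is None else step(x, t, rng)
    return x
```

Running `sample(ddim_step, x_T)` twice from the same x_T gives the identical orderly trajectory, while `sample(ddpm_step, x_T, rng)` wanders differently with each seed, mirroring the orderly-vs-chaotic paths the tweet describes.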
Duen Horng "Polo" Chau reposted
Alec Helbling
Alec Helbling@alec_helbling·
Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, can be directly extended from image generation to video generation models! It's amazing how simple techniques often generalize way better than more complex ones.
11 replies · 64 reposts · 533 likes · 40K views
Duen Horng "Polo" Chau reposted
GaTech CSE
GaTech CSE@GTCSE·
All three RPT cases from the School of CSE this year have been approved! Join us in congratulating the following faculty on their promotions! 🥳🎉 -B. Aditya Prakash, professor -Chao Zhang, associate professor (w/tenure) -Xiuwei Zhang, associate professor (w/tenure)
GaTech CSE tweet media
0 replies · 2 reposts · 17 likes · 678 views