Polo Data Club

334 posts

@PoloDataClub

Polo Club of Data Science at @georgiatech. Scalable Interactive Data Analytics. Visit homepage for info on club members, projects and more! @gtcomputing @gtcse

Atlanta, GA · Joined June 2014

176 Following · 1K Followers

Polo Data Club retweeted

Anthony Peng@RealAnthonyPeng·23 Sep

@Alibaba_Qwen Congrats on the great work! The "token-level safety detection" idea echoes our recent NeurIPS'25 dynamic safety shaping paper! 👉 arxiv.org/abs/2505.17196

1.2K

Polo Data Club retweeted

Seongmin Lee@SeongminLeee·2 Sep

🎉Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to EMNLP 2025 Main Track! @emnlpmeeting 👉First survey connecting LLM interpretation & safety

176

13.8K

Polo Data Club retweeted

Anthony Peng@RealAnthonyPeng·26 May

🚨 New work: We rethink how we finetune safer LLMs — not by filtering after the generation, but by tracking safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving rise to the STAR ⭐ score, a fine-grained safety signal that enables more targeted safety supervision. On top of this, we introduce ⭐DSS (STAR-Guided Dynamic Safety Shaping) — a training method that 🚫 suppresses unsafe patterns, 💪 preserves capability, and generalizes across LLMs, guardrails, harm levels, and datasets. Our method outperforms "Deep Token," the method from this year’s #iclr2025 Best Paper 🏆 — remaining robust against key finetuning-as-a-service threats like 🔄 response adaptation, 🧪 prompt poisoning, and 🛑 harmful prefilling. #MachineLearning #DeepLearning #LLM #AISafety #Alignment #Finetuning

9.6K

Polo Data Club retweeted

Anthony Peng@RealAnthonyPeng·23 May

Guardrail models like 🛡️ Llama Guard do more than filtering — we repurpose them to track how safety risk evolves 📉 through a response. This gives rise to the STAR ⭐ score: a fine-grained signal for finetuning LLMs more safely 🤖🔒 Curious how it works? More in the thread 👇

797

Polo Data Club retweeted

Victor@victor_explore·12 Apr

This website has visualizations to understand almost all major topics in Machine Learning (link in comment)

272

14.6K

Polo Data Club retweeted

Alec Helbling@alec_helbling·25 Mar

One of the simplest algorithms for sampling from a probability distribution is Random Walk Metropolis-Hastings. It proposes new samples by taking Gaussian-distributed steps, accepting or rejecting them to maintain the target distribution. I call this pdf the "fidget spinner".
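The propose/accept loop described in the post above can be sketched in a few lines. This is a minimal illustration, not the code behind the original animation: the target here is assumed to be a one-dimensional standard normal (the "fidget spinner" density is not given in the post), and the names `log_target` and `random_walk_mh` are hypothetical.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (assumed target for illustration)."""
    return -0.5 * x * x


def random_walk_mh(log_density, x0, n_samples, step_size=1.0, seed=0):
    """Random Walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the chain of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Propose a Gaussian-distributed step around the current point.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # The proposal is symmetric, so the acceptance ratio is p(new) / p(old).
        # 1.0 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        # On rejection the current point is recorded again.
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    chain, rate = random_walk_mh(log_target, x0=0.0, n_samples=20000)
    mean = sum(chain) / len(chain)
    print(f"acceptance rate: {rate:.2f}, sample mean: {mean:.3f}")
```

Working with log-densities avoids underflow for small probabilities, and only an unnormalized density is needed because the normalizing constant cancels in the acceptance ratio.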

152

1.3K

79.8K

Polo Data Club retweeted

Alec Helbling@alec_helbling·4 Mar

Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, can be directly extended from image generation to video generation models! It's amazing how simple techniques often generalize way better than more complex ones.

533

40K

Polo Data Club retweeted

Alec Helbling@alec_helbling·28 Feb

Diffusion Transformers aren't just generative models, but also powerful multi-modal encoders. ConceptAttention creates rich heatmaps of text concepts in images from DiT representations. This even works on real images, and can be applied to tasks like segmentation! Demo 👇

355

24.4K

Polo Data Club retweeted

Alec Helbling@alec_helbling·26 Feb

Introducing ConceptAttention, an approach to interpreting diffusion transformer models! Write a prompt, choose some concepts, generate an image, and get high-quality heatmaps of text concepts. Our method outperforms existing methods like cross attention. Link to demo 👇

477

36.6K

Polo Data Club retweeted

Alec Helbling@alec_helbling·24 Feb

Gradient descent alone tends to converge to local minima. Momentum frames optimization as a ball with mass moving down a hill. By adding inertia, the ball resists settling in small basins, allowing it to arrive at the global minimum.
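A small sketch of the ball-with-inertia picture, using the classical heavy-ball update. The double-well function, the starting point, and the hyperparameters are assumptions chosen for illustration and do not come from the post; `descend` is a hypothetical helper. Momentum helps here, but it is not guaranteed to reach the global minimum in general: the outcome depends on the learning rate, the momentum coefficient, and where the run starts.

```python
def f(x):
    """Illustrative double well: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(grad, x0, lr=0.01, momentum=0.0, steps=500):
    """Gradient descent with heavy-ball momentum; momentum=0.0 gives plain gradient descent."""
    x, v = x0, 0.0
    for _ in range(steps):
        # The velocity keeps a decaying memory of past gradients (the ball's inertia).
        v = momentum * v - lr * grad(x)
        x = x + v
    return x


if __name__ == "__main__":
    x_plain = descend(grad_f, x0=2.0)                   # settles in the local minimum, x ≈ 1.13
    x_heavy = descend(grad_f, x0=2.0, momentum=0.9)     # crosses the barrier, x ≈ -1.30
    print(f"plain:    x = {x_plain:.4f}, f(x) = {f(x_plain):.4f}")
    print(f"momentum: x = {x_heavy:.4f}, f(x) = {f(x_heavy):.4f}")
```

With these particular settings the plain run stops in the first basin it enters, while the momentum run carries enough velocity over the small barrier near x = 0.17 and then loses it inside the deeper well.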

1.5K

Polo Data Club retweeted

Seongmin Lee@SeongminLeee·16 Dec

🚀 Effective Guidance for Model Attention with Simple Yes-no Annotations Excited to share that I'll be presenting our recent work 🎨CRAYON🖍️ at @ieeebigdata soon! Catch me at 2pm in the Deep Learning II session!

1.2K

Polo Data Club retweeted

Duen Horng "Polo" Chau@PoloChau·31 Oct

🎉The coolest #CSE school in the world is hiring multiple faculty members! Application link below👇

5.6K

Polo Data Club retweeted

Anthony Peng@RealAnthonyPeng·29 Oct

🧑‍💻 The code of our NeurIPS'24 LLM safety landscape paper is now publicly available at: github.com/poloclub/llm-l… x.com/RealAnthonyPen…

Anthony Peng@RealAnthonyPeng

LLM safety alignment can be easily compromised by finetuning with only a few adversarially designed training examples. 😲 Why? Are all open-source LLMs equally vulnerable to finetuning? How fast does the model start to break during finetuning? 🤔

1.6K

Polo Data Club@PoloDataClub·30 Oct

@einsums @Sumanth_077 Thanks for asking! Here is the code github.com/poloclub/trans…

fwdpass@einsums·30 Oct

@Sumanth_077 You are a visualization wizard. How did you make this?

Polo Data Club retweeted

Sumanth@Sumanth_077·29 Oct

Transformers visually explained: poloclub.github.io/transformer-ex…

631

3.2K

211.8K

Polo Data Club@PoloDataClub·30 Oct

@kasplatch @Sumanth_077 Sure! Diffusion Explainer poloclub.github.io/diffusion-expl…

SUP!@kasplatch·30 Oct

@Sumanth_077 can you do something like this for difussions?

128

Polo Data Club retweeted

GaTech CSE@GTCSE·14 Oct

CSE Prof. @PoloChau and his group are presenting two papers and two posters this week at @ieeevis! Check out the interactive graphic 🔗👇 for a peek of all Georgia Tech research presented this week, including award-winning work on Transformer Explainer! public.tableau.com/views/VIS2024/…

1.3K

Polo Data Club retweeted

Seongmin Lee@SeongminLeee·16 Oct

🚀Excited to present Diffusion Explainer at the @ieeevis tomorrow at 1:45pm EST in the AI & LLM session! Try it now: poloclub.github.io/diffusion-expl… #StableDiffusion #GenerativeAI #AI #Visualization #IEEEVIS2024

2.4K

Polo Data Club retweeted

CMU Human-Computer Interaction Institute@cmuhcii·2 Oct

Please join us in congratulating longtime staff member, Queenie Kravitz, on her retirement today. She started @CarnegieMellon in 1993 and the HCII in 2004, and as graduate program coordinator certified our very first HCI PhD and master's degrees. Congrats, Queenie! #CMUhcii

7.9K

Polo Data Club retweeted

Anthony Peng@RealAnthonyPeng·25 Sep

😎 Our paper on the LLM safety landscape has been accepted at @NeurIPSConf 2024! #Safety #LLM #MachineLearning

4.5K
