Yuxin Wen

132 posts

@ywen99

AI Security @OpenAI | PhD @umdcs advised by @tomgoldsteincs

Joined December 2021

868 Following · 619 Followers

Yuxin Wen reposted

Jonas Geiping @jonasgeiping · 13 May

We’re training models wrong, and it’s due to ChatGPT. Even the modern coding agents used daily still rely on message-based exchanges: they send messages to users, to themselves (CoT), and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking, and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool-use UX, removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-stream LLMs are fast: they can predict and read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/continuous reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post :) Yes, but I don’t feel so bad about being outshipped by 23 hours by such a cool report on their side. I’ll link a second thread below with a more direct comparison. I actually think the two are complementary in interesting ways.

Yuxin Wen reposted

Jeffrey Yang Fan Chiang @JeffreyFC1225 · 23 Mar

x.com/i/article/2035…

Yuxin Wen reposted

Sicheng Zhu @sichengzhuml · 10 Mar

Instruction Hierarchy defines how LLMs prioritize conflicting instructions. Our IH RL training dataset can make models more robust to prompt injections and IH attacks, and better at following in-context safety specs, while maintaining capabilities and helpfulness 🧵 cdn.openai.com/pdf/14e541fa-7…

Yuxin Wen reposted

Sean McLeish @SeanMcleish · 11 Nov

Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they’re inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy on a per-training-FLOP basis. 📜1/7

Yuxin Wen reposted

Jonas Geiping @jonasgeiping · 21 Oct

There's been a lot of discussion recently about parallel vs sequential reasoning. The recurrent models we trained this year are sequential, which makes them good at math, but slow (see pic) However, if you squint, models with recurrent-depth/loops are like diffusion models ...

Yuxin Wen reposted

Ravid Shwartz Ziv @ziv_ravid · 12 Oct

I've been looking into prompt optimization methods lately, and honestly, the automatic generation of good discrete prompts is still somewhat of a mess. I found this cool (and old! from 2023) paper that comes up with a very simple trick: "Hard Prompts Made Easy".
Quick background: you can hand-craft prompts (which works but takes forever) or use soft prompts, i.e. continuous embeddings that are optimized with gradients. The problem is that soft prompts are uninterpretable garbage. You can't read them, can't transfer them between models, and can't use them with APIs.
The question is: can we optimize discrete, readable text prompts using gradients? The problem, of course, is that text is discrete. You can't backprop through discrete tokens.
This paper utilizes a concept that was previously employed in binary neural networks. They keep continuous representations during optimization, but project them onto the nearest discrete tokens (nearest-neighbor projection onto the vocabulary embeddings) at every gradient step. In other words, the forward pass is discrete and the backward pass is continuous!
They used it for Stable Diffusion prompt optimization: given an image, it finds prompts that recreate it. The prompts often beat hand-crafted ones that are way longer. For text tasks, they tested on classification, where their method (PEZ) beats other discrete optimization approaches. When you add fluency constraints, the prompts make sense grammatically, so you can actually understand what the model learned.
The coolest part is that they show you can bypass content filters. Midjourney filters at the token level, but if you optimize through the open-source text encoder, you can find prompts that pass the filter but still generate filtered content.
Now that VLMs are becoming so powerful, there are so many applications for automatic discrete prompt optimization. I'm mostly excited about synthetic data generation and understanding world models.
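The "discrete forward, continuous backward" trick can be illustrated with a toy, pure-Python sketch. It assumes nearest-neighbor projection in squared Euclidean distance, a made-up 4-token vocabulary in 2-D, and a simple squared-distance loss; `pez_optimize`, `project`, and `toy_loss_grad` are hypothetical names, and a real setup would instead backpropagate through a text encoder (for example CLIP's) to get the gradient.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def nearest_token(vec: Vector, vocab: Matrix) -> int:
    """Index of the vocabulary embedding closest to vec (squared Euclidean)."""
    return min(
        range(len(vocab)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, vocab[j])),
    )


def project(prompt: Matrix, vocab: Matrix) -> List[int]:
    """Map every continuous prompt embedding to its nearest discrete token id."""
    return [nearest_token(v, vocab) for v in prompt]


def pez_optimize(
    init: Matrix,
    vocab: Matrix,
    loss_grad: Callable[[Matrix], Matrix],
    lr: float = 0.1,
    steps: int = 50,
) -> List[int]:
    """Projected gradient descent on a prompt (hypothetical toy version).

    The loss gradient is evaluated at the projected, discrete prompt,
    but the update is applied to the continuous copy.
    """
    soft = [list(v) for v in init]
    for _ in range(steps):
        ids = project(soft, vocab)
        hard = [vocab[i] for i in ids]  # discrete forward pass
        grads = loss_grad(hard)         # gradient w.r.t. the projected embeddings
        soft = [                        # continuous update of the soft prompt
            [s - lr * g for s, g in zip(sv, gv)] for sv, gv in zip(soft, grads)
        ]
    return project(soft, vocab)


# Toy usage: a 4-token vocabulary in 2-D and a squared-distance loss to a target.
VOCAB = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
TARGET = [[1.0, 1.0], [0.0, 1.0]]


def toy_loss_grad(hard: Matrix) -> Matrix:
    # Gradient of sum_i ||h_i - t_i||^2 with respect to h_i is 2 * (h_i - t_i).
    return [[2 * (h - t) for h, t in zip(hv, tv)] for hv, tv in zip(hard, TARGET)]


best_ids = pez_optimize([[0.0, 0.0], [0.0, 0.0]], VOCAB, toy_loss_grad)
print(best_ids)  # [3, 2]
```

Because the gradient is taken at the projected point, the continuous copy keeps drifting while the discrete prompt stays on one token, until it crosses into another token's neighborhood; that is how small gradient steps can eventually change a discrete prompt.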

Yuxin Wen reposted

Neel Jain @neeljain1717 · 6 Oct

Excited to present Refusal Tokens at #COLM2025, Thursday morning at Poster #72. The work explores managing refusal rates across different categories, since each category requires its own rate. Stop by to find out more!

Yuxin Wen reposted

Jonas Geiping @jonasgeiping · 23 Sep

Would LLMs ever *lie* to their users to prevent harm, instead of refusing harmful questions? I hope you're not too tired of reading about LLM Deception this week, because here is our report on 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗰 𝗗𝗶𝘀𝗵𝗼𝗻𝗲𝘀𝘁𝘆 in LLMs, and how it complicates Safety Evals:

Yuxin Wen reposted

Monte Hoover @MonteBHoover · 8 Sep

Guardrails with custom policies are hard for models trained on safety and harm-related datasets. But what if you trained a guardian model on arbitrary rules? Introducing DynaGuard, a guardian model for custom policies: arxiv.org/abs/2509.02563

Yuxin Wen reposted

Kaiyu Yue @kaiyuyue · 15 Aug

🚀 Train Small, Run Big - Surrogate Training for Giant VLMs. Training a tiny 400M vision encoder that plugs into a 70B LLM – ✅ No billion $ GPU bills ✅ No endless fine‑tuning. Sounds like a free lunch? 🍱 Our #ICCV2025 paper shows it’s real with Zero‑Shot Grafting.📝 Paper: arxiv.org/abs/2505.22664 🧶 Thread ↓

Yuxin Wen reposted

Jonas Geiping @jonasgeiping · 30 Jun

(Structured) Model pruning is a nice tool when you really need to deploy a model that is a *bit* smaller but don't want to reach for a bigger hammer like quantization. We recently published an improved *automated* model pruning method, surprisingly based on model merging:

Yuxin Wen reposted

Ruchit Rawal @RawalRuchit · 10 Jun

Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.

Yuxin Wen reposted

Zikui Cai @zikuicai · 10 Jun

Introducing MORSE-500 🌐 morse-500.github.io 500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder. Key Features: 🚀 Fresh & Portable 🎯 Diverse Categories 👁️ Pure Visual Cues 📈 Scalable Difficulty Dive in 🧵

Yuxin Wen reposted

Ashwinee Panda @PandaAshwinee · 16 Apr

fine-grained editing of videos is hard. if I use a Video Diffusion Transformer to make my videos, just adding "red" to the prompt totally changes the video. in our new paper, we dive deep into the attention maps of VDiTs and find a way to do fine-grained editing, and other stuff!

Yuxin Wen reposted

Kwang Moo Yi @kwangmoo_yi · 15 Apr

Preprint of today: Wen et al., "Analysis of Attention in Video Diffusion Transformers" -- arxiv.org/abs/2504.10317 Super interesting insights on Video ViTs. Attention sinks, sparse attention (which can be properly sparsified), specific layers being more important, and more!

Yuxin Wen reposted

Juzheng Zhang @juzheng_z · 15 Apr

🚨 How much parameter redundancy does LoRA really contain? We introduce LoRI, a method that keeps performance strong—even when we drastically shrink trainable parameters of LoRA. 🧵1/N

Yuxin Wen reposted

Ashwinee Panda @PandaAshwinee · 13 Mar

we show for the first time ever how to privacy audit LLM training. we give new SOTA methods that show how much models can memorize. by using our methods, you can know beforehand whether your model is going to memorize its training data, and how much, and when, and why! (1/n 🧵)

Yuxin Wen reposted

Dayal Kalra @dayal_kalra · 7 Mar

Low-memory optimizers sometimes match Adam but aren't as reliable, making practitioners reluctant to use them. We examine when Adam's second moments can be compressed during training. We also introduce SlimAdam, which compresses second moments when feasible and preserves them when compression would be detrimental.
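As a rough illustration of what compressing Adam's second moments can mean, the sketch below keeps Adam's full first moment but stores only one second-moment value per row of a weight matrix (an exponential average of that row's mean squared gradient), similar in spirit to factored optimizers such as Adafactor. This is a hypothetical example: `RowCompressedAdam` is not SlimAdam's actual compression rule, which per the post decides when compressing is feasible and when it would be detrimental.

```python
import math
from typing import List

Matrix = List[List[float]]


class RowCompressedAdam:
    """Adam with a row-wise compressed second moment (hypothetical sketch).

    The first moment m is stored per element, as in Adam. The second moment v
    is stored once per row and tracks the mean squared gradient of that row,
    so memory for v drops from rows * cols floats to rows floats.
    """

    def __init__(self, rows: int, cols: int, lr: float = 1e-3,
                 beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [[0.0] * cols for _ in range(rows)]  # full first moment
        self.v = [0.0] * rows                         # one value per row
        self.t = 0

    def step(self, param: Matrix, grad: Matrix) -> None:
        """Update param in place using grad."""
        self.t += 1
        bc1 = 1 - self.beta1 ** self.t  # bias corrections, as in Adam
        bc2 = 1 - self.beta2 ** self.t
        for i, (p_row, g_row) in enumerate(zip(param, grad)):
            mean_sq = sum(g * g for g in g_row) / len(g_row)
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * mean_sq
            denom = math.sqrt(self.v[i] / bc2) + self.eps  # shared by the whole row
            for j, g in enumerate(g_row):
                self.m[i][j] = self.beta1 * self.m[i][j] + (1 - self.beta1) * g
                p_row[j] -= self.lr * (self.m[i][j] / bc1) / denom


param = [[0.0, 0.0], [0.0, 0.0]]
opt = RowCompressedAdam(rows=2, cols=2)
opt.step(param, [[1.0, -1.0], [2.0, 2.0]])
print(param)  # roughly [[-0.001, 0.001], [-0.001, -0.001]]
```

When the gradient magnitudes within a row are similar, the shared denominator behaves like per-element Adam; when they differ a lot, it does not, which is the kind of case where keeping the full second moment would matter.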

Yuxin Wen reposted

Sean McLeish @SeanMcleish · 12 Feb

Introducing the Gemstones💎: 22 models ranging from 50M to 2B parameters, spanning 11 widths and 18 depths, trained for 350B tokens of Dolma to allow for a more detailed analysis of scaling laws. 1/n

Yuxin Wen reposted

Jonas Geiping @jonasgeiping · 10 Feb

Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦‍⬛
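The idea of adaptively spending more compute in a latent space can be illustrated with a toy loop in which one shared block is applied repeatedly until the latent state settles. This sketch assumes a prelude/recurrent-core/coda split, a zero initial state, and a simple convergence threshold as the exit rule; `recurrent_depth_forward` and the toy blocks are hypothetical and are not the model's actual architecture or stopping criterion.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def recurrent_depth_forward(
    x: Vector,
    prelude: Callable[[Vector], Vector],
    core: Callable[[Vector, Vector], Vector],
    coda: Callable[[Vector], Vector],
    max_iters: int = 64,
    tol: float = 1e-6,
) -> Tuple[Vector, int]:
    """Apply a shared core block repeatedly in latent space (hypothetical sketch).

    Returns the output and the number of core iterations that were spent.
    """
    e = prelude(x)          # embed the input once
    s = [0.0] * len(e)      # initial latent state (assumption: zeros)
    iters = 0
    for iters in range(1, max_iters + 1):
        new_s = core(s, e)  # same weights on every iteration
        delta = max(abs(a - b) for a, b in zip(new_s, s))
        s = new_s
        if delta < tol:     # latent state has settled, so stop early
            break
    return coda(s), iters


def identity(v: Vector) -> Vector:
    return list(v)


def toy_core(s: Vector, e: Vector) -> Vector:
    # A contraction whose fixed point is 2 * e.
    return [0.5 * si + ei for si, ei in zip(s, e)]


out_hard, n_hard = recurrent_depth_forward([1.0], identity, toy_core, identity)
out_easy, n_easy = recurrent_depth_forward([0.0], identity, toy_core, identity)
print(n_hard, n_easy)  # 21 1
```

The point of the toy is only that the number of iterations depends on the input: the "easy" input converges immediately, while the other needs many more passes through the same block.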
