Yuxin Wen

132 posts

@ywen99

AI Security @OpenAI | PhD @umdcs advised by @tomgoldsteincs

Joined December 2021
868 Following · 619 Followers
Yuxin Wen retweeted
Jonas Geiping @jonasgeiping
We’re training models wrong and it’s due to ChatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs:
🔵 Can be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX, removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Are fast: they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 Have an easier time encoding a separation of concerns, improving security
🔵 Provide, with many internal streams, a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped by 23 hours with such a cool report on their side. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.
Yuxin Wen retweeted
Sicheng Zhu @sichengzhuml
Instruction Hierarchy defines how LLMs prioritize conflicting instructions. Our IH RL training dataset makes models more robust to prompt injections and IH attacks, and better at following in-context safety specs, while maintaining capabilities and helpfulness 🧵 cdn.openai.com/pdf/14e541fa-7…
Yuxin Wen retweeted
Sean McLeish @SeanMcleish
Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they’re inefficient to train at larger scales. We fix this by post-training regular language models into looped models, achieving higher accuracy per training FLOP. 📜1/7
Yuxin Wen retweeted
Jonas Geiping @jonasgeiping
There's been a lot of discussion recently about parallel vs sequential reasoning. The recurrent models we trained this year are sequential, which makes them good at math, but slow (see pic). However, if you squint, models with recurrent-depth/loops are like diffusion models ...
Yuxin Wen retweeted
Ravid Shwartz Ziv @ziv_ravid
I've been looking into prompt optimization methods lately, and honestly, the automatic generation of good discrete prompts is still somewhat of a mess. Found this cool (and old! from 2023) paper that comes up with a very simple trick: "Hard Prompts Made Easy"

Quick background: You can hand-craft prompts (which works but takes forever) or use soft prompts - continuous embeddings that are optimized with gradients. The problem is that soft prompts are uninterpretable garbage. You can't read them, can't transfer them between models, can't use them with APIs.

The question is: Can we optimize discrete, readable text prompts using gradients? The problem, of course, is that text is discrete. You can't backprop through discrete tokens.

This paper utilizes a concept that was previously employed in binary neural networks. They keep continuous representations during optimization, but project back to discrete tokens after each gradient step (stochastic rounding). In other words, the forward pass is discrete and the backward pass is continuous!

They used it with Stable Diffusion for prompt optimization: given an image, it finds prompts that recreate it. The prompts often beat hand-crafted ones that are way longer. For text tasks, they tested on classification. Their method (PEZ) beats other discrete optimization approaches. When you add fluency constraints, the prompts make sense grammatically. You can actually understand what the model learned.

The coolest part is that they show you can bypass content filters. Midjourney filters at the token level, but if you optimize through the open-source text encoder, you can find prompts that pass the filter but still generate filtered content.

Now that VLMs are becoming so powerful, there are so many applications to automatic discrete prompt optimization. I'm mostly excited about synthetic data generation and understanding world models.
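The "discrete forward pass, continuous backward pass" loop described in the post can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the 2-D vocabulary, the mean-embedding loss, and the names `nearest_token`, `loss_and_grad` and `optimize_hard_prompt` are all hypothetical, and the rounding step here is a plain nearest-neighbor projection (an assumption; the post calls it stochastic rounding).

```python
def nearest_token(vec, vocab):
    """Return the index of the vocabulary embedding closest to vec."""
    return min(
        range(len(vocab)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, vocab[i])),
    )


def loss_and_grad(embs, target):
    """Toy objective (hypothetical): squared distance between the mean
    prompt embedding and a target vector. Returns the loss and the
    gradient with respect to each prompt slot."""
    n, d = len(embs), len(target)
    mean = [sum(e[k] for e in embs) / n for k in range(d)]
    diff = [mean[k] - target[k] for k in range(d)]
    loss = sum(x * x for x in diff)
    slot_grad = [2.0 * x / n for x in diff]  # identical for every slot
    return loss, [slot_grad[:] for _ in range(n)]


def optimize_hard_prompt(vocab, target, init_ids, steps=50, lr=0.1):
    # Continuous ("soft") copies of the prompt embeddings.
    soft = [list(vocab[i]) for i in init_ids]
    best_ids, best_loss = list(init_ids), float("inf")
    for _ in range(steps):
        # Project the soft embeddings onto real tokens.
        ids = [nearest_token(s, vocab) for s in soft]
        hard = [vocab[i] for i in ids]
        # Evaluate loss and gradient at the discrete point ...
        loss, grads = loss_and_grad(hard, target)
        if loss < best_loss:
            best_ids, best_loss = ids, loss
        # ... but apply the update to the continuous embeddings.
        for s, g in zip(soft, grads):
            for k in range(len(s)):
                s[k] -= lr * g[k]
    return best_ids, best_loss


vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ids, loss = optimize_hard_prompt(vocab, target=[1.0, 1.0], init_ids=[0, 0])
print(ids, loss)
```

The result is always a list of real token ids, so the prompt stays readable, while the accumulated continuous updates are what eventually push a slot across the boundary to a different token.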
Yuxin Wen retweeted
Neel Jain @neeljain1717
Excited to present Refusal Tokens at #COLM2025, Thursday morning at Poster #72. The work explores managing refusal rates across different categories, since each category requires its own rate. Stop by to find out more!
Yuxin Wen retweeted
Jonas Geiping @jonasgeiping
Would LLMs ever *lie* to their users to prevent harm, instead of refusing harmful questions? I hope you're not too tired of reading about LLM Deception this week, because here is our report on 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗰 𝗗𝗶𝘀𝗵𝗼𝗻𝗲𝘀𝘁𝘆 in LLMs, and how it complicates Safety Evals:
Yuxin Wen retweeted
Monte Hoover @MonteBHoover
Guardrails with custom policies are hard for models trained on safety and harm-related datasets. But what if you trained a guardian model on arbitrary rules? Introducing DynaGuard, a guardian model for custom policies: arxiv.org/abs/2509.02563
Yuxin Wen retweeted
Kaiyu Yue @kaiyuyue
🚀 Train Small, Run Big - Surrogate Training for Giant VLMs. Training a tiny 400M vision encoder that plugs into a 70B LLM – ✅ No billion $ GPU bills ✅ No endless fine‑tuning. Sounds like a free lunch? 🍱 Our #ICCV2025 paper shows it’s real with Zero‑Shot Grafting. 📝 Paper: arxiv.org/abs/2505.22664 🧶 Thread ↓
Yuxin Wen retweeted
Jonas Geiping @jonasgeiping
(Structured) Model pruning is a nice tool when you really need to deploy a model that is a *bit* smaller, but don't want to deploy a bigger hammer like quantization. We recently published an improved *automated* model pruning method, surprisingly based on model merging:
Yuxin Wen retweeted
Ruchit Rawal @RawalRuchit
Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.
Yuxin Wen retweeted
Zikui Cai @zikuicai
Introducing MORSE-500 🌐 morse-500.github.io 500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder. Key Features: 🚀 Fresh & Portable 🎯 Diverse Categories 👁️ Pure Visual Cues 📈 Scalable Difficulty Dive in 🧵
Yuxin Wen retweeted
Ashwinee Panda @PandaAshwinee
fine-grained editing of videos is hard. if I use a Video Diffusion Transformer to make my videos, just adding "red" to the prompt totally changes the video. in our new paper, we dive deep into the attention maps of VDiTs and find a way to do fine-grained editing, and other stuff!
Yuxin Wen retweeted
Kwang Moo Yi @kwangmoo_yi
Preprint of today: Wen et al., "Analysis of Attention in Video Diffusion Transformers" -- arxiv.org/abs/2504.10317 Super interesting insights on Video ViTs. Attention sinks, sparse attention (which can be properly sparsified), specific layers being more important, and more!
Yuxin Wen retweeted
Juzheng Zhang @juzheng_z
🚨 How much parameter redundancy does LoRA really contain? We introduce LoRI, a method that keeps performance strong—even when we drastically shrink trainable parameters of LoRA. 🧵1/N
Yuxin Wen retweeted
Ashwinee Panda @PandaAshwinee
we show for the first time ever how to privacy audit LLM training. we give new SOTA methods that show how much models can memorize. by using our methods, you can know beforehand whether your model is going to memorize its training data, and how much, and when, and why! (1/n 🧵)
Yuxin Wen retweeted
Dayal Kalra @dayal_kalra
Low-memory optimizers sometimes match Adam but aren't as reliable, making practitioners reluctant to use them. We examine when Adam's second moments can be compressed during training. We also introduce SlimAdam, which compresses the second moments when feasible & preserves them when compression is detrimental
Yuxin Wen retweeted
Sean McLeish @SeanMcleish
Introducing the Gemstones💎. 22 models ranging from 50M to 2B parameters, spanning 11 widths and 18 depths, trained on 350B tokens of Dolma to allow for a more detailed analysis of scaling laws. 1/n
Yuxin Wen retweeted
Jonas Geiping @jonasgeiping
Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦‍⬛