Yuqing Yang

53 posts

@yyqcode

Second-year PhD student @CSatUSC @nlp_usc.

Joined June 2023
369 Following · 258 Followers
Pinned Tweet
Yuqing Yang @yyqcode
🧵 1/8 What should an LLM assistant remember across conversations? Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem. Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction. 📄 Paper: arxiv.org/abs/2604.11610
6 replies · 16 retweets · 50 likes · 13.5K views
Yuqing Yang retweeted
Linxin Song @linxins2
The future risk of computer-use agents won’t come only from malicious prompts. It will come from agents that can flawlessly follow normal instructions straight into harm. Introducing 𝐎𝐒-𝐁𝐥𝐢𝐧𝐝: a realistic but overlooked setting where every task begins with a benign user instruction, yet the harmfulness only emerges as the agent acts in the environment.
2 replies · 6 retweets · 39 likes · 6.9K views
Yuqing Yang retweeted
Deqing Fu @DeqingFu
New paper: Convergent Evolution: How Different Language Models Learn Similar Number Representations. Language models, classical word embeddings, and even raw token frequencies all develop the same Fourier features for numbers. But only some develop the underlying structure. 🧵
2 replies · 22 retweets · 107 likes · 45.2K views
Yuqing Yang retweeted
Deqing Fu @DeqingFu
After three papers on Fourier features in LLMs, I think there's a principle worth naming. How should we do science on an LLM? It corresponds to the existential questions:
> who am I? ↔ the phenomenon.
> where do I come from? ↔ the emergence.
> where am I going? ↔ the use.
🧵
103 replies · 173 retweets · 3.7K likes · 5.2M views
Yuqing Yang @yyqcode
8/8 Both artifacts may find use beyond this paper.
🦣 BEHEMOTH as a testbed for diverse memory extraction approaches (self-evolving, routing-based, skill-based, and beyond).
🌱 CluE for any setting where one agent must handle heterogeneous demands, e.g. serving users with distinct habits.
w/ @TengxiaoLiu, @BillJohn1235813, @taiwei_shi, @linxins2, @robinomial
Check out the paper & code if this resonates!
0 replies · 1 retweet · 6 likes · 347 views
Yuqing Yang @yyqcode
7/8 Bonus findings:
• CluE preserves strengths when starting from a stronger seed
• Transfers to Gemini-3-Flash backend
• Single-step gains carry over to continual memory settings
• Produces clean, structured taxonomies, not bloated rule lists
1 reply · 1 retweet · 6 likes · 171 views
Yuqing Yang @yyqcode
Coding agents running 24/7 will unlock a lot of breakthroughs 🚀. Easy to feel like we're being replaced 😨. But the real question: What can we learn from this, and where do they still fall short? New blog ⬇️
Quoting Tengxiao Liu @TengxiaoLiu

Auto research is on 🔥 We give algorithmic problems (like circle packing) to general coding agents and let them run overnight. 🌙 Agents reach SoTA. But more importantly: we analyze 100+ hours of trajectories to understand how they get there 🧵

0 replies · 1 retweet · 4 likes · 577 views
Yuqing Yang retweeted
Johnny Tian-Zheng Wei @johntzwei
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
2 replies · 41 retweets · 131 likes · 49.6K views
Yuqing Yang retweeted
Chenxin An @AnChancy46881
# 🚨 4B open-recipe model beats Claude-4-Opus
🔓 100% open data, recipe, model weights and code.
Introducing Polaris✨, a post-training recipe for scaling RL on advanced reasoning models.
🥳 Check out how we boost open-recipe reasoning models to incredible performance levels (65 → 79 on AIME25) through RL training on open-source data and academic-level resources.
📑Notion: honorable-payment-890.notion.site/POLARIS-A-POst…
📗Blog post: hkunlp.github.io/blog/2025/Pola…
🤗Model & data: huggingface.co/POLARIS-Project
💻Code: github.com/ChenxinAn-fdu/…
24 replies · 82 retweets · 445 likes · 100.5K views
Yuqing Yang retweeted
Xi Ye @xiye_nlp
There’s been hot debate about (The Illusion of) The Illusion of Thinking. My take: it’s not that models can’t reason; they just aren’t perfect at long-form generation yet. We evaluate reasoning models on the LongProc benchmark (which requires generating 8K-token CoTs; see thread). Reasoning models actually outperform instruction-tuned ones, showing the benefit of long-CoT training for long outputs. Still, there’s plenty of room for improvement. See details in our post 👉 princeton-pli.github.io/LongProc/rmode…
Quoting Xi Ye @xiye_nlp

🤔Most LLMs now have context sizes of 128K or more, but are they good at generating long outputs, such as writing an 8K-token chain of thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.

1 reply · 13 retweets · 34 likes · 4.7K views
Yuqing Yang retweeted
Dongwei Jiang @Dongwei__Jiang
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground truth.
3 replies · 32 retweets · 111 likes · 14.7K views
Yuqing Yang retweeted
Linxin Song @linxins2
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
5 replies · 41 retweets · 267 likes · 36K views