Yuqing Yang
@yyqcode
39 posts
First-year PhD student @CSatUSC @nlp_usc.
Joined June 2023
368 Following · 232 Followers
Pinned Tweet
Yuqing Yang @yyqcode
🧐When do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵
Yuqing Yang retweeted
Johnny Tian-Zheng Wei @johntzwei
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
Yuqing Yang retweeted
Chenxin An @AnChancy46881
🚨 4B open-recipe model beats Claude-4-Opus. 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨, a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels (65 → 79 on AIME25) through RL training on open-source data and academic-level resources.
📑 Notion: honorable-payment-890.notion.site/POLARIS-A-POst…
📗 Blog post: hkunlp.github.io/blog/2025/Pola…
🤗 Model & data: huggingface.co/POLARIS-Project
💻 Code: github.com/ChenxinAn-fdu/…
Yuqing Yang retweeted
Xi Ye @xiye_nlp
There’s been hot debate about (The Illusion of) The Illusion of Thinking. My take: it’s not that models can’t reason — they just aren’t perfect at long-form generation yet. We eval reasoning models on LongProc benchmark (requiring generating 8K CoTs, see thread). Reasoning models actually outperform instruction-tuned ones, showing the benefit of long CoT training for long outputs. Still, there’s plenty of room for improvement. See details in our post 👉 princeton-pli.github.io/LongProc/rmode…
Quoting Xi Ye @xiye_nlp:

🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.

Yuqing Yang retweeted
Dongwei Jiang @Dongwei__Jiang
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.
Yuqing Yang retweeted
Linxin Song @linxins2
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Yuqing Yang retweeted
Deqing Fu @DeqingFu
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
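The MeanShift-style extraction mentioned above reduces to a difference of mean activations between two contrastive sets, added back to hidden states at inference. A minimal illustrative sketch with toy 2-D "hidden states"; the function names and scaling factor are hypothetical, not from the paper:

```python
import numpy as np

def mean_shift_steering_vector(pos_acts, neg_acts):
    """Steering vector as the difference of mean activations between
    states that have the target property and states that do not."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def apply_steering(hidden_states, vector, alpha=1.0):
    """Add the (scaled) steering vector to every token's hidden state."""
    return hidden_states + alpha * vector

# Toy example in 2 dimensions
pos = np.array([[1.0, 0.0], [1.0, 2.0]])  # states with the property
neg = np.array([[0.0, 0.0], [0.0, 2.0]])  # states without it
v = mean_shift_steering_vector(pos, neg)          # difference of means
steered = apply_steering(np.array([[0.5, 0.5]]), v, alpha=2.0)
```

In a real multimodal LLM the activations would come from a chosen layer's residual stream, and `alpha` would be tuned per layer.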
Yuqing Yang @yyqcode
🧵6/7 In short: Retraction is belief-driven. And belief can be probed, steered, and trained. We hope our work contributes to a better understanding of self-correction in LLMs.
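A "belief probe" of the kind this thread describes is typically a linear classifier trained on frozen hidden states. The sketch below is a generic illustration with toy data and hypothetical names, not the paper's actual setup:

```python
import numpy as np

def train_linear_probe(X, y, lr=0.5, steps=500):
    """Logistic-regression probe: predicts a binary 'belief' label
    from frozen hidden states X of shape (n_samples, dim)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        grad = p - y                            # dLoss/dlogit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy data: dimension 0 of the hidden state encodes "belief"
X = np.array([[1.0, 0.3], [0.9, -0.2], [-1.0, 0.1], [-0.8, 0.4]])
y = np.array([1, 1, 0, 0])
w, b = train_linear_probe(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

The probe direction `w` can then double as a steering direction, which is one way "probed" and "steered" connect.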
Yuqing Yang retweeted
Tianyi Zhou @tianyi_zhou12
Billion-parameter LLMs still struggle with simple arithmetic? 📞 FoNE (Fourier Number Embedding) tackles this problem. By mapping numbers directly into Fourier space, it bypasses tokenization and significantly improves both numerical accuracy and efficiency.
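A loose sketch of the Fourier-embedding idea: featurize a number with cos/sin pairs at digit-scale periods, so each period roughly captures one digit position and no tokenizer is involved. The periods and scaling here are illustrative, not the paper's exact parameterization:

```python
import numpy as np

def fourier_number_embedding(x, periods=(10, 100, 1000)):
    """Embed a scalar as [cos(2*pi*x/T), sin(2*pi*x/T)] per period T.
    Each period is periodic in one digit-scale, so the features are
    exact functions of the number rather than of its token pieces."""
    feats = []
    for T in periods:
        angle = 2 * np.pi * x / T
        feats.extend([np.cos(angle), np.sin(angle)])
    return np.array(feats)

emb = fourier_number_embedding(123)  # 2 features per period -> length 6
```

Note the embedding is periodic: with these periods, numbers that agree modulo 1000 map to the same point, so a real system needs enough periods to cover its number range.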
Yuqing Yang retweeted
Muru Zhang @zhang_muru
Running your model on multiple GPUs but finding the speed unsatisfying? We introduce Ladder-residual, a parallelism-aware architecture modification that makes a 70B Llama with tensor parallelism ~30% faster! Work done at @togethercompute. Co-1st author with @MayankMish98, mentored by @ben_athi and @tri_dao. arxiv.org/pdf/2501.06589 🧵[1/7]
Yuqing Yang retweeted
Tengxiao Liu @TengxiaoLiu
Come join the #NeurIPS2024 poster session and discuss whether language models can learn to skip steps in reasoning! 🗓Dec 12, Thursday, 11:00 am - 2:00 pm 📍East Exhibit Hall A-C #2900 Feel free to stop by and say hi! I am actively seeking Summer 2025 internship opportunities!
Quoting Tengxiao Liu @TengxiaoLiu:

🤔Can LMs learn to skip steps to improve reasoning efficiency while maintaining accuracy? ✅The answer is Yes! In our #NeurIPS 2024 work, we show this behavior boosts efficiency, maintains accuracy, and even enhances generalization in OOD scenarios! 🚀arxiv.org/pdf/2411.01855 🧵⬇️

Yuqing Yang @yyqcode
Arrived in Vancouver for #NeurIPS2024 🇨🇦! I'll be presenting Alignment for Honesty, a year-old paper that still fascinates me with how LLMs navigate knowledge boundaries. Also glad to chat about self-correction and reasoning. Actively seeking a 2025 summer internship!