Yuqing Yang

39 posts

@yyqcode

First-year PhD student @CSatUSC @nlp_usc.

Joined June 2023

368 Following · 232 Followers

Pinned Tweet

Yuqing Yang@yyqcode·29 May

🧐When do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵

115

14.3K

Yuqing Yang@yyqcode·26 Nov

A practical insight!

Tengxiao Liu@TengxiaoLiu

🏧Giving your agent unlimited tool calls doesn't make it smarter. 💡Why? It lacks 'Budget Awareness'! Introducing Budget Tracker, a simple plug-in that enables more effective scaling behaviors: higher performance, lower cost. Paper: arxiv.org/pdf/2511.17006

281

Yuqing Yang Retweeted

Johnny Tian-Zheng Wei@johntzwei·24 Oct

Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵

130

47.8K

Yuqing Yang Retweeted

Chenxin An@AnChancy46881·20 Jun

🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels (65 → 79 on AIME25) through RL training on open-source data and academic-level resources. 📑Notion: honorable-payment-890.notion.site/POLARIS-A-POst… 📗Blog post: hkunlp.github.io/blog/2025/Pola… 🤗Model & data: huggingface.co/POLARIS-Project 💻Code: github.com/ChenxinAn-fdu/…

444

100.3K

Yuqing Yang Retweeted

Xi Ye@xiye_nlp·19 Jun

There’s been hot debate about (The Illusion of) The Illusion of Thinking. My take: it’s not that models can’t reason — they just aren’t perfect at long-form generation yet. We eval reasoning models on LongProc benchmark (requiring generating 8K CoTs, see thread). Reasoning models actually outperform instruction-tuned ones, showing the benefit of long CoT training for long outputs. Still, there’s plenty of room for improvement. See details in our post 👉 princeton-pli.github.io/LongProc/rmode…

Xi Ye@xiye_nlp

🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize highly dispersed information and generate long, structured outputs.

4.6K

Yuqing Yang Retweeted

Dongwei Jiang@Dongwei__Jiang·16 Jun

🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.

111

14.6K

Yuqing Yang Retweeted

Linxin Song@linxins2·21 May

🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n

268

35.9K

Yuqing Yang Retweeted

Deqing Fu@DeqingFu·21 May

Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!

7.6K

Yuqing Yang@yyqcode·29 May

🧵7/7 📄Paper: arxiv.org/abs/2505.16170 💻 Code & data: github.com/ayyyq/llm-retr… 🙌 Many thanks to my awesome advisor @robinomial and everyone at Allegro Lab!

168

Yuqing Yang@yyqcode·29 May

🧵6/7 In short: Retraction is belief-driven. And belief can be probed, steered, and trained. We hope our work contributes to a better understanding of self-correction in LLMs.

180

Yuqing Yang Retweeted

Linxin Song@linxins2·1 Apr

Want to know what your LLM doesn’t know? This is how 👇 Preprint: arxiv.org/abs/2503.23361 Code: github.com/uscnlp-lime/SEA

23.1K

Yuqing Yang Retweeted

Tianyi Zhou@tianyi_zhou12·18 Feb

Billion-parameter LLMs still struggle with simple arithmetic? 📞 FoNE (Fourier Number Embedding) tackles this problem. By mapping numbers directly into Fourier space, it bypasses tokenization and significantly improves both numerical accuracy and efficiency.

3.4K

Yuqing Yang Retweeted

Muru Zhang@zhang_muru·4 Feb

Running your model on multiple GPUs but often find the speed unsatisfactory? We introduce Ladder-residual, a parallelism-aware architecture modification that makes 70B Llama with tensor parallelism ~30% faster! Work done at @togethercompute. Co-1st author with @MayankMish98 and mentored by @ben_athi, @tri_dao arxiv.org/pdf/2501.06589 🧵[1/7]

323

76.9K

Yuqing Yang Retweeted

Tengxiao Liu@TengxiaoLiu·12 Dec

Come join the #NeurIPS2024 poster session and discuss whether language models can learn to skip steps in reasoning! 🗓Dec 12, Thursday, 11:00 am - 2:00 pm 📍East Exhibit Hall A-C #2900 Feel free to stop by and say hi! I am actively seeking Summer 2025 internship opportunities!

Tengxiao Liu@TengxiaoLiu

🤔Can LMs learn to skip steps to improve reasoning efficiency while maintaining accuracy? ✅The answer is Yes! In our #NeurIPS 2024 work, we show this behavior boosts efficiency, maintains accuracy, and even enhances generalization in OOD scenarios! 🚀arxiv.org/pdf/2411.01855 🧵⬇️

958

Yuqing Yang@yyqcode·11 Dec

@ShuaichenChang Thank you!!

103

Shuaichen Chang@ShuaichenChang·11 Dec

@yyqcode We (AWS AI LAB) are hiring!

112

Yuqing Yang@yyqcode·10 Dec

Arrived in Vancouver for #NeurIPS2024 🇨🇦! I'll be presenting Alignment for Honesty, a year-old paper that still fascinates me with how LLMs navigate knowledge boundaries. Also glad to chat about self-correction and reasoning. Actively seeking a 2025 summer internship!

5.7K
