Kaishuai
@xukaish
37 posts · Hong Kong · Joined January 2015
178 Following · 44 Followers
Kaishuai retweeted
Yuji Zhang@Yuji_Zhang_NLP·
🤔Hold on, I can answer better.
🔗New preprint on LLM multi-turn performance drop and recovery [arxiv.org/pdf/2604.04325].
💡We identify a hidden tension in multi-turn reasoning: hold vs. lure.
⌛️Models can hold their intent to answer until sufficient evidence is observed, avoiding premature errors.
☔️But this ability is fragile: salient information can lure models to answer.
⬇️Even with the same information, performance drops significantly when moving from single-turn to multi-turn reasoning.
❓We ask: is this due to an overly strong intent to answer early?
🧑‍⚕️This is especially critical in medical diagnosis, a high-stakes setting with low tolerance for error, where a wrong answer at any turn can have serious consequences.
🎯To study this, we introduce MINT (Medical Incremental N-Turn Benchmark). MINT is:
✔ Information-preserving: decomposed cases can be concatenated to recover original single-turn performance, isolating the effect of interaction
✔ High-fidelity: clinically structured evidence (e.g., history, labs) with controlled turn granularity
💡Our key findings:
🏃1. Strong early-answer intent: over 55% of answers are given within the first 2 turns, leading to a 20–50% accuracy drop from single-turn to multi-turn.
⏰2. Holding unlocks self-correction: when models are instructed to WAIT, the performance drop is greatly reduced. Incorrect→correct revisions occur up to 10.6× more often than the reverse, revealing a latent self-correction ability suppressed by early commitment.
🦴3. Strong lures override control: clinically salient signals (e.g., lab results) trigger premature answers, even when models are explicitly told to wait.
👇4. Actionable implications:
• Deferring the diagnostic question improves first-answer accuracy by up to 62.6%
• Delaying salient evidence prevents up to a 23.3% catastrophic accuracy drop.
Thanks to all our coauthors for their amazing support!
@ Jinrui Fang @ Runhan Chen @ Xu Yang @ Jian Yu @ Jiawei Xu @ Ashwin Vinod @WenqiShi0106 @TianlongChen4 @hengjinlp @ Chengxiang Zhai @TIMANUIUC @ying000
Kaishuai retweeted
Caiqi Zhang@caiqizh·
We studied confidence estimation for LLMs in multi-turn interactions. 💬 The verdict? Methods designed for single-turn settings struggle to adapt when conversations get deeper. 📉 We really need more research in this direction! 🚀 📄 Read the full paper: arxiv.org/abs/2601.02179…
Kaishuai retweeted
Jian Wang@jwanglvy·
👉NEW research on parameter-efficient methods for RLVR!
MikaStars★@MikaStars39

Stop using LoRA for RLVR!!! New paper released👉Evaluating Parameter Efficient Methods for RLVR
📖Alphaxiv: alphaxiv.org/abs/2512.23165
💻Github: github.com/MikaStars39/Pe…
Is standard LoRA truly the optimal choice for reinforcement learning? We present the first large-scale evaluation of over 12 PEFT methodologies using the DeepSeek-R1-Distill family on complex mathematical reasoning benchmarks.
Key finding: standard LoRA is suboptimal. Structural variants such as DoRA, AdaLoRA, and MiSS consistently outperform standard LoRA. Notably, DoRA (46.6% avg. accuracy) even surpasses full-parameter fine-tuning (44.9%) across multiple benchmarks.
The failure of SVD-based initialization: strategies like PiSSA and MiLoRA experience significant performance degradation or total training collapse. This is due to a fundamental "spectral misalignment": these methods force updates on principal components, while RLVR intrinsically operates in the off-principal regime.
The expressivity floor: while RLVR can tolerate moderate parameter reduction, extreme compression (e.g., VeRA, IA³, or rank-1 adapters) creates an information bottleneck. Reasoning tasks require a minimum threshold of trainable capacity to successfully reorient policy circuits.
Recommendations for the community:
a. Move beyond the default adoption of standard LoRA.
b. Prioritize geometry-aware adapters like DoRA that decouple magnitude and direction.
c. Avoid SVD-informed initializations for RL tasks.
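The "decouple magnitude and direction" recommendation refers to the DoRA reparameterization: the adapted weight is renormalized column-wise, then rescaled by a learned magnitude vector. A minimal numpy sketch of that forward computation (function name and shapes are illustrative, not from the paper's code):

```python
import numpy as np

def dora_weight(W0, A, B, m):
    """DoRA-style weight: magnitude (m) decoupled from direction (W0 + BA).

    W0: frozen base weight, shape (out, in)
    A:  low-rank adapter, shape (r, in); B: shape (out, r)
    m:  learned per-column magnitude vector, shape (in,)
    """
    V = W0 + B @ A                       # directional component
    col_norms = np.linalg.norm(V, axis=0)  # one norm per input column
    return (V / col_norms) * m           # unit-norm columns, rescaled by m
```

With zero-initialized adapters the direction is exactly W0, so training can move magnitude and direction independently, which is the property the thread credits for outperforming plain LoRA.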

Kaishuai retweeted
Anthropic@AnthropicAI·
New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.
Kaishuai retweeted
Yang Xiao@Yang_Xiao_nlp·
1/9 🔥 NEW PAPER: "LIMI: Less is More for Agency" The Age of AI Agency demands systems that don't just think, but work: vibe coding and automated research. We used just 78 samples to beat GPT-5 by 14.1% and discovered the Agency Efficiency Principle. See details below! 📊
Kaishuai retweeted
Yuji Zhang@Yuji_Zhang_NLP·
🔍 New Preprint! Why do LLMs generate hallucinations even when trained on all truths? 🤔 Check out our paper [arxiv.org/abs/2407.08039]
💡 We find that, universally, data imbalance causes LLMs to over-generalize popular knowledge and produce amalgamated hallucinations.
📊 Interestingly, we unveil a strong correlation between generalization and hallucination, then quantify how condition length, imbalance ratio, and weight decay impact both.
🚀 We propose using "knowledge overshadowing" as an early warning signal, which successfully forecasts hallucination! Our self-contrastive decoding method significantly reduces hallucinations!
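The tweet does not spell out the self-contrastive decoding method, but the general family it belongs to adjusts next-token logits by subtracting a contrasting distribution, here plausibly one dominated by the overshadowing popular knowledge. A generic sketch of that logit adjustment (names and the α weight are placeholders, not the paper's exact formulation):

```python
import numpy as np

def contrastive_logits(logits_full, logits_contrast, alpha=1.0):
    """Penalize tokens the contrasting (e.g., overshadowing) distribution
    already favors, so suppressed-but-correct tokens can surface."""
    return logits_full - alpha * logits_contrast
```

In the toy case below, the full model slightly prefers token 0, but because the contrasting pass prefers token 0 even more strongly, the adjusted distribution flips to token 1.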
Kaishuai retweeted
Hanlin Wang@hanlinwang1024·
🚀 Excited to share our new work on Agent RL Training! 📑 SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution 🔍 Key innovations: • Novel reward redistribution framework • Tackles delayed rewards in multi-turn RL • Improves long-horizon task completion
Kaishuai retweeted
Caiqi Zhang@caiqizh·
🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩No sampling. No slow post-hoc methods. Not limited to short-form QA! ‼️Just output confidence in a single decoding pass. ✅Better calibration! 🚀 20× faster runtime. arXiv:2505.23912 👇
Kaishuai retweeted
Heming Xia@hemingkx·
Thrilled to present our #ICLR2025 poster tomorrow from 3:00-5:30 PM! 👏 📍 Location: Hall 3 + Hall 2B, Booth #49 Drop by to chat and discuss with us!
Kaishuai@xukaish·
@_akhaliq A similar issue was identified in our earlier work. We propose Error-injected Self-editing (RISE) to construct hard pairs and apply DPO to mitigate subtle but critical reasoning failures. 📄 arxiv.org/abs/2410.06638 🧠 Let’s aim for robust LLM reasoning! #LLM #Reasoning #DPO
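The tweet's recipe is: build hard (correct, error-injected) pairs, then apply DPO. For context, the standard DPO objective on the policy and reference log-probabilities of the chosen vs. rejected response is, as a minimal sketch (the paper's pair construction and hyperparameters are not shown here):

```python
import math

def dpo_loss(chosen_logp, rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * implicit reward margin)."""
    margin = beta * ((chosen_logp - ref_chosen_logp)
                     - (rejected_logp - ref_rejected_logp))
    return math.log1p(math.exp(-margin))  # numerically stable -log(sigmoid)
```

A pair where the policy already prefers the correct solution over its error-injected twin yields a lower loss than an undecided pair, which is what pushes the model away from the subtle reasoning failures.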
AK@_akhaliq·
ByteDance presents Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kaishuai retweeted
Alyssa, Yi CHENG@YiCheng77783310·
🚀 Accepted by ICLR'25! We introduce Integrative Decoding, a novel decoding algorithm to tackle the hallucination problem in LLMs. The core idea is to integrate "self-consistency" into the decoding objective to improve factuality.
✨ Super easy to implement: no training needed, no customized prompt design required. And it has zero restrictions on the generation form, making it a versatile solution for various scenarios.
🧪 We tested it on over a dozen different LLMs and three common benchmarks. The results? Highly stable and significant improvements!
📄 Check out the paper at arxiv.org/pdf/2410.01556
🙏 Big thanks to all our co-authors! @MasterVito0601 @wendyxiao06091 @Yuji_Zhang_NLP @houwenjun060 @xukaish
#ICLR2025 #LLM
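One way to picture "self-consistency in the decoding objective": run the next-token prediction in parallel under several sampled-response contexts and aggregate their distributions before committing to a token. A toy sketch assuming the aggregation is a simple average of per-context log-probs (the paper's exact objective may differ):

```python
import numpy as np

def integrative_step(per_context_logprobs):
    """Pick the next token that is most probable on average across contexts.

    per_context_logprobs: array of shape (k, vocab); row i holds the
    next-token log-probs when the i-th sampled response is in context.
    """
    avg = per_context_logprobs.mean(axis=0)
    return int(avg.argmax())
```

A token that only one context favors strongly can lose to a token that every context finds moderately plausible, which is how the aggregation rewards consistency.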
Kaishuai retweeted
Cooper Leong@cooperleong22·
Looking to innovate in LLM alignment? 🎯 FeatureAlignment merges interpretability with alignment (DPO + SAE) for feature-level training. Distributed training + flexible logging + ready-to-use datasets = your next research breakthrough! github.com/MikaStars39/Fe…
MikaStars★@MikaStars39

@NeelNanda5 github.com/MikaStars39/Fe… Thanks Neel for your introductions to sparse autoencoders. We believe feature-level LLM alignment with SAEs is the future. Check out FeatureAlignment = Alignment (e.g., DPO) + Mechanistic Interpretability (e.g., SAE).

Kaishuai retweeted
Hanlin Wang@hanlinwang1024·
🌟 Excited to share our #EMNLP2024Findings work: E2CL: Exploration-based Error Correction Learning for Embodied Agents, a novel framework for improving embodied agents' alignment with their environments! 🌟 📄 Check out our work: aclanthology.org/2024.findings-…
Kaishuai retweeted
Binyuan Hui@huybery·
After months of effort, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring you:
⭐ Base and Instruct models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, trained on data in 27 additional languages besides English and Chinese.
🌟 SOTA performance on a large number of benchmark evaluations, with significantly improved performance in coding and math.
🌠 Extended context length support up to 128K tokens with Qwen2-7B-Instruct and Qwen2-72B-Instruct.
📚 BLOG: qwenlm.github.io/blog/qwen2/
🤗 DEMO: hf.co/spaces/Qwen/Qw…
🤖 CODE: github.com/QwenLM/Qwen2
Kaishuai retweeted
Omar Sanseviero@osanseviero·
2023: "Don't make ML open source, China will steal it" 2024: Stanford "steals" model from Chinese group with no recognition