Yu-Neng Chuang

24 posts

@YuNengChuang

Joined December 2022
63 Following · 24 Followers
Yu-Neng Chuang reposted
Feng Luo (@FengLuo895614)
🚀 Can LLMs stop overthinking when detailed reasoning isn't needed? Excited to share our latest work on LLM reasoning: AutoL2S 🧠⚡
📄 Paper: arxiv.org/abs/2505.22662
🤖 Model: huggingface.co/amandaa/AutoL2…
LLMs often overthink, generating unnecessarily long CoTs even for easy questions, which increases cost and latency. We propose Auto Long-Short Reasoning (AutoL2S): a model-agnostic framework that dynamically chooses long or short reasoning based on question complexity.
💡 Just add a token: that's all it takes to teach the model when to skip redundant steps.
🖼️ (See below 👇) How AutoL2S switches reasoning strategies using simple markers.
📉 Up to 57% reduction in CoT length across four reasoning tasks without performance drop.
Credits to all co-authors: @FengLuo895614*, @YuNengChuang*, @Guanchu_Gary, Hoang Anh Duy Le, @henryzhongsc, Hongyi Liu, @jiayiy, @YangSui, Vladimir Braverman, Vipin Chaudhary, @huxia
Replies: 0 · Reposts: 5 · Likes: 9 · Views: 621
Yu-Neng Chuang reposted
elvis (@omarsar0)
A survey on efficient reasoning for LLMs. That was quick! I have been featuring papers on the topic of efficient reasoning and I see a few familiar papers in this survey. Good read overall!
Replies: 9 · Reposts: 85 · Likes: 374 · Views: 57.1K
Yu-Neng Chuang reposted
Sumit (@_reachsumit)
MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
Proposes a training-free RAG framework using multiple LLM agents to collaboratively filter retrieved documents, improving retrieval precision while maintaining high recall.
📝 arxiv.org/abs/2501.00332
Replies: 0 · Reposts: 5 · Likes: 13 · Views: 1.4K
Yu-Neng Chuang (@YuNengChuang)
📢 Excited to present our "Taylor Unswift" poster at #EMNLP24 in Miami! Join us on Nov 13 (Wed), 10:30–12:00, at Main #778. "Taylor Unswift" aims to solve the dilemma of secure weight release for LLM developers and users.
🔗 Paper: arxiv.org/pdf/2410.05331
🔗 Code: github.com/guanchuwang/Ta…
Wanna know more about "Taylor Unswift"? 😉
🚨 Oftentimes, model developers face a dilemma: open-source their models and lose control, or offer closed APIs but bear costs and deter privacy-conscious users.
🚑 Introducing "Taylor Unswift": a method using Taylor Expansion Theory to protect model weights while allowing users to run models on their own data without accessing the weights. These correspond to the 'Taylor' and 'Unswift' in the title.
🌟 Developers can prevent misuse of their models, while users can run models on their own data without sharing it, unlike with services like the ChatGPT API.
More detailed insights can be found in the paper! Kudos to all co-authors: @Guanchu_Gary*, @YuNengChuang*, @RuixiangT, @henryzhongsc, @jiayiy, @serendip410, @ziruirayliu, Vipin Chaudhary, Shuai Xu, James Caverlee, @huxia
#LLM #security #NLP #EMNLP
Replies: 1 · Reposts: 2 · Likes: 6 · Views: 812
Yu-Neng Chuang reposted
Yuchen Jin (@Yuchenj_UW)
After "Attention Is All You Need", AI paper titles be like:
Replies: 43 · Reposts: 176 · Likes: 1.7K · Views: 204.4K
Yu-Neng Chuang reposted
Jiayi Yuan (@jiayiy)
🚀 Excited to share our latest #EMNLP2024 work on benchmarking the long context ability with KV Cache compression across RNN-based architectures, token eviction, prompt compression, and quantization. We also provide an easy-to-use codebase (it also has my favorite WoW quote 😉). Feel free to give it a try and ⭐ it if you find it useful!
📄 Paper: arxiv.org/abs/2407.01527
💻 Code: github.com/henryzhongsc/l…
Some interesting findings/suggestions include:
1️⃣ Maintaining an uncompressed prefill process is essential for performance, especially with harder tasks.
2️⃣ Combining RNN-based models with attention significantly enhances long-context capabilities.
3️⃣ In "needle-in-a-haystack" evaluation for recent LLMs like Llama-3, we should use longer needles (like 64 digits) since these models tokenize multiple digits into one token.
More results and insights can be found in the paper! Kudos to all collaborators: @jiayiy, Hongyi Liu, @henryzhongsc, @YuNengChuang, Songchen Li, Guanchu Wang, Duy Le, @serendip410, Vipin Chaudhary, @ZhaozhuoX, @ziruirayliu, @huxia
Replies: 1 · Reposts: 10 · Likes: 42 · Views: 10.2K
Yu-Neng Chuang (@YuNengChuang)
Introducing the LTSM-bundle Package!
🌟 Thrilled to launch our open-source tool
🔧 Assess various crucial designs to train Large Time Series Models (LTSMs), and identify the best training practices
🔗 Paper: arxiv.org/abs/2406.14045
🔗 GitHub: github.com/daochenzha/ltsm
Replies: 0 · Reposts: 7 · Likes: 13 · Views: 1.7K
Yu-Neng Chuang reposted
HongyeJ@NeurIPS (@serendip410)
We tested our SelfExtend (arxiv.org/pdf/2401.01325…) on Llama-3-8B/70B-Instruct using the new, challenging long-context benchmark Ada-Eval (arxiv.org/abs/2404.06480). The task is selecting the best answer from candidates. The results are pretty good!
🌟 Highlights:
1: Equipped with SelfExtend, Llama-3-70B beats all except GPT-4-turbo.
2: Even for Mistral-7B-Instruct-v0.2, which has a large enough context window, SelfExtend can boost its performance, especially for long cases!
3: The Llama-3 series, at their respective scales, are impressive!
Check our repo for more details on SelfExtend: github.com/datamllab/Long…
Replies: 0 · Reposts: 8 · Likes: 15 · Views: 1.3K
Yu-Neng Chuang reposted
HongyeJ@NeurIPS (@serendip410)
🚨 Recently, we investigated the impact of different group size/neighbor window combinations on SelfExtend using the 'Needle In a Haystack' task.
🧐 Generally, SelfExtend is not overly sensitive to the two hyperparameters. We also got some intriguing findings:
1️⃣ Mistral-ins-0.1 stands out with a surprisingly narrow flexibility zone. Does this arise from their SWA during retraining? If so, how? 🤔
2️⃣ The 70b Llama-2 has a larger flexible area compared to its 7b siblings! Could it be larger models' superior noise handling or just more layers? 🧩
3️⃣ Phi-2, although much smaller, has a relatively large flexible area. Does this stem from the fact that it uses 40% of the head dimension for RoPE, or just its talent at these tasks? ✨
Dive into our repo for more details! 🔗 github.com/datamllab/Long…
#MachineLearning #LLMs
Replies: 0 · Reposts: 2 · Likes: 8 · Views: 1K
Yu-Neng Chuang reposted
Zirui Liu (@ziruirayliu)
🚀 Deploying long-context LLMs is hindered by the huge size of the KV cache. Our new method, KIVI, directly addresses this problem by quantizing the KV cache to 2 or 4 bits. In Mistral-v0.2 testing, KIVI demonstrates accuracy similar to the full-precision baseline with a 5.3× smaller KV cache!
Replies: 3 · Reposts: 18 · Likes: 67 · Views: 10.7K
Yu-Neng Chuang reposted
Xiaotian (Max) Han (@XiaotianHan1)
SelfExtend, without further training, upgrades Mistral-inst-v0.1 to match the performance level of its successor, v0.2, in QA tasks. So is the value of SelfExtend at least equivalent to the training cost of Mistral-inst-v0.2?
Replies: 0 · Reposts: 4 · Likes: 16 · Views: 1.8K
Yu-Neng Chuang reposted
Wei-Rui Chen (@WeiRuiChen01)
🤔 How many languages does #ChatGPT know?
🚀 Our work "Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability" is an attempt to answer this question and has been accepted to #NAACL2024 Findings.
Paper Link: arxiv.org/abs/2311.09696
(1/5)
Replies: 4 · Reposts: 3 · Likes: 19 · Views: 2.5K
Yu-Neng Chuang reposted
HongyeJ@NeurIPS (@serendip410)
Despite the mixed feelings about Google's latest Gemma model, we're big fans! @GoogleAI Why? Coz we found it pairs incredibly well with our SelfExtend 🤣🤣🤣 - like, perfectly!
With SelfExtend, no fine-tuning needed, we effortlessly expanded Gemma's window from 8k to 90k+! On the 'Needle in the haystack' task, Gemma-2b-it even struggled at 8k, but with SelfExtend, Gemma-2b-it easily tackles it within 90k range!
#AI #Gemma #SelfExtend #LLMs 🚀
Paper: arxiv.org/abs/2401.01325
Github: github.com/datamllab/Long…
Replies: 6 · Reposts: 37 · Likes: 210 · Views: 32.4K
Yu-Neng Chuang reposted
HongyeJ@NeurIPS (@serendip410)
🚀 Our Self-Extend method remains effective for Gemma-7b. We've successfully applied the Self-Extend patch to Gemma, showcasing its potential in passkey retrieval tasks (16k). Our exploration continues as we test it on more complex tasks and longer sequences (x4, x8?).
Have you encountered any issues? Do you have new results in your own case? We're eager to hear from you – please email us! Your feedback is invaluable as we push the boundaries of our research. Stay tuned for more updates! 🌟
Github: github.com/datamllab/Long…
Paper: arxiv.org/pdf/2401.01325…
Replies: 0 · Reposts: 4 · Likes: 15 · Views: 3.3K