Wenjun Hou
20 posts

Wenjun Hou
@houwenjun060
🪫Researcher @ HKU Shanghai X-Lab | PhD @HongKongPolyU ⚡Building Agentic Systems
Shanghai, China · Joined April 2022
388 Following · 54 Followers

🤔Hold on, I can answer better.
🔗New preprint on LLM multi-turn performance drop and recovery [arxiv.org/pdf/2604.04325]. 💡We identify a hidden tension in multi-turn reasoning: hold vs. lure.
⌛️Models can hold their intent to answer until sufficient evidence is observed, avoiding premature errors. ☔️But this ability is fragile—salient information can lure models to answer.
⬇️Even with the same information, performance drops significantly when moving from single-turn to multi-turn reasoning.
❓We ask: is this due to an overly strong intent to answer early?
🧑‍⚕️This is especially critical in medical diagnosis, a high-stakes setting with low tolerance for error, where a wrong answer at any turn can have serious consequences.
🎯To study this, we introduce MINT (Medical Incremental N-Turn Benchmark). MINT is:
✔ Information-preserving: decomposed cases can be concatenated to recover original single-turn performance, isolating the effect of interaction (see the sketch after the findings below)
✔ High-fidelity: clinically structured evidence (e.g., history, labs) with controlled turn granularity
💡Our key findings:
🏃1. Strong early-answer intent:
Over 55% of answers are given within the first 2 turns, leading to a 20–50% accuracy drop from single-turn to multi-turn.
⏰2. Holding unlocks self-correction:
When models are instructed to WAIT, the performance drop is greatly reduced. Incorrect→correct revisions occur up to 10.6× more often than the reverse, revealing a latent self-correction ability suppressed by early commitment.
🦴3. Strong lures override control:
Clinically salient signals (e.g., lab results) trigger premature answers—even when models are explicitly told to wait.
👇4. Actionable implications:
• Deferring the diagnostic question improves first-answer accuracy by up to 62.6%.
• Delaying salient evidence prevents up to 23.3% catastrophic accuracy drop.
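To make the information-preserving property concrete, here is a toy sketch of the evaluation protocol as I read it (field names and prompt wording are my own, not MINT's):

```python
def multi_turn_prompts(case):
    """Reveal one clinically structured evidence chunk per turn
    (e.g., history, exam, labs); the model may answer or WAIT."""
    shown = []
    for chunk in case["evidence"]:
        shown.append(chunk)
        yield "\n".join(shown) + f"\n{case['question']} (answer or WAIT)"

def single_turn_prompt(case):
    """Concatenating every chunk recovers the original single-turn input,
    so any accuracy gap versus the multi-turn run is attributable to
    interaction, not to missing information."""
    return "\n".join(case["evidence"]) + "\n" + case["question"]
```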
Thanks to all our coauthors for their amazing support! @ Jinrui Fang @ Runhan Chen @ Xu Yang @ Jian Yu @ Jiawei Xu @ Ashwin Vinod @WenqiShi0106 @TianlongChen4 @hengjinlp @ Chengxiang Zhai @TIMANUIUC @ying000

Wenjun Hou retweeted

With @Grok, we keep the honest versions and kill the bad transformers (I believe they are called “Decepticons”)
I,Hypocrite@lporiginalg
This is fine.
Wenjun Hou retweeted

🔍New findings on knowledge overshadowing! Why do LLMs hallucinate even when all of their training data is true? 🤔Can we predict hallucinations even before model training or inference?
🚀Check out our new preprint: [arxiv.org/pdf/2502.16143] The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination.
💥We unveil the log-linear law of knowledge overshadowing: the hallucination rate increases linearly with the logarithm of relative knowledge popularity, relative knowledge length, and model size!
💥This law lets us foresee hallucinations even before model training or inference!
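Read as a formula (my own notation for the stated scaling, not the paper's exact fit): with relative popularity P, relative length L, and model size S, the overshadowing-induced hallucination rate R behaves roughly as

```latex
% Hedged sketch of the log-linear law; coefficients and notation assumed.
R \;\approx\; \alpha \log P \;+\; \beta \log L \;+\; \gamma \log S \;+\; c
```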
🧪Building on the overshadowing effect, we propose CoDA, a new decoding strategy that emphasizes overshadowed knowledge while downweighting the dominant knowledge bias, significantly enhancing model factuality!
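As a rough illustration of the contrastive flavor of such a strategy (a sketch under my own assumptions: an HF-style model interface, and a masked-context construction that may differ from CoDA's actual formulation):

```python
import torch

@torch.no_grad()
def contrastive_next_token_logits(model, full_ids, masked_ids, alpha=1.0):
    """Sketch: contrast next-token logits from the full context against
    logits from a variant where the dominant (popular) knowledge cue is
    masked, then amplify the difference so overshadowed knowledge is
    upweighted. `full_ids`/`masked_ids` are hypothetical stand-ins."""
    full = model(full_ids).logits[:, -1, :]
    dominant = model(masked_ids).logits[:, -1, :]
    return full + alpha * (full - dominant)
```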
✨Our findings deepen our understanding of the underlying mechanisms of hallucination, opening exciting possibilities toward more predictable and controllable language models.
❤️Big thanks to the amazing team @ZoeyLi20 @qiancheng1231 @JiatengLiu @PengfeiYu @Glaciohound @May_F1_ for the inspiring collaborations!
❤️A huge appreciation for my mentors Prof. Kathleen McKeown, Prof. ChengXiang Zhai, Prof. @ManlingLi_ , and Prof. @hengjinlp for their amazing support and suggestions!

Wenjun Hou retweeted

🚀 Accepted by ICLR’25!
We introduce Integrative Decoding, a novel decoding algorithm to tackle the hallucination problem in LLMs. The core idea is to integrate “self-consistency” into the decoding objective to improve factuality.
✨ Super easy to implement: no training needed, no customized prompt design required. And it has zero restrictions on the generation form, making it a versatile solution for various scenarios.
🧪 We tested it on over a dozen different LLMs and three common benchmarks. The results? Highly stable and significant improvements!
📄 Check out the paper at arxiv.org/pdf/2410.01556
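My rough reading of the core mechanism, as a sketch (HF-style interface assumed; the paper's exact procedure may differ): sample several responses, prepend each to a fresh copy of the prompt, then decode once while averaging next-token distributions across all copies, so the final output is implicitly consistent with the samples.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def integrative_decode_step(model, contexts):
    """One decoding step. `contexts` is a list of 1-D token-id tensors,
    each = (a previously sampled response + the original prompt); this
    construction is my assumption, not the paper's verbatim recipe."""
    probs = [F.softmax(model(ids.unsqueeze(0)).logits[:, -1, :], dim=-1)
             for ids in contexts]
    avg = torch.stack(probs).mean(dim=0)  # integrate evidence across samples
    return avg.argmax(dim=-1)  # greedy pick; append to every context, repeat
```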
🙏 Big thanks to all our co-authors! @MasterVito0601 @wendyxiao06091 @Yuji_Zhang_NLP @houwenjun060 @xukaish
#ICLR2025 #LLM


Vision-language models sometimes hallucinate objects not in the image.
In our #CVPR2024 work, we show that this happens due to excessive reliance on language priors and propose two solutions.
If in Seattle, visit our poster on Thu to learn more!
📄 arxiv.org/abs/2403.14003

Wenjun Hou retweeted

Really excited that our paper "A PhD Student's🧑‍🎓 Perspective on Research in NLP in the Era of #LLMs" will be at #COLING2024! We brainstormed 45 topics💡 that students can work on despite the rise of LLMs. Co-led by @OanaIgnatRo @ZhijingJin and @radamihalcea with contributions from many.

Rada Mihalcea@radamihalcea
“What should I work on?” is a question we hear more & more often from NLP students, during a time when the media rhetoric is that “it’s been all solved” Turns out there are many NLP research areas rich for exploration—here is our answer from 20+ students arxiv.org/abs/2305.12544
Wenjun Hou retweeted

Explore LLM Interpretability with this comprehensive resource compilation:
github.com/cooperleong00/…
📚 Tutorials, libraries, surveys, papers, blogs & more!
📂 Categorized for easy navigation
🔄 Continually updated
🗨️ Your thoughts & feedback are welcome!
#NLProc #LLM
Wenjun Hou retweeted

🚨What is currently the best Speculative Decoding method for accelerating LLM inference?🔍
We introduce Spec-Bench📖: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding!🚀
Project page: sites.google.com/view/spec-bench
🧵1/n
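For context, the technique being benchmarked, in its simplest greedy form (a generic textbook sketch with an HF-style interface, not any specific Spec-Bench method; real systems use rejection sampling rather than exact-match acceptance):

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ids, k=4):
    """A small draft model proposes k tokens cheaply; the large target
    model verifies all of them in one forward pass; we keep the longest
    agreeing prefix, so output matches target-only greedy decoding."""
    proposal = ids
    for _ in range(k):  # cheap autoregressive drafting
        nxt = draft(proposal).logits[:, -1, :].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    # One target pass scores every drafted position at once.
    verified = target(proposal).logits[:, -k - 1:-1, :].argmax(-1)
    drafted = proposal[:, -k:]
    accepted = int((verified == drafted).long().cumprod(-1).sum())
    return torch.cat([ids, drafted[:, :accepted]], dim=-1)
```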

Wenjun Hou retweeted

Check out our paper at #AAAI2024!
🤔Many recent works have found that LLMs struggle to strategically achieve a "complex" dialogue goal through multi-turn interactions.
💡We propose Cooper to address this by coordinating multiple specialized agents towards a complex dialogue goal.
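From this one-line description, the architecture sounds roughly like the following (a hypothetical sketch; the agent roles and the coordination rule here are illustrative, and Cooper's actual mechanism is in the paper):

```python
from typing import Callable, Dict

# (dialogue_so_far, goal) -> candidate utterance
Agent = Callable[[str, str], str]

def cooper_style_turn(agents: Dict[str, Agent],
                      coordinate: Callable[[Dict[str, str]], str],
                      dialogue: str, goal: str) -> str:
    """Each specialized agent drafts a move toward the aspect of the
    goal it owns; a coordinator selects or composes the final response."""
    candidates = {name: propose(dialogue, goal)
                  for name, propose in agents.items()}
    return coordinate(candidates)
```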


📢Can a model generate consistent reports for semantically equivalent radiographs?
📖Check out our new paper "ICON: Improving Inter-Report Consistency for Radiology Report Generation via Lesion-aware Mix-up Augmentation" for more details.
🔍arXiv: arxiv.org/abs/2402.12844
Wenjun Hou retweeted

Happy to share that our paper "mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models" (with Chengzhi Hu, @zheyu_bot, @andre_t_martins, @HinrichSchuetze) has been accepted to #EACL2024 Findings. See you in Malta!
arxiv.org/abs/2305.13684
Wenjun Hou retweeted

Tuning LLMs has been a prevalent paradigm for building capable dialogue agents. Can a different tuning method enhance the consistency of dialogue generation?
👉 Our recent work may bring some insights!
🔍 arXiv: arxiv.org/abs/2402.06967

Wenjun Hou retweeted

📢Excited to share our new survey paper "Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding"!
Paper: arxiv.org/abs/2401.07851
🧵1/n

Wenjun Hou retweeted

Can I trust LLM critique?
🔥 We pioneer critique evaluation by introducing the critique of critique, termed MetaCritique.
Repo: github.com/GAIR-NLP/MetaC…
Paper: arxiv.org/abs/2401.04518
(1/7)

Wenjun Hou retweeted

🤒You wouldn't feel at ease letting just a language model be your medical assistant.
👨‍⚕️An explicit and understandable differential diagnosis is required!
📖Paper: arxiv.org/abs/2401.06541


Thrilled to attend EMNLP in Singapore🇸🇬
We leverage historical records for precise radiology report generation.
Check out our EMNLP Findings paper for more details:
arxiv.org/abs/2310.13864
Wenjun Hou retweeted

Excited to arrive in Singapore for #EMNLP2023 today! We will present our work in the East Foyer room on Dec. 8.
Feel free to stop by during our poster sessions. I'm looking forward to meeting new and old friends!

Wenjun Hou retweeted

Beyond elated that Glot500 won an Area Chair award.
I'll also be presenting on this at:
- Jul 11 (Tue), 09:00-09:15, Metropolitan Centre
- Jul 12 (Wed), 09:00, Bay - Unit 3
Come if you are interested in multilinguality and scaling large language models to 500+ languages.
Ayyoob Imani@imani_ayyoob
Excited that our paper "Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages" won the Area Chair Award at #ACL2023. Big kudos to our squad at @CisLmu who made it all happen. @lpq29743 to present the findings. #NLProc
Wenjun Hou retweeted

Excited to be in Toronto for two days! I will present our #ACL2023NLP work "Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue" in the spotlight session on Monday, July 10, 19:00-21:00, Metropolitan East. Welcome to have a chat!🤗
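For readers who don't know the term: a Brownian bridge is a Brownian motion pinned to fixed endpoints, which is presumably what lets the planner interpolate latent dialogue states between the conversation opening and the goal (my reading; the paper gives the actual model). The textbook form, from x_0 at time 0 to x_T at time T:

```latex
\mathbb{E}[B_t] = \Bigl(1 - \tfrac{t}{T}\Bigr)\,x_0 + \tfrac{t}{T}\,x_T,
\qquad
\operatorname{Var}(B_t) = \sigma^2\,\tfrac{t\,(T - t)}{T}
```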

