Han Wang

204 posts


@HanWang98

PhD student @unc @unccs @unc_ai_group; Formerly @AMD @AmazonScience @MSFTResearch @NlpWestlake. RT & like ≠ endorsements. Views are my own. He/him

Chapel Hill, NC, USA · Joined July 2019
596 Following · 286 Followers
Pinned Tweet
Han Wang @HanWang98
🚨Real-world retrieval is messy: queries can be ambiguous, and documents may conflict or contain incorrect or irrelevant info. How can we jointly address all these problems? We introduce: ➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise. ➡️ MADAM-RAG, a multi-agent framework that debates and aggregates evidence across sources. MADAM-RAG outperforms strong baselines on existing datasets by up to +11.4% on AmbigDocs and +15.8% on FaithEval with Llama3.3-70B. RAMDocs reveals a large performance gap -- the best baseline reaches only 32.6% accuracy -- highlighting the need for reasoning-based RAG under real-world complexity. 🧵⬇️
2 replies · 30 reposts · 69 likes · 29.9K views
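The debate-and-aggregate idea in the thread above can be sketched in a few lines. This is a hypothetical simplification, not the released MADAM-RAG implementation: one agent answers from each retrieved document in isolation, then an aggregator sees all (possibly conflicting) answers and reconciles them. The prompt wording and the `llm` callable are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List

def madam_rag_round(query: str, docs: List[str], llm: Callable[[str], str]) -> str:
    """One debate round: per-document agents answer, an aggregator reconciles."""
    # One agent per retrieved document answers independently.
    answers = [llm(f"Answer '{query}' using only this document:\n{d}") for d in docs]
    # The aggregator sees all (possibly conflicting) answers and resolves them.
    summary = "\n".join(f"Agent {i}: {a}" for i, a in enumerate(answers))
    return llm(
        "Given these per-document answers, resolve conflicts, discard "
        f"misinformation and noise, and answer '{query}':\n{summary}"
    )
```

With a real LLM behind `llm`, the aggregator prompt is where conflicting or noisy sources get weighed against each other; the paper's framework iterates this over multiple rounds.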
Han Wang retweeted
Han Lin @hanlin_hl
🚀 Excited to share V-Co, a diffusion model that jointly denoises pixels and pretrained semantic features (e.g., DINO). We find a simple but effective recipe: 1️⃣ architecture matters a lot --> fully dual-stream JiT 2️⃣ CFG needs a better unconditional branch --> semantic-to-pixel masking for CFG 3️⃣ the best semantic supervision is hybrid --> perceptual-drifting hybrid loss 4️⃣ calibration is essential --> RMS-based feature rescaling We conducted a systematic study on V-Co, which is highly competitive at a comparable scale, and outperforms JiT-G/16 (~2B, FID 1.82) with fewer training epochs. 🧵 👇
2 replies · 41 reposts · 129 likes · 20.6K views
Han Wang retweeted
Daeun Lee @danadaeun
🚨 Excited to share VisionCoach, an RL framework for reinforcing grounded video reasoning via visual-perception prompting and self-distillation! 🧠 Video reasoning models often miss where to look or rely on language priors. Instead of only supervising final answers, we encourage the model to learn to attend to the right visual evidence. ⚽️ VisionCoach uses RL to reward correct visual attention, with dynamic visual prompting as a training-time coach for better spatio-temporal grounding, while keeping inference simple and tool-free via self-distillation. ⭐️ Achieves state-of-the-art zero-shot performance across video reasoning, video understanding, and temporal grounding benchmarks (V-STAR, VideoMME, World-Sense, VideoMMMU, PerceptionTest, and Charades-STA). 👇🧵
1 reply · 24 reposts · 75 likes · 10.5K views
Han Wang retweeted
Peter Hase @peterbhase
Can we train models to have more monitorable CoT? We introduce Counterfactual Simulation Training to improve CoT faithfulness/monitorability. CST produces models that admit to reward hacking and deferring too much to Stanford profs (@chrisgpotts told me this is very dangerous)
12 replies · 36 reposts · 210 likes · 20.9K views
Han Wang retweeted
Daeun Lee @danadaeun
🥳 Happy to announce that StreamGaze is accepted to #CVPR2026! 👀 We introduce the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video settings. We find that all MLLMs fall far below human performance, particularly in temporal continuity, gaze grounding, and proactive prediction. 💗 Huge thanks to my AdobeResearch team from last year: Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Lai, David Seunghyun Yoon, Trung Bui, Franck Dernoncourt, and my advisor Mohit Bansal 😃
Daeun Lee @danadaeun

🤔 We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze 👀, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video settings. ➡️ Gaze-Guided Streaming Benchmark: 10 tasks spanning past, present, and proactive reasoning, from gaze-sequence matching to alerting when objects appear within the FOV area. ➡️ Gaze-Guided Streaming Data Construction Pipeline: We align egocentric videos with raw gaze trajectories using fixation extraction, region-specific visual prompting, and scanpath construction to generate spatio-temporally grounded QA pairs. This process is human-verified. ➡️ Comprehensive Evaluation of State-of-the-Art MLLMs: Across all gaze-conditioned streaming tasks, we highlight fundamental limits of current MLLMs. All MLLMs fall far below human performance. Models particularly struggle with temporal continuity, gaze grounding, and proactive prediction.

3 replies · 19 reposts · 67 likes · 5.5K views
Han Wang retweeted
Elias Stengel-Eskin @EliasEskin
🚨 Excited to share Reasoning Execution by Multiple Listeners (REMuL), a multi-party training method for faithful reasoning. Consistently boosts faithfulness evals (hint attribution, early answering, mistake injection) across diverse reasoning tasks while maintaining accuracy! ➡️ Faithfulness is key for CoT interpretability but current LLMs produce unfaithful reasoning that is hard to follow, with standard outcome-focused RL hurting faithfulness. ➡️ REMuL approaches faithfulness through the lens of executability. A CoT is faithful if independent "listener" models can follow/execute a truncated CoT prefix and reliably arrive at the same conclusion as the “speaker” model. ➡️ REMuL trains the speaker via GRPO to produce reasoning that achieves consistent answers among listeners, while maintaining correctness via masked supervised finetuning. ➡️ Interestingly, REMuL's multi-party training generalizes better. Directly optimizing for faithfulness metrics improves those metrics alone, but not others, while REMuL improves across metrics! 🧵👇
1 reply · 25 reposts · 37 likes · 5.8K views
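The listener-executability test described above lends itself to a compact reward sketch. This is a toy illustration of the idea, not the REMuL training code: truncate the speaker's CoT to a prefix, ask each independent listener model to finish from that prefix, and reward the fraction of listeners that land on the speaker's answer. The function names and the `keep_frac` parameter are assumptions for illustration.

```python
from typing import Callable, List

def listener_agreement_reward(cot_steps: List[str], speaker_answer: str,
                              listeners: List[Callable[[str], str]],
                              keep_frac: float = 0.5) -> float:
    """Toy REMuL-style faithfulness reward: listeners execute a truncated CoT
    prefix; the reward is the fraction that reach the speaker's conclusion."""
    cut = max(1, int(len(cot_steps) * keep_frac))  # keep the first part of the CoT
    prefix = "\n".join(cot_steps[:cut])
    guesses = [listener(prefix) for listener in listeners]
    return sum(g == speaker_answer for g in guesses) / len(listeners)
```

In the paper's setup this scalar would feed into GRPO as the faithfulness component, alongside a correctness objective; a faithful CoT is one whose prefix already determines the conclusion for independent readers.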
Han Wang retweeted
Runchu Tian @Runchu_Tian
🎉Excited to share that I’ll be starting my PhD at UNC Chapel Hill @UNC, joining MURGe-Lab, advised by Prof. Mohit Bansal @mohitban47! I’ll be working on multimodality, reasoning, and AI agents. New chapter begins! #PhD #NLP #UNCCH #Multimodal
17 replies · 17 reposts · 100 likes · 7K views
Han Wang retweeted
Archiki Prasad @ArchikiPrasad
🚨 I’m on the 2026 Research Scientist Job Market! I am a PhD student at UNC Chapel Hill (advised by @mohitban47) and recipient of the Apple Scholars in AI/ML PhD Fellowship. My research centers on: 🔸Reasoning & RL/Post-Training: Evaluating and interpreting the reasoning process, and improving post-training and alignment through self-generated and reward-based signals (Intrinsic Dim., ReCEVAL, ScPO, LASeR). 🔸Agents & Planning: Designing adaptive agent frameworks that use extra test-time compute & reasoning upon failure (ADaPT, System-1.x, PRInTS). 🔸Reward & Skill Discovery in Code: Leveraging execution signals to build reliable rewards, automate debugging, and discover abstractions in code (UTGen, ReGAL). Prev (Research Intern): Google DeepMind, Meta FAIR, Allen Institute for AI (AI2), and Adobe Research. Feel free to reach out via DM or email if you’re interested, have leads, or would like to connect! 🌐 archiki.github.io 📧 archiki@cs.unc.edu #NLP #AI #JobSearch
15 replies · 59 reposts · 344 likes · 54.7K views
Han Wang retweeted
Zun Wang @ZunWang919
🚀 Excited to share AnchorWeave — a local-memory-augmented framework for world-consistent long-horizon video generation. - Global 3D reconstruction as memory accumulates cross-view misalignment and contaminates conditioning signals. - We replace a single noisy global 3D memory with multiple retrieved local 3D memories and learn to weave them. - Stronger long-horizon scene consistency and generalization ability. 🧵👇
1 reply · 55 reposts · 299 likes · 27.5K views
Han Wang retweeted
Ziyang Wang @ZiyangW00
🚨Excited to share our new paper MuRGAt: “Multimodal Fact-Level Attribution for Verifiable Reasoning” Key finding: even strong MLLMs can be right on the final answer but wrong on the evidence (hallucinated citations / mis-grounded modality or timestamp). What MuRGAt adds: - Human annotations to judge whether each cited piece of evidence actually supports a claim. ✅ - Atomic fact decomposition to evaluate attribution at the fact level, not just the final answer. 🧩 - MuRGAt-SCORE, a metric that aligns well with human judgment. 📏 - Benchmarks across strong MLLMs + studies showing programmatic grounding can improve attribution. ⚖️ Paper + code + details in the original thread 👇
David Wan @meetdavidwan

🚀Announcing MuRGAt! MLLMs are improving at reasoning over complex multimodal inputs, but does that translate to faithful grounding to multimodal sources (video, audio, charts, etc.)? We find that even strong MLLMs often hallucinate citations despite getting the answer correct!🤯 We introduce a benchmark for Fact-Level Multimodal Attribution featuring: ✅ High-quality Human Annotations for validation. ✅ MuRGAt-SCORE: A decomposed metric that highly correlates with human judgment. ✅ Methods to improve citations, showing that Programmatic Grounding boosts attribution. 🧵👇

1 reply · 10 reposts · 16 likes · 1.7K views
Han Wang retweeted
hyunji amy lee @hyunji_amy_lee
🧐MLLMs are improving at reasoning tasks, but do they actually reason with correct sources? We introduce MuRGAt, a benchmark for Multimodal Reasoning with Grounded Attribution: ❗️Even strong MLLMs often hallucinate citations despite answering correctly. ❗️There’s a trade-off between reasoning and attribution: increased thinking can improve reasoning while degrading grounding, and programmatic grounding boosts attribution at the cost of reasoning accuracy. More details in the thread below ⬇️
David Wan @meetdavidwan

🚀Announcing MuRGAt! MLLMs are improving at reasoning over complex multimodal inputs, but does that translate to faithful grounding to multimodal sources (video, audio, charts, etc.)? We find that even strong MLLMs often hallucinate citations despite getting the answer correct!🤯 We introduce a benchmark for Fact-Level Multimodal Attribution featuring: ✅ High-quality Human Annotations for validation. ✅ MuRGAt-SCORE: A decomposed metric that highly correlates with human judgment. ✅ Methods to improve citations, showing that Programmatic Grounding boosts attribution. 🧵👇

0 replies · 10 reposts · 28 likes · 1.8K views
Han Wang retweeted
David Wan @meetdavidwan
🚀Announcing MuRGAt! MLLMs are improving at reasoning over complex multimodal inputs, but does that translate to faithful grounding to multimodal sources (video, audio, charts, etc.)? We find that even strong MLLMs often hallucinate citations despite getting the answer correct!🤯 We introduce a benchmark for Fact-Level Multimodal Attribution featuring: ✅ High-quality Human Annotations for validation. ✅ MuRGAt-SCORE: A decomposed metric that highly correlates with human judgment. ✅ Methods to improve citations, showing that Programmatic Grounding boosts attribution. 🧵👇
2 replies · 28 reposts · 52 likes · 9.2K views
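The fact-level evaluation described in the MuRGAt threads can be illustrated with a minimal scorer. This is a sketch of the general recipe (decompose the answer into atomic facts, then check each fact against its cited evidence), not the official MuRGAt-SCORE definition; the "at least one cited snippet supports the fact" rule and the `supports` judge are assumptions for illustration.

```python
from typing import Callable, Dict, List

def fact_level_attribution_score(facts_to_citations: Dict[str, List[str]],
                                 supports: Callable[[str, str], bool]) -> float:
    """Fraction of atomic facts backed by at least one supporting citation.
    `supports(evidence, fact)` is a pluggable judge (human, NLI model, or LLM)."""
    if not facts_to_citations:
        return 0.0
    attributed = sum(
        any(supports(evidence, fact) for evidence in citations)
        for fact, citations in facts_to_citations.items()
    )
    return attributed / len(facts_to_citations)
```

The point of scoring at the fact level is exactly the failure mode the thread highlights: an answer can be correct overall while individual facts cite hallucinated or mis-grounded evidence, which an answer-level metric would never see.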
Han Wang retweeted
Archiki Prasad @ArchikiPrasad
🚨Excited to share our new work viewing reasoning strategies as teaching tools: for a fixed target model, which CoT strategies best support learning and generalization? ✨Our answer is intrinsic dimensionality (the minimum effective capacity a model needs to solve the task). Somewhat counterintuitively, adding CoT – which requires generating longer and more structured outputs – can reduce learning complexity. Good reasoning compresses the task, i.e., it reduces the degrees of freedom the model needs to map inputs to correct solutions. 🧵⬇️ (1/5)
5 replies · 45 reposts · 186 likes · 24.2K views
Han Wang retweeted
Shoubin Yu @shoubin621
🚨 Excited to share AVIC — an analysis and framework for adaptive test-time scaling with world model imagination in visual spatial reasoning. 📉 Always-on visual imagination is often unnecessary, or even misleading. 📈 AVIC treats visual imagination as a selective, query-dependent test-time resource—showing that better spatial reasoning comes from deciding when and how much to imagine, not from imagining more. ➡️ Across spatial reasoning & embodied navigation, we get stronger accuracy with far fewer world-model calls and tokens. 🧵👇[1/6]
3 replies · 38 reposts · 88 likes · 15.7K views
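The "decide when and how much to imagine" framing above can be sketched as a simple query-dependent gate. This is a toy illustration of the adaptive test-time scaling idea, not the AVIC system: every name here (`difficulty`, `answer_direct`, `imagine_and_answer`, `threshold`, `max_rollouts`) is a hypothetical stand-in for whatever difficulty estimator, base model, and world-model rollout procedure one actually uses.

```python
from typing import Callable

def answer_with_adaptive_imagination(query: str,
                                     difficulty: Callable[[str], float],
                                     answer_direct: Callable[[str], str],
                                     imagine_and_answer: Callable[[str, int], str],
                                     threshold: float = 0.5,
                                     max_rollouts: int = 4) -> str:
    """Spend world-model 'imagination' only on hard queries, and scale the
    number of imagination rollouts with the estimated difficulty."""
    d = difficulty(query)          # estimated difficulty in [0, 1]
    if d < threshold:              # easy query: answer directly, zero rollouts
        return answer_direct(query)
    rollouts = min(max_rollouts, 1 + int(d * max_rollouts))
    return imagine_and_answer(query, rollouts)
```

The design choice this captures is the thread's main claim: the win comes from gating the expensive resource per query, not from invoking it uniformly.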
Han Wang retweeted
Mohit Bansal @mohitban47
🚨 If you are looking for a very strong researcher who can bridge the gap between factual reliability and complex multimodal reasoning, definitely check out David (he is a Google PhD Fellow with several useful contributions in faithfulness & hallucination mitigation, fine-grained attribution, multimodal retrieval, etc.) 👇👇
David Wan @meetdavidwan

🚀 I'm on the 2026 Research Scientist Job Market! I am a Google PhD Fellow at UNC (advised by @mohitban47). I work on Faithful and Multimodal AI, focusing on reducing hallucinations and improving reasoning in generation tasks by: 🔹 Faithfulness & Hallucination Mitigation: Developing metrics and methods to ensure model outputs are factually consistent (e.g., FactPEGASUS, PrefixNLI). 🔹 Fine-grained Attribution & RAG: Creating frameworks that allow models to cite their sources and reason transparently (e.g., GenerationPrograms, LAQuer). 🔹 Multimodal Reasoning & Retrieval: Grounding vision-language models to reduce hallucinations in cross-modal tasks (e.g., CLaMR, Contrastive Region Guidance). Prev Intern: Google, Meta, Salesforce, Amazon. 🔗 meetdavidwan.github.io #NLP #AI #JobSearch

0 replies · 14 reposts · 55 likes · 9.3K views
Han Wang retweeted
David Wan @meetdavidwan
🚀 I'm on the 2026 Research Scientist Job Market! I am a Google PhD Fellow at UNC (advised by @mohitban47). I work on Faithful and Multimodal AI, focusing on reducing hallucinations and improving reasoning in generation tasks by: 🔹 Faithfulness & Hallucination Mitigation: Developing metrics and methods to ensure model outputs are factually consistent (e.g., FactPEGASUS, PrefixNLI). 🔹 Fine-grained Attribution & RAG: Creating frameworks that allow models to cite their sources and reason transparently (e.g., GenerationPrograms, LAQuer). 🔹 Multimodal Reasoning & Retrieval: Grounding vision-language models to reduce hallucinations in cross-modal tasks (e.g., CLaMR, Contrastive Region Guidance). Prev Intern: Google, Meta, Salesforce, Amazon. 🔗 meetdavidwan.github.io #NLP #AI #JobSearch
4 replies · 35 reposts · 179 likes · 23.3K views