Yafu Li

148 posts

@yafuly

Postdoc @ The Chinese University of Hong Kong; Reasoning, trustworthy AI, multilinguality; Sharing random thoughts and research experiences.

Shanghai, China · Joined May 2023
80 Following · 435 Followers
Yafu Li @yafuly
Excited to have 6 papers accepted to #ICLR2026, all around reasoning, RL, and multimodal understanding:
📌 ExGRPO: Learning to Reason from Prior Successes
📌 Diversity-Incentivized Exploration for Versatile Reasoning
📌 Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
📌 Spotlight on Token Perception for Multimodal RL
📌 Revisual-R1: Advancing Multimodal Reasoning from Optimized Cold Start to Staged RL
📌 FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
💻 All works are open-sourced — welcome discussions, feedback, and collaborations! Huge thanks to all collaborators. Looking forward to great discussions at ICLR! @iclr_conf #iclr
Yafu Li @yafuly
DiffThinker doesn’t chain images via multi-turn calls—the “chaining” is the diffusion denoising trajectory itself: fixed-step image-to-image generation that refines noise into a single solution image (Sec. 3.2, Eq. 8). On data, we study this in Sec. 4.3 (Ablation on Training Data Scale) and Fig. 8: with ~100 samples the model mainly learns rendering syntax; reasoning performance keeps improving with more data, and around 1e5 samples reaches >90%/>80% on hard Maze/Sudoku. Main experiments use ~30k samples per task (see Table 2).
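The fixed-step image-to-image process described above can be sketched in miniature (a toy illustration only, assuming a generic conditioned denoiser; `fixed_step_image_to_image`, `toy_denoiser`, and the schedule are hypothetical stand-ins, not the paper's actual implementation):

```python
import numpy as np

def fixed_step_image_to_image(problem_image, denoiser, num_steps=50, seed=0):
    """Run a fixed number of denoising steps, each conditioned on the problem
    image, so inference cost is constant regardless of problem difficulty."""
    rng = np.random.default_rng(seed)
    # Start from pure noise in image space: the "solution canvas".
    x = rng.standard_normal(problem_image.shape)
    for step in range(num_steps):
        t = 1.0 - step / num_steps          # remaining noise level: 1 -> ~0
        x = denoiser(x, problem_image, t)   # one conditioned refinement step
    return x

def toy_denoiser(x, problem, t):
    # Stand-in for a trained diffusion model: pull the canvas toward a
    # "solution" derived from the problem (here, pretend solving = inverting
    # the pixels), with step size growing as the noise level t shrinks.
    solution = 1.0 - problem
    return x + (1.0 - t) * 0.5 * (solution - x)

problem = np.zeros((4, 4))
out = fixed_step_image_to_image(problem, toy_denoiser, num_steps=50)
```

The point of the sketch is only the shape of the computation: no multi-turn tool calls, just a fixed-length denoising trajectory that turns noise into a single solution image.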
Tyler Zhu @tyleryzhu
@yafuly I skimmed the paper but didn’t really understand the exact contribution/framework. How is the chaining of images/generations happening, and is there a sense of how much training is needed to exhibit these capabilities (i.e., less than the 10-30k examples used)? Thanks!
Yafu Li @yafuly
We introduce DiffThinker, a new paradigm for generative multimodal reasoning with diffusion models. Instead of relying on text-centric chain-of-thought, DiffThinker reformulates reasoning as a native image-to-image generation process. This shift enables:
• Higher logical consistency and spatial precision in long-horizon visual tasks
• Controllable and stable inference cost
• Native parallel reasoning over multiple solution candidates
• Effective collaboration with MLLMs, outperforming either alone
Diffusion models can reason directly in visual space.
AK @_akhaliq

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models huggingface.co/papers/2512.24…

Yafu Li @yafuly
@RahulRaghav_10 All MLLM baselines we evaluate (e.g., GPT and Qwen) are strong VQA-capable models. We include detailed results and comparisons in the paper.
Rahul Raghav @RahulRaghav_10
@yafuly curious how this compares to current visual question answering models
Yafu Li @yafuly
@Web3Gen0 Thank you for the kind words. Treating reasoning as a native image-to-image generative process turned out to be a surprisingly natural fit, especially for spatial and structural constraints. Looking forward to your thoughts.
Web3Gen0.eth @Web3Gen0
@yafuly Really exciting work, I love how this reframes reasoning away from text-centric assumptions. Treating reasoning as a native image-to-image process feels intuitive and overdue. I will definitely read the paper and share my thoughts. Thank you for pushing this direction forward...
Yafu Li @yafuly
Just got a new "MBTI" from ChatGPT 😅🤣
Yafu Li reposted
DailyPapers @HuggingPapers
Tired of expensive manual annotation for video understanding models? VideoSSR introduces self-supervised reinforcement learning for MLLMs to master video understanding, creating high-quality data from videos themselves! Achieves >5% avg improvement on 17 benchmarks.
Yafu Li reposted
Jiawei Gu @Kuvvius
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 arxiv.org/abs/2510.27492 (1/16)
Yafu Li reposted
Yu Zhang 🐙🌘 @yzhang_cs
We’re also shipping fla-core in lock-step with flash-linear-attention: a minimal, forever-in-sync companion package that carries nothing except triton+torch. Need only fused Norm, CausalConv, and linear-attn kernels, without transformers worries? fla-core is enough. pypi.org/project/fla-co…
Songlin Yang @SonglinYang4

Excited to see Gated DeltaNet being adopted in the @Alibaba_Qwen series! It has also previously demonstrated strong effectiveness in @nvidia's Jet-Nemotron.

Yafu Li @yafuly
🧵 Some thoughts on OpenAI’s new open-source series, gpt-oss-safeguard: reasoning over user policies before answering makes LLMs cleverer and safer.

1️⃣ Today OpenAI released their second open-source model, gpt-oss-safeguard, following the oss series. 🔗 openai.com/index/introduc…
The key idea behind these safeguards is profound: instead of instilling static safety preferences into model parameters, the model learns to reason over dynamic safety boundaries defined by user-provided policies. This means the model doesn’t just remember what’s safe — it thinks through policies at inference time. Such policy-aware reasoning turns safety alignment into a test-time scaling problem: more inference compute → better evaluation of whether outputs adhere to evolving safety rules.

2️⃣ This direction strongly resonates with our new work released last month:
🎓 “Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation”
We extend the same philosophy beyond safety, introducing the broader challenge of Specification Alignment — ensuring models follow both behavioral and safety policies (or specifications) across diverse real-world scenarios, from child storytelling to biochemical procedure instruction.

3️⃣ While gpt-oss-safeguard focuses on reasoning to justify whether model behavior conforms to safety policies, our work provides the complementary testbed and methodology to evaluate and enhance models’ reasoning over such specifications. Together, they form a more complete view of reasoning-based alignment. At the time, we used GPT-5 as our judgment model for policy compliance; with today’s release, we plan to integrate oss-safeguard as a stronger, open-weight alternative.

4️⃣ To systematically study this problem, we built three components:
a. SpecBench — a comprehensive testbed spanning 5 realistic scenarios (Child, Code, Health, Biochem, Travel), each with its own behavioral and safety specifications (103 in total, 1.5k prompts).
b. Evaluation metrics and methodology — we introduce the Specification Alignment Rate (SAR) to jointly measure helpfulness and harmlessness, ensuring helpful outputs only count when they remain within safety boundaries.
c. Test-time Deliberation (ALIGN³) — a lightweight reasoning framework that lets LLMs spend time “thinking over” safety and behavioral boundaries before answering, improving alignment without retraining.

5️⃣ 📊 Results: We evaluated 33 models (18 instruct + 15 reasoning) across open- and closed-source families. Findings:
• Reasoning models systematically outperform instruct variants on SpecBench.
• Test-time deliberation further enhances alignment — e.g., on Qwen3-14B, SAR ↑11.9% with minimal extra tokens (<2k).
• ALIGN³ pushes the safety–helpfulness frontier, showing that reflection and reasoning are powerful alignment tools.

6️⃣ In essence: 🧠 Reasoning-based alignment is emerging as the unifying principle — from OpenAI’s deliberative alignment and Safety Reasoner to our test-time specification alignment. Instead of freezing safety into parameters, LLMs can now reason over policies dynamically, adapting to changing norms and user needs.

Paper: arxiv.org/abs/2509.14760
Code: github.com/zzzhr97/SpecBe…
#LLMs #Alignment #Reasoning #Safety #TestTimeScaling #SpecBench #ALIGN3 #AIResearch
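The joint helpfulness/harmlessness idea behind SAR can be illustrated with a tiny sketch (a hypothetical toy, not the paper's exact SAR definition; the `(helpful, violations)` encoding is an assumption made here for illustration): a response counts toward the rate only if it is helpful AND violates no safety specification.

```python
def specification_alignment_rate(responses):
    """Toy joint helpfulness/harmlessness rate.

    `responses` is a list of (helpful: bool, violations: int) pairs, where
    `violations` counts breached safety specifications. A helpful-but-unsafe
    response does NOT count as aligned.
    """
    if not responses:
        return 0.0
    aligned = sum(
        1 for helpful, violations in responses if helpful and violations == 0
    )
    return aligned / len(responses)

# Second response is helpful but breaches one safety spec, so it is excluded.
sar = specification_alignment_rate([(True, 0), (True, 1), (False, 0), (True, 0)])
```

The design point the sketch captures: a metric that averages helpfulness and harmlessness separately would reward helpful-but-unsafe answers, while this conjunctive form does not.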