Tianhao Wang

28 posts

@0920wth

Assistant Professor @HDSIUCSD. Previously Research Assistant Professor @TTIC_Connect and PhD in Statistics & Data Science @Yale.

Chicago · Joined July 2017

255 Following · 214 Followers

Tianhao Wang retweeted

Zhijian Liu@zhijianliu_·Apr 8

DFlash just landed in both SGLang and vLLM! 🚀 More draft models dropping soon: GLM-5.1, Kimi-K2.5 (preview live now!), Qwen3.5-397B & 122B. Try it now ↓ SGLang: github.com/sgl-project/sg… (🙏 @_dcw02) vLLM: github.com/vllm-project/v… (🙏 @BenjaminCh44989)

Zhijian Liu@zhijianliu_

Holiday cooking finally ready to serve! 🥳 Introducing DFlash: speculative decoding with block diffusion.
🚀 6.2× lossless speedup on Qwen3-8B
⚡ 2.5× faster than EAGLE-3
Diffusion vs AR doesn’t have to be a fight. At today’s stage:
• dLLMs = fast, highly parallel, but lossy
• AR LLMs = accurate, sequential, but slow
DFlash = diffusion drafts, AR verifies.

Tianhao Wang retweeted

Zhijian Liu@zhijianliu_·Feb 24

Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬 ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel. Recovers most of the lost reasoning accuracy with minimal overhead — so 4-bit models stay strong at reasoning. 💪💪

Tianhao Wang retweeted

Zhuoran Yang@zhuoran_yang·Feb 20

New Paper -- "On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking" We give a complete mechanistic and dynamic picture of how neural networks learn modular addition f(x,y) = (x+y) mod p. We answer three questions: (1) What does the trained network compute? (2) How do Fourier features emerge during training? (3) Why does grokking happen? Each answer comes with a mathematical characterization backed by theory and experiments. Paper: arxiv.org/abs/2602.16849 Blog: y-agent.github.io/posts/modular_… Demo: huggingface.co/spaces/y-agent… Code: github.com/Y-Agent/modula…

Tianhao Wang retweeted

Jiaqi Ma@Jiaqi_Ma_·Jan 18

The ARC challenge claims to measure "fluid intelligence" through tasks that are "simple for people yet difficult for AI." However, is the AI failure really due to the lack of "fluid intelligence?" Our recent work shows that the answer is NO with a carefully designed diagnostic study. ArXiv: arxiv.org/pdf/2512.21329 Joint work with Xinhe Wang, @JinHuang9306000, @_Jimmy_Zhang_ , @0920wth Our study is motivated by an observation that ARC problems are easy for humans because their representation strongly favors human vision. For example, in the attached figure, the same ARC problem presented in a serialized way becomes much more challenging for humans. 1/

Tianhao Wang retweeted

Arya Mazumdar@MountainOfMoon·Dec 23

The University of California, San Diego invites applications for one or more ladder rank faculty appointments based in the Halıcıoğlu Data Science Institute, the academic unit of the newly formed School of Computing, Information and Data Sciences. This is an open rank search for all levels of appointment (assistant, associate, or full professor). We seek outstanding candidates from ALL areas of Artificial Intelligence and Machine Learning as represented within HDSI's research scope, including but not limited to: 1) Computer Vision 2) AI for Science 3) Emerging Technologies for AI (such as Quantum Computing) apol-recruit.ucsd.edu/JPF04397 @HDSIUCSD @UCSD

Tianhao Wang retweeted

Sadhika Malladi@SadhikaMalladi·Sep 16

Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here: cs.princeton.edu/~smalladi/recr…

Tianhao Wang retweeted

Zhiyuan Li@zhiyuanli_·Jul 16

Adaptive optimizers range from AdaGrad-Norm to Shampoo and full-matrix AdaGrad, with increasingly expressive preconditioners. But does more adaptivity always translate to fewer steps to converge? Our ICML 2025 paper answers negatively via a unified convergence analysis. 🧵1/6

Tianhao Wang retweeted

Zhuoran Yang@zhuoran_yang·Jun 18

🚀 We're excited to share our paper, "Taming Polysemanticity in LLMs," which introduces Group Bias Adaptation (GBA)—the FIRST Sparse Autoencoder (SAE) training method with a provable guarantee for untangling monosemantic concepts! 📄 Paper: arxiv.org/abs/2506.14002 🌐 Website: y-agent.github.io/taming-sae-gba… 🎯 Demo (Layer 26 of Qwen 2.5B-Base): y-agent.github.io/taming-sae-gba… Joint work with @siyuc3141, @HeejuneSheen, Xuyuan Xiong, and @0920wth

Tianhao Wang retweeted

Zhiyuan Li@zhiyuanli_·Apr 24

Why does Adam outperform SGD in LLM training? Adaptive step sizes alone don't fully explain this, as Adam also surpasses adaptive SGD. Is coordinate-wise adaptivity the secret? Not entirely: Adam actually struggles in the rotated parameter space! 🧵 (1/6) arxiv.org/abs/2410.08198

Tianhao Wang retweeted

Zhuoran Yang@zhuoran_yang·Mar 26

[New paper on in-context learning] "In-Context Linear Regression Demystified" (link: arxiv.org/abs/2503.12734). Joint work @JLiangHe, @xintianpan, @siyuc3141. We establish a rather complete understanding of how one-layer multi-head attention solves in-context linear regression,

Tianhao Wang retweeted

Ruili Feng@feng_ruili_frl·Nov 21

A step towards neural interactive simulation, where is Neo?

Hongyang Zhang@hongyangzh

Introducing The Matrix, a foundation world model for generating infinite-length, hyper-realistic videos with real-time, frame-level control:
- Infinite-length video generation
- 720p high-quality rendering
- Real-time, frame-level control at 16 FPS
- Generalization to real-world video control
🔗Blog: thematrix1999.github.io
📄Paper: thematrix1999.github.io/article/the_ma…
💻Code & Playable Demo: Coming soon!
Key Innovation: A brand new technique called the shift-window denoise process model, enabling auto-regressive generation for diffusion and consistency models in real-time.
Special thanks to project leader Ruili Feng and the entire Matrix team for their dedication and hard work over the year-long project.

Tianhao Wang retweeted

Barna Saha@B1ar2n3a·Oct 18

Applications now open for broad area search in Data Science at the brand new School of Computing, Information & Data Sciences at UCSD. @yuxiangw_cs @GuptaUcsd @MountainOfMoon @HDSIUCSD apol-recruit.ucsd.edu/JPF04109

Tianhao Wang retweeted

Zhuoran Yang@zhuoran_yang·Sep 18

[New Paper on In-Context Learning] Title: Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers Joint work with @siyuc3141 @HeejuneSheen @0920wth Link: arxiv.org/abs/2409.10559

Tianhao Wang@0920wth·Jun 15

@B1ar2n3a Looking forward to joining you in person!!

Barna Saha@B1ar2n3a·Jun 15

Thuy-Duong “June” Vuong and Tianhao Wang @0920wth joining us as new faculty, and multiple wedding bells in the group. (The pictures are missing a few other folks who joined, and the awesome food that we had 🙂)

Barna Saha@B1ar2n3a·Jun 15

The UCSD theory group EOY celebration. We had a lot to celebrate: alum @JessSorrell joining JHU as an assistant prof, @MHop_Theory and Rex Lei graduating, a lot of amazing work including Chris’s work selected as an ICML Oral, a big cohort of students and postdocs joining in ’24,