Peihao Wang
@peihao_wang

41 posts

📚 PhD Student @utexasece @WNCG_UT @VITAGroupUT; 🌟 Stanford Rising Star in Data Science 2025; 🎓 Google Fellowship 2025 in ML & ML foundations; 🎄@ccccrs_0908

Austin, TX · Joined January 2020
238 Following · 212 Followers
Peihao Wang@peihao_wang·
Interestingly, we revealed a duality:
🔵 Training-time alignment ≈ amortized parameter-space optimization
🔵 Test-time optimization ≈ latent-space sampling
From a classical statistical inference lens, these two are tightly connected, just operating over different spaces.
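A rough formal sketch of this duality, via the standard KL-regularized alignment identity (the notation r, β, π_ref is mine, not the thread's):

```latex
% Training-time alignment: amortize the optimization into parameters \theta.
\max_{\theta}\;
  \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\bigl[\, r(x, y) \,\bigr]
  \;-\; \beta\,\mathrm{KL}\!\bigl( \pi_\theta(\cdot \mid x) \,\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr)

% Its optimum is a reweighted reference model, so sampling from this target
% at inference time (test-time optimization / latent-space sampling)
% pursues the same distribution without touching \theta.
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\bigl( r(x, y) / \beta \bigr)
```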
Peihao Wang@peihao_wang·
We formulate decoding as an optimization problem: find responses that maximize a differentiable reward subject to being sampled from an LLM. Gradients are backpropagated into the model’s hidden states, steering inference into a form of test-time training.
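A minimal PyTorch-style sketch of this kind of reward-guided decoding (my own illustration under assumptions, not the paper's code; `model` and `reward_fn` are hypothetical placeholders):

```python
import torch

# Sketch: treat the response as a sequence of soft token embeddings z and
# run gradient ascent on a differentiable reward at test time. `model`
# (maps embeddings to next-token logits) and `reward_fn` (scores a relaxed
# response) are hypothetical placeholders, not the paper's interfaces.

def gradient_decode(model, reward_fn, prompt_emb, resp_len, dim,
                    steps=50, lr=0.1):
    # The latent response embeddings are the variables we optimize.
    z = torch.randn(resp_len, dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        seq = torch.cat([prompt_emb, z], dim=0)        # prompt + latent response
        logits = model(seq)                            # forward pass on embeddings
        probs = torch.softmax(logits[-resp_len:], -1)  # relaxed token distribution
        # Maximize reward of the relaxed response; gradients flow into z,
        # i.e. into input/hidden states rather than model weights.
        (-reward_fn(probs)).backward()
        opt.step()
    return probs.argmax(-1)  # discretize the optimized response at the end
```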
Peihao Wang@peihao_wang·
Latent space reasoning via looped transformers has gained attention lately. It is rooted in optimization unrolling, where each loop implicitly models a GD step on hidden states. Our ICLR paper asks: what if we explicitly run GD in latent space at test time?
Zhen Wang@zhenwang9102

1/🧵 What if test-time reasoning wasn't discrete search, but gradient descent in latent space? Happy to share our #ICLR2026 paper ∇-Reasoner: a paradigm shift from zeroth-order search to first-order optim at test time. Led by @peihao_wang @ccccrs_0908 iclr.cc/virtual/2026/p…
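A toy contrast between the two views (my illustration, not ∇-Reasoner's code; `block` and `energy` are hypothetical placeholders):

```python
import torch

# Two views of latent-space reasoning. `block` is a weight-tied transformer
# block; `energy` is a scalar function of the hidden state.

def looped_transformer(block, h, k):
    # Implicit view: looping a shared block k times refines the hidden state,
    # which unrolling arguments interpret as k optimization steps.
    for _ in range(k):
        h = block(h)
    return h

def explicit_latent_gd(energy, h, k, lr=0.1):
    # Explicit view: run k literal gradient-descent steps on the hidden
    # state at test time, descending a scalar energy/objective.
    h = h.detach().requires_grad_(True)
    for _ in range(k):
        g, = torch.autograd.grad(energy(h), h)
        h = (h - lr * g).detach().requires_grad_(True)
    return h.detach()
```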

Peihao Wang retweeted
DAIR.AI@dair_ai·
Are multi-agent systems necessary? Here is a great new paper addressing this.

The big assumption most AI devs make today is that more agents lead to better performance. But here is the overlooked reality: most multi-agent systems are homogeneous. All agents typically share the same base LLM, differing only in prompts, tools, and positions in the workflow. This raises a compelling question of whether a single agent can simulate these workflows through multi-turn conversations.

This new research investigates this across seven benchmarks spanning coding, mathematics, QA, domain-specific reasoning, and real-world planning. A single agent with KV cache reuse can match the performance of homogeneous multi-agent workflows while reducing inference costs. The cost advantage comes from shared KV cache across agent interactions, avoiding redundant prefill computation. Because homogeneous agents possess identical reasoning capabilities and differ only in specialized instructions, a single agent can role-play these agents sequentially, exploiting the workflow's task decomposition without needing separate model instances.

Building on this finding, the researchers propose OneFlow, an algorithm that automatically designs workflows optimized for single-agent execution. OneFlow uses a dual meta-LLM architecture (Creative Designer + Critical Reviewer) with Monte Carlo Tree Search to discover streamlined workflows with comprehensive system prompts and fewer total agents. OneFlow with single-agent execution achieves 92.1% on HumanEval, 81.4% on MBPP, 93.3% on GSM8K, matching or exceeding multi-agent baselines while significantly reducing cost.

Single-LLM methods cannot capture truly heterogeneous workflows where agents use different base models, since KV caches cannot be shared across different LLMs. These results position single-LLM implementation as a strong baseline for MAS research. The authors suggest that the real opportunity lies in developing heterogeneous systems where model diversity benefits outweigh coordination costs.

Paper: arxiv.org/abs/2601.12307

Learn to build effective AI agents in our academy: dair-ai.thinkific.com
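A toy sketch of the single-agent role-play idea described above (my illustration, not the paper's OneFlow code; `llm_generate` is a hypothetical chat-completion call):

```python
# `llm_generate(messages)` stands in for a chat call to one base LLM.
# Because every "agent" shares that model, the growing message prefix plays
# the role of the shared KV cache, so earlier turns are never re-prefilled.

def run_single_agent_workflow(llm_generate, task, roles):
    messages = [
        {"role": "system",
         "content": "You will play several specialist roles in turn."},
        {"role": "user", "content": task},
    ]
    for role in roles:  # e.g. ["planner", "coder", "reviewer"]
        messages.append({"role": "user",
                         "content": f"Now act as the {role} and continue."})
        reply = llm_generate(messages)  # one model, one reusable cache
        messages.append({"role": "assistant", "content": reply})
    return messages[-1]["content"]      # final role's answer
```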
Peihao Wang@peihao_wang·
@zhiwen_fan_ Thx Zhiwen! Glad that I finally made some progress chasing your excellence.
VITA Group@VITAGroupUT·
🎉 Huge congratulations to PhD student Peihao Wang (@peihao_wang ) on two major honors: 🏆 2025 Google PhD Fellowship in Machine Learning & ML Foundations 🌟 Stanford Rising Star in Data Science Incredibly proud of Peihao's outstanding achievements! 🔶⚡
Peihao Wang@peihao_wang·
This work is so special to me. I first touched cryo-EM as a junior - I couldn’t believe a neural net could predict bio structure from extremely low-SNR, unposed images. With so much AI progress over these 5 years, scaling laws make AI-driven protein discovery feel real.
Zhiwen(Aaron) Fan@zhiwen_fan_

DUSt3R-like models work for scientific imaging too! Our ICCV’25 paper “CryoFastAR” shows that a geometric foundation model can do feed-forward ab initio cryo-EM reconstruction—10× faster and state-of-the-art quality on noisy particle images! #ICCV2025 #CryoEM 📎Paper: arxiv.org/abs/2506.05864

Peihao Wang retweeted
Zhiwen(Aaron) Fan@zhiwen_fan_·
We already introduced #LightGaussian last year to accelerate the rendering speed of 3DGS. In our CVPR'25 paper, SteepGS, we go further by demystifying and improving density control during 3DGS optimization — making training more efficient and reliable. Project Page: vita-group.github.io/SteepGS/
Ruisi Cai@ccccrs_0908·
Excited to share that I have been awarded the NVIDIA fellowship! 🎉 Immensely grateful for the recognition and support - this inspires me to continue advancing research in LLM efficiency and AI security. blogs.nvidia.com/blog/graduate-…
Peihao Wang retweeted
Zhiwen(Aaron) Fan@zhiwen_fan_·
🚀 Our NeurIPS '24 work, Large Spatial Model (LSM), is here! LSM performs semantic 3D reconstruction in just 0.1s, processing unposed data via feed-forward 3D reconstruction. 👉It leverages large-scale 3D datasets with minimal annotations, defining a 3D latent space. We are continuously exploring how this explicit 3D representation can further enhance reasoning and robotic learning. 🔗 Try our online Gradio demo with your own data at largespatialmodel.github.io #NeurIPS2024 #3DReconstruction
Peihao Wang retweeted
Ruisi Cai@ccccrs_0908·
Train one - Get many🚀! Check more details about Flextron at cairuisi.github.io/Flextron/
Pavlo Molchanov@PavloMolchanov

🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML! Train one model and get many optimal models for each GPU at inference without any additional retraining. 🌟 🔗 Paper: arxiv.org/abs/2406.10260 Main benefits with only 5% post-training finetuning: ✅ Best model for every GPU (small & large) without retraining ✅ Change inference cost on the fly based on load ✅ Input-adaptive inference (heterogeneous weight-shared MoE, Attention) ✅ Instead of training many models, we train only 1: LLaMa2-7B ➡️ 3B, 4B, 5B, 6B, etc. Method and observations in the thread. 🧵👇
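A toy sketch of the "train one, get many" weight-sharing idea (my simplification, not Flextron's actual architecture):

```python
import torch
import torch.nn as nn

# The leading rows of one shared weight matrix form the smaller sub-models,
# so a single checkpoint can serve several widths at inference time.

class ElasticLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in**0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x, width):
        # Slice shared weights to whatever width the target GPU affords.
        w = self.weight[:width]
        return x @ w.t() + self.bias[:width]

layer = ElasticLinear(4096, 4096)
x = torch.randn(2, 4096)
y_small = layer(x, width=1024)  # cheap sub-model
y_full = layer(x, width=4096)   # full model, same parameters
```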

Peihao Wang retweeted
Mingyuan Zhou@MingyuanZhou·
Introducing Score identity Distillation with Long and Short Guidance (SiD-LSG), our data-free solution to distill Stable Diffusion models into one-step text-to-image generators, achieving a COCO2014 zero-shot FID of 8.15. Excited to share the code and checkpoints with the community! Code: github.com/mingyuanzhou/S… Paper: arxiv.org/abs/2406.01561 #Diffusion #Distillation #StableDiffusion @ZhendongWang6 @UnderGroundJeg @haihuang_ml
Peihao Wang retweeted
Ruisi Cai@ccccrs_0908·
Tired of training varying-size LLMs to fit various GPU memory and latency requirements? Check out Flextron! Our new ICML (Oral) paper shows how to train one model deployable across GPU series. Learn more: cairuisi.github.io/Flextron/ 🚀
Peihao Wang retweeted
Ruisi Cai@ccccrs_0908·
The Flextron-Llama2-7B model family demonstrates superior MMLU performance compared to both open-source models (including Pythia, OpenLLaMA-v2) and existing post-hoc compression methods (including Sheared-LLaMA, SliceGPT, LLM-Pruner, Compresso, LaCo).
Peihao Wang retweeted
Ruisi Cai@ccccrs_0908·
Managing long context is challenging due to quadratic attention memory usage. But what if we could compress growing context information into a fixed-size memory? 🤔 Check out our new ICML paper: "LoCoCo: Dropping In Convolutions for Long Context Compression"! 1/3
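A toy sketch of the fixed-size compression idea (my illustration, not LoCoCo's actual method; the kernel here is random rather than learned):

```python
import torch
import torch.nn.functional as F

# Squeeze a growing context of token states into a fixed number of memory
# slots with a 1D convolution, keeping attention memory O(M) instead of O(T).

def compress_context(kv, num_slots, kernel):
    # kv: (T, d) token states; kernel: (d, d, k) convolution filters.
    x = kv.t().unsqueeze(0)                                 # (1, d, T)
    x = F.conv1d(x, kernel, padding=kernel.shape[-1] // 2)  # mix neighbors
    x = F.adaptive_avg_pool1d(x, num_slots)                 # pool to (1, d, M)
    return x.squeeze(0).t()                                 # (M, d) fixed memory

kv = torch.randn(10_000, 64)                 # long context, T = 10k tokens
kernel = torch.randn(64, 64, 5) * 0.01
memory = compress_context(kv, num_slots=128, kernel=kernel)  # (128, 64)
```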
Peihao Wang@peihao_wang·
Training 3D foundation models? In our CVPR 2024 work, we propose a new concept that directly enhances 2D predictions’ view consistency via image-based rendering. It generalizes to many 2D foundation models zero-shot and transfers their success to 3D at little training cost.
Mukund@sneezygiraffe

Progress in 2D vision models has been exciting, e.g. SAM, DINO, etc. But how do we apply them on a 3D scene? We propose Lift3D, a plug ‘n play framework that converts any arbitrary 2D vision model to be 3D consistent w/o any extra optimization. arxiv.org/abs/2403.18922
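A toy sketch of view-consistent feature blending (my simplification, not Lift3D's actual image-based rendering):

```python
import torch
import torch.nn.functional as F

# 2D features of one 3D point, sampled from several source views, are
# blended with weights favoring source views whose viewing direction
# agrees with the target view.

def blend_features(src_feats, src_dirs, tgt_dir, temperature=0.1):
    # src_feats: (K, d) features of one 3D point from K source views
    # src_dirs:  (K, 3) unit view directions; tgt_dir: (3,) unit direction
    sim = src_dirs @ tgt_dir                  # (K,) direction agreement
    w = torch.softmax(sim / temperature, 0)   # nearer views weigh more
    return w @ src_feats                      # (d,) blended feature

feats = torch.randn(4, 256)                   # e.g. hypothetical DINO features
dirs = F.normalize(torch.randn(4, 3), dim=-1)
tgt = F.normalize(torch.randn(3), dim=0)
out = blend_features(feats, dirs, tgt)        # (256,) view-consistent output
```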
