Zhihao Zhang
@Jackfram2
33 posts
Joined May 2019
150 Following · 34 Followers
Zhihao Zhang retweeted
Lijie (Derrick) Yang @LijieyYang
Excited to share that LessIsMore has been accepted to ICML 2026! 🚀 LessIsMore is a training-free sparse attention method for efficient long-horizon reasoning. By enforcing cross-head unified token selection, it brings up to 1.6x E2E speedup while preserving reasoning accuracy under practical workloads. Huge thanks to my amazing co-authors and mentors @Jackfram2, @JiaZhihao, Ravi!

Paper: arxiv.org/abs/2508.07101
Code: github.com/DerrickYLJ/Les…

#ICML2026 #LLM #EfficientAI
[image]
8 replies · 18 reposts · 67 likes · 7.1K views
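The "cross-head unified token selection" idea can be illustrated with a toy numpy sketch: instead of each attention head keeping its own top-k tokens (which fragments KV-cache access), importance is pooled across heads and one shared token set is kept for all heads. This is my reading of the announcement, not the paper's actual method; the function name and the sum-pooling choice are illustrative assumptions.

```python
import numpy as np

def unified_topk_selection(scores: np.ndarray, k: int) -> np.ndarray:
    """Toy cross-head unified token selection.

    scores: (num_heads, seq_len) per-head importance scores for the
    current query. Rather than per-head top-k, aggregate importance
    across heads and pick ONE shared set of token indices, so every
    head loads the same sparse slice of the KV cache.
    """
    pooled = scores.sum(axis=0)        # aggregate importance over heads
    keep = np.argsort(pooled)[-k:]     # shared top-k token indices
    return np.sort(keep)               # sorted for contiguous-ish access

rng = np.random.default_rng(0)
scores = rng.random((8, 1024))         # 8 heads, 1024 cached tokens
idx = unified_topk_selection(scores, k=128)
# All heads then attend only to these 128 shared token positions.
```

A per-head variant would instead return an (8, 128) index matrix with different columns per head; the unified variant trades a little per-head flexibility for a single coalesced sparse load.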
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
🚀Introducing Motus, the open-source agent infrastructure that learns in production.

Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds.

Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize agent harness, model orchestration, context memory, and end-to-end latency.

Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall.

Open source under Apache 2.0. Works with any agent SDK. Deploy with one command.
github.com/lithos-ai/motus
lithosai.com
[image]
22 replies · 71 reposts · 565 likes · 55.9K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
Excited to see our inaugural CMU Catalyst Research Summit bring together 120+ attendees! A full day of discussions on the future of agentic AI systems, multi-modal AI, and ML compilation—with amazing energy from both academia and industry. Co-organized with @tqchenml @BeidiChen @Tim_Dettmers — this is just the beginning 🚀
[image]
2 replies · 12 reposts · 87 likes · 23.8K views
Zhihao Zhang retweeted
Yixin Dong @yi_xin_dong
🌟 FlashInfer-Bench accepted to MLSys 2026! FlashInfer-Bench is also the platform for the MLSys 2026 AI Kernel Challenge. Tomorrow is the last registration day! If you're into agents and CUDA kernels, be sure to join! 👉 mlsys26.flashinfer.ai

So proud of the team for this milestone. Over the past two months, we've witnessed rapid progress in AI agents for GPU optimization. We have been upgrading our benchmark system and dataset to keep up with this pace: better model coverage, stronger safety guardrails.

Check out our OSS project: 🔗 github.com/flashinfer-ai/…

See you on the leaderboard, and see you at MLSys! 👋
4 replies · 13 reposts · 85 likes · 7.6K views
Zhihao Zhang retweeted
Zihao Ye @ye_combinator
🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.

Three Tracks
* MoE (Mixture of Experts)
* DSA (Deepseek Sparse Attention)
* GDN (Gated Delta Net)

Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖

🎁 Prizes: Winners take home NVIDIA GPUs and are invited for presentation at MLSys 2026.
⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl !

Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.

🔗 Contest details: mlsys26.flashinfer.ai
See you at MLSys 2026! 🔥
4 replies · 57 reposts · 296 likes · 74K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
#MLSys2026 is inviting self-nominations for the External Review Committee (ERC)! If you want to contribute to the review process for the MLSys conference, nominate yourself and help shape this year's program. We especially welcome PhD students and early-career researchers! forms.gle/YdAih8VLuwSF1E…
2 replies · 12 reposts · 20 likes · 9.8K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
⏰3 days left to submit to #MLSys2026 (deadline October 30)! Submit your best ML systems work to the Research and Industrial Tracks, and join the MLSys community in Seattle next May. 👉mlsys.org
[image]
0 replies · 5 reposts · 20 likes · 13.8K views
Zhihao Zhang retweeted
Tianqi Chen @tqchenml
📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interoperate with each other. Ship one library across PyTorch, JAX, CuPy, etc., and run it across Python, C++, and Rust. tvm.apache.org/2025/10/21/tvm…
[image]
3 replies · 41 reposts · 166 likes · 38.3K views
Zhihao Zhang retweeted
Shanli Xing @shanli_xing
🤔 Can AI optimize the systems it runs on?

🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:
- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving workloads
- Fastest kernels get day-0 integrated into production

First-class integration with FlashInfer, SGLang (@lmsysorg ), and vLLM (@vllm_project ) at launch🙌

Blog post: flashinfer.ai/2025/10/21/fla…
Leaderboard: bench.flashinfer.ai
[image]
3 replies · 45 reposts · 147 likes · 59.5K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
The #MLSys2026 submission deadline is only 2 weeks away (Oct 30)! Submit your best work on ML systems — spanning hardware, compilers, software, models, agents, and eval. This year features both Research and Industry Tracks! Join us in Seattle next spring! mlsys.org
0 replies · 14 reposts · 22 likes · 4.5K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
🚀Excited to share the #MLSys Call for Papers! For the first time, we’re also welcoming submissions to the Industrial Track.

Research and industrial track deadline: Oct 30, 2025
Reviews available: Jan 12, 2026
Author responses: Jan 16, 2026
Notifications: Jan 25, 2026

mlsys.org/Conferences/20… mlsys.org/Conferences/20…
Minjia Zhang @_Minjia_Zhang_

Calling industry researchers: MLSys 2026 launches its first Industrial Track! 🚀

We're excited to announce the inaugural Call for Industrial Track Papers at MLSys 2026! 🎉
👉 mlsys.org/Conferences/20…

This is a unique opportunity for industry researchers and practitioners to share real-world innovations, system deployments, large-scale ML challenges, and lessons learned from practice with the MLSys community.

📌 Details & submission info:
Paper submission deadline: Oct 30, 2025 20:00 UTC
Full CFP: mlsys.org/Conferences/20…

I’m honored to help launch this new track and look forward to seeing your contributions that bridge cutting-edge research with impactful practice.

#MLSys2026 #CFP #MLSystems #MLforSystems

0 replies · 5 reposts · 15 likes · 3.4K views
Zhihao Zhang @Jackfram2
Kudos to @LijieyYang and the team for the exciting new sparse attention work! LessIsMore is an elegant and effective solution for reasoning tasks. And there is still a lot more we can do on top of this, so stay tuned!
Lijie (Derrick) Yang @LijieyYang

[1/N] 🚀 Excited to introduce my first work at @Princeton: LessIsMore – a training-free sparse attention method tailored for efficient reasoning in LRMs, achieving lossless accuracy with high sparsity up to 87.5% and 1.1x avg decoding speedup compared to Full Attention on reasoning tasks like AIME-24. (More details in 🧵)

💻 Code: github.com/DerrickYLJ/Les…
📄 arXiv: huggingface.co/papers/2508.07…
🔍 HF Daily Paper: huggingface.co/papers/2508.07…

0 replies · 0 reposts · 3 likes · 103 views
Songwei Ge @Songwei_Ge
Training diffusion-style policies with RL can be as easy as training Gaussian policies. We introduce Flow Policy Optimization (FPO) — bringing flow matching into the policy gradient world. Multimodal. Sampling-agnostic. More expressive than Gaussians.
David McAllister @davidrmcall

Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.

3 replies · 5 reposts · 66 likes · 4.6K views
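FPO itself is more involved than the tweets can convey, but the flow-matching objective it builds on is standard and easy to sketch: train a velocity field to predict the straight-line displacement between a noise sample and a data (or action) sample along their interpolation. A minimal numpy toy, assuming rectified-flow-style linear interpolation; `v_theta` here is a stand-in for the learned velocity network, not FPO's actual model.

```python
import numpy as np

def cfm_loss(v_theta, x0, x1, rng):
    """Toy conditional flow-matching loss.

    x0: noise samples, x1: data/action samples, both (batch, dim).
    Sample t ~ U[0, 1], form x_t = (1 - t) x0 + t x1, and regress the
    velocity network onto the constant target velocity x1 - x0.
    """
    t = rng.random((x0.shape[0], 1))        # one time per sample
    xt = (1 - t) * x0 + t * x1              # linear interpolation path
    target = x1 - x0                        # velocity of that path
    pred = v_theta(xt, t)
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 4))
x1 = rng.standard_normal((32, 4))
perfect = lambda xt, t: x1 - x0             # oracle velocity field
assert cfm_loss(perfect, x0, x1, rng) == 0.0
```

The appeal for RL, as the tweet describes it, is that this regression replaces the explicit Gaussian log-likelihood in the policy-gradient objective, so the policy can be an expressive multimodal flow rather than a single Gaussian.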
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
📢Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to seeing your best work! #MLSys #AI #ML
[image]
2 replies · 32 reposts · 109 likes · 49.1K views
Zhihao Zhang retweeted
Zhihao Jia @JiaZhihao
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized megakernels, reducing latency by 1.2-6.7x.

🔧Tool: github.com/mirage-project…
📝Blog: zhihaojia.medium.com/compiling-llms…
[image]
17 replies · 123 reposts · 776 likes · 84.2K views
Zhihao Zhang @Jackfram2
Our latest experiments further demonstrate the potential of TidalDecode on challenging reasoning benchmarks (details in the blog)!
0 replies · 0 reposts · 1 like · 46 views
Zhihao Zhang @Jackfram2
Couldn’t attend @iclr_conf , but please come by and check out our interesting work on inference-time sparse attention serving for LLMs! Really glad to work with amazing collaborators @LijieyYang @ZhuofuChen @zikunli_zk and my amazing advisor @JiaZhihao !
Lijie (Derrick) Yang @LijieyYang

🚀 Excited to present our work at CMU Catalyst, TidalDecode, at #ICLR2025 tomorrow! TidalDecode, with Position Persistent Sparse Attention (PPSA), enables:
🔹 2.1× faster long-context decoding
🔹 High-quality generation (100% accuracy with only 0.1% tokens on Needle-in-the-Haystack!)
🔹 Impressive performance on reasoning tasks

🗓️ Poster Session: Hall 3 + Hall 2B #132 from 15:00 to 17:30
📖 Blog & Paper: sites.google.com/andrew.cmu.edu…
💻 GitHub: github.com/DerrickYLJ/Tid…

Come chat about making LLMs faster and smarter!

2 replies · 0 reposts · 1 like · 158 views
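The "position persistent" part of PPSA can be sketched as a toy decode step: a few designated layers re-select the top-k token positions from full attention scores, and every other layer reuses the most recent selection, so the sparse KV access pattern persists across layers. The layer indices, shapes, and dot-product scoring below are illustrative assumptions for the sketch, not TidalDecode's exact design.

```python
import numpy as np

def ppsa_decode_step(q_per_layer, keys, k, reselect_layers=(0, 2)):
    """Toy position-persistent sparse attention token selection.

    q_per_layer: (num_layers, head_dim) query vectors for one decode step.
    keys:        (num_layers, seq_len, head_dim) per-layer cached keys.
    Layers in reselect_layers pick fresh top-k token positions; the
    remaining layers reuse the last selected positions, keeping the
    sparse KV load pattern stable across layers.
    """
    selected = None
    plan = []
    for layer, q in enumerate(q_per_layer):
        scores = keys[layer] @ q                  # (seq_len,) importance
        if selected is None or layer in reselect_layers:
            selected = np.argsort(scores)[-k:]    # fresh top-k selection
        plan.append(np.sort(selected))            # attend only to these
    return plan

rng = np.random.default_rng(0)
q_per_layer = rng.standard_normal((4, 64))        # 4 layers, head_dim 64
keys = rng.standard_normal((4, 512, 64))          # 512 cached tokens/layer
plan = ppsa_decode_step(q_per_layer, keys, k=32)
# Layers 1 and 3 reuse the positions chosen by layers 0 and 2.
```

Reusing positions is what enables the reported speedups: the non-selecting layers skip the full-score pass entirely and only gather the persisted token subset.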
Zhihao Zhang retweeted
CMU School of Computer Science
Huge thank you to @NVIDIADC for gifting a brand new #NVIDIADGX B200 to CMU’s Catalyst Research Group! This AI supercomputing system will afford Catalyst the ability to run and test their work on a world-class unified AI platform.
[4 images]
3 replies · 28 reposts · 140 likes · 81.7K views