Yi Pan

45 posts

@conlesspan

PhD Student @BerkeleySky on systems and AI; Prev @UWSyFI @sjtu1896

Joined August 2019

291 Following · 80 Followers

Yi Pan reposted

Guilherme Favaron @guifav · 1d

Every team shipping a coding agent (Claude Code, Codex, Cursor) is really running a serving-systems problem. The "tech behind the tech" is the LLM-serving stack underneath, and until now nobody had real data on what that workload looks like. New arXiv (2606.30560) from @bariskasikci's SyFI lab (@UWSyFi, @uwcse) is the first large cross-provider trace of real coding-agent use: ~4,300 sessions, 350K LLM steps, 430K tool calls, 43 developers, 8 months, Claude Code + Codex. It breaks the intuition that agents mean long generations. The median step replays ~119K context tokens to emit just ~214 output tokens, more than two orders of magnitude more reading than writing. So the bill is the context, not the answer: prefix tokens are 59.5% of total cost. Tool calls are brutally long-tailed: 80+ tools, but the top 3 account for 80%+ of calls, and the 4% of calls that run >1 min eat 85% of all tool time. And the prefix cache everyone leans on? A 95.7% hit rate, yet misses cluster right after a human pauses to think, amplifying prefill 3.8x. Those human-gap misses alone account for ~46% of fresh tokens and ~13% of spend. For technical leaders: your agent's cost and latency live in the loop, the replayed context, and the idle gaps, not in raw token generation. Tune tool-call overhead, append-length-aware prefill, and KV-cache eviction around human gaps before you scale the fleet.
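
The long-tail claim (4% of calls over one minute accounting for 85% of tool time) is a two-number summary you can compute from your own tool-call durations. A minimal sketch, assuming you already have per-call durations in seconds; the function name `tail_share` and the toy durations are hypothetical and are not part of the released trace or TraceLab.

```python
def tail_share(durations, threshold=60.0):
    """Return (fraction of calls longer than threshold,
    fraction of total tool time spent in those calls)."""
    if not durations:
        return 0.0, 0.0
    long_calls = [d for d in durations if d > threshold]
    total = sum(durations)
    frac_calls = len(long_calls) / len(durations)
    frac_time = sum(long_calls) / total if total > 0 else 0.0
    return frac_calls, frac_time


# Hypothetical durations (seconds): many quick calls plus one slow build/test run.
durations = [1.0] * 24 + [136.0]
calls, time_share = tail_share(durations)
print(f"{calls:.0%} of calls, {time_share:.0%} of tool time")  # prints "4% of calls, 85% of tool time"
```

The toy data is constructed only to show how a small fraction of slow calls can dominate total tool time; real traces need per-tool breakdowns as well.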

Yi Pan reposted

Mathew Jacob @mat_jacob1002 · 1d

To make serving coding agents more efficient, we need better hills to climb than traces from synthetic benchmarks like SWE-Bench. This work led by @KanZhu854772 is critical in curating a realistic workload for the ML Sys community to climb! (plus you can analyze your own usage!)

UW SyFi @UWSyFi

🔥 Coding agents have become one of the hottest LLM workloads. But serving them looks nothing like serving a chatbot: 294× more input than output, hundreds of thousands of tool calls, and extremely long-tailed latency. 🚀 We are releasing the SyFI Coding Trace: ~4,300 real-world coding-agent sessions from our daily use, plus TraceLab, an open-source pipeline to collect, sanitize, analyze, and replay your own traces. More in the thread below 🧵👇 (1/n)

Yi Pan reposted

Yilong Zhao @ylzhao_dreamer · 1d

Check out these great statistics and analysis of modern agentic inference workloads!

UW SyFi @UWSyFi

Yi Pan @conlesspan · 1d

Check out our latest work on coding-agent serving led by @serendipity_zk! If you want to analyze the characteristics of coding agents, collect your own vibe-coding data, or use traces to optimize serving, give it a spin!

UW SyFi @UWSyFi

Yi Pan @conlesspan · Jun 23

@xiangfeng_zhu @ratulm @arvind_uw Congrats!

Xiangfeng Zhu @xiangfeng_zhu · Jun 23

I feel incredibly fortunate to have had two advisors whose unwavering support shaped my PhD journey. They gave me the freedom to explore, take risks, and occasionally disappear down research rabbit holes. Thank you, @ratulm and @arvind_uw! Now, on to the next phase :)

Ratul Mahajan @ratulm

Rituals are silly, but fun too. Here are @arvind_uw and I hooding our PhD student, @xiangfeng_zhu. His thesis showed how to design and implement networks that are hyper-customized to applications' needs rather than requiring applications to work around whatever the network stack happens to provide. He is now off to help machines think at Thinking Machines. Good luck, Xiangfeng!

Yi Pan @conlesspan · Jun 11

Check out our recent work!

UW SyFi @UWSyFi

New distributed training strategies should not require new distributed runtimes. Introducing Piper: a programmable PyTorch training system for deploying complex training strategies by separating model placement and GPU scheduling from model code. 📄 arxiv.org/abs/2606.11169

Yi Pan @conlesspan · Jun 2

@cHHillee This is related to github.com/google/perfett… (#issuecomment-3723524517). My current workaround is a script that manually adjusts the timestamps so that nothing overlaps 🫠

Horace He @cHHillee · Jun 2

I'm not sure how useful this is but it certainly would have been useful for me last year... If a chrome trace has events that overlap on the same stream (e.g. event1 ends at 10.1 and event2 begins at 10.05), perfetto's behavior is to not show the events and show an empty gap in your trace >:(
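
The workaround described in the reply above (rewriting timestamps so events on a track never overlap) can be sketched in a few lines of Python. A minimal sketch, assuming the trace uses Chrome Trace Event Format complete events (`"ph": "X"` with `ts` and `dur` in microseconds) and that each (`pid`, `tid`) track is meant to be flat, as with kernels on a single GPU stream. The function names are hypothetical, and this is not the author's actual script: it clips the earlier of two overlapping events so it ends exactly where the next one begins.

```python
import json
from collections import defaultdict


def remove_overlaps(events):
    """Clip complete ("X") events so none overlap on the same (pid, tid) track.

    Assumes each track is flat (sequential, not nested). Events are modified
    in place and the list is returned for convenience.
    """
    tracks = defaultdict(list)
    for ev in events:
        if ev.get("ph") == "X":
            tracks[(ev.get("pid"), ev.get("tid"))].append(ev)
    for track in tracks.values():
        track.sort(key=lambda e: e["ts"])
        for prev, cur in zip(track, track[1:]):
            if cur["ts"] < prev["ts"] + prev["dur"]:
                # End the earlier event exactly where the next one starts.
                prev["dur"] = cur["ts"] - prev["ts"]
    return events


def fix_trace_file(src, dst):
    """Read a Chrome trace (JSON array or {"traceEvents": [...]}) and write a fixed copy."""
    with open(src) as f:
        data = json.load(f)
    events = data["traceEvents"] if isinstance(data, dict) else data
    remove_overlaps(events)
    with open(dst, "w") as f:
        json.dump(data, f)
```

Clipping the earlier event keeps every start time intact, which is usually what matters when reading a kernel timeline; shifting the later event's start would be an equally valid choice.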

Yi Pan @conlesspan · May 23

Huge congrats to the team!!!

Baris Kasikci @bariskasikci

Super stoked that UW SyFI (syfi.cs.washington.edu) members won a number of prizes at the MLSys'26 competition, NVIDIA Track. Huge congrats to @KeisukeKamahori, @sudopowr, Yile Gu, Wei Shen, and Steven Gao! Thanks to @nvidia, @modal, and the FlashInfer team for the support.
1st place in the GDN Track (Full-Agent Approach)
2nd place in the GDN Track (Agent-Assisted Approach)
3rd place in the DSA Track (Full-Agent Approach)

Yi Pan @conlesspan · May 8

@ZhiyuanZeng_ GOAT!

Zhiyuan Zeng @ZhiyuanZeng_ · May 6

Excited to see RLVE being adopted! 🤩

Zyphra @ZyphraAI

Post-training is a 4-stage RL cascade on a shared algorithmic spine: async PipelineRL, DPPO Binary-TV trust regions, Dr-GRPO loss aggregation, MaxRL advantages, no KL-in-reward. Reasoning warmup → RLVE-Gym curriculum → math/code/TTC RL → behavioral RL.
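
Of the ingredients listed, "Dr-GRPO loss aggregation" is the easiest to show concretely. To our understanding, Dr. GRPO drops GRPO's per-response 1/|o_i| averaging and instead sums token losses and divides by a constant (such as group size times the generation budget), so a token's weight no longer depends on how long its response is. The sketch below is our hedged reading of that idea, not Zyphra's code; the function names and the toy numbers are hypothetical.

```python
import numpy as np


def per_sequence_mean_loss(token_loss, mask):
    """GRPO-style aggregation: average over each response's tokens, then over the group."""
    per_seq = (token_loss * mask).sum(axis=1) / mask.sum(axis=1)
    return per_seq.mean()


def constant_normalized_loss(token_loss, mask, max_len):
    """Dr. GRPO-style aggregation (our reading): sum all token losses and divide
    by a constant (group size x generation budget) instead of by each length."""
    group_size = token_loss.shape[0]
    return (token_loss * mask).sum() / (group_size * max_len)


# Two sampled responses of length 2 and 4 (padded to 4); every token loss is 1.
token_loss = np.ones((2, 4))
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]], dtype=float)
print(per_sequence_mean_loss(token_loss, mask))       # 1.0
print(constant_normalized_loss(token_loss, mask, 4))  # 0.75
```

Under the per-sequence mean, each token of the short response carries weight 1/(2*2) = 0.25 versus 1/(2*4) = 0.125 for the long one; under constant normalization every token carries 0.125, which removes the length bias.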

Yi Pan reposted

Vic Shihang Li @sudopowr · Mar 19

Today's AI agents can diagnose production incidents, but they start from scratch every single time. What if they could remember? New on @acmsigops: our work on the Self-Defining Operator, a multi-agent system with long-term memory for autonomous ops.

ACM SIGOPS @ACMSIGOPS

New SIGOPS Blog -- "The Long Game: How Agents That Remember Resolve Operational Issues Faster" by Shihang (Vic) Li, Thomas Anderson, Ratul Mahajan, Simon Peter, Luke Zettlemoyer, and the SDS team. sigops.org/2026/the-long-…

Yi Pan @conlesspan · Dec 8

@ying11231 @radixark @lmsysorg @slime_framework @jxwuyi @xai Congrats!

Ying Sheng @ying11231 · Dec 8

We've been running @radixark for a few months, started by many core developers of SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi). I left @xai in August, a place where I built deep emotional bonds and countless beautiful memories. It was the best place I've ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang in open source and the mission of creating an ambitious future, in the open spirit I learnt at my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have put in great effort to get it to where it is today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024 that I spent debugging with @lm_zheng, @lsyincs, and @zhyncs42, while @ispobaoke single-handedly took on DeepSeek inference optimizations and @GenAI_is_real and the community strike team tag-teamed on-call shifts non-stop. So many more have joined that I'm out of space to call them all out, but they're recorded on the GitHub contributor list forever. The demand grew exponentially, and we were pushed to make it a dedicated effort supported by RadixArk. It's the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we've seen something very clear: the world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again, often under enormous pressure, with lots of duplicated effort and wasted insight. RadixArk was born to change that. Today, we're building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone."

If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen at RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take.

A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics. Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai

Yi Pan reposted

Baris Kasikci @bariskasikci · Oct 8

How to beat all compression using LLMs? ⚙️ Introducing LLMc — a lossless compressor built with LLMs. LLMc leverages the predictive power of LLMs to beat traditional compressors like Gzip and LZMA on natural language text. (1/4) 🔗 Blog Post: syfi.cs.washington.edu/blog/2025-10-0… 💻 Code: github.com/uw-syfi/LLMc
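
The core idea behind LLM-based lossless compression is that a good predictor makes the actual next symbol cheap to encode. The sketch below is not LLMc's implementation; it is a toy illustration of one common approach, rank coding, in which each symbol is replaced by its rank under the model's predicted ordering and the (mostly small) ranks are then compressed with zlib. A character bigram model stands in for the LLM and, like LLM weights, is assumed to be shared by encoder and decoder along with the alphabet; all function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def build_bigram(text):
    """Toy stand-in for an LLM: next-character counts given the previous character."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts


def ranked(counts, prev, alphabet):
    """Candidate characters, most likely first. Ties are broken by the character
    itself so the encoder and decoder produce the identical ordering."""
    c = counts.get(prev, Counter())
    return sorted(alphabet, key=lambda ch: (-c[ch], ch))


def encode(text, counts, alphabet):
    if len(alphabet) > 256:
        raise ValueError("ranks are stored one byte each")
    ranks, prev = [], None
    for ch in text:
        ranks.append(ranked(counts, prev, alphabet).index(ch))
        prev = ch
    return zlib.compress(bytes(ranks))


def decode(blob, counts, alphabet):
    out, prev = [], None
    for r in zlib.decompress(blob):  # iterating bytes yields ints
        ch = ranked(counts, prev, alphabet)[r]
        out.append(ch)
        prev = ch
    return "".join(out)


text = "the cat sat on the mat. " * 20
alphabet = sorted(set(text))
model = build_bigram(text)
blob = encode(text, model, alphabet)
assert decode(blob, model, alphabet) == text  # lossless round trip
print(len(text.encode()), len(blob))
```

The better the predictor, the more ranks collapse to 0, which is why a stronger model (an LLM instead of a bigram table) can outperform generic compressors on natural-language text.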

Yi Pan reposted

Baris Kasikci @bariskasikci · Sep 30

🎙️ Introducing VoxServe — a high-throughput, low-latency serving system built for Speech Language Models (TTS, STS, etc.), natively handling audio detokenization + streaming with performance as the core goal. (1/4) 🔗 blog post: vox-serve.github.io/2025/09/29/int… 💻 code: github.com/vox-serve/vox-…

Yi Pan @conlesspan · Sep 19

Congrats!

Xingyang Li @XYLi_Bruce

Thrilled to announce that my first first-author paper in efficient ML has been accepted to #NeurIPS2025! Let's make video generation bigger and greater! Thanks to my mentors and my advisor for their kind mentorship and encouragement. Can't wait to see you all in San Diego!

Yi Pan @conlesspan · Sep 13

@b1antaidaye 😭

ChatGPT辽太郎 @jian_w3ng · Sep 12

Computer systems: Android track; artificial intelligence: Apple track

Yi Pan @conlesspan · Aug 29

@iskyzh Just had my dinner there😋

迟猫猫🐱 @iskyzh · Aug 29

a sip of Bellevue 🤪 I love this place (only in summer)

Yi Pan @conlesspan · Aug 23

@bariskasikci @emnlpmeeting Congrats, Baris and Keisuke!

Baris Kasikci @bariskasikci · Aug 22

🚀 Presenting LiteASR: a method that halves the compute cost of speech encoders by leveraging low-rank approximation of activations. LiteASR has been accepted to #EMNLP2025 (main) @emnlpmeeting
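
A generic sketch of activation-aware low-rank approximation: run calibration inputs through a linear layer, take the top-k principal directions V_k of those inputs, and factor the layer as W x ≈ (W V_k)(V_kᵀ x). That costs k(d_in + d_out) multiply-adds per input instead of d_in · d_out. This illustrates the general idea only and is not LiteASR's code; the function name, shapes, and rank k are hypothetical, and real activations are only approximately low-rank, so k trades accuracy for compute.

```python
import numpy as np


def low_rank_factor(W, X_calib, k):
    """Factor a linear layer y = W @ x into two thinner layers using the
    top-k principal directions of calibration inputs.

    W: (d_out, d_in) weight; X_calib: (n, d_in) calibration inputs.
    Returns A (d_out, k) and B (k, d_in) with W @ x ≈ A @ (B @ x).
    """
    # Right singular vectors of the (uncentered) input matrix, ordered by
    # energy, span the directions the inputs actually occupy.
    _, _, Vt = np.linalg.svd(X_calib, full_matrices=False)
    B = Vt[:k]   # (k, d_in): project inputs onto k directions
    A = W @ B.T  # (d_out, k): apply W within that subspace
    return A, B


rng = np.random.default_rng(0)
basis = rng.standard_normal((8, 64))              # inputs live in an 8-dim subspace
X_calib = rng.standard_normal((256, 8)) @ basis   # (256, 64) calibration activations
W = rng.standard_normal((32, 64))
A, B = low_rank_factor(W, X_calib, k=8)
x = rng.standard_normal(8) @ basis
# Dense: 32*64 = 2048 multiply-adds; factored: 8*(64+32) = 768.
print(np.allclose(W @ x, A @ (B @ x)))  # True: inputs are exactly rank 8 here
```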

Yi Pan reposted

Tianyin Xu @tianyin_xu · Jul 20

A petition to SIGOPS to adopt the USENIX Annual Technical Conference (ATC) and retain its steering committee docs.google.com/document/d/1wK… (not sure whether it can be done by SIGOPS alone, but it's great to let the voice be heard)

Yi Pan @conlesspan · Jun 30

@bariskasikci congrats!

Baris Kasikci @bariskasikci · Jun 29

Grateful to the DSN community for the rising star recognition! Huge thanks to the letter writers, organizers, selection committee, all my collaborators, my advisor, and most importantly my group members, who make it all possible!

Saurabh Bagchi @bagchi_saurabh

IEEE/IFIP DSN conference @DsnIeee just wrapped up in Naples. The Rising Star award, given to someone less than 10 years from graduation, went to Baris Kasikci @bariskasikci of University of Washington for his contributions to theory and industrial impact of dependability. I chaired the committee and thanks to the members for a diligent process to arrive at the winner. Miguel P. Correia (University of Lisbon) @miguelnpcorreia, Bianca Schroeder (University of Toronto), Amith Singhee (IBM Research, India) @asinghee1, Angelos Stavrou (Virginia Tech) @AngelosStavrou.

Yi Pan @conlesspan · Jun 28

@BanghuaZ congrats!

Banghua Zhu @BanghuaZ · Jun 27

Excited to share that I’m joining NVIDIA as a Principal Research Scientist! We’ll be joining forces on efforts in model post-training, evaluation, agents, and building better AI infrastructure—with a strong emphasis on collaboration with developers and academia. We’re committed to open-sourcing our work and sharing it with the world. Let’s build a stronger, more open AI community together!
