Aurick Qiao

176 posts

@aurickq

ML Systems @thinkymachines | PhD CS @CarnegieMellon

Seattle, WA · Joined November 2016
350 Following · 942 Followers
Aurick Qiao retweeted
Mira Murati (@miramurati)
Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time interaction natively, instead of gluing it onto a turn-based one. youtu.be/A12AVongNN4
Aurick Qiao retweeted
Ying Sheng (@ying11231)
Congrats @radixark ! From SGLang @lmsysorg to Miles, and to future products, RadixArk is dedicated to building a crucible capable of repeatedly producing cutting-edge AI, bringing the best of AI into every household. We believe in a future of AI diversity and hope to drive the integration of AI into every aspect of production and daily life. In the future we envision, AI will become a partner to many companies and individuals, finding ways to self-evolve—in production, in daily companionship, and within virtual worlds. Everything we have experienced and will continue to experience in the SGLang and Miles open-source communities is unforgettable and highly anticipated. It has been both demanding and exhilarating, allowing us to see friendship, the world, and the boundaries. Over the past six months, I have witnessed for the first time how a united team moves forward hand in hand, and how deeply passionate they are about creation. Each of us has taken on our respective roles and numerous new tasks for the first time; we are all stepping out of our comfort zones, growing, and creating at a rapid pace. "It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless march that will lead us into the tens of thousands of miles yet to come." In an era where AI has made ordinary productivity cheaper, relentless, day-to-day refinement has increasingly become the rare key that drives innovation and the future. We hope this will forever remain the soul of RadixArk's culture: focused, uncompromising, humble, and fearless. The underlying logic of creation is not the deliberate pursuit of novelty, but rather independent thinking that remains unswayed by temptation, paired with a meticulous drive for perfection.
Quoting RadixArk (@radixark)

Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital. RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas. RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale. RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI. We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (the venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix, among others. Thanks to @MeghanBobrowsky at @WSJ for the exclusive interview about our vision.

Aurick Qiao retweeted
Hao Zhang (@haozhangml)
Real-time videogen is something I have been pushing hard on with the FastVideo team, and today we have a big update: we just made it. You can now create a 5s 1080p video in 4.5s with FastVideo on a single GPU. I believe this is the fastest 1080p text-image-to-audio-video pipeline ever! Try our free demo at 1080p.fastvideo.org to feel the speed and quality, and give us feedback. Blog: haoailab.com/blogs/fastvide…
Quoting Hao AI Lab (@haoailab)

(1/N) Content creators have been stuck with costly and slow video generation APIs for far too long. We couldn’t take it anymore.😅😭 FastVideo’s new real-time inference stack has the fastest 1080p TI2AV pipeline ever.😍🚀🚀 Our optimized LTX-2.3 pipeline creates 5-second 1080p videos with audio in 4.55 s, on a single GPU! 3.9x faster than the next fastest option. 🕹️Live demo: 1080p.fastvideo.org 📜Blog: haoailab.com/blogs/fastvide…

Aurick Qiao retweeted
Woosuk Kwon (@woosuk_k)
Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

The Challenge

Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked, with the full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities.

And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data.

We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building.

Why Us

vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We deployed it at frontier scale, in research and in production.

Open Source

vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls.

Join Us

Through the open-source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us.

We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp, who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks.

- @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team
Aurick Qiao retweeted
Woosuk Kwon (@woosuk_k)
It still feels a little unreal to look back at how far @vllm_project has come. What started as a small research project that Zhuohan and I launched ended up receiving so much love and connecting me with people who are now some of my closest friends. In so many ways, I already feel incredibly lucky for what this journey has given me. To be honest, my path with vLLM hasn’t been perfectly straight. Over the past three years, my passion dipped at times, and I did spend my energy exploring things I thought were more interesting than vLLM and inference. vLLM is what it is today because of the community, and I’m truly grateful for their commitment. My view on inference also evolved a lot along the way. What once felt mostly “solved” turned out to be far from it. The rapid pace of new models, increasingly complex architectures, diverse hardware setups, and agents have made inference genuinely hard. The need for strong inference infrastructure has only kept growing, and it became clear just how much important work remains. Somewhere along that journey, I realized how special this work really is and how uniquely positioned vLLM is. Now, I’m committed to pushing it all the way. With that, I started @inferact with @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05, and an amazing founding team from both inside and outside the vLLM community. I’m deeply grateful to our investors, including @a16z and @lightspeedvp, for believing in us and giving us this opportunity. Excited for this next chapter, and looking forward to sharing more soon.
Aurick Qiao retweeted
Kevin Kwok (@kevinakwok)
Wow TML really *is* taking a different approach from the other AI labs
Aurick Qiao retweeted
Thinking Machines (@thinkymachines)
Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…
Angela Jiang (@jiangelaa)
👋 @worktrace_ai is out of stealth! This also means I've officially rejoined the workforce... I couldn't help but join @deepakv91 to pursue this vision together. I really think we & our amazing team are on track to make a meaningful difference in bridging the AI divide. Join us!
Quoting Worktrace AI (@worktrace_ai)

Today, we're launching @worktrace_ai to help businesses uncover their best automation opportunities and build those automations. Our founders, Angela Jiang (product manager of GPT-3.5 and GPT-4 at OpenAI) and Deepak Vasisht (UIUC CS professor, MIT researcher, IIT graduate of the last decade), are determined to eliminate the AI divide between frontier labs and the workforce. Our $9M seed round is led by @8vc and @conviction with participation from @OpenAI, @svangel and @_geniusventures. Join us!

Ying Sheng (@ying11231)
We've been running @radixark for a few months, started by many core developers in SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi). I left @xai in August, a place where I formed deep bonds and made countless beautiful memories. It was the best place I’ve ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision was the momentum of building SGLang in the open and the mission of creating an ambitious future, in the open spirit I learnt at my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have made great efforts to get it to where it is today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024 that I spent debugging with @lm_zheng, @lsyincs, and @zhyncs42, while @ispobaoke single-handedly took on DeepSeek inference optimizations and @GenAI_is_real and the community strike team tag-teamed on-call shifts non-stop. So many more people have joined that I'm out of space to call them all out, but they're recorded forever in the GitHub contributor list. Demand grew exponentially, which pushed us to make it a dedicated effort supported by RadixArk. It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we’ve seen something very clear: the world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again, often under enormous pressure, with lots of duplicated effort and wasted insight. RadixArk was born to change that.

Today, we’re building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone." If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen at RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take.
A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics. Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai
Aurick Qiao (@aurickq)
After two amazing years at Snowflake AI Research, I have joined @thinkymachines! I am excited to work with the incredible team here and build world-class ML systems for the next generation of multimodal AI.
Aurick Qiao retweeted
Zhihao Jia (@JiaZhihao)
Super excited about this work! 🔥 SuffixDecoding accelerates multi-round agent serving by reusing and optimizing over previous agent iterations—5x speedups on AgenticSQL. Come see @GabrieleOliaro’s #NeurIPS2025 Spotlight!
Quoting Gabriele Oliaro (@GabrieleOliaro)

🐢 Are your #LLM #agents too slow? 🚀 Introducing SuffixDecoding: make agentic workloads run up to 5.3x faster! 🎯 Emerging AI workflows suffer high latency. We fix this with extreme speculative decoding using suffix trees. 🌟 Come see our #NeurIPS2025 Spotlight!

Aurick Qiao (@aurickq)
SuffixDecoding is at #NeurIPS2025 as a 🏅 spotlight! It accelerates LLM inference for coding, agents, and RL. We also optimized its speculation speed by 7.4x and merged it into vLLM (coming soon to SGLang). Talk to @GabrieleOliaro or me at poster #816 on Friday at 11am! Links in 🧵