Inferact

63 posts

Inferact
@inferact

Silicon · Joined December 2025
3 Following · 3.7K Followers
Inferact reposted
Woosuk Kwon @woosuk_k
Going from Ampere to Hopper and to Blackwell, we always find new ways to leverage the architectural innovations to accelerate inference performance. Excited to collaborate with @nvidia to advance @inferact’s mission to grow vLLM!
Inferact@inferact


Replies: 1 · Reposts: 3 · Likes: 92 · Views: 10.8K
Inferact reposted
Simon Mo @simon_mo_
@vllm_project has always been about the partnership and ecosystem that support open source inference. I’m excited to continue our collaboration with @nvidia and welcome them as @inferact’s latest investor.
Inferact@inferact


Replies: 6 · Reposts: 2 · Likes: 49 · Views: 8.5K
Inferact @inferact
We are thrilled to announce that @nvidia is the latest investor in @inferact. We look forward to continuing the momentum driven by our deep collaboration: (1) Engineering velocity: a significant uptick in @nvidia pull requests to the @vllm_project repo. (2) Product synergy: close integration with NVIDIA Dynamo, ModelOpt, Nemotron, and more products! It’s an exciting time for the growth and development of vLLM, the world's AI inference engine!
Replies: 8 · Reposts: 7 · Likes: 81 · Views: 26K
Inferact reposted
Roger Wang @rogerw0108
"Math is hard - I find myself struggle with math very often." - guy with IMO & IOI gold who just joined us.
Replies: 1 · Reposts: 2 · Likes: 25 · Views: 2.7K
Inferact reposted
Bogomil Balkansky @BogieBalkansky
It's wonderful to see the creators of the @vllm_project start a company, @inferact. vLLM has been capturing the hearts and minds of the technical community for years, and a company based on it means more innovation from the brilliant minds behind it: @simon_mo_, @woosuk_k, and the whole team.
Woosuk Kwon@woosuk_k

Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

The Challenge

Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize.

The capability gap between models and the systems that serve them is widening. Left unaddressed, the most capable models remain bottlenecked, with the full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities.

And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data.

We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building.

Why Us

vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We deployed it at frontier scale, in research and in production.

Open Source

vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls.

Join Us

Through the open source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us.

We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp, who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks.

- @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05, and the rest of the founding team
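The announcement talks about absorbing serving complexity into the engine. One core technique behind that in vLLM-style engines is continuous batching: new requests join the running batch as soon as capacity frees up, instead of waiting for an entire batch to drain. A toy, purely illustrative Python sketch (invented names and a simplified token budget; this is not vLLM's actual scheduler):

```python
# Toy continuous-batching scheduler: admit waiting requests into the running
# batch whenever the token budget allows, and retire requests individually
# as they finish. Illustrative only -- not vLLM's real implementation.
from collections import deque


class ToyScheduler:
    def __init__(self, max_batch_tokens):
        self.max_batch_tokens = max_batch_tokens
        self.waiting = deque()   # (request_id, tokens_to_generate), FIFO
        self.running = []        # (request_id, tokens_remaining)

    def add_request(self, request_id, tokens_to_generate):
        self.waiting.append((request_id, tokens_to_generate))

    def _batch_tokens(self):
        # Simplified budget: sum of tokens still to be generated.
        return sum(rem for _, rem in self.running)

    def step(self):
        """Run one decode step; return ids of requests that finished."""
        # Admit waiting requests while they fit under the token budget.
        while (self.waiting and
               self._batch_tokens() + self.waiting[0][1] <= self.max_batch_tokens):
            self.running.append(self.waiting.popleft())
        # Every running request generates exactly one token this step.
        self.running = [(rid, rem - 1) for rid, rem in self.running]
        finished = [rid for rid, rem in self.running if rem == 0]
        self.running = [(rid, rem) for rid, rem in self.running if rem > 0]
        return finished
```

The point of the sketch: request "c" below cannot fit at first, but is admitted mid-flight the moment "a" and "b" have freed enough budget, without ever stalling the batch.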

Replies: 10 · Reposts: 6 · Likes: 77 · Views: 13.8K
Inferact reposted
The House Fund @thehousefund
We backed @Inferact at inception, based on the Berkeley research project vLLM. Today, they announced a $150M seed led by @a16z and @LightspeedVP, with @Sequoia and The House Fund — one of the largest seed rounds ever. What started in a lab is now the open-source inference standard, powering AI at Meta, Google, and Character.AI, with 2,000+ contributors worldwide. Huge congrats to @simon_mo_, @woosuk_k, @KaichaoYou, @rogerw0108, @istoica05, @profjoeyg & team! The best of Berkeley AI + infrastructure. 🐻 Go Bears!
Replies: 6 · Reposts: 6 · Likes: 60 · Views: 8.9K
Inferact reposted
Hao Zhang @haozhangml
Big congrats on @inferact! Since we initiated vLLM’s earliest research push back in 2023, it has been incredible to watch @vllm_project become the OSS inference engine for so many teams. Building a project like this takes persistence across everything: research breakthroughs, ruthless engineering, performance + stability work, ecosystem integration, and the unglamorous grind of docs/CI/issues/releases. Huge gratitude to the maintainers & contributors—can’t wait to keep upstreaming new inference ideas in 2026 with the greater community and @inferact 🚀
Woosuk Kwon@woosuk_k


Replies: 4 · Reposts: 6 · Likes: 79 · Views: 17.5K
Inferact reposted
Woosuk Kwon @woosuk_k
Thank you so much! I still remember the day @haozhangml suggested working on LLM inference back in 2022. vLLM truly wouldn’t exist without you.
Hao Zhang@haozhangml


Replies: 1 · Reposts: 5 · Likes: 61 · Views: 8K
Inferact reposted
Lightspeed @lightspeedvp
Inferact Co-Founder Simon Mo on AI economics: "You build the data centers, the training cluster, fund the training run, produce a model… but at that point, there is no value created." "Only delivering inference is the point where you can actually capitalize on this intelligence." Inference, not training, is increasingly where AI resources are being concentrated. @simon_mo_ @inferact
Lightspeed@lightspeedvp

We co-led Inferact's $150M seed round to support them in their mission to build the inference engine for all current and future AI. In this episode of The Investment Memo, Lightspeed's Bucky Moore and James Alcorn sit down with Simon Mo (Co-Founder & CEO @inferact) to cover:
- How vLLM grew to 60K+ GitHub stars
- Why inference is shifting to the majority of compute
- How vLLM evolved from a research project into the industry standard
- Why building a company was the next step to push open-source inference forward

00:00 Introduction
02:03 The investment memo
04:47 Latency vs throughput vs cost
06:19 Paged attention explained
08:04 The evolution of attention
09:42 Growing the vLLM open source community
11:41 Working with hardware vendors
14:45 Deploying vLLM at large scale
16:03 Inferact's culture of openness
18:45 Building an open ecosystem and horizontal stack
19:45 Inferact's approach to fundraising
22:14 What is the future of inference?

@simon_mo_ @buckymoore @JamesAlcorn94

Replies: 3 · Reposts: 3 · Likes: 26 · Views: 6.1K
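The "Paged attention explained" chapter in the episode refers to PagedAttention, the idea vLLM introduced: the KV cache is stored in fixed-size blocks and addressed through a per-sequence block table, much like virtual-memory paging, so a growing sequence never needs one contiguous allocation. A toy sketch of that bookkeeping (invented names and a tiny block size; not vLLM's actual data structures):

```python
# Toy paged KV cache: logical token positions map through a per-sequence
# block table to fixed-size physical blocks, so memory is allocated one
# block at a time and reclaimed exactly. Illustrative only.
BLOCK_SIZE = 4  # tokens per block; real engines use e.g. 16


class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))   # pool of physical blocks
        self.block_tables = {}                       # seq_id -> [block ids]
        self.seq_lens = {}                           # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve cache space for one new token of a sequence."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:                      # last block is full
            table.append(self.free_blocks.pop())     # grab a fresh block
        self.seq_lens[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        """Translate a logical token position to (physical block, offset)."""
        block = self.block_tables[seq_id][pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id))
        del self.seq_lens[seq_id]
```

Because the block table indirects every lookup, physically scattered blocks behave like one contiguous cache, and a finished request's memory goes back to the pool with no fragmentation.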
Inferact reposted
Lightspeed @lightspeedvp
Inferact CEO @simon_mo_ says the AI infrastructure buildout is misunderstood: "The clusters being built for training—six months later, they'll be used entirely for inference." "Inference will start to eat up that capacity, and consume all the newly provisioned energy." @inferact
Lightspeed@lightspeedvp


Replies: 2 · Reposts: 1 · Likes: 26 · Views: 4.4K
Inferact reposted
Yusen DAI | 戴雨森 @yusen
Very excited to partner with @inferact in support of their mission to build the inference engine for AI. ZhenFund is proud to have been an early supporter of @vllm_project. Huge congrats to @simon_mo_, @woosuk_k, @KaichaoYou, @rogerw0108, @istoica05, and the rest of the founding team.
Simon Mo@simon_mo_

vLLM has grown to 2,000+ contributors, with a diverse community spanning models, hardware, and applications. I see @vllm_project on the path to becoming the world's inference engine, with @inferact accelerating AI progress. We could not be more excited about the road ahead.

Replies: 7 · Reposts: 2 · Likes: 29 · Views: 11.8K
Inferact reposted
Lily Liu @eqhylxx
vLLM was where I first got deep into MLsys—so excited to see the company finally here. Huge congrats and best wishes to @woosuk_k and @inferact!
Woosuk Kwon@woosuk_k


Replies: 2 · Reposts: 2 · Likes: 58 · Views: 6.3K
Inferact reposted
David Bloom @daveybloom
When exceptional talent meets a compelling vision, it's an easy decision to invest. This team proved themselves on campus while growing @vllm_project. Now we're excited to support them as they build something special. Let's get to work! @woosuk_k @simon_mo_
Woosuk Kwon@woosuk_k


Replies: 1 · Reposts: 1 · Likes: 16 · Views: 1.9K
Inferact reposted
Roy Wang @esmeetu87
Been loving the vLLM journey since 2023 and this wonderfully warm community. Proud to work with a brilliant team and keep pushing vLLM to be the best open source inference engine in the world. 💙
Woosuk Kwon@woosuk_k


Replies: 1 · Reposts: 2 · Likes: 11 · Views: 1.2K
Inferact reposted
Zhewen Li @LiZhewen71800
I'm beyond excited to join Inferact! Inference should be fast, cheap, and accessible to everyone. That's the future I want to build, and I'm grateful to be doing it with such a talented team and vibrant community. Let's make AI inference work for everyone, everywhere 🚀
Woosuk Kwon@woosuk_k


Replies: 7 · Reposts: 2 · Likes: 21 · Views: 2.4K