
Jason Cui
@JasonSCui
Partner @a16z investing in infra & AI | Prev product @databricks & @uber, founder at Jemi (YC S20, acq) | Technology optimist ☀️


Today I’m excited to congratulate @simon_mo_ on an outstanding PhD thesis defense on his work exploring the design of Inference Serving Systems. 🎉 Simon has been working on inference systems with me for nearly a decade -- long before most people even considered inference serving a research problem worth studying. Over that time, he helped drive inference systems projects spanning Clipper, @raydistributed Serve, and now @vllm_project. Together, these systems helped define the modern inference serving stack that powers today’s AI applications. Beyond being an exceptional researcher, Simon has also been a remarkable team and community builder, especially through his leadership on vLLM and the open-source ecosystem around it. Along with my colleagues @istoica05 and @koushik77, I am excited to see Simon leading @inferact as CEO and helping shape the future of inference systems and AI infrastructure. Congratulations, Simon!

Agentic AI is changing the rules for inference. With DeepSeek V4, NVIDIA Blackwell delivered 20x lower cost per token out of the box, running a 1.6T parameter MoE model with a 1M token context on day one. But the real story is how: NVIDIA is the only platform co-designed end-to-end across five rack-scale systems—engineered to operate as a unified AI factory rather than a collection of discrete components. That’s what enables: → Higher throughput for agentic workloads → Lower latency across multi-step reasoning loops → Sustained improvements in token economics over time As AI factories scale, cost per token becomes the metric that matters and extreme co-design is the advantage that compounds. 📗 nvda.ws/3OJ5j9F

a16z GP @BornsteinMatt on the next wave of AI x Science companies — and the founders building them. Science x AI Summit. May 13. Register: sair.foundation/events/science…



Investing in AI x Science — the panel. Mod: @JasonSCui (a16z) With: @xuezhao (Basis Set), @protokultur (M12), @jonchu (Khosla), @ttunguz (Theory), @ivzhou (Accel) Six of the sharpest AI investors. One stage. Science x AI Summit. May 13. Palo Alto. Register: sair.foundation/events/science…

Super fulfilled and energized hosting this 7-hour video hackathon with our friends at @fal @MuxHQ @zakariaornot at Overshoot. We’ve seen cool demos ranging from real-time moderation, to virtual try-on, to live podcasting, to AI monitoring the situation. It’s a good reminder of how powerful video is as a computing interface and as the most versatile medium. Thank you to our judges @joshalphonse @Christos_antono @zakariaornot. And congrats to all the prize winners! 👏









.@MaikaThoughts says modern AI models work a lot like Memento: pre-training gives them the past, and everything after that needs scaffolding. "In Memento, the main protagonist has a form of this amnesia where he cannot form new memories. He uses sticky notes where he writes some of the notes to himself. He even tattoos some of the memories that he wants to imprint." "It kind of maps one to one to AI, how AI models work today." "We have the training phase where we basically encompass all of the world's knowledge, and that part is what we call pre-training. After the training phase, we basically have the cutoff date after which point we deploy the models into the world." "The model is basically frozen." "We use retrieval mechanisms like RAGs, we have the system prompt that essentially serves as a tattoo."



SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

We @a16z are co-hosting a video hackathon with Overshoot @zakariaornot @fal @MuxHQ! 4 hours to hack on real-time video, generative media, and new workflows with the best tools. Space is limited & sign up below (approval required) 👇 w/ @JenniferHli @venturetwins @JasonSCui
