Lucas Bunzel

316 posts

@LBunzel

Marketing @_inception_ai

San Francisco · Joined June 2011
203 Following · 158 Followers
Lucas Bunzel retweeted
Inception (@_inception_ai)
Day 2 at @MLSysConf. Thanks to everyone who came by yesterday. The conversations on diffusion for language, the future of language models, and what fast inference unlocks have been the highlight. Come find us at the booth today and meet the team behind Mercury 2. And join us tonight for drinks. 🔗 luma.com/9rw9lx31
Replies 0 · Reposts 2 · Likes 19 · Views 2.1K
Lucas Bunzel retweeted
Inception (@_inception_ai)
We're at @MLSysConf in Seattle! Catch our co-founder and Chief Scientist @volokuleshov on stage today at 2:30pm. Learn more about diffusion LLMs and how Mercury 2 hits >1,000 tok/s on standard GPUs, at comparable quality to speed-optimized autoregressive models. Swing by the booth after to meet the team.
Replies 0 · Reposts 2 · Likes 16 · Views 6.4K
Lucas Bunzel retweeted
Inception (@_inception_ai)
Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the team at Clyep for the breakdown.
Replies 15 · Reposts 25 · Likes 310 · Views 20.9K
Lucas Bunzel retweeted
Inception (@_inception_ai)
Inception is heading to #MLSys2026 in Seattle next week. Two things worth your time:
1️⃣ Mon 5/18 at 2pm: lightning talk from @volokuleshov, co-founder of Inception. Come hear about a new generation of training and inference for diffusion-based language models.
2️⃣ Tues 5/19 evening: drinks + conversations with @akashpalrecha98, @apoorv_umang, @sawyerbirnbaum, and the team.
👇 Luma RSVP below
Replies 2 · Reposts 4 · Likes 33 · Views 3.2K
Lucas Bunzel retweeted
Sid Sharma (@phylera14)
Inception is hiring a Head of Product.
This is a hands-on role for a technical product lead who wants to help build the next generation of LLMs. You'd work directly with S-tier AI researchers at the frontier of model architecture, inference, and enterprise deployment.
We're one of the only AI labs where the product is live in production with enterprises and AI-native companies today - and the valuation is at a stage where your equity has real upside (not financial advice).
The bar is high. The role is not a walk in the park. But if you’ve been watching the frontier AI labs from the sidelines and waiting for the seat where you can help build foundational AI infrastructure before the category is obvious, this is it.
DM me. Bay Area only. jobs.gem.com/inception/am9i…
Replies 2 · Reposts 2 · Likes 14 · Views 1.7K
Lucas Bunzel retweeted
Augment Code (@augmentcode)
At @augmentcode, we took a counter-intuitive bet on our AI architecture. Instead of using the primary coding model to preserve KV cache (the industry standard), we used Mercury 2 by @_inception_ai as a dedicated subagent. The payoff for our users: 82% faster context compaction, 90% lower summarization costs, <1s tool-search summaries, and 30% lower LLM spend via Prism routing. Read the full story here: inceptionlabs.ai/blog/rise-of-r…
Quoting Inception (@_inception_ai):
@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated subagent. Mercury 2 is the highly efficient engine powering our most critical workflows." -@RustagiAnkur & @jm1234567890, Members of Technical Staff at Augment Code The subagent layer needs the most efficient model. Full methodology and eval setup in the writeup. inceptionlabs.ai/blog/rise-of-r…
Replies 0 · Reposts 8 · Likes 68 · Views 8.9K
Lucas Bunzel retweeted
Stefano Ermon (@StefanoErmon)
This is the future of coding agents: specialized, high-speed subagents handling critical workflows in the background. Proud to see @augmentcode running Mercury 2 in production for context compaction with major latency and cost improvements.
Quoting Inception (@_inception_ai): the @augmentcode context compaction post quoted in full above.
Replies 2 · Reposts 9 · Likes 80 · Views 10.6K
Lucas Bunzel retweeted
Sid Sharma (@phylera14)
The best AI agents in production aren't one model. They're 5-10 specialized subagents running in parallel, each matched to the right task/cost/speed tradeoff. @augmentcode's architecture is one of the cleanest examples of this shift. We wrote up how they do it.
Quoting Inception (@_inception_ai): the @augmentcode context compaction post quoted in full above.
Replies 1 · Reposts 1 · Likes 2 · Views 179
Lucas Bunzel retweeted
Ankur Rustagi (@RustagiAnkur)
At Augment, we aren’t tied to a single provider, which gives us the freedom to prioritize models with optimal speed and cost-efficiency for our users. Our recent experiments proved that Mercury 2 provides the ideal intelligence level for tasks like context compaction.
Quoting Inception (@_inception_ai): the @augmentcode context compaction post quoted in full above.
Replies 1 · Reposts 3 · Likes 13 · Views 1.7K
Lucas Bunzel retweeted
Inception (@_inception_ai)
@augmentcode rebuilt their context compaction layer around Mercury 2.
82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today.
"We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated subagent. Mercury 2 is the highly efficient engine powering our most critical workflows."
-@RustagiAnkur & @jm1234567890, Members of Technical Staff at Augment Code
The subagent layer needs the most efficient model. Full methodology and eval setup in the writeup.
inceptionlabs.ai/blog/rise-of-r…
Replies 2 · Reposts 15 · Likes 78 · Views 26.9K
Lucas Bunzel retweeted
Inception (@_inception_ai)
What does it take to build a frontier AI lab from scratch? The team co-invented diffusion models, flash attention, decision transformers, and DPO. Part 5 of our founder story series with @timt at @MenloVentures on building the culture. Featuring @StefanoErmon, @adityagrover_, @volokuleshov
Replies 0 · Reposts 3 · Likes 27 · Views 2.2K
Michael Xu (@MichaelXu25)
@LBunzel @StefanoErmon @StartupGrind Fascinating keynote. The framing of efficiency, rather than raw capability alone, as a central constraint for high-volume agentic workloads is very compelling. Was the talk recorded? I would be very interested in watching the full keynote. @LBunzel @StefanoErmon
Replies 1 · Reposts 0 · Likes 1 · Views 64
Lucas Bunzel (@LBunzel)
"Beyond autoregressive: why diffusion is the future of language models" @StefanoErmon's keynote at @startupgrind yesterday. Fully packed Fox Theatre. Mercury 2 is hitting >1,000 tok/sec on standard GPUs at a fraction of the cost, comparable quality to frontier speed-optimized models. Diffusion. Parallel token generation. His closing line: the question isn't which model is smartest, it's which model is most efficient, without sacrificing quality, on the highest-volume tasks. When agents make 50 LLM calls per task, latency is the product. @_inception_ai
Replies 2 · Reposts 11 · Likes 34 · Views 5.2K
Lucas Bunzel retweeted
Inception (@_inception_ai)
"Beyond autoregressive: why diffusion is the future of language models." @StefanoErmon's keynote at @StartupGrind last week at the Fox Theatre. Your AI product doesn't make one model call per session. It makes thousands. Most run quietly under the hood. The work that keeps the agent moving. Using a frontier model for all of it means paying frontier pricing for work that doesn't need frontier intelligence. Mercury 2 is built for that work: >1,000 tokens/sec on standard GPUs, comparable quality to frontier speed-optimized models for a fraction of the cost. The question isn't which model is smartest. It's which model is most efficient, on the highest-volume tasks. @adityagrover_ @volokuleshov
Replies 2 · Reposts 5 · Likes 31 · Views 1.6K
Lucas Bunzel retweeted
Tomas Hernando Kofman (@tomas_hk)
Our model router now supports @_inception_ai's Mercury 2, the fastest code gen model in existence. Use it with Not Diamond or @OpenRouter's /auto mode. For max speeds, use the latency tradeoff in Not Diamond or the plugins param in OpenRouter to route between Mercury and a stronger model.
Replies 3 · Reposts 9 · Likes 24 · Views 2.4K
Lucas Bunzel retweeted
Sid Sharma (@phylera14)
p50: 175ms vs 686ms
p99: 517ms vs 1183ms
a top-10 US tech company benchmarked Mercury 2 from @_inception_ai against Gemini Flash on their search pipelines in prod. same tasks. same eval.
diffusion LLMs are a different animal.
Replies 0 · Reposts 7 · Likes 18 · Views 3.6K
Lucas Bunzel retweeted
Inception (@_inception_ai)
Turn your sound on 🔊 @joycech3n asked Mercury 2 to plan a Friday happy hour bar crawl for the Inception team. It reasoned, called tools, and picked spots, all at the speed of a normal conversation. Real-time voice agents weren't possible with autoregressive latency. They are now. Try Mercury 2 in your voice agent stack today → platform.inceptionlabs.ai TTS by @elevenlabs
Replies 5 · Reposts 6 · Likes 23 · Views 3.4K
Lucas Bunzel retweeted
Inception (@_inception_ai)
In 2014, generative models were a niche people didn't think would work. @StefanoErmon kept betting on them anyway. His group at Stanford went on to co-invent diffusion, the technique behind Sora and Midjourney. Then he made the bet that the same approach could work for text and code, and brought in @adityagrover_ and @volokuleshov to start the company. The full arc, by @Si_Campbell_ in @SFBusinessTimes: bizjournals.com/sanfrancisco/n…
Replies 0 · Reposts 11 · Likes 45 · Views 4.7K