Harley Wiltzer

1

19

2.3K

Harley Wiltzer retweeted

Stefano Ermon@StefanoErmon·12 May

This is the future of coding agents: specialized, high-speed subagents handling critical workflows in the background. Proud to see @augmentcode running Mercury 2 in production for context compaction with major latency and cost improvements.

Inception@_inception_ai

@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated subagent. Mercury 2 is the highly efficient engine powering our most critical workflows." -@RustagiAnkur & @jm1234567890, Members of Technical Staff at Augment Code. The subagent layer needs the most efficient model. Full methodology and eval setup in the writeup. inceptionlabs.ai/blog/rise-of-r…

9

80

10.6K

Harley Wiltzer retweeted

Augment Code@augmentcode·12 May

At @augmentcode, we took a counter-intuitive bet on our AI architecture. Instead of using the primary coding model to preserve KV cache (the industry standard), we used Mercury 2 by @_inception_ai as a dedicated subagent. The payoff for our users: 82% faster context compaction, 90% lower summarization costs, <1s tool-search summaries, and 30% lower LLM spend via Prism routing. Read the full story here: inceptionlabs.ai/blog/rise-of-r…

8

68

8.9K

Harley Wiltzer retweeted

Inception@_inception_ai·12 May

@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated subagent. Mercury 2 is the highly efficient engine powering our most critical workflows." -@RustagiAnkur & @jm1234567890, Members of Technical Staff at Augment Code. The subagent layer needs the most efficient model. Full methodology and eval setup in the writeup. inceptionlabs.ai/blog/rise-of-r…

15

78

26.9K

Harley Wiltzer retweeted

Ben Lang@benln·10 May

Pulled the fastest-growing startups based on X follower growth over the past 90 days:

55

78

989

103.7K

Harley Wiltzer retweeted

Tomas Hernando Kofman@tomas_hk·5 May

Our model router now supports @_inception_ai's Mercury 2, the fastest code gen model in existence. Use it with Not Diamond or @OpenRouter's /auto mode. For max speeds, use the latency tradeoff in nd or the plugins param in OpenRouter to route bw Mercury and a stronger model.

3

9

24

2.4K

Harley Wiltzer retweeted

Sid Sharma@phylera14·4 May

p50: 175ms vs 686ms; p99: 517ms vs 1183ms. a top-10 US tech company benchmarked Mercury 2 from @_inception_ai against Gemini Flash on their search pipelines in prod. same tasks. same eval. diffusion LLMs are a different animal.

7

18

3.6K

Harley Wiltzer retweeted

Mohamad H. Danesh@mo_danesh·1 May

📢 New paper out! We introduce QWM: a single locomotion world model trained across 8 quadrupeds and deployed zero-shot on robots it had never seen by conditioning on their morphology specs: ANYmal-D and Unitree Go1 🦾 No fine-tuning, no warm-up, no retraining from scratch. The key insight: robot morphology isn't a latent variable to infer from motion history, it's a known engineering spec sitting in the USD (or URDF) file. So we just use it directly.

3

21

1.4K

Harley Wiltzer retweeted

Stefano Ermon@StefanoErmon·27 Apr

If your fast inference capacity is constrained, Mercury 2 is ready. Same speed class, GPU-based deployment, and enough capacity for production traffic.

Sid Sharma@phylera14

34

9.8K

Harley Wiltzer retweeted

Sid Sharma@phylera14·27 Apr

Abandoned by Cerebras and your prod traffic has nowhere to go? Mercury 2 is still here: >1,000 tok/s, Haiku/Flash-level quality, $0.75 / 1M output tokens. Fast inference shouldn’t disappear just because your provider changed strategy. DM: sid@inceptionlabs.ai

8

32

13.8K

Harley Wiltzer@harwiltz·26 Apr

@ArthurGretton Last two MMD papers? Should I be happy or sad? Looks great!

0

1

132

Arthur Gretton@ArthurGretton·26 Apr

Talk on Optimized MMD for Detecting Distribution Shift at the #ICLR26 workshop: sites.google.com/view/iclr-2026… on 26 April, 11:15am ... covering the last two MMD papers you'll ever need! proceedings.neurips.cc/paper_files/pa… jmlr.org/papers/v24/21-…

7

68

3.5K

Harley Wiltzer retweeted

Nishanth Anand@itsNVA7·25 Apr

Our new paper uses RL 🤖 to break VLMs! 🔨 Because our reward signal is non-stationary, the RL agent is forced to constantly evolve, uncovering 36 novel failure modes ⚡️ that human-designed benchmarks overlooked. Paper: arxiv.org/pdf/2604.04733 Full details👇

Kanishk Jain@kanji1011

Excited to share our new paper: "Discovering Failure Modes of Vision-Language Models using Reinforcement Learning"

3

24

2.3K

Harley Wiltzer retweeted

Inception@_inception_ai·23 Apr

Mercury 2 is in a league of its own. 1,200 tok/s at comparable quality to speed-optimized autoregressive models, per @ArtificialAnlys.

8

12

152

12.4K

Harley Wiltzer@harwiltz·17 Apr

@RayZiyan41307 Very well said :)

2

35

Ziyan "Ray" Luo@RayZiyan41307·16 Apr

Nishanth’s expertise in RL is exceptional—precise, insightful, and reliable. A true contributor to both research and community, and an outstanding teacher who taught me a lot. Congrats, my friend, glad to be part of this beautiful photo.

Nishanth Anand@itsNVA7

Thrilled to share that on March 20th, I defended my Ph.D. with flying colours! 🎨🎓🧠 My thesis, "The permanent and transient framework for continual RL," had an amazing reception from my committee: Peter Dayan, Marc Bellemare, Paul Masset, and Doina Precup.

0

16

1K

Harley Wiltzer retweeted

Inception@_inception_ai·16 Apr

Artificial Analysis launched a Model Recommender. Set your priorities for intelligence, speed, and cost, and it ranks the best models for your stack. Mercury 2 ranks first. See the full ranking on @ArtificialAnlys: artificialanalysis.ai/models/recomme…

8

22

130

23.6K

Harley Wiltzer retweeted

Inception@_inception_ai·14 Apr

1,000+ tokens per second. 10x faster than autoregressive models. On standard GPUs. @StefanoErmon and @volokuleshov break down where that speed matters most: voice agents, coding, and production agent systems where latency compounds across every call. Our founder series with @timt at @MenloVentures.

4

9

43

6.5K

Harley Wiltzer retweeted

Inception@_inception_ai·8 Apr

"That's why I think it's going to be the architecture of the future." @StefanoErmon on why diffusion will replace autoregressive models for language. Now on NVIDIA On-Demand, moderated by @liu_mingyu, VP of Research at @nvidia. #GTC26 nvidia.com/en-us/on-deman…

5

25

1.4K

Harley Wiltzer retweeted

Inception@_inception_ai·7 Apr

An Evening with Inception x @iendeavors - Tues, April 14, Palo Alto. @StefanoErmon and the Inception team. Drinks, bites, and conversation about diffusion LLMs. No talks. No slides. We'd love to meet researchers, engineers, and students who are curious about dLLMs and where they're headed. Space is limited - invite here: luma.com/5p7cz4tq

9

63

8.8K

Harley Wiltzer retweeted

Lucas Bunzel@LBunzel·6 Apr

Everyone's talking about which model to run in OpenClaw. Looked at the PinchBench data. Step 3.5 Flash is the most popular model on OpenRouter — 3T+ tokens in March. Here's how it compares to Mercury 2 on real agent tasks (Step 3.5 Flash → Mercury 2): Speed: 62 → 96; Cost efficiency: 94 → 99; Consistency: 89 → 90. Comparable task success to GPT-5 Mini, GPT-4o, Gemini Flash, and DeepSeek Chat. Agents chain 20+ tool calls per task. Latency compounds. Cost compounds. That's where this gap matters most. pinchbench.com/?view=graphs&g…
