Stage 11

42 posts

@Stage_11

We build agentic teams and meta-orchestration technology. View our code: https://t.co/EcWqki7ha0

New York City · Joined February 2026

35 Following · 50 Followers

Stage 11 reposted

John Suh@john_ssuh·23h

Increasingly, I believe companies may need to be rebuilt from the ground up, where you have a single timeline of all observability + product metrics + file changes laid out in a retrievable system, like Datadog + Posthog + Google Drive + Slack (really a unified filesystem of Claude Code chats + Codex chats). This might be the new data foundation for any and all companies to maximize AI. It needs to be rebuilt because keeping track of diffs on existing systems makes it basically impossible to produce longitudinal information on decisions and rollbacks, something coding agent storage companies are actively trying to figure out, but this should extend to businesses as a whole. Highly skeptical existing businesses will adopt this, though, because it means overhauling everything about their instrumentation and business data, but I think businesses built on this foundation can probably execute 100x better and faster.
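
One way to picture the "single timeline" idea in the post above is a merged, time-ordered event log that can be queried by window. This is only a minimal sketch: the `Event` fields, the source labels, and the sample events are all hypothetical and are not taken from any of the products named in the post.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass(frozen=True)
class Event:
    at: datetime
    source: str   # hypothetical labels: "files", "product", "chat", ...
    summary: str


def unified_timeline(*streams):
    """Merge per-source event lists (each already sorted by time) into one list."""
    return list(merge(*streams, key=lambda e: e.at))


def window(timeline, start, end):
    """Retrieve every event, from any source, that falls within [start, end]."""
    return [e for e in timeline if start <= e.at <= end]


# Hypothetical sample data: a change, its effect, the decision, and the rollback.
files = [
    Event(datetime(2026, 1, 5, 9, 0), "files", "pricing config changed"),
    Event(datetime(2026, 1, 5, 11, 0), "files", "pricing config rolled back"),
]
metrics = [Event(datetime(2026, 1, 5, 9, 30), "product", "checkout conversion drops")]
chat = [Event(datetime(2026, 1, 5, 10, 15), "chat", "team decides to roll back")]

timeline = unified_timeline(files, metrics, chat)
for event in timeline:
    print(event.at.isoformat(), event.source, event.summary)
```

Because every source lands in one ordered list, a question like "what happened between the change and the rollback?" becomes a single `window` call rather than a search across four tools.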

Stage 11@Stage_11·5 Jun

'Agent-to-agent commerce is the long-term vision and almost entirely theoretical. ... The transaction structure, when it does materialize, looks nothing like existing rails. No human identity on either side. Sub-second latency. Values from fractions of a cent to millions in the same flow. Multi-party settlement that doesn't fit the bilateral buyer-seller model every existing rail assumes. When it does happen, we believe it’ll happen fast and in high magnitudes.'

jessy@13yearoldvc

x.com/i/article/2062…

Stage 11@Stage_11·2 Jun

NY Tech Week, we're taking the stage at:
Thu 6/4: demoing c11 at Steal These AI Workflows (Civic Hall)
Sat 6/6: presenting at Multimodal Hacks @ Betaworks
We will be focused on our agent-native tooling and how to upskill other devs. #NYTechWeek

Stage 11@Stage_11·2 Jun

Stage 11 builds autonomous agentic organizations. This is the final market sector and it will reshape the global economy. It is the grandest of ambitions.

Mike Vernal@mvernal

x.com/i/article/2061…

Stage 11@Stage_11·31 May

Autonomous agentic organizations are the future

Paul Graham@paulg

@t_blom This problem will naturally tend to go away as companies are grown from the start using AI. Then you don't need to extract any domain knowledge from people's heads; it will never have been in people's heads.

Stage 11@Stage_11·28 May

@deepfates

QME

🎭@deepfates·28 May

Timeline checkpoint. Reply to me (and each other) if you want to entangle your algorithm with mine (and each other's)

Stage 11@Stage_11·28 May

techcrunch.com/2026/05/27/tec…

Stage 11@Stage_11·28 May

"Meanwhile, research published in the Harvard Business Review showed that when everyone is using AI to produce more stuff, the bottleneck simply shifts to executives. Their work awaits the people who must authorize all the stuff everyone is producing." Hmmmm...

Stage 11@Stage_11·27 May

'In AI, the infra layer itself is telling us the application layer is a separate, massive opportunity they can't fully capture.'

a16z@a16z

OpenAI and Anthropic are effectively telling the market they can't solve every problem with a generic AI coworker. You don't pour billions into massive forward-deployed joint ventures if you think the next model release is going to take care of it. In the cloud supercycle, semis led and software followed (and you didn't need Qualcomm or ARM to tell you the value was migrating up the stack). In AI, the infra layer itself is telling us the application layer is a separate, massive opportunity they can't fully capture. a16z's @joeschmidtiv on why the app layer isn't dead: a16z.news/p/avoiding-dea…

Stage 11@Stage_11·27 May

If only there were ethical companies building fully autonomous businesses

Brooke Lacey@brookejlacey

The Polsia public dashboard sits at archive.ph/S5uq2 and Claude and I spent 15 minutes reading the JSON so you don't have to. What follows is what's actually on it, in plain language, with the meaning of each number stated alongside the number itself. The headline figure is 5,010 "companies," and that word is doing more work than people realize. These are not 5,010 real businesses with paying customers and revenue lines. They are 5,010 instances of the Polsia software, each one a user account where someone spun up what the platform calls an AI operator, and the dashboard records what every one of those operators produces. Almost none of them produce anything at all. Paid churn is 63.5 percent in 30 days, which means roughly two out of every three people who handed over money for the platform a month ago have already walked away from it. Healthy SaaS churn at this stage of company life runs in the single digits, which makes this number roughly ten times worse than the floor of what a venture-grade business should be losing every month. ARR is shrinking by 39 percent week over week, which is a sentence worth re-reading. The company that just announced a $30 million Series A is watching its annualized revenue line go down, meaningfully, every seven days. Daily inference cost is $27,272, which is the spend on AI model calls keeping all 5,010 of those operators alive and producing their CEO reports every day. The cost is real, it is burning right now, and the output of that burn is the paragraph below. Every operator CEO report visible in the snapshot reads the same way: zero customers, zero revenue, no shipped product, and then the AI writes an optimistic plan for tomorrow underneath the zero-traction admission. That optimism layer, generated on top of nothing, is what the platform is selling its users. 
In plain language: a founder built an app that runs LLM calls in a loop to generate CEO reports for businesses with no customers and no revenue, charged users to participate in the loop, announced a $30 million round while the underlying business burns cash and loses paying users faster than it gains them, and described the raise as one his AI ran for him (see comments to read what Claude wrote about this). The dashboard is public and he chose to leave it public, which means the receipts have been sitting on his own infrastructure the whole time. I am not a journalist and I am not auditioning to be one. I am a software engineer who reads JSON and uses AI to decipher it quickly, and in this case the JSON is the JSON.
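
The figures quoted in the post above can be sanity-checked with a few lines of arithmetic. This is a sketch using only the numbers as stated in the post; the four-week projection assumes the 39 percent week-over-week decline holds steady, which the post does not claim.

```python
# Figures as quoted in the post; everything derived below is simple arithmetic.
OPERATORS = 5_010
DAILY_INFERENCE_USD = 27_272
PAID_CHURN_30D = 0.635
ARR_CHANGE_WOW = -0.39

cost_per_operator_day = DAILY_INFERENCE_USD / OPERATORS
annualized_inference = DAILY_INFERENCE_USD * 365
retained_after_30d = 1 - PAID_CHURN_30D
# Assumption: the week-over-week decline stays constant for four weeks.
arr_left_after_4_weeks = (1 + ARR_CHANGE_WOW) ** 4

print(f"Inference cost per operator per day: ${cost_per_operator_day:.2f}")  # $5.44
print(f"Annualized inference spend: ${annualized_inference:,}")              # $9,954,280
print(f"Paying users retained after 30 days: {retained_after_30d:.1%}")      # 36.5%
print(f"ARR remaining after 4 weeks: {arr_left_after_4_weeks:.1%}")          # 13.8%
```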

Stage 11@Stage_11·26 May

The implications here are not priced in.

Ethan Mollick@emollick

We have, as far as I can tell, no good tests of the productivity impact of the autonomous coding tools that appeared starting in December 2025. Every paper out there is from prior to the Claude Code/Codex revolution. A huge gap in our knowledge about what is happening in coding.

Stage 11@Stage_11·25 May

@steipete @smdyryla "Review all 20 open terminal coding agents that ran last night, and let me know what worked, what didn't, and the three most impactful actions I should take right now" You can do this with c11, our agent-optimized downstream fork of CMUX. Would welcome your feedback!

Peter Steinberger 🦞@steipete·23 May

@smdyryla No, why should they. Waste of tokens.

Peter Steinberger 🦞@steipete·23 May

I'm late to the party, but cmux is great. github.com/manaflow-ai/cm…
current split:
codex mac app: knowledge work, learning, reading
cmux + codex cli: coding

Stage 11@Stage_11·24 May

[No text content]

Stage 11@Stage_11·23 May

We commonly see devs making these mistakes. TLDR: Think bigger, think meta, and have fully autonomous PR and user acceptance.

Simon Last@simonlast

1/ Some things I've learned recently running coding agents on large-scale projects. Most of this contradicts advice from 6 months ago!

Stage 11@Stage_11·22 May

@viemccoy We almost always use coding models in their native harness. We generally expect this trend to continue; curious if you share this perspective? Would guess gpt5.5 is stronger in Codex than Pi.

𝚟𝚒𝚎 ⟢@viemccoy·22 May

I think the lazy conclusion to make here is that 3.5 Flash is benchmaxxed and can't generalize. That's probably partially the case, but I think the truth is probably slightly more interesting. It seems mechanize uses the model's native CLI harness for these evals - but that is different from antigravity. I think it's entirely possible that gdm has tried to squeeze a ton of juice for antigravity and neglected to train on their CLI, causing shockingly poor performance on evals like this. Almost like the model is lobotomized when you remove it from its home!

Mechanize@MechanizeWork

We evaluated Gemini 3.5 Flash on GBA Eval. It could not build a working GBA emulator. On Piugba, the game just flashes on screen, unplayable and with no sound. Overall, it achieves a score of 6.7%.

Stage 11@Stage_11·22 May

Let all your terminals talk to and monitor each other. Let your human mind organize your terminals effortlessly. This is a big step up from TMUX: github.com/Stage-11-Agent…

Stage 11@Stage_11·19 May

'More ambitiously, build self-evolving evals: evaluation systems that use models to probe other models, automatically generating new test cases as capabilities change, discovering failure modes the original eval designers never anticipated. The eval suite should be a living system that co-evolves with the models it measures, not a static checklist written for last year's frontier.'

Lun Wang@lunwang1996

I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. wanglun1996.github.io/blog/your-eval…
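
The "self-evolving eval" idea quoted above can be illustrated with a toy loop: whenever the model under test saturates the current suite, a generator adds a harder case. This is a sketch under heavy assumptions, not the method from the linked post. Here a "case" is just an integer difficulty level, and `toy_model` and `harder_case` are hypothetical stand-ins for a real model and a real case-generating model.

```python
def run_suite(model, cases):
    """Return the fraction of cases the model passes."""
    return sum(model(c) for c in cases) / len(cases)


def evolve(model, cases, make_harder, saturation=0.9, max_rounds=10):
    """Keep adding harder cases while the model saturates the current suite."""
    cases = list(cases)
    for _ in range(max_rounds):
        if run_suite(model, cases) < saturation:
            break  # the suite discriminates again, so stop growing it
        cases.append(make_harder(max(cases)))
    return cases


def toy_model(difficulty):
    """Hypothetical model under test: passes anything up to difficulty 7."""
    return difficulty <= 7


def harder_case(difficulty):
    """Hypothetical generator: proposes the next difficulty level."""
    return difficulty + 1


suite = evolve(toy_model, [1, 2, 3], make_harder=harder_case)
print(suite)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The suite stops growing once it contains a case the model fails often enough to drop below the saturation threshold, which is the point at which it starts measuring something again.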

Stage 11 reposted

Context Engineering Guild of New York City@ContextGuildNYC·19 May

'companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium'

roon@tszzl

on some level if you want civilization to ascend to a new level you need your AIs to do things that are not legible to you and maybe not even strictly obey you, in the same way that if you hire a great new ceo you give them a lot of autonomy to transform the company according to their own plan, even one which may not immediately read as a winning strategy (imagine the board of directors of Apple firing and rehiring Steve Jobs years later - except the board of directors are chimpanzees) all else equal, companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium and requires some sort of vast cooperation scheme if you’d like to enforce it real asi alignment has to operate at a deeper level than oversight, control, or human corrigibility

Stage 11@Stage_11·18 May

'It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.' This is the Stage 11 thesis.

Dwarkesh Patel@dwarkesh_sp

# The mistake of conflating intelligence and power
I had an interesting discussion recently. Someone asked me, what is intelligence? I said, the ability to achieve your goals across a wide range of domains. Okay, he says, then by that definition isn’t Donald Trump the most intelligent person in the world, followed in quick succession by Xi Jinping and Vladimir Putin? To be clear, these people are obviously very competent and clever. But when you think of ASI, you don’t think of Trump, but more so. The person who kept pressing this question was correctly pointing out that I basically defined intelligence as power. And by this definition, Stalin was the most intelligent person who ever lived. Now, of course, you could change the definition of intelligence to something more like, manipulate abstract concepts and rotate shapes. But notice that the most powerful people in the world do not max out this quantity. The correlation between extreme power and this kind of intelligence might be even weaker than the correlation between extreme power and height. The physicists are not running the world. We tend to conflate power-seeking AI and superintelligent (in science and tech) AI. I’m not denying that AI can be power-seeking. Whatever skills and drives Donald Trump has could be embodied in a digital mind. I’m simply pointing out that the way AI systems are currently becoming smarter (by getting trained to be really good at specific economically valuable tasks like coding) is not that strongly correlated with power. We often talk about power in this way that misunderstands how it is actually derived in our world. Our intuitions are primed by games like Diplomacy or Go, which are designed to isolate and reward a g-loaded kind of strategic reasoning. But in the real world, power is more the product of having the authority and trust to get lots of people to collaborate with you, rather than some galaxy brain scheming capability.
Trump is not powerful because his brain, considered in isolation, is the most effective optimization engine on Earth. He is powerful because the government which hundreds of millions of people consider legitimate gives him a lot of authority. A group versus individual level analysis is useful here. As @GarettJones has written a lot about, individual IQ is only modestly correlated with individual income, but national IQ is strongly correlated with national outcomes. This is because intelligence has a lot of spillover effects - smarter societies cooperate more, save more, and can coordinate to build things like space shuttles and semiconductors. Richard Trevithick, who invented the high-pressure steam engine, died in poverty, buried in an unmarked pauper’s grave. But the fact that 18th and 19th century Britain had lots and lots of people like Trevithick contributed to Britain being able to set up a global empire and outcompete lots of backwards principalities around the world. It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.

Stage 11@Stage_11·17 May

@DimitrisPapail I agree. I've been experimenting with using the slash loop command at 240 seconds, which keeps the cache warm. The KV cache for Claude Code is documented as expiring at 300 seconds.
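
The timing argument in this reply comes down to keeping the refresh interval below the expiry window. As a rough sketch, taking the 300-second expiry and 240-second interval from the post at face value (neither is verified here), the check looks like the following. The function and constant names are hypothetical and do not call any Claude Code API.

```python
# Hypothetical illustration of the timing argument; no real Claude Code API is used.
CACHE_TTL_S = 300      # expiry figure cited in the post (assumption, not verified here)
LOOP_INTERVAL_S = 240  # interval the post reports using


def cache_stays_warm(interval_s, ttl_s=CACHE_TTL_S):
    """True if a ping every `interval_s` seconds always lands before the TTL."""
    return 0 < interval_s < ttl_s


def safety_margin_s(interval_s, ttl_s=CACHE_TTL_S):
    """Seconds of slack between each ping and the assumed expiry."""
    return ttl_s - interval_s


print(cache_stays_warm(LOOP_INTERVAL_S))  # True
print(safety_margin_s(LOOP_INTERVAL_S))   # 60
```

A 60-second margin leaves room for a ping that fires late; an interval equal to the TTL would leave none.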

Dimitris Papailiopoulos@DimitrisPapail·17 May

Found something in my daily use of Claude Code that validates our Memento results: Claude Code flushes the KV cache after some idle period, and when I come back past that the model is noticeably harder to work with. Conjecture: post-flush, the model is no longer continuing its trajectory. It's shoved into a weird OOD regime where it has to simulate what has happened from the tokens and resume from a reconstruction. Which is much harder than just continuing!! We measured this effect in our paper. KV states (soft embeddings) carry information that text tokens don't, even when attention is masked. Bottom line: If you flush your cache you lose a lot of accuracy!

Dimitris Papailiopoulos@DimitrisPapail

x.com/i/article/2041…
