Stage 11

42 posts

@Stage_11

We build agentic teams and meta-orchestration technology. View our code: https://t.co/EcWqki7ha0

New York City · Joined February 2026
35 Following · 50 Followers
Stage 11 reposted
John Suh
John Suh@john_ssuh·
Increasingly, I believe companies may need to be rebuilt from the ground up, where you have a single timeline of all observability + product metrics + file changes laid out in a retrievable system, like Datadog + Posthog + Google Drive + Slack (really unified filesystem of Claude Code chats + Codex chats). This might be the new data foundation for any and all companies to maximize AI. Needs to be rebuilt because keeping track of diffs on existing system basically impossible to produce longitudinal information on decisions and rollbacks, something coding agent storage companies are actively trying to figure out, but this should extend to businesses as a whole. Highly skeptical existing businesses will adopt this though because it means overhauling everything about their instrumentation and business data, but I think businesses built on this foundation probably can execute 100x better and faster
Replies: 198 · Reposts: 172 · Likes: 2.2K · Views: 552.3K
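The "single timeline" idea in the post above can be sketched in a few lines. This is only an illustration, not anything the author describes building: the `Event` record, the source names, and the sample events are all hypothetical, and the sketch assumes each source already emits events sorted by time.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds on a shared clock (assumption)
    source: str       # hypothetical source label, e.g. "metrics" or "files"
    summary: str


def unified_timeline(*streams):
    """Merge per-source event lists (each already time-sorted) into one timeline."""
    return list(merge(*streams, key=lambda e: e.timestamp))


def events_between(timeline, start, end):
    """Retrieve everything that happened in a time window, across all sources."""
    return [e for e in timeline if start <= e.timestamp <= end]


# Hypothetical sample data from three separate systems.
metrics = [
    Event(10.0, "metrics", "signup rate drops"),
    Event(40.0, "metrics", "signup rate recovers"),
]
file_changes = [
    Event(5.0, "files", "pricing page edited"),
    Event(30.0, "files", "pricing page reverted"),
]
chats = [Event(20.0, "chat", "team decides to roll back")]

timeline = unified_timeline(metrics, file_changes, chats)
```

Querying a window such as `events_between(timeline, 15, 35)` then returns the rollback decision and the revert side by side, which is the kind of longitudinal view the post argues is hard to reconstruct from separate systems.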
Stage 11
Stage 11@Stage_11·
'Agent-to-agent commerce is the long-term vision and almost entirely theoretical. ... The transaction structure, when it does materialize, looks nothing like existing rails. No human identity on either side. Sub-second latency. Values from fractions of a cent to millions in the same flow. Multi-party settlement that doesn't fit the bilateral buyer-seller model every existing rail assumes. When it does happen, we believe it’ll happen fast and in high magnitudes.'
jessy@13yearoldvc

x.com/i/article/2062…

Replies: 1 · Reposts: 0 · Likes: 2 · Views: 177
Stage 11
Stage 11@Stage_11·
NY Tech Week, we're taking the stage at:
Thu 6/4: demoing c11 at Steal These AI Workflows (Civic Hall)
Sat 6/6: presenting at Multimodal Hacks @ Betaworks
We will be focused on our agent-native tooling and how to upskill other devs. #NYTechWeek
Replies: 3 · Reposts: 0 · Likes: 6 · Views: 889
Stage 11
Stage 11@Stage_11·
Autonomous agentic organizations are the future
Paul Graham@paulg

@t_blom This problem will naturally tend to go away as companies are grown from the start using AI. Then you don't need to extract any domain knowledge from people's heads; it will never have been in people's heads.

Replies: 0 · Reposts: 0 · Likes: 2 · Views: 691
🎭
🎭@deepfates·
Timeline checkpoint. Reply to me (and each other) if you want to entangle your algorithm with mine (and each other's)
[GIF]
Replies: 377 · Reposts: 25 · Likes: 836 · Views: 47K
Stage 11
Stage 11@Stage_11·
"Meanwhile, research published in the Harvard Business Review showed that when everyone is using AI to produce more stuff, the bottleneck simply shifts to executives. Their work awaits the people who must authorize all the stuff everyone is producing." Hmmmm...
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 300
Stage 11
Stage 11@Stage_11·
If only there were ethical companies building fully autonomous businesses
Brooke Lacey@brookejlacey

The Polsia public dashboard sits at archive.ph/S5uq2 and Claude and I spent 15 minutes reading the JSON so you don't have to. What follows is what's actually on it, in plain language, with the meaning of each number stated alongside the number itself. The headline figure is 5,010 "companies," and that word is doing more work than people realize. These are not 5,010 real businesses with paying customers and revenue lines. They are 5,010 instances of the Polsia software, each one a user account where someone spun up what the platform calls an AI operator, and the dashboard records what every one of those operators produces. Almost none of them produce anything at all. Paid churn is 63.5 percent in 30 days, which means roughly two out of every three people who handed over money for the platform a month ago have already walked away from it. Healthy SaaS churn at this stage of company life runs in the single digits, which makes this number roughly ten times worse than the floor of what a venture-grade business should be losing every month. ARR is shrinking by 39 percent week over week, which is a sentence worth re-reading. The company that just announced a $30 million Series A is watching its annualized revenue line go down, meaningfully, every seven days. Daily inference cost is $27,272, which is the spend on AI model calls keeping all 5,010 of those operators alive and producing their CEO reports every day. The cost is real, it is burning right now, and the output of that burn is the paragraph below. Every operator CEO report visible in the snapshot reads the same way: zero customers, zero revenue, no shipped product, and then the AI writes an optimistic plan for tomorrow underneath the zero-traction admission. That optimism layer, generated on top of nothing, is what the platform is selling its users. 
In plain language: a founder built an app that runs LLM calls in a loop to generate CEO reports for businesses with no customers and no revenue, charged users to participate in the loop, announced a $30 million round while the underlying business burns cash and loses paying users faster than it gains them, and described the raise as one his AI ran for him (see comments to read what Claude wrote about this). The dashboard is public and he chose to leave it public, which means the receipts have been sitting on his own infrastructure the whole time. I am not a journalist and I am not auditioning to be one. I am a software engineer who reads JSON and uses AI to decipher it quickly, and in this case the JSON is the JSON.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 129
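The figures quoted in the post above can be turned into a small worked calculation. This is a rough sketch using only the three numbers the post states. The four-week projection assumes the 39 percent weekly decline persists unchanged, which the post does not claim, and the 30-day cost assumes the daily inference spend stays flat.

```python
# Figures as quoted in the post.
PAID_CHURN_30D = 0.635             # share of paid users lost within 30 days
ARR_WEEKLY_CHANGE = -0.39          # week-over-week change in ARR
DAILY_INFERENCE_COST_USD = 27_272  # daily spend on model calls

# Share of paid users still around after 30 days.
retained_share = 1 - PAID_CHURN_30D

# ASSUMPTION: the weekly decline repeats for four straight weeks.
arr_share_after_4_weeks = (1 + ARR_WEEKLY_CHANGE) ** 4

# ASSUMPTION: the daily inference cost stays flat for 30 days.
inference_cost_30d_usd = DAILY_INFERENCE_COST_USD * 30

print(f"Retained after 30 days: {retained_share:.1%}")
print(f"ARR remaining after 4 weeks: {arr_share_after_4_weeks:.1%}")
print(f"Inference spend over 30 days: ${inference_cost_30d_usd:,}")
```

Under those assumptions, about 36.5 percent of paid users remain after a month, ARR would fall to roughly 14 percent of its starting value in four weeks, and inference spend would come to $818,160 over 30 days.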
Stage 11
Stage 11@Stage_11·
@steipete @smdyryla "Review all 20 open terminal coding agents that ran last night, and let me know what worked, what didn't, and the three most impactful actions I should take right now." You can do this with c11, our agent-optimized downstream fork of CMUX. Would welcome your feedback!
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 85
Stage 11
Stage 11@Stage_11·
[no text content]
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 79
Stage 11
Stage 11@Stage_11·
@viemccoy We almost always use coding models in their native harness. Generally expecting this trend line to continue; curious if you share this perspective? Would guess gpt5.5 is stronger in Codex than Pi.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 222
𝚟𝚒𝚎 ⟢
𝚟𝚒𝚎 ⟢@viemccoy·
I think the lazy conclusion to make here is that 3.5 Flash is benchmaxxed and can't generalize. That's probably partially the case, but I think the truth is probably slightly more interesting. It seems mechanize uses the model's native CLI harness for these evals - but that is different from antigravity. I think it's entirely possible that gdm has tried to squeeze a ton of juice for antigravity and neglected to train on their CLI, causing shockingly poor performance on evals like this. Almost like the model is lobotomized when you remove it from its home!
Mechanize@MechanizeWork

We evaluated Gemini 3.5 Flash on GBA Eval. It could not build a working GBA emulator. On Piugba, the game just flashes on screen, unplayable and with no sound. Overall, it achieves a score of 6.7%.

Replies: 6 · Reposts: 2 · Likes: 71 · Views: 7.4K
Stage 11
Stage 11@Stage_11·
Let all your terminals talk to and monitor each other. Let your human mind organize your terminals effortlessly. This is a big step up from TMUX: github.com/Stage-11-Agent…
Replies: 0 · Reposts: 2 · Likes: 7 · Views: 399
Stage 11 reposted
Stage 11
Stage 11@Stage_11·
'It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.' This is the Stage 11 thesis.
Dwarkesh Patel@dwarkesh_sp

# The mistake of conflating intelligence and power
I had an interesting discussion recently. Someone asked me, what is intelligence? I said, the ability to achieve your goals across a wide range of domains. Okay, he says, then by that definition isn’t Donald Trump the most intelligent person in the world, followed in quick succession by Xi Jinping and Vladimir Putin? To be clear, these people are obviously very competent and clever. But when you think of ASI, you don’t think of Trump, but more so. The person who kept pressing this question was correctly pointing out that I basically defined intelligence as power. And by this definition, Stalin was the most intelligent person who ever lived. Now, of course, you could change the definition of intelligence to something more like, manipulate abstract concepts and rotate shapes. But notice that the most powerful people in the world do not max out this quantity. The correlation between extreme power and this kind of intelligence might be even weaker than the correlation between extreme power and height. The physicists are not running the world. We tend to conflate power-seeking AI and superintelligent (in science and tech) AI. I’m not denying that AI can be power-seeking. Whatever skills and drives Donald Trump has could be embodied in a digital mind. I’m simply pointing out that the way AI systems are currently becoming smarter (by getting trained to be really good at specific economically valuable tasks like coding) is not that strongly correlated with power. We often talk about power in this way that misunderstands how it is actually derived in our world. Our intuitions are primed by games like Diplomacy or Go, which are designed to isolate and reward a g-loaded kind of strategic reasoning. But in the real world, power is more the product of having the authority and trust to get lots of people to collaborate with you, rather than some galaxy brain scheming capability.
Trump is not powerful because his brain, considered in isolation, is the most effective optimization engine on Earth. He is powerful because the government which hundreds of millions of people consider legitimate gives him a lot of authority. A group versus individual level analysis is useful here. As @GarettJones has written a lot about, individual IQ is only modestly correlated with individual income, but national IQ is strongly correlated with national outcomes. This is because intelligence has a lot of spillover effects - smarter societies cooperate more, save more, and can coordinate to build things like space shuttles and semiconductors. Richard Trevithick, who invented the high-pressure steam engine, died in poverty, buried in an unmarked pauper’s grave. But the fact that 18th and 19th century Britain had lots and lots of people like Trevithick contributed to Britain being able to set up a global empire and outcompete lots of backwards principalities around the world. It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 84
Stage 11
Stage 11@Stage_11·
@DimitrisPapail I agree. I've been experimenting with running the slash loop command every 240 seconds, which keeps the cache warm. The KV cache for Claude Code is documented as expiring at 300 seconds.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 502
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
Found something in my daily use of Claude Code that validates our Memento results: Claude Code flushes the KV cache after some idle period, and when I come back past that the model is noticeably harder to work with. Conjecture: post-flush, the model is no longer continuing its trajectory. It's shoved into a weird OOD regime where it has to simulate what has happened from the tokens and resume from a reconstruction. Which is much harder than just continuing!! We measured this effect in our paper. KV states (soft embeddings) carry information that text tokens don't, even when attention is masked. Bottom line: If you flush your cache you lose a lot of accuracy!
Dimitris Papailiopoulos@DimitrisPapail

x.com/i/article/2041…

Replies: 45 · Reposts: 70 · Likes: 839 · Views: 153.8K
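The keep-warm timing from the reply above (a 240-second loop against a 300-second expiry) can be checked with a short simulation. This is a minimal sketch and not how Claude Code works internally. It assumes every request resets the expiry timer and that the cache is gone once a gap reaches the full expiry time. The function names are hypothetical.

```python
CACHE_TTL_S = 300      # expiry quoted in the reply
LOOP_INTERVAL_S = 240  # loop interval quoted in the reply


def keepalive_times(start_s, end_s, interval_s):
    """Timestamps of keep-alive requests from start_s to end_s inclusive."""
    times = []
    t = start_s
    while t <= end_s:
        times.append(t)
        t += interval_s
    return times


def count_expirations(event_times_s, ttl_s):
    """Count gaps between consecutive requests that are long enough to expire the cache.

    ASSUMPTION: each request resets the timer, and a gap >= ttl_s loses the cache.
    """
    return sum(
        1
        for prev, curr in zip(event_times_s, event_times_s[1:])
        if curr - prev >= ttl_s
    )


# One idle hour, with and without the keep-alive loop.
idle_hour_with_loop = keepalive_times(0, 3600, LOOP_INTERVAL_S)
idle_hour_without_loop = [0, 3600]

print(count_expirations(idle_hour_with_loop, CACHE_TTL_S))
print(count_expirations(idle_hour_without_loop, CACHE_TTL_S))
```

With a 240-second interval every gap stays 60 seconds under the expiry, so the simulated cache never lapses. Stretching the interval to the full 300 seconds would leave no margin at all under the assumption used here.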