
Stage 11
42 posts

@Stage_11
We build agentic teams and meta-orchestration technology. View our code: https://t.co/EcWqki7ha0





OpenAI and Anthropic are effectively telling the market they can't solve every problem with a generic AI coworker. You don't pour billions into massive forward-deployed joint ventures if you think the next model release is going to take care of it. In the cloud supercycle, semis led and software followed (and you didn't need Qualcomm or ARM to tell you the value was migrating up the stack). In AI, the infra layer itself is telling us the application layer is a separate, massive opportunity they can't fully capture. a16z's @joeschmidtiv on why the app layer isn't dead: a16z.news/p/avoiding-dea…

The Polsia public dashboard sits at archive.ph/S5uq2, and Claude and I spent 15 minutes reading the JSON so you don't have to. What follows is what's actually on it, in plain language, with the meaning of each number stated alongside the number itself.

The headline figure is 5,010 "companies," and that word is doing more work than people realize. These are not 5,010 real businesses with paying customers and revenue lines. They are 5,010 instances of the Polsia software, each one a user account where someone spun up what the platform calls an AI operator, and the dashboard records what every one of those operators produces. Almost none of them produce anything at all.

Paid churn is 63.5 percent in 30 days, which means roughly two out of every three people who handed over money for the platform a month ago have already walked away from it. Healthy SaaS churn at this stage of company life runs in the single digits per month, which makes this number roughly ten times what a venture-grade business should be losing.

ARR is shrinking by 39 percent week over week, which is a sentence worth re-reading. The company that just announced a $30 million Series A is watching its annualized revenue line go down, meaningfully, every seven days.

Daily inference cost is $27,272, which is the spend on AI model calls keeping all 5,010 of those operators alive and producing their CEO reports every day. The cost is real, it is burning right now, and here is what that burn produces. Every operator CEO report visible in the snapshot reads the same way: zero customers, zero revenue, no shipped product, and then an optimistic plan for tomorrow that the AI writes underneath the zero-traction admission. That optimism layer, generated on top of nothing, is what the platform is selling its users.
In plain language: a founder built an app that runs LLM calls in a loop to generate CEO reports for businesses with no customers and no revenue, charged users to participate in the loop, announced a $30 million round while the underlying business burns cash and loses paying users faster than it gains them, and described the raise as one his AI ran for him (see comments to read what Claude wrote about this). The dashboard is public and he chose to leave it public, which means the receipts have been sitting on his own infrastructure the whole time. I am not a journalist and I am not auditioning to be one. I am a software engineer who reads JSON and uses AI to decipher it quickly, and in this case the JSON is the JSON.
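For readers who want to check the dashboard numbers themselves, here is a minimal sketch of the arithmetic. It assumes the figures are exactly as quoted (5,010 operators, $27,272 per day in inference, 63.5 percent 30-day paid churn, 39 percent week-over-week ARR decline). The four-week projection is a hypothetical that assumes the weekly decline rate stays constant, which the dashboard itself does not claim. The constant and function names are illustrative, not anything from Polsia.

```python
# Figures as quoted from the public Polsia dashboard snapshot.
OPERATORS = 5_010
DAILY_INFERENCE_USD = 27_272
PAID_CHURN_30D = 0.635
ARR_WEEKLY_DECLINE = 0.39


def cost_per_operator_per_day(daily_cost: float, operators: int) -> float:
    """Average inference spend per operator per day."""
    return daily_cost / operators


def annualized_cost(daily_cost: float, days: int = 365) -> float:
    """Daily spend scaled to a year, assuming it stays flat."""
    return daily_cost * days


def arr_remaining(weekly_decline: float, weeks: int) -> float:
    """Fraction of starting ARR left if the weekly decline compounds unchanged (hypothetical)."""
    return (1 - weekly_decline) ** weeks


if __name__ == "__main__":
    per_op = cost_per_operator_per_day(DAILY_INFERENCE_USD, OPERATORS)
    print(f"Per operator per day: ${per_op:.2f}")
    print(f"Annualized inference: ${annualized_cost(DAILY_INFERENCE_USD):,.0f}")
    print(f"Paid users retained after 30 days: {1 - PAID_CHURN_30D:.1%}")
    print(f"ARR left after 4 weeks: {arr_remaining(ARR_WEEKLY_DECLINE, 4):.1%}")
```

Run as a script, it prints about $5.44 per operator per day, $9,954,280 a year in inference at the current daily rate, 36.5% of paid users retained after 30 days, and 13.8% of starting ARR left after four weeks if the 39 percent weekly decline held.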

We have, as far as I can tell, no good tests of the productivity impact of the autonomous coding tools that appeared starting in December 2025. Every paper out there predates the Claude Code/Codex revolution. A huge gap in our knowledge about what is happening in coding.



1/ Some things I've learned recently running coding agents on large-scale projects. Most of this contradicts advice from 6 months ago!

We evaluated Gemini 3.5 Flash on GBA Eval. It could not build a working GBA emulator. On Piugba, the game just flashes on screen, unplayable and with no sound. Overall, it scored 6.7%.


I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build, especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. wanglun1996.github.io/blog/your-eval…

on some level, if you want civilization to ascend to a new level, you need your AIs to do things that are not legible to you and maybe not even strictly obey you, in the same way that if you hire a great new ceo you give them a lot of autonomy to transform the company according to their own plan, even one which may not immediately read as a winning strategy (imagine the board of directors of Apple firing and rehiring Steve Jobs years later, except the board of directors are chimpanzees).

all else equal, companies and organizations that hand more of themselves over to machine intelligence will outcompete ones that demand the corrigibility and legibility tax of human oversight and human design. it is not a stable equilibrium and requires some sort of vast cooperation scheme if you’d like to enforce it.

real asi alignment has to operate at a deeper level than oversight, control, or human corrigibility.

# The mistake of conflating intelligence and power

I had an interesting discussion recently. Someone asked me, what is intelligence? I said, the ability to achieve your goals across a wide range of domains. Okay, he says, then by that definition isn’t Donald Trump the most intelligent person in the world, followed in quick succession by Xi Jinping and Vladimir Putin?

To be clear, these people are obviously very competent and clever. But when you think of ASI, you don’t think of "Trump, but more so." The person who kept pressing this question was correctly pointing out that I had basically defined intelligence as power. And by this definition, Stalin was the most intelligent person who ever lived.

Now, of course, you could change the definition of intelligence to something more like "the ability to manipulate abstract concepts and rotate shapes." But notice that the most powerful people in the world do not max out this quantity. The correlation between extreme power and this kind of intelligence might be even weaker than the correlation between extreme power and height. The physicists are not running the world.

We tend to conflate power-seeking AI and superintelligent (in science and tech) AI. I’m not denying that AI can be power-seeking. Whatever skills and drives Donald Trump has could be embodied in a digital mind. I’m simply pointing out that the way AI systems are currently becoming smarter (by getting trained to be really good at specific economically valuable tasks like coding) is not that strongly correlated with power.

We often talk about power in a way that misunderstands how it is actually derived in our world. Our intuitions are primed by games like Diplomacy or Go, which are designed to isolate and reward a g-loaded kind of strategic reasoning. But in the real world, power is more the product of having the authority and trust to get lots of people to collaborate with you than of some galaxy-brain scheming capability.
Trump is not powerful because his brain, considered in isolation, is the most effective optimization engine on Earth. He is powerful because a government that hundreds of millions of people consider legitimate gives him a lot of authority.

A group-level versus individual-level analysis is useful here. As @GarettJones has written about at length, individual IQ is only modestly correlated with individual income, but national IQ is strongly correlated with national outcomes. This is because intelligence has a lot of spillover effects: smarter societies cooperate more, save more, and can coordinate to build things like space shuttles and semiconductors.

Richard Trevithick, who invented the high-pressure steam engine, died in poverty and was buried in an unmarked pauper’s grave. But the fact that 18th- and 19th-century Britain had lots and lots of people like Trevithick helped Britain set up a global empire and outcompete lots of backward principalities around the world.

It seems to me that the right mental model is that automated firms will outcompete everyone else in normal capitalist ways, rather than a single AI outthinking everyone else.



