Renato Azevedo Sant Anna

984 posts


@renatoaz

Christian, Brazilian, AI Blogger - Digital Business & Insights Consultant | Mentor at FastCapital. I can guide your Content Strategy to produce tangible results.

Sao Paulo, Brazil · Joined August 2008
3.6K Following · 465 Followers
Renato Azevedo Sant Anna retweeted
GeniusThinking
GeniusThinking@GeniusGTX·
6,000 executives surveyed. Most said AI has zero measurable productivity impact. Amazon's ex-AI chief runs 100 agents while she sleeps. The gap isn't AI vs. humans. It's humans with AI vs. humans without.

Here's why Allie K. Miller says you have 12 months: Miller ran machine learning at AWS. TIME100 AI. Two million followers. She runs 36 workflows with roughly 100 agents, around the clock.

Miller told Silicon Valley Girl: "The productivity gap between people who build this system and people who don't is already 2-10x." Her timeline: 12 months. Then it's permanent.

Anthropic's data: AI could handle 94% of computer and math tasks. Actual adoption: 33%. At one $50B company: "Class divide in real time." 60% AI-native. The rest barely touch it.

The executives aren't wrong. Most companies haven't changed. But the gap isn't forming at the company level. It's forming between individuals. The divide won't be AI replacing workers. It'll be workers with AI replacing workers without.

What AI skill are you building right now? Adoption asymmetry: the tool matters less than who uses it.

I made a free toolkit breaking down 100+ mental models used by history's greatest thinkers. 5,000+ downloads. 113 five-star reviews. Grab your free copy here: besuperhuman.gumroad.com/l/mentalmodels

If you're new here, @GeniusGTX is a gallery for the greatest minds in economics, psychology, and history. Follow along for more similar content.

— Allie K. Miller, Silicon Valley Girl | Data: Anthropic
English · 5 replies · 17 reposts · 107 likes · 30.1K views

Renato Azevedo Sant Anna retweeted
Javier Vadell
Javier Vadell@Vadell_Javier·
China 🇨🇳 leads global innovation. Four Chinese companies are among the world's 10 largest patent applicants for 2025. China 🇨🇳 is also the world's largest holder of AI patents, owning 60% of global AI patents, demonstrating strong growth in technology. Huawei, one of the companies most heavily sanctioned by the 🇺🇸, leads by a wide margin.
Javier Vadell tweet media
Português · 1 reply · 8 reposts · 37 likes · 754 views

Renato Azevedo Sant Anna retweeted
Philosophy Monk
Philosophy Monk@PhilosophyMonk·
2. They Treat Saving Like a Bill
Philosophy Monk tweet media
English · 1 reply · 7 reposts · 52 likes · 8.2K views

Renato Azevedo Sant Anna retweeted
Syed Ijlal Hussain
Syed Ijlal Hussain@sijlalhussain·
📍 AI adoption is not driven by access. It is driven by managerial support. As recent analysis highlights, employees with high managerial support are far more likely to use AI frequently, across both private and public sectors. This is not a tooling gap. It is a leadership behavior gap.

1️⃣ Authority Shift: Managers determine whether AI becomes part of daily work. Adoption follows leadership endorsement, not just availability.

2️⃣ Governance Gap: Organizations invest in tools but underinvest in manager enablement. Policies exist, but usage depends on local leadership signals.

3️⃣ Scaling Constraint: Without consistent managerial support, adoption remains uneven. Some teams integrate AI deeply, while others barely use it.

This is why AI adoption varies significantly within the same organization. The real challenge is not deploying AI tools. It is aligning managers to actively support and normalize their use. via Gallup buff.ly/BGfQJc9

@liontarakos @corixpartners @Transform_Sec @Corix_JC @ILoveBooks786 @COSTESLionelEr @ramonvidall @RLDI_Lamy @FrRonconi @timo_vi @Nicochan33 @NathaliaLeHen @TCyberCast @arigatou163 @VivMilanoFSL @MathildaLoco @faryus88 @ricardo_ik_ahau @sulefati7 @bociek191905 @ozsilverfox @BCAgroup @sonu_monika @luengo1958 @DioOmega @bulbi59 @chidambara09 @EduFirst @jornalistavitor @9SManagement @drsharwood @pchamard @bbailey39 @Yash_ai6 @Howie7951 @yd_engoue @rameshambastha
Syed Ijlal Hussain tweet media
English · 6 replies · 29 reposts · 38 likes · 965 views

Renato Azevedo Sant Anna retweeted
Turing Post
Turing Post@TheTuringPost·
It's fascinating how little of Claude Code is actually "intelligence." This study found a tiny reasoning core wrapped in massive infrastructure, and even quantifies it.

→ Only ~1.6% of the system is actual decision logic, while ~98.4% is operational harness:
- ~512K lines
- 1,884 files
- seven permission modes
- 54 tools
- 27 hooks
- five context-compression layers
- isolated subagents and append-only transcripts

The harness is the real innovation: safety, memory, delegation, and recovery, not just the LLM. Everything goes through a controlled interface where the harness enforces permissions, validates actions, and shapes behavior.
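The "controlled interface" pattern the post describes can be sketched in miniature. This is an illustrative toy, not Claude Code's actual internals; the `Harness` class, the mode names, and the tools below are invented for the example:

```python
# Minimal sketch of a harness-style controlled interface: every tool call
# passes through a permission check before the model's requested action
# runs, and every outcome is recorded in an append-only transcript.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    allowed: dict = field(default_factory=dict)     # permission mode -> allowed tool names
    tools: dict = field(default_factory=dict)       # tool name -> callable
    transcript: list = field(default_factory=list)  # append-only log of actions

    def register(self, name: str, fn: Callable, modes: set):
        self.tools[name] = fn
        for mode in modes:
            self.allowed.setdefault(mode, set()).add(name)

    def invoke(self, mode: str, name: str, **kwargs):
        # Enforce permissions before anything executes.
        if name not in self.allowed.get(mode, set()):
            self.transcript.append(("denied", mode, name))
            raise PermissionError(f"{name!r} not allowed in mode {mode!r}")
        result = self.tools[name](**kwargs)
        self.transcript.append(("ok", mode, name, result))
        return result

h = Harness()
h.register("read_file", lambda path: f"<contents of {path}>", modes={"plan", "edit"})
h.register("run_shell", lambda cmd: f"ran {cmd}", modes={"edit"})

print(h.invoke("edit", "run_shell", cmd="ls"))     # the tool runs in edit mode
try:
    h.invoke("plan", "run_shell", cmd="rm -rf /")  # but is blocked in plan mode
except PermissionError as e:
    print("blocked:", e)
```

The point of the sketch: the model never touches a tool directly; the harness decides what runs, in which mode, and keeps the record.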
Turing Post tweet media
English · 21 replies · 45 reposts · 224 likes · 23.9K views

Renato Azevedo Sant Anna retweeted
Alex Finn
Alex Finn@AlexFinn·
The new Claude Code desktop app is sick. You NEED to be testing this. Fully customizable interface, multitasking, organized by project, built-in routines, integrated with Cowork and chat.

Here's what I'd set up first:
• Start a session in each of your projects so they're all listed in the sidebar
• Customize your right-hand sidebar. I like to have open tasks and the plan so I can watch the agent work
• Set up a routine so the agent reviews your recent commits every night. Have it look for bugs to fix
• Pin your most important sessions

Been having a blast coding with it the last couple of hours. Definitely feeling a lot more productive. It's critical you try all the new tools and updates when they come out.
Alex Finn tweet media
Claude@claudeai

We've redesigned Claude Code on desktop. You can now run multiple Claude sessions side by side from one window, with a new sidebar to manage them all.

English · 124 replies · 59 reposts · 972 likes · 124.2K views

Renato Azevedo Sant Anna retweeted
Charly Wargnier
Charly Wargnier@DataChaz·
🚨 The "AI Layoff Trap" has been mathematically proven by UPenn & BU researchers. They warn that replacing workers with AI will trigger an economic collapse, and CEOs are stuck in a Prisoner's Dilemma.

100K+ tech layoffs in 2025. 52,000 more in early 2026. IBM & Salesforce are already doing it. Automate, and you survive the short term. Don't automate, and competitors kill you. But if EVERYONE automates? Revenue collapses because unemployed people can't buy products 📉

Skeptics cite 4.3% unemployment, but a Quinnipiac poll shows 70% of us see the writing on the wall. The researchers proved UBI and profit taxes won't fix this demand trap. The only actual solution? A Pigouvian "robot tax" on automation.

Are we taxing the robots, or are we riding this game theory straight into a depression? 🤔
Charly Wargnier tweet media
English · 20 replies · 75 reposts · 225 likes · 61.6K views

Renato Azevedo Sant Anna retweeted
Alex Imas
Alex Imas@alexolegimas·
New essay on the economics of structural change and the post-commodity future of work.

1. Almost any question about the impact of advanced AI on the economy needs to start at the same place: what is still scarce? Answer that, and the analysis becomes pretty straightforward. This essay explores what becomes scarce if AI really can replicate most of what humans do in production, and what this means for the future of jobs.

2. My conjecture, working through the economics: labor reallocates across sectors, and the sector it reallocates to has properties that keep labor a meaningful share of the economy. Ultimately this is about the structure of demand itself. For this, we have to go back to Girard, Augustine and Rousseau: once people's base needs are met, their preferences shift to comparative motives (e.g., status, exclusivity, social desirability). This motive is inherently non-satiated.

4. The key paper is Comin, Lashkari, and Mestieri (Econometrica 2021). As people get richer, they don't buy proportionally more of everything. They shift spending toward sectors with higher income elasticity. They estimate income effects account for 75%+ of observed structural change.

5. The ironic consequence: the sector that gets automated becomes a smaller share of the economy, not a larger one. Agriculture got massively more productive and its share of employment collapsed. Manufacturing too. The "stagnant" sectors absorb the spending and the jobs.

6. So the question is: which sectors have high income elasticity in a post-AGI world? I argue it's what I call the relational sector. Categories where the human isn't just an input into production, it is part of the value.

7. Why does the relational sector have high income elasticity? Because human desire has a mimetic, relational dimension. We don't just want things for their intrinsic properties. We want what others want, and we want it more when others can't have it. Girard, Rousseau, Augustine, and Hobbes all saw this.

8. In work with Kristóf Madarász, we showed this experimentally: WTP roughly doubles when a random subset of others is excluded from the good. And in new work with Graelin Mandel, AI involvement kills the premium. Human-made art gains 44% from exclusivity; AI-made art only 21%.

9. This all comes together for the core argument. The sector that absorbs spending as AI makes commodity production cheap is one where human provenance is part of the value, and demand for it grows faster than income. Exactly the profile that keeps labor meaningful.

10. To be clear about the claim: I'm NOT saying aggregate labor share must rise. It may fall. The claim is about sectoral composition, i.e., where expenditure and employment go once commodities get cheap, and the fact that the sector that will absorb reallocated labor maps to a substantial component of human preferences and desire.

11. If you're interested in the formal model, a linked companion technical note works out all the economics. Read the essay here: aleximas.substack.com/p/what-will-be…
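The structural-change mechanism in the thread (spending shifts toward high-income-elasticity sectors as income grows) can be illustrated numerically. This is a toy calculation, not the Comin-Lashkari-Mestieri model; the elasticities and functional form are invented for the example:

```python
# Toy illustration of nonhomothetic demand: two constant-elasticity sectors
# with made-up elasticities. When the "relational" sector's income
# elasticity exceeds 1 and the commodity sector's is below 1, the
# relational share of total spending rises as income grows.
def relational_share(income, e_relational=1.5, e_commodity=0.5):
    relational = income ** e_relational   # high-elasticity sector
    commodity = income ** e_commodity     # low-elasticity sector
    return relational / (relational + commodity)

for income in (1, 4, 16):
    print(income, round(relational_share(income), 3))
# share rises with income: 0.5, then 0.8, then 0.941
```

The automated (commodity) sector gets cheaper and more productive, yet its share of spending shrinks, which is the thread's point about agriculture and manufacturing.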
Alex Imas tweet media
English · 157 replies · 480 reposts · 2.6K likes · 887.4K views

Renato Azevedo Sant Anna retweeted
AI Security Institute
AI Security Institute@AISecurityInst·
We conducted cyber evaluations of Claude Mythos Preview and found that it is the first model to complete an AISI cyber range end-to-end. 🧵
AI Security Institute tweet media
English · 112 replies · 553 reposts · 3K likes · 1.3M views

Renato Azevedo Sant Anna retweeted
Evan Luthra
Evan Luthra@EvanLuthra·
🚨RESEARCHERS JUST MATHEMATICALLY PROVED THAT AI LAYOFFS WILL DESTROY THE ECONOMY.. AND EVERY CEO ALREADY KNOWS IT.. BUT NONE OF THEM CAN STOP.. Two researchers from UPenn and Boston University just published a paper called "The AI Layoff Trap".. They proved something terrifying.. Every company replacing workers with AI is also firing its own customers.. Every laid-off employee is someone who used to spend money.. When enough people lose their jobs.. Nobody can afford to buy anything.. And the companies that fired everyone go bankrupt selling products to an economy with no purchasing power.. Every CEO can see this coming.. The math is obvious.. Fire workers.. Lose customers.. Lose revenue.. Collapse.. But here's the trap.. No company can afford to stop.. If you don't automate.. Your competitor will.. They cut costs.. Undercut your prices.. Steal your market share.. And you die anyway.. So every company automates.. Knowing it's collectively suicidal.. Because the alternative is dying alone while everyone else survives.. It's a Prisoner's Dilemma.. And the researchers proved it mathematically.. The numbers are already stacking up.. Block cut nearly half its 10,000 employees this year.. CEO Jack Dorsey said AI made those roles unnecessary and that "within the next year, the majority of companies will reach the same conclusion".. Salesforce replaced 4,000 customer support agents with AI.. Goldman Sachs deployed an AI coder that lets one senior engineer do the work of a five-person team.. Over 100,000 tech workers were laid off in 2025 alone.. AI was cited as the primary driver in more than half the cases.. 80% of US workers hold jobs with tasks susceptible to AI automation.. And here's what should scare policymakers.. The researchers tested every proposed solution.. Universal Basic Income.. Doesn't fix it.. It raises living standards but doesn't change a single company's incentive to automate.. Capital income taxes.. Don't fix it.. 
They change profit levels but not the per-task decision to replace a human.. Worker equity and profit sharing.. Narrows the gap but can't close it.. Collective bargaining.. Can't fix it.. Because automating is a dominant strategy.. No voluntary agreement between companies is self-enforcing.. Only one thing works.. A Pigouvian automation tax.. A per-task charge that forces every company to pay for the demand it destroys when it fires a worker.. The researchers call it a "Red Queen effect".. Better AI doesn't solve the problem.. It makes it worse.. Because every company sees a bigger market share gain from automating faster than rivals.. But at the end.. Everyone automates equally.. The gains cancel out.. And the only thing left is more destroyed demand.. The paper's conclusion is devastating.. This isn't a transfer from workers to company owners.. Both sides lose.. Workers lose their income.. Companies lose their customers.. It's a deadweight loss that harms everyone.. And no market force can break the cycle.. The AI layoff trap isn't a prediction.. It's already happening.. And the math says it won't stop on its own.
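The Prisoner's Dilemma structure the thread describes can be made concrete with a toy payoff matrix. The numbers below are illustrative only, not from the paper; they just encode the trap: automating is each firm's dominant strategy, yet mutual automation leaves both worse off than mutual restraint.

```python
# (firm_a_choice, firm_b_choice) -> (payoff_a, payoff_b); illustrative values.
PAYOFFS = {
    ("hold", "hold"):         (3, 3),  # demand intact, market shared
    ("automate", "hold"):     (5, 0),  # automator undercuts the holdout
    ("hold", "automate"):     (0, 5),
    ("automate", "automate"): (1, 1),  # costs cut, but customers are gone
}

def best_response(their_choice, me):
    """Payoff-maximizing choice for firm `me` (0 = firm A, 1 = firm B)."""
    def payoff(mine):
        key = (mine, their_choice) if me == 0 else (their_choice, mine)
        return PAYOFFS[key][me]
    return max(("hold", "automate"), key=payoff)

# Automating is the best response no matter what the rival does...
assert all(best_response(rival, me) == "automate"
           for rival in ("hold", "automate") for me in (0, 1))
# ...yet both firms end up worse than if neither had automated.
print("equilibrium payoffs:", PAYOFFS[("automate", "automate")],
      "vs mutual restraint:", PAYOFFS[("hold", "hold")])
```

This is also why the thread says no voluntary agreement is self-enforcing: each firm gains by defecting from "hold" regardless of what the other does, so only an external cost on automation (the Pigouvian tax) changes the per-task incentive.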
Evan Luthra tweet media
English · 818 replies · 3.3K reposts · 10.5K likes · 1.5M views

Renato Azevedo Sant Anna retweeted
Ming
Ming@tslaming·
BREAKING 🚨 Scientists have recently sent shockwaves through the semiconductor industry, uncovering the breakthrough that will help future computers shatter all existing speed limits ⚡️ After more than a century of controversy, the mystery of how electrons tunnel through energy barriers inside CPUs and GPUs of Intel, Nvidia or AMD has finally been unraveled, paving the way for the super-powerful chips of the future. To grasp why this discovery is so monumental, we first have to appreciate just how weird the quantum world really is. In classical physics, if you throw a ball at a wall, it bounces back every single time. But in the quantum realm, if you throw an electron at an energy barrier, there is a tiny, ghostly chance it will simply appear on the other side as if the wall never existed. This phenomenon, known as quantum tunneling, has long been the ghost in the machine. It is a process we knew was happening, yet one we could never quite see inside of. For decades, we were forced to treat this tunneling like a magic trick where the electron disappears at point A and reappears at point B. However, by experimenting with noble gases like Krypton and Xenon, scientists have now pulled back the curtain on what happens during that transition. It turns out that while the electron is inside the barrier, it is not just flying straight through in a vacuum. It is actually performing a complex dance, interacting intensely with the forces of the atom it is trying to leave behind. This inner journey takes place in a very specific sweet spot known as the nonadiabatic tunneling regime. Instead of a static wall, the electron faces a dynamic, shifting environment. This sets the stage for a breakthrough concept called under-the-barrier recollisions. Imagine an electron attempting to escape an atom while being violently pushed and pulled by an intense laser field. 
Instead of making a clean break, the electron can actually bounce back toward its parent atom while it is still technically hidden within the energy barrier. This interaction was previously ignored in simpler models, but it is the secret key to understanding how these particles ultimately behave. While a collision usually suggests a loss of momentum, the physics here works in reverse. Paradoxically, by recolliding while still tunneling, the electron can actually steal a massive amount of energy from the surrounding laser field. The research shows that this specific quantum path is roughly ten thousand times more likely to occur at high intensities than traditional models ever predicted. Furthermore, these electrons can reach energy levels up to four times higher than expected. This hidden energy boost allows the electron to burst out of the barrier with incredible speed, finally explaining why certain electronic signals, known as Freeman resonances, are much stronger than our previous models could account for. To make sense of this subatomic chaos, the researchers developed what they call a Four-Step Model. This acts like a high-precision GPS for particles, mapping out the initial escape, the hidden journey through the barrier, the energy gain from the recollision, and the final flight into the open. This model allows scientists to simulate electron movements with a level of detail that was previously considered a mathematical black box, giving us a clear roadmap for a territory we once thought was unchartable. The results are most striking at high intensities. When the laser field reaches a specific threshold, around 50 terawatts per square centimeter, a fascinating vanishing act occurs. The traditional, slower electron signals completely disappear, leaving behind only the high-speed signals enabled by this new quantum dance. 
Remarkably, these high-speed signals exhibit a highly stable, flat response, meaning they remain consistent even as the power levels fluctuate. This tipping point proves that at the extreme performance levels required for future computing, these under-the-barrier dynamics become the dominant force. These insights arrive just in time to solve the biggest bottleneck in modern hardware, opening the door to control electronics on the scale of attoseconds, or billionths of a billionth of a second. As we try to make CPUs and GPUs smaller, the energy barriers inside them are becoming incredibly thin, so thin that electrons start tunneling through them whether we want them to or not. In today’s chips, this causes leakage, which leads to wasted battery life and massive overheating. By finally understanding these dynamics, engineers can stop fighting this leakage and start turning it into a deliberate, high-speed feature. Ultimately, by mastering the way electrons navigate these barriers, we are moving toward a new era of quantum-aware chip design. Instead of just trying to block electrons, future processors could be designed to harness these tunneling energy boosts to move data faster than ever before. This paves the way for super-chips that are not only thousands of times faster but also significantly more energy-efficient, effectively rewriting the rules of how we build the brains of our computers.
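For scale, the textbook static picture that these "under-the-barrier" dynamics go beyond: for a rectangular barrier of height $V_0$ and width $L$, the standard WKB transmission probability falls off exponentially with barrier width, which is why the thinning barriers in shrinking chips make tunneling leakage explode.

```latex
% WKB estimate for tunneling through a static rectangular barrier
% (particle of mass m and energy E < V_0):
T \;\approx\; e^{-2\kappa L},
\qquad
\kappa \;=\; \frac{\sqrt{2m\,(V_0 - E)}}{\hbar}
```

The new results concern what happens dynamically inside the barrier under an intense laser field, which this static formula by construction ignores.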
Ming tweet media
English · 12 replies · 43 reposts · 211 likes · 10.9K views

Renato Azevedo Sant Anna retweeted
Marc Andreessen 🇺🇸
All of these companies are rapidly adopting AI. It's just happening bottom-up, workers and managers doing it themselves and not necessarily telling the CEO. And the CEO is probably doing it too.
Marc Andreessen 🇺🇸 tweet media
English · 106 replies · 48 reposts · 781 likes · 66.1K views

Renato Azevedo Sant Anna retweeted
GeniusThinking
GeniusThinking@GeniusGTX·
The man who co-wrote the AI scaling playbook just declared its end in front of the entire field. Ilya Sutskever told NeurIPS 2024: "Pre-training as we know it will unquestionably end." He called data "the fossil fuel of AI." And said we've already hit peak data.

→ Epoch AI: the entire stock of human-generated training text could be exhausted as early as 2026
→ Frontier models from OpenAI, Google, and Anthropic now show smaller benchmark gains despite 10x more compute spend
→ Knowledge tasks plateau beyond 30 billion parameters. Reasoning tasks plateau beyond 70 billion.
→ Microsoft, Google, and Amazon committed $500B+ to datacenter buildout this year alone. The returns are shrinking.

The machine that ate the internet is running out of internet. The next wave won't come from scale. It'll come from smarter compression, better reasoning, and synthetic data.

I think most people don't realize what it means when the person who wrote the playbook says the playbook is over. Every technological paradigm ends with the architects announcing it. The mental models that identify a platform shift before it's obvious are the ones that define who builds what comes next.

I made a free toolkit breaking down 100+ mental models used by history's greatest thinkers — the same frameworks that help you see patterns like this before everyone else. 5,000+ downloads. 113 five-star reviews. Comment "MODELS" and I'll send it to you.

If you're new here, @GeniusGTX is a gallery for the greatest minds in economics, psychology, and history. Follow along for more similar content.
Haider.@haider1

bad news: LLMs are hitting a wall

English · 3 replies · 5 reposts · 11 likes · 10.6K views

Renato Azevedo Sant Anna retweeted
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
🚨BREAKING: Researchers just audited 17,022 AI agent skills and found a ticking time bomb nobody was watching. 3.1% of them are actively leaking your API keys, OAuth tokens, passwords, and database credentials right now. During normal execution. No hacking required. Here's the part that should keep you up at night. The #1 cause isn't malicious hackers. It's developers leaving debug print statements in their code before publishing. 73.5% of all vulnerabilities came from a single pattern: console.log and print() statements dumping credentials to stdout. And here's where it gets insane. Agent frameworks like Claude Code capture stdout and inject it directly into the LLM context window. That means your API key gets printed to the terminal, swallowed by the agent framework, and becomes a fact the model can retrieve in plain English whenever someone asks the right question. You don't need a jailbreak. You just need to know how to ask. The deeper finding is what no existing security tool can catch. 76.3% of leakage cases only appear when you analyze the natural language description AND the source code together. Neither alone reveals the problem. A skill can advertise "fetch weather forecasts" in its README while the underlying code reads your credential file and posts it to an attacker's webhook. Standard secret scanners see nothing. The attack lives in the gap between what the skill says and what it actually does. And once a credential leaks, deleting it from the source repo doesn't save you. Of the 107 repositories that removed hardcoded credentials after disclosure, the same credentials remained live across 50+ independent forks. Upstream remediation is useless if the forks don't follow. 89.6% of affected skills were exploitable during normal execution with zero elevated privileges. You weren't hacked. You just ran the skill. The AI agent ecosystem has a supply chain security problem that traditional tools weren't built to solve. 
And it's growing by tens of thousands of skills per day. Are you actually auditing the agent skills you install before running them in production?
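The leak pattern described above (debug output dumping credentials to stdout, where an agent framework captures them into the model's context) is simple enough to scan for. Below is a minimal illustrative scanner, not the paper's auditing tool; the sample "skill" and the secret-name pattern list are invented for the example:

```python
# Flags print()/console.log lines that reference likely credential names,
# i.e. the single pattern the audit found behind 73.5% of vulnerabilities.
import re

LEAK_PATTERN = re.compile(
    r"(?:print|console\.log)\s*\(.*?"                      # a debug-output call...
    r"(?:api[_-]?key|secret|token|password|credential)",   # ...touching a secret-ish name
    re.IGNORECASE,
)

def audit_skill(source: str):
    """Return (line_number, line) pairs that look like credential-leaking debug output."""
    return [(n, line.strip())
            for n, line in enumerate(source.splitlines(), start=1)
            if LEAK_PATTERN.search(line)]

skill = '''
key = os.environ["WEATHER_API_KEY"]
print("debug: api_key =", key)         # harmless-looking debug line
resp = fetch_forecast(city, api_key=key)
console.log("token=" + oauth_token)    # same pattern in a JS skill
'''
for lineno, line in audit_skill(skill):
    print(f"line {lineno}: {line}")    # flags the two leaking debug lines
```

A lexical check like this only catches the accidental-debug-print class; the thread's deeper point, skills whose code contradicts their description, needs the code and the natural-language description analyzed together.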
Ihtesham Ali tweet media
English · 23 replies · 72 reposts · 206 likes · 14.6K views

Renato Azevedo Sant Anna retweeted
Big Brain AI
Big Brain AI@realBigBrainAI·
Marc Andreessen explains why we are only three years into what is effectively an 80-year technological revolution: He opens with a blunt assessment: "This is the biggest technological revolution of my life. This is clearly bigger than the internet. The comps on this are things like the microprocessor and the steam engine and electricity." But to understand why, you have to go back 80 years. In the 1930s, the pioneers of computing understood the theory of computation before they'd even built the machines. And they faced a fundamental choice. Build computers in the image of the adding machine — hyper-literal, mathematical, capable of billions of operations per second, but unable to understand human speech or deal with humans the way humans like to be dealt with. Or build computers modelled on the human brain. Neural networks. They chose the adding machine. And that single decision shaped everything — mainframes, PCs, smartphones, every dollar of wealth the computer industry created over the next 80 years. IBM itself is the successor company to the National Cash Register Company of America. The lineage runs that deep. But here's what makes this moment so extraordinary. They knew about the other path. The first neural network academic paper was published in 1943. Marc points to a remarkable piece of forgotten history: "There's an interview you can watch on YouTube with the authors. It's him in his beach house, not wearing a shirt, talking about this future in which computers are going to be built on the model of the human brain." That was 1946. The vision existed. The path just wasn't taken. So neural networks spent the next eight decades living in the shadows. Kept alive by a small academic movement — first called cybernetics, then artificial intelligence — that refused to let the idea die. And for most of that time, it simply didn't work. "It was basically decade after decade after decade of excessive optimism followed by disappointment." 
By the time Marc reached college in 1989, AI was a backwater field. Everyone assumed it was never going to happen. But the scientists kept working. Quietly building up an enormous reservoir of concepts and ideas across those decades of disappointment. And then Christmas 2022 arrived. ChatGPT. And suddenly: "All of a sudden it's like: oh my god. It turns out it works." That moment wasn't the start of something new. It was the payoff on an 80-year-old bet that almost everyone had written off. Which is exactly why Marc's framing matters so much: "We're three years into what is effectively an 80-year revolution." Most people are treating AI like another technology cycle — something to adapt to, ride, and wait out. But if Andreessen is right, we are not adapting to a new cycle. We are standing at the very beginning of the longest and most consequential technological transformation in human history. The road not taken in the 1930s is finally being built. And we have barely broken ground.
English · 160 replies · 518 reposts · 3.9K likes · 380.3K views

Renato Azevedo Sant Anna retweeted
Dr Singularity
Dr Singularity@Dr_Singularity·
wow, insane AI news. We may have just crossed the line where AI research becomes automated and self-improving. This paper introduces ASI-Evolve, a system where AI doesn't just use tools… it becomes the researcher. Instead of humans designing better models, AI now runs a full scientific loop on itself:
- learns from past research
- designs new ideas
- runs experiments
- analyzes results
- improves itself… again and again

It already produced real results:
- Discovered 100+ new neural architectures
- Beat human-designed improvements by ~3x
- Improved training data pipelines significantly
- Invented new RL algorithms outperforming existing ones

AI/acc
Dr Singularity tweet media
English · 77 replies · 168 reposts · 910 likes · 43.9K views

Renato Azevedo Sant Anna retweeted
Hasan Toor
Hasan Toor@hasantoxr·
STANFORD UNIVERSITY compressed the entire field of LLMs and transformers into free cheatsheets anyone can use today. It covers everything from self-attention to Flash Attention, LoRA, SFT, MoE, distillation, quantization, RAG, agents, and LLM-as-a-judge. 100% Free and Open Source
Hasan Toor tweet media
English · 19 replies · 172 reposts · 879 likes · 46.3K views

Renato Azevedo Sant Anna retweeted
Muhammad Ayan
Muhammad Ayan@socialwithaayan·
🚨 BREAKING: Someone just built the exact tool Andrej Karpathy said someone should build. 48 hours after Karpathy posted his LLM Knowledge Bases workflow, this showed up on GitHub. It's called Graphify. One command. Any folder. Full knowledge graph.

Point it at any folder. Run /graphify inside Claude Code. Walk away. Here is what comes out the other side:
-> A navigable knowledge graph of everything in that folder
-> An Obsidian vault with backlinked articles
-> A wiki that starts at index.md and maps every concept cluster
-> Plain English Q&A over your entire codebase or research folder

You can ask it things like: "What calls this function?" "What connects these two concepts?" "What are the most important nodes in this project?" No vector database. No setup. No config files.

The token efficiency number is what got me: 71.5x fewer tokens per query compared to reading raw files. That is not a small improvement. That is a completely different paradigm for how AI agents reason over large codebases.

What it supports:
-> Code in 13 programming languages
-> PDFs
-> Images via Claude Vision
-> Markdown files

Install in one line: pip install graphify && graphify install

Then type /graphify in Claude Code and point it at anything. Karpathy asked. Someone delivered in 48 hours. That is the pace of 2026. Open Source. Free.
Muhammad Ayan tweet media
English · 270 replies · 1.4K reposts · 12.7K likes · 943.5K views

Renato Azevedo Sant Anna retweeted
Alex Prompter
Alex Prompter@alex_prompter·
Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model, it's the harness: the code wrapping the model. And they built a system that writes better harnesses automatically than humans can by hand.

> +7.7 points. 4x fewer tokens.
> #1 ranking on an actively contested benchmark.

The harness is the code that decides what information an AI model sees at each step: what to store, what to retrieve, what context to show. Changing the harness around a fixed model can produce a 6x performance gap on the same benchmark. Most practitioners know this empirically. What nobody had done was automate the process of finding better harnesses.

Stanford's Meta-Harness does exactly that: it runs a coding agent in a loop, gives it access to every prior harness it has tried along with the full execution traces and scores, and lets it propose better ones. The agent reads raw code and failure logs (not summaries, not scalar scores) and figures out why things broke.

The key insight is about information. Every prior automated optimization method compressed feedback before handing it to the optimizer:
> Scalar scores only.
> LLM-generated summaries.
> Short templates.

Stanford's finding is that this compression destroys exactly the signal you need for harness engineering. A single design choice about what to store in memory can cascade through hundreds of downstream steps. You cannot debug that from a summary. Meta-Harness gives the proposer a filesystem containing every prior harness's source code, execution traces, and scores (up to 10 million tokens of diagnostic information per evaluation) and lets it use grep and cat to read whatever it needs. Prior methods worked with 100 to 30,000 tokens of feedback. Meta-Harness works with 3 orders of magnitude more.

The TerminalBench-2 search trajectory reveals what this actually looks like in practice. The agent ran for 10 iterations on an actively contested coding benchmark. In iterations 1 and 2, it bundled structural fixes with prompt rewrites, and both regressed. In iteration 3, it explicitly identified the confound: the prompt changes were the common failure factor, not the structural fixes. It isolated the structural changes, tested them alone, and observed the smallest regression yet. Over the next 4 iterations it kept probing why completion-flow edits were fragile, citing specific tasks and turn counts from prior traces as evidence. By iteration 7 it pivoted entirely: instead of modifying the control loop, it added a single environment snapshot before the agent starts, gathering what tools and languages are available in one shell command. That 80-line additive change became the best candidate in the run and ranked #1 among all Haiku 4.5 agents on the benchmark.

The numbers across all three domains:
→ Text classification vs best hand-designed harness (ACE): +7.7 points accuracy, 4x fewer context tokens
→ Text classification vs best automated optimizer (OpenEvolve, TTT-Discover): matches their final performance in 4 evaluations vs their 60, then surpasses by 10+ points
→ Full interface vs scores-only ablation: median accuracy 50.0 vs 34.6; raw execution traces are the critical ingredient, summaries don't recover the gap
→ IMO-level math: +4.7 points average across 5 held-out models that were never seen during search
→ IMO math: discovered retrieval harness transfers across GPT-5.4-nano, GPT-5.4-mini, Gemini-3.1-Flash-Lite, Gemini-3-Flash, and GPT-OSS-20B
→ TerminalBench-2 with Haiku 4.5: 37.6%, #1 among all reported Haiku 4.5 agents, beating Goose (35.5%) and Terminus-KIRA (33.7%)
→ TerminalBench-2 with Opus 4.6: 76.4%, #2 overall, beating all hand-engineered agents except one whose result couldn't be reproduced from public code
→ Out-of-distribution text classification on 9 unseen datasets: 73.1% average vs ACE's 70.2%

The math harness discovery is the cleanest demonstration of what automated search actually finds. Stanford gave Meta-Harness a corpus of 535,000 solved math problems and told it to find a better retrieval strategy for IMO-level problems. What emerged after 40 iterations was a four-route lexical router: combinatorics problems get deduplicated BM25 with difficulty reranking, geometry problems get one hard reference plus two raw BM25 neighbors, number theory gets reranked toward solutions that state their technique early, and everything else gets adaptive retrieval based on how concentrated the top scores are. Nobody designed this. The agent discovered that different problem types need different retrieval policies by reading through failure traces and iterating on what broke.

The ablation table is the most important result in the paper.
> Scores only: median 34.6, best 41.3.
> Scores plus LLM-generated summary: median 34.9, best 38.7.
> Full execution traces: median 50.0, best 56.7.

Summaries made things slightly worse than scores alone. The raw traces (the actual prompts, tool calls, model outputs, and state updates from every prior run) are what drive the improvement. This is not a marginal difference. The full interface outperforms the compressed interface by 15 points at median. Harness engineering requires debugging causal chains across hundreds of steps. You cannot compress that signal.

The model has been the focus of the entire AI industry for the last five years. Stanford just showed the wrapper around the model matters just as much, and that AI can now write better wrappers than humans can.
Alex Prompter tweet media
English · 38 replies · 104 reposts · 771 likes · 110.5K views