

andrey
@andreylebedev29
australian in london. building. ex-goldman sachs M&A. prev world #1 concert guitarist. 2:47 marathoner. side project https://t.co/XPRQd0UyIf


Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵


Anthropic's monster 2026 continues. It's not just APIs and coding: Claude now captures 50%+ of spend on business AI chat.


This is one of the biggest sticking points on AI Excel that I'm trying to understand. 73% accuracy is progress, but is it useful for anything at all? We were on a vendor call last month and the vendor bragged about hitting 65% accuracy in Excel, and Andrew Carr and I texted "an analyst who is 65% accurate in Excel is 100% fired".

Why is AI Excel only 60-70% accurate? Are these issues fundamental or solvable?

> Is MCP fundamentally too brittle to get to 99% accuracy?

> Is the data layer clean enough to hit 99% accuracy? (There's a reason why hedge fund analysts don't start their models with a Bloomberg download.)

> Are the foundation models powerful enough to handle the multi-modal (filings, PRs, investor decks, data supplementals), multi-document, "needle in a haystack" issues for LLMs? Context windows have grown, but they are still not large enough to capture all of the documents and files for one ticker (let alone a coverage universe).

> Is the commercial opportunity large enough for foundation labs to build RL environments for public equity modeling, as they are doing for investment banking modeling?

Does the "march of 9s" on AI Excel take 6 months or 6 years? Driverless cars took 13 years from the DARPA Urban Challenge to the first Waymo. These are legit questions. I don't know.

I also don't really trust public evaluation sets (i.e. LLMs win physics competitions... then you learn the LLM trained on the physics competition test bank lol). The real questions in investment research modeling are out-of-sample questions (i.e. how to model SaaS retention in a Claude world... there is no prior on which to rely).

So I am building my own evaluation set: 100 use cases ranging from simple (input 3 statements from a 10-K into an AMZN model) to complex (model the GE split/spin). Am I wasting my time? 36 months from now, are we still only at 80% accuracy in AI Excel?

These are questions, not answers - love your takes in replies or DM!
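For what an evaluation set like this might look like mechanically, here is a minimal sketch in Python. Everything here is hypothetical (the `EvalCase` structure, the stub agent, the pass/fail rule of "every checked cell within tolerance") — the thread doesn't specify a harness design, this just illustrates one way to turn 100 use cases into an accuracy number.

```python
# Hypothetical harness for scoring an AI-Excel evaluation set.
# EvalCase, score, accuracy, and the stub agent are illustrative names,
# not anything described in the original thread.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EvalCase:
    name: str          # e.g. "input 3 statements from a 10-K into an AMZN model"
    difficulty: str    # "simple" or "complex"
    expected: dict     # cell reference -> expected numeric value
    tolerance: float = 0.005  # relative tolerance per checked cell

def score(case: EvalCase, produced: dict) -> bool:
    """A case passes only if every checked cell is within tolerance."""
    for cell, want in case.expected.items():
        got = produced.get(cell)
        if got is None:
            return False
        if abs(got - want) > abs(want) * case.tolerance:
            return False
    return True

def accuracy(cases: list[EvalCase], run_agent: Callable[[EvalCase], dict]) -> float:
    """Fraction of cases the agent passes end-to-end."""
    passed = sum(score(c, run_agent(c)) for c in cases)
    return passed / len(cases)

# Toy run: a stub "agent" that gets one of two cases right.
cases = [
    EvalCase("toy revenue roll-forward", "simple", {"B2": 100.0}),
    EvalCase("toy margin bridge", "simple", {"C5": 40.0}),
]
stub = lambda c: {"B2": 100.2, "C5": 55.0}  # B2 within 0.5%; C5 wrong
print(accuracy(cases, stub))  # 0.5
```

One design choice worth noting: scoring at the case level (all cells correct or the case fails) rather than averaging over cells is closer to the "65% accurate analyst is fired" framing — a model with one wrong driver cell is a wrong model.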

It’s starting to get wild what agents are able to do within your software. Here’s Box + Claude + Excel for full spreadsheet automation. This is just the beginning. Agents that have access to your data and tools, safely, will accelerate all knowledge work.


This week, on Benchmark's new podcast Uncapped 😂, I sat down with @btaylor, founder of Sierra and Chairman of OpenAI. He's easily one of the most impressive people I’ve met in tech or in general. We talked about AI and the saaspocalypse, the unique considerations of building an AI-native / agent company, whether young or experienced founders have the advantage right now, Codex and OpenAI ads, and much more. Learned a ton from Bret, hope you enjoy.

(0:00) Intro
(0:20) The Saaspocalypse and systems of record
(12:34) Sierra's landscape
(17:05) Outcome-based pricing
(24:22) The rapid evolution of AI support technology
(28:21) Young founders vs. experienced founders
(34:12) What comes next beyond support
(38:47) Codex and the future of software engineering
(51:49) OpenAI and advertising
(54:59) Working with investors and boards








Opus 4.6 is here. The jump in autonomy is real. The biggest shift for me personally has been learning to let it run: give it the context, step away, and come back to something pretty amazing. The way we work alongside models is starting to change completely.

