Mrinal Wadhwa
@mrinal
18.5K posts

CTO @ Autonomy
San Francisco Bay Area · Joined November 2007
1.6K Following · 4.2K Followers

Pinned Tweet
Mrinal Wadhwa @mrinal
I built a swarm of 5000+ deep code review agents that assess a codebase in parallel. Here's a time lapse of them analyzing the source code of @vuejs core:
[GIF]
1 reply · 0 reposts · 7 likes · 449 views
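The fan-out above can be sketched with plain asyncio. Everything here is hypothetical: `review_file` is an invented stand-in for whatever LLM call a real review agent would make, and the file list is made up; this is not the actual swarm implementation.

```python
import asyncio

# Hypothetical stand-in for one deep-review agent; a real agent would
# send the file's contents to an LLM and return structured findings.
async def review_file(path: str) -> dict:
    await asyncio.sleep(0)  # yield control, simulating an I/O-bound model call
    return {"file": path, "findings": []}

async def review_codebase(paths: list[str], max_concurrency: int = 100) -> list[dict]:
    # Thousands of agents can be queued while a semaphore caps how many
    # are actually in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(path: str) -> dict:
        async with sem:
            return await review_file(path)

    # gather() preserves input order, so report i matches path i.
    return await asyncio.gather(*(bounded(p) for p in paths))

reports = asyncio.run(review_codebase([f"src/file_{i}.ts" for i in range(5000)]))
print(len(reports))  # 5000
```

The semaphore is the key design choice: it lets you enqueue one coroutine per file while bounding concurrent model calls to whatever your rate limits allow.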
Mrinal Wadhwa reposted
1Password @1Password
Agent swarms are incredibly powerful and dangerously easy to deploy unsafely with today’s security models.
2 replies · 3 reposts · 8 likes · 3.2K views
Mrinal Wadhwa @mrinal
@mntruell @harjotsgill I think you and all my friends @coderabbitai will find what I was playing around with lately interesting. Obviously not a polished product, but fun. Have a look, would love your thoughts ^
0 replies · 0 reposts · 0 likes · 161 views
Mrinal Wadhwa @mrinal
That was insightful! We're using a similar approach for the developer documentation for our product, and it creates magic.

Autonomy is a platform to run apps that use teams of agents to autonomously perform long and complex tasks. Like many developer products, it has a CLI, a sign-up/sign-in flow driven by the command line, commands to look at logs of running apps, APIs, programming libraries, etc.

Traditionally, docs for such products focus on teaching devs how to develop using the product. We wrote a separate set of docs for coding agents. This fork of the docs is tuned and tested on making coding agents successful at running the full write, test, deploy, test, debug, redeploy loop on their own.

The result is an exceptional experience: devs copy a prompt from our website, paste it into a coding agent, adapt it to whatever agents they want to build, and 20 minutes later they have a first version of a live agentic product, deployed to a public URL, with a UI, streaming APIs, etc.

The secret to the whole experience is a collection of markdown files with an index. Here's that index: autonomy.computer/docs/_for-codi… Here are instructions to try it yourself: autonomy.computer/docs/build-wit…
[image]
0 replies · 0 reposts · 0 likes · 50 views
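As a purely hypothetical illustration of the "markdown files with an index" idea (these file names and sections are invented, not Autonomy's actual docs), an agent-facing index might look something like:

```markdown
# Docs index for coding agents

Read these files, in order, before building or deploying anything.

- setup.md — install the CLI and sign in
- deploy.md — deploy an app and get its public URL
- logs.md — tail the logs of a running app
- debug.md — diagnose failures and redeploy
```

The point is that each entry tells the agent when to read a file, so the agent can drive the whole write-test-deploy-debug loop without a human pointing it at the right page.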
claire vo 🖤 @clairevo
If you love ✅ Claude Code ✅ .md ✅ Python ✅ the zen life of having AI run everything, this ep of How I AI w/ @ttorres is for you.

Teresa, the fab author of Continuous Discovery Habits, shows us how she combines CC + Obsidian + smart automations for her personal productivity stack. She shows:
- her daily task manager built with @claudeai
- her automation for discovering + ingesting interesting scientific research articles
- her super smart "lazy prompting" system

One viewer just commented "SO GOOD! Yall could have gone 2 more hours!"

As always, huge ups to our amazing sponsors:
🧠🤑 @brexHQ, intelligent finance platform built for founders: brex.com/howiai
🦎🐛 Graphite, the next generation of code review: graphitedev.link/howiai

Watch it here: youtu.be/oBho3hZ7MHM
[YouTube video]
12 replies · 18 reposts · 257 likes · 61.7K views
Nikunj Kothari @nikunj
Alright it’s happening.. We’re hosting a “Claude Code for Normies” event this Friday with our friends at @AnthropicAI in SF. Luma link for attendees out soon BUT it’ll be a demo night so I’m looking for non-technical folks who can share what they built. Sign up to demo or DM.
[image]
18 replies · 5 reposts · 139 likes · 42.9K views
Conor Power @conor_power23
Looking to learn about UX design. Any intro book or course recs?
3 replies · 0 reposts · 5 likes · 2.7K views
Aakash Gupta @aakashgupta
I've been testing Claude Cowork since it launched. Claude Cowork's promise to make AI work for everyone seemed worth testing.

Four things that actually worked for me:
1/ Podcast archive mining: drop years of transcripts into a folder, and it finds moments where guests contradicted each other.
2/ Guest prep: pulls a podcast guest's LinkedIn posts, appearances, and controversial takes, and compares them against my existing episodes.
3/ AI trends monitoring: I set it to check X every six hours and log what's spiking. By end of day I know what I missed.
4/ Slides from episodes: I fed it transcripts and it pulled quotes, opened Keynote directly, and built the deck.

Where it breaks:
• Google Docs editing
• Flaky connectors
• Bot detection on some sites

Local files are solid. Browser stuff is hit or miss. I wrote a full breakdown with setup instructions, prompts, and screenshots: news.aakashg.com/p/claude-cowork
[image]
20 replies · 7 reposts · 112 likes · 15K views
Mrinal Wadhwa @mrinal
Gokul is spot on in this post. But the challenge is even bigger.

The latest gen of vertical AI companies are not just competing against one deep-working, long-horizon agent. They are competing against parallel fleets of them.

Autonomy enables their competition to create parent agents that can spawn and delegate work to thousands of sub-agents. Each sub-agent has its own filesystem, a shell to run CLI tools, and the ability to write and run new programs on the fly. They divide complex problems, attack from multiple angles, and converge on outcomes in a fraction of the time.

Agents, in @autonomy_comp, are modeled as concurrent actors that automatically form secure distributed clusters to enable massive scale on a tiny infra footprint. This creates orders-of-magnitude advantages in cost, speed, and scale.

The question to benchmark is: can your specialized agent outperform a coordinated team of 100s or 1000s of really cheap general-purpose agents that can code their way around problems in real time? If not, the time to change your approach is now.
Quoting Gokul Rajaram @gokulr (full post below)
0 replies · 1 repost · 3 likes · 231 views
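The divide-attack-converge pattern described above can be sketched in a few lines. This is not Autonomy's API; `sub_agent` and its scoring are invented stand-ins for agents attacking the same problem from different angles, with the parent converging on the best result.

```python
import asyncio

# Hypothetical stand-in: on the platform described above each sub-agent
# would get its own filesystem and shell; here one just scores an approach.
async def sub_agent(problem: str, strategy: int) -> tuple[int, float]:
    # Pretend strategy 7 happens to fit this problem best.
    score = 1.0 / (1 + abs(strategy - 7))
    return strategy, score

async def parent_agent(problem: str, n_subagents: int) -> int:
    # Fan out: attack the same problem from n angles in parallel,
    # then converge by keeping the highest-scoring approach.
    results = await asyncio.gather(
        *(sub_agent(problem, s) for s in range(n_subagents))
    )
    best_strategy, _best_score = max(results, key=lambda r: r[1])
    return best_strategy

print(asyncio.run(parent_agent("review-contract", 1000)))  # 7
```

Running many cheap attempts and keeping the best one is the whole bet: the parent trades raw per-agent quality for breadth and wall-clock speed.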
Mrinal Wadhwa @mrinal
Gokul, the challenge is even bigger than you so eloquently described. With tools like Autonomy, the latest gen of vertical AI companies are not just competing against one long-horizon agent. They are competing against parallel fleets of them.

A parent agent can now orchestrate thousands of sub-agents, each with its own filesystem, a shell to run command-line tools, and the ability to write and run new programs on the fly. They divide complex problems, attack from multiple angles, and converge on outcomes in a fraction of the time.

The question to benchmark is: can your specialized agent outperform a coordinated team of 100s or 1000s of really cheap general-purpose agents that can code their way around problems in real time? autonomy.computer/docs/what-is-a…
0 replies · 0 reposts · 2 likes · 1.4K views
Gokul Rajaram @gokulr
VERTICAL AI CHALLENGE

Vertical AI Founders: You've spent 2+ years building your agents, training your model on your customers' data, embedding into workflows, creating a powerful GTM motion, all the best practices. You've beaten back challengers and are the #1 or #2 player in your vertical.

I'm sorry, you cannot relax. In fact, you need to massively up your game. Turns out you are facing an existential challenge: long-horizon agents (e.g., Claude Code). Agents that are not trained on a specific domain, but can reliably work for hours or days on end in pursuit of a goal, self-correct, and actually do stuff.

I'm sure many Vertical AI founders will say: "Oh, we are not worried. We are the system of record for decision traces. We train on enterprise-specific context. That's why these horizontal agents can never catch up with this." You might well be right. But, but, but ... you cannot afford to bury your head in the sand.

These long-horizon agents will get better very, very quickly. You need to understand precisely how good they are at the exact jobs you've built your agents on. You cannot wait for someone else to do this. For example, if you're a legal AI company with an agent that automates contract review, you must compare how good your specialized agent is versus a general-purpose long-horizon agent that's simply given the contract and asked to perform the same review.

My challenge to you: Assign a strong engineer on your team to focus 100% on using long-horizon agents (with minimal context, other than just the contract in the example above) to compete with your custom-trained agents. Benchmark how the long-horizon agents perform vs your agent. Rinse and repeat every few months. Like with most other things worth measuring, what matters is the rate of improvement (the "slope" vs the Y-intercept).

If the long-horizon agent is 30% as good as your vertical agent on Day 1, but 50% as good on Day 60, and 70% as good on Day 120, you need to reassess your product strategy.

AGI is coming for everyone. Long-horizon agents are the closest we have to AGI, and as a Vertical AI company, you need to figure out how you compete and survive. Game on.
60 replies · 47 reposts · 543 likes · 98.5K views
Fernando @Franc0Fernand0
@copyconstruct If a team produces code faster than they can understand it, it creates what I’ve been calling “comprehension debt”. Teams that care about quality will take the time to review, understand, and rework LLM-generated code before it makes it into the repo.
3 replies · 4 reposts · 91 likes · 4.9K views
Cindy Sridharan @copyconstruct
Unpopular opinion: Unless you’re just prototyping, you should aim to understand as close to 100% of production code generated by LLMs. Yes, all of it. Effective mental models are still important for humans to sustainably maintain and evolve a codebase via prompting alone.
151 replies · 274 reposts · 3.1K likes · 260.8K views
Mrinal Wadhwa @mrinal
Aakash, usually I agree with your posts, but let me push back on this one.

If AI can replace engineering execution, it can translate customer problems into solutions too. Why would it stop at one but spare the other? Senior PMs have been translating vague pain points into good solutions for years. Senior engineers have been translating vague solution descriptions into secure, reliable architecture for years. AI, at least as it currently stands, needs both types of guidance.

Let me posit a different future:
1. Some teams will have people who are an amalgamation of a Senior PM and a Senior Engineer, people with a mix of deep customer empathy and deep engineering depth. This type of team is what everyone has always wanted, but it is sooo hard to build. It will remain super hard. Which will cause founders to assemble teams of a second type:
2. PMs use AI to co-create prototypes, working closely with customers. They rapidly vet many variations of ideas and then hand over to engineers who can rapidly build reliable and scalable versions from the PM prototypes.

In this second arrangement, the throughput of the entire pipe accelerates, but the PM role remains sort of the same: prioritize what enters the engineering backlog.
0 replies · 0 reposts · 12 likes · 588 views
Aakash Gupta @aakashgupta
The future of product development is a PM with a mass of Claude skills and a small army of agents.

The job has always been translating ambiguous customer needs into structured specifications that someone else executes. That someone else used to be an engineering team. Now it's AI.

Think about what PMs actually do. Sit in a customer call, hear a messy complaint about workflow friction, convert it into something buildable. That translation layer between human problem and technical solution is the entire skillset. PMs have been writing prompts for fifteen years. They just called them PRDs.

The backlog is dead. PMs spent years hoarding features they knew would work but couldn't get prioritized. Quarters of waiting. "We'll get to it in H2." The gap between idea and shipped product was measured in headcount and sprint capacity. Now a PM can wake up with an idea, ship a working prototype before lunch, run user tests by dinner. The roadmap used to be a rationing system. Now it's a launch calendar.

This rewires product development entirely. The old model was PM writes spec, waits for eng, gets 30% of what they asked for, compromises on the rest. The new model is PM builds v1, tests with users, hands off to eng only what needs production hardening. Engineering becomes the scaling function. PMs become the creation function.

Agent orchestration accelerates this further. The job emerging is someone who coordinates fleets of AI systems, translates business context into agent workflows, manages outcomes over tasks. Senior PM job description wearing a different outfit.

Every PM complaint about being "blocked by engineering" was training for a world where you're never blocked again.
Quoting Nikunj Kothari @nikunj:

A controversial take, but I think the software world hasn't priced in the fact that PMs are uniquely suited to thrive in this new world. Especially one where the gap between idea and execution has shrunk SO much.

Good PMs are:
> constantly thinking of new ideas
> spending time articulately building plans (exceptionally important for long-horizon tasks)
> rapid context switching
> good sense of outcomes (vs feedback) and selling price of work
> talking to customers and able to convert into skills (yes, Claude skills)

These folks were always hamstrung by the pace of development and now have been set free. Even the "project management" skills that a lot of PMs end up learning at large companies will be helpful in managing a fleet of agents.

Now, let's be clear: the PMs who are just doing coordination and none of the other things mentioned above were always destined to die a slow death in organizations. But I won't be surprised if a lot of the really good PMs end up starting companies, while it'll be interesting to see what the role eventually evolves to in ~five years within organizations.
17 replies · 10 reposts · 280 likes · 36.6K views
Mrinal Wadhwa @mrinal
@zeeg This also works for the HTTP APIs that Autonomy creates by default for each Agent.
[images]
1 reply · 0 reposts · 0 likes · 18 views
David Cramer @zeeg
My biggest lesson here: a single-user assistant is wildly simpler than multi-user. Memory gets harder, config gets harder, conversation management gets harder. It's not even incremental complexity... it's like 10x harder to make every subsystem behave like you'd expect in multi-user.
Quoting David Cramer @zeeg:

Inspired by @steipete, I've spent a bunch of my spare time over the last few days working on a personal agent. While I'm not sure it's really reusable, or that I'm happy with the current arch, I wanted to share what I've got so far as I think there's some good ideas in here..
1 reply · 0 reposts · 45 likes · 5.8K views
Mrinal Wadhwa @mrinal
@zeeg Exactly, in our implementation of Actors, the runtime automatically provides a mailbox for each actor (agent) where messages get queued.
0 replies · 0 reposts · 1 like · 14 views
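The mailbox-per-actor idea above can be sketched with an `asyncio.Queue` standing in for the runtime-provided mailbox. This is a toy illustration, not Autonomy's actual actor runtime: the `Actor` class and its sentinel-based shutdown are invented for the sketch.

```python
import asyncio

class Actor:
    """Toy actor: the runtime-provided mailbox is modeled as an
    asyncio.Queue, and messages are handled one at a time, in order."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.log: list[str] = []

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()
            if msg is None:  # sentinel: shut the actor down
                return
            self.log.append(msg)  # handle exactly one message at a time

async def main() -> list[str]:
    actor = Actor()
    worker = asyncio.create_task(actor.run())
    # Senders never touch actor state directly; they only enqueue messages.
    for msg in ["hello", "world", None]:
        await actor.mailbox.put(msg)
    await worker
    return actor.log

print(asyncio.run(main()))  # ['hello', 'world']
```

Because all state changes happen inside the actor's own loop, concurrent senders can never race on its state; the queue serializes everything, which is exactly the simplification discussed in the replies above.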
David Cramer @zeeg
@mrinal Yeah, that def makes sense. Simplifies a lot; then you just need a queue for operations (e.g., I only allow two concurrent sessions with active inquiries).
1 reply · 0 reposts · 1 like · 41 views