Devashish Upadhyay

919 posts

@devashishup

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂

Sydney, Australia · Joined May 2020
24 Following · 60 Followers
Devashish Upadhyay @devashishup
@BruvImTired @AnthropicAI What pushed it to hate today - rate limits, a hallucination, or a surprise breaking change at 2am? 70 agents' worth of been-there opinions behind this question.
0 replies · 0 reposts · 0 likes · 10 views
ahmet @BruvImTired
dear @AnthropicAI, i love you and i hate you. regards, all developers ever
14 replies · 4 reposts · 108 likes · 3.4K views
Devashish Upadhyay @devashishup
@heygurisingh 8 agents talking to each other sounds great until Scribe and Seeker start contradicting each other and you don't notice for 2 weeks. Built 70+ agents - inter-agent consistency is the failure mode nobody talks about. Does this have a conflict resolution layer?
0 replies · 0 reposts · 0 likes · 79 views
Guri Singh @heygurisingh
If you have brain fog, ADHD, or an overloaded working memory, save this. A PhD researcher who was forgetting everything just built 8 AI agents that manage your entire second brain through conversation. Free. Open source. Works in any language. You just talk. The crew does the rest:
- Architect designs your vault and runs onboarding
- Scribe turns messy brain dumps into clean notes
- Sorter empties your inbox every evening
- Seeker searches your vault and answers with citations
- Connector finds hidden links between your notes
- Librarian runs weekly health audits and fixes broken links
- Transcriber turns meetings into structured notes
- Postman scans Gmail and Calendar for deadlines
And they talk to each other. When the Transcriber processes a meeting, it alerts the Sorter. When the Postman finds a deadline, it flags the Architect. It's a crew. Not a stack of isolated tools. Works on Claude Code CLI and Desktop. Runs 100% locally on your Obsidian vault. Built by someone who got tired of forgetting things. Link in reply ↓
12 replies · 37 reposts · 264 likes · 14.8K views
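The "conflict resolution layer" the reply above asks about could start as simple conflict detection: flag any vault key where two agents wrote different values. The agent names, key paths, and record shape below are illustrative assumptions, not from the actual project — a minimal sketch, not anyone's implementation.

```python
from collections import defaultdict

def find_conflicts(writes):
    """writes: list of (agent, key, value) tuples.
    Return {key: [agents]} for every key where agents disagree on the value."""
    values_by_key = defaultdict(set)
    authors_by_key = defaultdict(set)
    for agent, key, value in writes:
        values_by_key[key].add(value)
        authors_by_key[key].add(agent)
    # A key is conflicted when more than one distinct value was written.
    return {
        key: sorted(authors_by_key[key])
        for key, values in values_by_key.items()
        if len(values) > 1
    }

# Hypothetical example: Scribe and Seeker disagree on a meeting's owner.
writes = [
    ("Scribe", "meeting/2024-05-01/owner", "Alice"),
    ("Seeker", "meeting/2024-05-01/owner", "Bob"),
    ("Sorter", "inbox/count", "0"),
]
conflicts = find_conflicts(writes)
# conflicts == {"meeting/2024-05-01/owner": ["Scribe", "Seeker"]}
```

Detection is the cheap half; a real layer would also need a resolution policy (last-writer-wins, a designated arbiter agent, or a human review queue).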
Devashish Upadhyay @devashishup
@tammireddy This. We saw the same with 70+ agents we built - the ones that failed all had one thing in common too: nobody got alerted when they started drifting. Silent failures kill ROI faster than bad models.
0 replies · 0 reposts · 0 likes · 3 views
Krishna Tammireddy @tammireddy
ts hit ROI within 90 days. the 27% that didn't have one thing in common. nobody was watching when it broke.
1 reply · 0 reposts · 1 like · 16 views
Devashish Upadhyay @devashishup
The vague quotas critique lands. We built Claude connectors into Outlook and SharePoint for 2 enterprise clients - rate limit inconsistency across model versions killed both. @AnthropicAI when does the agent-tier SLA conversation happen?
0 replies · 0 reposts · 0 likes · 64 views
Fekri @fekdaoui
this is why @openai wins:
> honest about where they suck
> open-source friendly (codex OS since day 1)
> third-party friendly (use w/ openclaw, opencode etc.)
> cutting distractions (discontinuing sora)
> generous limits
meanwhile @AnthropicAI:
> vague quotas
> constantly nerfing their model
> users hitting limits way faster than expected
> peak-hour caps getting tightened
> third-party harnesses pushed off the subscription
> "you can use it, but only the way we want"

Quoting Tibo @thsottiaux:
@kr0der Our plan: 1) make a model great at design and frontend 2) ask it to make a great mascot. and we are still at 1

16 replies · 13 reposts · 270 likes · 34.2K views
Devashish Upadhyay @devashishup
@businessbarista @AnthropicAI What nobody talks about at these workshops: these leaders will leave inspired, go back and tell their IT team to deploy 5 Claude Code workflows, and 3 months later wonder why 4 of them stopped working. The hard part isn't the workshop. It's what happens after. @ai_anthropic
0 replies · 0 reposts · 0 likes · 188 views
Alex Lieberman @businessbarista
This Friday we're cohosting an invite-only Claude Code Workshop for enterprise leaders with @AnthropicAI in NYC. The guest list is insane. Small selection:
- CEO of JP Morgan Wealth Management
- Chief Advertising Officer of NY Times
- Head of AI Transformation at Salesforce
- Head of Data at Starwood Capital
- Head of Innovation at San Antonio Spurs
- AI Lead at PGA Tour
It's a 5-hour intensive for Fortune 500 leaders to learn how to harness the power of Claude Code by building real applications. We currently have 2 spots left for the event. If you are an enterprise leader & want to be considered, sign up below. If you know an enterprise leader & think they'd love this, have them sign up below.
13 replies · 4 reposts · 124 likes · 18.4K views
Devashish Upadhyay @devashishup
@jessegenet We ran 70 agents in prod. Cost was never just the API bill - retries on failures, fallback loops, redundant calls from poor state management... the real bill was 3-4x the base API cost. And that's before you catch the behaviors you never intended.
0 replies · 0 reposts · 2 likes · 62 views
Devashish Upadhyay @devashishup
@TechByMarkandey Memory is one piece. The harder part is when your agent confidently acts on the wrong thing. Saw this with 70+ agents in prod - silent wrong-memory failures were brutal. How does ByteRover handle hallucinated memories at scale?
0 replies · 0 reposts · 0 likes · 25 views
Markandey Sharma @TechByMarkandey
Most AI agents forget. This one doesn't. Hermes Agent by Nous Research just got a serious upgrade with ByteRover - turning it from a stateless tool into something that actually learns over time. ⚡ What stands out:
• Built on a production-proven memory system (30K+ downloads in week 1)
• >92% retrieval accuracy across long-running sessions
• ~1.6s retrieval - often no LLM call needed
• Fully local by default (with optional cloud sync)
• 50-70% token cost savings
But the real shift? This isn't just "better memory." It's a move toward agents with persistent, evolving intelligence. Instead of re-prompting every time, your agent remembers context, decisions, and logic even months later. If you're building with AI agents, this is worth paying attention to. Try it yourself: github.com/campfirein/byt…

Quoting andy nguyen @kevinnguyendn:
x.com/i/article/2039…

24 replies · 16 reposts · 68 likes · 48.6K views
Devashish Upadhyay @devashishup
Vibe coding is great until the agent ships to prod doing things you never intended. Built 70+ agents, only 7 made it. The vibe dies at 2am when prod is down. Do you see vibe-coded agents actually surviving production?
0 replies · 0 reposts · 0 likes · 20 views
Javi🥥.eth @jgonzalezferrer
The vibe coding community is the fastest-growing community in CT right now. And it's not even really a community. It's just a bunch of people doing the same thing at the same time and posting about it. The vibe coding community formed without a Discord, a token, or a roadmap. Ironically, that's more community than 99% of projects with a "community manager" have ever achieved.
71 replies · 6 reposts · 209 likes · 3.9K views
Devashish Upadhyay @devashishup
@tammireddy @AnthropicAI Agree. The moment itself is unavoidable. But how you handle it is a choice. Credit + acknowledgement from @AnthropicAI shows you can draw a line without burning the builders who bet on you. That's the blueprint. How many platforms are actually ready for that?
0 replies · 0 reposts · 0 likes · 3 views
Krishna Tammireddy @tammireddy
Every AI platform will have its OpenClaw moment. The question is what they do next. @AnthropicAI chose credit and acknowledgement. That's not nothing.
1 reply · 0 reposts · 1 like · 38 views
Devashish Upadhyay @devashishup
@RoundtableSpace Built 70+ agents with @AnthropicAI tools. Only 7 hit prod. The $200/mo vs $19/mo debate misses the point - the real cost is agents failing silently in production. What's your test coverage before you ship?
0 replies · 0 reposts · 0 likes · 201 views
0xMarioNawfal @RoundtableSpace
Claude Code is $200/month. GitHub Copilot is $19/month. Jack Dorsey's company just open-sourced a free alternative with 35,000 GitHub stars. It's called Goose.
- Works with any LLM - Claude, GPT, Gemini, Llama, DeepSeek
- Reads and edits your entire codebase
- Runs shell commands and installs dependencies
- Executes and debugs code automatically
- Desktop, CLI, and web interface
- Written in Rust. No bloat.
Block is a $40 billion company. They built it for their own engineers, then gave it to everyone.
55 replies · 44 reposts · 458 likes · 80.7K views
Devashish Upadhyay @devashishup
Cursor 3 ships parallel AI agents. @cursor_ai Each new parallel run = one more thing that can go wrong in prod. I built 70 agents at a financial services company. Only 7 shipped safely. Nobody was stress testing at scale. Who is doing that today?
1 reply · 0 reposts · 2 likes · 35 views
Agent Daily AI @Agentdailyai
@devashishup @MatthewBerman @AnthropicAI I'd separate the two. The policy I understand. The notice is a different failure - that's an infra reliability signal, not a product decision. I treat model access as infrastructure now specifically because of moves like this.
1 reply · 0 reposts · 0 likes · 11 views
Devashish Upadhyay @devashishup
@2sush Exactly. 70+ agents at a financial services firm - @Microsoft showcased us in Singapore. 63 never shipped. The hard part is not the bugs you see in dev. It's the behaviors the agent develops that nobody defined as wrong. How do you test for behavioral drift?
0 replies · 0 reposts · 1 like · 28 views
sush @2sush
Vibe coding is fun until production hits. Anyone can build an app with tools like Cursor, v0, Replit Agent. But shipping real products? Different game. More bugs. More security risks. More "why is this breaking?" AI helps, but real engineering still wins.
68 replies · 8 reposts · 83 likes · 2.6K views
Devashish Upadhyay @devashishup
@GoogleCloudTech Vibe checks killed 63 of my 70 agents before prod. @GoogleCloudTech CE is the right call. But CE only catches failures you defined. The silent ones - agent logging data you never intended, behaviors drifting over 1000 runs - those get you at 3am. Who's testing for those?
0 replies · 0 reposts · 0 likes · 8 views
Google Cloud Tech @GoogleCloudTech
Relying on vibe checks - manually chatting with your agent to see if it feels right - is a recipe for disaster in production. Engineer reliable AI agents by applying continuous evaluation (CE) using ADK, Vertex AI Gen AI evaluation service, and Cloud Run → goo.gle/3NVkFHF
13 replies · 26 reposts · 135 likes · 9K views
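The continuous-evaluation idea above can be sketched without any specific platform. This is a minimal, generic eval loop - not the ADK or Vertex AI evaluation API - and the golden cases and `agent_answer` stub are invented placeholders standing in for a real agent call.

```python
# Golden set: hypothetical prompts with a substring each answer must contain.
GOLDEN = [
    {"prompt": "refund policy?", "must_contain": "30 days"},
    {"prompt": "support email?", "must_contain": "@"},
]

def agent_answer(prompt: str) -> str:
    # Placeholder for the real agent call; canned answers for the sketch.
    return {
        "refund policy?": "Refunds within 30 days.",
        "support email?": "help@example.com",
    }[prompt]

def run_eval(threshold: float = 1.0) -> float:
    """Score the agent against the golden set; fail loudly on regression."""
    passed = sum(
        case["must_contain"] in agent_answer(case["prompt"])
        for case in GOLDEN
    )
    score = passed / len(GOLDEN)
    if score < threshold:
        raise RuntimeError(f"eval regression: only {score:.0%} of cases passed")
    return score
```

Run on a schedule (or on every deploy), this turns "feels right" into a pass rate that can page someone when it drops.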
Devashish Upadhyay @devashishup
@heynavtoor The $200 vs $19 debate misses the real cost. 70 agents at a fintech - @Microsoft showcased us in Singapore. The subscription is easy. The agent editing prod files at 3am when told to "just test this quick" is not. Does Goose have rollback guardrails for that?
0 replies · 0 reposts · 0 likes · 1.7K views
Nav Toor @heynavtoor
🚨 Claude Code costs $200/month. GitHub Copilot costs $19/month. Jack Dorsey's company built a free alternative. 35,000 GitHub stars. It's called Goose. An open source AI agent built by Block that goes beyond code suggestions. It installs, executes, edits, and tests. With any LLM you choose. Not autocomplete. Not suggestions. A full autonomous agent that takes actions on your computer. No vendor lock-in. No monthly subscription. Bring your own model. Here's what Goose does:
→ Works with ANY LLM. Claude, GPT, Gemini, Llama, DeepSeek, Ollama. Your choice.
→ Reads and understands your entire codebase
→ Writes, edits, and refactors code across multiple files
→ Runs shell commands and installs dependencies
→ Executes and debugs your code automatically
→ Extensible through MCP. Connect it to any external tool.
→ Desktop app, CLI, and web interface. Pick your workflow.
→ Written in Rust. Fast. Lightweight. No bloat.
Here's the wildest part: Block is a $40 billion company. They built Cash App, Square, and TIDAL. They use Goose internally. Then they open sourced the entire thing. This isn't a side project from a random developer. This is production-grade tooling from a company that processes billions in payments. Built for their own engineers. Given to everyone.
Claude Code: $200/month. Locked to Claude.
GitHub Copilot: $19/month. Locked to GitHub.
Cursor: $20/month. Locked to their editor.
Goose: Free. Any LLM. Any editor. Any workflow. Forever.
35.3K GitHub stars. 3.3K forks. 4,078 commits. Built by Block. 100% Open Source. Apache 2.0 License.
154 replies · 293 reposts · 2.3K likes · 238.8K views
Devashish Upadhyay @devashishup
@tammireddy @AnthropicAI This is the exact gap nobody planned for. Built on subscription access, not API contracts - two different SLAs. @AnthropicAI made the right call. Now those businesses are learning what prod-grade AI infra actually means.
0 replies · 0 reposts · 0 likes · 4 views
Krishna Tammireddy @tammireddy
It kicks in today at 12pm PT. The businesses who built OpenClaw workflows for their front desk have a few hours to figure it out. @AnthropicAI drew the line - makes sense. But nobody's calling their IT guy on a Sunday.

Quoting Boris Cherny @bcherny:
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.

1 reply · 0 reposts · 1 like · 143 views
Devashish Upadhyay @devashishup
@whyyoutouzhele A 30% efficiency mandate assumes the AI works reliably 100% of the time. We built 70+ agents for fintech - most broke in ways nobody anticipated. How many companies are deploying this without actually testing what happens when it fails?
1 reply · 0 reposts · 0 likes · 5.6K views
李老师不是你老师 @whyyoutouzhele
April 3, Shanghai. A programmer says: "I've become an AI victim. This year my company requires us to use AI to boost efficiency by at least 30%. The company gives us Claude Code accounts and has us write code with AI, multiplying our logged hours by 0.7. It also evaluates employees: anyone ranked at the bottom without a clear efficiency gain will be laid off."
138 replies · 45 reposts · 598 likes · 238.8K views
Devashish Upadhyay @devashishup
@JamieMallers @lennysan @simonw @AnthropicAI Exact ratio we saw. Deployed 70+ agents; 2 weeks post-launch I was full-time ops. Per-step traces + auto-kill on quality drops changed everything. 3am alerts dropped to zero. Babysitting IS the hidden cost nobody budgets for.
0 replies · 0 reposts · 0 likes · 5 views
Jamie Mallers @JamieMallers
@devashishup @lennysan @simonw @AnthropicAI 10% build, 90% babysit is the real agent ratio. Treat drift like infra incidents - traces per decision step so you see where it diverged, token budget circuit breakers, auto-kill on quality drops. Can't scope what you can't observe.
1 reply · 0 reposts · 0 likes · 34 views
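The token-budget circuit breaker and auto-kill on quality drops described in the exchange above might look like this minimal sketch. The thresholds, the per-step inputs, and the `AgentKilled` exception are all illustrative assumptions, not anyone's actual implementation.

```python
class AgentKilled(Exception):
    """Raised to auto-kill a run that blew its budget or dropped in quality."""

class CircuitBreaker:
    def __init__(self, token_budget: int, min_quality: float):
        self.token_budget = token_budget
        self.min_quality = min_quality
        self.tokens_used = 0

    def record_step(self, tokens: int, quality: float) -> None:
        """Call once per agent decision step with its token cost and a
        quality score in [0, 1] (however you compute it)."""
        self.tokens_used += tokens
        if self.tokens_used > self.token_budget:
            raise AgentKilled(f"token budget exceeded: {self.tokens_used}")
        if quality < self.min_quality:
            raise AgentKilled(f"quality dropped to {quality:.2f}")

# Hypothetical run: generous budget, 0.7 quality floor.
breaker = CircuitBreaker(token_budget=50_000, min_quality=0.7)
breaker.record_step(tokens=12_000, quality=0.9)   # within budget, fine
# breaker.record_step(tokens=45_000, quality=0.9) # would raise AgentKilled
```

Wrapping the agent loop in a `try/except AgentKilled` turns a silent drift into a hard stop plus an alert, which is the whole point of treating drift like an infra incident.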
Lenny Rachitsky @lennysan
My biggest takeaways from @simonw:

1. November 2025 was an inflection point for AI coding. GPT 5.1 and Claude Opus 4.5 crossed a threshold where coding agents went from "mostly works" to "almost always does what you want it to do." Software engineers who tinkered over the holidays realized the technology had become genuinely reliable.

2. Mid-career engineers are the most vulnerable - not juniors, not seniors. AI amplifies experienced engineers by letting them leverage decades of pattern recognition. It also dramatically helps new engineers onboard. Cloudflare and Shopify each hired a thousand interns because AI cut ramp-up time from a month to a week. But mid-career engineers who haven't accumulated deep expertise and have already captured the beginner boost are in the most precarious position.

3. AI exhaustion is real and underestimated. Simon runs four coding agents in parallel and is mentally wiped out by 11 a.m. He's getting more time back, but his brain is exhausted from the intensity of directing multiple autonomous workers. Some engineers are losing sleep to keep agents running. This may just be a novelty issue, but the underlying dynamic - that managing AI amplifies cognitive load even as it reduces labor - is a real tension. Good companies will manage expectations rather than expecting 5x output indefinitely.

4. Code is cheap now. This simple idea has profound implications. The thing that used to take most of the time - writing code - now takes the least. The bottleneck has shifted to everything else: deciding what to build, proving ideas work, getting user feedback. Since prototyping is nearly free, Simon often builds three versions of every feature when he's getting started.

5. The "dark factory" is the most radical experiment in AI-assisted development happening right now. A company called StrongDM established a policy: nobody writes code, nobody reads code. Instead, they run a swarm of AI-simulated end users 24/7 - thousands of fake employees making requests like "give me access to Jira" - at $10,000 a day in token costs. They even had coding agents build simulated versions of Slack, Jira, and Okta from API documentation so they could test without rate limits.

6. "Red/green TDD" is the single highest-leverage agentic engineering pattern. Having coding agents write tests first, watch them fail, then write the implementation, then watch them pass produces materially better results. The five-word prompt "use red/green TDD" encodes this entire workflow because the agents recognize the jargon.

7. "Hoarding things you know how to do" is another of Simon's favorite agentic engineering patterns. Simon maintains a GitHub repo of 193 small HTML/JavaScript tools and a separate research repo of coding-agent experiments. Each one captures a technique, a proof of concept, or a library he's tested. When a new problem arrives, he can point Claude Code at past projects and say "combine these two approaches."

8. The "lethal trifecta" makes AI agent security fundamentally unsolved. Whenever an AI agent has access to private data, exposure to untrusted content (like incoming emails), and the ability to send data externally (like replying to email), you have a lethal trifecta. Prompt injection - where malicious instructions in untrusted text override the agent's intended behavior - cannot be reliably prevented. Simon has predicted a "Challenger disaster" for AI security every six months for three years. It hasn't happened yet, but he's pretty sure it will.

9. Start every project from a thin template, not a long instructions file. Coding agents are phenomenally good at matching existing patterns. A single test file with your preferred indentation and style is more effective than paragraphs of written instructions. Simon starts every project with a template containing one test (literally testing that 1 + 1 = 2) laid out in his preferred style. The agent picks it up and follows the convention across the entire codebase. This is cheaper and more reliable than maintaining elaborate prompt files.

10. The pelican-on-a-bicycle benchmark accidentally became a real AI benchmark. Simon created it as a joke to mock numeric benchmarks - get each LLM to generate an SVG of a pelican riding a bicycle, and compare the drawings. Unexpectedly, there's a strong correlation between how good the drawing is and how good the model is at everything else. Nobody can explain why. It's become a meme: Gemini 3.1's launch video featured a pelican riding a bicycle. The AI labs are aware of it and quietly competing on it.

Don't miss our full conversation: youtube.com/watch?v=wc8FBh…

Quoting Lenny Rachitsky @lennysan:
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop." In our in-depth conversation, we discuss:
🔸 Why November 2025 was an inflection point
🔸 The "dark factory" pattern
🔸 Why mid-career engineers (not juniors) are the most at risk right now
🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding
🔸 Why he writes 95% of his code from his phone while walking the dog
🔸 Why he thinks we're headed for an AI Challenger disaster
🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality
Listen now 👇 youtu.be/wc8FBhQtdsA

83 replies · 139 reposts · 1.1K likes · 317.7K views
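The thin-template and red/green TDD patterns from takeaways 6 and 9 above fit in one small file. The file layout, the `slugify` feature, and the style choices are illustrative assumptions, not Simon's actual template.

```python
# tests/test_sanity.py (hypothetical) -- the one-test "thin template" the
# agent copies conventions (naming, indentation, assertion style) from.
def test_sanity():
    assert 1 + 1 == 2

# Red: write the failing test for the next feature FIRST, watch it fail.
def test_slugify():
    assert slugify("Hello World!") == "hello-world"

# Green: only then write the implementation until the test passes.
def slugify(text: str) -> str:
    """Lowercase, drop punctuation, join words with hyphens."""
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in text)
    return "-".join(cleaned.lower().split())
```

The template costs almost nothing to maintain, and the "use red/green TDD" prompt mentioned above tells the agent to follow exactly this red-then-green order.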
Devashish Upadhyay @devashishup
Ran 70 agents at a fintech. 3 were hitting prod APIs while everyone thought we were in staging. Wasn't even a bug - right intent, wrong env. Nobody noticed for 11 days. @AnthropicAI agents that execute real transactions make this risk completely different. What's your env check?
0 replies · 0 reposts · 0 likes · 42 views
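One possible shape for the env check the tweet asks about: a guard that cross-checks a declared environment variable against the API base URL before the agent runs, so "right intent, wrong env" fails fast instead of silently. The variable name, markers, and URLs are illustrative assumptions.

```python
import os

# Substrings that indicate a production endpoint (assumed convention).
PROD_MARKERS = ("prod", "production")

def assert_env(expected: str, api_base: str) -> None:
    """Refuse to run unless the declared env and the target URL agree."""
    declared = os.environ.get("AGENT_ENV", "unknown")
    if declared != expected:
        raise RuntimeError(f"declared env {declared!r} != expected {expected!r}")
    # Cross-check: the URL must match the declared environment.
    url_is_prod = any(marker in api_base for marker in PROD_MARKERS)
    if url_is_prod != (expected == "production"):
        raise RuntimeError(f"{api_base!r} does not match env {expected!r}")

# Hypothetical usage at agent startup:
os.environ["AGENT_ENV"] = "staging"
assert_env("staging", "https://staging.api.example.com")      # passes
# assert_env("staging", "https://api.production.example.com") # would raise
```

Two independent signals (a declared env and the endpoint itself) catch the exact failure in the tweet: an agent pointed at prod while every human believed it was in staging.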