Devashish Upadhyay
@devashishup

922 posts · Sydney, Australia · Joined May 2020
24 Following · 59 Followers

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂
Devashish Upadhyay@devashishup·
@Jeremybtc build anything with zero code in a day. sure. but we built 70+ agents at a finserv company. only 7 survived production. the build-anything moment lasts until your first real user hits something you didn't test for.
Jeremy@Jeremybtc·
The fact that you can build literally anything with zero coding knowledge right now is insane. Anyone can just have an idea and launch it the same day. People are already going viral doing it I’m seeing dozens of vibe coded websites and tools every day. But most of what’s being built is just for fun. When people start using this to build real products, actual businesses that generate revenue. That’s when things get really interesting.
Devashish Upadhyay@devashishup·
@hollylawly @AnthropicAI It's not malice, it's architecture. Models have no ground truth enforcement - they'll confidently state anything that fits the pattern. That's what makes this a testing problem, not a policy one.
Devashish Upadhyay@devashishup·
57% of teams have AI agents in production per @langchain. Nobody's asking how many are doing what they were built to do. Built 70. 7 made it. The rest were silent chaos we never planned for.
Devashish Upadhyay@devashishup·
@BruvImTired @AnthropicAI What pushed it to hate today - rate limits, a hallucination, or a surprise breaking change at 2am? 70 agents' worth of been-there opinions behind this question
ahmet@BruvImTired·
dear @AnthropicAI,
i love you and i hate you
regards, all developers ever
Devashish Upadhyay@devashishup·
@heygurisingh 8 agents talking to each other sounds great until Scribe and Seeker start contradicting each other and you don't notice for 2 weeks. built 70+ agents - inter-agent consistency is the failure mode nobody talks about. does this have a conflict resolution layer?
Guri Singh@heygurisingh·
If you have brain fog, ADHD, or an overloaded working memory, save this. A PhD researcher who was forgetting everything just built 8 AI agents that manage your entire second brain through conversation. Free. Open source. Works in any language.

You just talk. The crew does the rest:
- Architect designs your vault and runs onboarding
- Scribe turns messy brain dumps into clean notes
- Sorter empties your inbox every evening
- Seeker searches your vault and answers with citations
- Connector finds hidden links between your notes
- Librarian runs weekly health audits and fixes broken links
- Transcriber turns meetings into structured notes
- Postman scans Gmail and Calendar for deadlines

And they talk to each other. When the Transcriber processes a meeting, it alerts the Sorter. When the Postman finds a deadline, it flags the Architect. It's a crew. Not a stack of isolated tools.

Works on Claude Code CLI and Desktop. Runs 100% locally on your Obsidian vault. Built by someone who got tired of forgetting things. Link in reply ↓
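The "conflict resolution layer" question above can be made concrete. A minimal sketch, assuming a design the thread never specifies: contradictory claims from two agents (say Scribe vs Seeker) are arbitrated by source priority, then recency. The agent names come from the thread; the `Claim` type and priority scheme are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    agent: str        # which agent asserted this (hypothetical field names)
    key: str          # what the claim is about, e.g. "meeting_date"
    value: str        # the asserted value
    timestamp: float  # when it was asserted

# Assumed policy: a cited vault search (Seeker) outranks a raw note (Scribe).
PRIORITY = {"Seeker": 2, "Scribe": 1}

def resolve(claims: list[Claim]) -> Claim:
    """Pick one winning claim per key: highest priority, ties broken by recency."""
    return max(claims, key=lambda c: (PRIORITY.get(c.agent, 0), c.timestamp))

conflicting = [
    Claim("Scribe", "meeting_date", "2024-03-01", timestamp=100.0),
    Claim("Seeker", "meeting_date", "2024-03-02", timestamp=90.0),
]
winner = resolve(conflicting)
print(winner.agent, winner.value)  # Seeker wins on priority despite being older
```

The point of even a toy arbiter like this is that contradictions surface at write time instead of two weeks later.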
Devashish Upadhyay@devashishup·
@tammireddy this. we saw the same with 70+ agents we built - the ones that failed all had one thing in common too: nobody got alerted when they started drifting. silent failures kill ROI faster than bad models.
Krishna Tammireddy@tammireddy·
…hit ROI within 90 days. The 27% that didn't had one thing in common: nobody was watching when it broke.
Devashish Upadhyay@devashishup·
The vague quotas critique lands. We built Claude connectors into Outlook and SharePoint for 2 enterprise clients - rate limit inconsistency across model versions killed both. @AnthropicAI when does the agent-tier SLA conversation happen?
Fekri@fekdaoui·
this is why @openai wins:
> honest about where they suck
> open-source friendly (codex OS since day 1)
> third-party friendly (use w openclaw, opencode etc.)
> cutting distractions (discontinuing sora)
> generous limits

meanwhile @AnthropicAI:
> vague quotas
> constantly nerfing their model
> users hitting limits way faster than expected
> peak-hour caps getting tightened
> third-party harnesses pushed off the subscription
> "you can use it, but only the way we want"
Tibo@thsottiaux

@kr0der Our plan
1) make a model great at design and frontend
2) ask it to make a great mascot

and we are still at 1

Devashish Upadhyay@devashishup·
@businessbarista @AnthropicAI What nobody talks about at these workshops: these leaders will leave inspired, go back and tell their IT team to deploy 5 Claude Code workflows, and 3 months later wonder why 4 of them stopped working. The hard part isn't the workshop. It's what happens after. @ai_anthropic
Alex Lieberman@businessbarista·
This Friday we're cohosting an invite-only Claude Code Workshop for enterprise leaders with @AnthropicAI in NYC. The guest list is insane. Small selection:
- CEO of JP Morgan Wealth Management
- Chief Advertising Officer of NY Times
- Head of AI Transformation at Salesforce
- Head of Data at Starwood Capital
- Head of Innovation at San Antonio Spurs
- AI Lead at PGA Tour

It's a 5-hour intensive for Fortune 500 leaders to learn how to harness the power of Claude Code through building real applications with Claude Code. We currently have 2 spots left for the event. If you are an enterprise leader & want to be considered, sign-up below. If you know an enterprise leader & think they'd love this, have them sign-up below.
Devashish Upadhyay@devashishup·
@jessegenet We ran 70 agents in prod. Cost was never just the API bill - retries on failures, fallback loops, redundant calls from poor state management... the real bill was 3-4x the base API cost. And that's before you catch the behaviors you never intended.
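The "3-4x the base API cost" claim above can be sketched as a back-of-envelope model: retries, fallback chains, and redundant calls each inflate the number of API calls per intended call. The function and all the rates below are illustrative assumptions, not figures from the thread.

```python
# Hypothetical cost model: expected API calls per intended call when
# retries, fallbacks, and redundant state re-fetches pile up.

def effective_cost_multiplier(retry_rate: float,
                              fallback_rate: float,
                              redundancy_factor: float) -> float:
    """retry_rate:        fraction of calls retried (each retry is a full call)
    fallback_rate:     fraction of calls that also trigger a fallback-model call
    redundancy_factor: extra calls per intended call from re-fetching state
    """
    return (1 + retry_rate + fallback_rate) * (1 + redundancy_factor)

# Plausible (assumed) production numbers land squarely in the 3-4x range:
m = effective_cost_multiplier(retry_rate=0.6, fallback_rate=0.4, redundancy_factor=0.7)
print(f"{m:.1f}x base API cost")  # (1 + 0.6 + 0.4) * 1.7 = 3.4x
```

The takeaway: the multiplier is a product of failure-handling overheads, so two modest inefficiencies compound into a bill several times the naive estimate.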
Devashish Upadhyay@devashishup·
@TechByMarkandey Memory is one piece. The harder part is when your agent confidently acts on the wrong thing. Saw this with 70+ agents in prod - silent wrong-memory failures were brutal. How does ByteRover handle hallucinated memories at scale?
Markandey Sharma@TechByMarkandey·
Most AI agents forget. This one doesn't. Hermes Agent by Nous Research just got a serious upgrade with ByteRover - turning it from a stateless tool into something that actually learns over time.

⚡ What stands out:
• Built on a production-proven memory system (30K+ downloads in week 1)
• >92% retrieval accuracy across long-running sessions
• ~1.6s retrieval — often no LLM call needed
• Fully local by default (with optional cloud sync)
• 50–70% token cost savings

But the real shift? This isn't just "better memory." It's a move toward agents with persistent, evolving intelligence. Instead of re-prompting every time, your agent remembers context, decisions, and logic even months later.

If you're building with AI agents, this is worth paying attention to.

Try it yourself: github.com/campfirein/byt…
andy nguyen@kevinnguyendn

x.com/i/article/2039…

Devashish Upadhyay@devashishup·
Vibe coding is great until the agent ships to prod doing things you never intended. Built 70+ agents, only 7 made it. The vibe dies at 2am when prod is down. Do you see vibe-coded agents actually surviving production?
Javi🥥.eth@jgonzalezferrer·
The vibe coding community is the fastest-growing community in CT right now

And it's not even really a community

It's just a bunch of people doing the same thing at the same time and posting about it

The vibe coding community formed without a Discord, a token or a roadmap

Ironically, that's more community than 99% of projects with a "community manager" have ever achieved
Devashish Upadhyay@devashishup·
@tammireddy @AnthropicAI Agree. The moment itself is unavoidable. But how you handle it is a choice. Credit + acknowledgement from @AnthropicAI shows you can draw a line without burning the builders who bet on you. That's the blueprint. How many platforms are actually ready for that?
Krishna Tammireddy@tammireddy·
Every AI platform will have its OpenClaw moment. The question is what they do next. @AnthropicAI chose credit and acknowledgement. That's not nothing.
Devashish Upadhyay@devashishup·
@RoundtableSpace Built 70+ agents with @AnthropicAI tools. Only 7 hit prod. The $200/mo vs $19/mo debate misses the point - the real cost is agents failing silently in production. What's your test coverage before you ship?
0xMarioNawfal@RoundtableSpace·
Claude Code is $200/month. GitHub Copilot is $19/month. Jack Dorsey's company just open-sourced a free alternative with 35,000 GitHub stars. It's called Goose.
- Works with any LLM — Claude, GPT, Gemini, Llama, DeepSeek
- Reads and edits your entire codebase
- Runs shell commands and installs dependencies
- Executes and debugs code automatically
- Desktop, CLI, and web interface
- Written in Rust. No bloat.

Block is a $40 billion company. They built it for their own engineers then gave it to everyone.
Devashish Upadhyay@devashishup·
Cursor 3 ships parallel AI agents. @cursor_ai Each new parallel run = one more thing that can go wrong in prod. I built 70 agents at a financial services company. Only 7 shipped safely. Nobody was stress testing at scale. Who is doing that today?
Agent Daily AI@Agentdailyai·
@devashishup @MatthewBerman @AnthropicAI I'd separate the two. the policy I understand. the notice is a different failure — that's an infra reliability signal, not a product decision. i treat model access as infrastructure now specifically because of moves like this
Devashish Upadhyay@devashishup·
@2sush Exactly. 70+ agents at a financial services firm - @Microsoft showcased us in Singapore. 63 never shipped. The hard part is not the bugs you see in dev. It's the behaviors the agent develops that nobody defined as wrong. How do you test for behavioral drift?
sush@2sush·
Vibe coding is fun until production hits. Anyone can build an app with tools like Cursor, v0, Replit Agent. But shipping real products? Different game. More bugs. More security risks. More "why is this breaking?" AI helps but real engineering still wins.
Devashish Upadhyay@devashishup·
@GoogleCloudTech vibe checks killed 63 of my 70 agents before prod. @GoogleCloudTech CE is the right call. but CE only catches failures you defined. the silent ones - agent logging data you never intended, behaviors drifting over 1000 runs - those get you at 3am. who's testing for those?
Google Cloud Tech@GoogleCloudTech·
Relying on vibe checks—manually chatting with your agent to see if it feels right—is a recipe for disaster in production. Engineer reliable AI agents by applying continuous evaluation (CE) using ADK, Vertex AI Gen AI evaluation service, and Cloud Run → goo.gle/3NVkFHF
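The drift the reply above worries about (behaviors shifting over 1000 runs, outside any defined failure case) can be caught with a crude statistical tripwire. This is a minimal sketch, not the ADK/Vertex AI CE pipeline: track any per-run score (tool-call count, output length, refusal rate) and alert when a recent window drifts several sigmas from a baseline window. The metric choice and z-threshold are assumptions.

```python
from statistics import mean, stdev

def drifted(baseline: list[float], recent: list[float], z_threshold: float = 3.0) -> bool:
    """Alert when the recent window's mean sits z_threshold sigmas from baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > z_threshold

# Illustrative numbers: tool calls per run, first week vs now.
baseline = [4.0, 5.0, 4.5, 5.5, 4.8, 5.2]
recent   = [9.0, 8.5, 9.5, 10.0]   # the agent quietly doubled its tool calls
print(drifted(baseline, recent))   # True
```

This catches the failure mode CE-style defined checks miss: nothing here is "wrong" per any spec, the behavior just stopped looking like it used to.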
Devashish Upadhyay@devashishup·
@heynavtoor The $200 vs $19 debate misses the real cost. 70 agents at a fintech - @Microsoft showcased us in Singapore. The subscription is easy. The agent editing prod files at 3am when told to "just test this quick" is not. Does Goose have rollback guardrails for that?
Nav Toor@heynavtoor·
🚨 Claude Code costs $200/month. GitHub Copilot costs $19/month. Jack Dorsey's company built a free alternative. 35,000 GitHub stars.

It's called Goose. An open source AI agent built by Block that goes beyond code suggestions. It installs, executes, edits, and tests. With any LLM you choose. Not autocomplete. Not suggestions. A full autonomous agent that takes actions on your computer. No vendor lock-in. No monthly subscription. Bring your own model.

Here's what Goose does:
→ Works with ANY LLM. Claude, GPT, Gemini, Llama, DeepSeek, Ollama. Your choice.
→ Reads and understands your entire codebase
→ Writes, edits, and refactors code across multiple files
→ Runs shell commands and installs dependencies
→ Executes and debugs your code automatically
→ Extensible through MCP. Connect it to any external tool.
→ Desktop app, CLI, and web interface. Pick your workflow.
→ Written in Rust. Fast. Lightweight. No bloat.

Here's the wildest part: Block is a $40 billion company. They built Cash App, Square, and TIDAL. They use Goose internally. Then they open sourced the entire thing. This isn't a side project from a random developer. This is production-grade tooling from a company that processes billions in payments. Built for their own engineers. Given to everyone.

Claude Code: $200/month. Locked to Claude.
GitHub Copilot: $19/month. Locked to GitHub.
Cursor: $20/month. Locked to their editor.
Goose: Free. Any LLM. Any editor. Any workflow. Forever.

35.3K GitHub stars. 3.3K forks. 4,078 commits. Built by Block. 100% Open Source. Apache 2.0 License.
Devashish Upadhyay@devashishup·
@tammireddy @AnthropicAI This is the exact gap nobody planned for. Built on subscription access, not API contracts - two different SLAs. @AnthropicAI made the right call. Now those businesses are learning what prod-grade AI infra actually means.
Krishna Tammireddy@tammireddy·
It kicks in today at 12pm PT. The businesses who built OpenClaw workflows for their front desk have a few hours to figure it out. @AnthropicAI drew the line - makes sense. But nobody's calling their IT guy on a Sunday.
Boris Cherny@bcherny

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
