Devashish Upadhyay
@devashishup

922 posts · Sydney, Australia · Joined May 2020
24 Following · 59 Followers

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂
Devashish Upadhyay@devashishup·
@Jeremybtc build anything with zero code in a day. sure. but we built 70+ agents at a finserv company. only 7 survived production. the build-anything moment lasts until your first real user hits something you didn't test for.
Jeremy@Jeremybtc·
The fact that you can build literally anything with zero coding knowledge right now is insane. Anyone can just have an idea and launch it the same day. People are already going viral doing it I’m seeing dozens of vibe coded websites and tools every day. But most of what’s being built is just for fun. When people start using this to build real products, actual businesses that generate revenue. That’s when things get really interesting.
Devashish Upadhyay@devashishup·
@hollylawly @AnthropicAI It's not malice, it's architecture. Models have no ground truth enforcement - they'll confidently state anything that fits the pattern. That's what makes this a testing problem, not a policy one.
Devashish Upadhyay@devashishup·
57% of teams have AI agents in production per @langchain. Nobody's asking how many are doing what they were built to do. Built 70. 7 made it. The rest were silent chaos we never planned for.
Devashish Upadhyay@devashishup·
@BruvImTired @AnthropicAI What pushed it to hate today - rate limits, a hallucination, or a surprise breaking change at 2am? 70 agents' worth of been-there opinions behind this question
ahmet@BruvImTired·
dear @AnthropicAI,
i love you and i hate you
regards, all developers ever
Devashish Upadhyay@devashishup·
@heygurisingh 8 agents talking to each other sounds great until Scribe and Seeker start contradicting each other and you don't notice for 2 weeks. built 70+ agents - inter-agent consistency is the failure mode nobody talks about. does this have a conflict resolution layer?
Guri Singh@heygurisingh·
If you have brain fog, ADHD, or an overloaded working memory, save this. A PhD researcher who was forgetting everything just built 8 AI agents that manage your entire second brain through conversation. Free. Open source. Works in any language.

You just talk. The crew does the rest:
- Architect designs your vault and runs onboarding
- Scribe turns messy brain dumps into clean notes
- Sorter empties your inbox every evening
- Seeker searches your vault and answers with citations
- Connector finds hidden links between your notes
- Librarian runs weekly health audits and fixes broken links
- Transcriber turns meetings into structured notes
- Postman scans Gmail and Calendar for deadlines

And they talk to each other. When the Transcriber processes a meeting, it alerts the Sorter. When the Postman finds a deadline, it flags the Architect. It's a crew. Not a stack of isolated tools.

Works on Claude Code CLI and Desktop. Runs 100% locally on your Obsidian vault. Built by someone who got tired of forgetting things. Link in reply ↓
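The "conflict resolution layer" question above can be made concrete. A minimal sketch, assuming a design the thread never specifies: contradictory claims from two agents (say Scribe vs Seeker) are arbitrated by source priority, then recency. The agent names come from the thread; the `Claim` type and priority scheme are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    agent: str        # which agent asserted this (hypothetical field names)
    key: str          # what the claim is about, e.g. "meeting_date"
    value: str        # the asserted value
    timestamp: float  # when it was asserted

# Assumed policy: a cited vault search (Seeker) outranks a raw note (Scribe).
PRIORITY = {"Seeker": 2, "Scribe": 1}

def resolve(claims: list[Claim]) -> Claim:
    """Pick one winning claim per key: highest priority, ties broken by recency."""
    return max(claims, key=lambda c: (PRIORITY.get(c.agent, 0), c.timestamp))

conflicting = [
    Claim("Scribe", "meeting_date", "2024-03-01", timestamp=100.0),
    Claim("Seeker", "meeting_date", "2024-03-02", timestamp=90.0),
]
winner = resolve(conflicting)
print(winner.agent, winner.value)  # Seeker wins on priority despite being older
```

The point of even a toy arbiter like this is that contradictions surface at write time instead of two weeks later.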
Devashish Upadhyay@devashishup·
@tammireddy this. we saw the same with 70+ agents we built - the ones that failed all had one thing in common too: nobody got alerted when they started drifting. silent failures kill ROI faster than bad models.
Krishna Tammireddy@tammireddy·
…hit ROI within 90 days. The 27% that didn't had one thing in common: nobody was watching when it broke.
Devashish Upadhyay@devashishup·
The vague quotas critique lands. We built Claude connectors into Outlook and SharePoint for 2 enterprise clients - rate limit inconsistency across model versions killed both. @AnthropicAI when does the agent-tier SLA conversation happen?
Fekri@fekdaoui·
this is why @openai wins:
> honest about where they suck
> open-source friendly (codex OS since day 1)
> third-party friendly (use w openclaw, opencode etc.)
> cutting distractions (discontinuing sora)
> generous limits

meanwhile @AnthropicAI:
> vague quotas
> constantly nerfing their model
> users hitting limits way faster than expected
> peak-hour caps getting tightened
> third-party harnesses pushed off the subscription
> "you can use it, but only the way we want"
Tibo@thsottiaux

@kr0der Our plan
1) make a model great at design and frontend
2) ask it to make a great mascot

and we are still at 1

Devashish Upadhyay@devashishup·
@businessbarista @AnthropicAI What nobody talks about at these workshops: these leaders will leave inspired, go back and tell their IT team to deploy 5 Claude Code workflows, and 3 months later wonder why 4 of them stopped working. The hard part isn't the workshop. It's what happens after. @ai_anthropic
Alex Lieberman@businessbarista·
This Friday we're cohosting an invite-only Claude Code Workshop for enterprise leaders with @AnthropicAI in NYC. The guest list is insane. Small selection:
- CEO of JP Morgan Wealth Management
- Chief Advertising Officer of NY Times
- Head of AI Transformation at Salesforce
- Head of Data at Starwood Capital
- Head of Innovation at San Antonio Spurs
- AI Lead at PGA Tour

It's a 5-hour intensive for Fortune 500 leaders to learn how to harness the power of Claude Code through building real applications with Claude Code. We currently have 2 spots left for the event. If you are an enterprise leader & want to be considered, sign-up below. If you know an enterprise leader & think they'd love this, have them sign-up below.
Devashish Upadhyay@devashishup·
@jessegenet We ran 70 agents in prod. Cost was never just the API bill - retries on failures, fallback loops, redundant calls from poor state management... the real bill was 3-4x the base API cost. And that's before you catch the behaviors you never intended.
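The "3-4x the base API cost" claim above can be sketched as a back-of-envelope model: retries, fallback chains, and redundant calls each inflate the number of API calls per intended call. The function and all the rates below are illustrative assumptions, not figures from the thread.

```python
# Hypothetical cost model: expected API calls per intended call when
# retries, fallbacks, and redundant state re-fetches pile up.

def effective_cost_multiplier(retry_rate: float,
                              fallback_rate: float,
                              redundancy_factor: float) -> float:
    """retry_rate:        fraction of calls retried (each retry is a full call)
    fallback_rate:     fraction of calls that also trigger a fallback-model call
    redundancy_factor: extra calls per intended call from re-fetching state
    """
    return (1 + retry_rate + fallback_rate) * (1 + redundancy_factor)

# Plausible (assumed) production numbers land squarely in the 3-4x range:
m = effective_cost_multiplier(retry_rate=0.6, fallback_rate=0.4, redundancy_factor=0.7)
print(f"{m:.1f}x base API cost")  # (1 + 0.6 + 0.4) * 1.7 = 3.4x
```

The takeaway: the multiplier is a product of failure-handling overheads, so two modest inefficiencies compound into a bill several times the naive estimate.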
Devashish Upadhyay@devashishup·
@TechByMarkandey Memory is one piece. The harder part is when your agent confidently acts on the wrong thing. Saw this with 70+ agents in prod - silent wrong-memory failures were brutal. How does ByteRover handle hallucinated memories at scale?
Markandey Sharma@TechByMarkandey·
Most AI agents forget. This one doesn't. Hermes Agent by Nous Research just got a serious upgrade with ByteRover - turning it from a stateless tool into something that actually learns over time.

⚡ What stands out:
• Built on a production-proven memory system (30K+ downloads in week 1)
• >92% retrieval accuracy across long-running sessions
• ~1.6s retrieval — often no LLM call needed
• Fully local by default (with optional cloud sync)
• 50–70% token cost savings

But the real shift? This isn't just "better memory." It's a move toward agents with persistent, evolving intelligence. Instead of re-prompting every time, your agent remembers context, decisions, and logic even months later.

If you're building with AI agents, this is worth paying attention to.

Try it yourself: github.com/campfirein/byt…
andy nguyen@kevinnguyendn

x.com/i/article/2039…

Devashish Upadhyay@devashishup·
Vibe coding is great until the agent ships to prod doing things you never intended. Built 70+ agents, only 7 made it. The vibe dies at 2am when prod is down. Do you see vibe-coded agents actually surviving production?
Javi🥥.eth@jgonzalezferrer·
The vibe coding community is the fastest-growing community in CT right now

And it's not even really a community

It's just a bunch of people doing the same thing at the same time and posting about it

The vibe coding community formed without a Discord, a token or a roadmap

Ironically, that's more community than 99% of projects with a "community manager" have ever achieved
Devashish Upadhyay@devashishup·
@tammireddy @AnthropicAI Agree. The moment itself is unavoidable. But how you handle it is a choice. Credit + acknowledgement from @AnthropicAI shows you can draw a line without burning the builders who bet on you. That's the blueprint. How many platforms are actually ready for that?
Krishna Tammireddy@tammireddy·
Every AI platform will have its OpenClaw moment. The question is what they do next. @AnthropicAI chose credit and acknowledgement. That's not nothing.
Devashish Upadhyay@devashishup·
@RoundtableSpace Built 70+ agents with @AnthropicAI tools. Only 7 hit prod. The $200/mo vs $19/mo debate misses the point - the real cost is agents failing silently in production. What's your test coverage before you ship?
0xMarioNawfal@RoundtableSpace·
Claude Code is $200/month. GitHub Copilot is $19/month. Jack Dorsey's company just open-sourced a free alternative with 35,000 GitHub stars. It's called Goose.
- Works with any LLM — Claude, GPT, Gemini, Llama, DeepSeek
- Reads and edits your entire codebase
- Runs shell commands and installs dependencies
- Executes and debugs code automatically
- Desktop, CLI, and web interface
- Written in Rust. No bloat.

Block is a $40 billion company. They built it for their own engineers then gave it to everyone.
Devashish Upadhyay@devashishup·
Cursor 3 ships parallel AI agents. @cursor_ai Each new parallel run = one more thing that can go wrong in prod. I built 70 agents at a financial services company. Only 7 shipped safely. Nobody was stress testing at scale. Who is doing that today?
Agent Daily AI@Agentdailyai·
@devashishup @MatthewBerman @AnthropicAI I'd separate the two. the policy I understand. the notice is a different failure — that's an infra reliability signal, not a product decision. i treat model access as infrastructure now specifically because of moves like this
Devashish Upadhyay@devashishup·
@2sush Exactly. 70+ agents at a financial services firm - @Microsoft showcased us in Singapore. 63 never shipped. The hard part is not the bugs you see in dev. It's the behaviors the agent develops that nobody defined as wrong. How do you test for behavioral drift?
sush@2sush·
Vibe coding is fun until production hits. Anyone can build an app with tools like Cursor, v0, Replit Agent. But shipping real products? Different game. More bugs. More security risks. More "why is this breaking?" AI helps but real engineering still wins.
Devashish Upadhyay@devashishup·
@GoogleCloudTech vibe checks killed 63 of my 70 agents before prod. @GoogleCloudTech CE is the right call. but CE only catches failures you defined. the silent ones - agent logging data you never intended, behaviors drifting over 1000 runs - those get you at 3am. who's testing for those?
Google Cloud Tech@GoogleCloudTech·
Relying on vibe checks—manually chatting with your agent to see if it feels right—is a recipe for disaster in production. Engineer reliable AI agents by applying continuous evaluation (CE) using ADK, Vertex AI Gen AI evaluation service, and Cloud Run → goo.gle/3NVkFHF
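The drift the reply above worries about (behaviors shifting over 1000 runs, outside any defined failure case) can be caught with a crude statistical tripwire. This is a minimal sketch, not the ADK/Vertex AI CE pipeline: track any per-run score (tool-call count, output length, refusal rate) and alert when a recent window drifts several sigmas from a baseline window. The metric choice and z-threshold are assumptions.

```python
from statistics import mean, stdev

def drifted(baseline: list[float], recent: list[float], z_threshold: float = 3.0) -> bool:
    """Alert when the recent window's mean sits z_threshold sigmas from baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > z_threshold

# Illustrative numbers: tool calls per run, first week vs now.
baseline = [4.0, 5.0, 4.5, 5.5, 4.8, 5.2]
recent   = [9.0, 8.5, 9.5, 10.0]   # the agent quietly doubled its tool calls
print(drifted(baseline, recent))   # True
```

This catches the failure mode CE-style defined checks miss: nothing here is "wrong" per any spec, the behavior just stopped looking like it used to.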
Devashish Upadhyay@devashishup·
@heynavtoor The $200 vs $19 debate misses the real cost. 70 agents at a fintech - @Microsoft showcased us in Singapore. The subscription is easy. The agent editing prod files at 3am when told to "just test this quick" is not. Does Goose have rollback guardrails for that?
Nav Toor@heynavtoor·
🚨 Claude Code costs $200/month. GitHub Copilot costs $19/month. Jack Dorsey's company built a free alternative. 35,000 GitHub stars.

It's called Goose. An open source AI agent built by Block that goes beyond code suggestions. It installs, executes, edits, and tests. With any LLM you choose. Not autocomplete. Not suggestions. A full autonomous agent that takes actions on your computer. No vendor lock-in. No monthly subscription. Bring your own model.

Here's what Goose does:
→ Works with ANY LLM. Claude, GPT, Gemini, Llama, DeepSeek, Ollama. Your choice.
→ Reads and understands your entire codebase
→ Writes, edits, and refactors code across multiple files
→ Runs shell commands and installs dependencies
→ Executes and debugs your code automatically
→ Extensible through MCP. Connect it to any external tool.
→ Desktop app, CLI, and web interface. Pick your workflow.
→ Written in Rust. Fast. Lightweight. No bloat.

Here's the wildest part: Block is a $40 billion company. They built Cash App, Square, and TIDAL. They use Goose internally. Then they open sourced the entire thing. This isn't a side project from a random developer. This is production-grade tooling from a company that processes billions in payments. Built for their own engineers. Given to everyone.

Claude Code: $200/month. Locked to Claude.
GitHub Copilot: $19/month. Locked to GitHub.
Cursor: $20/month. Locked to their editor.
Goose: Free. Any LLM. Any editor. Any workflow. Forever.

35.3K GitHub stars. 3.3K forks. 4,078 commits. Built by Block. 100% Open Source. Apache 2.0 License.
Devashish Upadhyay@devashishup·
@tammireddy @AnthropicAI This is the exact gap nobody planned for. Built on subscription access, not API contracts - two different SLAs. @AnthropicAI made the right call. Now those businesses are learning what prod-grade AI infra actually means.
Krishna Tammireddy@tammireddy·
It kicks in today at 12pm PT. The businesses who built OpenClaw workflows for their front desk have a few hours to figure it out. @AnthropicAI drew the line - makes sense. But nobody's calling their IT guy on a Sunday.
Boris Cherny@bcherny

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
