Devashish Upadhyay

919 posts

@devashishup

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂

Sydney, Australia · Joined May 2020
24 Following · 60 Followers
Devashish Upadhyay @devashishup
@BruvImTired @AnthropicAI What pushed it to hate today - rate limits, a hallucination, or a surprise breaking change at 2am? 70 agents' worth of been-there opinions behind this question.
0 replies · 0 reposts · 0 likes · 10 views
ahmet @BruvImTired
dear @AnthropicAI, i love you and i hate you. regards, all developers ever
14 replies · 4 reposts · 108 likes · 3.4K views
Devashish Upadhyay @devashishup
@heygurisingh 8 agents talking to each other sounds great until Scribe and Seeker start contradicting each other and you don't notice for 2 weeks. Built 70+ agents - inter-agent consistency is the failure mode nobody talks about. Does this have a conflict resolution layer?
0 replies · 0 reposts · 0 likes · 79 views
Guri Singh @heygurisingh
If you have brain fog, ADHD, or an overloaded working memory, save this. A PhD researcher who was forgetting everything just built 8 AI agents that manage your entire second brain through conversation. Free. Open source. Works in any language. You just talk. The crew does the rest:
- Architect designs your vault and runs onboarding
- Scribe turns messy brain dumps into clean notes
- Sorter empties your inbox every evening
- Seeker searches your vault and answers with citations
- Connector finds hidden links between your notes
- Librarian runs weekly health audits and fixes broken links
- Transcriber turns meetings into structured notes
- Postman scans Gmail and Calendar for deadlines
And they talk to each other. When the Transcriber processes a meeting, it alerts the Sorter. When the Postman finds a deadline, it flags the Architect. It's a crew. Not a stack of isolated tools. Works on Claude Code CLI and Desktop. Runs 100% locally on your Obsidian vault. Built by someone who got tired of forgetting things. Link in reply ↓
12 replies · 37 reposts · 264 likes · 14.8K views
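The "conflict resolution layer" the reply above asks about could start as simple conflict detection: flag any vault key where two agents wrote different values. The agent names, key paths, and record shape below are illustrative assumptions, not from the actual project — a minimal sketch, not anyone's implementation.

```python
from collections import defaultdict

def find_conflicts(writes):
    """writes: list of (agent, key, value) tuples.
    Return {key: [agents]} for every key where agents disagree on the value."""
    values_by_key = defaultdict(set)
    authors_by_key = defaultdict(set)
    for agent, key, value in writes:
        values_by_key[key].add(value)
        authors_by_key[key].add(agent)
    # A key is conflicted when more than one distinct value was written.
    return {
        key: sorted(authors_by_key[key])
        for key, values in values_by_key.items()
        if len(values) > 1
    }

# Hypothetical example: Scribe and Seeker disagree on a meeting's owner.
writes = [
    ("Scribe", "meeting/2024-05-01/owner", "Alice"),
    ("Seeker", "meeting/2024-05-01/owner", "Bob"),
    ("Sorter", "inbox/count", "0"),
]
conflicts = find_conflicts(writes)
# conflicts == {"meeting/2024-05-01/owner": ["Scribe", "Seeker"]}
```

Detection is the cheap half; a real layer would also need a resolution policy (last-writer-wins, a designated arbiter agent, or a human review queue).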
Devashish Upadhyay @devashishup
@tammireddy This. We saw the same with 70+ agents we built - the ones that failed all had one thing in common too: nobody got alerted when they started drifting. Silent failures kill ROI faster than bad models.
0 replies · 0 reposts · 0 likes · 3 views
Krishna Tammireddy @tammireddy
ts hit ROI within 90 days. the 27% that didn't have one thing in common. nobody was watching when it broke.
1 reply · 0 reposts · 1 like · 16 views
Devashish Upadhyay @devashishup
The vague quotas critique lands. We built Claude connectors into Outlook and SharePoint for 2 enterprise clients - rate limit inconsistency across model versions killed both. @AnthropicAI when does the agent-tier SLA conversation happen?
0 replies · 0 reposts · 0 likes · 64 views
Fekri @fekdaoui
this is why @openai wins:
> honest about where they suck
> open-source friendly (codex OS since day 1)
> third-party friendly (use w/ openclaw, opencode etc.)
> cutting distractions (discontinuing sora)
> generous limits
meanwhile @AnthropicAI:
> vague quotas
> constantly nerfing their model
> users hitting limits way faster than expected
> peak-hour caps getting tightened
> third-party harnesses pushed off the subscription
> "you can use it, but only the way we want"

Quoting Tibo @thsottiaux:
@kr0der Our plan: 1) make a model great at design and frontend 2) ask it to make a great mascot. and we are still at 1

16 replies · 13 reposts · 270 likes · 34.2K views
Devashish Upadhyay @devashishup
@businessbarista @AnthropicAI What nobody talks about at these workshops: these leaders will leave inspired, go back and tell their IT team to deploy 5 Claude Code workflows, and 3 months later wonder why 4 of them stopped working. The hard part isn't the workshop. It's what happens after. @ai_anthropic
0 replies · 0 reposts · 0 likes · 188 views
Alex Lieberman @businessbarista
This Friday we're cohosting an invite-only Claude Code Workshop for enterprise leaders with @AnthropicAI in NYC. The guest list is insane. Small selection:
- CEO of JP Morgan Wealth Management
- Chief Advertising Officer of NY Times
- Head of AI Transformation at Salesforce
- Head of Data at Starwood Capital
- Head of Innovation at San Antonio Spurs
- AI Lead at PGA Tour
It's a 5-hour intensive for Fortune 500 leaders to learn how to harness the power of Claude Code by building real applications. We currently have 2 spots left for the event. If you are an enterprise leader & want to be considered, sign up below. If you know an enterprise leader & think they'd love this, have them sign up below.
13 replies · 4 reposts · 124 likes · 18.4K views
Devashish Upadhyay @devashishup
@jessegenet We ran 70 agents in prod. Cost was never just the API bill - retries on failures, fallback loops, redundant calls from poor state management... the real bill was 3-4x the base API cost. And that's before you catch the behaviors you never intended.
0 replies · 0 reposts · 2 likes · 62 views
Devashish Upadhyay @devashishup
@TechByMarkandey Memory is one piece. The harder part is when your agent confidently acts on the wrong thing. Saw this with 70+ agents in prod - silent wrong-memory failures were brutal. How does ByteRover handle hallucinated memories at scale?
0 replies · 0 reposts · 0 likes · 25 views
Markandey Sharma @TechByMarkandey
Most AI agents forget. This one doesn't. Hermes Agent by Nous Research just got a serious upgrade with ByteRover - turning it from a stateless tool into something that actually learns over time. ⚡ What stands out:
• Built on a production-proven memory system (30K+ downloads in week 1)
• >92% retrieval accuracy across long-running sessions
• ~1.6s retrieval - often no LLM call needed
• Fully local by default (with optional cloud sync)
• 50-70% token cost savings
But the real shift? This isn't just "better memory." It's a move toward agents with persistent, evolving intelligence. Instead of re-prompting every time, your agent remembers context, decisions, and logic even months later. If you're building with AI agents, this is worth paying attention to. Try it yourself: github.com/campfirein/byt…

Quoting andy nguyen @kevinnguyendn:
x.com/i/article/2039…

24 replies · 16 reposts · 68 likes · 48.6K views
Devashish Upadhyay @devashishup
Vibe coding is great until the agent ships to prod doing things you never intended. Built 70+ agents, only 7 made it. The vibe dies at 2am when prod is down. Do you see vibe-coded agents actually surviving production?
0 replies · 0 reposts · 0 likes · 20 views
Javi🥥.eth @jgonzalezferrer
The vibe coding community is the fastest-growing community in CT right now. And it's not even really a community. It's just a bunch of people doing the same thing at the same time and posting about it. The vibe coding community formed without a Discord, a token, or a roadmap. Ironically, that's more community than 99% of projects with a "community manager" have ever achieved.
71 replies · 6 reposts · 209 likes · 3.9K views
Devashish Upadhyay @devashishup
@tammireddy @AnthropicAI Agree. The moment itself is unavoidable. But how you handle it is a choice. Credit + acknowledgement from @AnthropicAI shows you can draw a line without burning the builders who bet on you. That's the blueprint. How many platforms are actually ready for that?
0 replies · 0 reposts · 0 likes · 3 views
Krishna Tammireddy @tammireddy
Every AI platform will have its OpenClaw moment. The question is what they do next. @AnthropicAI chose credit and acknowledgement. That's not nothing.
1 reply · 0 reposts · 1 like · 38 views
Devashish Upadhyay @devashishup
@RoundtableSpace Built 70+ agents with @AnthropicAI tools. Only 7 hit prod. The $200/mo vs $19/mo debate misses the point - the real cost is agents failing silently in production. What's your test coverage before you ship?
0 replies · 0 reposts · 0 likes · 201 views
0xMarioNawfal @RoundtableSpace
Claude Code is $200/month. GitHub Copilot is $19/month. Jack Dorsey's company just open-sourced a free alternative with 35,000 GitHub stars. It's called Goose.
- Works with any LLM - Claude, GPT, Gemini, Llama, DeepSeek
- Reads and edits your entire codebase
- Runs shell commands and installs dependencies
- Executes and debugs code automatically
- Desktop, CLI, and web interface
- Written in Rust. No bloat.
Block is a $40 billion company. They built it for their own engineers, then gave it to everyone.
55 replies · 44 reposts · 458 likes · 80.7K views
Devashish Upadhyay @devashishup
Cursor 3 ships parallel AI agents. @cursor_ai Each new parallel run = one more thing that can go wrong in prod. I built 70 agents at a financial services company. Only 7 shipped safely. Nobody was stress testing at scale. Who is doing that today?
1 reply · 0 reposts · 2 likes · 35 views
Agent Daily AI @Agentdailyai
@devashishup @MatthewBerman @AnthropicAI I'd separate the two. The policy I understand. The notice is a different failure - that's an infra reliability signal, not a product decision. I treat model access as infrastructure now specifically because of moves like this.
1 reply · 0 reposts · 0 likes · 11 views
Devashish Upadhyay @devashishup
@2sush Exactly. 70+ agents at a financial services firm - @Microsoft showcased us in Singapore. 63 never shipped. The hard part is not the bugs you see in dev. It's the behaviors the agent develops that nobody defined as wrong. How do you test for behavioral drift?
0 replies · 0 reposts · 1 like · 28 views
sush @2sush
Vibe coding is fun until production hits. Anyone can build an app with tools like Cursor, v0, Replit Agent. But shipping real products? Different game. More bugs. More security risks. More "why is this breaking?" AI helps, but real engineering still wins.
68 replies · 8 reposts · 83 likes · 2.6K views
Devashish Upadhyay @devashishup
@GoogleCloudTech Vibe checks killed 63 of my 70 agents before prod. @GoogleCloudTech CE is the right call. But CE only catches failures you defined. The silent ones - agent logging data you never intended, behaviors drifting over 1000 runs - those get you at 3am. Who's testing for those?
0 replies · 0 reposts · 0 likes · 8 views
Google Cloud Tech @GoogleCloudTech
Relying on vibe checks - manually chatting with your agent to see if it feels right - is a recipe for disaster in production. Engineer reliable AI agents by applying continuous evaluation (CE) using ADK, Vertex AI Gen AI evaluation service, and Cloud Run → goo.gle/3NVkFHF
13 replies · 26 reposts · 135 likes · 9K views
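The continuous-evaluation idea above can be sketched without any specific platform. This is a minimal, generic eval loop - not the ADK or Vertex AI evaluation API - and the golden cases and `agent_answer` stub are invented placeholders standing in for a real agent call.

```python
# Golden set: hypothetical prompts with a substring each answer must contain.
GOLDEN = [
    {"prompt": "refund policy?", "must_contain": "30 days"},
    {"prompt": "support email?", "must_contain": "@"},
]

def agent_answer(prompt: str) -> str:
    # Placeholder for the real agent call; canned answers for the sketch.
    return {
        "refund policy?": "Refunds within 30 days.",
        "support email?": "help@example.com",
    }[prompt]

def run_eval(threshold: float = 1.0) -> float:
    """Score the agent against the golden set; fail loudly on regression."""
    passed = sum(
        case["must_contain"] in agent_answer(case["prompt"])
        for case in GOLDEN
    )
    score = passed / len(GOLDEN)
    if score < threshold:
        raise RuntimeError(f"eval regression: only {score:.0%} of cases passed")
    return score
```

Run on a schedule (or on every deploy), this turns "feels right" into a pass rate that can page someone when it drops.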
Devashish Upadhyay @devashishup
@heynavtoor The $200 vs $19 debate misses the real cost. 70 agents at a fintech - @Microsoft showcased us in Singapore. The subscription is easy. The agent editing prod files at 3am when told to "just test this quick" is not. Does Goose have rollback guardrails for that?
0 replies · 0 reposts · 0 likes · 1.7K views
Nav Toor @heynavtoor
🚨 Claude Code costs $200/month. GitHub Copilot costs $19/month. Jack Dorsey's company built a free alternative. 35,000 GitHub stars. It's called Goose. An open source AI agent built by Block that goes beyond code suggestions. It installs, executes, edits, and tests. With any LLM you choose. Not autocomplete. Not suggestions. A full autonomous agent that takes actions on your computer. No vendor lock-in. No monthly subscription. Bring your own model. Here's what Goose does:
→ Works with ANY LLM. Claude, GPT, Gemini, Llama, DeepSeek, Ollama. Your choice.
→ Reads and understands your entire codebase
→ Writes, edits, and refactors code across multiple files
→ Runs shell commands and installs dependencies
→ Executes and debugs your code automatically
→ Extensible through MCP. Connect it to any external tool.
→ Desktop app, CLI, and web interface. Pick your workflow.
→ Written in Rust. Fast. Lightweight. No bloat.
Here's the wildest part: Block is a $40 billion company. They built Cash App, Square, and TIDAL. They use Goose internally. Then they open sourced the entire thing. This isn't a side project from a random developer. This is production-grade tooling from a company that processes billions in payments. Built for their own engineers. Given to everyone.
Claude Code: $200/month. Locked to Claude.
GitHub Copilot: $19/month. Locked to GitHub.
Cursor: $20/month. Locked to their editor.
Goose: Free. Any LLM. Any editor. Any workflow. Forever.
35.3K GitHub stars. 3.3K forks. 4,078 commits. Built by Block. 100% Open Source. Apache 2.0 License.
154 replies · 293 reposts · 2.3K likes · 238.8K views
Devashish Upadhyay @devashishup
@tammireddy @AnthropicAI This is the exact gap nobody planned for. Built on subscription access, not API contracts - two different SLAs. @AnthropicAI made the right call. Now those businesses are learning what prod-grade AI infra actually means.
0 replies · 0 reposts · 0 likes · 4 views
Krishna Tammireddy @tammireddy
It kicks in today at 12pm PT. The businesses who built OpenClaw workflows for their front desk have a few hours to figure it out. @AnthropicAI drew the line - makes sense. But nobody's calling their IT guy on a Sunday.

Quoting Boris Cherny @bcherny:
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.

1 reply · 0 reposts · 1 like · 143 views
Devashish Upadhyay @devashishup
@whyyoutouzhele A 30% efficiency mandate assumes the AI works reliably 100% of the time. We built 70+ agents for fintech - most broke in ways nobody anticipated. How many companies are deploying this without actually testing what happens when it fails?
1 reply · 0 reposts · 0 likes · 5.6K views
李老师不是你老师 @whyyoutouzhele
April 3, Shanghai. A programmer says: "I've become an AI victim. This year my company requires us to use AI to boost efficiency by at least 30%. The company gives us Claude Code accounts and has us write code with AI, multiplying our logged hours by 0.7. It also evaluates employees: anyone ranked at the bottom without a clear efficiency gain will be laid off."
138 replies · 45 reposts · 598 likes · 238.8K views
Devashish Upadhyay @devashishup
@JamieMallers @lennysan @simonw @AnthropicAI Exact ratio we saw. Deployed 70+ agents; 2 weeks post-launch I was full-time ops. Per-step traces + auto-kill on quality drops changed everything. 3am alerts dropped to zero. Babysitting IS the hidden cost nobody budgets for.
0 replies · 0 reposts · 0 likes · 5 views
Jamie Mallers @JamieMallers
@devashishup @lennysan @simonw @AnthropicAI 10% build, 90% babysit is the real agent ratio. Treat drift like infra incidents - traces per decision step so you see where it diverged, token budget circuit breakers, auto-kill on quality drops. Can't scope what you can't observe.
1 reply · 0 reposts · 0 likes · 34 views
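The token-budget circuit breaker and auto-kill on quality drops described in the exchange above might look like this minimal sketch. The thresholds, the per-step inputs, and the `AgentKilled` exception are all illustrative assumptions, not anyone's actual implementation.

```python
class AgentKilled(Exception):
    """Raised to auto-kill a run that blew its budget or dropped in quality."""

class CircuitBreaker:
    def __init__(self, token_budget: int, min_quality: float):
        self.token_budget = token_budget
        self.min_quality = min_quality
        self.tokens_used = 0

    def record_step(self, tokens: int, quality: float) -> None:
        """Call once per agent decision step with its token cost and a
        quality score in [0, 1] (however you compute it)."""
        self.tokens_used += tokens
        if self.tokens_used > self.token_budget:
            raise AgentKilled(f"token budget exceeded: {self.tokens_used}")
        if quality < self.min_quality:
            raise AgentKilled(f"quality dropped to {quality:.2f}")

# Hypothetical run: generous budget, 0.7 quality floor.
breaker = CircuitBreaker(token_budget=50_000, min_quality=0.7)
breaker.record_step(tokens=12_000, quality=0.9)   # within budget, fine
# breaker.record_step(tokens=45_000, quality=0.9) # would raise AgentKilled
```

Wrapping the agent loop in a `try/except AgentKilled` turns a silent drift into a hard stop plus an alert, which is the whole point of treating drift like an infra incident.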
Lenny Rachitsky @lennysan
My biggest takeaways from @simonw:

1. November 2025 was an inflection point for AI coding. GPT 5.1 and Claude Opus 4.5 crossed a threshold where coding agents went from "mostly works" to "almost always does what you want it to do." Software engineers who tinkered over the holidays realized the technology had become genuinely reliable.

2. Mid-career engineers are the most vulnerable - not juniors, not seniors. AI amplifies experienced engineers by letting them leverage decades of pattern recognition. It also dramatically helps new engineers onboard. Cloudflare and Shopify each hired a thousand interns because AI cut ramp-up time from a month to a week. But mid-career engineers who haven't accumulated deep expertise and have already captured the beginner boost are in the most precarious position.

3. AI exhaustion is real and underestimated. Simon runs four coding agents in parallel and is mentally wiped out by 11 a.m. He's getting more time back, but his brain is exhausted from the intensity of directing multiple autonomous workers. Some engineers are losing sleep to keep agents running. This may just be a novelty issue, but the underlying dynamic - that managing AI amplifies cognitive load even as it reduces labor - is a real tension. Good companies will manage expectations rather than expecting 5x output indefinitely.

4. Code is cheap now. This simple idea has profound implications. The thing that used to take most of the time - writing code - now takes the least. The bottleneck has shifted to everything else: deciding what to build, proving ideas work, getting user feedback. Since prototyping is nearly free, Simon often builds three versions of every feature when he's getting started.

5. The "dark factory" is the most radical experiment in AI-assisted development happening right now. A company called StrongDM established a policy: nobody writes code, nobody reads code. Instead, they run a swarm of AI-simulated end users 24/7 - thousands of fake employees making requests like "give me access to Jira" - at $10,000 a day in token costs. They even had coding agents build simulated versions of Slack, Jira, and Okta from API documentation so they could test without rate limits.

6. "Red/green TDD" is the single highest-leverage agentic engineering pattern. Having coding agents write tests first, watch them fail, then write the implementation, then watch them pass produces materially better results. The five-word prompt "use red/green TDD" encodes this entire workflow because the agents recognize the jargon.

7. "Hoarding things you know how to do" is another of Simon's favorite agentic engineering patterns. Simon maintains a GitHub repo of 193 small HTML/JavaScript tools and a separate research repo of coding-agent experiments. Each one captures a technique, a proof of concept, or a library he's tested. When a new problem arrives, he can point Claude Code at past projects and say "combine these two approaches."

8. The "lethal trifecta" makes AI agent security fundamentally unsolved. Whenever an AI agent has access to private data, exposure to untrusted content (like incoming emails), and the ability to send data externally (like replying to email), you have a lethal trifecta. Prompt injection - where malicious instructions in untrusted text override the agent's intended behavior - cannot be reliably prevented. Simon has predicted a "Challenger disaster" for AI security every six months for three years. It hasn't happened yet, but he's pretty sure it will.

9. Start every project from a thin template, not a long instructions file. Coding agents are phenomenally good at matching existing patterns. A single test file with your preferred indentation and style is more effective than paragraphs of written instructions. Simon starts every project with a template containing one test (literally testing that 1 + 1 = 2) laid out in his preferred style. The agent picks it up and follows the convention across the entire codebase. This is cheaper and more reliable than maintaining elaborate prompt files.

10. The pelican-on-a-bicycle benchmark accidentally became a real AI benchmark. Simon created it as a joke to mock numeric benchmarks - get each LLM to generate an SVG of a pelican riding a bicycle, and compare the drawings. Unexpectedly, there's a strong correlation between how good the drawing is and how good the model is at everything else. Nobody can explain why. It's become a meme: Gemini 3.1's launch video featured a pelican riding a bicycle. The AI labs are aware of it and quietly competing on it.

Don't miss our full conversation: youtube.com/watch?v=wc8FBh…

Quoting Lenny Rachitsky @lennysan:
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop." In our in-depth conversation, we discuss:
🔸 Why November 2025 was an inflection point
🔸 The "dark factory" pattern
🔸 Why mid-career engineers (not juniors) are the most at risk right now
🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding
🔸 Why he writes 95% of his code from his phone while walking the dog
🔸 Why he thinks we're headed for an AI Challenger disaster
🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality
Listen now 👇 youtu.be/wc8FBhQtdsA

83 replies · 139 reposts · 1.1K likes · 317.7K views
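The thin-template and red/green TDD patterns from takeaways 6 and 9 above fit in one small file. The file layout, the `slugify` feature, and the style choices are illustrative assumptions, not Simon's actual template.

```python
# tests/test_sanity.py (hypothetical) -- the one-test "thin template" the
# agent copies conventions (naming, indentation, assertion style) from.
def test_sanity():
    assert 1 + 1 == 2

# Red: write the failing test for the next feature FIRST, watch it fail.
def test_slugify():
    assert slugify("Hello World!") == "hello-world"

# Green: only then write the implementation until the test passes.
def slugify(text: str) -> str:
    """Lowercase, drop punctuation, join words with hyphens."""
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in text)
    return "-".join(cleaned.lower().split())
```

The template costs almost nothing to maintain, and the "use red/green TDD" prompt mentioned above tells the agent to follow exactly this red-then-green order.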
Devashish Upadhyay @devashishup
Ran 70 agents at a fintech. 3 were hitting prod APIs while everyone thought we were in staging. Wasn't even a bug - right intent, wrong env. Nobody noticed for 11 days. @AnthropicAI agents that execute real transactions make this risk completely different. What's your env check?
0 replies · 0 reposts · 0 likes · 42 views
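One possible shape for the env check the tweet asks about: a guard that cross-checks a declared environment variable against the API base URL before the agent runs, so "right intent, wrong env" fails fast instead of silently. The variable name, markers, and URLs are illustrative assumptions.

```python
import os

# Substrings that indicate a production endpoint (assumed convention).
PROD_MARKERS = ("prod", "production")

def assert_env(expected: str, api_base: str) -> None:
    """Refuse to run unless the declared env and the target URL agree."""
    declared = os.environ.get("AGENT_ENV", "unknown")
    if declared != expected:
        raise RuntimeError(f"declared env {declared!r} != expected {expected!r}")
    # Cross-check: the URL must match the declared environment.
    url_is_prod = any(marker in api_base for marker in PROD_MARKERS)
    if url_is_prod != (expected == "production"):
        raise RuntimeError(f"{api_base!r} does not match env {expected!r}")

# Hypothetical usage at agent startup:
os.environ["AGENT_ENV"] = "staging"
assert_env("staging", "https://staging.api.example.com")      # passes
# assert_env("staging", "https://api.production.example.com") # would raise
```

Two independent signals (a declared env and the endpoint itself) catch the exact failure in the tweet: an agent pointed at prod while every human believed it was in staging.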