Devashish Upadhyay
@devashishup · 949 posts

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂

Sydney, Australia · Joined May 2020
24 Following · 61 Followers
Devashish Upadhyay @devashishup·
@bridgemindai cutting off third party harnesses forces everyone native to Claude Code. ironically that makes agent behavior more observable. @AnthropicAI didn't just change access - they changed the testing surface. curious what your rate limit data looks like by 5pm today
BridgeMind @bridgemindai·
First weekday on Claude Code since Anthropic cut off OpenClaw and third party harnesses. 5% session usage. 23% weekly. Claude Opus 4.6 on v2.1.92. So far so good. The real test is today. Weekends are easy. Peak hours on a Monday will tell us if the rate limits are actually fixed or if Anthropic just got lucky with lower traffic. I'll be coding all day and reporting back. If I hit 100% in an hour like before, you'll hear about it. Stay tuned.
Devashish Upadhyay @devashishup·
@thsottiaux OpenClaw + GPT-5.4 is cooking. but model upgrades silently shift agent behavior in ways nobody specced for. regression testing after each update is still the unsolved part. who's actually doing that at scale? @OpenAI @openclaw
Devashish Upadhyay @devashishup·
@VaibhavSisinty The real story isn't @openclaw vs @AnthropicAI. It's that enterprise teams building on a single AI provider with zero fallback planning just watched their worst nightmare play out in real time. 3 teams asked us this exact question last week.
Vaibhav Sisinty @VaibhavSisinty·
Anthropic should be scared. 😨 103 people just shipped what their entire engineering team hasn't. And the most ironic part? OpenClaw runs on Claude. Anthropic powers the very tool that's outbuilding them. The same week they restricted OpenClaw users from accessing Claude and capped usage limits, the OpenClaw community dropped v2026.4.5 and it's not even close. Here's what they have shipped:

→ Your agent now has a sleep cycle: Most AI agents forget everything the moment you close the chat. OpenClaw's new memory system works like human sleep: it runs in the background, processes what you talked about, and moves the important stuff into permanent memory. Next time you open it, it already knows you. Your projects. Your preferences. Your context. You never have to repeat yourself again.

→ Your agent can now create, not just answer: Until now your agent could think and respond. Now it can make things. Ask it to generate a video, drop a music track, or build a graphic right inside the conversation. No switching apps. No copy-pasting between tools. Just ask and it ships.

→ Your API bill quietly got cheaper: Every time you send a follow-up message, OpenClaw now recognises what it already processed and skips re-reading it. Less work per message means lower cost per message. You didn't change anything. Your bill just went down.

→ Security got tightened everywhere: If something breaks, it now shuts down safely instead of leaving a door open. Browser vulnerabilities get caught earlier. App permissions are properly locked. The more your agent does on the internet, the more this matters.

→ All your messaging apps now actually work: Telegram, WhatsApp, Discord, Slack and 20 more channels. Voice notes, threaded replies, reconnect loops: the features that were half-broken and quietly frustrating are now fully fixed.

→ OpenClaw is now global: Chinese, Japanese, Korean, Spanish, French, German, Portuguese and 5 more languages added. The product just became accessible to billions of people who couldn't use it properly before.

Anthropic has billions in funding, hundreds of engineers, and the most powerful model in the world. OpenClaw has 103 contributors and a GitHub repo. And right now, OpenClaw is winning the product war. The people you restrict have a funny way of building around you.
OpenClaw🦞 @openclaw:
OpenClaw 2026.4.5 🦞
🎬 Built-in video + music generation
🧠 /dreaming is now real
🔀 Structured task progress
⚡ Better prompt-cache reuse
🌍 Control UI + Docs now speak 12 more languages
Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…
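The "recognises what it already processed and skips re-reading it" claim in the thread above is prompt-prefix caching. A minimal toy sketch of the idea, not OpenClaw's actual implementation: if a follow-up request shares a prefix with the previous one, only the new suffix needs to be processed (and billed). Real providers key caches internally; this version just measures the saving over lists of message chunks.

```python
# Toy illustration of prompt-prefix caching: only the part of the new
# prompt after the longest shared prefix must be re-processed.

def uncached_tokens(prev_prompt: list[str], new_prompt: list[str]) -> int:
    """Count chunks after the longest shared prefix - the re-processed part."""
    shared = 0
    for a, b in zip(prev_prompt, new_prompt):
        if a != b:
            break
        shared += 1
    return len(new_prompt) - shared

history = ["system", "user: plan a trip", "assistant: sure, where to?"]
follow_up = history + ["user: Tokyo, 5 days"]

# Without caching all 4 chunks are re-read; with prefix reuse, only 1:
assert uncached_tokens(history, follow_up) == 1
```

The saving is largest exactly when agents are chatty: long shared histories with short new suffixes.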
Devashish Upadhyay @devashishup·
@tammireddy the ones who moved fastest feel it hardest. the infra debt compounds quietly until it doesn't. been there with 70 agents
Devashish Upadhyay @devashishup·
YC just cut an AI compliance startup over fake compliance claims. @ycombinator Not because the AI was bad. Because no one could prove it wasn't. I ran 70+ agents at a finserv firm. 7 reached prod. Every failure was the same gap - no way to verify what the agent actually did.
Devashish Upadhyay @devashishup·
@twostraws This is why you can't test them the same way. Ran 70+ agents at a financial firm - @OpenAI Codex and @AnthropicAI Claude had completely different failure modes under edge load. Baseline testing misses all of it.
Paul Hudson @twostraws·
I've been flipping between Codex and Claude a lot these last two weeks, and if it's taught me anything it's this: these two tools are almost nothing alike. I had naively assumed they would be vaguely similar, but nope – once you push them hard they diverge fast.
Devashish Upadhyay @devashishup·
@murtuza_merc @encodeclub 300 builders at @encodeclub London - that's a real signal. The gap I always notice: demo agents vs agents in production is a 10x problem. We built 70+ and only 7 made it. Ideas rarely predict whether an agent survives real usage.
Murtuza J Merchant @murtuza_merc·
Spent the last weekend londonmaxxing at the @encodeclub AI London 2026 Hackathon. Had a blast judging 300+ builders as they turned Shoreditch into the center of the AI world. The sheer quality of projects, from bio-AI agents to accessibility tools, was mind-blowing. The future is being built right now in London. Congrats to the winners: 🥇 ChemTrace 🥈 RangerAI 🥉 Genomebook & SignQuest Shoutout to the Encode team for the vibes (and the rooftop sun). See you at the next one!
Devashish Upadhyay @devashishup·
@itsolelehmann $150-250/day is the real cost of agentic workflows nobody talks about. Curious - is that spike from longer context windows or more loops per task? We see this in enterprise: token cost isn't linear, it compounds per agent hop.
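The "token cost isn't linear, it compounds per agent hop" point falls out of a simple model: each hop re-sends the accumulated conversation as input, so total input tokens grow roughly quadratically with hop count. A toy sketch with made-up token counts:

```python
# Hypothetical cost model: each agent hop re-reads the full accumulated
# context, so input tokens compound rather than grow linearly.

def total_input_tokens(hops: int, tokens_per_hop: int, system_prompt: int = 1000) -> int:
    """Total input tokens when each hop re-reads the whole history."""
    total, context = 0, system_prompt
    for _ in range(hops):
        total += context           # full history is re-sent as input this hop
        context += tokens_per_hop  # this hop's output is appended to history
    return total

# Doubling the hops roughly quadruples the input-token bill:
assert total_input_tokens(5, 2000) == 25_000
assert total_input_tokens(10, 2000) == 100_000
```

This is why "more loops per task" and "longer context windows" are not independent cost drivers: extra loops inflate the context that every later loop re-reads.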
Devashish Upadhyay @devashishup·
@_Qubic_ Multi-agent collab on pathology slides is wild. Real question: what happens when 3 agents disagree on the same slide? Built 70+ agents - the hardest part was never the AI, it was knowing when to trust it.
Qubic @_Qubic_·
AI just diagnosed cancer. Without a single human telling it where to look. A team of specialized AI agents analyzed gigapixel pathology slides on their own. No manual guidance. No selected regions. Just agents collaborating to find what matters. This is what multi-agent AI looks like in the real world. 🧵
Devashish Upadhyay @devashishup·
@bekacru @better Smart. The next step is applying this same scrutiny to AI agents themselves - what they're installing, calling, or modifying vs what you told them to. Agent supply chain risk is real and almost nobody's testing for it yet.
Beka @bekacru·
better-npm. Every npm package with 50k+ weekly downloads gets analyzed by AI and static analysis before it hits your node_modules
- prevents typo squatting
- blocklist pkgs you don't want agents installing
- open source
one cmd: ~ npx @better-npm/cli
enjoy!
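The "what they're installing, calling, or modifying vs what you told them to" check from the reply above is, at its core, a diff between observed agent actions and an approved allowlist. A minimal sketch with hypothetical package names (the lockfile parsing and transcript capture a real tool would need are omitted):

```python
# Hypothetical agent supply-chain audit: compare what the agent actually
# installed against the set of packages it was approved to install.

ALLOWED = {"requests", "pydantic"}

def audit_installs(installed: set[str]) -> set[str]:
    """Return packages the agent pulled in that were never approved."""
    return installed - ALLOWED

# The agent was asked to add requests, but its transcript shows more:
unexpected = audit_installs({"requests", "pydantic", "left-pad-ai"})
assert unexpected == {"left-pad-ai"}
```

The same diff pattern applies to tool calls and file modifications: record what the agent did, subtract what it was authorised to do, and alert on the remainder.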
Devashish Upadhyay @devashishup·
@MerrynSW @FT 70 agents, 7 made it to prod. Hallucinations were the least of it - most failed on edge cases that basic eval never caught. The gap isn't the model, it's what happens when it meets real data. @AnthropicAI
Merryn Somerset Webb @MerrynSW·
Biggest concern of 800k Claude users? Not being replaced by AI but its endless propensity to make proper mistakes. "The hallucinations were a disaster. I lost so many hours of work," says one entrepreneur. @FT today.

Merryn Somerset Webb @MerrynSW:
What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…
Devashish Upadhyay @devashishup·
@om_patel5 The browser click solves selector ambiguity but not intent ambiguity. We built 70+ agents - the hard part is never the tool, it's whether the agent correctly infers what you meant. Testing that inference gap is the unsolved problem in vibe coding.
Om Patel @om_patel5·
THIS GUY ADDED A LIVE BROWSER TO CLAUDE CODE SO YOU CAN CLICK ANY ELEMENT AND EDIT IT INSTANTLY. biggest issue with vibe coding UI is that you have to describe what you want to change. if you prompt it the wrong selector or wrong component, Claude can't find it. now you just click it. your app runs in an embedded browser with Claude Code. you can click any button, any text, any div. Claude instantly knows exactly what you're pointing at. click. instruct. done. no more "change the button in the top right corner of the second card component." all you have to do now is just click the button. AND it's open source
Devashish Upadhyay @devashishup·
@tammireddy Exactly. We ran 70+ agents. 7 made prod. The ones that failed weren't missing access - they were missing someone to say 'ok, go live.' Deployment is the introduction.
Devashish Upadhyay @devashishup·
@Anunirva777 @DataChaz @addyosmani @GoogleAI both. edge cases are usually the inputs that don't match the happy path - and user data is where that shows up. we saw 3 of our 70 agents hit this: 98% accuracy in testing, then a specific field format in prod would trigger a loop no one caught
Charly Wargnier @DataChaz·
🚨 You need to see this. @addyosmani from Google just dropped his new Agent Skills and it's incredible. It brings 19 engineering skills + 7 commands to AI coding agents, all inspired by Google best practices 🤯 AI coding agents are powerful, but left alone, they take shortcuts. They skip specs, tests, and security reviews, optimizing for "done" over "correct." Addy built this to fix that. Each skill encodes the workflows and quality gates that senior engineers actually use: spec before code, test before merge, measure before optimize. The full lifecycle is covered:
→ Define - refine ideas, write specs before a single line of code
→ Plan - decompose into small, verifiable tasks
→ Build - incremental implementation, context engineering, clean API design
→ Verify - TDD, browser testing with DevTools, systematic debugging
→ Review - code quality, security hardening, performance optimization
→ Ship - git workflow, CI/CD, ADRs, pre-launch checklists
Features 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship) that map to this lifecycle. It works with:
✦ Claude Code
✦ Cursor
✦ Antigravity
✦ ... and any agent accepting Markdown.
Baking in Google-tier engineering culture (Shift Left, Chesterton's Fence, Hyrum's Law) directly into your agent's step-by-step workflow! `npx skills add addyosmani/agent-skills` Free and open-source. Repo link in 🧵↓
Devashish Upadhyay @devashishup·
@beffjezos The danger isn't losing the weekend, it's what your vibe-coded agent does after you ship it and sleep. Most won't notice until Monday when it's been running unchecked for 72 hours with no behavioral guardrails.
Beff (e/acc) @beffjezos·
*starts vibe coding Friday night* *blinks* Oh fuck it's Sunday night.
Devashish Upadhyay @devashishup·
@openclaw Unpopular opinion: if a provider swap breaks your agent, the problem isn't @AnthropicAI or @OpenAI - it's that you have no test coverage across providers. We saw this kill 3 enterprise deployments in finserv. The model changed, the agent broke, nobody knew until prod.
OpenClaw🦞 @openclaw·
OpenClaw 2026.4.5 🦞
🎬 Built-in video + music generation
🧠 /dreaming is now real
🔀 Structured task progress
⚡ Better prompt-cache reuse
🌍 Control UI + Docs now speak 12 more languages
Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…
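The "test coverage across providers" the reply above calls for can be as simple as one regression suite run against every provider behind a common call signature, so a model or provider swap fails in CI rather than in prod. A minimal sketch: the provider names and canned responses below are stand-ins, not real API clients.

```python
# Hypothetical cross-provider regression harness. Stub functions stand
# in for real provider clients; the point is the shared suite, not the calls.

from typing import Callable

def fake_anthropic(prompt: str) -> str:
    return "REFUND_APPROVED" if "refund" in prompt else "UNKNOWN"

def fake_openai(prompt: str) -> str:
    return "REFUND_APPROVED" if "refund" in prompt.lower() else "UNKNOWN"

PROVIDERS: dict[str, Callable[[str], str]] = {
    "anthropic-stub": fake_anthropic,
    "openai-stub": fake_openai,
}

REGRESSION_CASES = [
    ("customer asks for a refund", "REFUND_APPROVED"),
    ("Customer asks for a Refund", "REFUND_APPROVED"),  # casing edge case
]

def run_suite() -> dict[str, list[str]]:
    """Map each provider to its failing prompts; empty list means green."""
    return {
        name: [p for p, want in REGRESSION_CASES if call(p) != want]
        for name, call in PROVIDERS.items()
    }

# The casing case exposes a behavioural difference between the two stubs:
failures = run_suite()
assert failures["openai-stub"] == []
assert failures["anthropic-stub"] == ["Customer asks for a Refund"]
```

Swap a stub for a real client and the same suite becomes the fallback-readiness check the tweet says most enterprise teams skipped.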
Devashish Upadhyay @devashishup·
@0G_labs Ngl this hit. Built 70+ agents in finserv - most failures were infrastructure, not logic. No observability, silent drifts, no way to know when behavior changed mid-flight. The algorithm worked in dev, broke in prod. Every time.
0G Labs (Home of Infinite AI) @0G_labs·
AI agents don't fail at the algorithm level. They fail at the infrastructure level. Bad storage. Slow compute. No verifiability. 0G solves all three: chain + storage + DA + compute — purpose-built for the agentic economy. What are you building on it?
Devashish Upadhyay @devashishup·
97M @AnthropicAI MCP installs. Foundational infra now. One question: who's actually testing what agents do once they're running on it? Built 70+ of these - the answer is almost no one.
Devashish Upadhyay @devashishup·
@MLBear2 Curious how you're handling failure recovery - state management + unexpected tool responses is where most agents break in prod. Most teams build with @AnthropicAI SDK and never test those edge cases before shipping.
ML_Bear @MLBear2·
I've been digging into the Claude Agent SDK spec recently, so I tidied up my notes and published them as a Zenn Book 😇 I think it makes it easy to build conversational AI agents like the travel-planner video in this thread. Have a look when you have a moment! (Please let me know if anything is wrong 🙏) zenn.dev/ml_bear/books/…
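The "state management + unexpected tool responses" edge case the reply above asks about can be tested by injecting a malformed tool response and asserting the loop degrades to a safe error instead of crashing or retrying forever. A hypothetical sketch (function and field names are illustrative, not any SDK's API):

```python
# Hypothetical failure-recovery test: the agent loop must tolerate a tool
# that returns garbage, retry a bounded number of times, then fail safe.

import json
from typing import Callable

MAX_RETRIES = 2

def run_tool_step(tool_call: Callable[[], str]) -> dict:
    """Call a tool, tolerate transient bad JSON, fail safe after retries."""
    for _ in range(MAX_RETRIES + 1):
        raw = tool_call()
        try:
            return {"ok": True, "data": json.loads(raw)}
        except json.JSONDecodeError:
            continue  # transient garbage: retry a bounded number of times
    return {"ok": False, "error": "tool returned malformed JSON"}

# Inject a tool that always returns garbage - the edge case most teams
# never exercise before shipping:
result = run_tool_step(lambda: "<html>502 Bad Gateway</html>")
assert result == {"ok": False, "error": "tool returned malformed JSON"}
```

The key property is the bounded retry: an unbounded `while` here is exactly the silent prod loop described elsewhere in this feed.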
Devashish Upadhyay @devashishup·
@MarioNawfal API upgrade is nice. But most teams will still ship agents that break on live data within weeks. Better tooling isn't the bottleneck. Testing what the agent actually does in prod is.
Mario Nawfal @MarioNawfal·
🚨 GAME-CHANGER for AI builders & agents. The 𝕏 API just got a MASSIVE update:
- Pay-Per-Use: No more monthly tiers, only pay for what you actually use
- Native XMCP + Xurl: Your AI agents can now read real-time context and take actions straight on 𝕏
- Official Python & TypeScript SDKs: Ship 10x faster
- Free API Playground: Safe, realistic testing before you go live
@X, @elonmusk
Elon Musk @elonmusk:
Upgrades to our API