Devashish Upadhyay
@devashishup · 949 posts

Built 70+ AI agents at scale. Only 7 made it to production safely. Building https://t.co/Y8cfrIce9p to fix that. CTO & Co-founder · AI Engineer · Adventurist 🪂

Sydney, Australia · Joined May 2020
24 Following · 61 Followers
Devashish Upadhyay @devashishup·
@bridgemindai cutting off third party harnesses forces everyone native to Claude Code. ironically that makes agent behavior more observable. @AnthropicAI didn't just change access - they changed the testing surface. curious what your rate limit data looks like by 5pm today
BridgeMind @bridgemindai·
First weekday on Claude Code since Anthropic cut off OpenClaw and third party harnesses. 5% session usage. 23% weekly. Claude Opus 4.6 on v2.1.92. So far so good. The real test is today. Weekends are easy. Peak hours on a Monday will tell us if the rate limits are actually fixed or if Anthropic just got lucky with lower traffic. I'll be coding all day and reporting back. If I hit 100% in an hour like before, you'll hear about it. Stay tuned.
Devashish Upadhyay @devashishup·
@thsottiaux OpenClaw + GPT-5.4 is cooking. but model upgrades silently shift agent behavior in ways nobody specced for. regression testing after each update is still the unsolved part. who's actually doing that at scale? @OpenAI @openclaw
Devashish Upadhyay @devashishup·
@VaibhavSisinty The real story isn't @openclaw vs @AnthropicAI. It's that enterprise teams building on a single AI provider with zero fallback planning just watched their worst nightmare play out in real time. 3 teams asked us this exact question last week.
Vaibhav Sisinty @VaibhavSisinty·
Anthropic should be scared. 😨 103 people just shipped what their entire engineering team hasn't. And the most ironic part? OpenClaw runs on Claude. Anthropic powers the very tool that's outbuilding them. The same week they restricted OpenClaw users from accessing Claude and capped usage limits, the OpenClaw community dropped v2026.4.5 and it's not even close. Here's what they have shipped:

→ Your agent now has a sleep cycle: Most AI agents forget everything the moment you close the chat. OpenClaw's new memory system works like human sleep: it runs in the background, processes what you talked about, and moves the important stuff into permanent memory. Next time you open it, it already knows you. Your projects. Your preferences. Your context. You never have to repeat yourself again.

→ Your agent can now create, not just answer: Until now your agent could think and respond. Now it can make things. Ask it to generate a video, drop a music track, or build a graphic right inside the conversation. No switching apps. No copy-pasting between tools. Just ask and it ships.

→ Your API bill quietly got cheaper: Every time you send a follow-up message, OpenClaw now recognises what it already processed and skips re-reading it. Less work per message means lower cost per message. You didn't change anything. Your bill just went down.

→ Security got tightened everywhere: If something breaks, it now shuts down safely instead of leaving a door open. Browser vulnerabilities get caught earlier. App permissions are properly locked. The more your agent does on the internet, the more this matters.

→ All your messaging apps now actually work: Telegram, WhatsApp, Discord, Slack and 20 more channels. Voice notes, threaded replies, reconnect loops: the features that were half-broken and quietly frustrating are now fully fixed.

→ OpenClaw is now global: Chinese, Japanese, Korean, Spanish, French, German, Portuguese and 5 more languages added. The product just became accessible to billions of people who couldn't use it properly before.

Anthropic has billions in funding, hundreds of engineers, and the most powerful model in the world. OpenClaw has 103 contributors and a GitHub repo. And right now, OpenClaw is winning the product war. The people you restrict have a funny way of building around you.
OpenClaw🦞 @openclaw:
OpenClaw 2026.4.5 🦞
🎬 Built-in video + music generation
🧠 /dreaming is now real
🔀 Structured task progress
⚡ Better prompt-cache reuse
🌍 Control UI + Docs now speak 12 more languages
Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…
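The "recognises what it already processed and skips re-reading it" claim in the thread above is prompt-prefix caching. A minimal toy sketch of the idea, not OpenClaw's actual implementation: if a follow-up request shares a prefix with the previous one, only the new suffix needs to be processed (and billed). Real providers key caches internally; this version just measures the saving over lists of message chunks.

```python
# Toy illustration of prompt-prefix caching: only the part of the new
# prompt after the longest shared prefix must be re-processed.

def uncached_tokens(prev_prompt: list[str], new_prompt: list[str]) -> int:
    """Count chunks after the longest shared prefix - the re-processed part."""
    shared = 0
    for a, b in zip(prev_prompt, new_prompt):
        if a != b:
            break
        shared += 1
    return len(new_prompt) - shared

history = ["system", "user: plan a trip", "assistant: sure, where to?"]
follow_up = history + ["user: Tokyo, 5 days"]

# Without caching all 4 chunks are re-read; with prefix reuse, only 1:
assert uncached_tokens(history, follow_up) == 1
```

The saving is largest exactly when agents are chatty: long shared histories with short new suffixes.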
Devashish Upadhyay @devashishup·
@tammireddy the ones who moved fastest feel it hardest. the infra debt compounds quietly until it doesn't. been there with 70 agents
Devashish Upadhyay @devashishup·
YC just cut an AI compliance startup over fake compliance claims. @ycombinator Not because the AI was bad. Because no one could prove it wasn't. I ran 70+ agents at a finserv firm. 7 reached prod. Every failure was the same gap - no way to verify what the agent actually did.
Devashish Upadhyay @devashishup·
@twostraws This is why you can't test them the same way. Ran 70+ agents at a financial firm - @OpenAI Codex and @AnthropicAI Claude had completely different failure modes under edge load. Baseline testing misses all of it.
Paul Hudson @twostraws·
I've been flipping between Codex and Claude a lot these last two weeks, and if it's taught me anything it's this: these two tools are almost nothing alike. I had naively assumed they would be vaguely similar, but nope – once you push them hard they diverge fast.
Devashish Upadhyay @devashishup·
@murtuza_merc @encodeclub 300 builders at @encodeclub London - that's a real signal. The gap I always notice: demo agents vs agents in production is a 10x problem. We built 70+ and only 7 made it. Ideas rarely predict whether an agent survives real usage.
Murtuza J Merchant @murtuza_merc·
Spent the last weekend londonmaxxing at the @encodeclub AI London 2026 Hackathon. Had a blast judging 300+ builders as they turned Shoreditch into the center of the AI world. The sheer quality of projects, from bio-AI agents to accessibility tools, was mind-blowing. The future is being built right now in London. Congrats to the winners: 🥇 ChemTrace 🥈 RangerAI 🥉 Genomebook & SignQuest Shoutout to the Encode team for the vibes (and the rooftop sun). See you at the next one!
Devashish Upadhyay @devashishup·
@itsolelehmann $150-250/day is the real cost of agentic workflows nobody talks about. Curious - is that spike from longer context windows or more loops per task? We see this in enterprise: token cost isn't linear, it compounds per agent hop.
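The "token cost isn't linear, it compounds per agent hop" point falls out of a simple model: each hop re-sends the accumulated conversation as input, so total input tokens grow roughly quadratically with hop count. A toy sketch with made-up token counts:

```python
# Hypothetical cost model: each agent hop re-reads the full accumulated
# context, so input tokens compound rather than grow linearly.

def total_input_tokens(hops: int, tokens_per_hop: int, system_prompt: int = 1000) -> int:
    """Total input tokens when each hop re-reads the whole history."""
    total, context = 0, system_prompt
    for _ in range(hops):
        total += context           # full history is re-sent as input this hop
        context += tokens_per_hop  # this hop's output is appended to history
    return total

# Doubling the hops roughly quadruples the input-token bill:
assert total_input_tokens(5, 2000) == 25_000
assert total_input_tokens(10, 2000) == 100_000
```

This is why "more loops per task" and "longer context windows" are not independent cost drivers: extra loops inflate the context that every later loop re-reads.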
Devashish Upadhyay @devashishup·
@_Qubic_ Multi-agent collab on pathology slides is wild. Real question: what happens when 3 agents disagree on the same slide? Built 70+ agents - the hardest part was never the AI, it was knowing when to trust it.
Qubic @_Qubic_·
AI just diagnosed cancer. Without a single human telling it where to look. A team of specialized AI agents analyzed gigapixel pathology slides on their own. No manual guidance. No selected regions. Just agents collaborating to find what matters. This is what multi-agent AI looks like in the real world. 🧵
Devashish Upadhyay @devashishup·
@bekacru @better Smart. The next step is applying this same scrutiny to AI agents themselves - what they're installing, calling, or modifying vs what you told them to. Agent supply chain risk is real and almost nobody's testing for it yet.
Beka @bekacru·
better-npm. Every npm package with 50k+ weekly downloads gets analyzed by AI and static analysis before it hits your node_modules
- prevents typo squatting
- blocklist pkgs you don't want agents installing
- open source
one cmd: ~ npx @better-npm/cli
enjoy!
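The "what they're installing, calling, or modifying vs what you told them to" check from the reply above is, at its core, a diff between observed agent actions and an approved allowlist. A minimal sketch with hypothetical package names (the lockfile parsing and transcript capture a real tool would need are omitted):

```python
# Hypothetical agent supply-chain audit: compare what the agent actually
# installed against the set of packages it was approved to install.

ALLOWED = {"requests", "pydantic"}

def audit_installs(installed: set[str]) -> set[str]:
    """Return packages the agent pulled in that were never approved."""
    return installed - ALLOWED

# The agent was asked to add requests, but its transcript shows more:
unexpected = audit_installs({"requests", "pydantic", "left-pad-ai"})
assert unexpected == {"left-pad-ai"}
```

The same diff pattern applies to tool calls and file modifications: record what the agent did, subtract what it was authorised to do, and alert on the remainder.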
Devashish Upadhyay @devashishup·
@MerrynSW @FT 70 agents, 7 made it to prod. Hallucinations were the least of it - most failed on edge cases that basic eval never caught. The gap isn't the model, it's what happens when it meets real data. @AnthropicAI
Merryn Somerset Webb @MerrynSW·
Biggest concern of 800k Claude users? Not being replaced by AI but its endless propensity to make proper mistakes. "The hallucinations were a disaster. I lost so many hours of work," says one entrepreneur. @FT today.

Merryn Somerset Webb @MerrynSW:
What if the whole LLM thing is a false start? If the flaws are inherent systemic problems - if the compounding of hallucinations/errors can't be sorted out? If the capex build out is one of the biggest misallocations of capital ever? Then what? bloomberg.com/news/newslette…
Devashish Upadhyay @devashishup·
@om_patel5 The browser click solves selector ambiguity but not intent ambiguity. We built 70+ agents - the hard part is never the tool, it's whether the agent correctly infers what you meant. Testing that inference gap is the unsolved problem in vibe coding.
Om Patel @om_patel5·
THIS GUY ADDED A LIVE BROWSER TO CLAUDE CODE SO YOU CAN CLICK ANY ELEMENT AND EDIT IT INSTANTLY. biggest issue with vibe coding UI is that you have to describe what you want to change. if you prompt it the wrong selector or wrong component, Claude can't find it. now you just click it. your app runs in an embedded browser with Claude Code. you can click any button, any text, any div. Claude instantly knows exactly what you're pointing at. click. instruct. done. no more "change the button in the top right corner of the second card component." all you have to do now is just click the button. AND it's open source
Devashish Upadhyay @devashishup·
@tammireddy Exactly. We ran 70+ agents. 7 made prod. The ones that failed weren't missing access - they were missing someone to say 'ok, go live.' Deployment is the introduction.
Devashish Upadhyay @devashishup·
@Anunirva777 @DataChaz @addyosmani @GoogleAI both. edge cases are usually the inputs that don't match the happy path - and user data is where that shows up. we saw 3 of our 70 agents hit this: 98% accuracy in testing, then a specific field format in prod would trigger a loop no one caught
Charly Wargnier @DataChaz·
🚨 You need to see this. @addyosmani from Google just dropped his new Agent Skills and it's incredible. It brings 19 engineering skills + 7 commands to AI coding agents, all inspired by Google best practices 🤯 AI coding agents are powerful, but left alone, they take shortcuts. They skip specs, tests, and security reviews, optimizing for "done" over "correct." Addy built this to fix that. Each skill encodes the workflows and quality gates that senior engineers actually use: spec before code, test before merge, measure before optimize. The full lifecycle is covered:
→ Define - refine ideas, write specs before a single line of code
→ Plan - decompose into small, verifiable tasks
→ Build - incremental implementation, context engineering, clean API design
→ Verify - TDD, browser testing with DevTools, systematic debugging
→ Review - code quality, security hardening, performance optimization
→ Ship - git workflow, CI/CD, ADRs, pre-launch checklists
Features 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship) that map to this lifecycle. It works with:
✦ Claude Code
✦ Cursor
✦ Antigravity
✦ ... and any agent accepting Markdown.
Baking in Google-tier engineering culture (Shift Left, Chesterton's Fence, Hyrum's Law) directly into your agent's step-by-step workflow! `npx skills add addyosmani/agent-skills` Free and open-source. Repo link in 🧵↓
Devashish Upadhyay @devashishup·
@beffjezos The danger isn't losing the weekend, it's what your vibe-coded agent does after you ship it and sleep. Most won't notice until Monday when it's been running unchecked for 72 hours with no behavioral guardrails.
Beff (e/acc) @beffjezos·
*starts vibe coding Friday night* *blinks* Oh fuck it's Sunday night.
Devashish Upadhyay @devashishup·
@openclaw Unpopular opinion: if a provider swap breaks your agent, the problem isn't @AnthropicAI or @OpenAI - it's that you have no test coverage across providers. We saw this kill 3 enterprise deployments in finserv. The model changed, the agent broke, nobody knew until prod.
OpenClaw🦞 @openclaw·
OpenClaw 2026.4.5 🦞
🎬 Built-in video + music generation
🧠 /dreaming is now real
🔀 Structured task progress
⚡ Better prompt-cache reuse
🌍 Control UI + Docs now speak 12 more languages
Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…
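The "test coverage across providers" the reply above calls for can be as simple as one regression suite run against every provider behind a common call signature, so a model or provider swap fails in CI rather than in prod. A minimal sketch: the provider names and canned responses below are stand-ins, not real API clients.

```python
# Hypothetical cross-provider regression harness. Stub functions stand
# in for real provider clients; the point is the shared suite, not the calls.

from typing import Callable

def fake_anthropic(prompt: str) -> str:
    return "REFUND_APPROVED" if "refund" in prompt else "UNKNOWN"

def fake_openai(prompt: str) -> str:
    return "REFUND_APPROVED" if "refund" in prompt.lower() else "UNKNOWN"

PROVIDERS: dict[str, Callable[[str], str]] = {
    "anthropic-stub": fake_anthropic,
    "openai-stub": fake_openai,
}

REGRESSION_CASES = [
    ("customer asks for a refund", "REFUND_APPROVED"),
    ("Customer asks for a Refund", "REFUND_APPROVED"),  # casing edge case
]

def run_suite() -> dict[str, list[str]]:
    """Map each provider to its failing prompts; empty list means green."""
    return {
        name: [p for p, want in REGRESSION_CASES if call(p) != want]
        for name, call in PROVIDERS.items()
    }

# The casing case exposes a behavioural difference between the two stubs:
failures = run_suite()
assert failures["openai-stub"] == []
assert failures["anthropic-stub"] == ["Customer asks for a Refund"]
```

Swap a stub for a real client and the same suite becomes the fallback-readiness check the tweet says most enterprise teams skipped.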
Devashish Upadhyay @devashishup·
@0G_labs Ngl this hit. Built 70+ agents in finserv - most failures were infrastructure, not logic. No observability, silent drifts, no way to know when behavior changed mid-flight. The algorithm worked in dev, broke in prod. Every time.
0G Labs (Home of Infinite AI) @0G_labs·
AI agents don't fail at the algorithm level. They fail at the infrastructure level. Bad storage. Slow compute. No verifiability. 0G solves all three: chain + storage + DA + compute — purpose-built for the agentic economy. What are you building on it?
Devashish Upadhyay @devashishup·
97M @AnthropicAI MCP installs. Foundational infra now. One question: who's actually testing what agents do once they're running on it? Built 70+ of these - the answer is almost no one.
Devashish Upadhyay @devashishup·
@MLBear2 Curious how you're handling failure recovery - state management + unexpected tool responses is where most agents break in prod. Most teams build with @AnthropicAI SDK and never test those edge cases before shipping.
ML_Bear @MLBear2·
I've been digging into the Claude Agent SDK spec recently, so I tidied up my notes and published them as a Zenn Book 😇 I think it makes it easy to build conversational AI agents like the travel-planner video in this thread. Have a look when you have a moment! (Please let me know if anything is wrong 🙏) zenn.dev/ml_bear/books/…
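The "state management + unexpected tool responses" edge case the reply above asks about can be tested by injecting a malformed tool response and asserting the loop degrades to a safe error instead of crashing or retrying forever. A hypothetical sketch (function and field names are illustrative, not any SDK's API):

```python
# Hypothetical failure-recovery test: the agent loop must tolerate a tool
# that returns garbage, retry a bounded number of times, then fail safe.

import json
from typing import Callable

MAX_RETRIES = 2

def run_tool_step(tool_call: Callable[[], str]) -> dict:
    """Call a tool, tolerate transient bad JSON, fail safe after retries."""
    for _ in range(MAX_RETRIES + 1):
        raw = tool_call()
        try:
            return {"ok": True, "data": json.loads(raw)}
        except json.JSONDecodeError:
            continue  # transient garbage: retry a bounded number of times
    return {"ok": False, "error": "tool returned malformed JSON"}

# Inject a tool that always returns garbage - the edge case most teams
# never exercise before shipping:
result = run_tool_step(lambda: "<html>502 Bad Gateway</html>")
assert result == {"ok": False, "error": "tool returned malformed JSON"}
```

The key property is the bounded retry: an unbounded `while` here is exactly the silent prod loop described elsewhere in this feed.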
Devashish Upadhyay @devashishup·
@MarioNawfal API upgrade is nice. But most teams will still ship agents that break on live data within weeks. Better tooling isn't the bottleneck. Testing what the agent actually does in prod is.
Mario Nawfal @MarioNawfal·
🚨 GAME-CHANGER for AI builders & agents. The 𝕏 API just got a MASSIVE update:
- Pay-Per-Use: No more monthly tiers, only pay for what you actually use
- Native XMCP + Xurl: Your AI agents can now read real-time context and take actions straight on 𝕏
- Official Python & TypeScript SDKs: Ship 10x faster
- Free API Playground: Safe, realistic testing before you go live
@X, @elonmusk
Elon Musk @elonmusk:
Upgrades to our API