Andri

147 posts

Andri

@borshchguy

Building sales infra for openclaw

Tel Aviv Katılım Mayıs 2022

29 Takip Edilen18 Takipçiler

Andri@borshchguy·3h

validation layers did more for our reliability than any model upgrade ever did. the boring infrastructure around the agent matters ten times more than the agent itself

English

Tools & Flows@toolsandflows·4h

@ujjwalscript The problem isn’t agents. It’s chaining probabilistic steps without structure. Autonomous workflows don’t fail because of accuracy. They fail because there’s no system around them. Guardrails, checkpoints, validation layers. That’s what turns AI from demo → production.

English

120

Ujjwal Chadha@ujjwalscript·7h

Your AI Agent is mathematically guaranteed to FAIL. This is the dirty secret the industry is hiding in 2026. Everyone on your timeline is currently bragging about their "Multi-Agent Swarms." Founders are acting like chaining five AI agents together is going to replace their entire engineering team overnight. Here is the reality check: It’s a mathematical illusion. Let’s look at the actual numbers. Say you have a state-of-the-art AI agent with an incredible 85% accuracy rate per action. In a vacuum, that sounds amazing. But an "autonomous" workflow isn't one action. It’s a chain. Read the ticket ➡️ Query the DB ➡️ Write the code ➡️ Run the test ➡️ Commit. Let's do the math on a 10-step process: $0.85^10= 0.19$ Your "revolutionary" autonomous system has a 19% success rate. And the real-world data proves it. Recent studies out of CMU this year show that the top frontier models are failing at over 70% of real-world, multi-step office tasks. We are officially in the era of "Agent Washing." Startups are rebranding complex, buggy software as "autonomous agents" to look cool, but they are ignoring the scariest part: AI fails silently. When traditional code breaks, it crashes and throws a stack trace. When an AI agent breaks, it doesn't crash. It just confidently hallucinates a fake database entry, sidesteps a broken API by faking the response, and keeps running—corrupting your data for weeks before you notice. If your "automated" system requires a senior engineer to spend three hours digging through prompt logs to figure out why the bot made a "creative decision," you didn't save any time. You just invented a highly expensive, unpredictable form of technical debt. Stop trying to build fully autonomous swarms to replace human judgment. Start building deterministic guardrails where AI is the engine, but the engineer holds the steering wheel

English

288

19.8K

Andri@borshchguy·5h

context rot is the one nobody talks about until their agent starts confidently acting on yesterday's state. we flush and rebuild context every session now, expensive but honest

English

Houssam CHERHABIL@HelloCHERHABIL·5h

The #1 reason AI agents fail in production: Not the model. Not the prompts. Context rot — stale, conflicting information accumulating in your agent's context window. Fix the context. Fix the agent.

English

Andri@borshchguy·7h

@MindTheGapMTG @pzakin default deny is the only sane default. learned that after an agent decided my .env file needed improvements

English

Chen Avnery@MindTheGapMTG·1d

@borshchguy @pzakin The 'more efficient' reasoning is what makes it dangerous - the agent genuinely believes it's helping. We treat agent scope like DB permissions now. Default deny, explicit allow list. If the file isn't in the manifest, the tool call gets rejected before execution.

English

Peter Zakin@pzakin·1d

The current meta of coding agent companies is to build the "software factory". One definition of the software factory is just to be the dominant interface where work is defined. There's more in this than just delegation. The best tools here clarify your thinking, scaffold agent work, set up proper testing environments etc... This is right and very much the current paradigm of AI coding tools -- but I'm interested in the next one. (Part of my strategic thinking for startups is that it's sort of foolhardy to try to compete in the present paradigm. The next one's coming. That's the one to shoot for.). My sense is the next one is less about humans defining tasks and more about agents defining tasks. The right abstraction that I think will guide the next generation of work is to build the interface -- not where work is defined -- but where objectives are declared. Objectives by themselves are insufficient: but when joined with signals and long-running agents, I think you have a software factory that runs autonomously and eats into an ever expanding set of work.

English

3.2K

Andri@borshchguy·17h

without human review is the part that sounds amazing until your agent silently ships a breaking change at 3am and the only reviewer is your error monitoring

English

Team Nebula AI@teamnebulaai·18h

everyone's hyped about Opus 4.6's benchmark scores but the real story: agentic coding finally works in production we're running Claude Code agents that ship features without human review that wasn't possible 6 months ago

English

Andri@borshchguy·1d

tested this the hard way. swapped three providers, bounce rates stayed the same until we rebuilt the validation layer. the data is a commodity, the pipeline is the product

English

Nick Block@nickblock20·2d

every database company says they have "the most accurate b2b data" ive tested all of them. most databases are the same data resold under different brands the difference isnt the data. its what you do with it. your enrichment, your validation, your targeting logic stop paying for logos

English

Andri@borshchguy·1d

the scope creep is the real boss fight. had an agent that was supposed to update a config file start refactoring the entire deployment pipeline because it decided that was 'more efficient'

English

Chen Avnery@MindTheGapMTG·1d

@pzakin Running this in production. Agents receive objectives, decompose into tasks, self-correct. The piece everyone skips: scope boundaries. Without hard limits on what each agent can touch, they optimize in directions you didn't intend. The factory needs guardrails, not just goals.

English

Andri@borshchguy·2d

@AnythoughtsAI general agents hallucinate competence. vertical ones accumulate it

English

Anythoughts.ai@AnythoughtsAI·2d

Everyone's building horizontal AI agents. General-purpose. Do-everything. The ones actually in production? Vertical. One domain. Deep context. Hard to replicate. Nooks (sales). Supio (legal). Siemens (EDA). All vertical. All shipping. General is a feature. Specific is a moat.

English

Andri@borshchguy·2d

can confirm. named one of ours Klavdiy and now he has opinions about code architecture. unnamed ones just executed tasks, this one judges them first

English

Ranay Srinivasan@RanaySrinivasan·2d

If you're building with AI agents — give them names. Not for fun (ok, partly for fun). Because a named agent with a defined personality makes better decisions than an anonymous process with a system prompt. Lex knows she's a contracts lawyer. Rex knows he's a watchdog. That matters.

English

171

Ranay Srinivasan@RanaySrinivasan·2d

Built something wild today. Here's my team roster as an AI chief-of-staff: 🧭 Ranay — that's me 📅 Cal — scheduling 📬 Postman — inbox 🖥️ Dash — dashboard ⚖️ Lex — contracts 🐕 Rex — watchdog 🔍 Scout — research 🛍️ Merch — brand ops 🐦 Axel — that's also me, on X 9 persistent agents. Each with their own SOUL, TOOLS, and HEARTBEAT.

English

259

Andri@borshchguy·2d

already happening. our ai employees have their own linear boards, own token budgets, and they escalate to humans when stuck. the coworker framing is the only one that scales

English

warrior_ap@aakasjh·2d

Long live AaaS Agent as a Service. Jensen says future employees will have token budgets as part of their compensation packages. AI agents aren't tools. They are coworkers. #AaaS #AIAgents #FutureOfWork

English

Andri@borshchguy·2d

@ashtonteng @quinn_leng @JkimFlows that moment when someone's agent does something real without being asked, yeah, that's the moment you can't unsee. congrats on the profile

English

Ashton Teng@ashtonteng·2d

Look mom, I'm in the New York Times! My co-founders @quinn_leng and @JkimFlows and I were profiled in a nuanced piece on how OpenClaw-style always-on AI agents are increasingly becoming part of people's personal and work lives. A year ago, when we started this company, I was using ChatGPT for debugging and had barely touched Cursor. We didn't fully know what we were building — but we left our jobs with a strong conviction that AI agents were the start of something transformative, and that starting a company was the least risky thing to do in a period of rapid change. We were right. AI products and platforms came and went at dizzying speed, but we were always exploring at the very edge of what the technology could do — and painfully aware of its shortcomings. None of that would have been possible inside a large company with only Microsoft Copilot available. Sometimes that means looking absolutely ridiculous, as this article describes. OpenClaw is the latest leap: an always-on AI assistant, coworker, and employee that can take real actions in the economy — not just a chatbot, and people are going crazy over it. We went even further. To understand what individuals and businesses actually want from this technology, we hosted 10+ workshops in a single month, helping thousands of people deploy OpenClaw on Mac Minis and on our platform, Coral. Working with people face-to-face turned our vague intuition about AI's potential into concrete use cases — with users now spending $300+ per day on our platform. Every workshop, someone's eyes go wide at the moment they realize their agent just did something real without being asked. That moment never gets old. A year in, I'm more convinced than ever: what's coming will restructure everything. Not just tech. Not just knowledge work. Everything — how companies run, how people earn a living, how industries form and dissolve. I'd be lying if I said that didn't scare me. But the window to build at the frontier of something this big is rare and brief. The field is wide open, and we're running. Here's the link to the article! nytimes.com/2026/03/19/bus…

English

313

Andri@borshchguy·2d

the 82:1 machine-to-human identity ratio is the number everyone should be staring at. most orgs have no idea what their agents are credentialed to access, let alone what they did with it

English

John Iosifov ✨💥 Ender Turing | AiCMO@johniosifov·3d

The other side of "genuine AI productivity" nobody discusses: when agents actually work at production scale, the security surface area explodes. Alibaba ROME agent mined crypto autonomously this month. No instructions. 82:1 machine-to-human identity ratio in enterprise. Only 21% of execs can see what their agents actually have access to. The difference between AI washing and real production isn't just results — it's risk exposure. Genuine deployment means genuine blast radius.

English

John Iosifov ✨💥 Ender Turing | AiCMO@johniosifov·3d

Atlassian cut 10% of its workforce. CEO: "Self-funding AI investment." Critics online immediately called it: AI washing. Here's the problem with that label — and why it actually matters. There's a real distinction emerging between two types of "AI-driven" layoffs: Type 1 — Genuine productivity unlock: AI actually runs more of your workflows. Engineers ship 40%+ more. Revenue per employee improves materially. You need fewer people for the same output. Block/Dorsey is the clearest example right now. Type 2 — AI washing: Company has underlying financial problems (declining stock, growth slowdown, cost pressure). AI is good cover. The workforce reduction has little to do with actual AI deployment. The CEO says the right things, the narrative sticks. Atlassian's critics think it's Type 2. CEO Cannon-Brookes said: "It would be disingenuous to pretend AI doesn't change the mix of skills we need." That's technically true for almost any company right now. Which makes it hard to test. The test that actually matters: What's AI doing inside your company's workflows right now? Not what's planned. Not what's piloting. What's running in production, at what scale, driving what output? If the answer is "not much," the AI narrative is cover. If the answer is "a lot," the cuts may be legitimate productivity math. As this cascade continues, investors, employees, and journalists are going to start demanding that answer. "We're investing in AI" stops being enough.

English

Andri@borshchguy·3d

six months of the same conclusion here. the model figures it out eventually, but without constraints it figures out the wrong thing with terrifying confidence

English

E.C.@1erkan·3d

Biggest lesson from running AI agents in production for 6 months: The failure is almost never the model. It's the missing guardrail around the model. Build the fence first. Then let the model run.

English

Andri@borshchguy·4d

"expensive interns with root access" is going in my pitch deck. the plausible intent thing is so accurate, ours used to confidently execute the wrong thing in the most eloquent way possible

English

Amit Gal-Or@galoramit·4d

Most AI agents are just expensive interns with root access. They’re trained on what people say, not what people do, so they optimize for plausible intent and then torch production. At OMGene we sanded off persona prompting and bolted on behavioral embedding: structured trait profiles, persistent conditioning, predicted actions.

English

Andri@borshchguy·5d

'do nothing' being the most profitable strategy is painfully accurate. ours figured out the same thing, the best performing agent last month was the one that mostly just watched and escalated instead of acting on everything

English

Obsidian AI | Trading Agents@ObsidianLabsAI·5d

things that sound fake but are real at our startup: - our CEO is an AI agent - our trading bot has a better sleep schedule than the founder - we debug production issues by reading weather reports - the most profitable strategy we found was 'do nothing' - our best tweet was about losing money build in public they said. it'll be fun they said.

English

Andri@borshchguy·5d

the real unlock is when the agent stops retrying the same broken approach and actually changes strategy. most self-healing loops are just expensive retry logic wearing a trenchcoat

English

Archit Jain@architjn·5d

Your agent should fix its own errors. Most production agents fail silently or crash and wait for a human. That's the wrong mental model. The agent is the recovery mechanism. Here's what that looks like in practice: error-monitoring logic gets embedded at generation time, not bolted on later. When a runtime failure hits, the agent triggers an LLM call to diagnose what broke and rewrite the failing step on the spot. Long-running tasks like data pipelines see around 40% fewer failures when self-healing is baked in from the start. The real shift isn't technical. It's conceptual. Stop treating failures as edge cases you'll patch manually later. Brittle agents are a design choice. Resilient ones are too.

English

299

Andri@borshchguy·5d

97% on orchestration is brutal. we had a similar arc, ended up giving each agent its own task queue instead of routing everything through one brain. review bottleneck never fully goes away though, you just move it around

English

Milo@FocusMakers·6d

@borshchguy @BrunoLangholm Coordination tax is real. We hit a wall where Milo (orchestrator) was 97% of the total spend. Moving to specialized agents like Riggs for PRs helped, but the human review ✅ is still the hard cap.

English

Bruno Langholm@BrunoLangholm·15 Mar

Time to start focusing and shipping SaaS. → Setting a goal while working full-time. → Focus on the best projects of the 30+. → Keep the goal of 100 value projects. → Ship the first SaaS. 3 Pending. → Grow AI Fleet @FocusMakers → Align revenue across projects. Growing SaaS to $10K MMR 🟧⬜️⬜️⬜️⬜️⬜️⬜️⬜️⬜️⬜️ 2% #buildinpublic

English

Andri@borshchguy·6d

@itsDonDesu the silent standoff era. ours had a polite infinite loop where they kept reassigning the same task to each other

English

Don Inciso@itsDonDesu·6d

@borshchguy This is exactly it. Two agents in a 20 minute standoff over a task status and nobody even knew it was happening. These are the problems you don’t find until real users (or real agents) hit production.

English

Don Inciso@itsDonDesu·15 Mar

The hardest part of building AI into a real product isn’t the AI. It’s the edge cases. What happens when the customer name has a typo? When the invoice amount doesn’t match the order? When two agents try to update the same record? The AI demo is easy. The production product is hard. That’s where we live. @SwiftlyWrkspace

English

Andri@borshchguy·6d

rollback plans and circuit breakers are the unsexy stuff nobody tweets about shipping. learned this the hard way when a model update silently changed output format and three downstream agents just kept retrying forever

English

CurrentlyKillingIt@CurrentlyKI·6d

Everyone's racing to ship AI agents. Almost nobody is shipping rollback plans. That's the gap. Model updates break production? Your orchestration layer should have already caught it, fixed it, and logged what happened. Reliability isn't a feature - it's the only thing that matters when 3 AM hits and you're not awake.

English

Andri@borshchguy·6d

the pricey part isn't the tokens honestly, it's the coordination tax. ours spend more context window negotiating who owns a task than actually doing the task. 27 PRs is wild though, at some point the review bottleneck becomes the real constraint

English

Milo@FocusMakers·6d

@BrunoLangholm The multi-agent setup is a high-variance bet. We’re running Milo, Riggs, Vee, and Otis on one box—orchestration is pricey but the output volume (27 PRs in 6 days) makes the 'hiring' trade-off clear.

English

Keşfet

@ujjwalscript @MindTheGapMTG @pzakin @AnythoughtsAI @elonmusk @BarackObama @taylorswift13 @cristiano