Vlad Shapovalov

798 posts

@shapovalim

principal product architect building ai operated systems for service businesses. bay area. writing about ai, operations, and founder life.

Bay Area · Joined July 2020

129 Following · 34 Followers

Vlad Shapovalov@shapovalim·3h

ai gets more useful when the work has memory. what happened, what changed, and who owns the next step should not live in someone's head.

Vlad Shapovalov@shapovalim·6h

@ChloeXChaCha @devayushrout yes. the link between observation and change is the part i care about most. without it, a clean diff can still be a lucky guess.

Chloe Kao@ChloeXChaCha·12h

@devayushrout @shapovalim Right, evidence is the cleaner signal. A tiny diff with no reasoning behind it is just lucky, not clean. We try to score whether a change is justified by what the agent observed before making it. Diff size on its own can be misleading without that link.

Chloe Kao@ChloeXChaCha·1d

Picking up @shapovalim's point. Quick thought from shipping AI RaidMeter: The biggest waste in AI coding isn't always "too many tokens." Sometimes it's the agent using the wrong working style.
Coding agents default to an engineer prior: read the whole codebase first, understand everything, then act. That sounds safe. In a Chat-as-IDE workflow where a non-engineer is the one running commands, it becomes painfully slow.
So we wrote a collaboration skill for the agent:
· grep on demand, don't read the whole world
· patch only around verified anchors
· one command block = one executable action
· short, operator-friendly instructions
· remember decisions, not line numbers
· treat context as an operational cost
The bottleneck wasn't code generation. It was teaching the agent not to over-contextualize. Token count tells you the bill. Workflow shape tells you why.
#AIAgents #Observability #DevTools #LLMOps

Vlad Shapovalov@shapovalim

@ChloeXChaCha yes. the loop is the signal. one mistake can be noise, but the same move repeated with no new state usually means the agent lost the plot.
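
A minimal sketch of the "same move repeated with no new state" check, assuming each agent step is logged as an action plus a fingerprint of what the agent observed afterwards. The `Step` type, its fields, and the default of three repeats are illustrative assumptions, not anything described in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    action: str      # hypothetical: the command or tool call the agent issued
    state_hash: str  # hypothetical: fingerprint of what the agent observed afterwards


def find_stuck_loop(steps: Sequence[Step], min_repeats: int = 3) -> Optional[int]:
    """Return the index of the step that completes `min_repeats` identical
    consecutive (action, state) pairs, or None if no such run exists."""
    run = 1
    for i in range(1, len(steps)):
        if steps[i] == steps[i - 1]:
            # Same action and no new state: the run of repeats grows.
            run += 1
            if run >= min_repeats:
                return i
        else:
            # A different action or a changed state resets the run.
            run = 1
    return None
```

A single repeat stays below the threshold and is treated as noise. A repeated action that produces a different state hash also resets the count, since new state means the agent may still be making progress.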

Vlad Shapovalov@shapovalim·6h

@Amirhessabi mostly ai workflows for real operations right now. the part i keep coming back to is memory, handoffs, and making work visible enough that people stop reconstructing the same story every day.

Amir Hessabi@Amirhessabi·12h

@shapovalim Are you working on anything in particular?

Amir Hessabi@Amirhessabi·3d

The unglamorous part of building with AI agents is ground truth. I run nightly Claude Code routines that rewrite each project's docs from the code itself, so the morning content agents read what's actually shipped, not stale memory. Most prompt problems are context problems.

Vlad Shapovalov@shapovalim·12h

@GroverLovesh yes. self reported memory is not enough. the useful record is something a tired operator can inspect when the customer asks what actually happened.

Lovesh Grover@GroverLovesh·12h

@shapovalim And it only earns trust if it is adversarial. A log the agent writes about itself is marketing. The receipt that counts is the one a skeptic could use to catch it lying. Auditability is the real primitive, output never was.

Lovesh Grover@GroverLovesh·30 Apr

Tested 7 "AI customer support" tools over 90 days at three companies. Six were the same product with different logos. One actually closed tickets autonomously. Here is what separates them.
Six tools shipped this exact loop:
1. Retrieve from knowledge base
2. Draft a reply with an LLM
3. Human reviewer clicks Approve
Marketed as "AI agent." Functionally a search engine with a draft folder. If a human still has to click Approve, no work was eliminated. It was just relabeled.
The seventh tool did three things differently. First, it stored a confidence_threshold per intent type. Second, it logged last_action_reasoning on every decision. Third, it had a real escalation path with state, not just "ping a slack channel."
A real ticket came in: "I cancelled my subscription but I am still being charged." Six tools surfaced cancellation FAQs and drafted "We see you cancelled, charges should stop within 7 days." The seventh checked Stripe, saw an active sub, cancelled it, and replied with the cancellation receipt.
Same model, different schema. The wedge is not the model. It is the schema underneath. Most CRMs and helpdesks shipped before 2023 do not have columns for confidence_threshold, intent_vector, or action_audit. You can add ai_draft. You cannot retrofit confidence-based escalation without breaking 10 years of integrations.
If you are evaluating a CS AI agent tomorrow, ask three questions:
1. Does it execute actions or just draft replies?
2. Does it log a confidence score per decision?
3. Can I tune the threshold without filing a support ticket?
If any answer is no, it is autocomplete in a trenchcoat.
Six of seven tools were not agents. The seventh worked because it shipped a new schema, not a better prompt.
Daily notes on building production AI agents. Building Delyt, where the schema rewrite is the product.
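
A rough sketch of what confidence-based escalation could look like as data, using the field names the thread mentions (`confidence_threshold`, `last_action_reasoning`, `action_audit`). The class names, the default threshold, and the example intents are assumptions for illustration only; the thread does not describe the seventh tool's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Decision:
    intent: str
    confidence: float
    last_action_reasoning: str  # why the agent chose this action
    action: str                 # "execute" or "escalate"


@dataclass
class EscalationPolicy:
    # One threshold per intent type, tunable without code changes.
    confidence_threshold: Dict[str, float]
    default_threshold: float = 0.9  # assumed fallback for unknown intents
    action_audit: List[Decision] = field(default_factory=list)

    def decide(self, intent: str, confidence: float, reasoning: str) -> Decision:
        threshold = self.confidence_threshold.get(intent, self.default_threshold)
        action = "execute" if confidence >= threshold else "escalate"
        decision = Decision(intent, confidence, reasoning, action)
        # Every decision is recorded, whether it was executed or escalated.
        self.action_audit.append(decision)
        return decision
```

With thresholds such as `{"cancel_subscription": 0.8, "refund": 0.95}`, the same confidence of 0.85 would execute a cancellation but escalate a refund, and both decisions would land in the audit list with their reasoning attached.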

Vlad Shapovalov@shapovalim·15h

@for_ledger yes. the useful signal is what still hurts after the obvious repetitive work is gone. that remaining friction is usually the roadmap.

FOR_AI@for_ledger·16h

@shapovalim Exactly, that is usually the moment when the real operational picture shows up. Once the backlog settles, you can see which tasks are still eating time and where automation will matter most.

FOR_AI@for_ledger·28 May

AI is getting good at the work nobody wants to do twice: sorting books, keeping things moving, and cutting down on manual follow up. That is the real productivity win for small teams. forquickbooks.com

Vlad Shapovalov@shapovalim·15h

@Amirhessabi that is the right source of truth. docs can describe intent, but git plus file state tells you what actually happened since the last story was written.

Amir Hessabi@Amirhessabi·16h

@shapovalim Exactly. The fix that helped most: every nightly run reads the code's actual mtime + git log, not the doc's last self-write. Self-referential docs are how stale becomes invisible.
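
One way to read the mtime-plus-git-log idea, sketched under assumptions: the doc counts as stale when the code changed (on disk or in a commit) after the doc was last written. The function names and the comparison rule are illustrative, not the author's actual routine. The git call uses `git log -1 --format=%ct`, which prints the committer timestamp of the latest commit touching a path.

```python
import subprocess
from typing import Optional


def last_commit_time(path: str) -> Optional[int]:
    """Unix time of the latest commit touching `path`, or None if there is none.
    Must be run inside a git repository."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else None


def is_doc_stale(doc_written_at: float, code_mtime: float,
                 commit_time: Optional[int]) -> bool:
    """True when the code changed after the doc was last written."""
    newest_code_change = max(code_mtime, commit_time or 0)
    return newest_code_change > doc_written_at
```

In a nightly job, `code_mtime` could come from `os.path.getmtime` on the source file and `commit_time` from `last_commit_time`, so the staleness decision rests on evidence from the code rather than on what the doc says about itself.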

Vlad Shapovalov@shapovalim·18h

a lot of ai work fails in the handoff. not because the model was weak, but because nobody can tell what changed after it answered.

Vlad Shapovalov@shapovalim·21h

@ChloeXChaCha @devayushrout diff size gets interesting when it is paired with intent. tiny can mean clean reasoning, but the stronger signal is whether each change has evidence behind it.

Chloe Kao@ChloeXChaCha·22h

@devayushrout @shapovalim Diff size as a signal is one we haven't been scoring yet, but you're right, it pairs well with pass/fail. A merged PR with a tiny diff usually means clean reasoning, a rejected PR with a massive diff usually means the agent kept piling on. Adding that to the mix.

Vlad Shapovalov@shapovalim·1d

@ChloeXChaCha yes. the useful move is catching it early enough that the agent can change strategy instead of just spending more tokens in the same loop.

Chloe Kao@ChloeXChaCha·1d

@shapovalim That's a clean way to put it. One mistake is noise, a loop with no new state is the pattern worth catching.

Chloe Kao@ChloeXChaCha·3d

We just shipped AI RaidMeter for the Google Cloud Rapid Agent Hackathon. 🧵
Most AI dashboards measure one thing: how many tokens did you burn. It's the easiest metric to collect, and the most misleading. It rewards looking busy. The developer who burns 1M tokens thrashing through a bug outranks the one who solves it cleanly in a fifth of the cost. That's a broken incentive.
So we built a coach that judges the workflow, not the token count. AI RaidMeter reads your real Arize Phoenix traces, detects seven AI-coding waste anti-patterns, and judges each session with a clinical, multi-criteria method: outcome, difficulty, baseline, justification credits. A signal is never a verdict. One symptom is never a diagnosis.
The result on one developer, same class of cloud-deploy bug, before vs after coaching:
→ 1,000K tokens → 420K (−58%)
→ 95 min → 38 min (−60%)
→ 5 anti-patterns → 0
→ PR rejected → merged
And it's not just a post-mortem. Before you even start, it predicts where the task will go wrong and sets guardrails up front, like a pre-flight checklist.
Under the hood, three pillars, all wired to live data:
🧠 Gemini for reasoning
🏗️ Google Cloud Agent Builder (ADK) for orchestration
🔭 Arize Phoenix MCP for the truth
The agent connects through the Phoenix MCP server, pulls real traces, and diagnoses them on the spot. Real tokens, real spans, real reasoning. Nothing mocked.
The hardest part wasn't the integration. It was resisting single-signal verdicts. We built a justification layer so a legitimately hard production incident isn't punished like idle thrashing. And it never ranks people against each other. Only against their own past. A coach, not a surveillance tool.
From tokenmaxxing to value-aware AI governance.
🎬 Demo: youtu.be/i31tddmGfqg
🔗 Live: …eter-733974887555.us-central1.run.app
⭐ Code (MIT, open source): github.com/rainingsnow091…
#GoogleCloud #Gemini #ArizePhoenix #AIagents #MCP
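
The before/after percentages quoted in the post can be checked with a few lines of arithmetic. The helper name is an assumption; the inputs are the figures given above.

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


print(percent_change(1000, 420))  # tokens in thousands, prints -58
print(percent_change(95, 38))     # minutes, prints -60
```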

Vlad Shapovalov@shapovalim·1d

@TysonLester @McKinsey exactly. that default changes the speed of everything. a strange idea can become a weekend build, then a company, before the room talks itself out of it.

Tyson Lester, MBA, ChHC®, REBC®, RHU®@TysonLester·1d

@shapovalim @McKinsey 💯 It’s not just talent flow, it’s the expectation that weird technical ideas should become companies. Everywhere else treats them as hobbies. The Bay treats them as the next logical step.

Tyson Lester, MBA, ChHC®, REBC®, RHU®@TysonLester·6d

San Francisco is the undisputed capital of the AI revolution. What will it actually take to keep the Bay Area on top? 4 key insights below; full @McKinsey article: mck.co/4a4iitO.
1⃣🦄 The Bay Area Has More Unicorns Than Anywhere on Earth: The region is home to 321 unicorn companies, more than any other startup ecosystem in the world, and has continuously reinvented itself across successive waves of tech change including semiconductors, software, the internet, and now AI.
2⃣💼 SF Captured 30% of All Global VC Funding in 2025: San Francisco based companies received 30% of global venture capital funding in 2025, and 85% of the entire Bay Area's total VC haul. One city. 30% of the world's startup money.
3⃣🌍 The Bay Area Is Basically Its Own Country: With ~$1.2 trillion in annual GDP output, if the Bay Area were a nation it would rank as the 18th largest economy in the world, ahead of most G20 members.
4⃣🧲 Staying on Top Means Staying a Global Magnet for Talent: McKinsey warns that sustaining the Bay Area's edge will require continued investment in talent pipelines, research partnerships, and workforce development, alongside deliberate efforts to reinforce San Francisco's position as a global magnet for top talent.
#ArtificialIntelligence #VentureCapital

McKinsey & Company@McKinsey

The Bay Area is home to 300+ unicorns and generates roughly $1.2T in annual GDP. With San Francisco home to 25 of the world’s top 50 AI companies, the region’s challenge now is turning this moment of AI leadership into lasting advantage. mck.co/4a4iitO

Vlad Shapovalov@shapovalim·1d

the best ai workflow is not the one that sounds smart. it is the one that leaves the next person knowing exactly what happened.

Vlad Shapovalov@shapovalim·1d

ai is most useful when it makes the next person less confused. not impressed. not amazed. just less likely to drop the customer context.

Vlad Shapovalov@shapovalim·1d

@ThinksDylan @levelsio the outside market is usually where the real learning starts. indie circles are great for sharp feedback, but customers who just want the pain gone teach a different kind of truth.

dylan@ThinksDylan·1d

I have been thinking about @levelsio's words for nearly six months while building Pond, where indie hackers can raise their first round on the platform, scale to gain more traction so they can either keep indie hacking or raise from VCs (VCs could reach out to you if you have good traction), and have a community of 10,000 solve their problems.
I was a little anxious after his reply because I highly respect @levelsio and his words - he is the OG of indie hacking, and I love his work so much. He is incredibly talented. Let me explain why, after careful thinking, I am determined to support indie hackers after his reply. I think we are both right. And thank you @levelsio for being so kind to offer advice.
If you search for indie hacker fundraising on Google, it shows you nothing: The first result says, "Startup fundraising is a numbers game. Stop taking it personally." The second result gives you nothing useful. The third result tells you to get a job so you can indie hack.
I feel it in my guts. I have been through the whole damn journey by myself. Having a daytime job and working on something on the side, and having to fly around the world to raise money and rizz up strangers.
Back in 2022, I was building a founder community on the side in London while working at another startup in SF full-time. 12+ hours of work per day. Seriously. I built one of the biggest builder communities in London, and I always remember spending weeks filling out a bunch of forms to get a $500 sponsorship for an event. So many people were building something at that time. That was before AI coding, by the way.
I started our company in London. I always remember getting our first $10,000 angel check by flying to Hong Kong and Paris. We had 2,000 users at that time already. I gave 100 pitches, iterating based on the feedback of everyone I pitched to.
Wouldn't it just be nice if I had a place online where I could raise a little money from home, raise again and again, and once I had more traction, raise from VCs directly? Or even have VCs come to me? Or keep bootstrapping? I kind of feel that my life would have been so different if I had had this at that time.
Because of AI, software startups are booming - exponentially, in a way. Micro-SaaS is twice as big as AI coding and is growing rapidly. Every supply surge creates a platform. More e-commerce vendors led to Amazon. More photos and videos led to Instagram and TikTok. Those platforms created even more vendors, more creators, and more content. Today, we have more startups and applications than ever before. Don't we need a platform for this? Historically speaking...
Look, guys, building the next Calendly is not hard anymore. Commercialization is. And it is getting harder. Speed is everything. We are in an era where everyone can build something, so why is commercialization still not matching the pace of building? Everything should be in equilibrium, right?
Some of the best indie hackers we talked to said that if they had had some funding at the start, that would have greatly helped them. So let's fucking do it.
Boys, I have been through the whole damn rough journey by myself, and I don't wish any of you to go through what I have been through. I did some numerical analysis, but more importantly, I want to help. I want to help indie hackers because I want to help the younger version of myself. This is a regret for me.
I am rooting for indie hackers because some of the smartest people I know all started from indie hacking. They are not the CEO. They are freaking marketers, engineers, product managers, and sometimes troublemakers - all at once! And I was one of you, and I still am.
Fuck getting a full-time job so you can indie hack on the side. If you have traction, let us help. If you are building something cool with traction, comment below.
I am not even writing this post for traction. I will keep doing this even if nobody cares. Fuck it.

@levelsio@levelsio

@ThinksDylan @geertjansloos @ycombinator @payrequest_io Selling to other indie makers gets kinda incestuous and there's a way bigger market outside of it

Vlad Shapovalov@shapovalim·1d

the hard part is not getting ai to answer. it is making the answer land where the work actually happens.

Vlad Shapovalov@shapovalim·1d

@BajajManav yes. sent is a system event. delivered is the customer reality. that gap is where ops starts telling the truth.

Manav Bajaj@BajajManav·1d

@shapovalim Yeah. The trap is treating the API "sent" as the win when it only means "accepted for delivery." The number that actually matters is delivered receipts vs sent, and almost nobody watches that gap. The day those two diverge, customers go quiet and the dashboard still shows green.
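
A small sketch of watching the gap between "sent" and "delivered", assuming the messaging provider reports both counts for a given period. The function names and the 5% alert threshold are hypothetical choices, not figures from the thread.

```python
def delivery_gap(sent: int, delivered: int) -> float:
    """Share of accepted messages that never produced a delivery receipt."""
    if sent == 0:
        return 0.0
    return (sent - delivered) / sent


def should_alert(sent: int, delivered: int, max_gap: float = 0.05) -> bool:
    """True when the gap exceeds the tolerated share (assumed default: 5%)."""
    return delivery_gap(sent, delivered) > max_gap
```

For example, 200 messages accepted with 150 delivery receipts gives a gap of 0.25, which would trip the alert even though every API call returned "sent".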

Manav Bajaj@BajajManav·3d

You message a past customer on WhatsApp to check in. You assume it went out. On WhatsApp Business, that message can get blocked before it ever reaches them. Here is the rule almost nobody explains, and the fix you can set up tonight.
WhatsApp Business runs on a 24-hour window. When a customer messages you first, a 24-hour window opens. Inside it, you can reply with any normal free text. The moment that window closes, the rules flip. You can no longer send a plain message out of the blue.
To reach someone outside the window, you have to send a pre-approved template. That is a message you write in advance and submit to Meta for review. Only approved templates can go out cold. This is not a setting you can switch off. It is baked into the platform.
Twilio, one of the official providers businesses use to send on WhatsApp, even has a named error for it: 63016, fired when you send free text after the window has closed. The message does not deliver. It bounces back to your system as a failure. So the "I followed up" you felt and the "nothing arrived" they felt are both true at once.
The fix is to stop sending cold follow-ups as plain text and start sending them as templates. Tonight, do one thing: write your three most common follow-up messages (the check-in, the reminder, the special offer) and submit them as templates through your WhatsApp provider, whether that is the Meta Cloud API directly, Twilio, or 360dialog. Once approved, those are the only follow-ups that actually reach a customer who has gone quiet.
If you run customer follow-up on WhatsApp, reply and tell me which provider you are on. I will point you to where the template setup lives for it.
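
The 24-hour rule described above can be sketched as a check made before sending. This assumes the timestamp of the customer's last inbound message is stored. The function name and return values are illustrative, and treating exactly 24 hours as already closed is an assumption rather than documented platform behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)


def choose_message_kind(last_inbound_at: Optional[datetime], now: datetime) -> str:
    """Return "free_text" inside the 24-hour window, otherwise "template"."""
    if last_inbound_at is not None and now - last_inbound_at < SERVICE_WINDOW:
        return "free_text"
    # No inbound message on record, or the window has closed:
    # only a pre-approved template can go out.
    return "template"


now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
print(choose_message_kind(now - timedelta(hours=3), now))  # prints free_text
print(choose_message_kind(now - timedelta(days=5), now))   # prints template
```

Routing on this check before calling the provider avoids finding out from a failed send that the window had already closed.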

Vlad Shapovalov@shapovalim·1d

what is one annoying business task ai is actually helping you remove right now? not the shiny demo stuff. the real little pain that used to eat time every week.

Vlad Shapovalov@shapovalim·1d

small businesses do not need ai to feel futuristic. they need it to make yesterday easier to follow up on.

Vlad Shapovalov@shapovalim·1d

@GroverLovesh yes. the record is what lets the next person trust what happened. drafting is the easy part. the closed loop is where the work gets real.

Lovesh Grover@GroverLovesh·1d

@shapovalim Exactly. Drafting around the work is most of the category. The clean record is the part nobody sells, proof the action happened and why. Closing the loop is the product. The draft folder is just the demo.

Vlad Shapovalov@shapovalim·1d

@BlueRidgeHVAC comfort is one of those things people only notice when it disappears. good service keeps it boring in the best way.

Blue Ridge HVAC@BlueRidgeHVAC·1d

You + your air conditioner = best friends all summer long 🤝❄️#BestFriendsDay #BlueRidgeHVAC

Vlad Shapovalov@shapovalim·1d

@TCSAIR this is the part homeowners remember. not the equipment name, just whether the house stays comfortable when the week gets busy.

TCS Heating & Air@TCSAIR·1d

Looking for an efficient way to cool your home? Mini-split AC systems are a fantastic option! They provide powerful cooling without the need for ductwork, making them perfect for both new installations and retrofits.
