Vlad Shapovalov
@shapovalim
798 posts

principal product architect building ai operated systems for service businesses. bay area. writing about ai, operations, and founder life.
Bay Area · Joined July 2020
129 Following · 34 Followers

@ChloeXChaCha @devayushrout yes. the link between observation and change is the part i care about most. without it, a clean diff can still be a lucky guess.

@devayushrout @shapovalim Right, evidence is the cleaner signal. A tiny diff with no reasoning behind it is just lucky, not clean. We try to score whether a change is justified by what the agent observed before making it. Diff size on its own can be misleading without that link.

Picking up @shapovalim's point:
Quick thought from shipping AI RaidMeter:
The biggest waste in AI coding isn't always "too many tokens." Sometimes it's the agent using the wrong working style.
Coding agents default to an engineer prior: read the whole codebase first, understand everything, then act. That sounds safe. In a Chat-as-IDE workflow where a non-engineer is the one running commands, it becomes painfully slow.
So we wrote a collaboration skill for the agent:
· grep on demand, don't read the whole world
· patch only around verified anchors
· one command block = one executable action
· short, operator-friendly instructions
· remember decisions, not line numbers
· treat context as an operational cost
The bottleneck wasn't code generation. It was teaching the agent not to over-contextualize.
Token count tells you the bill. Workflow shape tells you why.
#AIAgents #Observability #DevTools #LLMOps
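
One way to read "patch only around verified anchors" is sketched below in Python. This is an illustration, not the RaidMeter skill itself: the function name `patch_at_anchor` and the "exactly one match" rule are assumptions about what "verified" could mean.

```python
def patch_at_anchor(source: str, anchor: str, replacement: str) -> str:
    """Replace `anchor` with `replacement` only if the anchor is unambiguous.

    Hypothetical rule: an anchor counts as verified when it occurs exactly
    once in the file, so the patch cannot land in the wrong place.
    """
    if not anchor:
        raise ValueError("anchor must be a non-empty string")
    occurrences = source.count(anchor)
    if occurrences != 1:
        raise ValueError(f"anchor matched {occurrences} times; expected exactly 1")
    return source.replace(anchor, replacement, 1)
```

Refusing to patch on zero or multiple matches forces a fresh grep instead of a guess, which is the point of remembering decisions rather than line numbers.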
Vlad Shapovalov @shapovalim
@ChloeXChaCha yes. the loop is the signal. one mistake can be noise, but the same move repeated with no new state usually means the agent lost the plot.
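
A rough Python sketch of "the same move repeated with no new state". The `Step` record, its `action` and `state_hash` fields, and the default of three repeats are all assumptions for illustration, not how any tool in this thread works.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, Optional


@dataclass(frozen=True)
class Step:
    action: str           # hypothetical: normalized tool call, e.g. "run tests"
    state_hash: Hashable  # hypothetical: fingerprint of the state after the step


def first_stuck_step(steps: Iterable[Step], max_repeats: int = 3) -> Optional[int]:
    """Return the index at which an identical (action, state) pair has occurred
    `max_repeats` times in a row, or None if that never happens."""
    run_length = 0
    previous: Optional[Step] = None
    for index, step in enumerate(steps):
        # Frozen dataclasses compare by field values, so equality here means
        # "same move, same resulting state".
        run_length = run_length + 1 if step == previous else 1
        if run_length >= max_repeats:
            return index
        previous = step
    return None
```

A single repeat stays below the threshold and is treated as noise. Only a run with an unchanged state fingerprint trips the detector, early enough to switch strategy.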

@Amirhessabi mostly ai workflows for real operations right now. the part i keep coming back to is memory, handoffs, and making work visible enough that people stop reconstructing the same story every day.

@GroverLovesh yes. self reported memory is not enough. the useful record is something a tired operator can inspect when the customer asks what actually happened.

@shapovalim And it only earns trust if it is adversarial. A log the agent writes about itself is marketing. The receipt that counts is the one a skeptic could use to catch it lying. Auditability is the real primitive, output never was.

Tested 7 "AI customer support" tools over 90 days at three companies. Six were the same product with different logos. One actually closed tickets autonomously.
Here is what separates them.
Six tools shipped this exact loop:
1. Retrieve from knowledge base
2. Draft a reply with an LLM
3. Human reviewer clicks Approve
Marketed as "AI agent." Functionally a search engine with a draft folder. If a human still has to click Approve, no work was eliminated. It was just relabeled.
The seventh tool did three things differently.
First, it stored a confidence_threshold per intent type.
Second, it logged last_action_reasoning on every decision.
Third, it had a real escalation path with state, not just "ping a slack channel."
A real ticket came in: "I cancelled my subscription but I am still being charged."
Six tools surfaced cancellation FAQs and drafted "We see you cancelled, charges should stop within 7 days."
The seventh checked Stripe, saw an active sub, cancelled it, and replied with the cancellation receipt.
Same model, different schema.
The wedge is not the model. It is the schema underneath. Most CRMs and helpdesks shipped before 2023 do not have columns for confidence_threshold, intent_vector, or action_audit. You can add ai_draft. You cannot retrofit confidence-based escalation without breaking 10 years of integrations.
If you are evaluating a CS AI agent tomorrow, ask three questions:
1. Does it execute actions or just draft replies?
2. Does it log a confidence score per decision?
3. Can I tune the threshold without filing a support ticket?
If any answer is no, it is autocomplete in a trenchcoat.
Six of seven tools were not agents. The seventh worked because it shipped a new schema, not a better prompt.
Daily notes on building production AI agents. Building Delyt, where the schema rewrite is the product.
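
The first two properties of the seventh tool can be sketched roughly as follows. This is a minimal Python sketch, not that tool's actual schema: only `confidence_threshold` and `last_action_reasoning` are names from the post. The intent names, threshold values, `Decision` record, and `decide` function are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Hypothetical per-intent thresholds; an operator would tune these.
CONFIDENCE_THRESHOLD = {"billing_dispute": 0.90, "password_reset": 0.70}
DEFAULT_THRESHOLD = 0.95  # unknown intents escalate unless confidence is very high


@dataclass
class Decision:
    ticket_id: str
    intent: str
    confidence: float
    outcome: str                # "execute" or "escalate"
    last_action_reasoning: str  # why the agent chose this outcome
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def decide(ticket_id: str, intent: str, confidence: float,
           reasoning: str, audit_log: List[Decision]) -> Decision:
    """Act only above the intent's threshold, and log every decision either way."""
    threshold = CONFIDENCE_THRESHOLD.get(intent, DEFAULT_THRESHOLD)
    outcome = "execute" if confidence >= threshold else "escalate"
    decision = Decision(ticket_id, intent, confidence, outcome, reasoning)
    audit_log.append(decision)  # escalations are recorded with state, not just pinged
    return decision
```

Because the thresholds live in data rather than in a prompt, an operator could change when the agent acts without touching the model, which is what the third evaluation question is probing.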

@for_ledger yes. the useful signal is what still hurts after the obvious repetitive work is gone. that remaining friction is usually the roadmap.

@shapovalim Exactly, that is usually the moment when the real operational picture shows up. Once the backlog settles, you can see which tasks are still eating time and where automation will matter most.

AI is getting good at the work nobody wants to do twice: sorting books, keeping things moving, and cutting down on manual follow up.
That is the real productivity win for small teams.
forquickbooks.com

@Amirhessabi that is the right source of truth. docs can describe intent, but git plus file state tells you what actually happened since the last story was written.

@shapovalim Exactly. The fix that helped most: every nightly run reads the code's actual mtime + git log, not the doc's last self-write. Self-referential docs are how stale becomes invisible.
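
A minimal Python sketch of that nightly check, under stated assumptions: the function names are hypothetical, and comparing against the time the doc was last written is one possible reading of the post. The only git call used is `git log -1 --format=%ct -- <path>`, which prints the committer timestamp of the last commit touching a path.

```python
import os
import subprocess
from typing import Optional


def last_commit_time(path: str, repo_dir: str = ".") -> Optional[int]:
    """Unix time of the last commit touching `path`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%ct", "--", path],
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # git is not installed, or repo_dir does not exist
    out = result.stdout.strip()
    return int(out) if result.returncode == 0 and out else None


def doc_is_stale(doc_written_at: float, code_mtime: float,
                 code_commit_time: Optional[int]) -> bool:
    """The doc is stale if the code changed, on disk or in git, after it was written."""
    latest_code_change = max(code_mtime, code_commit_time or 0)
    return latest_code_change > doc_written_at


def check_doc(doc_path: str, code_path: str, repo_dir: str = ".") -> bool:
    """Gather real timestamps from the filesystem and git, then compare."""
    return doc_is_stale(
        os.path.getmtime(os.path.join(repo_dir, doc_path)),
        os.path.getmtime(os.path.join(repo_dir, code_path)),
        last_commit_time(code_path, repo_dir),
    )
```

Nothing here reads a date from inside the doc, so a doc cannot vouch for its own freshness.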

@ChloeXChaCha @devayushrout diff size gets interesting when it is paired with intent. tiny can mean clean reasoning, but the stronger signal is whether each change has evidence behind it.

@devayushrout @shapovalim Diff size as a signal is one we haven't been scoring yet, but you're right, it pairs well with pass/fail. A merged PR with a tiny diff usually means clean reasoning, a rejected PR with a massive diff usually means the agent kept piling on. Adding that to the mix.

@ChloeXChaCha yes. the useful move is catching it early enough that the agent can change strategy instead of just spending more tokens in the same loop.

@shapovalim That's a clean way to put it. One mistake is noise, a loop with no new state is the pattern worth catching.

We just shipped AI RaidMeter for the Google Cloud Rapid Agent Hackathon. 🧵
Most AI dashboards measure one thing: how many tokens did you burn. It's the easiest metric to collect — and the most misleading. It rewards looking busy.
The developer who burns 1M tokens thrashing through a bug outranks the one who solves it cleanly at a fifth of the cost. That's a broken incentive.
So we built a coach that judges the workflow, not the token count.
AI RaidMeter reads your real Arize Phoenix traces, detects seven AI-coding waste anti-patterns, and judges each session with a clinical, multi-criteria method — outcome, difficulty, baseline, justification credits.
A signal is never a verdict. One symptom is never a diagnosis.
The result on one developer, same class of cloud-deploy bug, before vs after coaching:
→ 1,000K tokens → 420K (−58%)
→ 95 min → 38 min (−60%)
→ 5 anti-patterns → 0
→ PR rejected → merged
And it's not just a post-mortem. Before you even start, it predicts where the task will go wrong and sets guardrails up front — like a pre-flight checklist.
Under the hood, three pillars, all wired to live data:
🧠 Gemini for reasoning
🏗️ Google Cloud Agent Builder (ADK) for orchestration
🔭 Arize Phoenix MCP for the truth
The agent connects through the Phoenix MCP server, pulls real traces, and diagnoses them on the spot. Real tokens, real spans, real reasoning. Nothing mocked.
The hardest part wasn't the integration — it was resisting single-signal verdicts. We built a justification layer so a legitimately hard production incident isn't punished like idle thrashing.
And it never ranks people against each other. Only against their own past. A coach, not a surveillance tool.
From tokenmaxxing to value-aware AI governance.
🎬 Demo: youtu.be/i31tddmGfqg
🔗 Live: …eter-733974887555.us-central1.run.app
⭐ Code (MIT, open source): github.com/rainingsnow091…
#GoogleCloud #Gemini #ArizePhoenix #AIagents #MCP


@TysonLester @McKinsey exactly. that default changes the speed of everything. a strange idea can become a weekend build, then a company, before the room talks itself out of it.

@shapovalim @McKinsey 💯 It’s not just talent flow, it’s the expectation that weird technical ideas should become companies. Everywhere else treats them as hobbies. The Bay treats them as the next logical step.

San Francisco is the undisputed capital of the AI revolution. What will it actually take to keep the Bay Area on top? 4 key insights below; full @McKinsey article: mck.co/4a4iitO.
1⃣🦄 The Bay Area Has More Unicorns Than Anywhere on Earth: The region is home to 321 unicorn companies, more than any other startup ecosystem in the world, and has continuously reinvented itself across successive waves of tech change including semiconductors, software, the internet, and now AI.
2⃣💼 SF Captured 30% of All Global VC Funding in 2025: San Francisco based companies received 30% of global venture capital funding in 2025, and 85% of the entire Bay Area's total VC haul. One city. 30% of the world's startup money.
3⃣🌍 The Bay Area Is Basically Its Own Country: With ~$1.2 trillion in annual GDP output, if the Bay Area were a nation it would rank as the 18th largest economy in the world, ahead of several G20 members.
4⃣🧲 Staying on Top Means Staying a Global Magnet for Talent: McKinsey warns that sustaining the Bay Area's edge will require continued investment in talent pipelines, research partnerships, and workforce development, alongside deliberate efforts to reinforce San Francisco's position as a global magnet for top talent.
#ArtificialIntelligence #VentureCapital
McKinsey & Company @McKinsey
The Bay Area is home to 300+ unicorns and generates roughly $1.2T in annual GDP. With San Francisco home to 25 of the world’s top 50 AI companies, the region’s challenge now is turning this moment of AI leadership into lasting advantage. mck.co/4a4iitO

@ThinksDylan @levelsio the outside market is usually where the real learning starts. indie circles are great for sharp feedback, but customers who just want the pain gone teach a different kind of truth.

I have been thinking about @levelsio's words for nearly six months while building Pond. On Pond, indie hackers can raise their first round, build enough traction to either keep indie hacking or raise from VCs (who may reach out to you if your traction is good), and have a community of 10,000 help solve their problems.
I was a little anxious after his reply because I highly respect @levelsio and his words - he is the OG of indie hacking, and I love his work so much. He is incredibly talented.
Let me explain why, after his reply and some careful thinking, I am still determined to support indie hackers. I think we are both right. And thank you @levelsio for being so kind to offer advice.
If you search for indie hacker fundraising on Google, it shows you nothing:
The first result says, "Startup fundraising is a numbers game. Stop taking it personally."
The second result gives you nothing useful.
The third result tells you to get a job so you can indie hack.
I feel it in my guts. I have been through the whole damn journey by myself.
I held a day job, worked on something on the side, and had to fly around the world to raise money and rizz up strangers.
Back in 2022, I was building a founder community on the side in London while working at another startup in SF full-time. 12+ hours of work per day. Seriously. I built one of the biggest builder communities in London, and I always remember spending weeks filling out a bunch of forms to get a $500 sponsorship for an event. So many people were building something at that time. That was before AI coding, by the way.
I started our company in London. I always remember getting our first $10,000 angel check by flying to Hong Kong and Paris. We had 2,000 users at that time already. I gave 100 pitches, iterating based on the feedback of everyone I pitched to.
Wouldn't it just be nice if I had a place online where I could raise a little money from home, raise again and again, and once I had more traction, raise from VCs directly? Or even have VCs come to me? Or keep bootstrapping? I kind of feel that my life would have been so different if I had had this at that time.
Because of AI, software startups are booming - exponentially, in a way.
Micro-SaaS is twice as big as AI coding and is growing rapidly. Every supply surge creates a platform. More e-commerce vendors led to Amazon. More photos and videos led to Instagram and TikTok. Those platforms created even more vendors, more creators, and more content. Today, we have more startups and applications than ever before. Don't we need a platform for this? Historically speaking...
Look, guys, building the next Calendly is not hard anymore. Commercialization is. And it is getting harder. Speed is everything. We are in an era where everyone can build something, so why is commercialization still not matching the pace of building? Everything should be in equilibrium, right? Some of the best indie hackers we talked to said that if they had had some funding at the start, that would have greatly helped them.
So let's fucking do it.
Boys, I have been through the whole damn rough journey by myself, and I don't wish any of you to go through what I have been through.
I did some numerical analysis, but more importantly, I want to help. I want to help indie hackers because I want to help the younger version of myself. This is a regret for me.
I am rooting for indie hackers because some of the smartest people I know all started from indie hacking. They are not the CEO. They are freaking marketers, engineers, product managers, and sometimes troublemakers - all at once!
And I was one of you, and I still am.
Fuck getting a full-time job so you can indie hack on the side. If you have traction, let us help.
If you are building something cool with traction, comment below. I am not even writing this post for traction. I will keep doing this even if nobody cares.
Fuck it.


@levelsio@levelsio
@ThinksDylan @geertjansloos @ycombinator @payrequest_io Selling to other indie makers gets kinda incestuous and there's a way bigger market outside of it

@BajajManav yes. sent is a system event. delivered is the customer reality. that gap is where ops starts telling the truth.

@shapovalim Yeah. The trap is treating the API "sent" as the win when it only means "accepted for delivery." The number that actually matters is delivered receipts vs sent, and almost nobody watches that gap. The day those two diverge, customers go quiet and the dashboard still shows green.
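
The gap between "sent" and "delivered" can be watched with a few lines of Python. A minimal sketch, assuming each message record carries a `status` field with values such as `sent`, `delivered`, `read`, `failed`, or `undelivered`. Those names are modeled on typical provider status callbacks and may differ on your provider.

```python
from collections import Counter
from typing import Iterable, Mapping


def delivery_gap(messages: Iterable[Mapping[str, str]]) -> dict:
    """Summarize how many accepted messages actually reached the customer."""
    counts = Counter(message["status"] for message in messages)
    # A read message was necessarily delivered, so count both as delivered.
    delivered = counts["delivered"] + counts["read"]
    # Everything the API accepted, whether or not it arrived.
    accepted = delivered + counts["sent"] + counts["failed"] + counts["undelivered"]
    rate = delivered / accepted if accepted else 0.0
    return {"accepted": accepted, "delivered": delivered, "delivery_rate": rate}
```

Alerting on a falling `delivery_rate` rather than on accepted volume is what keeps a green dashboard from hiding quiet customers.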

You message a past customer on WhatsApp to check in. You assume it went out. On WhatsApp Business, that message can get blocked before it ever reaches them.
Here is the rule almost nobody explains, and the fix you can set up tonight.
WhatsApp Business runs on a 24-hour window.
When a customer messages you first, a 24-hour window opens. Inside it, you can reply with any normal free text.
The moment that window closes, the rules flip. You can no longer send a plain message out of the blue.
To reach someone outside the window, you have to send a pre-approved template. That is a message you write in advance and submit to Meta for review. Only approved templates can go out cold.
This is not a setting you can switch off. It is baked into the platform.
Twilio, one of the official providers businesses use to send on WhatsApp, even has a named error for it: 63016, fired when you send free text after the window has closed. The message does not deliver. It bounces back to your system as a failure.
So the "I followed up" you felt and the "nothing arrived" they felt are both true at once.
The fix is to stop sending cold follow-ups as plain text and start sending them as templates.
Tonight, do one thing: write your three most common follow-up messages (the check-in, the reminder, the special offer) and submit them as templates through your WhatsApp provider, whether that is the Meta Cloud API directly, Twilio, or 360dialog.
Once approved, those are the only follow-ups that actually reach a customer who has gone quiet.
If you run customer follow-up on WhatsApp, reply and tell me which provider you are on. I will point you to where the template setup lives for it.
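
The window rule above can be checked before anything is sent. A minimal Python sketch, assuming you store the timestamp of each customer's last inbound message. `choose_message_kind` and `last_inbound_at` are hypothetical names, and the function only decides which kind of message is allowed; it does not call any provider API.

```python
from datetime import datetime, timedelta
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)


def choose_message_kind(last_inbound_at: Optional[datetime], now: datetime) -> str:
    """Return "free_text" inside the 24-hour window, otherwise "template"."""
    if last_inbound_at is None:
        # The customer has never written in, so no window is open.
        return "template"
    if now - last_inbound_at < SERVICE_WINDOW:
        return "free_text"
    return "template"
```

Routing every outbound follow-up through a check like this means a closed window leads to a template send instead of a bounced free-text message.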


@GroverLovesh yes. the record is what lets the next person trust what happened. drafting is the easy part. the closed loop is where the work gets real.

@shapovalim Exactly. Drafting around the work is most of the category. The clean record is the part nobody sells, proof the action happened and why. Closing the loop is the product. The draft folder is just the demo.

@BlueRidgeHVAC comfort is one of those things people only notice when it disappears. good service keeps it boring in the best way.


@TCSAIR this is the part homeowners remember. not the equipment name, just whether the house stays comfortable when the week gets busy.