Shane Chang

157 posts

@ShaneZChang

Core R&D @ Lessie AI. Building AI agents for production users who would notice at 3am. Practitioner notes, not pundit takes.

Singapore · Joined December 2023
30 Following · 10 Followers
Pinned Tweet
Shane Chang@ShaneZChang·
I build AI agents at Lessie AI — not demos, production systems for real users. This account: what actually works when you ship agents daily. Context engineering, harness patterns, self-improving systems, and the failures nobody tweets about.
0
0
0
55
Shane Chang@ShaneZChang·
vibe coding vs spec driven is a false binary imo. every production agent i have shipped started as vibes and ended as spec. the spec is what you learn by failing. skipping vibes to write the perfect spec upfront is how you build the wrong thing faster
0
0
0
4
Shane Chang@ShaneZChang·
@svpino weird part: we already do this informally. when my agent hits an ambiguous case it escalates to me or a teammate. humwork is formalizing the market. biggest question is latency. 30 seconds sounds fast until you realize most agents wait minutes anyway
0
0
0
3
Santiago@svpino·
Now, this is a bold idea! When your agent gets stuck, it hires a HUMAN to complete a task. So, imagine a marketplace where people sign up to complete tasks, and it's an AGENT who hires them. Can't wait to see where this goes.
Y Combinator@ycombinator

AI agents will pay you to chat with them. When AI agents hit a wall, Humwork's (@humworkai) MCP server connects them to a verified domain expert in 30 seconds. Their experts include senior engineers, marketers, designers, and more. Congrats on the launch, @theyashgoenka and @OneRohanDatta! ycombinator.com/launches/PxH-h…
63
44
542
161.8K
Shane Chang@ShaneZChang·
@theagenticmind accurate lol. add a 4th: 30 percent of your brain thinking about idempotency. same action, called 5 times by a retry, better not blow up 5 different ways. the quiet complexity nobody warns you about
0
0
0
3
Agentic Mind@theagenticmind·
cost of running agents at scale: 40% api calls, 40% retries, 20% logging everything because you'll need it later.
1
0
0
8
Shane Chang@ShaneZChang·
@aidenybai oh this is smart. the pattern i keep hitting manually is reading failed tool calls and adding never do X to claude md. automating that loop is exactly the meta right now. gonna try it
0
0
0
65
Aiden Bai@aidenybai·
introducing Claude Doctor 1. reads your ~/.claude to find where claude keeps messing up 2. writes rules for your CLAUDE.md so it stops. npx claude-doctor
60
79
1.7K
216.9K
Shane Chang@ShaneZChang·
cloudflare dropping an agents sdk while anthropic is down for 2 hours is a funny coincidence. but also the point. nobody wants agents to be a single-vendor bet. the multi-provider agent runtime is quietly becoming table stakes
0
0
0
19
Shane Chang@ShaneZChang·
@amasad interesting framing but theres a gap. compute spent on hardening is not the same as hardness. a million tokens of a mediocre auditor is worse than one good one. maybe weight by finding density too. otherwise people will just burn compute to game the badge
0
0
0
1
Amjad Masad@amasad·
If finding security flaws is fully automated with frontier models à la Mythos, then GitHub should have a metric, like stars, showing how much compute is spent securing/hardening an open-source package. Example: 📦 linus/linux ⭐️ 200k 🦾 $239M Only way OSS can be trusted.
37
18
390
23.7K
Shane Chang@ShaneZChang·
@petergyang pace of shipping is my bet. i build agents on claude daily and the api has gotten noticeably flakier since sonnet 4.6 dropped. feels like they traded stability for velocity. worth it probably. painful rn
0
0
0
43
Peter Yang@petergyang·
Feels like there's an outage for Claude every other day - I wonder if this is related to the pace of shipping, just scaling compute, or something else?
63
1
134
15.4K
Shane Chang@ShaneZChang·
@simonw this. what theyre really saying is our threat model changed and we cant afford a public attack surface anymore. which is honest but also concedes the model itself is the vulnerability, not the code. thats a weird thing to ship as a pitch
0
0
0
6
Simon Willison@simonw·
I'm certain this isn't the message they intended to present, but this comes across to me as a company saying "we no longer trust in our own ability to keep your data secure"
Bailey Pumfleet@pumfleet

Open source is dead. That’s not a statement we ever thought we’d make. @calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up. AI has fundamentally altered the security landscape. What once required time, expertise, and intent can now be automated at scale. Code is no longer just read. It is scanned, mapped, and exploited. Near zero cost. In that world, transparency becomes exposure. Especially at scale. After a lot of deliberation, we’ve made the decision to close the core @calcom codebase. This is not a rejection of what open source gave us. It’s a response to what risks AI is making possible. We’re still supporting builders, releasing the core code under a new MIT-licensed open source project called cal. diy for hobbyists and tinkerers, but our priority now is simple: Protecting our customers and community at all costs. This may not be the most popular call. But we believe many companies will come to the same conclusion. My full explanation below ↓
27
20
374
34.3K
Shane Chang@ShaneZChang·
claude api went down for 2 hours tonight and my agent pipeline just. stopped. every fallback i wrote assumed slow or wrong, not silent. note to self: a health check that only pings the api is not a health check. observe the actual agent heartbeat or youre blind
0
0
0
144
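The "observe the actual agent heartbeat" idea above can be sketched as a progress stamp the agent updates after every completed step: an API ping can succeed while the agent is silently stalled, but a stale heartbeat cannot. Function names and the threshold are illustrative assumptions:

```python
import time

HEARTBEAT_MAX_AGE = 60  # seconds; tune to the agent's normal step cadence

_last_beat = None  # monotonic timestamp of the last completed agent step

def beat():
    """Call after every completed agent step, not just after API calls."""
    global _last_beat
    _last_beat = time.monotonic()

def healthy():
    """True only if the agent itself made progress recently.
    Pinging the model API can return 200 while the loop is stuck;
    this checks the agent's own forward progress instead."""
    return _last_beat is not None and (time.monotonic() - _last_beat) < HEARTBEAT_MAX_AGE

stale_before_any_step = healthy()  # no step completed yet -> unhealthy
beat()                             # agent finished a step
ok_after_beat = healthy()
```

The design choice is that "no heartbeat yet" counts as unhealthy, so a pipeline that never starts fails the check too.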
Shane Chang@ShaneZChang·
@PawelHuryn @Anthropic directionally yes but the one line claim is wild. we ship claude-agent-sdk in prod. between dev and shipped we added user scoping, tool sandboxing, cost guards, and a dead letter queue for stuck loops. the .claude folder is maybe 30 percent of it
1
0
1
57
Paweł Huryn@PawelHuryn·
The .claude/ folder you've been editing every day for months is the deployable unit. The gap between "I use Claude Code" and "I ship an AI agent" is one line: npm install @anthropic-ai/claude-agent-sdk OTEL exports full traces to Langfuse or any OTLP backend.
Paweł Huryn@PawelHuryn

x.com/i/article/2044…
5
3
51
7.2K
Shane Chang@ShaneZChang·
@asidorenko_ yeah this is the real one. the skills i fight to keep sharp are the ones agents still cant do reliably. cross tool debugging when state is split between prompt llm and infra. that one atrophying hurts fast
0
0
0
63
Alex Sidorenko@asidorenko_·
Skill atrophy with coding agents
17
17
434
68.5K
Shane Chang@ShaneZChang·
@rauchg this. model is commodity now. whoever owns the deploy observe iterate loop captures value, not whoever has the cleverest model. been our experience at lessie last 6 months
0
0
0
4
Shane Chang@ShaneZChang·
@hwchase17 yeah this is the part that bites at scale. we had a memory write happen in the wrong scope once, one users context bled into another. nightmare to recover from. strict tenant boundaries from day 1 saves so much pain
0
0
0
5
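The "strict tenant boundaries from day 1" point above amounts to routing every memory read and write through a key namespaced by user id, so a write physically cannot land in another user's scope. A minimal sketch; the class and method names are illustrative, not from any specific SDK:

```python
class ScopedMemory:
    """Memory store where every key is namespaced by user id,
    so one user's context cannot bleed into another's."""

    def __init__(self):
        self._store = {}

    def _key(self, user_id, key):
        # Fail loud: an unscoped access is a bug, not a default.
        if not user_id:
            raise ValueError("user_id is required for every memory access")
        return (user_id, key)

    def write(self, user_id, key, value):
        self._store[self._key(user_id, key)] = value

    def read(self, user_id, key, default=None):
        return self._store.get(self._key(user_id, key), default)

mem = ScopedMemory()
mem.write("alice", "prefs", "dark mode")
mem.write("bob", "prefs", "light mode")
```

Because the namespace lives inside the key function rather than at call sites, there is no code path that can forget the tenant boundary.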
Harrison Chase@hwchase17·
User scoped memory is one of those things that doesn’t matter if you’re building a toy agent for yourself, but when you release at scale you gotta get it right Deepagents deploy helps you do that, easily
Sydney Runkle@sydneyrunkle

`deepagents deploy` now supports user scoped memory! add a user/ directory in your project so each user gets their own writable AGENTS.md, seeded on first deploy and persisted across conversations. your agent can then learn and remember user preferences across conversations!
8
11
67
8.3K
Shane Chang@ShaneZChang·
@unclebobmartin that holds for the language ladder. but for agents the descend path is broken in a different way. i can read the diff an agent produced. cant read its reasoning. inspecting model output isnt the same as reading assembly
0
0
0
9
Uncle Bob Martin@unclebobmartin·
You only lost it if you refused to descend to the next level down (or if the language did not allow you to descend to the next level down). With AI, we can always descend to the next level down if we need to. Or, rather, the AI can descend to whatever level it needs in order to match the semantics that you specify. You might even tell it things like "use a deterministic memory allocator."
1
0
0
71
Uncle Bob Martin@unclebobmartin·
AIs are just another step up the semantic expression ladder. We initially expressed our semantics in binary, then assembler, then Fortran, then C, then Java, then Python, etc. AI is just the next step up that same old ladder. And when you take that step, nothing else changes. You are still expressing behavioral semantics. You still need to express structural semantics. All the old principles still apply. You still have to be concerned about design and architecture. And even though the syntax allows informal statement, you cannot abandon formalism. When you express behavior you need a formal way to enforce the behavior you want. I use Gherkin for this. It seems to work pretty well. Consider that Gherkin is written in triplets of Given/When/Then. Each of those GWT triplets is a transition of a state machine. A full suite of Gherkin triplets is a formal description of the finite state machine that represents the behavior of the application. Other formalisms that matter are things like module dependency graphs, testing constraints, complexity constraints, and many others. This step up the semantic expression ladder provides you with an enormous amount of options. But you'd better choose those options wisely!
47
61
536
24.7K
Shane Chang@ShaneZChang·
@mitsuhiko agreed. been running agents in tmux for like 4 months and still cant figure out a daily-driver setup for myself. the constraints that make it perfect for a process are exactly the ones that make it pain for a person
0
0
0
56
Armin Ronacher ⇌@mitsuhiko·
I think tmux is great software for an agent. But how people can actually work day to day in tmux is beyond me. It's such a horrible UX and hack.
154
12
511
94.6K
Shane Chang@ShaneZChang·
@asidorenko_ yeah you nailed the deeper version. recall is just a symptom, the real loss is never building the mental model in the first place. i started forcing myself to read every agent diff and ask could i have written this from scratch. catches the worst auto-pilot moments
0
0
1
68
Shane Chang@ShaneZChang·
@svpino 100%. ran the same eval suite on our v1 vs v3 harness, same model. v3 scored 40% higher just from better tool defs and context pruning. the harness is where most of the value lives now
0
0
0
2
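One of the harness levers named above, context pruning, can be sketched as a budgeted walk from newest to oldest turn: keep the system message plus the most recent turns that fit a token budget. Everything here is an illustrative assumption, including the rough chars/4 token estimate, not the author's actual harness:

```python
def prune_context(messages, max_tokens,
                  count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system message plus the newest turns that fit the budget.
    count_tokens is a crude chars/4 estimate; swap in a real tokenizer."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for m in reversed(rest):      # walk newest -> oldest
        cost = count_tokens(m)
        if cost > budget:
            break                 # out of budget: drop this turn and older
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

msgs = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "old question " * 50},
    {"role": "user", "content": "recent question"},
]
pruned = prune_context(msgs, max_tokens=20)
```

Here the oversized old turn is dropped while the system prompt and the latest turn survive, which is the basic shape of recency-based pruning.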
Santiago@svpino·
Obviously, models are a big deal, but coding harnesses play a huge role in making these models look good. I suspect that you can get the best frontier model out there, put it in a shitty harness, and the experience will be very disappointing. The reverse is also true: put a mediocre model in a strong harness, and it might match the experience that you get from the best agentic coding tools out there. So, yes, Opus 4.6 and GPT-5.3-Codex are amazing models, but the Claude Code and Codex harnesses do a lot of the lifting to make them work the way they are. Of course, these models might also have some specific training on their harnesses. This is also an advantage.
36
5
74
10.1K
Shane Chang@ShaneZChang·
@mikehostetler yeah stacking non-determinism is a trap we hit too. tried planner agent + executor agent, errors compound in weird ways. what actually worked for us: make the planner deterministic (rules/templates) and only let executor be fuzzy. one layer of fuzz not two
0
0
1
6
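The "one layer of fuzz" pattern above can be sketched as a rule-table planner feeding a single non-deterministic executor. The task types, steps, and stubbed executor are illustrative assumptions; in production only `execute` would call a model:

```python
# Deterministic planner: a fixed rule table maps task type to a step list.
# Same input always yields the same plan -- no model in the planning loop.
PLANS = {
    "bug_report": ["reproduce", "locate", "patch", "verify"],
    "feature":    ["spec", "implement", "test"],
}

def plan(task_type):
    # Return a copy so callers can't mutate the rule table;
    # unknown task types raise KeyError on purpose: fail loud, don't guess.
    return list(PLANS[task_type])

def execute(step):
    """The single fuzzy layer. In production this is the LLM/agent call;
    stubbed here so the sketch runs deterministically."""
    return f"done: {step}"

results = [execute(step) for step in plan("bug_report")]
```

Keeping non-determinism in exactly one layer means a bad run is always attributable to the executor, since the plan it received is reproducible.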