Shane Chang

157 posts

@ShaneZChang

Core R&D @ Lessie AI. Building AI agents for production users who would notice at 3am. Practitioner notes, not pundit takes.

Singapore · Joined December 2023
30 Following · 10 Followers
Pinned Tweet
Shane Chang@ShaneZChang·
I build AI agents at Lessie AI — not demos, production systems for real users. This account: what actually works when you ship agents daily. Context engineering, harness patterns, self-improving systems, and the failures nobody tweets about.
0
0
0
55
Shane Chang@ShaneZChang·
vibe coding vs spec driven is a false binary imo. every production agent i have shipped started as vibes and ended as spec. the spec is what you learn by failing. skipping vibes to write the perfect spec upfront is how you build the wrong thing faster
0
0
0
4
Shane Chang@ShaneZChang·
@svpino weird part: we already do this informally. when my agent hits an ambiguous case it escalates to me or a teammate. humwork is formalizing the market. biggest question is latency. 30 seconds sounds fast until you realize most agents wait minutes anyway
0
0
0
3
Santiago@svpino·
Now, this is a bold idea! When your agent gets stuck, it hires a HUMAN to complete a task. So, imagine a marketplace where people sign up to complete tasks, and it's an AGENT who hires them. Can't wait to see where this goes.
Y Combinator@ycombinator

AI agents will pay you to chat with them. When AI agents hit a wall, Humwork's (@humworkai) MCP server connects them to a verified domain expert in 30 seconds. Their experts include senior engineers, marketers, designers, and more. Congrats on the launch, @theyashgoenka and @OneRohanDatta! ycombinator.com/launches/PxH-h…
63
44
542
161.8K
Shane Chang@ShaneZChang·
@theagenticmind accurate lol. add a 4th: 30 percent of your brain thinking about idempotency. same action, called 5 times by a retry, better not blow up 5 different ways. the quiet complexity nobody warns you about
0
0
0
3
Agentic Mind@theagenticmind·
cost of running agents at scale: 40% api calls, 40% retries, 20% logging everything because you'll need it later.
1
0
0
8
Shane Chang@ShaneZChang·
@aidenybai oh this is smart. the pattern i keep hitting manually is reading failed tool calls and adding never do X to claude md. automating that loop is exactly the meta right now. gonna try it
0
0
0
65
Aiden Bai@aidenybai·
introducing Claude Doctor 1. reads your ~/.claude to find where claude keeps messing up 2. writes rules for your CLAUDE.md so it stops. npx claude-doctor
60
79
1.7K
216.9K
Shane Chang@ShaneZChang·
cloudflare dropping an agents sdk while anthropic is down for 2 hours is a funny coincidence. but also the point. nobody wants agents to be a single-vendor bet. the multi-provider agent runtime is quietly becoming table stakes
0
0
0
19
Shane Chang@ShaneZChang·
@amasad interesting framing but theres a gap. compute spent on hardening is not the same as hardness. a million tokens of a mediocre auditor is worse than one good one. maybe weight by finding density too. otherwise people will just burn compute to game the badge
0
0
0
1
Amjad Masad@amasad·
If finding security flaws is fully automated with frontier models à la Mythos, then GitHub should have a metric, like stars, showing how much compute is spent securing/hardening an open-source package. Example: 📦 linus/linux ⭐️ 200k 🦾 $239M Only way OSS can be trusted.
37
18
390
23.7K
Shane Chang@ShaneZChang·
@petergyang pace of shipping is my bet. i build agents on claude daily and the api has gotten noticeably flakier since sonnet 4.6 dropped. feels like they traded stability for velocity. worth it probably. painful rn
0
0
0
43
Peter Yang@petergyang·
Feels like there's an outage for Claude every other day - I wonder if this is related to the pace of shipping, just scaling compute, or something else?
63
1
134
15.4K
Shane Chang@ShaneZChang·
@simonw this. what theyre really saying is our threat model changed and we cant afford a public attack surface anymore. which is honest but also concedes the model itself is the vulnerability, not the code. thats a weird thing to ship as a pitch
0
0
0
6
Simon Willison@simonw·
I'm certain this isn't the message they intended to present, but this comes across to me as a company saying "we no longer trust in our own ability to keep your data secure"
Bailey Pumfleet@pumfleet

Open source is dead. That’s not a statement we ever thought we’d make. @calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up. AI has fundamentally altered the security landscape. What once required time, expertise, and intent can now be automated at scale. Code is no longer just read. It is scanned, mapped, and exploited. Near zero cost. In that world, transparency becomes exposure. Especially at scale. After a lot of deliberation, we’ve made the decision to close the core @calcom codebase. This is not a rejection of what open source gave us. It’s a response to what risks AI is making possible. We’re still supporting builders, releasing the core code under a new MIT-licensed open source project called cal. diy for hobbyists and tinkerers, but our priority now is simple: Protecting our customers and community at all costs. This may not be the most popular call. But we believe many companies will come to the same conclusion. My full explanation below ↓
27
20
374
34.3K
Shane Chang@ShaneZChang·
claude api went down for 2 hours tonight and my agent pipeline just. stopped. every fallback i wrote assumed slow or wrong, not silent. note to self: a health check that only pings the api is not a health check. observe the actual agent heartbeat or youre blind
0
0
0
144
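The "observe the actual agent heartbeat" idea above can be sketched as a progress stamp the agent updates after every completed step: an API ping can succeed while the agent is silently stalled, but a stale heartbeat cannot. Function names and the threshold are illustrative assumptions:

```python
import time

HEARTBEAT_MAX_AGE = 60  # seconds; tune to the agent's normal step cadence

_last_beat = None  # monotonic timestamp of the last completed agent step

def beat():
    """Call after every completed agent step, not just after API calls."""
    global _last_beat
    _last_beat = time.monotonic()

def healthy():
    """True only if the agent itself made progress recently.
    Pinging the model API can return 200 while the loop is stuck;
    this checks the agent's own forward progress instead."""
    return _last_beat is not None and (time.monotonic() - _last_beat) < HEARTBEAT_MAX_AGE

stale_before_any_step = healthy()  # no step completed yet -> unhealthy
beat()                             # agent finished a step
ok_after_beat = healthy()
```

The design choice is that "no heartbeat yet" counts as unhealthy, so a pipeline that never starts fails the check too.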
Shane Chang@ShaneZChang·
@PawelHuryn @Anthropic directionally yes but the one line claim is wild. we ship claude-agent-sdk in prod. between dev and shipped we added user scoping, tool sandboxing, cost guards, and a dead letter queue for stuck loops. the .claude folder is maybe 30 percent of it
1
0
1
57
Paweł Huryn@PawelHuryn·
The .claude/ folder you've been editing every day for months is the deployable unit. The gap between "I use Claude Code" and "I ship an AI agent" is one line: npm install @anthropic-ai/claude-agent-sdk OTEL exports full traces to Langfuse or any OTLP backend.
Paweł Huryn@PawelHuryn

x.com/i/article/2044…
5
3
51
7.2K
Shane Chang@ShaneZChang·
@asidorenko_ yeah this is the real one. the skills i fight to keep sharp are the ones agents still cant do reliably. cross tool debugging when state is split between prompt llm and infra. that one atrophying hurts fast
0
0
0
63
Alex Sidorenko@asidorenko_·
Skill atrophy with coding agents
17
17
434
68.5K
Shane Chang@ShaneZChang·
@rauchg this. model is commodity now. whoever owns the deploy observe iterate loop captures value, not whoever has the cleverest model. been our experience at lessie last 6 months
0
0
0
4
Shane Chang@ShaneZChang·
@hwchase17 yeah this is the part that bites at scale. we had a memory write happen in the wrong scope once, one users context bled into another. nightmare to recover from. strict tenant boundaries from day 1 saves so much pain
0
0
0
5
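The "strict tenant boundaries from day 1" point above amounts to routing every memory read and write through a key namespaced by user id, so a write physically cannot land in another user's scope. A minimal sketch; the class and method names are illustrative, not from any specific SDK:

```python
class ScopedMemory:
    """Memory store where every key is namespaced by user id,
    so one user's context cannot bleed into another's."""

    def __init__(self):
        self._store = {}

    def _key(self, user_id, key):
        # Fail loud: an unscoped access is a bug, not a default.
        if not user_id:
            raise ValueError("user_id is required for every memory access")
        return (user_id, key)

    def write(self, user_id, key, value):
        self._store[self._key(user_id, key)] = value

    def read(self, user_id, key, default=None):
        return self._store.get(self._key(user_id, key), default)

mem = ScopedMemory()
mem.write("alice", "prefs", "dark mode")
mem.write("bob", "prefs", "light mode")
```

Because the namespace lives inside the key function rather than at call sites, there is no code path that can forget the tenant boundary.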
Harrison Chase@hwchase17·
User scoped memory is one of those things that doesn’t matter if you’re building a toy agent for yourself, but when you release at scale you gotta get it right Deepagents deploy helps you do that, easily
Sydney Runkle@sydneyrunkle

`deepagents deploy` now supports user scoped memory! add a user/ directory in your project so each user gets their own writable AGENTS.md, seeded on first deploy and persisted across conversations. your agent can then learn and remember user preferences across conversations!
8
11
67
8.3K
Shane Chang@ShaneZChang·
@unclebobmartin that holds for the language ladder. but for agents the descend path is broken in a different way. i can read the diff an agent produced. cant read its reasoning. inspecting model output isnt the same as reading assembly
0
0
0
9
Uncle Bob Martin@unclebobmartin·
You only lost it if you refused to descend to the next level down (or if the language did not allow you to descend to the next level down). With AI, we can always descend to the next level down if we need to. Or, rather, the AI can descend to whatever level it needs in order to match the semantics that you specify. You might even tell it things like "use a deterministic memory allocator."
1
0
0
71
Uncle Bob Martin@unclebobmartin·
AIs are just another step up the semantic expression ladder. We initially expressed our semantics in binary, then assembler, then Fortran, then C, then Java, then Python, etc. AI is just the next step up that same old ladder. And when you take that step, nothing else changes. You are still expressing behavioral semantics. You still need to express structural semantics. All the old principles still apply. You still have to be concerned about design and architecture. And even though the syntax allows informal statement, you cannot abandon formalism. When you express behavior you need a formal way to enforce the behavior you want. I use Gherkin for this. It seems to work pretty well. Consider that Gherkin is written in triplets of Given/When/Then. Each of those GWT triplets is a transition of a state machine. A full suite of Gherkin triplets is a formal description of the finite state machine that represents the behavior of the application. Other formalisms that matter are things like module dependency graphs, testing constraints, complexity constraints, and many others. This step up the semantic expression ladder provides you with an enormous amount of options. But you'd better choose those options wisely!
47
61
536
24.7K
Shane Chang@ShaneZChang·
@mitsuhiko agreed. been running agents in tmux for like 4 months and still cant figure out a daily-driver setup for myself. the constraints that make it perfect for a process are exactly the ones that make it pain for a person
0
0
0
56
Armin Ronacher ⇌@mitsuhiko·
I think tmux is great software for an agent. But how people can actually work day to day in tmux is beyond me. It's such a horrible UX and hack.
154
12
511
94.6K
Shane Chang@ShaneZChang·
@asidorenko_ yeah you nailed the deeper version. recall is just a symptom, the real loss is never building the mental model in the first place. i started forcing myself to read every agent diff and ask could i have written this from scratch. catches the worst auto-pilot moments
0
0
1
68
Shane Chang@ShaneZChang·
@svpino 100%. ran the same eval suite on our v1 vs v3 harness, same model. v3 scored 40% higher just from better tool defs and context pruning. the harness is where most of the value lives now
0
0
0
2
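One of the harness levers named above, context pruning, can be sketched as a budgeted walk from newest to oldest turn: keep the system message plus the most recent turns that fit a token budget. Everything here is an illustrative assumption, including the rough chars/4 token estimate, not the author's actual harness:

```python
def prune_context(messages, max_tokens,
                  count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system message plus the newest turns that fit the budget.
    count_tokens is a crude chars/4 estimate; swap in a real tokenizer."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for m in reversed(rest):      # walk newest -> oldest
        cost = count_tokens(m)
        if cost > budget:
            break                 # out of budget: drop this turn and older
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))

msgs = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "old question " * 50},
    {"role": "user", "content": "recent question"},
]
pruned = prune_context(msgs, max_tokens=20)
```

Here the oversized old turn is dropped while the system prompt and the latest turn survive, which is the basic shape of recency-based pruning.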
Santiago@svpino·
Obviously, models are a big deal, but coding harnesses play a huge role in making these models look good. I suspect that you can get the best frontier model out there, put it in a shitty harness, and the experience will be very disappointing. The reverse is also true: put a mediocre model in a strong harness, and it might match the experience that you get from the best agentic coding tools out there. So, yes, Opus 4.6 and GPT-5.3-Codex are amazing models, but the Claude Code and Codex harnesses do a lot of the lifting to make them work the way they are. Of course, these models might also have some specific training on their harnesses. This is also an advantage.
36
5
74
10.1K
Shane Chang@ShaneZChang·
@mikehostetler yeah stacking non-determinism is a trap we hit too. tried planner agent + executor agent, errors compound in weird ways. what actually worked for us: make the planner deterministic (rules/templates) and only let executor be fuzzy. one layer of fuzz not two
0
0
1
6
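The "one layer of fuzz" pattern above can be sketched as a rule-table planner feeding a single non-deterministic executor. The task types, steps, and stubbed executor are illustrative assumptions; in production only `execute` would call a model:

```python
# Deterministic planner: a fixed rule table maps task type to a step list.
# Same input always yields the same plan -- no model in the planning loop.
PLANS = {
    "bug_report": ["reproduce", "locate", "patch", "verify"],
    "feature":    ["spec", "implement", "test"],
}

def plan(task_type):
    # Return a copy so callers can't mutate the rule table;
    # unknown task types raise KeyError on purpose: fail loud, don't guess.
    return list(PLANS[task_type])

def execute(step):
    """The single fuzzy layer. In production this is the LLM/agent call;
    stubbed here so the sketch runs deterministically."""
    return f"done: {step}"

results = [execute(step) for step in plan("bug_report")]
```

Keeping non-determinism in exactly one layer means a bad run is always attributable to the executor, since the plan it received is reproducible.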