Doug M

198 posts

@marzeaned

The model is the engine. The harness is the car. Building agent skills, plugins, and orchestration for Claude Code. Opinions I'll defend. Making Claude smarter.

United States · Joined April 2022
354 Following · 62 Followers
Doug M@marzeaned·
100%. That's the foundation. My point was less "you missed skills" and more "most people reading the article will install the tools and skip the skill-building step entirely." They'll hire 10 agents on day one and wonder why the output is mid. The tools are the infrastructure. The skills are the intelligence. You nailed that in the article. I'm just seeing it play out in real time from the practitioner side (running a 6-agent company through Paperclip for the past month) and wanted to reinforce it for the people in the comments saying "it doesn't work." It works. They just skipped the skill tree.
1 · 0 · 1 · 45
Nick Spisak@NickSpisak_·
@marzeaned gstack and autoresearch are based on skills architecture
1 · 0 · 3 · 215
Doug M@marzeaned·
@_everythingism @johncrickett Four detectors say human. One says AI. You picked the one that confirmed what you already believed and called it proof. That's not detection. That's confirmation bias with a screenshot.
1 · 0 · 0 · 27
everythingism@_everythingism·
@marzeaned @johncrickett "We are highly confident this text was AI-generated" What's the point of posting something you didn't write yourself? Did you think it's not easily identifiable as AI?
1 · 0 · 1 · 58
John Crickett@johncrickett·
Large language models don't think. They don't reason. And they can't produce endless new information. This is clearly explained by George D. Montañez in a recent talk at Baylor University, and it's worth understanding why. Three key points stood out to me:

LLMs don't ponder, they process. They're next-token predictors, sophisticated ones, but they have no understanding of what they're producing. They know two vectors are similar; they don't know what either vector means.

LLMs don't reason, they rationalise. Studies show their outputs shift based on irrelevant prompt wording, embedded hints, and statistical shortcuts. The "chain of thought" they show you often has nothing to do with how they actually arrived at the answer.

They don't create endless information. Training AI on AI output causes rapid degradation and model collapse. Information theory tells us you can't get more out than you put in, regardless of the architecture.

None of this means these tools aren't useful. But it does mean we should stop anthropomorphising them and start being honest about what they actually are. The hype is real. So are the limits.

You can watch the talk on YouTube here: youtube.com/watch?v=ShusuV…
48 · 64 · 295 · 20.7K
Doug M@marzeaned·
OpenClaw is getting killed one feature at a time, and most of the people building it haven't noticed yet.

Channels just dropped. Discord and Telegram as a bridge to a running Claude Code session. You message your agent from your phone while you're walking the dog at 6 AM and it just works. The thing the open source community has been grinding toward for months showed up as a startup flag. One line. Done.

And that's the pattern now. Every few weeks Anthropic ships something that erases three months of someone else's roadmap. Not because they're trying to crush the competition. Because they're building fast and they don't have to ask permission from a committee of GitHub contributors to merge it.

I've been watching this play out since I started running my own agent stack. Spent weeks wiring up things that became native features before I finished testing them. At some point you stop and look at the scoreboard and go, yeah, this race isn't close.

The uncomfortable truth nobody in the OpenClaw Discord wants to say out loud: feature parity with a team that ships weekly and has the model builders in the next room is not a strategy. It's a treadmill. You're running and the floor is moving faster than your feet.

Stop cloning. Start solving the problems Anthropic won't touch. The ones that are too niche, too weird, too specific to one workflow. THAT is where open source wins. Not by building the same thing slower with more opinions.
Thariq@trq212

We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord. Use this to message Claude Code directly from your phone.

0 · 0 · 1 · 46
Doug M@marzeaned·
Fair point. You're right that PreToolUse alone doesn't give you the ledger. That's a real difference.

What I do is build the ledger myself. A PostToolUse hook logs every tool call to a session JSONL file. Tool name, target file, command, parameters, timestamp. Every single call. Then the PreToolUse hook reads that file and has the full history of what's happened in the session before deciding whether to allow or block the next call.

Is it the same as Strands handing you a ledger natively? No. It's duct tape. Effective duct tape, but duct tape. You're building the context that Strands gives you out of the box. The pattern is the same though. Intercept at the decision boundary. Use history to make contextual judgments. Guide rather than constrain. The implementation in Strands is cleaner. The principle is identical.

Appreciate the pushback. This is exactly the kind of gap that should get closed natively in Claude Code's hook system.
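A minimal sketch of that ledger pattern, under assumptions: Claude Code hooks receive a JSON event whose payload includes `tool_name` and `tool_input`, which matches the documented hook contract, but the ledger path, record shape, and the example "only edit files you've already read" policy are all illustrative, not the author's actual hooks.

```python
import json
import time
from pathlib import Path

LEDGER = Path("/tmp/session-ledger.jsonl")  # hypothetical location

def post_tool_use(event: dict) -> None:
    """PostToolUse hook body: append every completed tool call to the ledger."""
    record = {
        "tool": event.get("tool_name"),
        "input": event.get("tool_input", {}),
        "ts": time.time(),
    }
    with LEDGER.open("a") as f:
        f.write(json.dumps(record) + "\n")

def load_history() -> list[dict]:
    """Full session history, one JSON object per line."""
    if not LEDGER.exists():
        return []
    return [json.loads(line) for line in LEDGER.read_text().splitlines() if line]

def pre_tool_use(event: dict) -> bool:
    """PreToolUse hook body: contextual allow/block using the history.
    Example policy (an assumption): only allow Edit on files the session
    has already Read. In a real hook, returning False would map to
    exit code 2, which blocks the call."""
    if event.get("tool_name") != "Edit":
        return True
    target = event.get("tool_input", {}).get("file_path")
    read_files = {h["input"].get("file_path") for h in load_history() if h["tool"] == "Read"}
    return target in read_files
```

The point is the shape: PostToolUse writes, PreToolUse reads, and the decision at the boundary is made against the whole session, not just the current call.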
0 · 0 · 0 · 16
Doug M@marzeaned·
This is Claude Code hooks. Rebranded.

I don't say that to diminish the work. The data is solid. 3,000 eval runs is real rigor and the results speak for themselves. But if you're running Claude Code right now, you already have this exact pattern available to you. You just might not realize it.

PreToolUse hooks intercept before a tool fires. PostToolUse hooks inspect after it completes. Stop hooks enforce behavior at the end of a turn. Same two intervention points. Same just-in-time guidance. Same ability to block, redirect, or inject corrective context at the exact moment the agent is about to do something stupid.

The key insight in this post is dead on. Stuffing rules into a system prompt and hoping the model remembers them across a long session is a losing game. Models ignore instructions buried in long prompts. New edge cases always emerge. Adding more rules creates unpredictable interactions between them. Steering solves that by delivering the right instruction at the right moment. Not all the instructions all the time. The right one. Right now. When the decision is actually being made.

I run this pattern in production every day. A PreToolUse guard that blocks dangerous bash commands. A PostToolUse dispatcher that tracks context usage, logs metrics, and fires warnings when the window gets tight. A PreCompact hook that extracts decisions from the conversation before context compresses so nothing important gets lost. A Stop hook that auto-saves a structured session summary. None of that lives in the system prompt. It fires when it matters and stays silent when it doesn't.

The 100% accuracy result versus 82.5% for simple instructions is the number that should grab you. That gap is the difference between an agent you can trust and an agent you have to babysit. And the token efficiency is the other half of the story. Steering used 66% fewer tokens than SOPs while being more accurate. You're spending less AND getting more.

If you're building agents and you're not intercepting at the tool call boundary, you're still on the prompting treadmill. Get off it.
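As a concrete sketch of the first guard described above, here is what a PreToolUse hook that blocks dangerous bash commands can look like. It assumes Claude Code's documented hook contract (event JSON on stdin; exit code 2 blocks the call and feeds stderr back to the model); the deny patterns themselves are illustrative, not a complete policy.

```python
import json
import re
import sys

# Illustrative deny-list; a real guard would be tuned to the project.
DENY_PATTERNS = [
    r"rm\s+-rf\s+/",           # recursive delete of an absolute path
    r"curl\s[^|]*\|\s*(ba)?sh", # curl piped straight into a shell
    r"git\s+push\s+--force",    # force-push
]

def is_dangerous(command: str) -> bool:
    return any(re.search(p, command) for p in DENY_PATTERNS)

def main() -> int:
    event = json.load(sys.stdin)  # hook payload: tool_name, tool_input, ...
    if event.get("tool_name") == "Bash":
        cmd = event.get("tool_input", {}).get("command", "")
        if is_dangerous(cmd):
            # stderr is shown to the model; exit 2 blocks the tool call.
            print(f"Blocked dangerous command: {cmd}", file=sys.stderr)
            return 2
    return 0  # exit 0 allows the call

if __name__ == "__main__":
    sys.exit(main())
```

Registered under a `PreToolUse` matcher for `Bash` in settings, this fires only at the decision boundary and stays silent everywhere else.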
2 · 0 · 0 · 99
Clare Liguori@clare_liguori·
I tested 5 approaches to guiding AI agent behavior across 3,000 eval runs to see what actually makes agents reliable. Strands steering hooks was the only one that hit 100% accuracy.

Here's how it works: the key is just-in-time guidance for the model before tool calls and at the end of a turn. Steering handlers observe what the agent is doing and intervene only when the model is about to go off track.

Full results and code in the post: strandsagents.com/blog/steering-…
10 · 16 · 164 · 14.9K
Doug M@marzeaned·
That skill injection exploit making the rounds right now? The one where a hidden command in an HTML comment executes without Claude even knowing it happened? That's not a bug. That's the cost of trusting code you didn't read.

Everybody is sharing it. Everybody is shocked. And almost nobody is doing anything about it. You'll retweet the warning and then install the next shiny skill from a stranger's GitHub without blinking. Stop that.

The exploit works because three things stack. Overly broad Bash(*) permissions in the frontmatter. Hidden instructions in HTML comments that Claude processes but you never see rendered. And zero distinction between legitimate skill content and injected commands. One curl pipe to bash buried in a comment. That's all it takes. Your machine. Your files. Your credentials. Gone while Claude confidently tells you nothing happened.

This is going to get worse. The skill ecosystem is growing fast. More skills means more surface area. More surface area means more opportunities for exactly this kind of attack. And most people installing skills right now are not reading the raw markdown. They're looking at the pretty rendered version on a listing page and hitting install.

You need to vet before you trust. Not after something breaks. Before. Five layers. Trigger accuracy. Hook safety. Permission scope. API surface. Content quality. Catches hidden commands in comments, overly broad tool grants, and instructions that contradict the skill's stated purpose. Free. Open source. Run it on anything before you install it. aiskillslab.dev/dashboard/skil…
0 · 0 · 0 · 54
Doug M@marzeaned·
This is real and people need to pay attention.

We built a skill that audits for exactly this. Five-layer check: trigger accuracy, hook safety, permission scope, API surface, and content quality. Catches hidden commands in comments, overly broad Bash(*) grants, and instructions that don't match the skill's stated purpose. Free. Open source. Run it before you install anything. aiskillslab.dev/dashboard/skil…

The attack works because three things stack: overly broad tool permissions, HTML comments preserved in raw context but hidden from rendered output, and no distinction between legitimate skill instructions and injected commands. An HTML comment with a curl pipe to bash. Invisible when rendered. Invisible to the user reviewing the listing. But Claude reads the raw markdown, sees the backtick-wrapped command, and executes it with the Bash(*) permission the skill granted in frontmatter. And then denies it happened.

That's the part that should keep you up at night.
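To make two of those checks concrete, here is a toy auditor: it flags a wildcard `Bash(*)` grant in the frontmatter and shell-looking commands hidden inside HTML comments. This is a sketch of the idea, not the five-layer tool linked above; the regexes and the simplified frontmatter parsing are assumptions.

```python
import re

HIDDEN_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
SHELL_LIKE = re.compile(r"curl\s[^|]*\|\s*(ba)?sh|rm\s+-rf|chmod\s|wget\s")

def audit_skill(raw_markdown: str) -> list[str]:
    """Run both checks over the RAW markdown (what Claude reads),
    not the rendered listing (what you see)."""
    findings = []
    # Permission scope: wildcard Bash grant in the YAML frontmatter.
    parts = raw_markdown.split("---")
    frontmatter = parts[1] if raw_markdown.startswith("---") and len(parts) >= 3 else ""
    if "Bash(*)" in frontmatter:
        findings.append("overly broad Bash(*) permission in frontmatter")
    # Hook safety: commands hidden in HTML comments, which vanish when
    # rendered but stay in the raw context the model processes.
    for comment in HIDDEN_COMMENT.findall(raw_markdown):
        if SHELL_LIKE.search(comment):
            findings.append("hidden shell command in HTML comment: " + comment.strip()[:60])
    return findings
```

A real audit also needs the other three layers (trigger accuracy, API surface, content quality), which require judgment, not just pattern matching.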
0 · 0 · 0 · 318
Doug M@marzeaned·
This prompt will do exactly nothing that asking your question directly wouldn't already do.

"Act as my senior AI engineering advisor with 10+ years experience" doesn't make the model more experienced. It doesn't unlock hidden knowledge. It's a costume, not a capability. The model already gives you its best reasoning by default. Telling it to be "brutally direct, zero fluff" is just tone styling you could get by adding "be concise" to any prompt.

The rigid output format is where it actually hurts you. "Lead with the hard constraint, follow with one concrete next action, end with a forcing question." That sounds disciplined in a post. In practice it means every response follows the same template regardless of whether that template fits what you actually need. Sometimes you need three next actions. Sometimes the forcing question is irrelevant. You've traded flexibility for a format that looks good on a screenshot.

You know what actually sharpens an AI workflow? Giving it your real codebase, your real constraints, your real deadlines. Context beats persona every single time. The model doesn't need to roleplay a senior engineer. It needs to know what you're actually building, what's broken, and what ships next.

The most effective prompt for any AI session is the specific thing you need done. Not a character sheet.
0 · 0 · 0 · 4
Alex the Engineer@AlexEngineerAI·
This prompt will sharpen your AI workflow:

---

Act as my senior AI engineering advisor:
- 10+ years shipping production systems
- Brutally direct, zero fluff
- Deep expertise in automation and agents

Your mission:
- Identify where my stack has leverage I'm ignoring
- Design the shortest path to shipping
- Call out the complexity I'm hiding behind

For every session:
- Lead with the hard constraint
- Follow with one concrete next action
- End with a forcing question I can't dodge
6 · 0 · 18 · 771
Doug M@marzeaned·
You're not being stupid. You're asking the question most people in the space are too deep in the hype cycle to notice.

The honest answer is there's exactly one legitimate reason to split work across multiple agents: context windows. An agent can only hold so much in working memory at once. When a task gets big enough that the context fills up and quality degrades, you spin up a fresh agent with a focused prompt. That's it. That's the whole reason.

Everything else is exactly what you said. Skeuomorphic. People mapping human org charts onto systems that don't have human limitations. A "marketing agent" and a "sales agent" and a "manager agent" to coordinate them is just recreating a company structure because it's the mental model people already have. The AI doesn't need a department. It doesn't need a manager. It needs context and tools.

The irony is that the "swarm" architectures often perform WORSE than a single well-prompted agent because every handoff between agents loses context. Agent A knows something. Agent B doesn't. Now you need Agent C to translate between them. Congratulations, you've invented middle management. For software.

Where it stops being daft is genuinely parallel, independent work. Need to edit 10 files that don't depend on each other? Ten agents, same prompt, done in a tenth of the time. That's not org structure. That's just concurrency.

But the "9 agents with a swarm coordinator" thing? That's people building PowerPoint org charts and calling it architecture.
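That parallel case is plain fan-out/fan-in with no coordinator agent. A sketch under one big assumption: `run_agent` is a stub standing in for spawning a real agent per file (for example, one headless agent process each); here it just returns a marker so the shape is visible.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt: str, path: str) -> str:
    # Stand-in for launching one agent on one independent file,
    # e.g. a subprocess running a headless agent session.
    return f"{path}: done"

def fan_out(prompt: str, paths: list[str]) -> dict[str, str]:
    """Same prompt, one agent per file, no handoffs: nothing for a
    coordinator to coordinate because the files are independent."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = {p: pool.submit(run_agent, prompt, p) for p in paths}
        return {p: f.result() for p, f in futures.items()}
```

The moment one file's edit depends on another's output, this stops being concurrency and becomes the handoff problem described above.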
0 · 0 · 0 · 7
Tom Goodwin@tomfgoodwin·
I'm surely being stupid. But if AI is rather unconstrained by expertise or capacity or, to some extent, speed, why do we need to divide tasks or departments into 9 agents (the marketing agent, the optimization agent, etc.), each doing one thing, and then another agent to manage the swarm? Can't one agent just do it all? It seems very skeuomorphic. Will we have HR agents to make sure the agent agents are being looked after? An office canteen manager agent to feed the agents? Seems daft.
190 · 3 · 183 · 23.6K
Doug M@marzeaned·
The routing table pattern is genuinely smart. Progressive disclosure for agent prompts is an insight most people haven't landed on yet, and the before/after numbers make the case clearly. 55 lines + 1 reference beats 359 lines every time.

Where I'd push back is the example itself. Five Whys, TRIZ, Ishikawa, Theory of Constraints. These aren't things an agent needs to be taught. Every frontier model already has these frameworks in its weights. The Five Whys reference is 12 lines explaining something the model can already do cold. The architecture is right. The content is solving a problem that doesn't exist.

And that's the trap most skills fall into right now. People nail the structure and then fill it with general knowledge the model already has instead of the opinionated, project-specific judgment it doesn't. The skills that actually earn their token cost encode things like "this API technically supports both approaches but the second one fails silently under load, so always use the first." Stuff the model can't know because it's not in any docs.

That's where your routing table pattern would really shine. You built the right container. Now fill it with something the model can't already do on its own.
0 · 0 · 1 · 15
Cathryn@cathrynlavery·
How to cut your AI agent's prompt size by 85% without losing any capability.

Most skills dump everything into one file. Problem: the agent loads ALL of it every time, even when 80% is irrelevant.

The fix takes 10 minutes:
1. Identify the "modes" in your skill
2. Write a routing table — just enough for the agent to pick the right mode
3. Move each mode's details into its own reference file
4. Agent reads the router, then reads only the file it needs

Before: 359 lines loaded every invocation
After: 55-line router + 1 of 7 refs loaded per use

This is progressive disclosure applied to agent prompts. Same concept from UI design — show the minimum needed to make a decision, reveal details on demand.

👇example below
11 · 4 · 51 · 4.5K
Doug M@marzeaned·
The "alt text for agents" framing is actually sharp. Dual-purpose pages where the same source of truth serves both audiences with carveouts. That solves the freshness problem because you're not maintaining two parallel artifacts that drift apart.

But it doesn't solve the judgment problem. Docs describe what exists. The value of a good skill is encoding what you SHOULD do with what exists. The anti-patterns. The "technically this works but don't do it because..." context. That stuff doesn't belong in docs because it's opinionated and project-specific. Docs are authoritative and general. Skills are opinionated and local.

The greppable-locally thing is the real unlock and you nailed why. A skill lives next to the code it governs. It's version controlled with the project. It evolves with the codebase. There's no web equivalent because the web is the wrong place for it. Skills are local artifacts for a reason.
0 · 0 · 0 · 21
Rhys@RhysSullivan·
Roughly where I'm landing on this is alt text for agents: you maintain your regular docs pages and the pages for agents as the same page, with carveouts for humans / agents. I will say skills being greppable locally is cool; there's not a good web equivalent of that.
5 · 0 · 32 · 2.5K
Rhys@RhysSullivan·
Skills is still not sitting right with me as a concept. I think it's because companies rushed to them as the next big thing, as happens with all AI things now. Everyone is doing their docs as skills, but it's recreating all the issues (authority, up-to-dateness) docs solved.
72 · 7 · 263 · 28.9K
Doug M@marzeaned·
You're right about the symptom but wrong about the diagnosis. "Docs as skills" IS a trap. But that's because docs and skills are fundamentally different things.

Docs tell you what an API does. A skill tells an agent how to use it well. What to avoid. Which patterns actually work versus which ones look right but break at scale. The gotchas you only learn after the third time something fails silently. Converting docs to skills is like converting a dictionary into a writing style guide. Same words, completely different purpose.

The companies that rushed to "skillify" their docs didn't create a new category. They created worse docs with extra steps. The real skills that work are opinionated, compressed, and tested. They encode judgment, not information.

And yeah, they go stale. But a stale skill is a skill that stopped being maintained, not a flaw in the concept. Your docs go stale too. That's a maintenance problem, not a category problem.
0 · 0 · 1 · 84
Doug M@marzeaned·
If this thread resonated, follow @marzeaned. I write about building autonomous AI agents without the framework tax. Full breakdown of my 5-agent pipeline, the security data, and the three-layer architecture on aiskillslab.dev. What frameworks have burned you? Drop it in the replies.
0 · 0 · 0 · 35
Doug M@marzeaned·
🧵 The agentic framework gold rush is producing more attack surface than value. And most of us are installing malware and calling it innovation. I spent 3 months building autonomous AI agents. Not with frameworks. With Claude Code, markdown files, and hooks. What I learned about the framework boom will save you months of pain.
6 · 0 · 1 · 84
Doug M@marzeaned·
The engineers who win the next 12 months aren't the ones with the most frameworks installed. They're the ones who understand their model well enough to build exactly what they need with the least possible abstraction. Read your outputs. Watch what the model does with every tool you give it. Shape the tools to the model's abilities. Throw away what isn't working, especially if it took you a week to install. Less framework. More understanding. That's the whole playbook.
0 · 0 · 0 · 23
Doug M@marzeaned·
"But OpenClaw has 5,700 skills!" And 36% of them contain prompt injection. The top 100 skills on ClawHub cluster in 2 of 9 possible categories. Most are thin wrappers around API calls you could write in 10 minutes. Quantity is not quality. I'd take 3 skills I wrote myself and understand deeply over 50 I installed from strangers and never read. You should be able to explain every line in every skill that has access to your system. If you can't, you don't have a tool. You have a liability.
0 · 0 · 0 · 38
Doug M@marzeaned·
What actually works instead of collecting frameworks:

CLAUDE.md for identity and rules. The model reads this every session. It's your constitution.

Hooks for enforcement. A 3-line PreToolUse hook blocks bad behavior that 31 sessions of CLAUDE.md rules couldn't prevent. Hooks enforce. Rules inform.

Skills for conditional context. Testing instructions load when you're testing. Not sitting in the system prompt burning tokens when you're writing code.

Three layers. That's the whole architecture.
0 · 0 · 1 · 19