Jaymie Jones reposted
Jaymie Jones
15.6K posts

Jaymie Jones
@pixelstackcom
Engineering Manager @canva - Connect API - Canva MCP, former Engineering Manager @envato. Host @codercatchup. Build with code and teams. ❤️ Photo & Video
Joined February 2010
2.8K Following · 613 Followers
Jaymie Jones reposted

Folks: when you write skills, ask your agent to be token efficient and relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context.
I wrote a skill that finds the worst offenders. github.com/steipete/agent…
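A minimal sketch of the kind of check the post describes, assuming each skill lives in a `SKILL.md` file whose YAML front matter has a single-line `description:` field. The function names, the directory layout, and the four-characters-per-token estimate are assumptions for illustration; this is not the linked skill's implementation, and a real tokenizer would give different counts.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Front matter: a leading "---" line, the YAML body, then a closing "---" line.
FRONT_MATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---", re.DOTALL)


def description_of(text: str) -> str:
    """Return the single-line `description:` value from front matter, or ''."""
    m = FRONT_MATTER.match(text)
    if not m:
        return ""
    for line in m.group(1).splitlines():
        if line.startswith("description:"):
            return line[len("description:"):].strip()
    return ""


def approx_tokens(s: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token in English.
    return (len(s) + 3) // 4


def worst_offenders(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Rank SKILL.md files under `root` by estimated description size, largest first."""
    scores = []
    for path in Path(root).rglob("SKILL.md"):
        desc = description_of(path.read_text(encoding="utf-8"))
        scores.append((approx_tokens(desc), str(path)))
    return sorted(scores, reverse=True)[:top]
```

Because the description is what gets loaded into every context, ranking by description size alone is a reasonable first pass; the skill body only costs tokens once the skill is activated.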
Jaymie Jones reposted

🚀 Introducing SkillOpt — an optimizer for agent skills.
Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter.
Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization.
SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability.
We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings.
Train the skill, not the model. 🛠️🤖
🌐 aka.ms/skillopt
📄 huggingface.co/papers/2605.23…
Jaymie Jones reposted

Gradient descent for SKILL.md files sounds interesting, maybe a bit complex, but it's becoming a real part of the agent harness.
SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them.
A few things I learned that you should consider too.
1. The validation gate is the only thing that matters in a self-editing loop.
Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.
2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot.
Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.
3. Compactness wins. Median final skill: ~920 tokens.
Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.
4. The harness is becoming less important; the skill is becoming more important.
A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.
5. Frozen model + trained context is the practical adaptation.
GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.
6. Verification is the bottleneck.
Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
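The validation gate and edit budget from points 1 and 2 can be sketched as a single accept/reject step. This is a hypothetical illustration, not SkillOpt's code: `Edit`, `apply_edits`, `optimize_step`, and the `heldout_score` callback are names invented here, and the scorer stands in for whatever auto-grader backs the held-out set (point 6).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Edit:
    """One bounded edit: replace the first occurrence of `old` with `new`."""
    old: str
    new: str


def apply_edits(skill: str, edits: List[Edit], max_edits: int = 8) -> str:
    # Edit budget: a proposal larger than `max_edits` is refused outright,
    # the textual analog of capping the learning rate.
    if len(edits) > max_edits:
        raise ValueError(f"{len(edits)} edits exceeds the budget of {max_edits}")
    for e in edits:
        if e.old not in skill:
            raise ValueError(f"anchor text not found: {e.old!r}")
        skill = skill.replace(e.old, e.new, 1)
    return skill


def optimize_step(
    skill: str,
    propose: Callable[[str], List[Edit]],
    heldout_score: Callable[[str], float],
    max_edits: int = 8,
) -> str:
    """Return the candidate skill only if it strictly beats the current one."""
    try:
        candidate = apply_edits(skill, propose(skill), max_edits)
    except ValueError:
        return skill  # over budget or malformed: keep the current skill
    # Validation gate: strict improvement on the held-out set; ties are rejected.
    if heldout_score(candidate) > heldout_score(skill):
        return candidate
    return skill
```

For example, with a toy scorer `lambda s: s.count("verify")`, an edit that adds "then verify totals" is accepted, an edit that leaves the score unchanged is rejected as a tie, and a proposal of nine edits is refused by the default budget of eight.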
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.
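The gap between aggregate and per-skill movement is easy to see with a small calculation. The sketch below is illustrative: the function names and any scores fed to it are hypothetical, and "accuracy" is assumed to be a per-skill percentage from end-to-end task tests.

```python
from typing import Dict, Tuple


def effect_sizes(
    before: Dict[str, float], after: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return (change in corpus mean, per-skill change), in percentage points."""
    if before.keys() != after.keys():
        raise ValueError("before and after must cover the same skills")
    per_skill = {name: after[name] - before[name] for name in before}
    mean_before = sum(before.values()) / len(before)
    mean_after = sum(after.values()) / len(after)
    return mean_after - mean_before, per_skill


def movers(per_skill: Dict[str, float], threshold: float = 5.0) -> Dict[str, float]:
    """Skills whose accuracy moved by at least `threshold` points in either direction."""
    return {k: v for k, v in per_skill.items() if abs(v) >= threshold}
```

With four skills where only one jumps 24 points, the corpus mean moves 6 points; in a larger corpus the same jump dilutes further, which is why the per-skill view is the one to watch.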
Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured.
The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state
What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing.
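A protected-section invariant can be enforced mechanically: compare the protected regions before and after an edit and discard any candidate that changes them. This sketch assumes a marker convention (`<!-- protected:start -->` / `<!-- protected:end -->`) invented here for illustration; the paper's actual mechanism may differ.

```python
import re
from typing import List

# Hypothetical markers delimiting slow-state lessons that fast edits must not touch.
PROTECTED = re.compile(r"<!-- protected:start -->.*?<!-- protected:end -->", re.DOTALL)


def protected_blocks(text: str) -> List[str]:
    """All protected regions, in document order, markers included."""
    return PROTECTED.findall(text)


def enforce_invariant(old: str, new: str) -> str:
    """Accept `new` only if every protected block survives byte-for-byte."""
    if protected_blocks(new) == protected_blocks(old):
        return new
    return old
```

Because the check runs after the edit is applied, it also catches candidates that delete a protected block outright, since the list of blocks no longer matches.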
If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: arxiv.org/pdf/2605.23904

Jaymie Jones reposted

You might believe you should spend less time thinking about code because of AI.
I strongly disagree! We’re watching this play out live, where tons of AI-generated code becomes a liability.
At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time.
Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change.
I expect the trend to continue toward trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance.
I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.
Jaymie Jones reposted

Every engineer should read this.
The principles for building reliable software systems have been around for a long time. Max outlines them beautifully.
Here's to getting that 99.99% on your status page.
planetscale.com/blog/the-princ…
Jaymie Jones reposted

Everyone building AI agents is focusing on building the prefrontal cortex. Planning. Reasoning. Multi-step chains. There's value here. CEO-stuff.
But also, a reframe: there is value in building the cerebellum. It's offloading boring tasks into reflex so the complex thought can focus.
Your mortgage gets paid by a standing order, not a committee. The things that are not fun, not interesting, but have to be done? Done. Most agent frameworks will fail because they treat all cognition as high cognition.
The winners will nail the boring stuff first.
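One way to read the cerebellum idea in code is a dispatch table consulted before any planner: known, boring task kinds go to deterministic handlers, and only unmatched work escalates to the expensive reasoning loop. The task kinds, handlers, and `planner` callback below are hypothetical placeholders, a sketch of the pattern rather than any framework's API.

```python
from typing import Callable, Dict, Optional

Reflex = Callable[[dict], str]

# Hypothetical "reflexes": routine tasks with a fixed, deterministic handler.
REFLEXES: Dict[str, Reflex] = {
    "pay_mortgage": lambda task: f"scheduled payment of {task['amount']}",
    "rotate_logs": lambda task: "logs rotated",
}


def handle(task: dict, planner: Callable[[dict], str]) -> str:
    """Run a reflex if one matches the task kind; otherwise escalate to the planner."""
    reflex: Optional[Reflex] = REFLEXES.get(task["kind"])
    if reflex is not None:
        return reflex(task)  # cheap path: no model call, same result every time
    return planner(task)  # novel work goes to the expensive reasoning loop
```

The design choice is ordering: the cheap lookup always runs first, so high cognition is spent only on tasks that have no standing order.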
Jaymie Jones reposted

An Australian family in Perth just sat down and did the maths the government hoped you’d never do.
Cost to buy & own a home over 34 years: $2,016,850
Taxes paid to the government over the same period: $2,717,865
You paid more in tax than for your own house. Let that sink in.
Breakdown:
• $2.2M in income taxes, GST, duties & excises
• $105k in council rates
• $94k in vehicle taxes across 7 cars
• $300k in tax on your super (the money meant for retirement)
And what’s the big relief in the 2025 Budget? A $268 tax cut.
That’s $5.15 a week — less than a pie and a beer.
You’re not bad with money. You’re being taxed into the ground.
I love this country, but I’m bloody tired of everyday Aussies working their whole lives just to hand over more to the government than they spend on their home — while those collecting it face zero consequences.
The numbers don’t lie.
Time to prepare, protect and future-proof your family. The fighting spirit is needed now more than ever.
(Martene Wallace on Instagram)
What do you think? 🇦🇺
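The post's headline arithmetic can be reproduced from its own figures. This only checks the sums the post states; it does not verify the underlying estimates, and the breakdown items are rounded, so they total slightly less than the stated tax figure.

```python
home_cost = 2_016_850   # buying and owning the home over 34 years
taxes_paid = 2_717_865  # all taxes paid over the same period

# Breakdown as listed in the post (rounded figures).
breakdown = {
    "income taxes, GST, duties & excises": 2_200_000,
    "council rates": 105_000,
    "vehicle taxes (7 cars)": 94_000,
    "tax on super": 300_000,
}

gap = taxes_paid - home_cost               # 701,015 more went to tax than to the house
breakdown_total = sum(breakdown.values())  # 2,699,000, below taxes_paid due to rounding
weekly_cut = round(268 / 52, 2)            # the $268 annual tax cut per week: 5.15
```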
Jaymie Jones reposted

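Google has published its internal engineers' code review guidelines
This is close to the highest standard in the industry right now
Many programmers only know how to write code, not how to review it; see how Google does it
1. Two-way guide: it teaches reviewers how to spot problems, and also teaches authors how to write code that gets through review easily
2. Terminology explained: it explains what LGTM ("looks good to me") and CL (changelist), terms used widely inside Google, actually mean
3. Practical value: these guidelines aren't theory but the working rules every Google engineer actually follows
If you want to raise your team's code quality, or see where the bar sits at top tech companies, this document is a must-read!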
github.com/google/eng-pra…
Jaymie Jones reposted

if you want to design with AI agents, these skills are amazing
- impeccable impeccable.style
- taste tasteskill.dev
- layers layers.jamiemill.com
- superdesign app.superdesign.dev
I also made a plugin based on Refactoring UI (use for polish): github.com/gnurio/refacto…
find more here: prodmgmt.world/resources
Jaymie Jones reposted

instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B Chinese AI company
it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale
useful whether you've never built an agent in your life or have been using Claude every day for the past year
I took the key ideas and turned them into a practical guide on how to actually build with Kimi
find it below
Kirill @kirillk_web3
Jaymie Jones reposted

The 10 fastest growing GitHub repos this week:
1. codegraph (+14.1K stars)
Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local
github.com/colbymchenry/c…
2. openhuman (+17.1K stars)
Your Personal AI super intelligence. Private, Simple and extremely powerful.
github.com/tinyhumansai/o…
3. academic-research-skills (+11.6K stars)
Academic Research Skills for Claude Code: research → write → review → revise → finalize
github.com/Imbad0202/acad…
4. RuView (+6.8K stars)
π RuView turns commodity WiFi signals into real-time spatial intelligence, vital sign monitoring, and presence detection — all without a single pixel of video.
github.com/ruvnet/RuView
5. agentmemory (+6.9K stars)
#1 Persistent memory for AI coding agents based on real-world benchmarks
github.com/rohitg00/agent…
6. supertonic (+3.6K stars)
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
github.com/supertone-inc/…
7. CloakBrowser (+7.0K stars)
Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.
github.com/CloakHQ/CloakB…
8. ViMax (+2.7K stars)
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
github.com/HKUDS/ViMax
9. 12-factor-agents (+1.9K stars)
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
github.com/humanlayer/12-…
10. bun (+2.0K stars)
Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
github.com/oven-sh/bun
The theme this week: agent memory, context efficiency, and on-device intelligence are making AI infrastructure the hottest build category.
Bookmark this. Next week's list will look completely different.

Jaymie Jones reposted

My brain is reeling with the implications. I keep having these revelations and I'm beginning to wonder when they will stop.
It turns out that property testing is yet another hardening technique that the agents can profitably engage. Agents can determine whether a function is appropriate for property testing, and can specify the range and domain of those tests. They can implement them quickly, run them, and fix any detected issues.
I just found two production bugs this way. Property testing is going to be part of my normal practice, along with CRAP analysis, function mutation, acceptance test mutation, DRY analysis, etc.
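A property test checks an invariant over many generated inputs rather than a few hand-picked cases. The sketch below uses only the standard library so it stays self-contained (libraries such as Hypothesis add input shrinking and smarter generation). The run-length encoder is a hypothetical function under test; the input domain (short strings over a small alphabet) and the output properties are the kind of choices the post says an agent can make.

```python
import random
from itertools import groupby
from typing import List, Tuple


def rle_encode(s: str) -> List[Tuple[str, int]]:
    """Hypothetical function under test: run-length encode a string."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)


def check_roundtrip(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        # Domain: strings of length 0..20 over a tiny alphabet, so long runs are common.
        s = "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))
        pairs = rle_encode(s)
        # Property 1: decoding undoes encoding.
        assert rle_decode(pairs) == s, f"roundtrip failed for {s!r}"
        # Property 2: every run length is positive.
        assert all(n > 0 for _, n in pairs), f"empty run for {s!r}"
        # Property 3: adjacent runs never repeat a character (runs are maximal).
        assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:])), f"split run for {s!r}"
```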
Jaymie Jones reposted

A 6-person team is building task-specific AI models that are 4-8x faster than anything from OpenAI or Anthropic. 500K downloads on HuggingFace. No hype. Just better engineering winning on the merits.
This is what "make something people want" looks like in the model layer.
zeroentropy.dev






