Jaymie Jones

15.6K posts

@pixelstackcom

Engineering Manager @canva - Connect API - Canva MCP, former Engineering Manager @envato. Host @codercatchup. Build with code and teams. ❤️ Photo & Video

Joined February 2010

2.8K Following · 613 Followers

Jaymie Jones reposted

Serena Ge (Datacurve)@serenaa_ge·22h

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

English

392

573

4.6K

1.2M

Jaymie Jones reposted

Peter Steinberger 🦞@steipete·2d

Folks: when you write skills, ask your agent to be token efficient, relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context. I wrote a skill that finds the worst offenders. github.com/steipete/agent…

English

183

389

4.9K

305.7K
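
The point in the post above about oversized skill descriptions can be illustrated with a minimal sketch. This is not the tool linked in the post; the function name `worst_offenders`, the 40-word budget, and the use of whitespace-separated words as a rough stand-in for tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def worst_offenders(descriptions: Dict[str, str],
                    budget_words: int = 40) -> List[Tuple[str, int]]:
    """Rank skills whose description exceeds a word budget, largest first.

    Word count is only a rough proxy for token count (assumption);
    a real tokenizer would give different numbers.
    """
    sizes = {name: len(text.split()) for name, text in descriptions.items()}
    over = [(name, n) for name, n in sizes.items() if n > budget_words]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical skill names and descriptions.
    skills = {
        "format-dates": "Format dates consistently.",
        "release-notes": "word " * 120,
    }
    for name, words in worst_offenders(skills):
        print(f"{name}: {words} words in description")
```

Since every description is loaded into every context, a per-skill budget like this is one simple way to spot the files worth trimming first.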

Jaymie Jones reposted

Yifan Yang@Yif_Yang·2d

🚀 Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter. Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization. SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability. We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings. Train the skill, not the model. 🛠️🤖 🌐 aka.ms/skillopt 📄 huggingface.co/papers/2605.23…

English

102

819

80.3K

Jaymie Jones reposted

Sergey Nazarov@sergeynazarovx·11 May

We used to go to a special website, ask strangers for help with programming, and get humiliated in return

English

304

3.5K

39.5K

873.4K

Jaymie Jones reposted

Muratcan Koylan@koylanai·1d

Gradient descent for SKILL.md files sounds interesting, maybe a bit complex, but it's becoming a real part of the agent harness. SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them. A few things I learned that you should consider too.
1. The validation gate is the only thing that matters in a self-editing loop. Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.
2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot. Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.
3. Compactness wins. Median final skill: ~920 tokens. Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.
4. The harness is becoming less important; the skill is becoming more important. A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.
5. Frozen model + trained context is the practical adaptation. GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.
6. Verification is the bottleneck. Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.
Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured. The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state
What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing. If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: arxiv.org/pdf/2605.23904

English

214

2.1K

702.5K
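
The validation gate and edit budget described in the thread above (strict improvement on a held-out set, ties rejected, a capped number of edits per step) can be sketched roughly as follows. This is not SkillOpt's implementation; `Edit`, `apply_bounded`, `gated_update`, and `toy_score` are hypothetical names, and the keyword-counting grader is a stand-in for a real auto-grader.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: an edit maps skill text to new skill text.
Edit = Callable[[str], str]


def apply_bounded(skill: str, edits: Sequence[Edit], max_edits: int = 8) -> str:
    """Apply at most max_edits edits; the cap is the textual analog of a learning rate."""
    for edit in edits[:max_edits]:
        skill = edit(skill)
    return skill


def gated_update(skill: str, edits: Sequence[Edit],
                 score: Callable[[str], float],
                 max_edits: int = 8) -> Tuple[str, bool]:
    """Keep the candidate only if it strictly beats the current skill on held-out data."""
    candidate = apply_bounded(skill, edits, max_edits)
    if score(candidate) > score(skill):  # strict: a tie is rejected
        return candidate, True
    return skill, False


def toy_score(skill: str, required: Sequence[str]) -> int:
    """Toy held-out grader (assumption): count required phrases present in the skill."""
    return sum(phrase in skill for phrase in required)


if __name__ == "__main__":
    required = ["check headers", "validate totals"]
    grade = lambda s: toy_score(s, required)
    skill = "check headers"
    skill, accepted = gated_update(skill, [lambda s: s + "\nvalidate totals"], grade)
    print(accepted)
    skill, accepted = gated_update(skill, [lambda s: s + "\nbe thorough"], grade)
    print(accepted)
```

In this toy run the first edit adds a phrase the grader rewards and is accepted; the second changes the text without improving the score, so the gate keeps the previous skill.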

Jaymie Jones@pixelstackcom·2d

Jaymie Jones reposted

Lee Robinson@leerob·2d

You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI-generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue for trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.

English

263

525

4.1K

583.9K

Jaymie Jones reposted

Ben Dicken@BenjDicken·3d

Every engineer should read this. The principles for building reliable software systems have been around for a long time. Max outlines them beautifully. Here's to getting that 99.99% on your status page. planetscale.com/blog/the-princ…

English

167

1.7K

108.8K

Jaymie Jones reposted

0xSero@0xSero·3d

First thing I do on a new machine is install Opencode. Cause I can get the free models to get the system set up and open the box to my tailscale before I log into anything else.

English

1.1K

62.8K

Jaymie Jones reposted

Garry Tan@garrytan·2d

Everyone building AI agents is focusing on building the prefrontal cortex. Planning. Reasoning. Multi-step chains. There's value here. CEO-stuff. But also, a reframe: there is value in building the cerebellum. It's offloading boring tasks into reflex so the complex thought can focus. Your mortgage gets paid by a standing order, not a committee. The things that are not fun, not interesting, but have to be done? Done. Most agent frameworks will fail because they treat all cognition as high cognition. The winners will nail the boring stuff first.

English

328

257

3.2K

194.1K

Jaymie Jones reposted

Gav Meister@GavinBrx·4d

An Australian family in Perth just sat down and did the maths the government hoped you’d never do. Cost to buy & own a home over 34 years: $2,016,850 Taxes paid to the government over the same period: $2,717,865 You paid more in tax than for your own house. Let that sink in. Breakdown: • $2.2M in income taxes, GST, duties & excises • $105k in council rates • $94k in vehicle taxes across 7 cars • $300k in tax on your super (the money meant for retirement) And what’s the big relief in the 2025 Budget? A $268 tax cut. That’s $5.15 a week — less than a pie and a beer. You’re not bad with money. You’re being taxed into the ground. I love this country, but I’m bloody tired of everyday Aussies working their whole lives just to hand over more to the government than they spend on their home — while those collecting it face zero consequences. The numbers don’t lie. Time to prepare, protect and future-proof your family. The fighting spirit is needed now more than ever. (Martene Wallace on Instagram) What do you think? 🇦🇺

English

236

1.1K

4.3K

121K

Jaymie Jones reposted

nini@nini_incrypto_·4d

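Google has made its internal code review guidelines for engineers public. This is arguably the highest standard in the industry today. Many programmers only know how to write code, not how to review it, so it is worth seeing how Google does it. 1. Two-way guide: it teaches reviewers how to find problems, and it teaches authors how to write code that passes review easily. 2. Terminology explained: it explains what LGTM (looks good to me) and CL (changelist), both commonly used inside Google, actually mean. 3. Practical value: these guidelines are not theory but the working practice every Google engineer follows. If you want to raise your team's code quality, or want to know the engineering bar at top-tier companies, this document is a must-read! github.com/google/eng-pra…

Chinese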

608

3.8K

267.6K

Jaymie Jones reposted

George from 🕹prodmgmt.world@nurijanian·3d

if you want to design with AI agents, these skills are amazing
- impeccable impeccable.style
- taste tasteskill.dev
- layers layers.jamiemill.com
- superdesign app.superdesign.dev
I also made a plugin based on Refactoring UI (use for polish): github.com/gnurio/refacto…
find more here: prodmgmt.world/resources

English

143

131.6K

Jaymie Jones@pixelstackcom·3d

GIF

Jaymie Jones reposted

Kirill@kirillk_web3·5d

instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B China AI company it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale useful whether you've never built an agent in your life or have been using Claude every day for the past year I took the key ideas and turned them into a practical guide on how to actually build with Kimi find it below

Kirill@kirillk_web3

x.com/i/article/2056…

English

2.2K

16.9K

13.4M

Jaymie Jones reposted

Sharbel@sharbel·4d

The 10 fastest growing GitHub repos this week:
1. codegraph (+14.1K stars) Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local github.com/colbymchenry/c…
2. openhuman (+17.1K stars) Your Personal AI super intelligence. Private, Simple and extremely powerful. github.com/tinyhumansai/o…
3. academic-research-skills (+11.6K stars) Academic Research Skills for Claude Code: research → write → review → revise → finalize github.com/Imbad0202/acad…
4. RuView (+6.8K stars) π RuView turns commodity WiFi signals into real-time spatial intelligence, vital sign monitoring, and presence detection — all without a single pixel of video. github.com/ruvnet/RuView
5. agentmemory (+6.9K stars) #1 Persistent memory for AI coding agents based on real-world benchmarks github.com/rohitg00/agent…
6. supertonic (+3.6K stars) Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX. github.com/supertone-inc/…
7. CloakBrowser (+7.0K stars) Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed. github.com/CloakHQ/CloakB…
8. ViMax (+2.7K stars) "ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)" github.com/HKUDS/ViMax
9. 12-factor-agents (+1.9K stars) What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? github.com/humanlayer/12-…
10. bun (+2.0K stars) Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one github.com/oven-sh/bun
The theme this week: agent memory, context efficiency, and on-device intelligence are making AI infrastructure the hottest build category. Bookmark this. Next week's list will look completely different.

English

180

1.6K

131.7K

Jaymie Jones reposted

Uncle Bob Martin@unclebobmartin·4d

My brain is reeling with the implications. I keep having these revelations and I'm beginning to wonder when they will stop. It turns out that property testing is yet another hardening technique that the agents can profitably engage. Agents can determine whether a function is appropriate for property testing, and can specify the range and domain of those tests. They can implement them quickly, run them, and fix any detected issues. I just found two production bugs this way. Property testing is going to be part of my normal practice, along with Crap analysis, Function mutation, acceptance test mutation, Dry analysis, etc.

English

851

180.2K
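
For readers unfamiliar with the property testing mentioned in the post above, here is a minimal sketch using only the standard library. The run-length encoder, the `check_property` helper, and the generator are hypothetical examples chosen for illustration, not anything from the post; dedicated libraries add shrinking and richer generators on top of the same idea.

```python
import random
from itertools import groupby
from typing import Callable, List, Optional, Tuple


def rle_encode(text: str) -> List[Tuple[str, int]]:
    """Run-length encode a string into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    """Invert rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def random_text(rng: random.Random) -> str:
    """Generator for the input domain: short strings over a two-letter alphabet."""
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))


def check_property(prop: Callable[[str], bool],
                   gen: Callable[[random.Random], str],
                   trials: int = 500, seed: int = 0) -> Optional[str]:
    """Return the first generated input that violates prop, or None if all pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def round_trip(text: str) -> bool:
    """Property: decoding an encoding gives back the original string."""
    return rle_decode(rle_encode(text)) == text


if __name__ == "__main__":
    print(check_property(round_trip, random_text))
```

Instead of hand-picking a few inputs, the test states an invariant and lets random generation hunt for a counterexample, which is why it tends to surface edge cases that example-based tests miss.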

Jaymie Jones reposted

Garry Tan@garrytan·4d

A 6-person team is building task-specific AI models that are 4-8x faster than anything from OpenAI or Anthropic. 500K downloads on HuggingFace. No hype. Just better engineering winning on the merits. This is what "make something people want" looks like in the model layer. zeroentropy.dev

English

119

272

2.8K

386.5K

Jaymie Jones@pixelstackcom·3d

Legends 🙌

Tibo@thsottiaux

Some of you noticed limits drained faster in Codex, we root caused it to an optimization that we rolled back that had an impact on cache hit rates when compacting across long running sessions. We fixed this and have now reset usage limits for all accounts. Enjoy the weekend.

English

Jaymie Jones@pixelstackcom·4d

Woah!

DeepSeek@deepseek_ai

We are making our discount permanent! 🎉 Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀

English
