Jaymie Jones

15.6K posts

@pixelstackcom

Engineering Manager @canva - Connect API - Canva MCP, former Engineering Manager @envato. Host @codercatchup. Build with code and teams. ❤️ Photo & Video

Joined February 2010
2.8K Following · 613 Followers
Jaymie Jones retweeted
Serena Ge (Datacurve) @serenaa_ge
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
English
392
573
4.6K
1.2M
Jaymie Jones retweeted
Peter Steinberger 🦞
Folks: when you write skills, ask your agent to be token efficient, relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context. I wrote a skill that finds the worst offenders. github.com/steipete/agent…
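The idea of finding the "worst offenders" can be illustrated with a rough sketch. This is not the linked skill, whose contents are not shown here. The skill names and descriptions are hypothetical, and the word count is only a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude proxy for token count: whitespace-separated words (assumption)."""
    return len(text.split())


def worst_offenders(descriptions: dict[str, str], budget: int) -> list[tuple[str, int]]:
    """Return (skill name, estimated tokens) for descriptions over budget, largest first."""
    sized = [(name, estimate_tokens(desc)) for name, desc in descriptions.items()]
    over = [item for item in sized if item[1] > budget]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical skill descriptions, for illustration only.
skills = {
    "pdf-export": "Export documents to PDF.",
    "release-notes": "This skill should be used whenever the user would like to " * 5,
}
print(worst_offenders(skills, budget=20))
```

Because descriptions are loaded into every context, a per-description budget like this is cheap to enforce before a skill ships.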
English
183
389
4.9K
305.7K
Jaymie Jones retweeted
Yifan Yang @Yif_Yang
🚀 Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter. Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization. SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability. We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings. Train the skill, not the model. 🛠️🤖 🌐 aka.ms/skillopt 📄 huggingface.co/papers/2605.23…
English
49
102
819
80.3K
Jaymie Jones retweeted
Sergey Nazarov @sergeynazarovx
We used to go to a special website, ask strangers for help with programming, and get humiliated in return
English
304
3.5K
39.5K
873.4K
Jaymie Jones retweeted
Muratcan Koylan @koylanai
Gradient descent for SKILL.md files sounds interesting, maybe a bit complex, but it's becoming a real part of the agent harness. SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them. A few things I learned that you should consider too.
1. The validation gate is the only thing that matters in a self-editing loop. Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.
2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot. Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.
3. Compactness wins. Median final skill: ~920 tokens. Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.
4. The harness is becoming less important; the skill is becoming more important. A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.
5. Frozen model + trained context is the practical adaptation. GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.
6. Verification is the bottleneck. Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.
Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured.
The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state
What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing.
If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: arxiv.org/pdf/2605.23904
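The first two points of the thread (a strict validation gate and a bounded edit budget) can be sketched in a few lines. This is a minimal illustration, not SkillOpt's actual algorithm. The edit format, the `toy_score` grader, and every function name are hypothetical. The default cap of 8 edits is taken from the "4 to 8 edits per step" figure quoted above.

```python
from typing import Callable, Sequence

Edit = tuple[str, str]  # hypothetical edit format: (text to find, replacement)


def apply_edits(skill: str, edits: Sequence[Edit]) -> str:
    """Apply each edit once, in order, as a plain substring replacement."""
    for old, new in edits:
        skill = skill.replace(old, new, 1)
    return skill


def gated_step(
    skill: str,
    proposed: Sequence[Edit],
    score: Callable[[str], float],
    max_edits: int = 8,
) -> tuple[str, bool]:
    """Return (skill, accepted). Accept only a strict held-out improvement."""
    bounded = list(proposed)[:max_edits]  # cap the diff size
    candidate = apply_edits(skill, bounded)
    if score(candidate) > score(skill):  # a tie is rejected
        return candidate, True
    return skill, False


def toy_score(skill: str) -> float:
    """Stand-in for a held-out auto-grader: counts required phrases present."""
    required = ["validate input", "cite sources"]
    return float(sum(phrase in skill for phrase in required))


skill = "Always be helpful."
skill, accepted = gated_step(
    skill, [("helpful.", "helpful. Always validate input.")], toy_score
)
print(accepted, skill)

# An edit that leaves the score unchanged is a tie, so it is rejected.
same, accepted_again = gated_step(skill, [("Always be", "Usually be")], toy_score)
print(accepted_again, same == skill)
```

In a real loop the score would come from a held-out task set, not a phrase count, which is where the thread's point about verification being the bottleneck applies.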
English
43
214
2.1K
702.5K
Jaymie Jones retweeted
Lee Robinson @leerob
You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue for trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.
English
263
525
4.1K
583.9K
Jaymie Jones retweeted
Ben Dicken @BenjDicken
Every engineer should read this. The principles for building reliable software systems have been around for a long time. Max outlines them beautifully. Here's to getting that 99.99% on your status page. planetscale.com/blog/the-princ…
English
23
167
1.7K
108.8K
Jaymie Jones retweeted
0xSero @0xSero
First thing I do on a new machine is install Opencode. Cause I can get the free models to get the system set up and open the box to my tailscale before I log into anything else.
English
58
37
1.1K
62.8K
Jaymie Jones retweeted
Garry Tan @garrytan
Everyone building AI agents is focusing on building the prefrontal cortex. Planning. Reasoning. Multi-step chains. There's value here. CEO-stuff. But also, a reframe: there is value in building the cerebellum. It's offloading boring tasks into reflex so the complex thought can focus. Your mortgage gets paid by a standing order, not a committee. The things that are not fun, not interesting, but have to be done? Done. Most agent frameworks will fail because they treat all cognition as high cognition. The winners will nail the boring stuff first.
English
328
257
3.2K
194.1K
Jaymie Jones retweeted
Gav Meister @GavinBrx
An Australian family in Perth just sat down and did the maths the government hoped you’d never do. Cost to buy & own a home over 34 years: $2,016,850 Taxes paid to the government over the same period: $2,717,865 You paid more in tax than for your own house. Let that sink in. Breakdown: • $2.2M in income taxes, GST, duties & excises • $105k in council rates • $94k in vehicle taxes across 7 cars • $300k in tax on your super (the money meant for retirement) And what’s the big relief in the 2025 Budget? A $268 tax cut. That’s $5.15 a week — less than a pie and a beer. You’re not bad with money. You’re being taxed into the ground. I love this country, but I’m bloody tired of everyday Aussies working their whole lives just to hand over more to the government than they spend on their home — while those collecting it face zero consequences. The numbers don’t lie. Time to prepare, protect and future-proof your family. The fighting spirit is needed now more than ever. (Martene Wallace on Instagram) What do you think? 🇦🇺
English
236
1.1K
4.3K
121K
Jaymie Jones retweeted
nini @nini_incrypto_
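Google has made its internal engineers' code review guidelines public. This is almost the highest standard in the industry right now. Many programmers only know how to write code, not how to review it, so take a look at how Google does it. 1. Two-way guide: it teaches reviewers how to find problems and also teaches authors how to write code that passes review easily. 2. Terminology explained: it explains what LGTM (looks good to me) and CL (changelist), both in common use inside Google, actually mean. 3. Practical value: these guidelines are not theory but the working rules every Google engineer actually uses. If you want to raise your team's code quality, or want to know the engineering bar at a top tech company, this document is a must-read! github.com/google/eng-pra…
Chinese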
31
608
3.8K
267.6K
Jaymie Jones retweeted
Kirill @kirillk_web3
instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B China AI company it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale useful whether you've never built an agent in your life or have been using Claude every day for the past year I took the key ideas and turned them into a practical guide on how to actually build with Kimi find it below
Kirill@kirillk_web3

x.com/i/article/2056…

English
97
2.2K
16.9K
13.4M
Jaymie Jones retweeted
Sharbel @sharbel
The 10 fastest growing GitHub repos this week:
1. codegraph (+14.1K stars) Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent: fewer tokens, fewer tool calls, 100% local github.com/colbymchenry/c…
2. openhuman (+17.1K stars) Your Personal AI super intelligence. Private, Simple and extremely powerful. github.com/tinyhumansai/o…
3. academic-research-skills (+11.6K stars) Academic Research Skills for Claude Code: research → write → review → revise → finalize github.com/Imbad0202/acad…
4. RuView (+6.8K stars) π RuView turns commodity WiFi signals into real-time spatial intelligence, vital sign monitoring, and presence detection, all without a single pixel of video. github.com/ruvnet/RuView
5. agentmemory (+6.9K stars) #1 Persistent memory for AI coding agents based on real-world benchmarks github.com/rohitg00/agent…
6. supertonic (+3.6K stars) Lightning-Fast, On-Device, Multilingual TTS, running natively via ONNX. github.com/supertone-inc/…
7. CloakBrowser (+7.0K stars) Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed. github.com/CloakHQ/CloakB…
8. ViMax (+2.7K stars) "ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)" github.com/HKUDS/ViMax
9. 12-factor-agents (+1.9K stars) What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? github.com/humanlayer/12-…
10. bun (+2.0K stars) Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one github.com/oven-sh/bun
The theme this week: agent memory, context efficiency, and on-device intelligence are making AI infrastructure the hottest build category. Bookmark this. Next week's list will look completely different.
English
62
180
1.6K
131.7K
Jaymie Jones retweeted
Uncle Bob Martin @unclebobmartin
My brain is reeling with the implications. I keep having these revelations and I'm beginning to wonder when they will stop. It turns out that property testing is yet another hardening technique that the agents can profitably engage. Agents can determine whether a function is appropriate for property testing, and can specify the range and domain of those tests. They can implement them quickly, run them, and fix any detected issues. I just found two production bugs this way. Property testing is going to be part of my normal practice, along with CRAP analysis, function mutation, acceptance test mutation, DRY analysis, etc.
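For readers who have not met property testing, here is a minimal sketch of the idea. It is not from the tweet: the run-length encoder is a hypothetical function under test, and the input generator is hand-rolled with the standard library rather than a dedicated library such as Hypothesis. Instead of asserting one fixed example, it states properties that must hold for any input in a chosen domain and checks them on many random inputs.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (character, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Invert rle_encode."""
    return "".join(ch * count for ch, count in runs)


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Check the properties on random inputs; return the number of trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Domain: strings of length 0..30 over a small alphabet, so runs are common.
        text = "".join(rng.choice("ab ") for _ in range(rng.randint(0, 30)))
        runs = rle_encode(text)
        assert rle_decode(runs) == text  # property 1: decoding inverts encoding
        assert all(a[0] != b[0] for a, b in zip(runs, runs[1:]))  # property 2: runs are maximal
    return trials


print(check_properties())
```

The small alphabet is a deliberate choice: it makes repeated characters likely, which is the case a run-length encoder is most likely to get wrong.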
English
47
46
851
180.2K
Jaymie Jones retweeted
Garry Tan @garrytan
A 6-person team is building task-specific AI models that are 4-8x faster than anything from OpenAI or Anthropic. 500K downloads on HuggingFace. No hype. Just better engineering winning on the merits. This is what "make something people want" looks like in the model layer. zeroentropy.dev
English
119
272
2.8K
386.5K