
Augment Code @augmentcode:
We added @karpathy-inspired coding rules from @jiayuan_jy to AGENTS.md and ran 40 @openclaw PRs through three coding agents. The result: code quality was basically unchanged, but the agents got there with less work: fewer tool calls, lower time and cost.

Augment Code @augmentcode:
The setup (a sketch of the run matrix follows the list):
✔️ 40 hand-selected PRs from OpenClaw, mid-complexity (100–300 LOC excluding tests)
✔️ Three runners: Auggie on Opus 4.7, Claude Code on Opus 4.7, Codex on GPT-5.4
✔️ Two variants per PR: baseline AGENTS.md (~18K chars) vs. AGENTS-karpathy.md (~20.5K chars)
✔️ 6 runs per config, 18 repeats total per individual PR
✔️ Scored by an LLM judge on completeness, correctness, best practices, code reuse, and unsolicited documentation
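The thread doesn't publish its harness, but the run matrix it describes could be driven with a loop like this minimal sketch. Everything here is assumed for illustration: `run_agent`, `judge`, and the trajectory field names are hypothetical stand-ins, not Augment's actual code.

```python
import itertools
import random

# Hypothetical experiment matrix mirroring the setup described above.
PRS = [f"pr-{i:02d}" for i in range(40)]            # 40 hand-selected PRs
RUNNERS = ["auggie/opus-4.7", "claude-code/opus-4.7", "codex/gpt-5.4"]
VARIANTS = ["AGENTS.md", "AGENTS-karpathy.md"]      # baseline vs. Karpathy rules
RUNS_PER_CONFIG = 6

def run_agent(pr, runner, variant):
    """Stand-in for one agent run; a real harness would drive the agent CLI
    here and extract these metrics from its trajectory."""
    return {
        "tool_calls": random.randint(20, 80),
        "output_tokens": random.randint(5_000, 40_000),
        "wall_time_s": random.uniform(60, 600),
        "cost_usd": random.uniform(0.5, 5.0),
    }

def judge(trajectory):
    """Stand-in for the LLM judge scoring completeness, correctness,
    best practices, code reuse, and unsolicited documentation."""
    return random.uniform(0, 10)

results = []
for pr, runner, variant in itertools.product(PRS, RUNNERS, VARIANTS):
    for _ in range(RUNS_PER_CONFIG):
        traj = run_agent(pr, runner, variant)
        results.append({"pr": pr, "runner": runner, "variant": variant,
                        "score": judge(traj), **traj})
```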

Augment Code @augmentcode:
The Karpathy rules are the kind of thing you'd write at the top of a team style guide. Here is the summary (an illustrative AGENTS.md excerpt follows the list):
1. Think before coding: state assumptions, surface tradeoffs, ask if unclear
2. Simplicity first: no speculative features, no single-use abstractions, minimum code
3. Surgical changes: don't touch adjacent code, match existing style, no drive-by refactors
4. Goal-driven execution: define verifiable success criteria, loop until met
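As an AGENTS.md section, the four rules might read something like the excerpt below. This is a paraphrase for illustration, not the actual ~2.5K characters added in the experiment:

```markdown
## Coding rules (Karpathy-inspired)

1. **Think before coding.** State your assumptions, surface tradeoffs,
   and ask when the task is unclear.
2. **Simplicity first.** No speculative features, no single-use
   abstractions; write the minimum code that solves the task.
3. **Surgical changes.** Don't touch adjacent code, match the existing
   style, and make no drive-by refactors.
4. **Goal-driven execution.** Define verifiable success criteria and
   loop until they are met.
```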

Augment Code @augmentcode:
Quality: basically unchanged for Auggie and Codex. Claude Code dropped 0.07, with more conservative trajectories and ~5% fewer files touched per task.

Karpathy-style guidelines don't transfer uniformly across agent harnesses and repositories. In Codex, the guidelines likely add useful structure, improving efficiency. In Augment, the baseline prompt already encodes similar constraints, so the marginal impact is smaller. In Claude Code, the system prompt may already be highly constrained, so layering additional constraints could reduce exploration and degrade performance.

Augment Code @augmentcode:
Every runner used fewer tool calls and finished faster: the agents found what they needed in fewer lookups, and output tokens fell by similar margins across runners. Per PR, the Karpathy variant was faster and cheaper on about 30 of 40, and the pattern held across all three agents (a tally sketch follows). A 3–10% efficiency gain from a small prompt change isn't a model breakthrough, but if you're running a coding agent at scale, it's real money, real latency, and real capacity.
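A per-PR "faster and cheaper" tally like the one reported here could be computed from per-run records such as the hypothetical `results` list in the harness sketch above. The field names are assumptions carried over from that sketch, not from the thread:

```python
from collections import defaultdict
from statistics import mean

# Pool runs per (PR, variant) across runners, then count PRs where the
# Karpathy variant wins on both mean cost and mean wall time.
pooled = defaultdict(list)
for r in results:  # `results` from the hypothetical harness sketch above
    pooled[(r["pr"], r["variant"])].append(r)

wins, total = 0, 0
for pr in {pr for pr, _ in pooled}:
    base = pooled[(pr, "AGENTS.md")]
    karp = pooled[(pr, "AGENTS-karpathy.md")]
    cheaper = mean(r["cost_usd"] for r in karp) < mean(r["cost_usd"] for r in base)
    faster = mean(r["wall_time_s"] for r in karp) < mean(r["wall_time_s"] for r in base)
    wins += cheaper and faster
    total += 1

print(f"Karpathy variant faster AND cheaper on {wins} of {total} PRs")
```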