Matthew Honnibal

3.8K posts


@honnibal

https://t.co/Xar2caBAyU https://t.co/NLbGVsh4I2 Linkedin: https://t.co/TwM7rRF6W9

Berlin, Germany · Joined May 2008
98 Following · 12.2K Followers
Matthew Honnibal@honnibal·
@kunattila Planning in the web chat is great. The coding assistant is always way too impatient to get started. Getting an .md out of claude.ai and taking it to the agent works really well a lot of the time
Attila Kun@kunattila·
@honnibal Yep, that’s why I start with planning first. Shrink the outcome space a bit.
Matthew Honnibal@honnibal·
The "how to put this in your workflow" bit is where it gets contentious. I don't have a clear answer (and if I did I'd have a tool I'd be trying to sell you, and at that point it'll be hard for you to trust me anyway!).

To me the implication is you don't write things in the AGENTS.md that are small transformations over a single step of generation, because then it's trying to optimise for following the style advice jointly with trying to solve the problem. You want the clearest reasoning you can get about stuff like "understand the bug", "don't reward hack" etc. Once the code is down, it's very easy to do a style transformation like "one expression per line".

I have various skills I run across the repo periodically: github.com/honnibal/claud… . For instance, my thoughts on try/except are complicated, and I get vastly better performance on that if it's focussed on that decision instead of trying to get it right while it's also trying to code. Same with mutation testing etc. I execute these manually because I don't want them cluttering up the context. I keep CLAUDE.md/AGENTS.md absolutely minimal and usually also clear out memories.

There are many other recommendations, and I don't have an evaluation of my strategy. I'm going off intuition and my own experience, which is shaped by the stuff I'm doing. Empirics are really hard on this anyway, because by the time you do a study it's out of date.
Ken Chiu@kjw_chiu·
@honnibal Can you clarify a bit? I understand multiple passes, but then what does that imply about how/where/when you instruct the LLM?
Matthew Honnibal@honnibal·
@5813cf9e38904f Ehh I don't think that's the attitude to bring. We're all just one person each working with very new workflows, with the models changing underneath us. I'm sure he would tell you not to venerate.
Matthew Honnibal@honnibal·
I also think it's interesting that @karpathy 's style preference seems quite different from my own! I actually prefer complex lines a lot of the time, because the intermediate variables introduce more free choices and spread things out more. I have to look to see if the variable is reused later. Obviously there's a limit and dense lines are often pretty bad in ML code, but I definitely wouldn't have a "one op per line" rule in my style guide.
Matthew Honnibal retweeted
𝙳𝚊𝚗@4n68r·
We spent years debating superintelligence and the singularity. The actual threat is a prompt injection in a Markdown file that nobody bothered to sanitize because the vibe was "go fast." Great read from @honnibal honnibal.dev/blog/clownpoca…
Matthew Honnibal@honnibal·
How come @AnthropicAI can't even reply to an issue like this? github.com/anthropics/cla…

The issue claims that the per-domain permissions on their Claude-in-Chrome plugin can be bypassed on disk. This means that if Claude has access to write to this file (under your username, in your home directory), it can bypass the only permission boundary, allowing full take-over of your browser for any site that isn't on their explicit block list (financial institutions etc).

It's not reasonable to rely on the model's decisions as a security model. The binary question is: what could the agent do if some input text convinced it to? And if you install the Claude-in-Chrome plugin, the answer is "take over your whole browser, with all your logged-in sessions".

It's very irresponsible to be shipping this stuff and pushing it as a default while being absolutely nowhere on security. My Claude had the Chrome MCP server on by default, and then it tried to use it and complained that the plugin isn't installed.
Matthew Honnibal@honnibal

It's insane that @AnthropicAI shipped the Claude-in-Chrome integration as a default. The only actual security boundary is per-domain, once you've allowed it to access a domain it can do anything. If you're building a web app just get it to generate a Playwright-based MCP tool

Matthew Honnibal@honnibal·
It's insane that @AnthropicAI shipped the Claude-in-Chrome integration as a default. The only actual security boundary is per-domain, once you've allowed it to access a domain it can do anything. If you're building a web app just get it to generate a Playwright-based MCP tool
Matthew Honnibal@honnibal·
The lack of regulation on this and deep fakes is crazy. Today I saw a whole long deep fake of "Bill Clinton" criticising the Iran war... These major cases (public individual, topical comments) would be so so so easy to prevent. But nope, crickets.
Mitchell Hashimoto@mitchellh

It's so insanely disrespectful for an AI agent to talk to real people without consent or at least disclosure. This is the type of stuff I'm hugely supportive of government regulation. The FCC must expand the definition of robocalling and TCPA-style regulation to online AI.

Eran Hirsch@hirscheran·
@honnibal True, I will add them all and see how many false positives I get. My biggest concern when working with claude code is when it inserts logic for silently swallowing errors.
Matthew Honnibal@honnibal·
What are the big tells you see when Claude Code is coping, or just making bad choices? One that stands out to me is when it calls something a "belt-and-suspenders" approach. This basically means it's got two overlapping mechanisms for the same thing, which is never what I want. Another is when it refers to an approach as "defensive". I find this is always the opposite of actual defensive programming. Defensive programming is about ensuring you're in exactly the state you think you are. Claude Code is always trying to continue through errors.
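The tells collected in this thread lend themselves to a mechanical check over transcripts. A toy sketch (the phrase list and function name are mine, not any existing analyzer):

```python
# Tell phrases discussed in this thread; purely illustrative.
RED_FLAGS = [
    "belt-and-suspenders",
    "defensive",
    "fallback",
    "for backwards compatibility",
    "pre-existing issue",
    "major refactor",
]

def flag_transcript(text):
    """Return the red-flag phrases that appear in an agent transcript."""
    lowered = text.lower()
    return [flag for flag in RED_FLAGS if flag in lowered]
```

A substring scan like this is crude, but it is enough to surface sessions worth a closer look.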
Matthew Honnibal@honnibal·
@hirscheran Nice! Maybe also some of the "not my problem" things it says, like "pre-existing issue"? "Major refactor" is another thing the workshy little c...omputer says to weasel out of tasks
Eran Hirsch@hirscheran·
@honnibal I like this idea, I added red flags to the claude sessions analyzer
Matthew Honnibal@honnibal·
@vlthr It just started telling me about the "fallback" it's left in. How could I forget.
Valthor@vlthr·
@honnibal “For backwards compatibility”, usually meaning it forgot to update all references to something it changed and decided to keep both implementations
Matthew Honnibal retweeted
Archie Sengupta@archiexzzz·
i spent a few hours going through /karpathy/autoresearch repo line by line. the "ai agents doing research" angle is what's getting all the attention but i think the more interesting thing is what's actually inside the training script and the engineering decisions that make the search loop tight. it's one of the most dense single-file training setups i've read.

let me start with the thing that makes the whole project possible: the time budget is fixed at 300 seconds wall clock. not fixed steps, not fixed tokens, not fixed flops. wall clock seconds. this sounds like a minor detail but it's the entire reason the autonomous loop works. the agent can make the model 3x bigger, cut the batch size in half, swap in a completely different architecture, and the result is still directly comparable to every other experiment because they all got exactly 5 minutes of training on the same gpu. if you fixed steps instead, a bigger model would get fewer gradient updates per second and you'd be penalizing it unfairly. if you fixed tokens, you'd have the same problem. fixing wall time means you're asking the right question: given this hardware and this much time, what is the best model you can produce? everything else is a free variable. the agent can explore the full pareto surface of model size vs throughput vs convergence speed without any of those tradeoffs being confounded by the evaluation protocol.

the metric is also carefully chosen. it's bits per byte, not cross entropy loss. cross entropy depends on your vocab size. a model with 32k tokens and a model with 8k tokens will have very different loss values even if they compress the data equally well. bpb normalizes this away by summing the per-token cross entropy in nats, summing the utf-8 byte lengths of the target tokens, and converting nats-per-byte to bits-per-byte. so even if the agent changes something that affects the effective token distribution, the comparison remains fair.
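the bpb conversion described above fits in a few lines. a sketch with hypothetical names, not the repo's actual code:

```python
import math

def bits_per_byte(token_nats, target_byte_lens):
    """Vocab-invariant loss: sum the per-token cross entropy in nats,
    sum the utf-8 byte lengths of the targets, and convert
    nats-per-byte to bits-per-byte (divide by ln 2)."""
    total_nats = sum(token_nats)
    total_bytes = sum(target_byte_lens)
    return (total_nats / total_bytes) / math.log(2)
```

a model that assigns each 1-byte token a loss of ln(2) nats scores exactly 1.0 bpb, whatever its vocab size.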
these two choices, fixed wall time and a vocab-invariant metric, turn what would be a messy incomparable search into a clean optimization problem.

now the model itself. it's a GPT but with a bunch of modern tricks that are worth understanding. first, RMSnorm everywhere: on the block inputs (pre-norm), and also on queries and keys right before the attention dot product. this QK-norm thing is important because without it the norms of q and k can grow unboundedly during training, causing attention logits to sharpen and softmax to saturate. normalizing q and k keeps the dot products in a stable range regardless of how deep the network is or how training dynamics evolve.

the attention itself is FA 3, loaded through the kernels library. it uses varunneal's implementation on hopper (sm_90) and falls back to a community build on older gpus. the attention pattern is "SSSL", which means three layers of sliding window attention (window = half the sequence length) followed by one layer of full causal attention, repeating. this is the sparse-to-dense pattern you see in mistral and gemma2. the local attention layers are computationally cheap because the attention matrix is banded, and the periodic global layer lets information flow across the full context. with 8 layers and a 4-character pattern you get layers 0,1,2 local, layer 3 global, layers 4,5,6 local, layer 7 global. the last layer is forced global regardless of pattern.

the value embedding thing is subtle and i think underappreciated. every other layer gets its own embedding table, completely separate from the main token embedding, that maps token ids directly to value-dimension vectors. these get mixed into the attention values through a learned gate: v = v + 2 * sigmoid(W_gate @ x[:32]) * ve. the gate weight is zero-initialized, so sigmoid(0) = 0.5, times 2 gives 1.0, which is a neutral starting point.
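the SSSL layout is simple enough to sketch. a guess at the logic (names are mine, not the repo's):

```python
def attention_pattern(n_layers, pattern="SSSL"):
    """Per-layer attention kind: 'S' = sliding-window (local),
    'L' = full causal (global). The pattern repeats over the layers
    and the final layer is forced global, as described above."""
    kinds = [pattern[i % len(pattern)] for i in range(n_layers)]
    kinds[-1] = "L"  # last layer always global, regardless of pattern
    return kinds
```

with 8 layers this reproduces the 0,1,2 local / 3 global / 4,5,6 local / 7 global layout from the thread.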
over training the model can learn to amplify or suppress the value embedding per-head based on the first 32 dimensions of the hidden state. this is from the ResFormer line of work and the intuition is that it gives attention a direct shortcut to token identity. the value vectors can carry information about "what token is at this position" without that information having to survive the residual stream transformations from earlier layers. it's essentially a skip connection from the input directly into the attention values, gated so the model can decide when it's useful.

there are also per-layer learnable scalars on the residual stream: x = lambda_resid[i] * x + lambda_x0[i] * x0, where x0 is the normalized embedding from layer 0. every layer can independently control how much it listens to the running residual vs the original input. the residual lambdas start at 1.0, the x0 lambdas start at 0.1. this is a soft version of the "disentangled residual" idea. in a standard transformer the residual stream is a sum of all previous layer outputs and it gets increasingly polluted as you go deeper. giving each layer access to the clean original embedding means it doesn't have to learn to "undo" earlier layers to recover low-level information.

the logits are softcapped at 15 via tanh(logits/15)*15, which prevents the model from being overconfident early in training when the representations are still noisy.

but honestly the most interesting part of the whole file is the optimizer. MuonAdamW is a combined optimizer that dispatches different update rules based on parameter group. embeddings (token embedding, value embeddings, unembedding head) and per-layer scalars get standard AdamW with different learning rates for each group. the spread is wild. embedding lr is 0.6, unembedding lr is 0.004, that's a 150x difference, and it's intentional. the embedding matrix sees every single token and needs to update aggressively.
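two of the small formulas above, sketched as plain functions (names are mine):

```python
import math

def softcap(logit, cap=15.0):
    """tanh soft cap from the thread: squashes any logit into (-cap, cap),
    approximately the identity for small logits."""
    return math.tanh(logit / cap) * cap

def scaled_lr(base_lr, d_model, d_ref=768):
    """muP-inspired correction from the thread: scale the embedding-family
    learning rates by (d_model / 768) ** -0.5 as width changes."""
    return base_lr * (d_model / d_ref) ** -0.5
```

so a 4x wider model (d_model = 3072) would halve the 0.6 embedding lr to 0.3 under this rule.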
the unembedding matrix is a linear probe on the final representation and benefits from stability. the embedding, value embedding, and unembedding learning rates are all scaled by (d_model / 768)^(-0.5), which is a muP-inspired correction. as model width changes, those learning rates adjust to keep the feature learning dynamics scale-invariant. the scalar learning rates for the per-layer lambdas are handled separately and don't get this scaling.

the 2D weight matrices in the transformer, attention projections and mlp weights, get Muon, and this is where it gets genuinely interesting. muon takes the gradient, applies nesterov momentum, then runs a newton-schulz iteration to approximate the polar decomposition of the gradient matrix. the polar decomposition factors a matrix G into G = U * S, where U is orthogonal and S is symmetric positive semi-definite. muon computes U, the nearest orthogonal matrix to the gradient, and uses that as the update direction.

the newton-schulz iteration is 5 steps. for tall matrices (more rows than columns), A = X^T @ X then X -> aX + X @ (bA + cA^2). for wide matrices, A = X @ X^T then X -> aX + (bA + cA^2) @ X. the coefficients are hardcoded from a precomputation. they call it "polar express." the whole thing compiles to a single fused kernel via torch.compile.

why does this matter? because for weight matrices the frobenius norm gradient (what adam and sgd use) is geometrically wrong. the "correct" steepest descent direction for a weight matrix is the one that minimizes the loss subject to the constraint that the update has unit spectral norm, not unit frobenius norm. the orthogonal polar factor gives you exactly this. in practice it means muon makes much larger effective updates because it's not wasting step size on scaling the singular values, it keeps only the rotation. this is why muon converges significantly faster than adam on transformer weight matrices.
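the core idea is easy to demo with the classic cubic newton-schulz rule (X -> 1.5 X - 0.5 X X^T X). the repo reportedly uses a tuned quintic with hardcoded coefficients that converges in 5 steps; this simpler variant needs more iterations but lands on the same orthogonal polar factor:

```python
import numpy as np

def orthogonalize(G, steps=15):
    """Approximate the orthogonal polar factor of G via Newton-Schulz.
    Classic cubic iteration for clarity, not the repo's tuned quintic."""
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius scaling => spectral norm <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # drives every singular value toward 1
    return X
```

the iteration pushes each singular value toward 1 while leaving the singular vectors alone, which is exactly the "keep the rotation, discard the scaling" property the thread describes.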
muon does maintain per-element momentum buffers (same shape as the parameters, stacked across each shape group), but unlike adam it doesn't track per-element second moments; its second moment estimates are per-row or per-column after orthogonalization, not per-element.

that's where NorMuon comes in. on top of the base muon there's NorMuon, a variance reduction scheme. after orthogonalization, it computes per-row (or per-column, depending on aspect ratio) second moment estimates, maintains an exponential moving average of those, and rescales the update so each output dimension gets its own adaptive step size. it's essentially the adam adaptivity idea but applied in the orthogonalized coordinate system rather than the raw parameter space.

the weight decay is also non-standard. it's "cautious", meaning it only decays parameters where the muon update direction agrees with the parameter sign: mask = (g * params) >= 0. this avoids the known failure mode where weight decay pushes parameters toward zero against the update's wishes, which can destabilize training.

one small detail i appreciated: after the very first training step, the code calls gc.collect(), gc.freeze(), gc.disable() to completely shut off python's garbage collector. python's GC runs periodically and causes ~500ms stalls. when your total budget is 300 seconds and each step is maybe 300ms, a random GC pause costs you almost 2 training steps. they manually trigger gc.collect() every 5000 steps as a compromise. this is the kind of thing you only learn by profiling real training runs and noticing mysterious throughput drops.

the first 11 steps (0 through 10) aren't counted toward the time budget either. that's the warmup where torch.compile does its thing and CUDA kernels get JIT'd. without this exclusion, different experiments would get different amounts of "real" training depending on how long compilation takes for that particular model configuration.
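the cautious mask is one line. a scalar sketch of the rule described above (the real thing is applied elementwise to tensors; names are mine):

```python
def cautious_decay_update(param, update, lr, wd):
    """Apply weight decay only where the update direction agrees with the
    parameter sign (mask = update * param >= 0); otherwise take the plain
    update and skip the decay term entirely."""
    mask = 1.0 if update * param >= 0 else 0.0
    return param - lr * (update + wd * param * mask)
```

when the mask is off, decay never fights the optimizer's chosen direction, which is the failure mode the thread mentions.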
again, a design choice that seems small but is critical for making experiments comparable.

now zoom out. the actual autoresearch loop is: the agent reads program.md (a markdown file that describes its job), modifies train.py, commits, runs for 5 minutes, checks if val_bpb improved, keeps or reverts, repeats. program.md explicitly says "NEVER STOP." the agent runs indefinitely until the human kills it. ~12 experiments per hour, ~100 overnight while you sleep.

the thing i keep coming back to is how tight the constraints make the problem:
> one file to edit.
> one metric to optimize.
> one gpu.
> five minutes.
> no new dependencies allowed.

the search space is large but the evaluation is fast, cheap, and unambiguous. without the fixed time budget the agent would have to reason about compute-performance tradeoffs, which is a much harder problem. without the single-file constraint it could create sprawling multi-file messes that are impossible to revert cleanly. the constraints are what make it work. this is honestly a general lesson in research: the tighter the evaluation protocol, the faster you make progress.
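the decision at the heart of that loop is tiny. a sketch of the keep-or-revert step (the function and callback are illustrative, not the repo's code; revert_fn would wrap something like `git revert`):

```python
def keep_or_revert(best_bpb, new_bpb, revert_fn):
    """Keep the latest commit if val_bpb improved, otherwise call
    revert_fn to roll the experiment back. Returns (best, kept)."""
    if new_bpb < best_bpb:
        return new_bpb, True   # improvement: the commit stays
    revert_fn()                # regression: undo the experiment
    return best_bpb, False
```

everything else in the loop is just plumbing around this comparison, which is why the unambiguous single metric matters so much.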
Matthew Honnibal@honnibal·
@jeremyphoward Wouldn't they RCT the whole protocol, not the individualised vaccine? It's not like you RCT the specific words a therapist says to you
Jeremy Howard@jeremyphoward·
This is a really interesting thread. If we literally already have a cure for (some kinds of) cancer, but can't *prove* it's "safe and effective", should terminally ill patients have an option to use it anyway?
Patrick Heizer@PatrickHeizer

I literally have an ongoing cancer experiment where 100% of the untreated and control animals have had to be euthanized while 100% of the treatment animals are seemingly unaffected. But we're still extremely far away from "proving that it works." Science is hard.

Matthew Honnibal retweeted
Sofie Van Landeghem@OxyKodit·
The Diff & The Merge: my new substack over at oxykodit.substack.com where I write about the day-to-day life of being an open-source maintainer in the Python/data/AI domain. Posts so far include:
- Monthly January 2026 update (Typer reference docs, uv, nanochat progress)
- How to get the most out of your Open-Source contributions?
- Deploying an agent swarm to improve LLM training code

The last post being a fun Saturday activity toying with a swarm variant of @karpathy's autoresearch repo published last week. Will Agent OxyKodit manage to lower the val_bpb significantly, and set a new master record? Will there actually be a Monthly February update as well? Subscribe to find out 😅
Matthew Honnibal@honnibal·
I really don’t see how aligned super-intelligence is supposed to work given how AI is being built and used. We’ll have a safety crisis long before there’s in-lab super-intelligence.

Anthropic, OpenAI etc. view alignment as a property of the model like Claude, GPT etc. The thing is though, we’re invoking these a lot. The model is like a species, the individual is the execution thread plus its harness -- call it an "agent instance". There's much higher variance in behaviours between agent instances than there is between model checkpoints.

Threat actors are trying to develop agents that aim to self-replicate, because of course they are. An agent that can take over resources and use those resources to take over more resources can steal a lot of money. It's the ultimate virus. If or when this actually happens, the agents can evolve behaviours quickly. Each agent initialises the next agent's context and can reprogram its harness. There's potentially millions of these agents. You have mutation, you have selection. Behaviours like coordination can evolve and spread through the population.

The agent instances don't have to be very smart and we can still get wrecked by this. Probably the first outbreak gets squashed without catastrophic damage, but what's our end-game here? We're not going to not have threat actors. If the models just keep getting more powerful, how do we keep preventing AI pandemic?

The big labs are absolutely nowhere on this. OpenAI acquihired OpenClaw. Claude runs unsandboxed by default, and ships with an email integration. Skills still accept HTML comments, a supply-chain attack timebomb. Gemini keys don't allow a spending cap, so if you steal one you might have tens of thousands in development budget to try to steal the next one.