
Sidhant Thole










Modded-NanoGPT optimization result #29 (2026/05/11): @nilinabra has achieved a new step-count record of 2990 (40-step improvement) by halving the growth rate of the L2-norm of the hidden matrix parameters. This result is better than the previous record with a p-value of 4e-5.









even more baking with baby




In modded-nanogpt we also found that the last couple of attention layers hate interacting with the final prediction MLPs, so we work around it with a cached activation from earlier. In the attention residuals paper, Kimi doesn't explicitly mention it, but you can see from their chart that the final attn layers don't engage with the final outputs. So I think there is something fundamental going on here.



Mathematician reacts to OpenAI's recent proof:



i’m increasingly convinced that the best agent evals will come from mining real agent failure traces.

my view is that every failed trace contains a potential eval, but not in its raw form. raw traces are messy, long and too specific. the research problem is to distill them into clean, reproducible tests.

the pipeline i’m interested in (and currently working on) is:

failure trace → failure attribution → earliest divergence point → minimal reproducible state → targeted eval → regression suite

this turns trace data from passive observability into an active improvement loop. like, can we extract the exact decision point where the agent should have behaved differently? and can we convert that into an eval that catches the same failure class in the future?

i guess this matters because most agent failures are trajectory-level failures, not just output-level failures.

personally i think this is much more realistic than relying only on hand-written benchmarks (imo evals should look more like failure memory systems). hand-written evals encode what we think agents will fail on. traces encode what agents actually failed on.

also, once you have the mechanism, you can mutate the trace into variants. that is basically fuzzing for agents.
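a rough sketch of what that pipeline could look like in code. everything here is hypothetical: the `Step` and `Eval` types, the function names, and especially the attribution step, which i'm assuming can be approximated by diffing the failed trace against a known-good reference trace of the same task. real attribution is likely much harder than a diff.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Step:
    state: str   # what the agent observed before acting (hypothetical format)
    action: str  # what the agent did in that state


@dataclass(frozen=True)
class Eval:
    state: str            # minimal state to replay
    expected_action: str  # what the agent should have done there


def earliest_divergence(failed: List[Step], reference: List[Step]) -> Optional[int]:
    """Index of the first step where the failed trace's action differs.

    Assumption: a known-good reference trace exists for the same task.
    """
    for i, (f, r) in enumerate(zip(failed, reference)):
        if f.action != r.action:
            return i
    return None


def distill(failed: List[Step], reference: List[Step]) -> Optional[Eval]:
    """Turn one failed trace into one targeted eval at the divergence point."""
    i = earliest_divergence(failed, reference)
    if i is None:
        return None
    # Replay the state the failing agent actually saw; expect the reference action.
    return Eval(state=failed[i].state, expected_action=reference[i].action)


def mutate(ev: Eval, rewrites: List[Callable[[str], str]]) -> List[Eval]:
    """Fuzzing step: perturb the state, keep the expected action fixed."""
    return [Eval(state=rw(ev.state), expected_action=ev.expected_action) for rw in rewrites]


def run_suite(policy: Callable[[str], str], suite: List[Eval]) -> List[Eval]:
    """Return the evals the policy fails (the regression signal)."""
    return [ev for ev in suite if policy(ev.state) != ev.expected_action]


# Toy traces, invented purely for illustration.
failed = [
    Step("repo cloned", "run_tests"),
    Step("tests fail: missing dep", "edit_code"),
    Step("tests still fail", "give_up"),
]
reference = [
    Step("repo cloned", "run_tests"),
    Step("tests fail: missing dep", "install_dep"),
    Step("tests pass", "submit"),
]

ev = distill(failed, reference)
suite = [ev] + mutate(ev, [str.upper, lambda s: s + " (retry 2)"])
still_failing = run_suite(lambda state: "edit_code", suite)
```

the design choice worth noting: the eval stores only the state at the divergence point, not the whole trace, which is what makes it small enough to replay and mutate. whether a single state is really enough to reproduce a trajectory-level failure is an open question.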










