
Bernie Sanders: “60% of our people living paycheck-to-paycheck, and one guy, Elon Musk, owns more wealth than the bottom 53% of American households... Think maybe that might be an issue that we should be talking about?"
David Sweet
@phinance99
Applied Epistemologist. Learn to experiment: https://t.co/F9l8CmYFn2 Prevent code complexity creep: `cargo install kiss-ai`



I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code. Then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
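The outer loop described above can be sketched roughly like this. This is a minimal, hypothetical hill-climbing skeleton, not the actual autoresearch code: `run_training` stands in for a full 5-minute training run, `agent_propose` stands in for the coding agent editing the training script, and a "commit" is kept only when validation loss improves.

```python
import random

def run_training(config):
    # Hypothetical stand-in for one complete 5-minute LLM training run;
    # returns the validation loss at the end of the run.
    rng = random.Random(str(sorted(config.items())))
    return 4.0 - 0.1 * config["depth"] + rng.uniform(-0.02, 0.02)

def agent_propose(config, rng):
    # Hypothetical stand-in for the coding agent editing the training
    # script; here it just nudges one hyperparameter.
    new = dict(config)
    new["depth"] = max(1, min(12, new["depth"] + rng.choice([-1, 1])))
    return new

def research_loop(steps=20, seed=0):
    # Keep a candidate change (i.e. "commit" it on the feature branch)
    # only if it lowers validation loss; otherwise discard it.
    rng = random.Random(seed)
    best_cfg = {"depth": 4}
    best_loss = run_training(best_cfg)
    commits = []  # analogous to accumulated git commits
    for step in range(steps):
        cand = agent_propose(best_cfg, rng)
        loss = run_training(cand)
        if loss < best_loss:
            best_cfg, best_loss = cand, loss
            commits.append((step, best_cfg, best_loss))
    return best_loss, commits
```

Comparing prompts or agents then amounts to running this loop with different `agent_propose` implementations and plotting the loss of each run, one dot per training run.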







I have stumbled onto a way to improve agent steering. Namely, how to improve performance when you say "make sure you do this" and the LLM doesn't do it. Here it is:

Saying "remember to do X" is unreliable: it requires the agent to spontaneously initiate a procedural behavior. But presenting the agent with a specific, possibly-wrong claim ("You should be doing X - are you still doing it?") reliably triggers corrective behavior when the claim is wrong. The agent doesn't need to remember to check. The mismatch between the presented state and the actual state creates a correction event that the agent LLM naturally responds to.

This reminds me of the old maxim that the best way to get a correct answer on the internet is to post a wrong one, and I guess that makes sense, since LLMs are predominantly the distilled "knowledge" of the internet. Anyhow, I've been building a long-running memory system for my agents, and implementing it this way fixed a lot of problems.
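As a prompt-construction helper, the contrast looks something like this (a minimal sketch; the function names and wording are illustrative, not from any particular memory system):

```python
def passive_reminder(task: str) -> str:
    # Unreliable form: depends on the agent spontaneously remembering.
    return f"Remember to {task}."

def state_claim_reminder(task: str, believed_state: str) -> str:
    # Present a specific, possibly-wrong claim about the agent's state.
    # If the claim is wrong, the mismatch itself is what the model
    # responds to: it emits a correction rather than having to remember
    # to self-check.
    return (
        f"You should be doing this: {task}. "
        f"As of the last checkpoint you were {believed_state}. "
        f"Is that still true? If not, correct course now and state what changed."
    )

# Example: inject the claim into the agent's context each turn, sourcing
# believed_state from the (possibly stale) long-running memory store.
prompt = state_claim_reminder(
    task="update the progress log after each file edit",
    believed_state="updating the log after every edit",
)
```

The key design choice is that `believed_state` is allowed to be stale or wrong; staleness is what generates the correction event.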












This problem was solved in ~1950. Read about Toyota quality. Also: Shewhart, Deming, Six Sigma.

tldr: Define quality. Measure it at every step. Don't proceed to the next step until quality is high enough. Take small steps.

In practical terms: Make a small change. Insist it passes linters, tests, and reviews. Repeat ad infinitum.

Also: `cargo install kiss-ai`, a linter for code complexity.
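The practical loop above reduces to a simple gate. A minimal sketch (the check commands are hypothetical examples; substitute your project's own linters and tests):

```python
import subprocess

def quality_gate(checks):
    # Don't proceed to the next step until quality is high enough:
    # every check must pass before a change is accepted.
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            return False  # stop at the first failing step
    return True

# Hypothetical example commands; swap in your project's real tools.
CHECKS = [
    ["ruff", "check", "."],  # lint
    ["pytest", "-q"],        # tests
]
```

The discipline is the loop around it: make one small change, run the gate, and only move on when it passes.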




