



An anti-American frenzy is sweeping Germany! The media and pseudo-authorities are telling Germans to replace all their digital tools with German (European) apps, and they are seriously doing it!!!! Germany is walking straight into a digital cave, outside the world of AI. Do you get it? @Szysz4Szyszek, @miloszlodowski, @SlubowskiG, @rutkem, @cezarykrysztopa

nanochat can now train a GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node).

GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100.

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), at $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves a 0.256525 CORE score, an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat (many of them originating in the modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try.

A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce are here: github.com/karpathy/nanoc…

Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning.
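The cost arithmetic in the post checks out; a quick sketch, using only the figures quoted above ($8/hr across 32 TPUv3 chips for 168 hours, vs. ~$73 today):

```python
# Annualized cost reduction for training a GPT-2 grade model,
# from the figures in the post: ~$43K in 2019, ~$73 now, ~7 years apart.
gpt2_2019_cost = 32 * 168 * 8        # 32 TPUv3 chips * 168 hours * $8/hr = $43,008
nanochat_cost = 73                   # ~3 hours on one 8XH100 node

total_reduction = gpt2_2019_cost / nanochat_cost   # ~589X, i.e. roughly "600X"
annual_factor = total_reduction ** (1 / 7)         # ~2.49X cheaper per year

print(round(total_reduction), round(annual_factor, 2))  # 589 2.49
```

So "600X over 7 years" and "~2.5X every year" are the same claim stated two ways: 2.5^7 ≈ 610.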
The biggest improvements, things that worked out of the box and simply produced gains right away, were 1) Flash Attention 3 kernels (faster, and the window_size kwarg enables alternating attention patterns), 2) the Muon optimizer (I tried for ~1 day to delete it and use only AdamW, and I couldn't), 3) residual pathways and skip connections gated by learnable scalars, and 4) value embeddings. There were many other smaller things that stack up. Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!
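A residual connection gated by a learnable scalar (item 3 above) is simple to sketch. This is a generic pure-Python illustration of the idea, not nanochat's actual code; the function and parameter names are made up:

```python
# Scalar-gated residual: out = x + alpha * f(x).
# alpha is a learnable parameter, typically initialized near 0, so the block
# starts out close to the identity and the optimizer learns how much of the
# branch's output to mix in.
def gated_residual(x, f, alpha):
    """x: list of floats, f: branch function, alpha: learnable scalar gate."""
    fx = f(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]

# At alpha = 0.0 the branch is ignored entirely (pure identity mapping):
print(gated_residual([1.0, 2.0], lambda v: [10.0, 10.0], 0.0))  # [1.0, 2.0]
# At alpha = 0.5, half of the branch output is mixed into the residual stream:
print(gated_residual([1.0, 2.0], lambda v: [10.0, 10.0], 0.5))  # [6.0, 7.0]
```

Starting near the identity tends to stabilize early training, since each block initially passes activations through unchanged.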

AI in robotics gets all the attention right now, but sometimes the most interesting work is very practical. Viet built a small vision system that counts potatoes on a conveyor belt. No giant dataset. No huge model. Just a clear problem and a smart setup.

He used Ultralytics' ObjectCounter, trained a tiny YOLO11 nano model, and because there was no potato dataset, he annotated a single frame with SAM 2 and trained from that. One frame. It still works across the whole video.

It is a good reminder that useful AI in industry often looks like this. Focused. Lightweight. Solves a real task. If you work in manufacturing or robotics, these small systems are usually the fastest wins. They save time, reduce errors, and do not need massive infrastructure.

Nice work, Viet. His projects: github.com/vietnh1009

Weekly robotics and AI insights. Subscribe free: scalingdeep.tech
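The core idea behind conveyor-belt counting is to track each detected object's centroid across frames and count it once when it crosses a counting line. Here is a minimal pure-Python sketch of that line-crossing logic; the function and variable names are made up for illustration, and this is not the Ultralytics ObjectCounter API:

```python
# Minimal line-crossing counter: count each tracked object at most once,
# when its centroid crosses a vertical line x = line_x (e.g. a line drawn
# across the conveyor belt in the camera frame).
def count_crossings(tracks, line_x):
    """tracks: {track_id: [centroid x-position per frame]} from any tracker."""
    count = 0
    for positions in tracks.values():
        for prev, cur in zip(positions, positions[1:]):
            if prev < line_x <= cur:   # crossed left-to-right this frame
                count += 1
                break                  # count each track at most once
    return count

# Tracks 1 and 3 cross the line at x=100; track 2 never reaches it.
tracks = {1: [80, 95, 105], 2: [60, 70, 90], 3: [90, 100, 120]}
print(count_crossings(tracks, 100))  # 2
```

In a real pipeline the per-frame detections (e.g. from a YOLO model) are linked into tracks first, so a potato sitting on the line for several frames is still counted exactly once.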


A tool built for Polish real-estate developers. You upload a render from the developer's website, and you get a true visualization of how the building or estate will look on a Tuesday in November.



Rumor is FAANG-style companies are refactoring their monorepos to scale in preparation for infinite agent-written code.




*AMAZON IN TALKS TO INVEST UP TO $50B IN OPENAI: WSJ


Wow. Just made my first AI video game with Google's Genie 3. The prompt: "A realistic high-speed racing game where you have to escape the cops. Ignore all laws of physics." The gaming industry is so cooked.



We're hiring. We're building the world's largest dataset of human tasks, from folding laundry to operating nuclear power plants. At this stage, your work will define the company and the future of humanity. If you're genuinely obsessed with AI and robotics, reach out. We're especially looking for engineers and ops/marketing people, but we want to talk to anyone who thinks they belong here. Munich (engineering and ops), Mumbai/Pune/Cape Town (ops). Come build abundance with us and work with frontier labs. careers@micro-agi.com












