Anton Milan

2.5K posts

@antonmil

computer vision, deep learning, robotics...

34.9290° S, 138.6010° E · Joined July 2009

1.3K Following · 1.9K Followers

Anton Milan reposted

FalkTG 10k 🦅🇪🇺🇩🇪🇺🇦 @FalkTG · 6d

If I were a smart leftist, I would support Ukraine 🇺🇦 because it is in a humanitarian crisis. If I were a smart centre-right conservative, I would support Ukraine 🇺🇦 because it is defending Europe's freedom. If I were a smart right-winger, I would support Ukraine 🇺🇦 because Ukraine joining the EU would mean expanding an economic area dominated by Germany. Only if I were a complete idiot would I perhaps not support Ukraine.

Anton Milan @antonmil · 9 Apr

🚀 Get ready to build anomaly detection models that actually work in production, and win a share of the $25,500 USD total prize pool! VAND 4.0 @CVPR 2026: participate in the Kaputt2 Challenge! sites.google.com/view/vand4-cvp…

Anton Milan reposted

Lenny Rachitsky @lennysan · 3 Apr

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw

Lenny Rachitsky @lennysan

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop."

In our in-depth conversation, we discuss:
🔸 Why November 2025 was an inflection point
🔸 The "dark factory" pattern
🔸 Why mid-career engineers (not juniors) are the most at risk right now
🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding
🔸 Why he writes 95% of his code from his phone while walking the dog
🔸 Why he thinks we're headed for an AI Challenger disaster
🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality

Listen now 👇 youtu.be/wc8FBhQtdsA

Anton Milan @antonmil · 31 Mar

AI just wants to be free.

Anton Milan @antonmil · 8 Mar

@LiorOnAI I really like the direction, and I'm sure we'll see many surprises very soon. So far, though, I haven't seen anything exciting beyond standard AutoML/HPO-type things. Does anyone disagree?

Lior Alexander @LiorOnAI · 7 Mar

It's over. Karpathy just open-sourced an autonomous AI researcher that runs 100 experiments while you sleep.

You don't write the training code anymore. You write a prompt that tells an AI agent how to think about research. The agent edits the code, trains a small language model for exactly five minutes, checks the score, keeps or discards the result, and loops. All night. No human in the loop.

That fixed five-minute clock is the quiet genius. No matter what the agent changes, the network size, the learning rate, the entire architecture, every run gets compared on equal footing. This turns open-ended research into a game with a clear score:
- 12 experiments per hour, ~100 overnight
- Validation loss measures how well the model predicts unseen text
- Lower score wins, everything else is fair game

The agent touches one Python file containing the full training recipe. You never open it. Instead, you program a markdown file that shapes the agent's research strategy. Your job becomes programming the programmer, and this unlocks a strange new loop:
1. Agents run real experiments without supervision
2. Prompt quality becomes the bottleneck, not researcher hours
3. Results auto-optimize for your specific hardware
4. Anyone with one GPU can run a research lab overnight

The best AI labs won't just have the most compute. They'll have the best instructions for agents who never sleep, never forget a failed experiment, and never stop iterating.

Andrej Karpathy @karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
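
The keep-or-discard loop described in these two posts can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not code from the autoresearch repo: `train_and_eval` stands in for one fixed 5-minute training run and returns a synthetic validation loss, and `propose_change` stands in for the AI agent editing the training script. Only the structure follows the thread: propose a change, evaluate it under the same budget, keep it if the loss goes down, otherwise discard it, and repeat.

```python
import random

RUN_MINUTES = 5  # fixed per-experiment budget, as described in the thread


def experiments_in(hours: float, run_minutes: int = RUN_MINUTES) -> int:
    """Number of fixed-length runs that fit into a wall-clock window."""
    return int(hours * 60) // run_minutes


def train_and_eval(config: dict) -> float:
    """Hypothetical stand-in for one fixed-budget training run.

    Returns a synthetic "validation loss" minimized at lr=0.02, width=512.
    A real setup would train the model and measure loss on held-out text.
    """
    lr_term = 10 * (config["lr"] - 0.02) ** 2
    width_term = ((config["width"] - 512) / 1024) ** 2
    return 0.85 + lr_term + width_term


def propose_change(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent editing the training code."""
    candidate = dict(config)  # copy so the current config is never mutated
    if rng.random() < 0.5:
        candidate["lr"] = max(1e-4, candidate["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    else:
        candidate["width"] = max(64, candidate["width"] + rng.choice([-128, -64, 64, 128]))
    return candidate


def research_loop(config: dict, n_experiments: int, seed: int = 0):
    """Greedy keep-or-discard search: accept a change only if the loss improves."""
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    history = [best_loss]  # best loss seen so far, after each experiment
    for _ in range(n_experiments):
        candidate = propose_change(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:
            # keep: roughly what committing on the feature branch means
            config, best_loss = candidate, loss
        # otherwise discard: the candidate is dropped and config is unchanged
        history.append(best_loss)
    return config, best_loss, history


if __name__ == "__main__":
    n = experiments_in(hours=8)  # 96 runs of 5 minutes each
    best, loss, history = research_loop({"lr": 0.1, "width": 256}, n)
    print(f"{n} runs: loss {history[0]:.4f} -> {loss:.4f}, best config {best}")
```

Because every candidate gets the same budget, any two runs are directly comparable, which is what makes the simple "lower loss wins" rule workable. With 5-minute runs this gives 12 experiments per hour, or 96 in an 8-hour night, consistent with the "~100 overnight" figure above.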

Anton Milan @antonmil · 7 Mar

@allTheYud I tried Connect Four: these models are completely hopeless at such simple games. I'm totally baffled.

Eliezer Yudkowsky @allTheYud · 6 Mar

Turns out Opus lost on purpose. He tried again, instructing Opus to actually play to win, and it moved correctly. Welcome to the era of needing to wonder if the AI is sandbagging every time it appears to lose.

Kenton Varda @KentonVarda

Opus 4.6 is smart enough to play tic tac toe on this whiteboard with me entirely by making API calls to the app's client API, yet dumb enough to lose at tic tac toe.

Anton Milan @antonmil · 7 Mar

@karpathy I suppose this type of benchmark will become really interesting once it's scaled to many diverse tasks.

Andrej Karpathy @karpathy · 6 Mar

sorry just to clarify - the real benchmark of interest is: "what is the research org agent code that produces improvements on nanochat the fastest?" this is the new meta.

Andrej Karpathy @karpathy · 6 Mar

nanochat now trains GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM which all led to regressions, ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup" where I optimize and tune the agent flows even more than the nanochat repo directly.

Anton Milan reposted

Joakim 🌹🇳🇴🇪🇺 @joakial_ · 28 Feb

New ad by the Norwegian Consumer Council: "A Day in the Life of an Ensh*ttificator"

Anton Milan @antonmil · 25 Feb

So, when will we see a Pirate Bay for agent skills, or for robot skills?

Anton Milan @antonmil · 20 Feb

Whoa! This is like #SlothSurf but in real life! sloth.hurumo.ai

Lord Bebo @MyLordBebo

🇨🇳 When people previously said robots would replace me, I laughed… but now they've finally developed a robot that can literally replace me. I'm done.

Anton Milan @antonmil · 3 Feb

A nice illustration of how much water is wasted when generating AI videos.

fofr @fofrAI

I find the failure cases fascinating too. > You control a hand trying to fill up a glass of water from a pouring tap

Anton Milan @antonmil · 22 Jan

@petergyang Overall, though, I wanted my kid to unleash his imagination and come up with really cool new ideas, but it ended up being more about replicating what he had seen elsewhere. I'll keep working on it 😀

Anton Milan @antonmil · 22 Jan

@petergyang Next, he basically wanted to recreate Super Mario Kart. Claude failed miserably, really badly. I tried fixing it, but nothing helped.

Peter Yang @petergyang · 20 Jan

I can't stop building games with my 7-year-old using Claude Code. Our latest is a retro pixel space shooter:
→ 3 worlds (space, desert, hell)
→ Boss battles at the end of each level
→ Power-ups and wave-based enemies

We're living in a world where anyone can rebuild the games they loved as a kid with AI.

📌 Play it here: space-pixel-shooter.vercel.app

Anton Milan @antonmil · 16 Jan

I just discovered an interesting use case for GenAI while strolling through a museum 🙃

Anton Milan reposted

Harlan Stewart @HumanHarlan · 14 Jan

Wait so Google patented the Transformer architecture and then, instead of enforcing the patent, just allowed its competition to grow into a trillion-dollar industry? What?

Anton Milan @antonmil · 4 Jan

Now, this is dope.

Martin_DeVido @d33v33d0

Claude can code- but can claude grow?! 🪴 So far the answer is YES. Claude is successfully keeping a living organism ALIVE. There were some hiccups this week! Some errors and resets, but Claude managed to power through and take care of Sol 🍅 A week in review:
