Henri Bonamy (@henribonamy) - Twitter Profile

Pinned Tweet

Henri Bonamy@henribonamy·21 Apr

ml-intern: beats claude code + fully integrated with hf to train directly from your terminal. Check it out! Happy to have helped with it :) 🤗

Aksel@akseljoonas

Introducing ml-intern, the agent that just automated the post-training team @huggingface

It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.

It can pull off crazy things:

We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.

In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.

For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded.

All fully backed by papers, autonomously.

How does it work? ml-intern makes full use of the HF ecosystem:

- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains

ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.

Releasing it today as a CLI and a web app you can use from your phone/desktop.

CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…

And the best part? We also provisioned $1k of GPU resources and Anthropic credits for the quickest among you to use.

1

6

580

Henri Bonamy@henribonamy·1h

project very strongly inspired by @distillpub, check out their interactive demo (cooler than mine btw) here distill.pub/2020/growing-c… :)

0

Henri Bonamy@henribonamy·1h

moreover, the structure can reform itself following a disturbance (a "wound"), similar to how organisms self-repair. still fascinated by seeing it in action (4/5) 🧵

1

0

2

Henri Bonamy@henribonamy·1h

Life is weird: complex things can emerge from very basic rules. this full lizard grows pretty much by itself! about a year ago, I was looking for a cool deep learning project. I found an article about neural network cellular automata. the idea is simple: (1/5) 🧵
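The thread breaks off at "the idea is simple:", so what follows is only a rough illustration and not the project's actual code. It sketches one update step of a neural cellular automaton in the spirit of the Distill article linked in this thread: each cell perceives its 3x3 neighborhood through fixed Sobel filters, a tiny per-cell network proposes a change, and a random subset of cells applies it. The grid size, channel count, wrap-around edges, random weights, and every name here (`nca_step`, `perceive`, `fire_rate`) are assumptions; training and the "alive" masking step are left out.

```python
import random

# Fixed 3x3 Sobel kernels used as "perception" filters.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def conv3x3(grid, kernel, y, x, c):
    """Apply a 3x3 kernel to channel c around cell (y, x); edges wrap around (an assumption)."""
    h, w = len(grid), len(grid[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += kernel[dy + 1][dx + 1] * grid[(y + dy) % h][(x + dx) % w][c]
    return total


def perceive(grid, y, x):
    """Perception vector: the cell's own state plus x and y gradients of every channel."""
    channels = len(grid[y][x])
    own = list(grid[y][x])
    grad_x = [conv3x3(grid, SOBEL_X, y, x, c) for c in range(channels)]
    grad_y = [conv3x3(grid, SOBEL_Y, y, x, c) for c in range(channels)]
    return own + grad_x + grad_y


def nca_step(grid, w1, w2, fire_rate=0.5, rng=random):
    """One step: every cell runs the same two-layer network; a random subset applies its update."""
    new_grid = []
    for y in range(len(grid)):
        row = []
        for x in range(len(grid[0])):
            p = perceive(grid, y, x)
            hidden = [max(0.0, sum(w * v for w, v in zip(w_row, p))) for w_row in w1]  # ReLU layer
            delta = [sum(w * v for w, v in zip(w_row, hidden)) for w_row in w2]
            if rng.random() < fire_rate:  # stochastic update: only some cells fire each step
                row.append([s + d for s, d in zip(grid[y][x], delta)])
            else:
                row.append(list(grid[y][x]))
        new_grid.append(row)
    return new_grid


random.seed(0)
CHANNELS, HIDDEN = 4, 8
grid = [[[0.0] * CHANNELS for _ in range(5)] for _ in range(5)]
grid[2][2] = [1.0] * CHANNELS  # a single "seed" cell in the center

w1 = [[random.uniform(-1, 1) for _ in range(3 * CHANNELS)] for _ in range(HIDDEN)]
w2_zero = [[0.0] * HIDDEN for _ in range(CHANNELS)]  # zero output layer: the rule does nothing
w2 = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(CHANNELS)]

unchanged = nca_step(grid, w1, w2_zero)
grown = nca_step(grid, w1, w2, fire_rate=1.0)
```

With a zero output layer the step leaves the grid untouched, which is a convenient starting point before any training; growth and self-repair only appear once the weights of the shared rule are learned.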

1

0

2

11

Henri Bonamy@henribonamy·10h

@nateberkopec It's not the posts the problem — it's the formulation

0

72

Nate Berkopec@nateberkopec·1d

I'm so sick of reading em dashes and "it's not x, it's y." I'm so sick of it, man.

365

271

4.6K

268.5K

Henri Bonamy@henribonamy·2d

@BenjaminTrom mighty toddler

0

11

Benjamin Trom@BenjaminTrom·2d

@henribonamy a toddler with a vocab size of 262144

1

0

1

23

Henri Bonamy@henribonamy·2d

llms feel like coding superintelligence until you're ever so slightly out of distribution and instantly it's like talking to a toddler

1

0

3

56

Henri Bonamy@henribonamy·6d

@Yuchenj_UW taste is kind of verifiable. people generally agree on whether a website is beautiful or not. probably true for art however, because its beauty comes from unique ideas

1

0

69

Yuchen Jin@Yuchenj_UW·12 May

AI will solve coding and math first, because the outputs are verifiable. AI won’t “solve” art, because art has no unit test. There is no single definition of good or bad. And by art, I don’t just mean paintings or music. I mean designing a great product, building a great company, and anything where taste is the moat.

155

35

386

26.1K

Henri Bonamy@henribonamy·6d

@vincentweisser interesting, crazy speedup gains

0

1

64

Vincent Weisser@vincentweisser·6d

We are open sourcing renderers

For RL, the inference server should be simple
Tokens in, tokens out

renderers is the token-level chat templating layer to
>render messages to tokens
>parse completions to structure
>bridge rollouts byte-for-byte

> >3x throughput on open models

Prime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.
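A toy illustration of the mismatch described above, and not the Renderers API (none of these names come from that library). It assumes a made-up three-entry vocabulary with greedy longest-match tokenization, and shows how turning sampled tokens into text and re-tokenizing them can yield different token ids than the ones the model actually sampled.

```python
# Hypothetical toy vocabulary: "ab" exists as a single token next to "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, pos = [], 0
    while pos < len(text):
        for length in (2, 1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


def decode(ids):
    """Concatenate the text of each token id."""
    return "".join(ID_TO_TEXT[i] for i in ids)


prompt_ids = [1]   # "b"
sampled = [0, 1]   # the model sampled "a" then "b" as two separate tokens

# Message-level path: tokens -> text -> tokens. The sampled ids are not preserved.
retokenized = encode(decode(sampled))
text_path = encode(decode(prompt_ids) + decode(sampled))

# Token-level path: keep the sampled ids as they are and only concatenate.
token_path = prompt_ids + sampled

print(sampled, "->", retokenized)   # [0, 1] -> [2]
print(text_path, "vs", token_path)  # [1, 2] vs [1, 0, 1]
```

In the text path the trainer would score tokens the policy never sampled, which is the kind of corruption a token-level layer is meant to avoid.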

4

9

120

10K

Henri Bonamy@henribonamy·11 May

@akseljoonas @cmpatino_ most productive intern

0

72

Aksel@akseljoonas·11 May

3 weeks since ml-intern launched and we just hit 1M messages exchanged. that's 3.3 agent-years of ML research in 21 days. 2 months worth of research every day. 17,383 training jobs total. talk about AI acceleration.

here's some of what people built:

@cmpatino_ replicated the full DeepSeek v4 architecture and pre+post trained a 100M MoE from scratch. → huggingface.co/cmpatino/nanow…

it landed a third place submission on @kellerjordan0 optimizer competition. autoresearch on SOTA territory. github.com/KellerJordan/m…

@_lewtun Got the intern to convert @AlecRad's cool new talkie-lm 1930 model to work with transformers. tokenizer, chat template, model conversion etc all one-shotted by ml-intern. huggingface.co/lewtun/talkie-…

someone created an entire PhD dissertation chapter on context-aware agentic cyber defense, drafted with 16 research subagents.

and someone used it to crack an @Anthropic kernel optimization take-home. (we don't know how to feel about this one 👀 )

just getting started → huggingface.co/spaces/smolage…

19

17

155

34.6K

Henri Bonamy@henribonamy·11 May

Apparently the muon optimizer caused 25% of the parameters in a model to die during training (become inactive). @tilderesearch published a blog post about training a model equivalent to qwen3 with orders of magnitude fewer training tokens and 25% fewer parameters

Tilde@tilderesearch

Introducing Aurora, a new optimizer for training frontier-scale models. We train Aurora-1.1B, which achieves 100x data efficiency on open-source internet data. Despite having 25% fewer parameters, 2 orders of magnitude fewer training tokens, and using fully open-source internet-only data, Aurora matches Qwen3-1.7B on several benchmarks. Aurora was developed after identifying a major failure mode that can occur under Muon, an increasingly popular optimizer that has shown strong gains over Adam(W). We find that Muon can cause a huge percentage of neurons to effectively die early in training, reducing effective network capacity so that many parameters no longer meaningfully contribute to network outputs. By redistributing update energy more uniformly across neurons while preserving Muon’s stability properties, Aurora prevents neuron death and recovers substantial model capacity. What makes this work especially exciting is that it points toward a broader direction for ML research: better optimizers may not come purely from elegant mathematical abstractions, but from understanding and addressing the concrete dynamics and pathologies that emerge inside real training systems.

0

3

165

Henri Bonamy@henribonamy·11 May

@willccbb very interesting article, read it this morning. crazy nice visualizations

0

120

will brown@willccbb·10 May

lovely article going deeper into the RL-SFT-OPD spectrum with some very nice intuitions + experiments :)

wh@nrehiew_

x.com/i/article/2053…

2

24

430

68.5K

Henri Bonamy@henribonamy·6 May

@parisbayarea @donatelli2026 @HackIterate wait I know this guy from somewhere 👀

1

0

1

155

parisbayarea@parisbayarea·6 May

Nice to meet you X thank you for the sauce @donatelli2026 follow @HackIterate for more

25

4

93

8.4K

Henri Bonamy@henribonamy·5 May

@LerWilson cool! can't wait to see underwater footage

0

1

28

Wilson Ler@LerWilson·5 May

piloting / teleoperating this was not easy at all

jerica@jerica_kuah

day 10 at finc
- started tele operating the ROV last week in sea
- learnt that water currents are no joke
- tested the VLA with better inference

still on the quest of creating autonomous robot divers

does anyone in the Bay Area need something retrieved in the sea? Dm me

1

0

5

205

Henri Bonamy@henribonamy·5 May

@cmpatino_ @karpathy is it still an intern 👀

0

1

350

Carlos Miguel Patiño@cmpatino_·4 May

Introducing nanowhale 🐳! A tiny DeepSeek model fully pretrained by an agent. Inspired by @karpathy's nanochat, we gave ml-intern the task of training a tiny MoE with all the architectural advancements of DeepSeek v4. To test it end-to-end, it trained a 100M-parameter MoE through both pretraining and post-training.


39

101

996

108.5K

Henri Bonamy@henribonamy·4 May

@Hamzeml would be very cool in Haskell 👀

1

0

3

1.7K

Hamzé 🦀@Hamzeml·3 May

Python made AI accessible. Rust can make parts of AI understandable.

That’s the bet behind Category Theory for Tiny ML in Rust.

We’re building tiny ML systems from first principles using:
- Rust types
- typed transformations
- composition
- training loops
- category theory as an engineering tool

Not abstraction cosplay. Executable structure.

Working draft. Public feedback welcome.

58

326

2.4K

134.4K

Henri Bonamy@henribonamy·4 May

@yitong the trend barely started and I've seen enough already

0

1.5K

yitong@yitong·3 May

behold the most cursed aesthetic of 2026: "Taste" set in Instrument serif over a midjourney generated impressionist painting

not corporate memphis. more like noguchi table from temu vibes