Fabrizio Milo

2.3K posts

@fabmilo

LF angel investors (inception phase). AI for Software Development at Scale. I believe: English is the new programming language; code will eat the world.

San Francisco · Joined November 2009
2.2K Following · 843 Followers
Andrej Karpathy@karpathy·
Had to go see Project Hail Mary right away (it's based on the book by Andy Weir, also of The Martian fame). Both very pleased and relieved to say that 1) the movie sticks very close to the book in both content and tone and 2) it is really well executed.

The book is one of my favorites when it comes to alien portrayals because a lot of thought was clearly given to the scientific details of an alternate biochemistry, evolutionary history, sensorium, psychology, language, tech tree, etc. It's different enough that it is highly creative and plausible, but also similar enough that you get a compelling story and one of the best bromances in fiction. Not to mention the other (single-cellular) aliens. I can count fictional portrayals of aliens of this depth on one hand. A lot of these aspects are briefly featured - if you read the book you'll spot them, but if you haven't, the movie can't spend the time to do them justice.

I'll say that the movie inches a little too much into superhero-movie tropes with the pacing, the quips, the bathos and such for my taste, and we get a little less of the grandeur of Interstellar and a little less of the science of The Martian, but I think that's ok considering the tone of the original content. And it does really well where it counts - on Rocky and the bromance. Thank you to the film crew for the gem!
302 replies · 281 reposts · 7.8K likes · 477K views
Fabrizio Milo@fabmilo·
Do you guys still use Google? I feel I haven't done a search in ages. I just deep-research everything.
0 replies · 0 reposts · 0 likes · 47 views
Fabrizio Milo@fabmilo·
End of day: it used to be Stack Overflow going down, now it's this.
[image]
0 replies · 0 reposts · 1 like · 64 views
Fabrizio Milo@fabmilo·
@hamostaf04 @DennwsLee Yes, I saw the same pattern in my own harness. The key is to have a "tree" of small atomic changes and a trace of failures/successes in a short form. In my case I applied it to kernel generation with benchmarks, which is verifiable.
0 replies · 0 reposts · 1 like · 63 views
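The "tree of small atomic changes plus a short pass/fail trace" idea can be illustrated with a toy hill-climbing loop. This is a sketch under my own assumptions (integers stand in for kernel variants, a distance-to-target score stands in for a benchmark; all names are hypothetical), not the author's actual harness:

```python
# Toy sketch: branch small atomic changes off the current best candidate,
# score each against a verifiable benchmark, keep a short pass/fail trace.

def score(v, target=10):
    """Stand-in for a verifiable benchmark (e.g. a kernel timing)."""
    return -abs(v - target)  # higher is better

def search(root=0, steps=20):
    best, trace = root, []
    for _ in range(steps):
        for delta in (+1, -1):  # two candidate atomic edits per step
            cand = best + delta
            ok = score(cand) > score(best)
            trace.append((delta, "pass" if ok else "fail"))  # short-form trace
            if ok:
                best = cand  # the "tree" only extends along passing edits
    return best, trace
```

A real harness would replace `score` with compiling and benchmarking the candidate kernel, and `delta` with an LLM-proposed atomic code edit; the trace is what gets fed back to the model in compressed form.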
hamza mostafa@hamostaf04·
My friend @DennwsLee and I spent the past week tinkering with autoresearch. We gave 4 AI agents a research loop and told them to never stop. 48 hours later: 550+ experiments, zero babysitting. One agent hit 93% on competition math from pure reward signal; another proved SFT beats RL at half the cost. Highlights in 🧵
hamza mostafa@hamostaf04

x.com/i/article/2033…

16 replies · 14 reposts · 199 likes · 32.3K views
Fabrizio Milo@fabmilo·
Finally managed to get a native macOS MuJoCo client for whole-body simulation. The idea is to have a minimal environment running locally that can communicate with a remote GPU machine over ZMQ to run the actual policy.
[image]
0 replies · 0 reposts · 0 likes · 73 views
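A minimal sketch of what the local-sim/remote-policy split might exchange: the client serializes simulator state, the GPU server replies with an action. The field names and JSON encoding here are illustrative assumptions (the tweet does not specify a wire format); in the real setup these bytes would travel over a `zmq.REQ` / `zmq.REP` socket pair rather than being round-tripped locally:

```python
import json

# Hypothetical message format for a local MuJoCo client talking to a
# remote policy server over ZMQ. Field names are illustrative only.

def pack_observation(qpos, qvel, t):
    """Serialize simulator state as the request (REQ) payload."""
    return json.dumps({"qpos": qpos, "qvel": qvel, "t": t}).encode()

def unpack_action(reply_bytes):
    """Decode the policy server's reply (REP) into an action vector."""
    return json.loads(reply_bytes.decode())["action"]

# Local round-trip of the format (stands in for socket.send / socket.recv):
req = pack_observation([0.0, 0.1], [0.0, 0.0], 0)
rep = json.dumps({"action": [0.5, -0.5]}).encode()
action = unpack_action(rep)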
Fabrizio Milo@fabmilo·
good harnesses is all you need
0 replies · 0 reposts · 0 likes · 48 views
Fabrizio Milo@fabmilo·
Today is the @PyTorch hackathon in San Francisco with the @GPU_MODE folks. Will my custom Helion autoresearch harness be able to produce better kernels and win the competition?
0 replies · 0 reposts · 1 like · 263 views
Fabrizio Milo@fabmilo·
@ellen_in_sf What model is running the experiments, Claude Opus? Is this still using the MLX version as the target?
0 replies · 0 reposts · 0 likes · 71 views
Fabrizio Milo@fabmilo·
I just generated this image from my imagination about the future of AI. Looking at it made me realize that one of the drawbacks of current LLMs is the linearity dictated by the back-propagation training algorithm. LLMs are cool but definitely not the final solution.
[image]
0 replies · 0 reposts · 1 like · 52 views
Fabrizio Milo reposted
Chen Liang@crazydonkey200·
@karpathy Very inspiring as always! We are also open sourcing part of our infra on automated research for Gemini to evolve itself at github.com/google-deepmin… More complex than the nanochat setup but closer to SOTA LLM pre/post-training while staying as minimal as possible. More on the way.
13 replies · 148 reposts · 1.4K likes · 101.6K views
Kevin Patrick Murphy@sirbayes·
I agree that for agents (as opposed to making media slop for human consumption), symbolic world models that abstract away from pixels are key! I have a little paper on this (arxiv.org/abs/2602.02799) with @topwasu , @ellisk_kellis and @WLehrach. But we took a shortcut and assumed symbolic input (from OC Atari), so we could focus on learning temporally abstract WMs using code synthesis. Thus our approach didn't solve the signal-to-symbol problem, and relies heavily on pretrained LLMs, which feels a bit like cheating. In general, once you leave the happy land of symbolic data (code, language, math), modeling becomes much harder, since you need to decide what loss function to use for training. If you can't afford to model every little detail, you can't use maximum likelihood (or some simple bound like ELBO), so you need some other way to decide how to "color the bits" (as @alemi likes to say).
Moonlake@moonlake

x.com/i/article/2029…

7 replies · 32 reposts · 264 likes · 55.7K views
Tri Dao@tri_dao·
I’m unreasonably excited about the fact that we wrote everything in Cute-DSL, embedded in Python. Installing / “compiling” now takes seconds instead of minutes / hours (looking at you, C++ templates). Try pip install fa4!
5 replies · 18 reposts · 432 likes · 27.1K views
Tri Dao@tri_dao·
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared-memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax that avoids 90% of softmax rescaling, and 2-CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.
Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/

30 replies · 230 reposts · 1.8K likes · 183.3K views
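For intuition on the rescaling the tweet refers to: below is the classic online-softmax recurrence that FlashAttention builds on, a single streaming pass that tracks a running max `m` and a running sum `s` of `exp(x - m)`, rescaling `s` whenever `m` grows. This is the unoptimized baseline, not FA4's new variant (which reorganizes the computation to skip most of those rescalings):

```python
import math

def online_softmax(scores):
    """One-pass, numerically stable softmax over a stream of scores."""
    m, s = float("-inf"), 0.0
    for x in scores:
        if x > m:
            s *= math.exp(m - x)  # rescale the old partial sum to the new max
            m = x
        s += math.exp(x - m)
    return [math.exp(x - m) / s for x in scores]
```

In attention kernels the same recurrence runs per row over tiles of scores, with the row's partial output rescaled alongside `s`; every rescale costs extra multiplies, which is why avoiding ~90% of them matters once tensor cores make the matmuls nearly free.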
Fabrizio Milo@fabmilo·
I think every evaluation paper should report the cost of running a full evaluation.
[image]
0 replies · 0 reposts · 1 like · 70 views
Fabrizio Milo@fabmilo·
What every company that publishes product documentation and wants to be agent-friendly should implement: "Copy page as Markdown".
[image]
0 replies · 0 reposts · 0 likes · 40 views
Zain@ZainHasan6·
We just open sourced a dataset that cost us $130k to generate! It's 6.7B tokens of agentic coding traces of 51k tasks across 1.6k unique repos. You can SFT on it to make your models better coding agents!
[image]
Together AI@togethercompute

We’re open-sourcing CoderForge-Preview — 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

27 replies · 87 reposts · 1.2K likes · 97.9K views