Fabrizio Milo

2.3K posts

@fabmilo

LF angel investors (inception phase). AI for Software Development at Scale. I believe: English is the new programming language; code will eat the world.

San Francisco · Joined November 2009
2.2K Following · 843 Followers
Andrej Karpathy@karpathy·
Had to go see Project Hail Mary right away (it's based on the book by Andy Weir, also of The Martian fame). Both very pleased and relieved to say that 1) the movie sticks very close to the book in both content and tone and 2) it is really well executed.

The book is one of my favorites when it comes to alien portrayals because a lot of thought was clearly given to the scientific details of an alternate biochemistry, evolutionary history, sensorium, psychology, language, tech tree, etc. It's different enough that it is highly creative and plausible, but also similar enough that you get a compelling story and one of the best bromances in fiction. Not to mention the other (single-cellular) aliens. I can count fictional portrayals of aliens of this depth on one hand. A lot of these aspects are briefly featured - if you read the book you'll spot them, but if you haven't, the movie can't spend the time to do them justice.

I'll say that the movie inches a little too much into superhero-movie tropes with the pacing, the quips, the bathos and such for my taste, and we get a little less of the grandeur of Interstellar and a little less of the science of The Martian, but I think that's ok considering the tone of the original content. And it does really well where it counts - on Rocky and the bromance. Thank you to the film crew for the gem!
302 replies · 281 reposts · 7.8K likes · 477K views
Fabrizio Milo@fabmilo·
Do you guys still use Google? I feel I haven't done a search in ages. I just deep-research everything.
0 replies · 0 reposts · 0 likes · 47 views
Fabrizio Milo@fabmilo·
End of day: it used to be Stack Overflow going down, now it's this.
[image]
0 replies · 0 reposts · 1 like · 64 views
Fabrizio Milo@fabmilo·
@hamostaf04 @DennwsLee Yes, I saw the same pattern in my own harness. The key is to have a "tree" of small atomic changes and a trace of failures/successes in a short form. In my case I applied it to kernel generation with benchmarks, which is verifiable.
0 replies · 0 reposts · 1 like · 63 views
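The "tree of small atomic changes plus a short pass/fail trace" idea can be illustrated with a toy hill-climbing loop. This is a sketch under my own assumptions (integers stand in for kernel variants, a distance-to-target score stands in for a benchmark; all names are hypothetical), not the author's actual harness:

```python
# Toy sketch: branch small atomic changes off the current best candidate,
# score each against a verifiable benchmark, keep a short pass/fail trace.

def score(v, target=10):
    """Stand-in for a verifiable benchmark (e.g. a kernel timing)."""
    return -abs(v - target)  # higher is better

def search(root=0, steps=20):
    best, trace = root, []
    for _ in range(steps):
        for delta in (+1, -1):  # two candidate atomic edits per step
            cand = best + delta
            ok = score(cand) > score(best)
            trace.append((delta, "pass" if ok else "fail"))  # short-form trace
            if ok:
                best = cand  # the "tree" only extends along passing edits
    return best, trace
```

A real harness would replace `score` with compiling and benchmarking the candidate kernel, and `delta` with an LLM-proposed atomic code edit; the trace is what gets fed back to the model in compressed form.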
hamza mostafa@hamostaf04·
My friend @DennwsLee and I spent the past week tinkering with autoresearch. We gave 4 AI agents a research loop and told them to never stop. 48 hours later: 550+ experiments, zero babysitting. One agent hit 93% on competition math from pure reward signal; another proved SFT beats RL at half the cost. Highlights in 🧵
hamza mostafa@hamostaf04

x.com/i/article/2033…

16 replies · 14 reposts · 199 likes · 32.3K views
Fabrizio Milo@fabmilo·
Finally managed to get a native macOS MuJoCo client for whole-body simulation. The idea is to have a minimal environment running locally that can communicate with a remote GPU machine over ZMQ to run the actual policy.
[image]
0 replies · 0 reposts · 0 likes · 73 views
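A minimal sketch of what the local-sim/remote-policy split might exchange: the client serializes simulator state, the GPU server replies with an action. The field names and JSON encoding here are illustrative assumptions (the tweet does not specify a wire format); in the real setup these bytes would travel over a `zmq.REQ` / `zmq.REP` socket pair rather than being round-tripped locally:

```python
import json

# Hypothetical message format for a local MuJoCo client talking to a
# remote policy server over ZMQ. Field names are illustrative only.

def pack_observation(qpos, qvel, t):
    """Serialize simulator state as the request (REQ) payload."""
    return json.dumps({"qpos": qpos, "qvel": qvel, "t": t}).encode()

def unpack_action(reply_bytes):
    """Decode the policy server's reply (REP) into an action vector."""
    return json.loads(reply_bytes.decode())["action"]

# Local round-trip of the format (stands in for socket.send / socket.recv):
req = pack_observation([0.0, 0.1], [0.0, 0.0], 0)
rep = json.dumps({"action": [0.5, -0.5]}).encode()
action = unpack_action(rep)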
Fabrizio Milo@fabmilo·
good harnesses is all you need
0 replies · 0 reposts · 0 likes · 48 views
Fabrizio Milo@fabmilo·
Today is the @PyTorch hackathon in San Francisco with the @GPU_MODE folks. Will my custom Helion autoresearch harness be able to produce better kernels and win the competition?
0 replies · 0 reposts · 1 like · 263 views
Fabrizio Milo@fabmilo·
@ellen_in_sf What model is running the experiments, Claude Opus? Is this still using the MLX version as the target?
0 replies · 0 reposts · 0 likes · 71 views
Fabrizio Milo@fabmilo·
I just generated this image from my imagination about the future of AI. Looking at it made me realize that one of the drawbacks of current LLMs is the linearity dictated by the back-propagation training algorithm. LLMs are cool but definitely not the final solution.
[image]
0 replies · 0 reposts · 1 like · 52 views
Fabrizio Milo reposted
Chen Liang@crazydonkey200·
@karpathy Very inspiring as always! We are also open sourcing part of our infra on automated research for Gemini to evolve itself at github.com/google-deepmin… More complex than the nanochat setup but closer to SOTA LLM pre/post-training while staying as minimal as possible. More on the way.
13 replies · 148 reposts · 1.4K likes · 101.6K views
Kevin Patrick Murphy@sirbayes·
I agree that for agents (as opposed to making media slop for human consumption), symbolic world models that abstract away from pixels are key! I have a little paper on this (arxiv.org/abs/2602.02799) with @topwasu , @ellisk_kellis and @WLehrach. But we took a shortcut and assumed symbolic input (from OC Atari), so we could focus on learning temporally abstract WMs using code synthesis. Thus our approach didn't solve the signal-to-symbol problem, and relies heavily on pretrained LLMs, which feels a bit like cheating. In general, once you leave the happy land of symbolic data (code, language, math), modeling becomes much harder, since you need to decide what loss function to use for training. If you can't afford to model every little detail, you can't use maximum likelihood (or some simple bound like ELBO), so you need some other way to decide how to "color the bits" (as @alemi likes to say).
Moonlake@moonlake

x.com/i/article/2029…

7 replies · 32 reposts · 264 likes · 55.7K views
Tri Dao@tri_dao·
I’m unreasonably excited about the fact that we wrote everything in Cute-DSL, embedded in Python. Installing / “compiling” now takes seconds instead of minutes / hours (looking at you, C++ templates). Try pip install fa4!
5 replies · 18 reposts · 432 likes · 27.1K views
Tri Dao@tri_dao·
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared-memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax that avoids 90% of softmax rescaling, and 2-CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.
Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/

30 replies · 230 reposts · 1.8K likes · 183.3K views
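For intuition on the rescaling the tweet refers to: below is the classic online-softmax recurrence that FlashAttention builds on, a single streaming pass that tracks a running max `m` and a running sum `s` of `exp(x - m)`, rescaling `s` whenever `m` grows. This is the unoptimized baseline, not FA4's new variant (which reorganizes the computation to skip most of those rescalings):

```python
import math

def online_softmax(scores):
    """One-pass, numerically stable softmax over a stream of scores."""
    m, s = float("-inf"), 0.0
    for x in scores:
        if x > m:
            s *= math.exp(m - x)  # rescale the old partial sum to the new max
            m = x
        s += math.exp(x - m)
    return [math.exp(x - m) / s for x in scores]
```

In attention kernels the same recurrence runs per row over tiles of scores, with the row's partial output rescaled alongside `s`; every rescale costs extra multiplies, which is why avoiding ~90% of them matters once tensor cores make the matmuls nearly free.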
Fabrizio Milo@fabmilo·
I think every evaluation paper should report the cost of running a full evaluation.
[image]
0 replies · 0 reposts · 1 like · 70 views
Fabrizio Milo@fabmilo·
What every company that publishes product documentation and wants to be agent-friendly should implement: "Copy page as Markdown".
[image]
0 replies · 0 reposts · 0 likes · 40 views
Zain@ZainHasan6·
We just open sourced a dataset that cost us $130k to generate! It's 6.7B tokens of agentic coding traces of 51k tasks across 1.6k unique repos. You can SFT on it to make your models better coding agents!
[image]
Together AI@togethercompute

We’re open-sourcing CoderForge-Preview — 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

27 replies · 87 reposts · 1.2K likes · 97.9K views