Michael Griffiths

5.5K posts


@msjgriffiths

Data Science

Brooklyn · Joined October 2007
3K Following · 4.2K Followers
Donut Gup (@DonutGup)
@msjgriffiths @StreetsblogNYC No judge would ever hold that driver liable. How is he supposed to know that a person, one already well hidden by their clothing, is going to start running in the direction of his car? He had no time to react once she entered his lane of travel.
3 replies · 0 reposts · 1 like · 221 views
Donut Gup (@DonutGup)
@StreetsblogNYC New York is permissive yellow, which means that driver had every right to go. There is no reason for her to have started walking, and she's literally running diagonally instead of to the curb, and then even stops running midway.
2 replies · 0 reposts · 9 likes · 1.3K views
Michael Griffiths (@msjgriffiths)
Humorously, the LLM falls into the same trap many people do: wanting to write what it did, not what the reader should take away. It also handwaves references in literature search! Very human. Read here: prism.openai.com/?u=aba1c799-69…
0 replies · 2 reposts · 2 likes · 64 views
Michael Griffiths (@msjgriffiths)
For fun, I had Codex w/ GPT 5.4 on xhigh do a series of small-scale experiments with transformers (i.e. can we use Tracr to encode functions in transformers before the grokking stage, and thus shorten pretraining time?). It's not amazing, but fun to see how far they can go now!
1 reply · 0 reposts · 1 like · 97 views
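The idea behind the experiment above, hand-constructing transformer weights that compute a known function instead of learning them (which is what Tracr does for RASP programs), can be sketched without the Tracr library itself. Below is a toy numpy attention head whose weights are set by hand so it copies the previous token; the one-hot input layout and all names (`make_prev_token_head`, `attend`) are illustrative assumptions for this sketch, not Tracr's API.

```python
import numpy as np

def make_prev_token_head(seq_len, vocab, scale=100.0):
    """Hand-construct attention weights that copy the previous token.

    Input features are [token one-hot | position one-hot].
    Query at position i selects one-hot(i); key at position j selects
    one-hot(j + 1), so q_i . k_j is large only when j = i - 1 and the
    softmax becomes (nearly) hard attention to the previous position.
    """
    d = vocab + seq_len
    Wq = np.zeros((d, seq_len))
    Wk = np.zeros((d, seq_len))
    Wv = np.zeros((d, vocab))
    Wq[vocab:, :] = np.eye(seq_len) * scale         # query = my own position
    shift = np.zeros((seq_len, seq_len))
    shift[:-1, 1:] = np.eye(seq_len - 1)            # key = my position + 1
    Wk[vocab:, :] = shift
    Wv[:vocab, :] = np.eye(vocab)                   # value = token identity
    return Wq, Wk, Wv

def attend(x, Wq, Wk, Wv):
    """Single causal attention head over a (seq_len, d) input."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T
    mask = np.tril(np.ones(scores.shape, dtype=bool))  # causal mask
    scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```

The point of compiled weights like these is that the behavior is exact by construction, so (in the spirit of the experiment) they could in principle be dropped into a model before any gradient steps.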
Michael Griffiths (@msjgriffiths)
I wonder if this would make continual learning easier.
0 replies · 0 reposts · 0 likes · 17 views
Michael Griffiths (@msjgriffiths)
Tokenization in general seems to have progressed less than I expected since fastText. We're still doing subwords and common words, despite work showing compressed tokens work (e.g. zip2zip), and new domains (code, harnesses) are iffy. Feels like byte-level n-grams should be part of mid-training.
1 reply · 0 reposts · 0 likes · 53 views
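As a toy illustration of the byte-level n-gram idea, here is a sketch that mines frequent byte n-grams from a corpus as candidate vocabulary additions. The frequency-only selection criterion is a deliberate simplification (schemes like zip2zip weigh compression gain instead), and the function name and parameters are made up for this example.

```python
from collections import Counter

def byte_ngram_counts(corpus, n_max=4, top_k=10):
    """Count frequent byte-level n-grams (2..n_max bytes) in a corpus.

    Returns the top_k (ngram_bytes, count) pairs. Working on raw bytes
    sidesteps the subword question entirely: code, new scripts, and
    agent-harness logs all decompose into the same 256 symbols.
    """
    counts = Counter()
    for doc in corpus:
        data = doc.encode("utf-8")
        for n in range(2, n_max + 1):
            for i in range(len(data) - n + 1):
                counts[data[i:i + n]] += 1
    return counts.most_common(top_k)
```

In a mid-training setting one could imagine promoting the surviving n-grams to new token embeddings, though nothing here speaks to how those embeddings would be initialized.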
Michael Griffiths (@msjgriffiths)
Dumb question: why isn't BPE computed as a streaming algorithm with a warmup period? People seem to train on a fraction of the full corpus in an exact way, but that doesn't seem obviously better.
0 replies · 0 reposts · 0 likes · 50 views
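To make the question concrete, here is a minimal sketch of what streaming BPE with a warmup might look like: pair counts accumulate over a document stream, and merges are committed incrementally once the warmup has passed. Everything here (one merge per document after warmup, tolerating stale counts from pre-merge segmentations) is an illustrative assumption, not a claim about how production tokenizers work.

```python
from collections import Counter

def streaming_bpe(stream, num_merges=3, warmup_docs=2):
    """Toy streaming BPE: accumulate symbol-pair counts over a
    document stream, committing one merge per document once at
    least `warmup_docs` documents have been seen.

    Exact BPE would instead re-scan the whole corpus after every
    merge; here we just apply already-committed merges to each
    incoming document and live with stale counts.
    """
    merges = []
    pair_counts = Counter()
    for seen, doc in enumerate(stream, start=1):
        symbols = list(doc)
        # Apply merges learned so far to the incoming document.
        for a, b in merges:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            symbols = out
        pair_counts.update(zip(symbols, symbols[1:]))
        if seen >= warmup_docs and len(merges) < num_merges and pair_counts:
            best = max(pair_counts, key=pair_counts.get)
            merges.append(best)
            del pair_counts[best]  # crude: forget the committed pair's count
    return merges
```

On a stream of repeated "aaab" documents this learns ("a", "a") first and then ("a", "b"), the same first merges exact BPE would find here, which is roughly the intuition behind the question: the counts that matter may stabilize long before the corpus is exhausted.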
Michael Griffiths (@msjgriffiths)
@ryderkessler The flip side is that services should also extend to the top 1% - like childcare - instead of "means testing"
1 reply · 0 reposts · 2 likes · 418 views
Ryder Kessler (@ryderkessler)
There should not be anything controversial about raising taxes on the wealthiest New Yorkers. Not only can the 1% afford to pay their fair share to support the public sector services that benefit everyone, but this level of inequality is bad for democracy.
unusual_whales (@unusual_whales)

"The top 1 percent of American households, which have a minimum net worth of $11.1 million, now collectively own about $25.6 trillion worth of stocks and mutual funds, the same amount as the remaining 99% of the country," per the Federal Reserve

27 replies · 3 reposts · 18 likes · 4.3K views
Michael Griffiths reposted
Palli Thordarson (@PalliThordarson)
Proud to have been involved with @UNSWRNA in making the mRNA-LNP for Rosie. There are nuances here that the thread below misses, but nevertheless the intersection of RNA technology, genomics & AI poses an opportunity to change the way we do medicine and make access more equitable 1/8
Greg Brockman (@gdb)

How AI empowered Paul Conyngham to create a custom mRNA vaccine to cure his dog’s cancer when she had only months to live. The first personalized cancer vaccine designed for a dog:

49 replies · 246 reposts · 1.6K likes · 215.3K views
Michael Griffiths (@msjgriffiths)
@emollick And the flip side is that competitive advantage comes from the bottlenecks. Perhaps it's in the "environment" (data generation, a.k.a. data), so it's more ML logic. But I am not sure.
0 replies · 0 reposts · 0 likes · 11 views
Michael Griffiths (@msjgriffiths)
@emollick i.e. it does for all "technical" skills what calculators do for basic maths (addition/subtraction/multiplication). Obviously those skills drop in value tremendously: then something else *must* become the bottleneck.
1 reply · 0 reposts · 0 likes · 28 views
Ethan Mollick (@emollick)
I wrote about the exponential improvement path of AI, the early signs of massive transformations in the nature of work (including software companies where nobody codes any more), and how one week in February is an omen of our future as things get weirder. open.substack.com/pub/oneusefult…
39 replies · 86 reposts · 578 likes · 87.2K views
Sherwood (@shcallaway)
@ChiragCX @LakshyAAAgrawal GEPA uses AI to generate prompt "mutations" and iteratively converge on an optimal prompt. Here, Karpathy is doing something similar to find optimal pre-training hyperparameters.
1 reply · 4 reposts · 26 likes · 2.1K views
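The mutate-and-keep-the-best loop described above can be sketched as a toy hill climber. `mutate` and `score` below are stand-ins for an LLM-driven rewriter and an eval harness; none of this is GEPA's actual API, just the shape of the idea.

```python
import random

def optimize_prompt(seed_prompt, mutate, score, iters=20, rng=None):
    """Toy hill-climbing loop in the spirit of prompt-'mutation'
    optimizers: propose a mutated prompt, keep it only if it scores
    higher than the current best.

    mutate(prompt, rng) -> candidate prompt (stand-in for an LLM rewriter)
    score(prompt) -> float                  (stand-in for an eval harness)
    """
    rng = rng or random.Random(0)
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(iters):
        cand = mutate(best, rng)
        s = score(cand)
        if s > best_score:          # greedy acceptance; real systems
            best, best_score = cand, s  # keep a population instead
    return best, best_score
```

A population of candidates plus LLM-proposed edits (rather than the single greedy chain here) is what distinguishes evolutionary prompt optimizers from this minimal sketch.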
Michael Griffiths (@msjgriffiths)
Marx's critique of factories - the alienation of the worker from the work - is now coming to knowledge work.
Machine Learning Street Talk (@MLStreetTalk)

A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. disconnecting humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models.
Out on MLST now!

0 replies · 0 reposts · 0 likes · 74 views
Michael Griffiths reposted
Ted Zadouri (@tedzadouri)
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/
7 replies · 132 reposts · 781 likes · 221.9K views
Michael Griffiths (@msjgriffiths)
@rickasaurus Sub agents in a single session are cool. I expect that to develop a lot over the next year.
0 replies · 0 reposts · 1 like · 27 views
Rick (@rickasaurus)
What I want is an actual manager Claude to look after my other Claudes, but to come to me for what to do. Like a CTO reporting to a product-focused CEO.
Andrej Karpathy (@karpathy)

I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.

But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run somewhat nonsensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (Just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer; it's not clear why I had to come in to point that out.) They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code".
And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?

2 replies · 0 reposts · 4 likes · 885 views
Michael Griffiths (@msjgriffiths)
@ModeledBehavior You mean you want people to come threaten my livelihood?? Take money away from my kids? No, that sounds like oppressor power language, real colonialist stuff. I have an inviolable right to life (of my choosing), freedom (from pain/threat), and property (the more the better!) /s
0 replies · 0 reposts · 1 like · 105 views