Norman Casagrande
@nova77t

2.4K posts
ML, history, space & sciencey stuff. Research Eng @ Google DeepMind. Opinions are my own etc. Find me @adabstract.bsky.social & @[email protected]

@ Google DeepMind · Joined January 2010
247 Following · 796 Followers
Norman Casagrande reposted

Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually before, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QK-norm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I had already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. More generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
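For readers unfamiliar with the QK-norm oversight in the first bullet, here is a minimal sketch of the idea, assuming a PyTorch-style attention block; the module and its single shared gain are illustrative choices, not nanochat's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNorm(nn.Module):
    """L2-normalize queries/keys per head, with a learnable scale.

    A parameterless QK-norm pins the query/key magnitudes, which keeps
    training stable but can leave the attention logits too small, so the
    softmax stays diffuse. Attaching a learned multiplier lets the model
    sharpen attention again where it helps.
    """

    def __init__(self, init_scale: float = 1.0):
        super().__init__()
        # One shared gain; per-head or per-channel gains are also common.
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, seq, head_dim)
        x = F.normalize(x, dim=-1)  # the "parameterless" part
        return x * self.scale       # the multiplier that was missing

# Usage inside attention (illustrative):
#   q, k = qk_norm(q), qk_norm(k)
#   out = F.scaled_dot_product_attention(q, k, v)
```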
961 replies · 2.1K reposts · 19.3K likes · 3.5M views
Norman Casagrande reposted

Seth Forsgren @sethforsgren
It started with the silly idea of generating images and turning them into music. We didn’t know if it would work, and we didn’t care if anyone used it. We were building it for ourselves. As musicians, we marveled at how this instrument could inspire and challenge us. A lot has changed since then, but that sense of wonder has remained. Couldn’t be more excited to continue the journey as part of @GoogleLabs
Producer.ai @producer_ai

Producer is now part of Google! We’re proud to be joining @GoogleLabs and @GoogleDeepMind to build the future of music creation. Producer is here to stay, with more on the way. Come make music with us!

23 replies · 4 reposts · 64 likes · 8.2K views
Norman Casagrande @nova77t
@dccommonsense Time to turn the executive into a council like in Switzerland (whose constitution was inspired by the US one). Swiss council has 7 members: while not perfect it's another obstacle to tyranny. Especially if they are forced to speak "as one" instead of across party lines.
0 replies · 0 reposts · 0 likes · 40 views
Dan Carlin @dccommonsense
(More)...and one of the main rationales for reducing presidential power is that that's the branch of government most likely to lead to tyranny. Congress, of course, has taken its turn at being overbearing... but it's harder to be tyrannized by hundreds of humans than by one person.
43 replies · 49 reposts · 1.1K likes · 34.2K views
Norman Casagrande reposted

Jason Baldridge @jasonbaldridge
A little jam for all the absent minded professors out there. (And demonstrating some of Lyria's flexibility and weirdness. I love the waka-waka Doppler-effect sound from 24-26 seconds in.)
1 reply · 1 repost · 11 likes · 469 views
Norman Casagrande reposted

David Pfau @pfau
Alright, time for my own vibe-coding story. Over the last several months, we've been rewriting the plasma simulator used as an RL environment in our 2022 Nature paper from Matlab to JAX. An experimental version is now available on EPFL's Gitlab: gitlab.epfl.ch/spc/public/meq…
3 replies · 6 reposts · 100 likes · 10.1K views
Norman Casagrande @nova77t
@pfau Hahah, yeah: fusion as well. But I can't claim much credit since my contribution was minuscule 😅 (for now at least! 😉)
0 replies · 0 reposts · 0 likes · 63 views
Norman Casagrande reposted

Google Gemini @GeminiApp
Introducing Lyria 3, our new music generation model in Gemini that lets you turn any idea, photo, or video into a high-fidelity track with custom lyrics. From funny jingles to lo-fi beats, you can create custom 30-second soundtracks for any moment. See how it works. 🧵
505 replies · 1.3K reposts · 8.9K likes · 4.6M views
Norman Casagrande @nova77t
And I can finally update my awkward answering machine reply! 😄🎵
0 replies · 1 repost · 3 likes · 217 views
Dario Bressanini @DarioBressanini
@nova77t @malinverno_luca You didn't expect me to use Gemini? Or that I'd fed it something first? (I also wanted to give it Kernighan & Plauger, but I didn't have it at hand)
1 reply · 0 reposts · 0 likes · 78 views
Dario Bressanini @DarioBressanini
I'm astonished: I've been experimenting with AI code generation. I asked it to write computational physics simulation algorithms from scratch (where you first have to understand what they are and what they do before you can program them). It did everything! It would have taken me a month. 😱
77 replies · 10 reposts · 694 likes · 73.2K views
Norman Casagrande @nova77t
@DarioBressanini @malinverno_luca Hahaha, I wasn't expecting this! It's been a while since I worked on code generation models (in particular "agentic" ones), so I can't take the credit, but I'll pass along the appreciation.
1 reply · 0 reposts · 0 likes · 81 views
Dario Bressanini @DarioBressanini
@nova77t @malinverno_luca Oh dear, on X that's a bit difficult ;) anyway, before it started writing code (I use Gemini) I had a long exchange of ideas with it about how I wanted it to write. And I also fed it a famous piece by Rob Pike to see whether it agreed 😄
1 reply · 0 reposts · 1 like · 59 views
Norman Casagrande @nova77t
@DarioBressanini @malinverno_luca Of course, everything is relative. Where I found it particularly useful was in a little project converting Matlab code written by physicists (!) to Jax. Compared to the physicists' code, the model's output was Knuth! Granted, the bar was low! :p
1 reply · 0 reposts · 0 likes · 55 views
Dario Bressanini @DarioBressanini
@nova77t @malinverno_luca I, on the other hand, was pleasantly surprised by the elegance. Evocative variable names, clear functions, solid data structures, and everything well commented. I only had to dial down its tendency to think "in objects" a little.
1 reply · 0 reposts · 0 likes · 62 views
Norman Casagrande @nova77t
@malinverno_luca @DarioBressanini Exactly. By "elegance" I don't mean 2 cryptic lines in place of 40, but rather something that is easily maintainable, readable, and clear in its intent.
1 reply · 0 reposts · 0 likes · 51 views
Luca Malinverno, PhD @malinverno_luca
@nova77t @DarioBressanini Elegance is indeed a very interesting metric... It could be used as a discriminator for code, because elegant then translates into secure, scalable, maintainable, etc.
1 reply · 0 reposts · 1 like · 36 views
Norman Casagrande @nova77t
@Petedemountain @DarioBressanini I've been writing code for 30 years: if it saves me anything, it's a few minutes at most, although it's still an interesting tool. Sometimes, though, I have to rearrange what it suggests, because "elegance" is what saves you time in the long run!
0 replies · 0 reposts · 0 likes · 13 views
Pete de mountain @Petedemountain
@nova77t @DarioBressanini How much does elegance matter when it does in half an hour what would have taken you a week? After all, when high-level languages were invented the "fine detail" of writing machine code was lost too, but the time savings made it worth it.
1 reply · 0 reposts · 0 likes · 26 views
Norman Casagrande reposted

David Pfau @pfau
This is the key difference between in-domain and out-of-domain generalization, and we still have not truly solved out-of-domain generalization. It just turns out you can build world changing technology by throwing so much data at things that the entire universe is in-domain.
Niels Rogge @NielsRogge

One of the best visual explanations I've ever seen for why scaling Transformers works, but is suboptimal, as it's just brute-forcing things, by @YesThisIsLion (co-author of the Transformer) on @MLStreetTalk.

"In the (rejected) paper "Intelligent Matrix Exponentiation", they show the decision boundary of a classic MLP with a ReLU/tanh activation function on the classic spiral dataset."

"You can see they both technically solve it with great scores on the test set. Next, they show the decision boundary of the "M-layer" they propose in the paper. And it represents the spiral ... as a spiral!"

"Shouldn't we? If the data is a spiral... shouldn't we represent it as a spiral?"

"If you look back at the decision boundaries of the MLP, it's clear that you just have these tiny, piecewise separations without learning the concept of a spiral. That's what I mean!"

"If you train these things enough, it can fit the spiral and get a high accuracy. But there's no indication that the MLP actually understands a spiral. When you represent it as a spiral, it extrapolates correctly, because the spiral just keeps going out."
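A hedged sketch of the experiment the quote describes: a ReLU MLP fit to a two-spirals dataset, then probed beyond the training radius. The dataset generator, network size, and the 1.5x extrapolation probe below are illustrative assumptions, not the setup from the "Intelligent Matrix Exponentiation" paper:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_spirals(n=1000, turns=3.0, noise=0.05, seed=0):
    """Two interleaved spirals: the classic dataset referenced above."""
    rng = np.random.default_rng(seed)
    t = turns * 2 * np.pi * np.sqrt(rng.uniform(size=n))  # radius grows with angle
    arm = np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / (turns * 2 * np.pi)
    X = np.concatenate([arm, -arm]) + noise * rng.standard_normal((2 * n, 2))
    y = np.concatenate([np.zeros(n), np.ones(n)])  # one label per arm
    return X, y

X, y = make_spirals()
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=5000, random_state=0).fit(X, y)
print("in-distribution accuracy:", mlp.score(X, y))  # typically close to 1.0

# The quote's point: a ReLU MLP's decision boundary is piecewise linear, so
# past the training radius it extends its last linear pieces instead of
# continuing the spiral's curve.
far = 1.5 * X[:20]  # scale some class-0 points outward, past the training range
print("off-distribution predictions:", mlp.predict(far))
```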

13 replies · 22 reposts · 337 likes · 35.9K views
Igor Babuschkin @ibab
@DavidSHolz There are decades where nothing happens, and there are weeks where decades happen
7 replies · 29 reposts · 476 likes · 19.9K views
David @DavidSHolz
I've done more personal coding projects over Christmas break than I have in the last 10 years. It's crazy. I can sense the limitations, but I *know* nothing is going to be the same anymore.
298 replies · 464 reposts · 8.5K likes · 1.2M views
Norman Casagrande reposted

underwood @underwoodxie96
I tweaked the prompt. The old version generated different camera angles from a single image, but it’s hard to stitch those into a coherent clip. So I updated the setup: now the AI expands keyframes based on the same scene + storyline for better continuity.
TechHalla @techhalla

These 2 prompts for Nano Banana Pro will save you a ton of time. Just upload an image, generate the cinematic grid, and pull the frames you like! Examples made in Higgsfield AI, and prompts below 👇

53 replies · 89 reposts · 1K likes · 690K views