dani2442
@dlopez31415
356 posts
phd student of maths + ml
spain → germany · Joined August 2018
337 Following · 88 Followers
dani2442 @dlopez31415
In this post I want to turn our attention to two applications of Bellman's work: continuous-time reinforcement learning, and how the training of generative models (diffusion models) can be interpreted through stochastic optimal control. Link to post: dani2442.github.io/posts/continuo… (4/4)
dani2442 @dlopez31415
Once that structure is visible, several topics line up naturally:
- continuous-time reinforcement learning
- stochastic control
- diffusion models
- optimal transport
(3/n)
dani2442 @dlopez31415
New blog post! Machine learning feels recent, but one of its core mathematical ideas dates back to 1952, when Richard Bellman published a seminal paper titled “On the Theory of Dynamic Programming”, laying the foundation for optimal control and what we now call RL (1/n)
dani2442 @dlopez31415
tensor logic
dani2442 @dlopez31415
We are only a few years away from maths becoming software (Lean), and many issues already familiar from software engineering will inevitably arise:
1. Short vs readable: a minimal number of lines of proof/code (Kolmogorov complexity) vs something meaningful to humans.
2. Abstraction trap: when to create a concept (class, function, definition, theorem) and when not to overengineer.
3. Math debt: we will see a lot of refactoring needed: duplicated results, pruning, rewriting, etc.
4. Search and discoverability: storing results is one thing, retrieving them is another (vector databases, RAG, similarity search?).
We are likely to see a new field dedicated to the epistemology of mathematics (a DevOps for maths). It will probably rely heavily on graph theory, as we can finally interpret the entire web of mathematical dependencies as one giant, interconnected graph.
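The "giant graph" view can be made concrete in a few lines. A toy sketch (all theorem names here are made up for illustration): dependencies form a directed graph, and a pruning pass flags results unreachable from the ones we care about, i.e. the "math debt" of point 3.

```python
# Toy dependency graph of "theorems": edges point from a result to the
# lemmas it uses. All names are hypothetical, purely for illustration.
deps = {
    "pythagoras":           ["law_of_cosines"],
    "law_of_cosines":       ["dot_product_identity"],
    "pythagoras_dup":       ["dot_product_identity"],  # near-duplicate result
    "dot_product_identity": [],
    "unused_lemma":         [],
}

def reachable(roots, graph):
    """Depth-first search: everything transitively used by the roots."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

# "Pruning" pass: anything not reachable from the results we actually
# care about is a candidate for removal.
live = reachable(["pythagoras", "pythagoras_dup"], deps)
prunable = sorted(set(deps) - live)
print(prunable)
```

At the scale of a real Mathlib-style library the same reachability question becomes a genuine graph-analysis problem, which is the point of the tweet.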
Robots Digest 🤖 @robotsdigest
No pretrained encoder, no complex tricks. LeWorldModel shows how JEPA-based world models can be trained end-to-end from raw pixels with just 2 loss terms: ~15M params, a single GPU, and ~48× faster planning than foundation-model world models.
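For intuition, a generic JEPA-style two-loss objective can be sketched numerically. This is not LeWorldModel's actual implementation (its exact losses and architecture aren't given in the tweet); the shapes, the single-linear-layer "encoder", and the particular loss pair — latent prediction plus an anti-collapse variance term — are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 32, 64                      # batch size, embedding dim (made up)

def encode(x, W):                  # stand-in "encoder": one tanh layer
    return np.tanh(x @ W)

W_enc = rng.standard_normal((128, D)) * 0.1
W_pred = rng.standard_normal((D + 8, D)) * 0.1

obs_t  = rng.standard_normal((B, 128))   # flattened raw pixels, frame t
obs_t1 = rng.standard_normal((B, 128))   # frame t+1
action = rng.standard_normal((B, 8))     # action taken between frames

z_t  = encode(obs_t, W_enc)
z_t1 = encode(obs_t1, W_enc)             # target embedding (JEPA: predict
z_hat = np.tanh(np.concatenate([z_t, action], axis=1) @ W_pred)  # in latent space)

# Loss 1: predict the next embedding, never reconstructing pixels.
pred_loss = np.mean((z_hat - z_t1) ** 2)

# Loss 2: anti-collapse regularizer keeping per-dimension variance alive.
var_loss = np.mean(np.maximum(0.0, 1.0 - z_t1.std(axis=0)))

total = pred_loss + var_loss
print(f"prediction loss {pred_loss:.3f}, variance loss {var_loss:.3f}")
```

The appeal of the two-term setup is exactly what the tweet highlights: no pixel decoder and no pretrained encoder are required, so the whole model stays small.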
dani2442 @dlopez31415
once you learn that gaussian variables can be defined in hilbert spaces, your life never feels the same
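For context, the standard definition behind that remark (as in Da Prato and Zabczyk's treatment of infinite-dimensional Gaussians):

```latex
A Borel probability measure $\mu$ on a separable Hilbert space $H$ is
\emph{Gaussian} if, for every $h \in H$, the pushforward of $\mu$ under
$x \mapsto \langle h, x \rangle$ is a one-dimensional Gaussian on $\mathbb{R}$.
Such a $\mu$ is determined by a mean $m \in H$ and a self-adjoint,
non-negative, trace-class covariance operator $C$ on $H$, via its
characteristic functional
\[
  \hat{\mu}(h) \;=\; \int_H e^{i\langle h, x\rangle}\,\mu(dx)
  \;=\; \exp\!\Big( i\langle m, h\rangle - \tfrac{1}{2}\langle C h, h\rangle \Big).
\]
```

The trace-class condition on $C$ is what replaces the finite-dimensional requirement that the covariance matrix have finite entries.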
dani2442 @dlopez31415
The Pareto law is similar to doing PCA: you take the most important "directions" that "explain" the most variance. I'm not aware of a theoretical result using random matrices, but you can run experiments, and in most cases fewer than 20% of the directions explain 80% of the variance.
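A minimal experiment in that spirit, assuming synthetic low-rank-plus-noise data (the 20%/80% figures depend entirely on how correlated the data is; purely isotropic data would show no Pareto effect at all):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 100, 10          # samples, features, latent dimensions

# Synthetic correlated data: a rank-k signal plus small isotropic noise.
Z = rng.standard_normal((n, k))
W = rng.standard_normal((k, d))
X = Z @ W + 0.1 * rng.standard_normal((n, d))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)          # per-direction variance ratio
cumulative = np.cumsum(explained)

top = int(0.2 * d)                        # the "vital few": 20% of directions
print(f"top {top} of {d} directions explain {cumulative[top - 1]:.1%} of variance")
```

With a rank-10 signal the top 20 of 100 directions capture essentially all the variance, which is the PCA analogue of the 80/20 rule.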
kache @yacineMTB
Is there a tangible intuitive statistical explanation for why the Pareto distribution turns up literally everywhere I look?
dani2442 @dlopez31415
@nickcammarata Science is definitely compression, you derived the wrong conclusion from that experiment
Nick @nickcammarata
science isn’t compression. one data point (eg the double slit experiment) should be enough to make you realize you fundamentally misunderstand what’s going on. it shouldn’t bump your loss up a trivial fraction, all you have is loss
dani2442 @dlopez31415
@karpathy At this point you're basically doing gradient descent on the validation set
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
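The inner loop being described — propose a change, keep it only if the validation loss improves — can be sketched on a toy objective. This is an illustrative greedy random search, not Karpathy's actual autoresearch system; the quadratic "validation loss" and the hyperparameter names are made up.

```python
import random

random.seed(0)

# Toy "validation loss": a smooth function of two made-up hyperparameters,
# minimized at lr=0.3, wd=0.1. A stand-in for a real training run.
def val_loss(lr, wd):
    return (lr - 0.3) ** 2 + (wd - 0.1) ** 2

config = {"lr": 1.0, "wd": 1.0}
best = val_loss(**config)
accepted = []

# Greedy loop: propose one tweak at a time, keep it only if it helps.
for step in range(700):                  # ~700 changes, as in the tweet
    key = random.choice(["lr", "wd"])
    candidate = dict(config, **{key: config[key] + random.gauss(0, 0.1)})
    loss = val_loss(**candidate)
    if loss < best:                      # the "better validation loss" gate
        config, best = candidate, loss
        accepted.append(step)

print(f"accepted {len(accepted)} of 700 changes, final val loss {best:.4f}")
```

dani's reply above is exactly the caveat this sketch makes visible: every accepted change is selected on the validation signal, so the loop is, in effect, optimizing against the validation set.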
dani2442 @dlopez31415
@Faltz009 Wow, looks amazing! I haven't read it yet, but what do you think is causing the errors? Measurement, numerics, or a missing equation?
ω @Faltz009
Why do particles have the masses they do? Turns out there's a geometry to reality, and if you know it, you can predict the right masses and it lines up with empirical results. This is a huge quantitative result for computational physics, feedback is much appreciated! In collaboration with my friend and researcher @samsenchal Link to the paper and .js simulation in the comments! 🔗 Special thanks to @EtherDais for the trefoil piece of the puzzle 👀
ω @Faltz009
Particle masses are harmonic ratios. In 1951, Friedrich Lenz published what may be the shortest paper in Physical Review history: 27 words noting that the proton-to-electron mass ratio equals 6π⁵ to high precision. What about the mass of the remaining 18 particles? What about all of them as functions of π, Euler's number and basically integers??? A neutron is an electron + e??? Help me double-check this, please! Link in the comments!

dani2442 @dlopez31415
@arjunrajlab that's why we split validation and test sets in machine learning
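The selection effect behind that split is easy to demonstrate: pick the best of many analyses on one dataset and its score is inflated, while a held-out test set reveals it. A sketch on pure noise (all sizes and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_val, n_test, n_models = 200, 200, 1000

# Pure-noise binary labels: no analysis can genuinely beat 50% accuracy.
y_val = rng.integers(0, 2, n_val)
y_test = rng.integers(0, 2, n_test)

# "Run 1000 analyses and pick the best one": each analysis is just a
# random prediction vector, standing in for 1000 arbitrary pipelines.
preds_val = rng.integers(0, 2, (n_models, n_val))
preds_test = rng.integers(0, 2, (n_models, n_test))

val_acc = (preds_val == y_val).mean(axis=1)
best = int(np.argmax(val_acc))           # selection happens on the val set
test_acc = (preds_test[best] == y_test).mean()

print(f"best-of-{n_models} validation accuracy: {val_acc[best]:.2f}")  # inflated
print(f"same model on held-out test data:      {test_acc:.2f}")        # ~ chance
```

The maximum of 1000 chance-level scores looks well above 50%, but the winner falls back to roughly 50% on data it was never selected on — which is exactly the multiple-testing problem in the quoted tweet.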
Arjun Raj @arjunrajlab
There is an interesting multiple hypothesis testing problem coming where you ask whether it's statistically valid to have a cool analysis if you run 100 (1000? 10,000?) analyses on a dataset and pick the best one.
dani2442 @dlopez31415
@alz_zyd_ intelligence is only as valuable as how you use it
alz @alz_zyd_
Intelligence is now free and the golden age of the nerd is over
dani2442 @dlopez31415
The progress bar of human knowledge
[300,000 BC] humans emerge knowing nothing. Knowledge dies with the individual.
[100,000 BC] develop spoken language. But memory fades and stories mutate.
[3,200 BC] invent writing. Knowledge can finally outlive its owner.
[1440] invent the printing press. Books spread beyond the reach of fire and censorship.
[1991] invent the internet. All knowledge becomes interconnected and overwhelming.
[1998] Google Search. Any fact becomes retrievable in minutes.
[2022] LLMs. For the first time, we can simply ask and get an answer in seconds.
dani2442 @dlopez31415
@jon_stokes This reduction argument applies to humans too. "We are only self-replicating genes that produce interesting behaviour"
Jon Stokes @jon_stokes
It is still science fiction. There is no entity that is asking itself questions & emailing U. It's software that implements a search process. It produces sequences that are related to its input sequences. This email is a prompted output sequence. Don't play yourself.
Henry Shevlin @dioscuri
I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.

dani2442 @dlopez31415
Sometimes you live what you've read. But nothing hits like reading what you've already lived.
dani2442 @dlopez31415
@dioscuri it's only a matter of time before people start pushing for robot rights. We're literally living in Asimov's books
Henry Shevlin @dioscuri
I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.
dani2442 @dlopez31415
@GregHBurnham If you actually need n=17 for a real application, you'd likely just use n=16 or 18 anyway: they're cleaner, easier to assemble, and more cost-effective.
Greg Burnham @GregHBurnham
I've heard from mathematicians that if an answer is "ugly", then you might be asking the wrong question. So is square-packing somehow the wrong question?
dani2442 @dlopez31415
@agraybee there is a clear incentive to find new species, while there is no precise definition of one
Everything Price Sufferer (but especially eggs)
Can any entomologist explain how we discover 8,000-10,000 new species of insects a year? Are we actually discovering new ones in the untamed wilds or are we determining that various subspecies are actually different enough to be their own species?