dani2442
@dlopez31415
356 posts
phd student of maths + ml
spain → germany · Joined August 2018
337 Following · 88 Followers
dani2442 @dlopez31415
In this post I want to turn our attention to two applications of Bellman's work: continuous-time reinforcement learning, and how the training of generative models (diffusion models) can be interpreted through stochastic optimal control. Link to post: dani2442.github.io/posts/continuo… (4/4)
dani2442 @dlopez31415
Once that structure is visible, several topics line up naturally:
- continuous-time reinforcement learning
- stochastic control
- diffusion models
- optimal transport
(3/n)
dani2442 @dlopez31415
New blog post! Machine learning feels recent, but one of its core mathematical ideas dates back to 1952, when Richard Bellman published a seminal paper titled “On the Theory of Dynamic Programming”, laying the foundation for optimal control and what we now call RL (1/n)
dani2442 @dlopez31415
tensor logic
dani2442 @dlopez31415
We are only a few years away from maths becoming software (Lean), and many issues already familiar from software engineering will inevitably arise:
1. Short vs readable: a minimal number of lines of proof/code (Kolmogorov complexity) vs something meaningful to humans.
2. Abstraction trap: when to create a concept (class, function, definition, theorem) and when not to overengineer.
3. Math debt: we will see a lot of refactoring needed: duplicated results, pruning, rewriting, etc.
4. Search and discoverability: storing results is one thing, retrieving them is another (vector databases, RAG, similarity search?).
We are likely to see a new field dedicated to the epistemology of mathematics (a DevOps for maths). It will probably rely heavily on graph theory, as we can finally interpret the entire web of mathematical dependencies as one giant, interconnected graph.
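The "giant graph" view can be made concrete in a few lines. A toy sketch (all theorem names here are made up for illustration): dependencies form a directed graph, and a pruning pass flags results unreachable from the ones we care about, i.e. the "math debt" of point 3.

```python
# Toy dependency graph of "theorems": edges point from a result to the
# lemmas it uses. All names are hypothetical, purely for illustration.
deps = {
    "pythagoras":           ["law_of_cosines"],
    "law_of_cosines":       ["dot_product_identity"],
    "pythagoras_dup":       ["dot_product_identity"],  # near-duplicate result
    "dot_product_identity": [],
    "unused_lemma":         [],
}

def reachable(roots, graph):
    """Depth-first search: everything transitively used by the roots."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

# "Pruning" pass: anything not reachable from the results we actually
# care about is a candidate for removal.
live = reachable(["pythagoras", "pythagoras_dup"], deps)
prunable = sorted(set(deps) - live)
print(prunable)
```

At the scale of a real Mathlib-style library the same reachability question becomes a genuine graph-analysis problem, which is the point of the tweet.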
Robots Digest 🤖 @robotsdigest
No pretrained encoder, no complex tricks. LeWorldModel shows how JEPA-based world models can be trained end-to-end from raw pixels with just 2 loss terms: ~15M params, a single GPU, and ~48× faster planning than foundation-model world models.
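For intuition, a generic JEPA-style two-loss objective can be sketched numerically. This is not LeWorldModel's actual implementation (its exact losses and architecture aren't given in the tweet); the shapes, the single-linear-layer "encoder", and the particular loss pair — latent prediction plus an anti-collapse variance term — are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 32, 64                      # batch size, embedding dim (made up)

def encode(x, W):                  # stand-in "encoder": one tanh layer
    return np.tanh(x @ W)

W_enc = rng.standard_normal((128, D)) * 0.1
W_pred = rng.standard_normal((D + 8, D)) * 0.1

obs_t  = rng.standard_normal((B, 128))   # flattened raw pixels, frame t
obs_t1 = rng.standard_normal((B, 128))   # frame t+1
action = rng.standard_normal((B, 8))     # action taken between frames

z_t  = encode(obs_t, W_enc)
z_t1 = encode(obs_t1, W_enc)             # target embedding (JEPA: predict
z_hat = np.tanh(np.concatenate([z_t, action], axis=1) @ W_pred)  # in latent space)

# Loss 1: predict the next embedding, never reconstructing pixels.
pred_loss = np.mean((z_hat - z_t1) ** 2)

# Loss 2: anti-collapse regularizer keeping per-dimension variance alive.
var_loss = np.mean(np.maximum(0.0, 1.0 - z_t1.std(axis=0)))

total = pred_loss + var_loss
print(f"prediction loss {pred_loss:.3f}, variance loss {var_loss:.3f}")
```

The appeal of the two-term setup is exactly what the tweet highlights: no pixel decoder and no pretrained encoder are required, so the whole model stays small.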
dani2442 @dlopez31415
once you learn that gaussian variables can be defined in hilbert spaces, your life never feels the same
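For context, the standard definition behind that remark (as in Da Prato and Zabczyk's treatment of infinite-dimensional Gaussians):

```latex
A Borel probability measure $\mu$ on a separable Hilbert space $H$ is
\emph{Gaussian} if, for every $h \in H$, the pushforward of $\mu$ under
$x \mapsto \langle h, x \rangle$ is a one-dimensional Gaussian on $\mathbb{R}$.
Such a $\mu$ is determined by a mean $m \in H$ and a self-adjoint,
non-negative, trace-class covariance operator $C$ on $H$, via its
characteristic functional
\[
  \hat{\mu}(h) \;=\; \int_H e^{i\langle h, x\rangle}\,\mu(dx)
  \;=\; \exp\!\Big( i\langle m, h\rangle - \tfrac{1}{2}\langle C h, h\rangle \Big).
\]
```

The trace-class condition on $C$ is what replaces the finite-dimensional requirement that the covariance matrix have finite entries.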
dani2442 @dlopez31415
The Pareto law is similar to doing PCA: you take the most important "directions" that "explain" the most variance. I'm not aware of a theoretical result using random matrices, but you can run experiments, and in most cases fewer than 20% of the directions explain 80% of the variance.
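A minimal experiment in that spirit, assuming synthetic low-rank-plus-noise data (the 20%/80% figures depend entirely on how correlated the data is; purely isotropic data would show no Pareto effect at all):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 100, 10          # samples, features, latent dimensions

# Synthetic correlated data: a rank-k signal plus small isotropic noise.
Z = rng.standard_normal((n, k))
W = rng.standard_normal((k, d))
X = Z @ W + 0.1 * rng.standard_normal((n, d))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)          # per-direction variance ratio
cumulative = np.cumsum(explained)

top = int(0.2 * d)                        # the "vital few": 20% of directions
print(f"top {top} of {d} directions explain {cumulative[top - 1]:.1%} of variance")
```

With a rank-10 signal the top 20 of 100 directions capture essentially all the variance, which is the PCA analogue of the 80/20 rule.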
kache @yacineMTB
Is there a tangible intuitive statistical explanation for why the Pareto distribution turns up literally everywhere I look?
dani2442 @dlopez31415
@nickcammarata Science is definitely compression, you derived the wrong conclusion from that experiment
Nick @nickcammarata
science isn’t compression. one data point (eg the double slit experiment) should be enough to make you realize you fundamentally misunderstand what’s going on. it shouldn’t bump your loss up a trivial fraction, all you have is loss
dani2442 @dlopez31415
@karpathy At this point you're basically doing gradient descent on the validation set
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
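The inner loop being described — propose a change, keep it only if the validation loss improves — can be sketched on a toy objective. This is an illustrative greedy random search, not Karpathy's actual autoresearch system; the quadratic "validation loss" and the hyperparameter names are made up.

```python
import random

random.seed(0)

# Toy "validation loss": a smooth function of two made-up hyperparameters,
# minimized at lr=0.3, wd=0.1. A stand-in for a real training run.
def val_loss(lr, wd):
    return (lr - 0.3) ** 2 + (wd - 0.1) ** 2

config = {"lr": 1.0, "wd": 1.0}
best = val_loss(**config)
accepted = []

# Greedy loop: propose one tweak at a time, keep it only if it helps.
for step in range(700):                  # ~700 changes, as in the tweet
    key = random.choice(["lr", "wd"])
    candidate = dict(config, **{key: config[key] + random.gauss(0, 0.1)})
    loss = val_loss(**candidate)
    if loss < best:                      # the "better validation loss" gate
        config, best = candidate, loss
        accepted.append(step)

print(f"accepted {len(accepted)} of 700 changes, final val loss {best:.4f}")
```

dani's reply above is exactly the caveat this sketch makes visible: every accepted change is selected on the validation signal, so the loop is, in effect, optimizing against the validation set.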
dani2442 @dlopez31415
@Faltz009 Wow, looks amazing! I haven't read it yet, but what do you think is causing the errors? Measurement, numerics, or a missing equation?
ω @Faltz009
Why do particles have the masses they do? Turns out there's a geometry to reality, and if you know it, you can predict the right masses and it lines up with empirical results. This is a huge quantitative result for computational physics, feedback is much appreciated! In collaboration with my friend and researcher @samsenchal Link to the paper and .js simulation in the comments! 🔗 Special thanks to @EtherDais for the trefoil piece of the puzzle 👀
ω @Faltz009
Particle masses are harmonic ratios. In 1951, Friedrich Lenz published what may be the shortest paper in Physical Review history: 27 words noting that the proton-to-electron mass ratio equals 6π⁵ to high precision. What about the mass of the remaining 18 particles? What about all of them as functions of π, Euler's number and basically integers??? A neutron is an electron + e??? Help me double-check this, please! Link in the comments!

dani2442 @dlopez31415
@arjunrajlab that's why we split validation and test sets in machine learning
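The selection effect behind that split is easy to demonstrate: pick the best of many analyses on one dataset and its score is inflated, while a held-out test set reveals it. A sketch on pure noise (all sizes and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_val, n_test, n_models = 200, 200, 1000

# Pure-noise binary labels: no analysis can genuinely beat 50% accuracy.
y_val = rng.integers(0, 2, n_val)
y_test = rng.integers(0, 2, n_test)

# "Run 1000 analyses and pick the best one": each analysis is just a
# random prediction vector, standing in for 1000 arbitrary pipelines.
preds_val = rng.integers(0, 2, (n_models, n_val))
preds_test = rng.integers(0, 2, (n_models, n_test))

val_acc = (preds_val == y_val).mean(axis=1)
best = int(np.argmax(val_acc))           # selection happens on the val set
test_acc = (preds_test[best] == y_test).mean()

print(f"best-of-{n_models} validation accuracy: {val_acc[best]:.2f}")  # inflated
print(f"same model on held-out test data:      {test_acc:.2f}")        # ~ chance
```

The maximum of 1000 chance-level scores looks well above 50%, but the winner falls back to roughly 50% on data it was never selected on — which is exactly the multiple-testing problem in the quoted tweet.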
Arjun Raj @arjunrajlab
There is an interesting multiple hypothesis testing problem coming where you ask whether it's statistically valid to have a cool analysis if you run 100 (1000? 10,000?) analyses on a dataset and pick the best one.
dani2442 @dlopez31415
@alz_zyd_ intelligence is only as valuable as how you use it
alz @alz_zyd_
Intelligence is now free and the golden age of the nerd is over
dani2442 @dlopez31415
The progress bar of human knowledge
[300,000 BC] humans emerge knowing nothing. Knowledge dies with the individual.
[100,000 BC] develop spoken language. But memory fades and stories mutate.
[3,200 BC] invent writing. Knowledge can finally outlive its owner.
[1440] invent the printing press. Books spread beyond the reach of fire and censorship.
[1991] invent the internet. All knowledge becomes interconnected and overwhelming.
[1998] Google Search. Any fact becomes retrievable in minutes.
[2022] LLMs. For the first time, we can simply ask and get an answer in seconds.
dani2442 @dlopez31415
@jon_stokes This reduction argument applies to humans too. "We are only self-replicating genes that produce interesting behaviour"
Jon Stokes @jon_stokes
It is still science fiction. There is no entity that is asking itself questions & emailing U. It's software that implements a search process. It produces sequences that are related to its input sequences. This email is a prompted output sequence. Don't play yourself.
Henry Shevlin @dioscuri
I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.

dani2442 @dlopez31415
Sometimes you live what you've read. But nothing hits like reading what you've already lived.
dani2442 @dlopez31415
@dioscuri it's only a matter of time before people start pushing for robot rights. We're literally living in Asimov's books
Henry Shevlin @dioscuri
I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.
dani2442 @dlopez31415
@GregHBurnham If you actually need n=17 for a real application, you'd likely just use n=16 or 18 anyway: they're cleaner, easier to assemble, and more cost-effective.
Greg Burnham @GregHBurnham
I've heard from mathematicians that if an answer is "ugly", then you might be asking the wrong question. So is square-packing somehow the wrong question?
dani2442 @dlopez31415
@agraybee there is a clear incentive to find new species, while there is no precise definition of one
Everything Price Sufferer (but especially eggs)
Can any entomologist explain how we discover 8,000-10,000 new species of insects a year? Are we actually discovering new ones in the untamed wilds or are we determining that various subspecies are actually different enough to be their own species?