Jon Deaton

44 posts


@deaton_jon

Research Scientist at @GoogleDeepMind

San Francisco, CA · Joined November 2012
178 Following · 100 Followers
(((ل()(ل() 'yoav))))👾
Is there a variant of Bayesian inference where evidence that is aligned with the prior is weighted non-linearly more than evidence that contradicts it?
9 replies · 0 reposts · 13 likes · 3K views
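One concrete way to get what this tweet asks for is a tempered ("power") likelihood whose exponent depends on whether the evidence points the same way as the prior. The sketch below is illustrative only: the agreement rule and the gamma values are assumptions, not an established method.

```python
import numpy as np

# Hedged sketch of a "confirmation-biased" Bayes update over a discrete
# hypothesis space. If the evidence favors the prior's MAP hypothesis, the
# likelihood is tempered with gamma > 1 (overweighted); otherwise gamma < 1.
# Plain Bayes is the special case gamma_agree = gamma_contra = 1.
def biased_update(prior, likelihood, gamma_agree=2.0, gamma_contra=0.5):
    aligns = np.argmax(likelihood) == np.argmax(prior)
    gamma = gamma_agree if aligns else gamma_contra
    unnorm = prior * likelihood ** gamma
    return unnorm / unnorm.sum()

prior = np.array([0.7, 0.3])
contra = np.array([0.2, 0.8])                    # evidence against the favored hypothesis
print(biased_update(prior, contra))              # ~[0.54, 0.46]: contradicting evidence discounted
print(prior * contra / (prior * contra).sum())   # ~[0.37, 0.63]: plain Bayes for comparison
```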
roon @tszzl
@mrkocnnll @longstosee I think the lack of violence in the modern world is weirder than the instincts to defend your kin tbh
26 replies · 13 reposts · 1K likes · 26.1K views
lusso @luusssso
The Golden Gate Bridge is the single best American infrastructure project ever built and you can’t tell me otherwise
[three images attached]
216 replies · 307 reposts · 6.5K likes · 2.7M views
Jon Deaton @deaton_jon
@kvfrans Pallas is the way to get what you want when XLA doesn't do it.
0 replies · 0 reposts · 0 likes · 87 views
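For context on the Pallas suggestion: Pallas lets you write custom kernels from inside JAX when XLA's automatic fusion falls short. A minimal sketch, essentially the hello-world from the JAX docs (the kernel just adds two vectors):

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_vectors_kernel(x_ref, y_ref, o_ref):
    # Refs are read and written explicitly; this body becomes the device kernel.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add_vectors(x, y):
    return pl.pallas_call(
        add_vectors_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

print(add_vectors(jnp.arange(8.0), jnp.arange(8.0)))  # [0. 2. 4. ... 14.]
```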
Kevin Frans @kvfrans
New notes: We've been building a research-friendly LLM-RL repo in JAX, and I recently took the time to optimize the sampling/training pipeline. We're able to match vLLM sampling and get decent training batch sizes now! notes.kvfrans.com/7-misc/rl-infr…
[image attached]
4 replies · 20 reposts · 175 likes · 13.4K views
Jon Deaton @deaton_jon
@francoisfleuret A VAE is an adversarial training dynamic; are you expecting different curves?
0 replies · 0 reposts · 0 likes · 167 views
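For readers without the thread context: the adversarial-flavored tension in a VAE is plausibly the two opposing terms of its objective, where the reconstruction term wants an informative posterior and the KL term pulls it back toward the prior:

```latex
\mathcal{L}(\theta,\phi;x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction term}}
  - \underbrace{D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{KL regularizer}}
```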
Peter Hague @peterrhague
Wife: <problem>
Me: <solution>?
Wife: I don’t want <solution>!
How do you get past this dynamic?
12K replies · 883 reposts · 29.6K likes · 11.7M views
Jon Deaton @deaton_jon
@francoisfleuret I thought this was going to be a big problem, especially in your sigma-GPT, but I found it doesn't matter.
1 reply · 0 reposts · 2 likes · 902 views
François Fleuret @francoisfleuret
I really don't like that X_t in the first layers is the representation of token t and gradually becomes that of token t+1 by the last layer. It makes absolutely no sense, it is objectively repugnant.
30 replies · 7 reposts · 157 likes · 17.5K views
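The drift Fleuret dislikes falls out of the standard next-token objective: the logits computed at position t are scored against token t+1, so the residual stream at position t is trained to turn "representation of t" into "prediction of t+1" somewhere between the first and last layer. A minimal sketch of that target shift:

```python
import jax.numpy as jnp

# Standard autoregressive LM training pairs: the model's output at position t
# is compared against token t+1, which is what pushes X_t toward t+1.
tokens = jnp.array([5, 2, 7, 1, 3])
inputs, targets = tokens[:-1], tokens[1:]   # position t predicts token t+1
print(inputs, targets)                      # [5 2 7 1] [2 7 1 3]
```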
owl @owl_posting
what book should i read? my criteria are that it teaches you nothing, has great writing, and the author clearly had fun with it. this last point is extremely important
495 replies · 87 reposts · 2.4K likes · 136.3K views
Jiacheng Liu @liujc1998
Ever wondered what CAN'T be transformed by Transformers? 🪨 I wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed point token, the LLM is gonna decode it repeatedly forever, guaranteed. There's some connection with LLMs' repetition issue.
[image attached]
12 replies · 61 reposts · 721 likes · 56.7K views
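A toy version of the fixed-point idea, with a random bigram logit table standing in for the LLM (an assumption for illustration, not the blog's actual setup): under greedy decoding, token t is a fixed point iff argmax(logits[t]) == t, and prompting with such a token then decodes it forever.

```python
import numpy as np

# Random "bigram" next-token map as a stand-in for an LLM.
rng = np.random.default_rng(0)
V = 1000
logits = rng.normal(size=(V, V))

# Token t is a greedy-decoding fixed point iff it predicts itself.
fixed_points = [t for t in range(V) if int(np.argmax(logits[t])) == t]
print(fixed_points)  # the self-mapping tokens of this toy model
```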
François Fleuret @francoisfleuret
uhhh if there is a value that should not be too large, I clamp it because otherwise it explodes. If it hits the clamping value, then when the time comes to make it smaller, the gradient does not propagate to reduce it?
6 replies · 3 reposts · 72 likes · 79.7K views
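One common answer to this question (a sketch of the straight-through trick, not necessarily what Fleuret settled on): clamp in the forward pass but let the gradient flow as if unclamped, so a saturated value can still be pushed back down.

```python
import jax
import jax.numpy as jnp

def clamp_ste(x, lo, hi):
    # Forward value is clip(x, lo, hi); backward pass sees identity, so
    # gradients still reach x even when it sits past the clamp boundary.
    return x + jax.lax.stop_gradient(jnp.clip(x, lo, hi) - x)

print(jax.grad(lambda x: clamp_ste(x, -1.0, 1.0) ** 2)(5.0))   # 2.0: gradient survives
print(jax.grad(lambda x: jnp.clip(x, -1.0, 1.0) ** 2)(5.0))    # 0.0: hard clamp blocks it
```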
Fermat's Library @fermatslibrary
Happy Pythagorean Triple Square Day! Today’s date is made of 3 perfect squares, and they form a Pythagorean triple: 3² + 4² = 5². This only happens once a century.
[image attached]
124 replies · 1.9K reposts · 6.7K likes · 383.6K views
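The claim checks out with a quick scan: reading dates as month/day/two-digit-year, 9/16/25 (that is, 3², 4², 5², with 9 + 16 = 25) is the only such combination per century.

```python
# Scan all month/day/year combinations of one century for dates whose three
# parts are perfect squares satisfying the Pythagorean relation m + d == y.
squares = {n * n for n in range(1, 10)}
hits = [(m, d, y)
        for m in range(1, 13) for d in range(1, 32) for y in range(100)
        if m in squares and d in squares and y in squares and m + d == y]
print(hits)  # [(9, 16, 25)]: 3² + 4² = 5², once a century
```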
Jon Deaton @deaton_jon
@ebetica I did that on my own home directory once
1 reply · 0 reposts · 3 likes · 69 views
Zeming Lin @ebetica
You're absolutely right... You're absolutely right... You're absolutely right... You're absolutely right, I shouldn't have run rm -rf on your home directory.
2 replies · 0 reposts · 9 likes · 703 views
Patrick Hsu @pdhsu
ML peeps, is OOM out of memory or order of magnitude?
9 replies · 1 repost · 6 likes · 5.7K views
François Fleuret @francoisfleuret
If after a week of profound reflexions you finally reinvent E-M, is it a
4 replies · 0 reposts · 3 likes · 2.6K views
Jon Deaton @deaton_jon
@thisismadani That's a neat trick you did to preserve the token positions of the span while doing fill-in-the-middle. Seems effective.
0 replies · 0 reposts · 1 like · 261 views
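For readers unfamiliar with fill-in-the-middle (FIM): the standard transform moves a masked span to the end of the sequence so an autoregressive model can infill it. The sketch below shows the generic prefix-suffix-middle ordering with hypothetical <fim_*> sentinel tokens; the position-preserving trick the reply praises is ProGen3's own and is not spelled out in this thread.

```python
# Generic FIM data transform, sketched for illustration. Sentinel names are
# hypothetical, not ProGen3's vocabulary.
def fim_transform(tokens, span_start, span_end):
    prefix = tokens[:span_start]
    middle = tokens[span_start:span_end]
    suffix = tokens[span_end:]
    # Classic PSM ordering: the model sees prefix and suffix, then infills middle.
    return ["<fim_prefix>"] + prefix + ["<fim_suffix>"] + suffix + ["<fim_middle>"] + middle

print(fim_transform(list("ABCDEFGH"), 3, 6))
```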
Ali Madani @thisismadani
What could scaling unlock for biology? Introducing ProGen3, our next AI foundation models for protein generation. We develop compute-optimal scaling laws up to 46B parameters on 1.5T tokens, with real evidence in the wet lab. Plus, we solve a new set of challenges for drug discovery.
21 replies · 61 reposts · 351 likes · 80.5K views
Charlie Snell @sea_snell
> wake up
> launch yet another YOLO run (600M H100 hours, powered by 16 suns)
> spend entire day anxiously refreshing wandb
> fuck, learning rate too high again
> beg manager for just one more YOLO run tomorrow
> go to bed and repeat
22 replies · 14 reposts · 725 likes · 49.1K views
flynn slicker @flynnslick
Drop the best documentary that you’ve seen. One that blew your mind and you couldn’t stop thinking about for weeks. I want something that will break my brain.
8.4K replies · 3.1K reposts · 53.7K likes · 9M views
Jon Deaton @deaton_jon
This is an incredibly cool plot - it shows how protein language models form internal representations of physical structure when trained only on amino acid sequences selected by evolution. The representation fidelity scales with compute.
Quoting Alex Rives @alexrives:

Information about protein structure in ESM C representations improves predictably with increasing training compute, demonstrating linear scaling across multiple orders of magnitude. (We overtrained the 300M and 600M models past the predicted point of compute optimality).

0 replies · 2 reposts · 15 likes · 1.6K views
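On "linear scaling": the claim is that structure-prediction fidelity rises linearly in log10(training compute). A sketch of how such a fit is read off, using made-up placeholder numbers that are not ESM C data:

```python
import numpy as np

# Least-squares fit of fidelity against log10(compute). All values below are
# placeholders purely to show the shape of the claim, not ESM C results.
compute = np.array([1e19, 1e20, 1e21, 1e22])   # training FLOPs (placeholder)
fidelity = np.array([0.42, 0.51, 0.60, 0.69])  # structure probe score (placeholder)
slope, intercept = np.polyfit(np.log10(compute), fidelity, 1)
print(slope, intercept)  # linear in log-compute: fidelity ~ slope * log10(C) + intercept
```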