
Jon Deaton
@deaton_jon
Research Scientist at @GoogleDeepMind
San Francisco, CA · Joined November 2012
178 Following · 100 Followers

@mrkocnnll @longstosee I think the lack of violence in the modern world is weirder than the instincts to defend your kin tbh

@kvfrans Pallas is the way to get what you want when XLA doesn't do it
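Not from the thread, just a minimal sketch of what reaching for Pallas looks like: a hand-written kernel (here a trivial element-wise add) lowered through pl.pallas_call instead of relying on whatever fusion XLA would have produced. Shapes and the kernel itself are illustrative only.

    import jax
    import jax.numpy as jnp
    from jax.experimental import pallas as pl

    def add_kernel(x_ref, y_ref, o_ref):
        # Read the input blocks, add them, and write the result to the output ref.
        o_ref[...] = x_ref[...] + y_ref[...]

    def add(x, y):
        # pallas_call lowers the hand-written kernel to a custom op.
        # interpret=True runs it in interpreter mode so the sketch works on CPU too.
        return pl.pallas_call(
            add_kernel,
            out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
            interpret=True,
        )(x, y)

    x = jnp.arange(8, dtype=jnp.float32)
    y = jnp.ones(8, dtype=jnp.float32)
    print(add(x, y))  # [1. 2. 3. 4. 5. 6. 7. 8.]

Real kernels also specify a grid and BlockSpecs to tile larger arrays; this only shows the entry point.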

New notes: We've been building a research-friendly LLM-RL repo in JAX, and I recently took the time to optimize the sampling/training pipeline.
We're able to match vLLM sampling and get decent training batch sizes now!
notes.kvfrans.com/7-misc/rl-infr…
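The actual pipeline is described behind the link above; purely as a generic illustration (not the repo's code), one common ingredient of fast LLM sampling in JAX is rolling the decode loop into lax.scan under jit. Here toy_logits is a made-up stand-in for a real model forward pass.

    from functools import partial
    import jax
    import jax.numpy as jnp

    VOCAB = 256

    def toy_logits(params, token):
        # Stand-in for a real transformer forward pass: one row of a logit table.
        return params[token]

    @partial(jax.jit, static_argnames="num_steps")
    def sample(params, first_token, key, num_steps=32):
        # Rolling the decode loop into lax.scan keeps sampling inside a single
        # compiled program instead of dispatching one op per generated token.
        def step(carry, _):
            token, key = carry
            key, sub = jax.random.split(key)
            next_token = jax.random.categorical(sub, toy_logits(params, token))
            return (next_token, key), next_token

        _, tokens = jax.lax.scan(step, (first_token, key), xs=None, length=num_steps)
        return tokens

    params = jax.random.normal(jax.random.PRNGKey(0), (VOCAB, VOCAB))
    print(sample(params, jnp.asarray(3, dtype=jnp.int32), jax.random.PRNGKey(1)))

A real sampler would carry a KV cache in the scan state and batch requests with vmap; this only shows the loop structure.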


@francoisfleuret A VAE is an adversarial training dynamic; are you expecting different curves?

Weirdest graph ever, but this thing is robust. The recovery on HumanEval+ is spectacular.
Anyway, version +1 already running, we'll see.


François Fleuret @francoisfleuret

just out of vain curiosity: what happens if you increase the complexity of attention? like, has anyone tried cubic attention lol
Lucas Beyer (bl16) @giffmana
> There's no free lunch.
> When you reduce the complexity of attention, you pay a price.
> The question is, where?
This is *exactly* how I typically end my Transformer tutorial. This slide is already 4 years old, I've never updated it, but it still holds:

@francoisfleuret I thought this was going to be a big problem, especially in your σ-GPT, but I found it doesn't matter

@thisismadani That's a neat trick you did to preserve the token positions of the span while doing fill-in-the-middle. Seems effective.
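The tweet doesn't spell the trick out, so this is only one plausible reading, sketched as an assumption: when a document is rearranged into prefix/suffix/middle order for fill-in-the-middle training, each token keeps the position index it had in the original, unpermuted text. The function name below is made up for illustration.

    import numpy as np

    def fim_reorder_keep_positions(tokens, span_start, span_end):
        # Rearrange tokens into prefix | suffix | middle (the usual FIM layout),
        # but keep each token's original position index so the moved middle span
        # still "sits" where it came from. Real setups also insert sentinel
        # tokens around the three pieces; those are omitted here.
        tokens = np.asarray(tokens)
        positions = np.arange(len(tokens))
        order = np.concatenate([
            np.arange(0, span_start),          # prefix
            np.arange(span_end, len(tokens)),  # suffix
            np.arange(span_start, span_end),   # middle, moved to the end
        ])
        return tokens[order], positions[order]

    toks = np.array([10, 11, 12, 13, 14, 15])
    ids, pos = fim_reorder_keep_positions(toks, 2, 4)
    # ids -> [10 11 14 15 12 13], pos -> [0 1 4 5 2 3]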

This is an incredibly cool plot - it shows how protein language models form internal representations of physical structure when trained only on amino acid sequences selected by evolution.
The representation fidelity scales with compute.
Alex Rives @alexrives
Information about protein structure in ESM C representations improves predictably with increasing training compute, demonstrating linear scaling across multiple orders of magnitude. (We overtrained the 300M and 600M models past the predicted point of compute optimality).
