Nathaniel Daw

2.3K posts

@nathanieldaw

Princeton neuro prof. But Twitter is an absurd platform for professional communication so I strive to use it most unprofessionally.

Princeton, NJ Joined September 2010
862 Following 7K Followers
Nathaniel Daw @nathanieldaw
@mattyglesias If it's like mine it keeps track of the average age of the gas in the tank so you can improve matters just by using some and topping it off
0
0
0
724
D K @DamarisKroeber
@marcelomattar @nathanieldaw Beautiful paper. This reminds me of a motor cortex “planning” paper by Churchland et al. where they argue that the usefulness of a motor trajectory is already stored (learned) and simply “rotated” into a task-relevant output space.
2
0
2
42
Nathaniel Daw reposted
Marcelo Mattar @marcelomattar
New Annual Review with @nathanieldaw. We argue that the planning machinery of the brain is mostly used for learning from simulated experience, and that thinking prospectively at decision time is just one special case of this more general process. annualreviews.org/content/journa…
3
48
188
14.7K
David Pfau @pfau
@bygregorr Nothing is more tiresome than people after the fact being like "well akshually this other paper did something kinda similar if you squint". Nothing is truly original, huge breakthroughs always have some precedent, stop being a pedant about it.
4
4
119
6K
David Pfau @pfau
Oh god are we really doing this? Jeff Dean trained an n-gram model on the entire internet in 2007. Jelinek coined the term "language model" in the '70s. It's called "Claude" because Claude Shannon was estimating the entropy rate of the English language in 1951!
Aran Komatsuzaki @arankomatsuzaki

While Alec is one of the best ML researchers of all time, LLM started way before. Here's one from 2013 for non-neural architecture and one from 2016, which is afaik the first neural LLM if we define LLM as LM w/ >1B params.
33
83
1.3K
473.7K
Nathaniel Daw @nathanieldaw
@_Aaditya_Prasad @IanOsband @a_weers @giffmana If a low prob action has high value there is a big return gain for improving policy. If I collect data under pi, some actions will be under-sampled in that data. These two things seem separate, e.g. I could sample a diff data distribution and still discover the policy improvement
0
0
2
29
Alex Weers @a_weers
Interesting question, since both share the same motivation and try to reweigh gradients so that they are closer to CE. However, I don't think they are the same: they have different expected gradients and perform differently in practice.
- MaxRL uses group statistics to counter the p factor in the expected gradient introduced by REINFORCE, with w = (1-(1-p)^N)/p: with one sample it is PG (w = 1), and in the limit it becomes ML/CE (w = 1/p).
- DG uses gates based on surprisal and advantage, so it works with single samples. In the special tabular case it adds a factor of sigmoid(-log p) to the expected gradient, which is a compression of gradients for high p, but a softer one. For other (asymmetric) contexts the gradient direction rotates again.
The plots show the performance on MNIST for different numbers of rollouts per sample. DG is positioned in between PG and CE even for single rollouts per sample, but does not approximate CE exactly (in contrast to MaxRL).
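Editorial sketch (not part of the thread): the two weightings quoted above can be tabulated side by side. This only evaluates the formulas exactly as written in the post, w = (1-(1-p)^N)/p for MaxRL and sigmoid(-log p) for the tabular DG case. The function names are hypothetical, and nothing here is taken from either paper's code.

```python
import math


def maxrl_weight(p: float, n: int) -> float:
    """Weight (1 - (1 - p)^n) / p as quoted in the post, for n rollouts.

    Hypothetical helper: n = 1 gives 1 (plain PG); large n approaches 1/p (ML/CE).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - (1.0 - p) ** n) / p


def dg_tabular_factor(p: float) -> float:
    """Factor sigmoid(-log p) as quoted in the post for the tabular case.

    Hypothetical helper: algebraically this is 1 / (1 + p).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / (1.0 + math.exp(math.log(p)))


if __name__ == "__main__":
    # Compare the weights across success probabilities and rollout counts.
    print(f"{'p':>6} {'N=1':>8} {'N=8':>8} {'1/p':>8} {'DG':>8}")
    for p in (0.01, 0.1, 0.5, 0.9):
        print(
            f"{p:>6.2f} {maxrl_weight(p, 1):>8.3f} {maxrl_weight(p, 8):>8.3f} "
            f"{1.0 / p:>8.3f} {dg_tabular_factor(p):>8.3f}"
        )
```

Since sigmoid(-log p) simplifies to 1/(1+p), the quoted DG factor stays between 0.5 and 1 for p in (0, 1], whereas the MaxRL weight grows toward 1/p as N increases.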
Lucas Beyer (bl16) @giffmana

My "squinted" understanding of both MaxRL and DG is they essentially reweight TP/FP/TN/FN differently, such that learning converges to the same as xent, and both have a very nice classification "toy" example to make it very clear. So I'm genuinely very curious if they are exactly the same independent finding just phrased differently, or if they have some important differences, and if so what they are. That's why I was looking for such discussion either in DG's related works section, or in the thread here :)
4
6
64
14.9K
Nathaniel Daw @nathanieldaw
@IanOsband @a_weers @giffmana Low prob under current pi is useful for two reasons I think, one (your motivation?) is opportunity for policy improvement; but also these actions are poorly sampled on pi. Wonder if these are separable/which is doing the work. Anyway good to see you back at gdm!!
2
0
3
53
Ian Osband @IanOsband
Btw I don't think my intuition was ever "make it more like CE"... Although the paper does use that for justification. The intuition is more simple:
> The best data for policy is an example doing something better than you normally do (high advantage) and low prob under current pi (high surprisal)
So the idea is just to pay more attention to the most delightful data. Unlike MaxRL, that has nothing to do with "how many samples I take"... Make sense?
2
0
8
812
Nathaniel Daw @nathanieldaw
@yoavgo I think many seminar courses esp in technical areas benefit from a bit of introductory lecturing to frame the questions and introduce the formalisms. I usually do a touch of this every week to set up next week's paper but a longer framing lecture at the start can be useful
1
0
1
416
(((ل()(ل() 'yoav))))👾
cs/ml/ai profs: do you have tips for not wasting the first class of a seminar course on purely logistics ("these are the topics, these are the papers, here is how the course works, who will present next week and what")? (this year we were graced by end of class being interrupted by incoming missile alert from Iran, but hopefully future years will be different)
4
2
13
4.1K
Nathaniel Daw @nathanieldaw
@yoavgo @mmitchell_ai I also love the piece tbc: the point about conclusory terminology (attention, reasoning) is crucial and very widely applicable
0
0
1
18
Nathaniel Daw @nathanieldaw
@yoavgo @mmitchell_ai Isn't a good parrot stochastic (as training objective) just bc target function is probabilistic? What I don't get is once the definition is refined to this it seems false: even old LLMs were instruction tuned, RLHF'd etc: not just parrots and not just due to other "ai" wrappers
1
0
1
20
MMitchell @mmitchell_ai
"AI" is not a stochastic parrot.🦜 I wrote this piece a couple weeks ago, but it was hard for me to finish up given AI's role in society and war over the past few weeks. I should share it at some point though. Not perfect, but here it is. medium.com/@margarmitchell/no-ai-is-not-a-stochastic-parrot-a99e57766bed
11
26
161
35.3K
Nathaniel Daw @nathanieldaw
@TheEbonyMaw I was eating spicy noodles and my toddler toddled up and begged for a bite and I couldn't resist him and I took a tiny piece of noodle and scraped off the sauce. He put it in his mouth and gave me this soul shattering look of total shock and betrayal. Now he's 16.
0
0
0
218
Maw @TheEbonyMaw
Sitting down. Drinking iced black coffee. 2yr old daughter (Twin A) walks over to me for a sip. She does this many times. I always say no. You know what? Just give her a sip. She’ll hate it, and then she’ll never ask again. I give her a sip. She likes it. Asks for another.
52
29
1.2K
14.6K
eigenrobot @eigenrobot
few to no professors on twitter drop bons mots under their own names and i think that speaks awfully of the profession. utterly degraded. you used to stand for learning and achievement
33
7
340
26.4K
Sandy Petersen 🪔 @SandyofCthulhu
We've all played video games. So we all know how keys work. ANY key will open ANY door. The key is destroyed upon opening the door, and that door can never be locked again. Some doors must be fed several keys before they can open. I have never had a player argue with this description. It is accepted as truth. I've used it in several of my games.
84
20
1.4K
73K
Nathaniel Daw reposted
Sina Tafazoli @tafazolisina
Thrilled that my paper is out in @Nature. We explored how the brain builds complex tasks by compositionally combining simpler sub-task representations. nature.com/articles/s4158…
7
50
250
23.4K
Nathaniel Daw reposted
Three Year Letterman @3YearLetterman
Arrest everyone involved with this article right now
84
146
2.9K
76.4K
Nathaniel Daw @nathanieldaw
@polynoamial this op-ed so reminded me of the Chomsky one about how LLMs couldn't possibly intuit the universal grammar.
0
0
2
360
Noam Brown @polynoamial
1987: AI can't win at chess—planning is uniquely human 1997: AI can't win at Go—intuition is uniquely human 2016: AI can't win at poker—bluffing is uniquely human 2023: AI can't get IMO gold—reasoning is uniquely human 2026: AI can't make wise decisions—judgment is uniquely human
231
411
3.5K
968.5K
Tech Trad @yipopov
@nathanieldaw @tomfgoodwin No, because their whole restaurant industry runs on exactly those packaged ready-to-heat meals from a company called Sysco. All they have to do is cut out the middle man who punches the buttons on the microwave and sell them as is.
1
0
1
102
Tom Goodwin @tomfgoodwin
I'll never understand why the US doesn't really do ready made sandwiches. Around 5 times a week I want to eat something vaguely healthy, portable, immediate, and there's nothing. Slice of pizza? No thanks. Fast food, hell no. Subway, piss off. Where's this sort of thing?
1.1K
29
2.2K
4.7M