Nathaniel Daw
2.3K posts

@nathanieldaw
Princeton neuro prof. But Twitter is an absurd platform for professional communication so I strive to use it most unprofessionally.

While Alec is one of the best ML researchers of all time, LLMs started way before. Here's one from 2013 with a non-neural architecture and one from 2016, which is afaik the first neural LLM if we define an LLM as an LM w/ >1B params.

My "squinted" understanding of both MaxRL and DG is that they essentially reweight TP/FP/TN/FN differently, such that learning converges to the same thing as xent, and both have a very nice classification "toy" example that makes this very clear. So I'm genuinely very curious whether they are exactly the same independent finding just phrased differently, or whether they have some important differences, and if so what they are. That's why I was looking for such a discussion, either in DG's related work section or in the thread here :)