Dersu

Day 125/365 of GPU Programming

One thing I'm still struggling to understand is: why softmax? What is it about the softmax function that has made it survive and thrive for this long? What is it about exp(), compared to any other positive, monotonic, differentiable function, that is so sticky? So I'm studying softmax in a bit more depth today: taking a look at optimizations via SFUs (Special Function Units) on Nvidia GPUs, and listening to the GOATs (Andrew Ng, Hinton, etc.) explain the reasoning behind softmax as the primary choice. If anyone has good resources that dive into softmax and softmax alternatives, please share!
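Since the post mentions SFU-based optimizations, here is a minimal CUDA sketch of what that looks like in practice: a numerically stable single-row softmax that calls the __expf() intrinsic, which compiles down to the SFU's hardware exponential (MUFU.EX2 in SASS), rather than the slower but more accurate expf() software path. The kernel name, block size, and row length are all illustrative assumptions, not from the original post.

```cuda
// Hypothetical sketch: numerically stable softmax over one row,
// computed by a single thread block.
// __expf() is the fast intrinsic that maps to the SFU's hardware
// exponential (MUFU.EX2); expf() is the slower, more accurate
// software implementation.
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256  // power of two, for the shared-memory reductions

__global__ void softmax_row(const float* x, float* y, int n) {
    __shared__ float red[THREADS];
    int tid = threadIdx.x;

    // 1) Row max, for the standard exp(x - max) stabilization trick.
    float m = -INFINITY;
    for (int i = tid; i < n; i += THREADS) m = fmaxf(m, x[i]);
    red[tid] = m;
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (tid < s) red[tid] = fmaxf(red[tid], red[tid + s]);
        __syncthreads();
    }
    float row_max = red[0];
    __syncthreads();  // red[] is reused below

    // 2) Sum of exponentials, using the SFU fast path.
    float acc = 0.0f;
    for (int i = tid; i < n; i += THREADS) acc += __expf(x[i] - row_max);
    red[tid] = acc;
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (tid < s) red[tid] += red[tid + s];
        __syncthreads();
    }
    float row_sum = red[0];

    // 3) Normalize.
    for (int i = tid; i < n; i += THREADS)
        y[i] = __expf(x[i] - row_max) / row_sum;
}

int main() {
    const int n = 1024;
    float hx[n], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = (float)i / n;

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

    softmax_row<<<1, THREADS>>>(dx, dy, n);
    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += hy[i];
    printf("sum of probabilities = %f\n", s);  // should print ~1.000000

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

Swapping __expf() for expf() (or compiling with nvcc's --use_fast_math, which makes that substitution globally) is a quick way to measure what the SFU fast path actually buys on a given GPU.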

Of course, this raises all sorts of questions about what is going to happen to mathematical research, with the impact on PhD students being particularly urgent. I give a few thoughts on this in the blog post, but I don't have anything like complete answers.

I've recently got in on the act of getting AI to solve open problems in mathematics. More precisely, I gave some questions asked by Melvyn Nathanson to ChatGPT 5.5 Pro, to which I have been given access, and it answered them. 🧵

Professor Marcos López de Prado at Cornell - the man Shannon and Thorp's framework eventually became.

He personally managed $13 billion at Guggenheim Partners with an audited risk-adjusted return of 2.3, an institutional Sharpe-equivalent that less than 1% of fund managers ever hit.

Then he became the first head of machine learning at AQR Capital, a $226B fund. Then he went to Cornell to teach this.

Shannon used information theory to beat Buffett. Thorp used it to beat Vegas. López de Prado uses it to manage billions for institutions today.

The article above is that exact same lineage applied to Polymarket: KL-divergence, max-entropy, and entropy collapse, three tools you can use today.

1 hour from one of three people on Earth qualified to teach this ↓