I want to start a group chat for runners.
32 people max.
Two reqs:
1. You need to have an iPhone (sorry green bubbles)
2. You need to be a runner.
You don’t need to be elite. You just need to be committed.
We’ll make each other stronger, faster, better.
Who wants in?
@JeebsTX It would look better if we made the decision to change the Like button:
👍 422 👎
Not sure if I’m ready for the mob that it would create on this peaceful Sunday.
if you think AI will eventually "solve" an entire field in the limit, you're implicitly asserting that the growth of knowledge is fundamentally bounded.
claims about a field eventually being fully solved in the limit quietly assume the set of meaningful questions and problems is exhaustible and non-generative.
which is simply not true.
if anyone wants to finish up an attempt at the NanoGPT speedrun world record: i've implemented Muon+ (it improves on NorMuon by focusing on post-orthogonalization normalization without param-wise lr scaling), but haven't quite hit the record (close, at ~3.29 loss).
i won't keep working on this since i didn't set out to beat the record in the first place, but given that it's quite close, i figured someone might find it worth a shot.
putting links in the comments below.
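The post only describes Muon+ as "post-orthogonalization normalization without param-wise lr scaling", so what follows is a rough sketch of that kind of update, not the author's code. It uses Muon-style Nesterov momentum and the quintic Newton-Schulz orthogonalization (coefficients 3.4445, -4.7750, 2.0315, as in the public Muon implementation). It then applies an assumed per-row RMS normalization, which is only a guess at what "post-orthogonalization normalization" means, and uses one global learning rate for every matrix. The name `muon_plus_sketch_step` and all hyperparameters are hypothetical.

```python
import numpy as np


def newton_schulz(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize g with the quintic Newton-Schulz iteration
    used by Muon; singular values land roughly in [0.7, 1.2], not exactly at 1."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling puts every singular value <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x  # odd polynomial applied to the singular values
    return x.T if transposed else x


def muon_plus_sketch_step(w, grad, buf, lr=0.02, momentum=0.95, eps=1e-7):
    """HYPOTHETICAL Muon+-style step: momentum -> Newton-Schulz -> per-row RMS
    normalization, then a single global lr with no per-parameter lr scaling."""
    buf = momentum * buf + grad        # momentum buffer
    update = grad + momentum * buf     # Nesterov-style lookahead, as in Muon
    o = newton_schulz(update)
    # Assumed post-orthogonalization normalization: rescale each row to unit RMS.
    rms = np.sqrt(np.mean(o ** 2, axis=1, keepdims=True))
    o = o / (rms + eps)
    # Same lr for every matrix: no shape-dependent factor like max(1, rows/cols)**0.5.
    return w - lr * o, buf


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
w, buf = muon_plus_sketch_step(w, rng.standard_normal((8, 16)), np.zeros_like(w))
```

The shape-dependent factor mentioned in the comment is the kind of per-parameter lr scaling some Muon implementations apply. Dropping it here is only meant to illustrate "w/o param-wise lr scaling" under these assumptions.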
@harvie_z_z_w @unakar666 i wasn’t referring to the residual connections themselves, but rather to the identity transformation being better than gates for the residual connections.
DeepSeek mHC: dropping the "m" improves performance.
Empirical finding: an identity H_res outperforms the original design. Concurrent findings from multiple teams confirm this.
zhuanlan.zhihu.com/p/201085238967…
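For context, here is a minimal sketch of the comparison. It assumes the hyper-connection update x_next = H_res x + H_postᵀ F(H_pre x) over n parallel residual streams. As we understand DeepSeek's mHC paper, mHC constrains H_res to be doubly stochastic via Sinkhorn-Knopp normalization. The identity matrix is itself doubly stochastic, so the "identity H_res" variant simply removes cross-stream mixing. The stream count, `h_pre`/`h_post` values, the `np.tanh` stand-in layer, and the iteration count are all placeholders, not the setup behind the reported result.

```python
import numpy as np


def sinkhorn_knopp(logits: np.ndarray, iters: int = 100) -> np.ndarray:
    """Map an n x n logit matrix to an (approximately) doubly stochastic matrix
    by exponentiating and alternately normalizing rows and columns."""
    m = np.exp(logits - logits.max())  # positive entries; shift for numerical safety
    for _ in range(iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1 (exact after this step)
    return m


def hyper_connection_update(x, layer_fn, h_res, h_pre, h_post):
    """One residual update over n parallel streams:
    x_next = h_res @ x + outer(h_post, layer_fn(h_pre @ x)).
    x: (n, C) streams; h_res: (n, n); h_pre, h_post: (n,)."""
    layer_out = layer_fn(h_pre @ x)                   # read: mix n streams into one (C,) input
    return h_res @ x + np.outer(h_post, layer_out)    # write: spread output back over streams


n, C = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((n, C))
h_pre = np.full(n, 1.0 / n)   # placeholder: average the streams
h_post = np.ones(n)           # placeholder: add the layer output to every stream
layer_fn = np.tanh            # stand-in for an attention/MLP block

h_res_mhc = sinkhorn_knopp(rng.standard_normal((n, n)))  # mHC-style constrained mixing
h_res_identity = np.eye(n)                               # identity H_res: no cross-stream mixing

x_mhc = hyper_connection_update(x, layer_fn, h_res_mhc, h_pre, h_post)
x_id = hyper_connection_update(x, layer_fn, h_res_identity, h_pre, h_post)
```

Because the columns of a doubly stochastic H_res sum to 1, the sum over streams of H_res x equals that of x, so neither variant rescales the residual signal. The identity choice additionally keeps each stream's history separate.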