vxnuaj

18 posts

@vxnuaj

grokking.

sf · Joined September 2022
522 Following · 4.7K Followers
vxnuaj (@vxnuaj)
@danielcberk hi. been running for 4 years. currently on a build up to 100 miles a week before racing a half marathon (hopefully @ mid 1:10’s)
Replies 0 · Reposts 0 · Likes 3 · Views 439
Daniel Berk 🐝 (@danielcberk)
I want to start a group chat for runners. 32 people max. Two reqs: 1. You need to have an iPhone (sorry green bubbles) 2. You need to be a runner. You don’t need to be elite. You just need to be committed. We’ll make each other stronger, faster, better. Who wants in?
Replies 68 · Reposts 0 · Likes 95 · Views 11.5K
Nikita Bier (@nikitabier)
@JeebsTX It would look better if we made the decision to change the Like button: 👍 422 👎 Not sure if I’m ready for the mob that it would create on this peaceful Sunday.
Replies 466 · Reposts 24 · Likes 1.3K · Views 124.4K
JeebsTX 🇺🇸 (@JeebsTX)
Hey @nikitabier, that dislike button looks so out of place in replies!! What's the end goal here? I can select “Spam” On Elon’s reply? lol
Replies 19 · Reposts 6 · Likes 217 · Views 62K
Reads with Ravi (@readswithravi)
Marcus Aurelius wrote this over 1800 years ago: “Think of yourself as dead. You have lived your life. Now take what's left and live it properly.”
Replies 75 · Reposts 2.5K · Likes 21.2K · Views 605.7K
vxnuaj (@vxnuaj)
nothing is inevitable except problems but all problems are soluble
Replies 0 · Reposts 0 · Likes 1 · Views 127
vxnuaj (@vxnuaj)
if you think AI will eventually "solve" an entire field in the limit, you're implicitly asserting that the growth of knowledge is fundamentally bounded. claims about a field eventually being fully solved in the limit quietly assume the set of meaningful questions and problems is exhaustible and non-generative. which is simply not true.
Replies 2 · Reposts 0 · Likes 7 · Views 249
vxnuaj (@vxnuaj)
if anyone wants to try to finish up a speedrun attempt for the NanoGPT speedrun wr, i've implemented Muon+ (improves on norMuon by focusing on post-orthogonalization normalization w/o param-wise lr scaling), but haven't quite hit the record (quite close at ~3.29 loss). won't be working on this as i didn't initially intend to attempt to beat the record, but given that it's quite close, figured someone might find it worth a shot. putting links in comment below.
Replies 1 · Reposts 0 · Likes 8 · Views 839
vxnuaj (@vxnuaj)
@harvie_z_z_w @unakar666 i wasn’t referring to the residual connections themselves, but rather to the identity transformation being better than gates for the residual connections.
Replies 1 · Reposts 0 · Likes 0 · Views 168
harvie (@harvie_z_z_w)
@vxnuaj @unakar666 It’s gated connections / highway network and the plagiarism-hc didn’t cite it, even the first gradient vanishing paper by Sepp
Replies 1 · Reposts 0 · Likes 1 · Views 106
Tian Xie (Unakar) (@unakar666)
DeepSeek mHC: dropping the "m" improves performance. Empirical finding: identity hres outperforms the original design. Concurrent discoveries by multiple teams confirm this: zhuanlan.zhihu.com/p/201085238967…
Replies 5 · Reposts 34 · Likes 315 · Views 41.4K
Name can't be blank (@Algon_33)
Remember the LLM MCTS fad a few years ago? What happened to it?
Replies 5 · Reposts 0 · Likes 13 · Views 2K
Adavya Sharma (@adavya_sharma)
“happiness in intelligent people is the rarest thing i know” -Ernest Hemingway
Replies 2 · Reposts 0 · Likes 2 · Views 221