cider

775 posts

@jeffreycider

my purpose in life is to forget linear algebra 2x a year

San Francisco, CA · Joined September 2019
632 Following, 2.5K Followers
Pinned Tweet
cider @jeffreycider
linear transformations stretch euclidean space
ReLU folds euclidean space
neural networks are just repeated origami on high-dimensional laffy taffy
8
35
467
0
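One concrete reading of the "folding" picture, offered as a minimal sketch rather than anything stated in the tweet: in one dimension, a ReLU layer with weights +1 and -1 followed by a sum computes |x|, which folds the real line onto itself at the origin. The names `relu`, `stretch`, and `fold` are illustrative.

```python
def relu(x: float) -> float:
    """Standard ReLU: zero out negative inputs."""
    return max(0.0, x)


def stretch(x: float, w: float) -> float:
    """A 1-D linear map: scales (stretches) the line by w."""
    return w * x


def fold(x: float) -> float:
    """Tiny two-unit ReLU layer, then a sum.

    Hidden units see +x and -x; adding their ReLU outputs gives |x|,
    so the points x and -x land on the same output (a fold at 0).
    """
    return relu(stretch(x, 1.0)) + relu(stretch(x, -1.0))


if __name__ == "__main__":
    # Mirror-image inputs collapse onto each other after the fold.
    print([fold(x) for x in (-2.0, -0.5, 0.5, 2.0)])  # [2.0, 0.5, 0.5, 2.0]
```

Stacking stretch-then-fold steps like this is one way to picture how deeper ReLU networks build piecewise-linear maps.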
cider @jeffreycider
@norpadon (3) is just reservoir sampling right
1
0
0
464
Artur Chakhvadze @norpadon
Some fun ML interview problems
[image]
5
6
197
20.2K
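The interview problem labeled (3) is in the attached image and is not reproduced here, so the following is only a general sketch of the technique named in the reply: reservoir sampling (Algorithm R), which draws k items uniformly from a stream of unknown length in one pass. The function name `reservoir_sample` and its parameters are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from `stream`.

    Uses O(k) memory and a single pass; the stream length need not be known.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5))
```

If the stream has fewer than k items, the sketch simply returns all of them.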
cider @jeffreycider
@LotusAbhi @konstmish this has been empirically known for a while, not sure what the seminal reference is. central flows explains why constant LR prevents sharpness explosion
0
0
0
43
Konstantin Mishchenko @konstmish
A nice and easy to read paper on what the existing Muon literature is missing.
[2 images]
3
18
174
17.1K
cider @jeffreycider
@konstmish in fact, muon was designed as the steepest descent optimizer in a constant sharpness regime (eqn 7 in bernstein's modular duality paper)
0
0
0
51
cider @jeffreycider
@konstmish yes, my criticism is not that it's an apples-to-oranges comparison. it's that all useful NN optimization occurs in the constant-sharpness regime, so a line-search regime is irrelevant
1
0
1
36
atlas @creatine_cycle
>show up to robot fight
>opponent is 13
>lose
21
9
244
34.5K
tender @tenderizzation
@distributionat is daeho good if you don't want gimmick instagram food, asking for a friend
6
0
3
1K
toucan @distributionat
My SF food take is that I think San Ho Won isn't very good. Nothing wrong with the food, it simply isn't very good. The banchan is fine, the meat is fine, only the service is one star. Han Il Kwan is good, Hwa Mi Won is good, Daeho is good. I don't understand San Ho Won.
11
0
71
10.4K
ylareia @Impish_Bunny
just finished breath of the wild after putting it off for several weeks :(
1
0
9
1.1K
cider @jeffreycider
@Nexuist @twocents how does a single USD put me in the top quartile
[image]
7
0
124
10.5K
cider @jeffreycider
@tenobrus tokenization in the brain (or any sort of discretization) is really hard to imagine. like whatever neuron performs the discrete selection (like a softmax operation) is going to have to be physically wired to every language-related neuron
0
0
7
243
Tenobrus @tenobrus
do you think humans have something very like autoregressive token completion as one of (altho maybe not only) the core primitives in our language or world modeling?
10
0
25
2.9K
cider @jeffreycider
@yifan_zhang_ the Mu in Muon stands for momentum. the precise claim is that if you strip accumulation out of both Muon and Shampoo, they both give identical steepest descent updates
1
0
2
500
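The claim can be checked with a short derivation, sketched here under assumptions the tweet does not spell out: $G$ is a full-rank gradient matrix with reduced SVD $G = U \Sigma V^\top$, the accumulation-free Shampoo update takes the usual form $L^{-1/4} G R^{-1/4}$ with $L$ and $R$ built from the current gradient only (negative powers taken on the nonzero spectrum), Muon's Newton-Schulz iteration is idealized as exact orthogonalization (written $\operatorname{orth}$ here), and learning rates and signs are omitted.

```latex
\begin{align*}
G &= U \Sigma V^\top \\
L &= G G^\top = U \Sigma^{2} U^\top, \qquad R = G^\top G = V \Sigma^{2} V^\top \\
\Delta W_{\text{Shampoo}} &= L^{-1/4}\, G\, R^{-1/4}
  = \bigl(U \Sigma^{-1/2} U^\top\bigr)\bigl(U \Sigma V^\top\bigr)\bigl(V \Sigma^{-1/2} V^\top\bigr)
  = U V^\top \\
\Delta W_{\text{Muon}} &= \operatorname{orth}(G) = U V^\top
\end{align*}
```

Under these assumptions both updates reduce to the same orthogonal factor $U V^\top$ of the gradient.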
cider @jeffreycider
@skooookum isn't the mta run by new york state
0
0
0
93
skooks @skooookum
If there was a NYC mayoral candidate whose *only* campaign promise was to make the subways better and increase the construction of new subway, no one would be able to touch them.
3
1
88
3.7K
Bryan Cheong @bryancsk
Which meme do you think he shared with President Xi?
[2 images]
6
2
85
5.5K
cider @jeffreycider
@khoomeik shifting mass between exponential cost coefficients
0
0
1
55
cider @jeffreycider
@SeunghyunSEO7 only in the 124M track. og muon is still better on the 350M track.
1
0
2
268
Seunghyun Seo @SeunghyunSEO7
just noticed modded-nanogpt adopted 'NorMuon' as default (?). it looks like `AdaMuon`. i personally didn't buy this idea because i thought Muon is enough and don't want to introduce optim state for 2nd moment again like adam... hmm arxiv.org/abs/2510.05491 arxiv.org/abs/2507.11005…
[3 images]
6
7
65
5.3K
cider @jeffreycider
@aspergtame
> compute is an inference hyperparameter
banger
1
0
2
34
Tim Kanarsky @tkanarsky
I'm trash at everything I do.
3
0
1
350