EIFY

7.2K posts

@EIFY

Software Engineer

Seattle · Joined December 2008
261 Following · 264 Followers
Pinned Tweet
EIFY
EIFY@EIFY·
How did increased regulation of childhood affect social and geographical mobility?
English
0
1
5
0
EIFY
EIFY@EIFY·
@ml_4rtemi5 Cool! We have experimented with neg. Euclidean distance squared logit for CLIP-like models and gained some insights into them, so you may want to take a look. The next thing I would try is to remove the final LN, possibly w/ residual scaling: arxiv.org/abs/2409.13079 w/ @nahidalam
[image]
English
0
0
1
61
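A minimal sketch of the kind of logit mentioned in the reply above: the negative squared Euclidean distance between image and text embeddings, fed into a symmetric contrastive loss. The function names and the `scale` parameter are assumptions for illustration, not taken from the linked paper.

```python
import math

def neg_sq_dist_logits(image_emb, text_emb, scale=1.0):
    """logits[i][j] = -scale * ||image_emb[i] - text_emb[j]||^2 (scale is an assumed temperature)."""
    return [
        [-scale * sum((a - b) ** 2 for a, b in zip(x, y)) for y in text_emb]
        for x in image_emb
    ]

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def contrastive_loss(logits):
    """Symmetric cross-entropy where matching pairs sit on the diagonal."""
    n = len(logits)
    image_to_text = sum(-log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [list(c) for c in zip(*logits)]
    text_to_image = sum(-log_softmax(columns[i])[i] for i in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)

# Toy embeddings: each image matches the text at the same index.
images = [[0.0, 0.0], [1.0, 0.0]]
texts = [[0.0, 0.0], [1.0, 0.0]]
logits = neg_sq_dist_logits(images, texts)
loss = contrastive_loss(logits)
```

Unlike a cosine-similarity logit, this one is bounded above by zero and reaches zero only when the two embeddings coincide.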
Raphael Pisoni
Raphael Pisoni@ml_4rtemi5·
I dove deeper into the rabbit hole of RBF-Attention. I refined the Triton kernel, added register-tokens and developed SuSiE positional embedding as a replacement for RoPE in Euclidean space. Go have a look at the repo or the blogpost in the comments if you're interested! :)
Raphael Pisoni@ml_4rtemi5

For some reason I decided to swap out standard dot-product attention for a scaled-rbf kernel. Pretty much expected it to fail to converge or be impossibly slow but the scaled-rbf-attention is getting unexpectedly good results?? 👇

English
3
3
45
5.6K
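A rough sketch of what swapping dot-product attention for a scaled RBF kernel could look like: attention weights proportional to exp(-||q - k||^2 / scale), normalized over the keys. This is plain Python for illustration only, not the Triton kernel from the repo; the name `rbf_attention` and the way `scale` enters are assumptions.

```python
import math

def rbf_attention(queries, keys, values, scale=1.0):
    """Attention with weights proportional to exp(-||q - k||^2 / scale)."""
    outputs = []
    for q in queries:
        # Negative squared Euclidean distance replaces the q.k score.
        scores = [-sum((a - b) ** 2 for a, b in zip(q, k)) / scale for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs

keys = [[0.0, 0.0], [2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
# A query equidistant from both keys averages the two values.
mid = rbf_attention([[1.0, 0.0]], keys, values)
# A query sitting on the first key puts most weight on the first value.
near = rbf_attention([[0.0, 0.0]], keys, values)
```

Normalizing the kernel values over the keys is the same as a softmax over negative squared distances, so the only change from standard attention in this sketch is the score function.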
EIFY
EIFY@EIFY·
@skate_dont @mr_scientism Not every Singaporean. English is the only language every Singaporean student needs to learn at school.
English
0
0
0
44
scientism
scientism@mr_scientism·
China should extend ’peaceful reunification’ to all under heaven. Seems unfair that only Taiwan has good options right now.
English
35
153
2.1K
85.1K
EIFY
EIFY@EIFY·
@Laz4rz As the innermost Matryoshka, no one can hear you scream🪆
English
0
0
0
417
@Rupprecht_A
@Rupprecht_A@RupprechtDeino·
Interesting! 😯 Looks at first sight without enlargement like just 3 J-16 far far away and much too blurry 🫣 ... but now I think more like one J-XDS and two CCAs? 🤔
[image]
DS北风@WenJian0922

A picture I came across

English
13
18
253
37K
EIFY
EIFY@EIFY·
4. These are ScionC experiments, designed to keep the weight norm stable. (I don't expect 2-4 to change the direction of the result) 5. Without biases somehow the avg. spectral norm is smaller and the L2 grad norm is higher. It's possible that the optimal WD may change... 5/5
[image]
English
0
1
3
106
EIFY
EIFY@EIFY·
2. For my ViT-S I made sure that QKV grad. are separately orthogonalized and the input dim. of patchifier are flattened. In the process I already left out the bias of QKV. 3. I incorporated @tmpethick's 1.0 init. mo for the unbiased exp. 4/5
English
1
1
1
116
EIFY
EIFY@EIFY·
@chili_girl_ Isn't that leek (instead of ネギ)?
English
0
0
2
269
Sugar and spice
Sugar and spice@chili_girl_·
POV : Miku is stepping on you 💚
[2 images]
English
3
62
3.2K
41.5K
EIFY
EIFY@EIFY·
@KELMAND1 First instance? There will most likely be an appeal.
Chinese
0
0
0
183
Eason Mao☢
Eason Mao☢@KELMAND1·
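Ko Wen-je sentenced to 17 years at first instance. On the afternoon of the 26th, the Taipei District Court sentenced Ko Wen-je at first instance to 17 years in prison and 6 years of deprivation of civil rights.
Chinese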
25
5
77
25.3K
定盘之命
定盘之命@bitlord0429·
@9992rc4g7c3939 I can edit it too (look at the button). To me these are Chinese clothes and a Chinese haircut from the Qing dynasty; look at this boy's haircut.
[2 images]
English
1
0
1
1.5K
EIFY
EIFY@EIFY·
@LongDesertTrain Would the GI-persistent SARS-CoV-2 eventually establish a fecal–oral transmission route?
English
1
0
2
273
EIFY
EIFY@EIFY·
@Ji_Ha_Kim @tonysilveti "Huge equal radius" = tiny WD, are you running it until steady-state? Wouldn't that take a long time...?
English
1
0
2
83
Ji-Ha
Ji-Ha@Ji_Ha_Kim·
@EIFY @tonysilveti Gemini had an interesting strategy, fixed momentum, first sweeping effective lr with huge equal radius for all, then re-running using the final weight norms as the radii, and it seems to work surprisingly well
English
3
0
2
89
Ji-Ha
Ji-Ha@Ji_Ha_Kim·
How are people tuning their hyperparameters for Scion optimizer?
English
2
0
9
1.2K
EIFY
EIFY@EIFY·
@Ji_Ha_Kim @tonysilveti Based on the fact that output Sign layer weight & grad don't become ind. I suspect / hypothesize that what's important for the output layer is the steady-state weight norm, not the detailed LR/WD dynamics. I haven't tested this tho...
English
1
0
0
83
Ji-Ha
Ji-Ha@Ji_Ha_Kim·
@EIFY The values in the Scion paper just seem hand-tuned. Was there some strategy, @tonysilveti?
[image]
English
2
0
1
100
EIFY
EIFY@EIFY·
@Ji_Ha_Kim Separately (that said, without any particular justification)
English
2
0
0
79
Ji-Ha
Ji-Ha@Ji_Ha_Kim·
@EIFY Jointly or what?
English
1
0
0
109
EIFY
EIFY@EIFY·
@nftbanker ...Desperate enough to try any remedy?
Chinese
0
0
1
5.3K
小将
小将@nftbanker·
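Coenzyme Q10 sold out across China tonight. I had two calls scheduled for this evening, but both clients cancelled and said we would talk tomorrow; staying alive comes first.
Chinese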
87
25
785
534.9K
Rustem
Rustem@ruuustem_10·
Excited to share our latest work on bridging theory & practice in optimization 🚀 We study stochastic conditional methods with momentum and provide practical strategies for choosing batch size and Frank–Wolfe stepsizes when token budget increases Paper: arxiv.org/abs/2603.21191
English
2
9
49
4.9K