llm_enjoyer

334 posts

llm_enjoyer

@LLMenjoyer

O seeker of fire, you who would be king

Joined November 2023
48 Following · 183 Followers
llm_enjoyer@LLMenjoyer·
this is exactly how it feels to ablate mamba 3
llm_enjoyer@LLMenjoyer·
zoomer gooners have lowkirkuinely had a more concrete impact on ML sota than eg. Schmidhuber
llm_enjoyer@LLMenjoyer·
scientists unveil new proof that MoE is faster than dense! ‼️‼️🤯
llm_enjoyer@LLMenjoyer·
@Sanskarsam9 i see you should consider becoming a scammer either scam your company or scam some kind sponsors it’s actually p easy
sanscar@Sanskarsam9·
@LLMenjoyer these are my personal projects, my company does work on applied ml and i don’t run my personal projects there.
llm_enjoyer@LLMenjoyer·
latent moe, jianlin su balancer, mamba 3, attention residuals, 500 hparam changes... when do we cross the line from "yolo run" to "schizo run"?
sanscar@Sanskarsam9·
@LLMenjoyer idk i have a habit now to use this emoji and cat ai gifs 🤣
llm_enjoyer@LLMenjoyer·
@EmpyriaMirai how it feels to ablate a cocktail of 5 peptides 10 vitamins and 69 hormones
sanscar@Sanskarsam9·
@LLMenjoyer this is what ive been doing since last 3 weeks 🥀
llm_enjoyer@LLMenjoyer·
@nayshins high dose muscle relaxant if that doesn't do it, try xanax
Jake@nayshins·
Can someone please make my eye quit twitching for the love of god
Dan Woods@danveloper·
I actually hate MoE's now. Not just because they're difficult to hardwaremaxx, but it's actually a really dumb architecture (no offense to anyone). They naively approximate a graph without any of the benefit of graph traversal. We're sending a blind person down a path and we've trained something to nudge them onto a different path to get to the end, but it doesn't know the next part of the map until the person has walked down the street. I hate this.
llm_enjoyer@LLMenjoyer·
i hated it but the profiling seems to have been somewhat effective time to open a 100x leveraged short against the dataloader
stochasm@stochasticchasm·
they're capturing my cudagraphs tomorrow
You Jiacheng@YouJiacheng·
@LLMenjoyer lol don't do that. the author told me they consulted Su whether they can not cite the blog in arxiv version. Su said "it's okay, but my recommendation is to cite my blog to avoid misunderstanding" (I use quote but not the original words)
Valentin Ignatev@valigo·
I desperately need to learn more math. Recently I invented "lerp" from first principles. Would have saved some time if I knew its industry-standard name :/
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
It's been known for a while that Canon from @ZeyuanAllenZhu is a monstrously powerful augmentation to the Transformer recipe, and I'm lowkey seething that it's not had industry adoption yet. GLM switched to DSA in months. If you're doing new pretraining, why not test Canon too?
xjdr@_xjdr

nmoe has been updated with most of the receipts / repro code for noumena.com/research github.com/Noumena-Networ…
