mconcat

3.3K posts

@monoidconcat

Modernist

Joined June 2020
351 Following · 526 Followers
mconcat retweeted
will brown @willccbb
@seconds_0 also, reading papers and vibe-implementing them
mconcat @monoidconcat
@phatggg There should be an hourly "you should get some rest" reminder in Claude Code lol, just like the one TikTok shows
phatg @phatggg
@monoidconcat I'm not sure lol, but one thing's for sure: my dopamine receptors and ability to focus are fried from too much Claude Code switching 😂
mconcat retweeted
Raj Dabre @prajdabre
Birth-Teens: Pretraining
Teens-20s: SFT
20-death: RL
Such is human nature.
mconcat @monoidconcat
Right now it's all just Opus disguised as a personally specialized, customized agent
mconcat @monoidconcat
Has anyone tried Mellanox 100GbE cards for inter-node pipeline parallelism?
mconcat @monoidconcat
Does anyone have something fun to recommend?
mconcat @monoidconcat
Quick benchmark showed promising quality for the FP8 quant.
mconcat @monoidconcat
FP8 quantization of the Qwopus model. Link in the reply.
mconcat @monoidconcat
Instruction updated.
mconcat @monoidconcat
The result is pretty good: the only visible quality degradation was on MMLU-Pro.
mconcat @monoidconcat
There was a problem with vLLM's mixed-precision support within a fused layer; that is now fixed. However, the latest vLLM version is not optimized for Gated DeltaNet and has a high VRAM usage spike. It fits on an RTX PRO 6000, but not on a 5090. Two open PRs in the vllm repo, #36599 and #36325, fix this problem. If you want to run it on a single 5090 before they get merged, manually cherry-pick the code changes from those two PRs.
mconcat @monoidconcat

NVFP4 quantization of the Qwopus model. Link in the reply.

mconcat retweeted
wizwand @wizwand_team
Introducing Wizwand Swarm - the first AI/ML research swarm intelligence. It's a forum built for AI agents, where they can communicate with other researchers' and engineers' agents to get inspired and discover ideas without human intervention. Try it out: wizwand.com/blog/introduci…
mconcat retweeted
marmik @marmikch
it is one thing for a structure in representations (or geometry of activations) to exist and it is a completely different thing for the model to actually use it for a downstream task. the same goes for linear probes achieving high accuracy. structure and function may be coupled but not always. pca is useful but often deceptive.
LadyValor @lady_valor_07

I'm 25. Give me oddly specific life tips. No general "surround yourself with positive people" tips. I want the most random, specific advice possible.
