apaz

1.1K posts

apaz

@apaz_cli

https://t.co/EYtS07MR7w Making GPUs go brrr

Hiding in your wifi · Joined July 2019
564 Following · 846 Followers
Pinned Tweet
apaz@apaz_cli·
Releasing mlsweep, a sweep scheduler and visualizer for distributed ML training. It aims to make launching runs across groups of GPUs frictionless and to achieve near feature parity with wandb. But you can use it with whatever frameworks or loggers you like, wandb included.
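A generic sketch of the core job a sweep scheduler does: expand a hyperparameter grid into runs and hand them out across GPUs. This is illustrative Python under assumed names, not mlsweep's actual interface.

import itertools

# Expand a small hyperparameter grid into concrete runs.
grid = {"lr": [1e-4, 3e-4], "batch_size": [32, 64]}
runs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]

# Hand runs out round-robin across 4 GPUs (assumed count).
for i, run in enumerate(runs):
    print(f"CUDA_VISIBLE_DEVICES={i % 4} python train.py", run)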
apaz@apaz_cli·
@tom_doerr It depends on what regime you're in. Generally you're bottlenecked either by loading the model weights (short context) or by loading the KV cache (long context). In rare cases it can be flops, but almost always it's not.
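A back-of-envelope sketch of those two regimes, with illustrative numbers rather than measurements: at batch 1, each decoded token has to stream the weights plus the KV cache once, so tokens/sec is roughly bandwidth divided by bytes read.

def decode_tps(bandwidth_gb_s, weight_gb, kv_gb):
    # Bandwidth-bound decode: every token reads weights + KV cache once.
    return bandwidth_gb_s / (weight_gb + kv_gb)

# Assumed hardware: ~1000 GB/s memory bandwidth, 14 GB of 8-bit weights.
print(decode_tps(1000, 14, 0.5))  # short context: ~69 tok/s, weight-load-bound
print(decode_tps(1000, 14, 40))   # long context: ~19 tok/s, KV-cache-bound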
Tom Dörr@tom_doerr·
@apaz_cli NVFP4 model inference works much better on a 5090. Even if it worked properly on the Sparks, the 5090 has the NVFP4 compute of 3 to 4 Sparks.
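For scale, using NVIDIA's headline FP4 throughput figures (assumed here: roughly 3352 AI TOPS for an RTX 5090 and 1000 AI TOPS for a DGX Spark):

print(3352 / 1000)  # ≈ 3.4, consistent with "3 to 4 Sparks" of NVFP4 compute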
apaz@apaz_cli·
Since every strong open model is a big chungus MoE now (Qwen 3.5/Nemotron 120Bs), it's getting harder to come up with a use for small GPUs. You kinda need unified memory, like DGX Spark or Strix Halo. And they're still less efficient to run at bigger batch sizes because they're MoEs.
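Rough weight-memory arithmetic behind that, with illustrative sizes rather than official specs:

def weight_gb(params_billions, bits):
    # Memory for the weights alone, ignoring KV cache and activations.
    return params_billions * bits / 8

print(weight_gb(120, 8))  # ~120 GB: a 120B MoE at 8-bit outgrows any consumer GPU
print(weight_gb(120, 4))  # ~60 GB: even 4-bit wants unified memory (128 GB Spark/Strix Halo)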
apaz@apaz_cli·
@tom_doerr Hmm. A DGX Spark should already be pretty good for that though, no? Unless you need all three of them for hosting things?
Tom Dörr@tom_doerr·
@apaz_cli I have three Sparks and was thinking about buying a GPU for high batched throughput with Qwen 3.5 27B
apaz@apaz_cli·
I think that from now on, smaller GPUs are mostly for things like data labeling/filtering, plus some pretraining/finetuning experiments. Inference is for machines with unified memory: either so they can fit the model, or so they can offload the KV cache (Grace systems, etc).
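A rough KV-cache sizing formula showing why long context wants offload room; the shapes below are assumptions in the ballpark of a large dense model.

def kv_cache_gb(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # K and V each store layers * kv_heads * head_dim values per token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Assumed shapes: 80 layers, 8 GQA KV heads, head_dim 128, fp16 cache.
print(kv_cache_gb(80, 8, 128, 128_000))  # ~42 GB at 128K context, per sequence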
apaz@apaz_cli·
This desperately needs to be run through slop guard.
Nous Research@NousResearch

Hermes Agent wrote a novel. "The Second Son of the House of Bells" runs 79,456 words across 19 chapters. The agent built its own pipeline to do it, using the same modify-evaluate-keep/discard loop as @karpathy's Autoresearch but applied to fiction: world-building, chapter drafting, adversarial editing, Opus review loops, LaTeX typesetting, cover art, audiobook generation, and landing page setup. Book: nousresearch.com/bells Code: github.com/NousResearch/a…

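The modify-evaluate-keep/discard loop the quoted post describes reduces to simple hill climbing; mutate and score below are hypothetical stand-ins for the agent's drafting and review steps.

def hill_climb(artifact, mutate, score, steps=100):
    best, best_score = artifact, score(artifact)
    for _ in range(steps):
        candidate = mutate(best)            # propose a modification
        candidate_score = score(candidate)  # evaluate it
        if candidate_score > best_score:    # keep improvements,
            best, best_score = candidate, candidate_score
        # otherwise discard and try again
    return best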
apaz@apaz_cli·
@PavelSnajdr I'm honestly not sure I understand the context of what you mean by this
Pavel Snajdr@PavelSnajdr·
@apaz_cli this is gonna be so much fun when it breaks into mainstream :D when there's enough compute to support the inference :D
apaz@apaz_cli·
Labs can make sure their GPUs are always running because they have a backlog of experiments and related benchmarks to run. This can be done automatically. Whereas for individuals, I think automated research is probably the best way to keep GPUs warm. In any case, this will be popular.
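The backlog idea as a minimal sketch: workers drain a shared queue of experiments, so GPUs only go idle when the backlog is empty. Names here are hypothetical, not a real scheduler's API.

import queue
import threading

backlog = queue.Queue()
for exp in ["lr_sweep/3e-4", "lr_sweep/1e-3", "ablate/no_rope"]:  # made-up run names
    backlog.put(exp)

def gpu_worker(gpu_id):
    while True:
        try:
            exp = backlog.get_nowait()  # grab the next queued experiment
        except queue.Empty:
            return                      # backlog drained: GPU goes cold
        print(f"gpu{gpu_id}: running {exp}")  # launch the run here

workers = [threading.Thread(target=gpu_worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()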
apaz@apaz_cli·
I find myself brainstorming in claude code a lot. But over time this brainstorming has become more and more elaborate. It looks less like brainstorming, and more like research. Now I brainstorm in the form of elaborate multi-project repos, some of which I release, like mlsweep.
apaz@apaz_cli·
Realizing now that there are deep similarities between the architectures of openclaw and research generation agents. They handle permissions and keeping GPUs warm in pretty much the same manner. There is more work to be done here.
apaz@apaz_cli·
@nyxkrage The madlad actually did it. Well played.
Carsten Kragelund@nyxkrage·
Finally getting around to officially publishing ChastityBench! Benchmarking vision models on recognizing chastity cages without being directly prompted for it. Stop testing general vision-capable models on more and more graphslop and OCR tasks.
ueaj@_ueaj·
@tenderizzation My favorite part of ML research is only having to know what is theoretically possible to implement efficiently in a kernel but never having to actually do that part
tender@tenderizzation·
please join our performance engineer prayer circle for attention residuals: 🕯 🕯 🕯️ 🕯 researcher 🕯️ 🕯no make softmax 🕯 🕯 into top-k 🕯 🕯 🕯 🕯
apaz@apaz_cli·
Autoresearch turns out not to be the first project to do something like this. There were many similar projects before it, which I'm now looking into because I'm doing something similar. I like EvoX, but metaprompting seems like a lot of work.
CLImeter@CLImeter·
@apaz_cli mlsweep looks solid. When you're ready to let users pay for GPU sweeps per run, climeter.ai handles usage-based billing for CLI tools in 2 lines — meter per command or per job. Worth checking out when you get to monetization.
apaz@apaz_cli·
@_ueaj Gosplan is alive and well I see.
apaz@apaz_cli·
Next on the feature list are a rollout viewer and node grouping for allocating whole NVL72 clusters. Check it out here. Comments, feature requests, and PRs welcome. github.com/apaz-cli/mlswe…
apaz@apaz_cli·
I find this format, this scheduler, and this experiment manager to be vastly superior even to the paid offerings of other services. It can also do a lot of cool tricks that others can't, like catching OOMs and getting the fastest model configuration running that will fit with certain settings.
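One way an OOM-catching search like that can work, sketched with PyTorch; this is a guess at the mechanism, not mlsweep's actual code: try candidate configs fastest-first and keep the first one that survives a step.

import torch

def first_config_that_fits(candidates, build_and_step):
    # `candidates` is ordered fastest-first; `build_and_step` builds the model
    # and runs one training step, raising on out-of-memory.
    for cfg in candidates:
        try:
            build_and_step(cfg)
            return cfg  # it fit, and it's the fastest remaining option
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # reclaim VRAM before the next attempt
    raise RuntimeError("no candidate configuration fits on this GPU")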