recallmenot @recallmenot
20 posts
Bio: void
Joined April 2024
809 Following · 80 Followers
recallmenot @recallmenot
@_lewtun searxng for search, obviously, but what are you using for fast web fetch on protected sites?
0 replies · 0 reposts · 2 likes · 963 views
Lewis Tunstall @_lewtun
You can now have an AI researcher running on your laptop 24/7 for free! Running Qwen3-35B-A3B with llama.cpp and a 4-bit quant from Unsloth.
47 replies · 118 reposts · 1.1K likes · 113.7K views
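The tweet doesn't include a launch command; a minimal sketch of serving a 4-bit Unsloth GGUF with llama.cpp's `llama-server` might look like this (the repo id, quant tag, and flag values below are illustrative assumptions, not from the tweet):

```shell
# Hedged sketch: llama-server can pull a GGUF quant straight from
# Hugging Face with -hf and expose an OpenAI-compatible API locally.
# Repo id and quant tag are assumptions.
llama-server \
  -hf unsloth/Qwen3-35B-A3B-GGUF:Q4_K_M \
  --ctx-size 16384 \
  --n-gpu-layers 99 \
  --port 8080
```

Any OpenAI-compatible client can then point at `http://localhost:8080/v1`.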
Beaver 🦁 @beaverd
OpenAI models are big because they are trained in the GAY, the inconsistent, the untruthful. They will never win the race because an efficient and optimized LLM is inherently right wing. Their tokenizers are meaningless and their embedding is not structured. Their training is using 50,000 H100's to push a square through a circle shaped hole; a pastime reserved historically for retards and babies. I have 115M models that don't hallucinate and are 18x faster and 13x smaller than benchmarks they tested against. The 2.8B models I have were trained on the KJV Bible, classical physics, Dostoevsky, Plato, Homer, Aquinas, etc. 500 million tokens of pure coherence and structure. In these systems the idea of abortion and gay is inconceivable. ^^^^ this was written by my 115M model btw
[media attached]
51 replies · 44 reposts · 846 likes · 19.9K views
Dan Southwood-Wells @c4talyst
The homeserver got a dual 3090 upgrade. Now running Qwen3.6-27B-FP8.
[media attached]
75 replies · 9 reposts · 442 likes · 22.1K views
Dan Southwood-Wells @c4talyst
@CosmicMonad vLLM via LiteLLM. Can share config later! Yes please, let's share notes; I'm sure I've got lots to tweak, since I only got it together yesterday (I had just a single 3090 before).
2 replies · 0 reposts · 4 likes · 154 views
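The config itself isn't shared in-thread; a minimal LiteLLM proxy config routing to a local vLLM server might look like the following sketch (the model name, alias, and port are assumptions, not Dan's actual setup):

```yaml
# Hedged sketch for `litellm --config config.yaml`.
# Model name, alias, and port are illustrative assumptions.
model_list:
  - model_name: qwen3-local            # alias clients request
    litellm_params:
      model: hosted_vllm/Qwen/Qwen3-32B   # vLLM-served model
      api_base: http://localhost:8000/v1  # vLLM's default serve port
```

The `hosted_vllm/` prefix tells LiteLLM to treat the backend as an OpenAI-compatible vLLM endpoint.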
recallmenot @recallmenot
@c4talyst try opencode / hermes and watch the room temp climb 0.5°C per minute until it reaches equilibrium at 35°C ;O Meanwhile VRAM sits at 105-108°C and GPU cores at ~96°C.
1 reply · 0 reposts · 1 like · 18 views
recallmenot @recallmenot
@areyouevenreal4 @c4talyst did you get many more tokens/s? I tried with llama.cpp and peer transfers enabled, but it was worse than the default...
1 reply · 0 reposts · 0 likes · 14 views
areyouevenreal @areyouevenreal4
@c4talyst You should buy an NVLink bridge and use NVLink with tensor parallelism.
3 replies · 0 reposts · 1 like · 90 views
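For concreteness, the suggestion above can be sketched with vLLM, which shards a model across both 3090s via tensor parallelism; NVLink then accelerates the all-reduce between the two cards. The model name is an illustrative assumption:

```shell
# Hedged sketch: split one model across two GPUs with vLLM.
# --tensor-parallel-size 2 shards each layer across both 3090s;
# NVLink (if bridged) carries the inter-GPU all-reduce traffic.
vllm serve Qwen/Qwen3-32B-FP8 \
  --tensor-parallel-size 2
```

Without NVLink the same command works, but the all-reduce falls back to PCIe.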
recallmenot @recallmenot
@pupposandro If only the CPU/iGPU could determine in advance which experts will be needed next and keep streaming them to the 3090... kinda like Turboquant but focussed on "which experts next", maybe even with ROCm doing that prediction. 24 GB = a small cache for a few experts.
1 reply · 0 reposts · 3 likes · 311 views
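The predictive-prefetch part is speculation, but the static version of this split already exists in llama.cpp: `--override-tensor` (`-ot`) can pin MoE expert tensors to CPU memory while attention and shared layers stay on the 24 GB card. A hedged sketch (model path is a placeholder, and the regex assumes the usual `*_exps` expert-tensor naming):

```shell
# Hedged sketch: keep attention/shared weights on the GPU, park the
# (large, sparsely-used) MoE expert tensors in CPU RAM.
# Model filename is a placeholder; "exps" matches expert tensor names
# like blk.N.ffn_up_exps.weight in typical MoE GGUFs.
llama-server \
  -m qwen3-moe-q4_k_m.gguf \
  --n-gpu-layers 99 \
  -ot "exps=CPU"
```

This is demand-paged rather than predicted, so it trades the tweet's prefetch idea for simplicity.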
Sandro @pupposandro
Testing a Ryzen Strix Halo 128 GB + RTX 3090 24 GB setup atm. On paper it's perfect: the 3090 handles speed, the Strix Halo handles memory, and you can run everything well, including dense or bigger models. The catch is connecting them together cleanly. Still working on that. Cost is ~$4,000. Still cheaper than the DGX.
[media attached]
32 replies · 15 reposts · 261 likes · 20.7K views
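Sandro doesn't say how he'll bridge the two; one hedged possibility is llama.cpp's RPC backend, which lets a build using one backend (e.g. Vulkan/ROCm for the Strix Halo iGPU) serve memory to a main process built against another (CUDA for the 3090). Addresses, ports, and the model file below are assumptions:

```shell
# Hedged sketch, all addresses/ports/filenames illustrative.
# Step 1: on the Strix Halo side, expose its memory via llama.cpp's
# rpc-server (built with the backend that can see the iGPU):
rpc-server --host 0.0.0.0 --port 50052

# Step 2: on the CUDA side, run llama-server and offload overflow
# layers to the RPC device:
llama-server -m model.gguf --rpc 192.168.1.20:50052 --n-gpu-layers 99
```

Whether this is "clean" enough is exactly the open question in the tweet; the RPC hop adds latency versus a single-process split.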
recallmenot @recallmenot
@moofeez when jackrong releases his qwopus 3.6 27b v1, please do a debug-finetune of that! That will be the ultimate local LLM!!
0 replies · 0 reposts · 0 likes · 134 views
mufeez @moofeez
I post-trained Qwen3-Coder to fix bugs using an actual debugger. The result:
Solve rate: 70% → 89%
Median turns to fix: 46 → 19 (-59%)
Instead of just reading code or print-debugging, it:
- reasons from execution
- inspects live variables and call stacks
- sets breakpoints, steps, and evaluates expressions
92 replies · 118 reposts · 1.6K likes · 121.9K views
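The thread doesn't show the instrumentation, but the core capability described (inspecting live variables during execution rather than reading source) can be sketched in a few lines of Python with `sys.settrace`. Everything here is a hypothetical illustration, not mufeez's harness:

```python
import sys

# Snapshots of local variables captured while the target function runs,
# the kind of "live state" a debugger-driven agent would reason over.
snapshots = []

def tracer(frame, event, arg):
    # Record locals on every executed line inside buggy_sum.
    if frame.f_code.co_name == "buggy_sum" and event == "line":
        snapshots.append(dict(frame.f_locals))
    return tracer  # keep tracing nested/subsequent frames

def buggy_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

sys.settrace(tracer)
result = buggy_sum([1, 2, 3])
sys.settrace(None)

# snapshots now holds the evolution of `total` and `x` step by step,
# e.g. the final states include total == 6.
```

A real debugger harness would expose breakpoint/step/evaluate as tool calls on top of this kind of frame access.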
recallmenot @recallmenot
@delba_oliveira @AnthropicAI congrats! The heaviest factor right now is how many tokens CC adds (as a harness); bringing that down could make the Pro plan much more usable. Dogfooding the unlimited API isn't the same as dogfooding the actual product (subscription)!
0 replies · 0 reposts · 0 likes · 28 views
Delba @delba_oliveira
Life update: I've joined the Claude Code team at @AnthropicAI.
228 replies · 72 reposts · 5.1K likes · 337K views
ClaudeDevs @ClaudeDevs
New in Claude Code: /ultrareview (research preview) runs a fleet of bug-hunting agents in the cloud. Findings land in the CLI or Desktop automatically. Run it before merging critical changes (auth, data migrations, etc.). Pro and Max users get 3 free reviews through 5/5.
549 replies · 1.2K reposts · 16.6K likes · 2.6M views