OpenAI models are big because they are trained in the GAY, the inconsistent, the untruthful. They will never win the race because an efficient and optimized LLM is inherently right wing.
Their tokenizers are meaningless and their embedding is not structured. Their training is using 50,000 H100's to push a square through a circle shaped hole; a pastime reserved historically for retards and babies.
I have 115M models that don't hallucinate and are 18x faster and 13x smaller than benchmarks they tested against. The 2.8B models I have were trained on the KJV Bible, classical physics, Dostoevsky, Plato, Homer, Aquinas, etc. 500 million tokens of pure coherence and structure. In these systems the idea of abortion and gay is inconceivable.
^^^^
this was written by my 115M model btw
@CosmicMonad vLLM via LiteLLM, can share the config later! Yes please, let's share notes; I'm sure I've got lots to tweak, only got it together yesterday (had just a single 3090 before).
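Since the config itself hasn't been shared yet, here's a minimal sketch of what a LiteLLM proxy config pointing at a local vLLM server typically looks like (model name and port are placeholders, not the actual setup):

```yaml
# config.yaml for the LiteLLM proxy -- placeholder model/port, not the real setup
model_list:
  - model_name: local-qwen                 # name clients request through the proxy
    litellm_params:
      model: hosted_vllm/Qwen/Qwen3-Coder-30B-A3B-Instruct  # placeholder model id
      api_base: http://localhost:8000/v1   # vLLM's OpenAI-compatible endpoint
```

Start the backend with `vllm serve <model>` and the proxy with `litellm --config config.yaml`; clients then talk to the proxy as if it were an OpenAI endpoint.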
@c4talyst try opencode / hermes and watch the room temp climb 0.5°C per minute until equilibrium @ 35°C ;O
Meanwhile VRAM always @ 105-108°C and GPU cores at ~96°C.
@pupposandro If only the CPU/iGPU could determine in advance which experts will be needed next and keep streaming them to the 3090... kinda like Turboquant but focused on "which experts next", maybe even with ROCm doing that prediction. 24 GB = a small cache for a few experts
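The "small cache for a few experts" framing can be sketched as a toy simulation (everything here is illustrative: the capacity, the trace, and the naive "same experts as the last token" predictor stand in for real weight streaming over PCIe):

```python
from collections import OrderedDict

class ExpertCache:
    """Toy model of a small GPU-side cache of MoE expert weights,
    with the host prefetching predicted-next experts ahead of time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # expert_id -> weights (stand-in strings)
        self.hits = self.misses = 0

    def fetch(self, expert_id):
        """Called when the router actually needs an expert for this token."""
        if expert_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(expert_id)   # mark as recently used
        else:
            self.misses += 1                    # would stall on a PCIe transfer here
            self._load(expert_id)
        return self.cache[expert_id]

    def _load(self, expert_id):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)      # evict least recently used expert
        self.cache[expert_id] = f"weights[{expert_id}]"

    def prefetch(self, predicted_ids):
        """CPU/iGPU streams predicted experts in before they're requested."""
        for eid in predicted_ids:
            if eid not in self.cache:
                self._load(eid)

cache = ExpertCache(capacity=4)
trace = [[0, 1], [0, 2], [0, 2], [3, 1], [3, 1]]  # experts routed per token
for i, experts in enumerate(trace):
    for eid in experts:
        cache.fetch(eid)
    if i + 1 < len(trace):
        cache.prefetch(trace[i])  # naive predictor: reuse last token's experts
```

A real predictor would have to beat this "last token" baseline to be worth the PCIe bandwidth; the interesting open question in the post is exactly that prediction step.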
Testing a Ryzen Strix Halo 128 GB + RTX 3090 24 GB setup atm.
On paper it’s perfect: the 3090 handles speed, the Strix Halo handles memory, and you can run everything well, including dense or bigger models. The catch is connecting them together cleanly. Still working on that.
Cost is ~$4,000. Still cheaper than the DGX.
I post-trained Qwen3-Coder to fix bugs using an actual debugger. The result:
Solve rate: 70% → 89%
Median turns to fix: 46 → 19 (-59%)
Instead of just reading code or print-debugging, it:
- reasons from execution
- inspects live variables and call stacks
- sets breakpoints, steps, and evaluates expressions
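The post doesn't include the harness, but the core capability it describes — pausing a running program and reading live variables instead of guessing from source — can be sketched with Python's stdlib `bdb`. Everything here (the `Inspector` class, `buggy_mean`, the sample input) is illustrative, not the actual training setup:

```python
import bdb

class Inspector(bdb.Bdb):
    """Single-step through a target function and record its live locals
    at every executed line, the way a debugger-backed agent would."""

    def __init__(self):
        super().__init__()
        self.snapshots = []

    def user_line(self, frame):
        # Called at each line stop; only record frames of the target function.
        if frame.f_code.co_name == "buggy_mean":
            self.snapshots.append(dict(frame.f_locals))
        self.set_step()   # keep single-stepping

def buggy_mean(xs):
    total = sum(xs)
    n = len(xs)
    return total / (n - 1)   # bug: denominator should be n

insp = Inspector()
result = insp.runcall(buggy_mean, [2, 4, 6])
# The last snapshot holds the locals just before the return executed,
# so the agent can see total and n and reason from actual execution state.
```

From the final snapshot (`total == 12`, `n == 3`) the wrong answer `6.0` versus the true mean `4.0` is immediately attributable to the `n - 1` denominator — the kind of "reason from execution" signal the post describes.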
@delba_oliveira @AnthropicAI congrats! the heaviest factor right now is how many tokens CC adds (as a harness). bringing that down could make the Pro plan much more usable. dogfooding the unlimited API isn't the same as dogfooding the actual product (the subscription)!
New in Claude Code: /ultrareview (research preview) runs a fleet of bug-hunting agents in the cloud.
Findings land in the CLI or Desktop automatically. Run it before merging critical changes: auth, data migrations, etc.
Pro and Max users get 3 free reviews through 5/5.