OpenAI models are big because they are trained in the GAY, the inconsistent, the untruthful. They will never win the race because an efficient and optimized LLM is inherently right wing.
Their tokenizers are meaningless and their embedding is not structured. Their training is using 50,000 H100's to push a square through a circle shaped hole; a pastime reserved historically for retards and babies.
I have 115M models that don't hallucinate and are 18x faster and 13x smaller than benchmarks they tested against. The 2.8B models I have were trained on the KJV Bible, classical physics, Dostoevsky, Plato, Homer, Aquinas, etc. 500 million tokens of pure coherence and structure. In these systems the idea of abortion and gay is inconceivable.
^^^^
this was written by my 115M model btw
@CosmicMonad vLLM via LiteLLM, can share the config later! Yes please, let's share notes; I'm sure I've got lots to tweak, only got it together yesterday (had just a single 3090 before).
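Since the config itself hasn't been shared yet, here's a minimal sketch of what a LiteLLM proxy config pointing at a local vLLM server typically looks like (model name and port are placeholders, not the actual setup):

```yaml
# config.yaml for the LiteLLM proxy -- placeholder model/port, not the real setup
model_list:
  - model_name: local-qwen                 # name clients request through the proxy
    litellm_params:
      model: hosted_vllm/Qwen/Qwen3-Coder-30B-A3B-Instruct  # placeholder model id
      api_base: http://localhost:8000/v1   # vLLM's OpenAI-compatible endpoint
```

Start the backend with `vllm serve <model>` and the proxy with `litellm --config config.yaml`; clients then talk to the proxy as if it were an OpenAI endpoint.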
@c4talyst try opencode / hermes and watch the room temp climb 0.5°C per minute until equilibrium @ 35°C ;O
Meanwhile VRAM always @ 105-108°C and GPU cores at ~96°C.
@pupposandro If only the CPU/iGPU could determine in advance which experts will be needed next and keep streaming them to the 3090... kinda like Turboquant but focused on "which experts next", maybe even with ROCm doing that prediction. 24 GB = a small cache for a few experts
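The "small cache for a few experts" framing can be sketched as a toy simulation (everything here is illustrative: the capacity, the trace, and the naive "same experts as the last token" predictor stand in for real weight streaming over PCIe):

```python
from collections import OrderedDict

class ExpertCache:
    """Toy model of a small GPU-side cache of MoE expert weights,
    with the host prefetching predicted-next experts ahead of time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # expert_id -> weights (stand-in strings)
        self.hits = self.misses = 0

    def fetch(self, expert_id):
        """Called when the router actually needs an expert for this token."""
        if expert_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(expert_id)   # mark as recently used
        else:
            self.misses += 1                    # would stall on a PCIe transfer here
            self._load(expert_id)
        return self.cache[expert_id]

    def _load(self, expert_id):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)      # evict least recently used expert
        self.cache[expert_id] = f"weights[{expert_id}]"

    def prefetch(self, predicted_ids):
        """CPU/iGPU streams predicted experts in before they're requested."""
        for eid in predicted_ids:
            if eid not in self.cache:
                self._load(eid)

cache = ExpertCache(capacity=4)
trace = [[0, 1], [0, 2], [0, 2], [3, 1], [3, 1]]  # experts routed per token
for i, experts in enumerate(trace):
    for eid in experts:
        cache.fetch(eid)
    if i + 1 < len(trace):
        cache.prefetch(trace[i])  # naive predictor: reuse last token's experts
```

A real predictor would have to beat this "last token" baseline to be worth the PCIe bandwidth; the interesting open question in the post is exactly that prediction step.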
Testing a Ryzen Strix Halo 128 GB + RTX 3090 24 GB setup atm.
On paper it’s perfect: the 3090 handles speed, the Strix Halo handles memory, and you can run everything well, including dense or bigger models. The catch is connecting them together cleanly. Still working on that.
Cost is ~$4,000. Still cheaper than the DGX.
I post-trained Qwen3-Coder to fix bugs using an actual debugger. The result:
Solve rate: 70% → 89%
Median turns to fix: 46 → 19 (-59%)
Instead of just reading code or print-debugging, it:
- reasons from execution
- inspects live variables and call stacks
- sets breakpoints, steps, and evaluates expressions
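The post doesn't include the harness, but the core capability it describes — pausing a running program and reading live variables instead of guessing from source — can be sketched with Python's stdlib `bdb`. Everything here (the `Inspector` class, `buggy_mean`, the sample input) is illustrative, not the actual training setup:

```python
import bdb

class Inspector(bdb.Bdb):
    """Single-step through a target function and record its live locals
    at every executed line, the way a debugger-backed agent would."""

    def __init__(self):
        super().__init__()
        self.snapshots = []

    def user_line(self, frame):
        # Called at each line stop; only record frames of the target function.
        if frame.f_code.co_name == "buggy_mean":
            self.snapshots.append(dict(frame.f_locals))
        self.set_step()   # keep single-stepping

def buggy_mean(xs):
    total = sum(xs)
    n = len(xs)
    return total / (n - 1)   # bug: denominator should be n

insp = Inspector()
result = insp.runcall(buggy_mean, [2, 4, 6])
# The last snapshot holds the locals just before the return executed,
# so the agent can see total and n and reason from actual execution state.
```

From the final snapshot (`total == 12`, `n == 3`) the wrong answer `6.0` versus the true mean `4.0` is immediately attributable to the `n - 1` denominator — the kind of "reason from execution" signal the post describes.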
@delba_oliveira @AnthropicAI congrats! the heaviest factor right now is how many tokens CC adds (as a harness). bringing that down could make the Pro plan much more usable. dogfooding the unlimited API isn't the same as dogfooding the actual product (the subscription)!
New in Claude Code: /ultrareview (research preview) runs a fleet of bug-hunting agents in the cloud.
Findings land in the CLI or Desktop automatically. Run it before merging critical changes: auth, data migrations, etc.
Pro and Max users get 3 free reviews through 5/5.