Kevin

2.7K posts

Kevin

@O2_Addict_

Oxygen Addict | Techno Puritan ✝️

Boston, MA शामिल हुए Ekim 2022

1.7K फ़ॉलोइंग238 फ़ॉलोवर्स

पिन किया गया ट्वीट

Kevin@O2_Addict_·1d

Space 🚀

Amazing Physics@amazing_physics

Would you rather explore outer space or the depths of the ocean?

English

132

Kevin@O2_Addict_·59m

@TiffanyFong Right on

English

Tiffany Fong@TiffanyFong·2h

ZXX

680

166

5.1K

57.2K

Kevin@O2_Addict_·1h

@0xSero Respect

English

0xSero@0xSero·6h

This guy did a much better job than me. If you want to run AI at home here’s current SOTA with functional configs.

AlexAImaginator@TraffAlex

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → huggingface.co/unsloth/gemma-… 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → huggingface.co/LiquidAI/LFM2.… ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → huggingface.co/unsloth/Qwen3.… • MTP version (faster) → huggingface.co/unsloth/Qwen3.… 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → huggingface.co/Jackrong/Qwopu… • MTP version → huggingface.co/Jackrong/Qwopu… 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → huggingface.co/unsloth/gemma-… 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → huggingface.co/sjakek/Nex-N2-… ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → github.com/ggml-org/llama… (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent • aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

English

496

53.4K

Kevin@O2_Addict_·1h

@bluewmist Try Retardmaxxing. It’s good for you 👍

English

blue@bluewmist·1h

Free advice from an expensive psychologist: If you are an anxious person, do everything for fun. Go to a job interview for fun. Submit documents for fun. Start a blog for fun. Anxiety feeds on importance. Don't turn everything into a matter of life or death.

English

372

7.6K

Kevin@O2_Addict_·1h

@meshtimes_ Poetry. Cheers to you!

English

marisa@meshtimes_·4h

i used to be embarrassed by my fine arts degree i’d watch my friends study cs, get internships, get jobs, and feel like i messed up but the older i get the more grateful i am for it art school taught me how to observe , how to sit with ambiguity so looking back i don’t think i’d build the things i build today without

English

297

13.8K

Kevin@O2_Addict_·1h

Cool

Mindful Learning@MindfulL205

Digital microfluidics: The art of controlling droplets with electricity

English

Kevin@O2_Addict_·1h

@rachallison1 Suns up at 5am, the clock is lying to you! Go to sleep soon

English

rachael ✿@rachallison1·1h

is it too early to go to bed?

English

446

Kevin@O2_Addict_·1h

@yacineMTB Why does the goalpost keep moving? 🤔

English

kache@yacineMTB·3h

Canada isn't so much a mosaic, it is a melting pot. And America is a pressure cooker

Scott Robertson@sarobertson_

Carney: Canada is a mosaic, not a melting pot. And this is the distinction that matters. Because a mosaic doesn't dissolve or blend its pieces. Each is stitched to each and all the pieces hold all. And the beauty is in the arrangement, not in the blending.

English

Kevin रीट्वीट किया

Three.js@threejs·3h

11M downloads/week 🚀

English

371

12K

Kevin@O2_Addict_·1h

@D_the_Designer You’re on, and onto, something

English

Kevin रीट्वीट किया

D@D_the_Designer·8h

Would be a neat chandelier. Just saying.

自由🆓@mukoda_matcha

プライベートで量子コンピュータの形を調べてたら、めちゃくちゃブリップAに近くてなんだか感動したんですよね　綺麗…

English

583

Kevin रीट्वीट किया

Camus@newstart_2024·10h

Jordan Peterson made a profound point on Chris Williamson’s podcast. When God dies, a lot of unexpected things die with Him, including science. Science isn’t some purely neutral, secular tool. It rests on deeply religious assumptions: that truth exists, that it’s knowable, that pursuing it is good, and that the universe makes sense. These aren’t scientific claims, they’re metaphysical, rooted in a religious worldview. The universities themselves grew out of monasteries. Without that deeper foundation, science eventually stops being about truth and becomes just another tool for power, ideology, or convenience. You lose the reason to be honest when the data gets inconvenient. Do you think science can survive long-term without any belief in objective truth or a higher moral order?

English

169

777

3.1K

125.3K

Kevin रीट्वीट किया

Benjamin 🇬🇧🇺🇸@UKFREEDOMUNITE·7h

Britain needs a first and second amendment 🇬🇧🇺🇸

English

158

128

1.8K

12.1K

Kevin रीट्वीट किया

Benjamin Tumbleson@bentumbleson·8h

Software is cool and all, but there's something wildly satisfying about creating hardware from scratch.

Benjamin Tumbleson@bentumbleson

This is one of things where as I learn more, I realize how much I don't actually know.

English

142

4.1K

Kevin@O2_Addict_·1h

@GiaMMacool You girls know this won’t work anymore, right? It’s not cool, and it’s online for the world to see

English

Gia Macool@GiaMMacool·3h

Women give everything with zero restrictions to the guy who won’t commit because he demands nothing from her. No accountability, no expectations, no future obligations just fun in the moment. She can do whatever, be however, and he’ll disappear tomorrow without holding her responsible. But with the man who wants to commit she puts up walls and restrictions especially around sex because she knows he’ll expect her to step up as a wife: daily effort, accountability, keeping the attraction alive, responsibility. That feels like too much work and pressure to her. So she starves the man who actually values her long-term while rewarding the one who doesn’t. Then gets confused when good men pull back or lose interest. It’s a self-created trap rooted in avoiding responsibility while chasing easy validation.

MrBlack™@KE_MrBlack

Women are something else 😂

English

251

20.8K

Kevin@O2_Addict_·1h

@mr_layman_3 @dissidentwest You’re thinking of Catholics

English

mr_layman_3@mr_layman_3·4h

@dissidentwest It's hilarious that some people take this as "Christian-coded", seeing as any Christian would bend over backwards to not offend the demon.

English

472

Dissident West@dissidentwest·8h

This episode of Frieren is the most unapologetically right wing content I’ve literally ever seen. She sees right through the demons who use suicidal empathy to trick humans into accepting them to their own demise. It’s the perfect allegory for modern mass migration.

English

219

2.5K

20.4K

Kevin@O2_Addict_·1h

@Morindacil @dissidentwest Defending your country and its values is most absolutely right wing

English

Morindacil@Morindacil·2h

@dissidentwest It's not really right wing since it's Japanese culture and virtues first and foremost. It's also just common sense to not allow this shit to happen, something that the governments are tending to forget.

English

509

Kevin@O2_Addict_·1h

Based Frieren

Dissident West@dissidentwest

English