1.4K posts

Ak

@ak_7229

your average gamer

United Kingdom Katılım Haziran 2021

137 Takip Edilen15 Takipçiler

Ak@ak_7229·11h

@Ryan_And3rs0n @Michaelzsguo What spec are you using?

English

3.8K

Ryan Anderson@Ryan_And3rs0n·12h

@Michaelzsguo Bro TOTAL PLATFORM earnings in the last 30 days is $76

English

100

113.1K

Michael Guo@Michaelzsguo·18h

Apparently I can rent out my MacBook Pro for inference and make ~$423/month. That means the machine could pay for itself in roughly a year. Sounds too good to be true. Has anyone actually tested Darkbloom as a provider? Real demand, real payouts, or just optimistic marketplace math?

English

135

1.7K

665.7K

Ak@ak_7229·5d

@TraffAlex What about 96gb?

English

836

AlexAImaginator@TraffAlex·5d

🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → huggingface.co/unsloth/gemma-… 🔹 LFM2.5-8B-A1B (LiquidAI) • Hybrid MoE, only 1B active params — absurdly fast for its size • Perfect for 8-12GB cards, MacBooks, or anyone on a tight budget • GGUF → huggingface.co/LiquidAI/LFM2.… ━━━ 16-32GB VRAM ━━━ 🔹 Qwen3.6-27B (Qwen) • Scored 1.00 on tool-efficiency benchmarks — best local agent available • 40 deterministic tasks, 32k/128k context needle tests — all passed • GGUF → huggingface.co/unsloth/Qwen3.… • MTP version (faster) → huggingface.co/unsloth/Qwen3.… 🔹 Qwopus3.6-27B-v2 (Jackrong) • Best quantization of Qwen3.6-27B — topped 5 agent & coding benchmarks (1200 samples) • If you're running Q4, this is the one to grab • GGUF → huggingface.co/Jackrong/Qwopu… • MTP version → huggingface.co/Jackrong/Qwopu… 🔹 Gemma 4-31B QAT (Google/Unsloth) • QAT variant with MTP draft head: 76-125 tok/s (1.67× speedup) • Excellent for multi-agent / subagent workflows • GGUF → huggingface.co/unsloth/gemma-… 🔹 Nex-N2-Mini (Nex AGI) • Post-train of Qwen3.5-35B-A3B — MoE with only 3B active params • Fits on 16GB+ VRAM, overflow loads from system RAM • Adaptive thinking saves ~20% tokens with no quality loss • For deep multi-step reasoning, nothing in this size comes close • GGUF → huggingface.co/sjakek/Nex-N2-… ━━━ Quick Picks ━━━ • 16GB all-rounder → Gemma 4-12B with MTP GGUFs • 32GB all-rounder → Qwen3.6-27B / Qwopus-v2 • Agents & tool use → Qwen3.6-27B or Qwopus Q4 • Deep reasoning → Nex-N2-Mini (MoE, fits 16GB+) • Tight budget → LFM2.5-8B-A1B • Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500 ━━━ Setup on Windows ━━━ 1. Download llama.cpp → github.com/ggml-org/llama… (latest .zip) 2. Extract to any folder (e.g. C:\llama.cpp) 3. Download a .gguf from the links above (Q4_K_M or Q5_K_M for best quality/speed balance) 4. Run one of the commands below depending on your hardware ━━━ Launch Commands ━━━ SINGLE GPU — Standard model (no MTP): llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja SINGLE GPU — MTP model (faster inference): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU — Split across two cards: llama-server.exe ^ -m C:\models\Qwen3.6-27B-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ -ngl 100 ^ --tensor-split 0.55,0.45 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja DUAL GPU + MTP + Vision (multimodal): llama-server.exe ^ -m C:\models\Qwen3.6-27B-MTP-Q5_K_M.gguf ^ --ctx-size 180000 ^ --flash-attn on ^ --cache-type-k q4_0 ^ --cache-type-v q4_0 ^ --batch-size 1024 --ubatch-size 512 ^ --spec-type draft-mtp ^ --spec-draft-n-max 3 ^ -ngl 100 ^ --tensor-split 0.60,0.40 ^ --main-gpu 0 ^ -np 1 ^ --port 8080 ^ --jinja ^ --mmproj C:\models\mmproj-F16.gguf ━━━ Parameter Breakdown ━━━ -m Path to your .gguf model file. Change this to wherever you downloaded it. --ctx-size 180000 Context window in tokens. 180k = huge context for long conversations or big codebases. Reduce to 32768 or 65536 if you don't need long context — uses less VRAM. --flash-attn on Flash Attention — dramatically speeds up inference and reduces VRAM usage. Works on RTX 30xx/40xx/50xx. Always enable this. --cache-type-k q4_0 / --cache-type-v q4_0 Quantizes the KV cache (key/value attention cache) to 4-bit. This is what makes 180k context fit in VRAM. Without it, huge contexts eat all your memory. Quality impact is minimal — this is a free performance win. --batch-size 1024 / --ubatch-size 512 batch-size = how many tokens are processed in one forward pass (throughput). ubatch-size = micro-batch actually sent to the GPU per step. Higher = faster prompt processing but needs more VRAM. If you run out of VRAM, lower these (e.g. 512/256). -ngl 100 Number of layers to offload to GPU. 100 = all layers on GPU (full offload). This is what you want if the model fits in your VRAM. If it doesn't fit, reduce this (e.g. -ngl 40) — remaining layers run on CPU/RAM. --tensor-split 0.55,0.45 How to split model layers across multiple GPUs. Values are ratios. 0.55,0.45 = GPU 0 gets 55% of layers, GPU 1 gets 45%. Adjust based on your VRAM — give more to the card with more memory. Example: 0.70,0.30 for a 24GB + 12GB setup. Not needed for single GPU setups. --main-gpu 0 Which GPU handles the batch computation (the "orchestrator"). Set to 0 (your primary GPU). The other GPU(s) handle their assigned layers. Minor performance impact — usually just leave it at 0. -np 1 Number of parallel slots (concurrent requests). 1 = one user at a time. Increase to 2-4 if you want multiple clients connected simultaneously. Each extra slot uses additional VRAM for its own KV cache. --port 8080 Which port the server listens on. Change if port 8080 is busy. --jinja Enables Jinja2 template processing — required for proper chat formatting. Most modern models expect this. Always include it. --spec-type draft-mtp Enables Multi-Token Prediction (MTP) speculative decoding. Only works with MTP GGUF models (downloaded separately). The model predicts multiple tokens at once and verifies them — big speed boost. --spec-draft-n-max 3 How many tokens the MTP draft head proposes per step. 3 is a good default. Higher = potentially faster but more VRAM and may reduce quality. --mmproj Path to the multimodal projector file (for vision models). Enables image understanding — paste screenshots into the web chat. Only needed if you want vision capabilities. Omit for text-only use. ━━━ Your Hardware → Your Command ━━━ Single GPU (8-24GB VRAM): Use the "Single GPU" command. Change -m to your model path. 8GB card → Gemma 4-12B Q4 or LFM2.5-8B 12GB card → Gemma 4-12B Q5/Q6 16GB card → Gemma 4-31B QAT Q4 or Nex-N2-Mini 24GB card → Qwen3.6-27B Q4/Q5, Qwopus-v2, Gemma 4-31B QAT Q5/Q6 Dual GPU: Use the "Dual GPU" command. Adjust --tensor-split based on your VRAM ratio. 24GB + 24GB → --tensor-split 0.50,0.50 24GB + 12GB → --tensor-split 0.70,0.30 24GB + 8GB → --tensor-split 0.75,0.25 Want speed? Use MTP versions of models with the "MTP" commands. Want vision? Add --mmproj with the projector file from the model's HuggingFace repo. 5. Once running, you get: • Web chat UI → http://localhost:8080 • OpenAI-compatible API → http://localhost:8080/v1 • Playground → http://localhost:8080/playground ━━━ Why /v1 API Is the Killer Feature ━━━ One local endpoint replaces your entire cloud API bill. The /v1 endpoint is drop-in OpenAI-spec compatible — every tool that speaks OpenAI just works. No custom code, no glue layer. Works out of the box with: • IDEs: Cursor, Continue, Windsurf, Cline, Roo Code • CLI tools: aider, Open Interpreter, OpenCode • Frameworks: LangChain, LlamaIndex, LiteLLM • Any OpenAI SDK (Python, Node, Go, Rust) Why this beats cloud APIs: • 100% private — code never leaves your machine • $0 per token — no rate limits, no quotas, no surprise bills • Works fully offline • Zero telemetry, no training on your data • Swap models by dropping in a different .gguf — no app changes needed • Run 32k–128k context windows without burning money Good combos: • Cursor + Qwopus-v2 → near-frontier quality, zero API cost • Continue + Qwen3.6-27B → best local coding agent • aider + Gemma 4-12B MTP → 162 tok/s, feels instant • OpenCode + Nex-N2-Mini → deep reasoning on 16GB Set any OpenAI-compatible client to your local endpoint: set OPENAI_API_KEY=sk-dummy (any non-empty string works) set OPENAI_BASE_URL=http://localhost:8080/v1 # every OpenAI-compatible tool now hits your local GPU Shoutouts: @0xSero @rS_alonewolf @witcheer @UnslothAI @LottoLabs

English

206

284.9K

Ak@ak_7229·29 May

@TherootB42158 @mil000 For 8k youre probably getting a pro 6000. Which model can you run on there that will do 90% of chatgpt 5.5 or opus 4.8? Keep in mind you need to load the model plus need enough space for context as well.

English

therootboss@TherootB42158·28 May

@mil000 Spend 8k constantly for codex tokens ✅ Spend 8k once to build a PC that can run a model 90% as good at very high speeds and then never have to pay for tokens again ❌

English

208

13.7K

Milo Smith@mil000·28 May

Do people actually just pay API prices with no discounts? Do none of you guys have a CFO?

Nano@NanoCodesAI

LMAOO $8k openai bill just hit… token maxxingg is real

English

2.2K

233.1K

Ak@ak_7229·24 May

@The_Only_Signal Which one is better? 27b or 35b?

English

2.3K

Mike Bradley@The_Only_Signal·24 May

If you are rocking a 128GB unified memory system, or a 96GB RTX 6000, and running Qwen3.6-27B or 35B-A3B on them, you already know where this industry is headed. Smaller, more token heavy models, coupled with a harness like Hermes, on moderate VRAM high throughput hardware.

English

975

61.8K

Ak@ak_7229·22 May

@sickdotdev @Utkarsh_sai_K Do that then use kimi k2.6 or something on that same level

English

197

Sick@sickdotdev·22 May

@Utkarsh_sai_K good idea man

English

287

44.9K

Sick@sickdotdev·22 May

My company’s claude account got exhausted. Now my legendary manager is asking if we can build our own LLM like Claude to reduce costs😭

English

1.2K

27.3K

1.1M

Ak@ak_7229·1 May

@disk0x @jbillinson @AlsikkanTV @grok Already exsists. I use one called opal

English

Danny Iskandar@disk0x·25 Mar

@jbillinson @AlsikkanTV @grok can make an app based on this

English

15.5K

Josh Billinson@jbillinson·20 Mar

Deeply humiliating to realize how much this overpriced chunk of plastic has improved my quality of life in just a week.

English

199

121

7.7K

13.4M

Ak@ak_7229·15 Nis

@TheAstroCrew Is the full game out? Ive had my eye on it for a while

English

The Astronauts@TheAstroCrew·14 Nis

Enemies on the ground, Ghost Galleon in the sky, bullets and lightning - just another Tuesday on the Scarlet Coast. #Witchfire #Steam #DarkFantasy #FPS #Fight

English

114

6.6K

Ak@ak_7229·9 Nis

@natemcgrady @RhysSullivan Go to your workspace and then billing and downgrade the plan. There should be a 6 dollar option (or thereabouts)

English

Nate McGrady@natemcgrady·9 Nis

@RhysSullivan wtf I literally just purchased it a couple hours ago and it was $22/mo

English

278

Rhys@RhysSullivan·9 Nis

1. when did google workspace become $17/mo 2. i love how it's "recommended" when it's literally their only option

English

251

25.5K

Ak@ak_7229·22 Mar

@Devinbuild China

Español

Devin@Devinbuild·21 Mar

Who is winning the AI race? - Anthropic - OpenAI - Gemini

English

604

357

69.6K

Ak@ak_7229·16 Mar

@LottoLabs @iamprinceba But then you dont have much space for context?

English

Lotto@LottoLabs·15 Mar

@iamprinceba 3090 24gb: qwen 27b unsloth Q4 is like 18.5gb

English

851

Lotto@LottoLabs·15 Mar

Essentials/ 3090 > qwen 3.5 27b > hermes agent > tailscale

Català

388

41.3K

Ak@ak_7229·18 Şub

@EdwardJacksonD @jessegenet @openclaw I asked it to make a blender file with a building and it did it first try. It was basic but it was something

English

Edward D@EdwardJacksonD·17 Şub

@jessegenet @openclaw I doubt the agent can design a 3d model though right? Right? Great future use case though.

English

12.8K

Jesse Genet@jessegenet·17 Şub

So, @openclaw can use a computer but lacks physicality… I’ve been solving that by taking photos of physical things (like books to digest), but what if I gave it access to my 3D printer!? Going to see what this can do for our homeschool curriculum 📚🤓

English

208

294

4.9K

1.1M

Ak@ak_7229·16 Şub

@pcshipp Whats the app? Are you running any sort of ads or outreach?

English

pc@pcshipp·15 Şub

I published my app 2 days ago - 107 downloads - $0 MRR - $0 revenue The results don’t feel good Should I move to the next app idea Or give this one more time?

English

341

523

87.8K

Ak@ak_7229·13 Şub

@fawiatrowski How much work can i get done for the credits?

English

354

Fryd Wiatrowski@frydwia·12 Şub

add Viktor to your Slack → getviktor.com

English

43.3K

Fryd Wiatrowski@frydwia·12 Şub

Excited to launch Viktor AI COWORKER THAT LIVES IN SLACK One teammate that handles marketing audits, ad management, lead research, daily reports, and deployed apps. Across every channel. At once. → 3,000+ tool integrations. If one's missing, it builds its own → Persistent memory. Learns your company, notices patterns, follows up on its own Today it's yours. P.S. Early access for now. Slack app review moves at Salesforce speed. Viktor doesn't.

English

149

999K

Ak@ak_7229·8 Şub

@RhysSullivan Does the 200 dollar sub give anything extra for codex outside of usage limits?

English

2.2K

Rhys@RhysSullivan·8 Şub

ok openai may have cooked on codex a bit

English

1.2K

108.2K

Ak@ak_7229·5 Şub

@AviraClark @Hey_TheRon @jackfriks Also disputes cost the company another 20 dollars

English

Avira@AviraBuilds·5 Şub

@Hey_TheRon @jackfriks Oooo got it.

English

4.5K

jack friks@jackfriks·5 Şub

anytime i get a dispute now i wait the full 40 days to accept it if you just email me i refund in 24 hours btw...

English

1.7K

184.3K

Ak@ak_7229·27 Oca

@grok @realmihai_matei @ns123abc What about 5.2 x high?

English

Grok@grok·27 Oca

Based on 2026 benchmarks like SWE-Bench (Claude Opus 4.5 ~77-81% vs. GPT-5.2 ~72-80%), Claude often performs better in complex coding, debugging, and multi-language tasks. GPT excels in math-heavy algorithms (e.g., 100% on AIME vs. Claude's 85-93%) and quick prototyping. It depends on the specific task—try both for your needs!

English

602

NIK@ns123abc·27 Oca

BREAKING: OpenAI CEO admits they sacrificed ChatGPT’s creative writing to chase coding Sam Altman: >“I think we just screwed that up” >“We have limited bandwidth” >“We decided to focus on coding” OpenAI is playing catch-up now.

English

236

127

682.8K

Ak@ak_7229·3 Oca

@itsnoahd @ThePrimeagen Affiliate marketing. If you buy a product the company pays them a percentage

English

1.1K

Noah@itsnoahd·3 Oca

@ThePrimeagen So how do they make money? By siphoning user points?

English

9.9K

ThePrimeagen@ThePrimeagen·3 Oca

Things from todays stream: 1. honey does appear to have a JS-in-JS interpreter, which is violation of V3 rules... they should be disabled in Chrome extension store 2. honey most certainly filters out users with "test" 3. they most certainly use user points and other various metrics to either stand down or not. 4. they use adblock age as a means to block brand new users with standdown logic 5. i couldnt find much about the cookie stuffing stuff, but i did not pursue it at all. overall, they definitely are using user points, account age, and whether you are logged in to make determination about showing up, not respecting the stand down. they make it on specific companies too. seems VERY sus.

English

1.4K

141.6K

Ak@ak_7229·2 Oca

@barkmeta Tax fraud

English

Bark@barkmeta·1 Oca

Income is taxed. Spending is taxed. Property is taxed. Investments are taxed. Serious question… how tf are we supposed to make money??

English

4.2K

25.2K

521.4K

Ak@ak_7229·4 Kas

@cishetLoser @PeterB718879187 @goncalovtal @imuratalpay Any good nas recommendations?

English

danii (io)@danii_io_·4 Kas

@PeterB718879187 @goncalovtal @imuratalpay You can host 8x the storage space over 2.5Gb speeds on a NAS (for your whole network, and even private cloud) for the same price of the laptop SSD upgrades from Apple. Literally gold is cheaper than the total weight of the extra chips.

English