jason

@1louder · 2K posts

Oracle of the digital age, a compass navigating the human odyssey.

San Francisco · Joined April 2009

581 Following · 196 Followers

jason@1louder·1d

Use llama.cpp compiled from source; it updates almost daily. Codex, Claude Code, Hermes, or Openclaw can do this for you: just ask it to install llama.cpp and Qwen3.6 27B or 35B MTP. LM Studio is essentially a graphical user interface (GUI) built on top of llama.cpp. Model compatibility: both use the GGUF format, so you can download a model in LM Studio and then point standalone llama.cpp at the same file to run it, or vice versa. If you need a GUI for llama-server, add Open WebUI.

Wahyu Nurcahyo 🇮🇩@🇩🇪@txtnurcahyo·1d

@1louder @sudoingX I tried using LM Studio, but it failed: MTP isn't supported there yet. Ollama doesn't have an official Qwen 3.6 MTP model either. I'll wait for MTP support to land.

Sudo su@sudoingX·2d

if you run a single 24gb gpu, a 3090, a 4090, a 7900 xtx, whatever gets you the 24 gigs, the no brainer pick is qwen 3.6 27b dense at q4. not close. i have run the tier. it fits in 24gb with real context room to spare, it keeps the reasoning smaller models lose, it pushes around 41 tok/s on a single 3090, and i watched it one shot a playable game start to finish, zero iterations. nothing else in that vram class does what this model does. undisputed king of the 24gb tier, and there is nothing you can say to change my mind.

jason@1louder·1d

@txtnurcahyo @sudoingX Try MTP (Multi-Token Prediction) speculative decoding: unsloth.ai/docs/models/qw… Qwen3.6-27B runs on 18GB RAM setups, with ~1.4-2x faster generation and no change in accuracy.
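The "no change in accuracy" claim follows from how speculative decoding verifies drafted tokens. Below is a toy Python sketch of the greedy variant, assuming a separate draft function; MTP instead drafts with extra prediction heads of the same model, and this is not llama.cpp's implementation. The names `n_max` (cap on drafted tokens) and `p_min` (draft-confidence cutoff) and the toy models are hypothetical. Because the target's own greedy token is kept at every position, the output matches plain greedy decoding exactly; only the number of target rounds shrinks. (With sampling instead of greedy decoding, a rejection-sampling acceptance rule preserves the target's distribution.)

```python
from typing import Callable, List, Tuple

# Hypothetical toy interface: context -> (greedy next token, its probability).
Model = Callable[[List[int]], Tuple[int, float]]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Reference: ordinary one-token-per-step greedy decoding with the target."""
    out = list(prompt)
    for _ in range(max_new):
        tok, _ = target(out)
        out.append(tok)
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, n_max: int = 2,
                       p_min: float = 0.75) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # Draft phase: propose up to n_max tokens, stopping early once the
        # drafter's confidence falls below p_min.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(n_max):
            tok, p = draft(ctx)
            if p < p_min:
                break
            proposal.append(tok)
            ctx.append(tok)
        # Verify phase: keep the target's greedy token at every position and
        # stop at the first disagreement. A real engine scores all drafted
        # positions in one batched forward pass; if every drafted token
        # matches, the final position yields one extra "bonus" token.
        ctx = list(out)
        for i in range(len(proposal) + 1):
            expected, _ = target(ctx)
            ctx.append(expected)
            if i == len(proposal) or proposal[i] != expected:
                break
        out = ctx
    # The last round may overshoot; trim to exactly max_new new tokens.
    return out[: len(prompt) + max_new], rounds


# Toy models over the digits 0-9 (purely illustrative).
def toy_target(ctx: List[int]) -> Tuple[int, float]:
    return (ctx[-1] * 3 + 1) % 10, 0.9


def toy_draft(ctx: List[int]) -> Tuple[int, float]:
    tok, _ = toy_target(ctx)
    # The drafter is deliberately wrong after 0, 4 and 8.
    return ((tok + 1) % 10 if ctx[-1] % 4 == 0 else tok), 0.8


if __name__ == "__main__":
    fast, rounds = speculative_decode(toy_target, toy_draft, [1], 12)
    assert fast == greedy_decode(toy_target, [1], 12)  # identical output
    print(fast, "rounds:", rounds)
```

With a perfect drafter and `n_max=2`, each round yields three tokens (two accepted plus the bonus), so nine new tokens take three target rounds instead of nine.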

Wahyu Nurcahyo 🇮🇩@🇩🇪@txtnurcahyo·2d

@sudoingX Please help! I have a 7900 XTX. Qwen 3.6 32B MoE is blazing fast, but 27B runs like a snail. I use Windows 11 and LM Studio. How can I make it usable? It's actually okay for normal chat, but for Claude Code it's too slow.

jason@1louder·2d

@leftcurvedev_ The primary benefit here is throughput. Dense models (Qwen3.6 27B) benefit more than sparse ones (35B MoE). I see no loss in coherence or recall out to 132k tokens; 27B q5_k_m is the first local model to pass my Lambda Calculus and long-context benchmarks. unsloth.ai/docs/models/qw…

left curve dev@leftcurvedev_·2d

I nearly 2x'd the speed while only using +1GB VRAM with the new MTP update in llama.cpp 🤯

You need to add these flags to start using it:
--spec-type draft-mtp \
--spec-draft-p-min 0.75 \
--spec-draft-n-max 2

My results with Qwen3.6 27B on a single RTX 5080 ↓
⚪️ no flag (without MTP) → 54.3 tok/s with 13.26GB VRAM
🔵 --spec-draft-n-max 2 → 90.7 tok/s with 14.29GB VRAM
🔴 --spec-draft-n-max 2 --spec-draft-p-min 0.75 → 93.9 tok/s with 14.30GB VRAM
🟢 --spec-draft-n-max 6 --spec-draft-p-min 0.75 → 93.9 tok/s with 14.87GB VRAM

Increasing to 6 draft tokens didn't help my setup for some reason. I made sure to test with a low context length to leave enough headroom and eliminate the risk of VRAM stress.

From my understanding:
1) The speed gains are very task-dependent. You need to test across a wide range of tasks to get a realistic idea of the benefits.
2) We're already running heavily quantized GGUF models (Q3, Q4, Q6, etc.), so we already benefit from strong speed/performance thanks to the reduced size. That's why some people are seeing little to no improvement compared to MLX or other quantized versions.

The progress over the past few days has been insane, to say the least. However, MTP now consumes significantly more VRAM. Personally, 16GB just isn't enough to use MTP and run it with a good context size. Time to upgrade, lads: 24GB+ users are eating GOOD today 🔥

Full setup below ↓
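The reported RTX 5080 figures can be turned into relative speedups and VRAM overheads with a few lines of Python. The numbers are copied from the post; the dictionary layout and the `summarize` helper are only an illustration.

```python
# Figures as reported in the post (Qwen3.6 27B, single RTX 5080).
BASELINE = {"tok_s": 54.3, "vram_gb": 13.26}
RUNS = {
    "n_max=2": {"tok_s": 90.7, "vram_gb": 14.29},
    "n_max=2, p_min=0.75": {"tok_s": 93.9, "vram_gb": 14.30},
    "n_max=6, p_min=0.75": {"tok_s": 93.9, "vram_gb": 14.87},
}


def summarize(baseline: dict, runs: dict) -> dict:
    """Map each run to (speedup vs. baseline, extra VRAM in GB), rounded."""
    table = {}
    for name, run in runs.items():
        speedup = run["tok_s"] / baseline["tok_s"]
        extra_vram = run["vram_gb"] - baseline["vram_gb"]
        table[name] = (round(speedup, 2), round(extra_vram, 2))
    return table


if __name__ == "__main__":
    for name, (speedup, extra) in summarize(BASELINE, RUNS).items():
        print(f"{name:22s} {speedup:.2f}x  +{extra:.2f} GB")
```

By this arithmetic, "nearly 2x" works out to about 1.67x to 1.73x for roughly 1.03 to 1.04 GB of extra VRAM, and going from 2 to 6 draft tokens costs about another 0.57 GB with no throughput gain on this setup.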

jason@1louder·2d

@aijoey I'm running Qwen3.6 27B with Multi-Token Prediction (MTP). 2T is fast enough to deploy on a single DGX Spark for production ops and simple coding. Its Depth-bench (long-context memo) coherence and NIAH (needle-in-a-haystack) results are the best. huggingface.co/unsloth/Qwen3.…
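Needle-in-a-haystack (NIAH) tests hide one fact at a chosen depth inside long filler text and ask the model to retrieve it. The sketch below is a generic Python version of that idea, not Depth-bench or the author's harness. The filler sentences, the needle, and the `ask_local_model` call are hypothetical, and haystack length is counted in sentences rather than tokens.

```python
import random
from typing import List


def build_niah_prompt(needle: str, filler: List[str], n_filler: int,
                      depth: float, seed: int = 0) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of a
    haystack made of `n_filler` sentences drawn from `filler`."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so every depth uses the same haystack
    hay = [rng.choice(filler) for _ in range(n_filler)]
    hay.insert(round(depth * n_filler), needle)
    return " ".join(hay)


def recalled(answer: str, expected: str) -> bool:
    """Lenient scoring: did the model's answer contain the hidden fact?"""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    filler = ["The sky was grey.", "Trains ran late.", "Tea went cold."]
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        prompt = build_niah_prompt("The vault code is 4817.", filler, 2000, depth)
        question = prompt + "\n\nWhat is the vault code?"
        # answer = ask_local_model(question)  # hypothetical call to your endpoint
        print(depth, len(question.split()), "words")
```

Sweeping both depth and haystack length, then scoring each answer with `recalled`, gives the usual depth-by-length recall grid.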

Joey@aijoey·9 May

dgx spark learning log. trying to understand how far small local agent models can go before we point them at real benchmarks like webworld.

this run used:
- hardware: nvidia dgx spark / gb10
- model: DJLougen/Harmonic-Hermes-9B-GGUF, Q5_K_M
- runtime: llama.cpp llama-server, local OpenAI-compatible endpoint
- model lineage: qwen3.5 architecture; Harmonic-Hermes-9B by @DJLougen, GGUF quant metadata says quantized by @UnslothAI
- test harness: a tiny local python repo with pytest
- safe tools exposed to the model: run_tests, read_file, patch_file

what happened: the model ran the tests, read the broken file, found the off-by-one bug, patched it, and got the tests passing.

it is a smoke test while learning the DGX Spark/local agent stack: local GGUF model, local runtime, safe tool calls, verified patch, tests green.

next step: webworld / real web task benchmarking.
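The "safe tools" idea can be sketched as a small Python dispatcher that exposes only the three tool names from the post and refuses any path outside the repo. This is not the harness used in that run: the class, the argument names (`path`, `old`, `new`), and the exact-once patch rule are assumptions made for illustration, and `run_tests` assumes pytest is installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class SafeTools:
    """Hypothetical sandbox exposing only run_tests, read_file and patch_file.

    Every path is resolved inside `repo`; anything that escapes it is refused.
    """

    def __init__(self, repo: Path):
        self.repo = repo.resolve()

    def _inside(self, rel: str) -> Path:
        path = (self.repo / rel).resolve()
        if not path.is_relative_to(self.repo):  # Python 3.9+
            raise PermissionError(f"path escapes repo: {rel}")
        return path

    def read_file(self, path: str) -> str:
        return self._inside(path).read_text()

    def patch_file(self, path: str, old: str, new: str) -> str:
        target = self._inside(path)
        text = target.read_text()
        if text.count(old) != 1:  # demand an unambiguous edit
            return "error: `old` must occur exactly once"
        target.write_text(text.replace(old, new, 1))
        return "ok"

    def run_tests(self) -> str:
        # Assumes pytest is installed in the current interpreter.
        result = subprocess.run([sys.executable, "-m", "pytest", "-q"],
                                cwd=self.repo, capture_output=True,
                                text=True, timeout=300)
        return f"exit={result.returncode}\n{result.stdout[-2000:]}"

    def dispatch(self, name: str, args: dict) -> str:
        """Entry point for a model's tool call: tool name plus JSON-style args."""
        tools = {"run_tests": self.run_tests, "read_file": self.read_file,
                 "patch_file": self.patch_file}
        if name not in tools:
            return f"error: unknown tool {name!r}"
        try:
            return tools[name](**args)
        except (OSError, TypeError) as exc:  # PermissionError is an OSError
            return f"error: {exc}"


def demo():
    """Off-by-one toy repo: patch it, read it back, then try to escape."""
    with tempfile.TemporaryDirectory() as d:
        Path(d, "calc.py").write_text("def last(xs):\n    return xs[len(xs)]\n")
        tools = SafeTools(Path(d))
        patched = tools.dispatch("patch_file", {"path": "calc.py",
                                                "old": "xs[len(xs)]",
                                                "new": "xs[len(xs) - 1]"})
        after = tools.dispatch("read_file", {"path": "calc.py"})
        escape = tools.dispatch("read_file", {"path": "../outside.txt"})
        return patched, after, escape
```

Errors are returned as strings rather than raised so the model sees them as ordinary tool output and can retry.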

jason@1louder·4d

@theo CLI people have you covered! Send clipboard screenshots to a remote SSH host and put the remote file path back into your clipboard so you can paste it into the terminal (e.g., Claude Code, OpenCode, any TUI)
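The tool itself isn't named here, so the Python below is only a sketch of the workflow, assuming the clipboard image has already been saved to a local PNG (for example with a utility such as `pngpaste` on macOS). It plans an `ssh`/`scp` upload with a timestamped remote name, then copies that remote path to the local clipboard. The host alias, the default remote directory, and the file-naming scheme are hypothetical.

```python
import shlex
import subprocess
from datetime import datetime
from typing import List, Optional, Sequence, Tuple


def plan_upload(local_png: str, host: str, remote_dir: str = "/tmp/screenshots",
                now: Optional[datetime] = None) -> Tuple[List[List[str]], str]:
    """Build the ssh/scp commands for one screenshot and return its remote path."""
    now = now or datetime.now()
    remote_path = f"{remote_dir.rstrip('/')}/shot-{now:%Y%m%d-%H%M%S}.png"
    commands = [
        ["ssh", host, f"mkdir -p {shlex.quote(remote_dir)}"],
        ["scp", local_png, f"{host}:{remote_path}"],
    ]
    return commands, remote_path


def run_upload(commands: List[List[str]], remote_path: str,
               copy_cmd: Sequence[str] = ("pbcopy",)) -> None:
    """Execute the plan, then put the remote path on the local clipboard.

    `pbcopy` is macOS; on Linux, ("xclip", "-selection", "clipboard") or
    ("wl-copy",) are common substitutes.
    """
    for cmd in commands:
        subprocess.run(cmd, check=True)
    subprocess.run(list(copy_cmd), input=remote_path, text=True, check=True)


if __name__ == "__main__":
    cmds, path = plan_upload("/tmp/clip.png", "devbox")  # "devbox": hypothetical host
    print(cmds, path)
```

Splitting planning from execution keeps the path logic testable without a live SSH host; pasting the copied path into Claude Code or another TUI on the remote side then lets the agent open the image by path.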

Theo - t3.gg@theo·4d

Just learned it's literally impossible to paste images into Claude Code over SSH. How do you CLI people live like this??

jason@1louder·11 May

This is less true today. AI has become a universal semantic bridge between intuition and formalization. In the past, the transition from intuition (mental models) to formal language (math/code) required a manual, high-impedance "compilation" process. This gap often killed ideas because the creator lacked the formal syntax to express them. The concept of "Newtonian Gravity" exists as a cluster of coordinates in a model's embedding space. Whether that cluster is accessed via the string F = G(m1m2)/r^2 or the English phrase "The force between two masses is proportional to the product of their masses and inversely proportional to the square of the distance between them," the underlying Semantic Tensor is identical. As a tech founder hiring talent, I see value moving away from "Syntax Specialists" (from common Python/JS to C++/Rust) toward "System Architects" (people with high-fidelity intuition who can orchestrate LLM translations).
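The same relation can be written a third way, as code. This small Python sketch only illustrates the formula quoted in the post: doubling one mass doubles the force, and doubling the distance quarters it. The rounded value of G is standard; the function name is illustrative.

```python
G = 6.674e-11  # Newtonian gravitational constant, N·m²/kg² (rounded)


def gravitational_force(m1: float, m2: float, r: float) -> float:
    """F = G * m1 * m2 / r**2 for masses in kg and distance in metres."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return G * m1 * m2 / r**2


if __name__ == "__main__":
    base = gravitational_force(1.0, 1.0, 1.0)
    print(gravitational_force(2.0, 1.0, 1.0) / base)  # doubling one mass -> 2.0
    print(gravitational_force(1.0, 1.0, 2.0) / base)  # doubling the distance -> 0.25
```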

jason@1louder·30 Apr

@thekitze Unplug and meditate.

kitze@thekitze·30 Apr

i'm so tired of everything and everyone i have no idea how i'm still pushing through in this industry i need a change asap

jason@1louder·26 Apr

@NathanWilbanks_ done! Thanks.

Nathan Wilbanks@NathanWilbanks_·25 Apr

@1louder open your dms and I'll send it your way!

Nathan Wilbanks@NathanWilbanks_·25 Apr

I've been running the same AI agent every day since JULY 🚨
- 297 days running 24/7
- 993,115 seconds of compute time automated
- 5,020,623,362 tokens generated
- 127,743 workflows ran
- 605,292 tool executions
Comment AGNT below to steal my setup 👇

jason@1louder·24 Apr

@heskelbalas @AlekseiKaplin @i2cjak @fishPointer @yacineMTB That looks cool, but you can use the laser-printer toner-transfer method: buy some copper-clad FR4 boards on Amazon for ~$1 each and MG Chemicals 415-1L Ferric Chloride (also on Amazon), and etch your own boards.

Heskel Balas 🚁@heskelbalas·24 Apr

@AlekseiKaplin @i2cjak @fishPointer @yacineMTB I know how the 'proper' PCBs are made. I don't have the machines to do it at home

Heskel Balas 🚁@heskelbalas·24 Apr

Wait, you can just 3D print a circuit layout and then copper leaf it, and it works?!! We could have done this the whole time? @i2cjak @fishPointer @yacineMTB

jason@1louder·24 Apr

Some people need Windows for various reasons. Run native Windows llama.cpp with CUDA, built from source, not under WSL. For coding: Cursor/Codex/Claude Code. Antigravity is getting worse, but it's useful for browser-in-the-loop front-end iteration. LM Studio is nice for LLM experimentation, model swapping, and learning.

Ahmad@TheAhmadOsman·24 Apr

Pro tip: DO NOT USE Windows for local LLMs

moebiusSurfing@moebiussurfing

@TheAhmadOsman What's your favourite software setup for Windows 11 and the 2x3090? LM Studio, OpenCode, Antigravity...?

jason@1louder·21 Apr

@theo I'm guessing Palantir does not have this problem.

Theo - t3.gg@theo·21 Apr

I can't even do basic cryptographic challenges with Opus. I get that they're trying to limit it for safety but this is insane

jason@1louder·20 Apr

@ThePrimeagen At this point, I’m convinced they’re A/B testing us.

ThePrimeagen@ThePrimeagen·20 Apr

You all need to stop. opus 4.7 has been performance mogging on trusted benchmarks. y'all are just holding it wrong.

jason@1louder·20 Apr

@morganlinton The RTX 4090 is slightly faster than the 3090 for a single node, but it has the same 24GB of VRAM. I run Qwen3.6-35B-A3B-UD-Q4_K_XL with 262k context, and Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 (jackrong).

Morgan@morganlinton·19 Apr

Okay, so I've come to the conclusion that I need a 3090, like yesterday. I really want to run more powerful LLMs locally at home. This is relatively new territory for me and I'm far from an expert, so I was chatting with Perplexity about it. Here's what it thinks I need. I'm hoping someone like @0xSero or @LottoLabs can let me know how right or wrong it is, and what I really need. Trying not to break the bank, so I'm okay starting small(ish) 🤏

jason@1louder·19 Apr

I get better results. Qwen 3.6-35B-A3B needs reasoning tokens (similar to DeepSeek-R1), so for one-shot math and coding you must make sure reasoning mode is active. The Unsloth Dynamic 2.0 quant preserves more logic/syntax weights in the attention layers than standard 4-bit quants. Inference parameters matter. Qwen 3.6 on a single-node RTX 4090 is part of my stack now; it passes my High-Fidelity λ-Calculus Evaluator. Try these...
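Neither the "simple λ-calculus evaluator" prompt nor the author's evaluator benchmark is shown in the thread, so the following is only an illustrative Python reference: untyped λ-calculus with de Bruijn indices, normal-order β-reduction under a step limit, and Church numerals for sanity checks. A reference like this can supply known normal forms to compare model-generated evaluators against; all names here are chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Var:
    idx: int  # de Bruijn index: 0 = nearest enclosing λ


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def shift(t: Term, d: int, cutoff: int = 0) -> Term:
    """Add d to every free variable index (those >= cutoff)."""
    if isinstance(t, Var):
        return Var(t.idx + d) if t.idx >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fn, d, cutoff), shift(t.arg, d, cutoff))


def subst(t: Term, j: int, s: Term) -> Term:
    """Replace variable j with s inside t."""
    if isinstance(t, Var):
        return s if t.idx == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fn, j, s), subst(t.arg, j, s))


def step(t: Term) -> Optional[Term]:
    """One leftmost-outermost (normal-order) β-step, or None if t is normal."""
    if isinstance(t, App):
        if isinstance(t.fn, Lam):  # β-redex: (λ.body) arg
            return shift(subst(t.fn.body, 0, shift(t.arg, 1)), -1)
        fn = step(t.fn)
        if fn is not None:
            return App(fn, t.arg)
        arg = step(t.arg)
        return None if arg is None else App(t.fn, arg)
    if isinstance(t, Lam):
        body = step(t.body)
        return None if body is None else Lam(body)
    return None


def normalize(t: Term, max_steps: int = 100_000) -> Term:
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("step limit hit; the term may have no normal form")


def church(n: int) -> Term:
    """λf.λx. f (f (... x)) with n applications of f."""
    body: Term = Var(0)
    for _ in range(n):
        body = App(Var(1), body)
    return Lam(Lam(body))


def unchurch(t: Term) -> int:
    if not (isinstance(t, Lam) and isinstance(t.body, Lam)):
        raise ValueError("not a Church numeral")
    n, b = 0, t.body.body
    while isinstance(b, App) and b.fn == Var(1):
        n, b = n + 1, b.arg
    if b != Var(0):
        raise ValueError("not a Church numeral")
    return n


# ADD = λm.λn.λf.λx. m f (n f x);  MUL = λm.λn.λf. m (n f)
ADD = Lam(Lam(Lam(Lam(App(App(Var(3), Var(1)), App(App(Var(2), Var(1)), Var(0)))))))
MUL = Lam(Lam(Lam(App(Var(2), App(Var(1), Var(0))))))
```

For example, `unchurch(normalize(App(App(MUL, church(2)), church(3))))` evaluates 2 × 3 on Church numerals and returns 6. De Bruijn indices avoid variable capture without any renaming, which is a common source of bugs in hand-written evaluators.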

Taelin@VictorTaelin·19 Apr

@N8Programs i couldn't care less just READ the fucking outputs i asked a simple λ-calculus evaluator and it gave me SHIT that looks straight out of GPT-2 meanwhile opus gives me code straight from God's Book

Taelin@VictorTaelin·19 Apr

Ok so I just tested Qwen 3.6, and I'm now depressed again. I won't say bad things about it, I'll just accept that local models aren't happening and our lives will depend more and more on the goodwill of two companies upon us. Yay

jason@1louder·19 Apr

@stevibe @relentless4o Has anyone tried llama-swap? github.com/mostlygeek/lla… It's a smart proxy that transparently starts and stops the upstream server, with definable parameters/ports for each model.
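The core idea of such a swapping proxy (route by the request's model name and keep only one upstream running) can be sketched in a few lines of Python. This is not llama-swap's code or configuration format. Process management is replaced by log entries so the routing logic stays testable, and the model names, ports, and `.gguf` file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelSpec:
    port: int
    args: List[str]  # e.g. llama-server arguments for this model


@dataclass
class SwapRouter:
    """Toy model of a swapping proxy: one upstream server at a time,
    swapped on demand based on the request's "model" field."""
    models: Dict[str, ModelSpec]
    running: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def _swap_to(self, name: str) -> None:
        if self.running is not None:
            self.log.append(f"stop {self.running}")  # real proxy: terminate process
        spec = self.models[name]
        self.log.append(f"start {name} :{spec.port} {' '.join(spec.args)}")
        self.running = name  # real proxy: spawn, then wait for a health check

    def route(self, request_body: dict) -> str:
        """Return the upstream base URL for an OpenAI-style request body."""
        name = request_body.get("model")
        if name not in self.models:
            raise KeyError(f"no upstream configured for model {name!r}")
        if self.running != name:
            self._swap_to(name)
        return f"http://127.0.0.1:{self.models[name].port}"


if __name__ == "__main__":
    router = SwapRouter({
        "qwen": ModelSpec(8081, ["-m", "qwen.gguf"]),    # hypothetical files
        "gemma": ModelSpec(8082, ["-m", "gemma.gguf"]),
    })
    for model in ("qwen", "qwen", "gemma"):
        print(router.route({"model": model}))
    print(router.log)
```

Repeated requests for the same model reuse the running server, and only a change of model triggers a stop and start. That is the behaviour that makes routing tasks to different small models practical on a single GPU.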

stevibe@stevibe·19 Apr

@relentless4o I agree. Maybe in the near future, we'll have a convenient way to route tasks to the specific small models that excel at them?

stevibe@stevibe·19 Apr

I gave two MoE models the same vibe coding challenge:
Qwen3.6 35B A3B (31.8GB) vs Gemma4 26B A4B (23.3GB)

Stack:
> Unsloth Q6_K_XL
> llama.cpp
> Model-card recommended sampling for each

4 prompts, side-by-side. Which one do you think wins?
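The file sizes above allow a rough estimate of how many bits each Q6_K_XL quant stores per parameter, and of how many bytes of weights an MoE model touches per token. The Python below assumes 1 GB = 10^9 bytes and takes parameter counts from the model names (35B total / 3B active, 26B total / 4B active). Real counts differ slightly, and GGUF files also hold metadata and some tensors at higher precision, so treat the results as ballpark figures.

```python
def bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average stored bits per parameter, assuming 1 GB = 1e9 bytes."""
    return file_gb * 1e9 * 8 / (params_billion * 1e9)


def active_weights_gb(active_billion: float, bpw: float) -> float:
    """Approximate weight bytes (GB) an MoE touches per token at that bpw."""
    return active_billion * 1e9 * bpw / 8 / 1e9


if __name__ == "__main__":
    for name, size_gb, total_b, active_b in [
        ("Qwen3.6 35B A3B", 31.8, 35, 3),
        ("Gemma4 26B A4B", 23.3, 26, 4),
    ]:
        bpw = bits_per_weight(size_gb, total_b)
        print(f"{name}: {bpw:.2f} bits/weight, "
              f"~{active_weights_gb(active_b, bpw):.2f} GB of weights per token")
```

Both quants come out at roughly 7.2 to 7.3 bits per weight. Only about 2.7 GB (Qwen) and 3.6 GB (Gemma) of those weights are read per generated token, which is one plausible reason these MoE models decode much faster than a dense 27B model whose weights are all read every token.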

jason@1louder·19 Apr

@gregisenberg The 'branding problem' is structural rot from misaligned incentives. Retail subscribers are a source of value to extract. The target market for OpenAI/Anthropic is government and enterprise, where hype and scare tactics grow market share and build regulatory moats. xAI/Grok's take:

GREG ISENBERG@gregisenberg·16 Apr

AI has a serious branding problem. Probably worse than web3/crypto/NFTs. If you ask the average person on the street, they probably fear and hate AI.

jason@1louder·16 Apr

This is a VC-friendly scare narrative. Cal.com is building a SaaS product with healthcare and government customers. Your analysis fails logically and epistemologically. Open source isn't "dead"; it's evolving, as it always has. AI raises real security challenges for all software (open or closed), but transparency has historically been a net positive for security. @pumfleet cherry-picks offensive AI capabilities while downplaying adaptation, community resilience, and the fact that many OSS projects will thrive precisely because they can leverage AI for faster, collective fixes.

Bailey Pumfleet@pumfleet·15 Apr

x.com/i/article/2044…

jason@1louder·16 Apr

@gregisenberg OSS evolves beyond readable source to become the whole production system around it: prompts, agent workflows, tests, evals, issue routing, CI policy, provenance, and maintainer judgment. It gets bigger and more economically important, with lower marginal cost and messier licensing.

jason@1louder·16 Apr

@gregisenberg It becomes a liquid. Steve Yegge is early, and what he says is not fully true, feasible, or desirable yet, but I think in the near future he's not far off. “code is a liquid....You spray it through hoses. You don't f***ing look at it.” youtube.com/watch?v=YWejaU…

GREG ISENBERG@gregisenberg·16 Apr

What happens to open source when AI is writing 100% of the code? I've been thinking about this a lot. Like… the whole system was built around humans valuing the act of contribution. You learned, you struggled, you submitted a PR, you got feedback, you got better. That loop created engineers. It created community. It created ownership. If AI writes the PR, who owns it? Who learned from it? Who's gonna stay up at 2am debugging the thing they shipped because they actually care? The cool part about OSS is that no one owns it. As a consumer, you could always look under the hood, fork it, take it somewhere else. I don't think open source dies. But I genuinely don't know what it becomes... Any ideas?
