jason

2K posts

@1louder

Oracle of the digital age, a compass navigating the human odyssey.

San Francisco · Joined April 2009
581 Following · 196 Followers
jason
jason@1louder·
Use llama.cpp compiled from source; it updates almost daily. Codex, Claude Code, Hermes, or Openclaw can do this for you: just ask it to install llama.cpp and Qwen3.6 27B or 35B MTP. LM Studio is essentially a graphical user interface (GUI) built on top of llama.cpp. Model compatibility: both use the GGUF format, so you can download a model in LM Studio and then point standalone llama.cpp at the same file to run it, or vice versa. If you need a GUI for llama-server, add Open WebUI.
English
1
0
2
251
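A minimal sketch of the "point standalone llama.cpp at the same file" step, assuming llama-server is already built and on the PATH. The model path and the default port, context size, and GPU layer count are hypothetical placeholders, not values from the post; the function only builds the command and does not launch anything.

```python
from pathlib import Path


def llama_server_command(model_path, port=8080, ctx_size=8192, gpu_layers=99):
    """Build an argv list for llama-server pointing at an existing GGUF file."""
    model = Path(model_path)
    if model.suffix.lower() != ".gguf":
        raise ValueError(f"expected a .gguf file, got: {model.name}")
    return [
        "llama-server",
        "--model", str(model),
        "--port", str(port),
        "--ctx-size", str(ctx_size),
        "--n-gpu-layers", str(gpu_layers),
    ]


# Hypothetical path: wherever LM Studio (or any other downloader) saved the file.
cmd = llama_server_command("models/Qwen3.6-27B-Q4_K_M.gguf")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd)` once the path points at a real file; a front end such as Open WebUI would then talk to the port chosen here.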
Sudo su
Sudo su@sudoingX·
if you run a single 24gb gpu, a 3090, a 4090, a 7900 xtx, whatever gets you the 24 gigs, the no brainer pick is qwen 3.6 27b dense at q4. not close. i have run the tier. it fits in 24gb with real context room to spare, it keeps the reasoning smaller models lose, it pushes around 41 tok/s on a single 3090, and i watched it one shot a playable game start to finish, zero iterations. nothing else in that vram class does what this model does. undisputed king of the 24gb tier, and there is nothing you can say to change my mind.
English
59
28
600
31.6K
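A back-of-the-envelope check on the "fits in 24gb with real context room to spare" claim. The 4.5 bits per weight figure is an assumed rough average for a 4-bit GGUF quant, not a number from the post, and the estimate covers weights only.

```python
def weights_gb(n_params, bits_per_weight):
    """Approximate size of quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~4.5 bits per weight as a rough average for a 4-bit quant.
model_gb = weights_gb(27e9, 4.5)
headroom_gb = 24 - model_gb  # left for KV cache, activations, and overhead
print(f"weights ~{model_gb:.1f} GB, headroom ~{headroom_gb:.1f} GB on a 24 GB card")
```

The actual headroom depends on the specific quant and on how much context is allocated, so this is only a sanity check.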
jason
jason@1louder·
@txtnurcahyo @sudoingX Try MTP (Multi-Token Prediction) speculative decoding: unsloth.ai/docs/models/qw… Qwen3.6-27B runs on 18GB RAM setups, with ~1.4-2x faster generation and no change in accuracy.
English
1
0
1
191
Wahyu Nurcahyo 🇮🇩@🇩🇪
@sudoingX Please help. I have a 7900 XTX; Qwen 3.6 32b MoE is blazing fast on it, but 27b runs like a snail. I use Windows 11 and LM Studio. Please tell me how to make it useful. It is actually okay for normal chat, but too slow for Claude Code.
English
5
0
2
1.5K
jason
jason@1louder·
@leftcurvedev_ The primary benefit here is throughput. Dense models (Qwen3.6 27B) benefit more than sparse ones (35B MoE). No loss in coherence or recall out to 132k tokens. 27B q5_k_m is the first local model to pass my Lambda Calculus and long-context benchmarks. unsloth.ai/docs/models/qw…
English
0
0
2
315
left curve dev
left curve dev@leftcurvedev_·
I nearly 2x'd the speed while only using +1GB VRAM with the new MTP update in llama.cpp 🤯
You need to add these flags to start using it:
--spec-type draft-mtp \
--spec-draft-p-min 0.75 \
--spec-draft-n-max 2
My results with Qwen3.6 27B on a single RTX 5080 ↓
⚪️ no flag (without mtp) → 54.3 tok/s with 13.26GB VRAM
🔵 --spec-draft-n-max 2 → 90.7 tok/s with 14.29GB VRAM
🔴 --spec-draft-n-max 2 --spec-draft-p-min 0.75 → 93.9 tok/s with 14.30GB VRAM
🟢 --spec-draft-n-max 6 --spec-draft-p-min 0.75 → 93.9 tok/s with 14.87GB VRAM
Increasing to 6 draft tokens didn't help my setup for some reason. I made sure to test with a low context length to have enough headroom and eliminate risk of vram stress.
From my understanding:
1) The speed gains are very task-dependent. You need to test across a wide range of tasks to get a realistic idea of the benefits
2) We’re already running heavily quantized GGUF models (Q3, Q4, Q6, etc.), so we already benefit from strong speed/performance thanks to the reduced size. That’s why some people are seeing little to no improvement compared to MLX or other quantized versions
The progress over the past few days has been insane to say the least. However, MTP now consumes significantly more VRAM. Personally 16GB just isn't enough to use MTP and run it with a good context size. Time to upgrade lads, 24GB+ users are eating GOOD today 🔥
Full setup below ↓
English
29
37
429
26.5K
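The figures reported in the post above can be turned into speedup and VRAM deltas with a few lines of arithmetic. This is only a sketch: the dictionary keys and labels are made up for illustration, and the numbers are copied from the post rather than re-measured.

```python
BASELINE = {"tok_s": 54.3, "vram_gb": 13.26}  # no MTP flags

RUNS = {
    "n-max 2": {"tok_s": 90.7, "vram_gb": 14.29},
    "n-max 2, p-min 0.75": {"tok_s": 93.9, "vram_gb": 14.30},
    "n-max 6, p-min 0.75": {"tok_s": 93.9, "vram_gb": 14.87},
}


def compare(run, baseline=BASELINE):
    """Return (speedup factor, extra VRAM in GB) relative to the baseline."""
    speedup = run["tok_s"] / baseline["tok_s"]
    extra_vram = run["vram_gb"] - baseline["vram_gb"]
    return round(speedup, 2), round(extra_vram, 2)


for name, run in RUNS.items():
    speedup, extra = compare(run)
    print(f"{name}: {speedup}x, +{extra} GB")
```

On these numbers the gain is roughly 1.7x for about 1 GB of extra VRAM, and going from 2 to 6 draft tokens costs more memory without changing throughput.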
jason
jason@1louder·
@aijoey I'm running Qwen3.6 27B Multi-Token Prediction (MTP). 2T is fast enough to deploy on a single DGX Spark for production ops and simple coding. Depth-bench (long-context memo) coherence and NIAH are the best. huggingface.co/unsloth/Qwen3.…
English
0
0
0
72
Joey
Joey@aijoey·
dgx spark learning log. trying to understand how far small local agent models can go before we point them at real benchmarks like webworld.
this run used:
- hardware: nvidia dgx spark / gb10
- model: DJLougen/Harmonic-Hermes-9B-GGUF, Q5_K_M
- runtime: llama.cpp llama-server, local OpenAI compatible endpoint
- model lineage: qwen3.5 architecture; Harmonic-Hermes-9B by @DJLougen , GGUF quant metadata says quantized by @UnslothAI
- test harness: a tiny local python repo with pytest
- safe tools exposed to the model: run_tests, read_file, patch_file
what happened: the model ran the tests, read the broken file, found the off by one bug, patched it, and got the tests passing.
it is a smoke test while learning the DGX Spark/local agent stack: local GGUF model, local runtime, safe tool calls, verified patch, tests green.
next step: webworld / real web task benchmarking.
English
4
2
31
3.5K
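One way the three "safe tools" from the log above could be exposed is sketched here. The tool names run_tests, read_file, and patch_file come from the post; the class, the path check, and the single-match patch rule are assumptions for illustration, not the author's harness. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import subprocess
import sys
from pathlib import Path


class RepoTools:
    """Three allowlisted tools, all confined to one repository directory."""

    def __init__(self, repo_dir):
        self.root = Path(repo_dir).resolve()

    def _resolve(self, rel_path):
        # Reject paths that escape the repo (e.g. "../secrets").
        target = (self.root / rel_path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"path escapes repo: {rel_path}")
        return target

    def read_file(self, path):
        return self._resolve(path).read_text()

    def patch_file(self, path, old, new):
        target = self._resolve(path)
        text = target.read_text()
        if text.count(old) != 1:
            raise ValueError("patch must match exactly one location")
        target.write_text(text.replace(old, new))
        return "patched"

    def run_tests(self):
        # Assumes pytest is installed in the current interpreter's environment.
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", "-q"],
            cwd=self.root, capture_output=True, text=True,
        )
        return proc.stdout + proc.stderr

    def dispatch(self, name, **kwargs):
        # Only these names are callable by the model; anything else is refused.
        allowed = {"read_file", "patch_file", "run_tests"}
        if name not in allowed:
            raise PermissionError(f"tool not allowed: {name}")
        return getattr(self, name)(**kwargs)
```

An agent loop would map each tool call returned by the local endpoint onto `dispatch(name, **arguments)` and feed the result back to the model.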
jason
jason@1louder·
@theo CLI people have you covered! Send clipboard screenshots to a remote SSH host and put the remote file path back into your clipboard so you can paste it into the terminal (e.g., Claude Code, OpenCode, any TUI)
English
0
0
6
852
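The workflow described above (upload a screenshot, then paste the remote path) might look roughly like this. It is a sketch, not the tool shown in the post: it assumes the screenshot has already been saved as a local PNG, that `ssh` and `scp` work without prompts for the host, and that a clipboard command such as `pbcopy` (macOS) is available. The host name, remote directory, and function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def remote_path_for(local_png, remote_dir="/tmp/screenshots"):
    """Pick a timestamped remote file name for a local screenshot."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return f"{remote_dir}/{stamp}-{Path(local_png).name}"


def scp_command(local_png, host, remote_path):
    """argv for copying the screenshot to the remote host with scp."""
    return ["scp", str(local_png), f"{host}:{remote_path}"]


def send_screenshot(local_png, host, remote_dir="/tmp/screenshots",
                    copy_cmd=("pbcopy",)):
    """Upload the file, then place the remote path on the local clipboard."""
    remote = remote_path_for(local_png, remote_dir)
    subprocess.run(["ssh", host, "mkdir", "-p", remote_dir], check=True)
    subprocess.run(scp_command(local_png, host, remote), check=True)
    # The clipboard tool reads the text to copy from stdin.
    subprocess.run(list(copy_cmd), input=remote, text=True, check=True)
    return remote
```

On Linux, `copy_cmd=("xclip", "-selection", "clipboard")` or `("wl-copy",)` would be the likely substitutes; the pasted path can then be given to any TUI running on the remote host.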
Theo - t3.gg
Theo - t3.gg@theo·
Just learned it's literally impossible to paste images into Claude Code over SSH. How do you CLI people live like this??
English
408
18
2.4K
654.5K
jason
jason@1louder·
This is less true today. AI has become a universal semantic bridge between intuition and formalization.
In the past, the transition from intuition (mental models) to formal language (math/code) required a manual, high-impedance "compilation" process. This gap often resulted in the death of ideas because the creator lacked the formal syntax to express them.
The concept of "Newtonian Gravity" exists as a cluster of coordinates. Whether that cluster is accessed via the string F = G(m1m2)/r^2 or the English phrase "The force between two masses is proportional to the product of their masses and inversely proportional to the square of the distance between them," the underlying Semantic Tensor is identical.
As a tech founder hiring talent, I see value moving away from "Syntax Specialists" (common Python/JS to C++/Rust) toward "System Architects" (people with high-fidelity intuition who can orchestrate LLM translations).
English
0
0
0
18
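For reference, the formal side of the gravity example, F = G(m1m2)/r^2, written as a small function. This is an illustrative sketch: the function name is made up, and the constant is rounded.

```python
G = 6.674e-11  # gravitational constant in N·m²/kg² (rounded)


def gravitational_force(m1, m2, r):
    """Newtonian attraction in newtons: F = G * m1 * m2 / r**2."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return G * m1 * m2 / r**2


# Doubling the distance cuts the force to a quarter (inverse-square law).
near = gravitational_force(5.0, 10.0, 1.0)
far = gravitational_force(5.0, 10.0, 2.0)
print(near / far)
```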
kitze
kitze@thekitze·
i'm so tired of everything and everyone i have no idea how i'm still pushing through in this industry i need a change asap
English
37
1
158
15.9K
Nathan Wilbanks
Nathan Wilbanks@NathanWilbanks_·
I've been running the same AI agent every day since JULY 🚨
- 297 days running 24/7
- 993,115 seconds of compute time automated
- 5,020,623,362 tokens generated
- 127,743 workflows ran
- 605,292 tool executions
Comment AGNT below to steal my setup 👇
English
194
27
231
28.4K
jason
jason@1louder·
@heskelbalas @AlekseiKaplin @i2cjak @fishPointer @yacineMTB That looks cool, but you can use the laser printer toner transfer method: buy some copper-clad FR4 boards on Amazon for ~$1 ea. and MG Chemicals 415-1L Ferric Chloride (also on Amazon), and etch your own boards.
English
0
0
0
62
jason
jason@1louder·
Some people need Windows for reasons. Run native Windows llama.cpp w/ CUDA built from source, not WSL. Cursor/Codex/Claude Code. Antigravity is getting worse, but useful for browser-in-the-loop front end iteration. LM Studio is nice for LLM experimentation, swapping, and learning.
English
0
0
5
725
jason
jason@1louder·
@theo I'm guessing Palantir does not have this problem.
English
0
0
0
268
Theo - t3.gg
Theo - t3.gg@theo·
I can't even do basic cryptographic challenges with Opus. I get that they're trying to limit it for safety but this is insane
English
98
28
1.1K
83.5K
jason
jason@1louder·
@ThePrimeagen At this point, I’m convinced they’re A/B testing us.
English
0
0
0
1.1K
ThePrimeagen
ThePrimeagen@ThePrimeagen·
You all need to stop. opus 4.7 has been performance mogging on trusted benchmarks. y'all are just holding it wrong.
English
54
8
632
94.5K
jason
jason@1louder·
@morganlinton The RTX 4090 is slightly faster than the 3090 for a single node, but it has the same 24GB of VRAM. I run Qwen3.6-35B-A3B-UD-Q4_K_XL (262k context) and Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 (jackrong).
English
0
0
0
156
Morgan
Morgan@morganlinton·
Okay, so I've come to the conclusion that I need a 3090, like yesterday. Really want to run more powerful LLMs locally at home. Relatively new territory for me, I'm far from an expert, so was chatting with Perplexity about it. Here's what it thinks I need, hoping someone like @0xSero or @LottoLabs can let me know how right, or wrong, it is, and what I really need. Trying not to break the bank so I'm okay starting small(ish) 🤏
English
34
2
60
8K
jason
jason@1louder·
I get better results. Qwen 3.6-35B-A3B needs reasoning tokens (similar to DeepSeek-R1). For 1-shot math and coding, you must ensure this mode is active. Unsloth Dynamic 2.0 quant preserves more logic/syntax weights in the attention layers compared to standard 4-bit quants. Inference parameters matter. Qwen 3.6 on single-node RTX4090 is part of my stack now. It passes my High-Fidelity λ-Calculus Evaluator. Try these...
English
0
0
3
280
Taelin
Taelin@VictorTaelin·
@N8Programs i couldn't care less just READ the fucking outputs i asked a simple λ-calculus evaluator and it gave me SHIT that looks straight out of GPT-2 meanwhile opus gives me code straight from God's Book
English
7
0
169
6.7K
Taelin
Taelin@VictorTaelin·
Ok so I just tested Qwen 3.6, and I'm now depressed again. I won't say bad things about it, I'll just accept that local models aren't happening and our lives will depend more and more on the goodwill of two companies upon us. Yay
English
173
32
1.7K
233.7K
stevibe
stevibe@stevibe·
@relentless4o I agree. Maybe in the near future, we'll have a convenient way to route tasks to the specific small models that excel at them?
English
2
0
4
3.2K
stevibe
stevibe@stevibe·
I gave two MoE models the same vibe coding challenge
Qwen3.6 35B A3B (31.8GB) vs Gemma4 26B A4B (23.3GB)
Stack:
> Unsloth Q6_K_XL
> llama.cpp
> Model-card recommended sampling for each
4 prompts, side-by-side. Which one do you think wins?
English
68
116
1.5K
260.7K
jason
jason@1louder·
@gregisenberg The 'branding problem' is structural rot from misaligned incentives. Retail subscribers are a source of value to extract. The target market for OpenAI/Anthropic is Govt. & Enterprise, where hype and scare tactics grow market share & build regulatory moats. xAI/Grok's take:
English
0
0
0
76
GREG ISENBERG
GREG ISENBERG@gregisenberg·
AI has a serious branding problem Probably worse than web3/crypto/NFTs if you ask the average person in the streets, they probably fear and hate AI
English
515
59
1K
122.4K
jason
jason@1louder·
This is a VC-friendly scare narrative. Cal.com is building a SaaS product with healthcare and govt. customers. Your analysis fails logically and epistemologically. Open source isn't "dead"; it's evolving, as it always has. AI raises real security challenges for all software (open or closed), but transparency has historically been a net positive for security. @pumfleet cherry-picks offensive AI capabilities while downplaying adaptation, community resilience, and the fact that many OSS projects will thrive precisely because they can leverage AI for faster, collective fixes.
English
0
0
1
69
jason
jason@1louder·
@gregisenberg OSS evolves from readable source and becomes the whole production system around it: prompts, agent workflows, tests, evals, issue routing, CI policy, provenance, and maintainer judgment... bigger, more economically important, lower marginal cost, messier licensing.
English
0
0
0
17
jason
jason@1louder·
@gregisenberg It becomes a liquid. Steve Yegge is early, and what he says is not fully true, feasible, or desirable yet, but I think in the near future he's not far off. “code is a liquid....You spray it through hoses. You don't f***ing look at it.” youtube.com/watch?v=YWejaU…
English
1
0
1
153
GREG ISENBERG
GREG ISENBERG@gregisenberg·
What happens to open source when AI is writing 100% of the code? I've been thinking about this a lot. Like… the whole system was built around humans valuing the act of contribution. You learned, you struggled, you submitted a PR, you got feedback, you got better. That loop created engineers. It created community. It created ownership. If AI writes the PR, who owns it? Who learned from it? Who's gonna stay up at 2am debugging the thing they shipped because they actually care? The cool part about OSS is that no one owns it. As a consumer, you could always look under the hood, fork it, take it somewhere else. I don't think open source dies. But I genuinely don't know what it becomes... Any ideas?
English
170
14
243
28.8K