A
@ababaka
6.3K posts

meme

Joined March 2010
50 Following · 91 Followers
A@ababaka·
@0xSero Have you tried this model yourself? It thinks 5x as much as Qwen does! Completely unusable.
0xSero@0xSero·
Step-3.7-Flash-q3_k_m - The #1 smartest model the DGX Spark can run, beating out Qwen-3.6-35B and 27B. Starts off at a blazing 62 tok/s decode, loses steam, but still unoptimised. exo run smart btw
A@ababaka·
@ornith_ Fingers crossed! 🤞 128gb ram bros are waiting for a good workhorse.
Ornith@ornith_·
@ababaka 🐦chirp chirp! working on it!
Ornith@ornith_·
Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans a range of parameter sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on verified, 62.2 on pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
All models are released under the MIT license, enabling full commercial and research use.
📖 Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗 Huggingface: huggingface.co/collections/de…
A@ababaka·
@sudoingX Just a backup plan for when subscription limits are exhausted.
Sudo su@sudoingX·
single 3090 in 2026 is...
A@ababaka·
@sudoingX Stepfun thinks a lot. I mean A LOT.
Sudo su@sudoingX·
genuinely fucking wild how underrated stepfun's step 3.7 flash is in the dgx spark world. barely anyone's talking about it, and it's the single best model you can run on one spark, full stop. 198B, vision, the full context, i've been living in it and the gap between how good it is and how little it gets discussed is absurd. dropping my benchmarks tonight so you can see exactly what a single dgx spark is capable of in local ai.
A@ababaka·
@sudoingX Tested GPT 5.5 xhigh vs Qwen3.6 27b on the same task. ChatGPT rated Qwen 7/10, with a consistent flaw: it lacks code depth and misses key details. Tried varied system prompts, still always 7/10. So not 90% but 70%, which is still good.
Sudo su@sudoingX·
cancel your chatgpt subscription for a month. buy a single used 3090, call it a grand. run qwen 3.6 27b dense on it and let it grind on your actual work, the code, the drafts, the boring research. here's what happens. you go the whole month and barely hit a wall. the few times you do, you clock that THAT's the 10% you actually needed the frontier for, and the other 90% a card sitting in your room handled just fine. most people pay every month for capability they touch a handful of times. own the 90%, rent the rest only when you hit the wall. trust me anon, you won't look at that subscription the same again.
A@ababaka·
@pupposandro A 1.5tb model is hard to call local.
A@ababaka·
@stevibe Have you tried mimo m2.5 pro? It turned out to be unexpectedly good in my experiments. Plus, the 1 million context works well up to roughly 300-400k.
stevibe@stevibe·
Kimi is still my go-to model for learning, it explains complex deep learning topics clearly. GPT is too summary-like (bullet-point style), Claude Opus is pricey, and DeepSeek/Gemini are good for general topics.
A@ababaka·
@SwedPaul They'll cram a dgx spark into a laptop. Screenshot this.
Paul@SwedPaul·
Nvidia will show new ARM chips with integrated graphics. The first devices will be laptops from Microsoft. After that we will have to upgrade our PCs, and it will be worth it, because it will help close the gap with Apple devices on M-series chips. (This is my guess and is not established fact.)
NVIDIA@nvidia

A new era of PC. 25.0528, 121.5990

Frank@jedisct1·
I’ve just released MiMo V2.5-Coder. If you have 128 GB of RAM, this is one of the best models you can run locally. It’s fast, and in all my experiments it outperformed Qwen 3.6 and DeepSeek 4-Flash. huggingface.co/jedisct1/MiMo-…
A@ababaka·
@stevibe Is there at least one model worth attention besides qwen3.6 27b, which fits even in 24gb of vram?
stevibe@stevibe·
Loving my DGX Spark! Perfect for testing small-to-medium models without the wait. Sure, it's about 3x slower than a full GPU, but I still get my results. A special benchmark targeting small models is coming up: Qwen 0.8B to 35B, plus the full Gemma4 series.
A@ababaka·
@LottoLabs I wonder if there are agents that can parallelize tasks across multiple threads well. In that case, the Spark's 128GB could be useful.
A@ababaka·
@sudoingX but I can describe how hot it got in the room with the 3090 running.
Sudo su@sudoingX·
you cannot describe the taste of free thinking until you own the machine that runs it. the day the model moves from someone else's cloud to your own desk, the way you think changes. every kid in this decade should grow up with their own gpu and their own model. owning the tool that shapes your thinking is not a flex, it is a foundation, like owning your own books used to be when libraries were the only place to access them. the gpu market has not been this accessible in 5 years. used 3090 for around $900 on facebook marketplace. used 4060 ti 16gb for $500. used 3060 12gb for $350. they all run real local ai today. the floor is on the floor. if you have not bought a gpu yet, this is the year. buy a gpu.
Ivan Fioravanti ᯅ@ivanfioravanti·
And now The Apple Silicon Inference Poll! 🥁 Preferred inference engine on your Apple machine?
A@ababaka·
@LottoLabs Where did the notes go? Without run parameters, this site loses its meaning.
Lotto@LottoLabs·
Why don’t Anthropic and OpenAI drop banger ~27b like models and just sell licenses for them
A@ababaka·
@sudoingX Don't get your hopes up. In my experience, and according to posts on Reddit, with a quantized cache qwen3.6 27b starts processing old messages in a loop. It took me a while to figure out what was going on, but that was probably the cause.
Sudo su@sudoingX·
"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math.

model bytes at idle = 16gb (q4_k_m of 27b dense)
kv cache at 262k context with q4_0 for both k and v = 5gb
total = 21gb on the card
headroom = 3gb for prompts and tool call traces

the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k:
- fp16 kv cache = does not fit at all
- q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed)
- q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k)

most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware.

flags i run:
./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0

what they do:
-ngl 99 = offload everything to gpu
-c 262144 = 262k context window
-np 1 = single user slot (do not enable multi-slot, eats headroom)
-fa on = flash attention on (memory and speed both win)
--cache-type-k q4_0 --cache-type-v q4_0 = the unlock

if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing. q4 is not a compromise on consumer hardware. it is the right call.
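A rough sketch of the memory math in the post above, for anyone who wants to rerun it with their own numbers. The block sizes are the llama.cpp storage formats (32 elements per block: 64 bytes for f16, 34 for q8_0, 18 for q4_0). The layer count, KV head count, and head dimension below are hypothetical placeholders, not the published Qwen 3.6 27B configuration, so the printed sizes illustrate the scaling only and are not a check on the post's figures.

```python
# Bytes used to store one block of 32 cache elements in each format.
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Size of the K and V caches combined, in bytes."""
    # Factor 2 covers K and V; one vector of n_kv_heads * head_dim per token per layer.
    elements = 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx
    return elements * BLOCK_BYTES[cache_type] // 32


def vram_budget_gb(weights_gb, kv_gb, card_gb=24):
    """Return (total used, headroom left) on a card of card_gb."""
    total = weights_gb + kv_gb
    return total, card_gb - total


# Figures quoted in the post: 16gb weights + 5gb kv cache on a 24gb card.
print(vram_budget_gb(16, 5))  # (21, 3)

# Hypothetical model shape (assumption, for illustration only).
shape = dict(n_attn_layers=16, n_kv_heads=4, head_dim=256, n_ctx=262144)
for cache_type in ("f16", "q8_0", "q4_0"):
    gib = kv_cache_bytes(cache_type=cache_type, **shape) / 2**30
    print(f"{cache_type}: {gib:.2f} GiB")
```

Whatever the real shape is, the ratio between cache types is fixed by the block sizes: q8_0 takes 34/64 of the f16 footprint and q4_0 takes 18/64, which is why the cache type decides whether a long context fits at all.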
A@ababaka·
@kadirnardev Qwen released 27b about a month ago. What are you talking about?
Kadir Nar@kadirnardev·
The Qwen team is no longer releasing their models as open source, and this is a big problem for us. We need small models to train many models like TTS, STT, Omni, and others. Previously there was LLaMA, but they're no longer releasing either. The Qwen team won't be releasing anymore either. Our only hope is the LFM models. Minimax, Kimi, and GLM teams are releasing great models for open source, but none of them release small models. And if these companies also stop releasing open source, it's going to be really bad :(
A@ababaka·
@UnslothAI Hmm... Can it really fit on 16gb vram with 200k context? Highly doubt it.
Unsloth AI@UnslothAI·
This model has been #1 trending for 3 weeks now. It's Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model: huggingface.co/Jackrong/Qwen3…
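The 16GB and 32GB figures can be sanity-checked with nominal weight arithmetic. This is a minimal sketch that assumes exactly 27e9 parameters and a flat bits-per-weight figure; it ignores quantization scales, the KV cache, and runtime overhead, so whatever is left over on the card has to cover the context.

```python
def weights_gb(n_params, bits_per_weight):
    """Nominal weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for bits in (4, 8):
    print(f"{bits}-bit: {weights_gb(27e9, bits):.1f} GB")
```

On these assumptions the 4-bit weights alone take 13.5 GB, leaving about 2.5 GB of a 16GB card for the KV cache and everything else.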