6.3K posts

A

@ababaka

meme

Joined March 2010

50 Following · 91 Followers

A@ababaka·1d

@0xSero Have you tried this model yourself? It thinks 5x as long as Qwen does! Completely unusable.

0xSero@0xSero·1d

Step-3.7-Flash-q3_k_m - The #1 smartest model the DGX Spark can run, beating out Qwen-3.6-35B and 27B. Starts off at a blazing 62 tok/s decode, loses steam, but still unoptimised. exo run smart btw

A@ababaka·1d

@ornith_ Fingers crossed! 🤞 128GB RAM bros are waiting for a good workhorse.

Ornith@ornith_·1d

@ababaka 🐦chirp chirp! working on it!

Ornith@ornith_·2d

Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans a full range of parameter sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
All models are released under the MIT license, enabling full commercial and research use.
📖 Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗 Huggingface: huggingface.co/collections/de…

A@ababaka·5d

@sudoingX Just a backup plan for when subscription limits are exhausted.

Sudo su@sudoingX·5d

single 3090 in 2026 is...

A@ababaka·5d

@sudoingX Stepfun thinks a lot. I mean A LOT.

Sudo su@sudoingX·5d

genuinely fucking wild how underrated stepfun's step 3.7 flash is in the dgx spark world. barely anyone's talking about it, and it's the single best model you can run on one spark, full stop. 198B, vision, the full context, i've been living in it and the gap between how good it is and how little it gets discussed is absurd. dropping my benchmarks tonight so you can see exactly what a single dgx spark is capable of in local ai.

A@ababaka·20 Jun

@sudoingX Tested GPT 5.5 xhigh vs Qwen3.6 27b on the same task. ChatGPT rated Qwen 7/10. Consistent flaw: lacks code depth, misses key details. Tried varied system prompts, still always 7/10. So not 90% but 70%, still good.

Sudo su@sudoingX·20 Jun

cancel your chatgpt subscription for a month. buy a single used 3090, call it a grand. run qwen 3.6 27b dense on it and let it grind on your actual work, the code, the drafts, the boring research. here's what happens. you go the whole month and barely hit a wall. the few times you do, you clock that THAT's the 10% you actually needed the frontier for, and the other 90% a card sitting in your room handled just fine. most people pay every month for capability they touch a handful of times. own the 90%, rent the rest only when you hit the wall. trust me anon, you won't look at that subscription the same again.

A@ababaka·17 Jun

@pupposandro The 1.5tb model is hard to call local.

A@ababaka·13 Jun

@stevibe Have you tried mimo m2.5 pro? It turned out to be unexpectedly good in my experiments. Plus, the 1 million context works well up to around 300-400k.

stevibe@stevibe·13 Jun

Kimi is still my go-to model for learning, it explains complex deep learning topics clearly. GPT is too summary-like (bullet-point style), Claude Opus is pricey, and DeepSeek/Gemini are good for general topics.

A@ababaka·30 May

@ivanfioravanti Looks like q3 is total garbage. Look at the metrics: huggingface.co/AesSedai/Step-…

Ivan Fioravanti ᯅ@ivanfioravanti·30 May

Step-3.7-Flash 3bit gs32 added. It will work on a 128GB machine, but does it work well? Let me do some tests. huggingface.co/collections/iv…

A@ababaka·30 May

@SwedPaul They'll cram a DGX Spark into a laptop. Screenshot this.

Paul@SwedPaul·29 May

Nvidia will show new ARM chips with integrated graphics. The first devices will be laptops from Microsoft. After that, we'll have to upgrade our PCs, and it will be worth it, because it will help close the gap with Apple's M-chip devices. (This is my own guess, not established fact.)

NVIDIA@nvidia

A new era of PC. 25.0528, 121.5990

A@ababaka·25 May

@jedisct1 @DgitalNarrative It's not the answer

Frank@jedisct1·25 May

@DgitalNarrative It's a much larger MoE model.

Frank@jedisct1·25 May

I’ve just released MiMo V2.5-Coder. If you have 128 GB of RAM, this is one of the best models you can run locally. It’s fast, and in all my experiments it outperformed Qwen 3.6 and DeepSeek 4-Flash. huggingface.co/jedisct1/MiMo-…

A@ababaka·15 May

@stevibe Is there at least one model worth paying attention to, besides Qwen3.6 27B, that fits in just 24GB of VRAM?

stevibe@stevibe·15 May

Loving my DGX Spark! Perfect for testing small-to-medium models without the wait. Sure, it's about 3x slower than a full GPU, but I still get my results. A special benchmark targeting small models is coming up: Qwen 0.8B to 35B, plus the full Gemma4 series.

A@ababaka·7 May

@LottoLabs I wonder if there are agents that can parallelize tasks across multiple threads well. In that case, the Spark's 128GB could be useful.

Lotto@LottoLabs·7 May

This is crazy because 27b running above 30TPS on a spark gives you actual usability and tons of room left over

Banana@banana_baeee

My DFlash decode optimized numbers are here for 3.6 - quite variable, but can make a big difference. I am hoping to combine the decode and prefill optimizations into one fast 27B dense solution and get the best of both! localmaxxing.com/runs/cmomgvsoo…

A@ababaka·5 May

@sudoingX but I can describe how hot it got in the room with the 3090 running.

Sudo su@sudoingX·5 May

you cannot describe the taste of free thinking until you own the machine that runs it. the day the model moves from someone else's cloud to your own desk, the way you think changes. every kid in this decade should grow up with their own gpu and their own model. owning the tool that shapes your thinking is not a flex, it is a foundation, like owning your own books used to be when libraries were the only place to access them. the gpu market has not been this accessible in 5 years. used 3090 for around $900 on facebook marketplace. used 4060 ti 16gb for $500. used 3060 12gb for $350. they all run real local ai today. the floor is on the floor. if you have not bought a gpu yet, this is the year. buy a gpu.

A@ababaka·5 May

@ivanfioravanti Take a look at this. You could probably add one more competitor: youtu.be/ZwCbChJWXkQ

Ivan Fioravanti ᯅ@ivanfioravanti·4 May

And now The Apple Silicon Inference Poll! 🥁 Preferred inference engine on your Apple machine?

A@ababaka·3 May

@LottoLabs Where did the notes go? Without run parameters, this site loses its meaning.

Lotto@LottoLabs·2 May

Go check out localmaxxing.com to see qwen 27b eat their lunch

Lotto@LottoLabs·2 May

Why don’t Anthropic and OpenAI drop banger ~27b-class models and just sell licenses for them?

A@ababaka·29 Apr

@sudoingX Don't get your hopes up. In my experience, and according to posts on Reddit, with a quantized KV cache Qwen3.6 27B starts processing old messages in a loop. It took me a while to figure out what was going on, but that was probably the cause.

Sudo su@sudoingX·28 Apr

"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math.

model bytes at idle = 16gb (q4_k_m of 27b dense)
kv cache at 262k context with q4_0 for both k and v = 5gb
total = 21gb on the card
headroom = 3gb for prompts and tool call traces

the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k:
- fp16 kv cache = does not fit at all
- q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed)
- q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k)

most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware.

flags i run:
./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0

what they do:
-ngl 99 = offload everything to gpu
-c 262144 = 262k context window
-np 1 = single user slot (do not enable multi-slot, eats headroom)
-fa on = flash attention on (memory and speed both win)
--cache-type-k q4_0 --cache-type-v q4_0 = the unlock

if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing. q4 is not a compromise on consumer hardware. it is the right call.

A@ababaka·19 Apr

Not even 5 years have passed. Or have they? speedtest.net/my-result/a/11…

A@ababaka·1 Apr

@kadirnardev Qwen released a 27b about a month ago. What are you talking about?

Kadir Nar@kadirnardev·31 Mar

The Qwen team is no longer releasing their models as open source, and this is a big problem for us. We need small models to train many models like TTS, STT, Omni, and others. Previously there was LLaMA, but they're no longer releasing either. The Qwen team won't be releasing anymore either. Our only hope is the LFM models. Minimax, Kimi, and GLM teams are releasing great models for open source, but none of them release small models. And if these companies also stop releasing open source, it's going to be really bad :(

A@ababaka·30 Mar

@UnslothAI Hmm... Can it really fit on 16GB of VRAM with 200k context? I highly doubt it.

Unsloth AI@UnslothAI·30 Mar

This model has been #1 trending for 3 weeks now. It's Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model: huggingface.co/Jackrong/Qwen3…
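A back-of-the-envelope check of the "16GB in 4-bit or 32GB in 8-bit" figures, as a minimal sketch. It assumes exactly 4 or 8 bits per weight and counts the weights only. Real quantized formats also store per-block scales, so actual files are somewhat larger, and the KV cache and runtime buffers come on top, which is the gap the reply about 200k context is questioning. The helper name `weights_gb` is hypothetical.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


# 27B parameters at 4-bit on a 16 GB card, and at 8-bit on a 32 GB card.
for bits, card_gb in [(4, 16), (8, 32)]:
    size = weights_gb(27, bits)
    left = card_gb - size
    print(f"{bits}-bit: ~{size:.1f} GB of weights, ~{left:.1f} GB left on a {card_gb} GB card")
```

Under these assumptions the 4-bit weights take about 13.5 GB, leaving roughly 2.5 GB of a 16 GB card for everything else, including the context.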
