A
6.3K posts


Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.
Ornith-1.0 spans the full parameter sizes including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅Terminal-Bench 2.1(77.5)
✅SWE-Bench(82.4 on verified, 62.2 on pro, 78.9 on Multilingual)
✅NL2Repo(48.2)
✅SWE Atlas(41.2 on QnA, 42.6 RF, 39.1 TW)
✅ClawEval(77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generate higher-quality solutions in agentic coding.😎
All models are released under the MIT license, enabling full commercial and research use.
📖Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗Huggingface: huggingface.co/collections/de…

English

genuinely fucking wild how underrated stepfun's step 3.7 flash is in the dgx spark world. barely anyone's talking about it, and it's the single best model you can run on one spark, full stop.
198B, vision, the full context, i've been living in it and the gap between how good it is and how little it gets discussed is absurd. dropping my benchmarks tonight so you can see exactly what a single dgx spark is capable of in local ai.
English

cancel your chatgpt subscription for a month. buy a single used 3090, call it a grand. run qwen 3.6 27b dense on it and let it grind on your actual work, the code, the drafts, the boring research.
here's what happens. you go the whole month and barely hit a wall. the few times you do, you clock that THAT's the 10% you actually needed the frontier for, and the other 90% a card sitting in your room handled just fine.
most people pay every month for capability they touch a handful of times. own the 90%, rent the rest only when you hit the wall. trust me anon, you won't look at that subscription the same again.
English

@ivanfioravanti Looks like q3 is total garbage.. look at metrics huggingface.co/AesSedai/Step-…
English

Step-3.7-Flash 3bit gs32 added, it will work on a 128GB machine, does it work well? Let me do some test.
huggingface.co/collections/iv…
English

Nvidia покажет новые ARM-чипы со встроенной графикой. Первые устройства это ноутбуки от Microsoft
Ну а дальше нам придется обновлять наши ПК — и это того будет стоить. Ведь поможет сократить отставание от Apple устройств на М-чипах
(Это мое предположение и не является истиной)
NVIDIA@nvidia
A new era of PC. 25.0528, 121.5990
Русский

I’ve just released MiMo V2.5-Coder. If you have 128 GB of RAM, this is one of the best models you can run locally. It’s fast, and in all my experiments it outperformed Qwen 3.6 and DeepSeek 4-Flash. huggingface.co/jedisct1/MiMo-…
English

@LottoLabs I wonder if there are agents who can parallel tasks in multiple threads well. In this case, 128GB of spark can be useful.
English

This is crazy because 27b running above 30TPS on a spark gives you actual usability and tons of room left over
Banana@banana_baeee
My DFlash decode optimized numbers are here for 3.6 - quite variable, but can make a big difference. I am hoping to combine the decode and prefill optimizations into one fast 27B dense solution and get the best of both! localmaxxing.com/runs/cmomgvsoo…
English

you cannot describe the taste of free thinking until you own the machine that runs it.
the day the model moves from someone else's cloud to your own desk, the way you think changes.
every kid in this decade should grow up with their own gpu and their own model. owning the tool that shapes your thinking is not a flex, it is a foundation, like owning your own books used to be when libraries were the only place to access them.
the gpu market has not been this accessible in 5 years. used 3090 for around $900 on facebook marketplace. used 4060 ti 16gb for $500. used 3060 12gb for $350. they all run real local ai today. the floor is on the floor.
if you have not bought a gpu yet, this is the year. buy a gpu.
English

@ivanfioravanti Take a look at this. Probably you may add one more competitor youtu.be/ZwCbChJWXkQ

YouTube
English

@LottoLabs Where did the notes go? without run parameters, this site loses its meaning.
English

"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math.
model bytes at idle = 16gb (q4_k_m of 27b dense)
kv cache at 262k context with q4_0 for both k and v = 5gb
total = 21gb on the card
headroom = 3gb for prompts and tool call traces
the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k:
- fp16 kv cache = does not fit at all
- q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed)
- q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k)
most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware.
flags i run:
./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0
what they do:
-ngl 99 = offload everything to gpu
-c 262144 = 262k context window
-np 1 = single user slot (do not enable multi-slot, eats headroom)
-fa on = flash attention on (memory and speed both win)
--cache-type-k q4_0 --cache-type-v q4_0 = the unlock
if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing.
q4 is not a compromise on consumer hardware. it is the right call.
English

The Qwen team is no longer releasing their models as open source, and this is a big problem for us. We need small models to train many models like TTS, STT, Omni, and others. Previously there was LLaMA, but they're no longer releasing either. The Qwen team won't be releasing anymore either. Our only hope is the LFM models.
Minimax, Kimi, and GLM teams are releasing great models for open source, but none of them release small models. And if these companies also stop releasing open source, it's going to be really bad :(
English

@UnslothAI Hmm... Can it really fit on 16gb vram with 200k context? Highly doubt
English

This model has been #1 trending for 3 weeks now.
It's Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth.
Runs locally on 16GB in 4-bit or 32GB in 8-bit.
Model: huggingface.co/Jackrong/Qwen3…

English


