Draneil Mifa @DragonGroky
220 posts · Joined February 2025 · 259 Following · 5 Followers
Draneil Mifa @DragonGroky:
@0xCVYH I use FLUX klein 9B with ComfyUI. Is there a node to load this new PolarQuant?
CV.YH @0xCVYH:
@DragonGroky PolarQuant uses a different approach than GGUF. Hadamard rotation + Lloyd-Max optimal centroids give better quality at the same bitrate. GGUF packs bits, PolarQuant optimizes the quantization itself. Working on llama.cpp integration tho
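A minimal sketch of the two ingredients named above: a Hadamard rotation spreads outliers across a weight block, then a 1-D Lloyd-Max quantizer fits optimal centroids to the rotated values. Block size, bit width, and iteration count here are illustrative assumptions, not PolarQuant's actual settings:

    import numpy as np

    def hadamard(n):
        # Sylvester construction; n must be a power of two.
        H = np.array([[1.0]])
        while H.shape[0] < n:
            H = np.block([[H, H], [H, -H]])
        return H / np.sqrt(n)  # orthonormal, so H.T undoes the rotation

    def lloyd_max(x, bits=5, iters=30):
        # 1-D Lloyd-Max: alternate nearest-centroid assignment with
        # recomputing each centroid as the mean of its cell.
        k = 2 ** bits
        centroids = np.quantile(x, np.linspace(0, 1, k))  # spread init
        for _ in range(iters):
            idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
            for j in range(k):
                cell = x[idx == j]
                if cell.size:
                    centroids[j] = cell.mean()
        return centroids, idx

    # Rotate a toy weight block with one outlier, quantize, un-rotate.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(256)
    w[3] = 12.0                        # outlier that plain quantization hates
    H = hadamard(256)
    w_rot = H @ w                      # Hadamard rotation
    c, idx = lloyd_max(w_rot, bits=5)  # "PQ5"-style 5-bit codebook
    w_hat = H.T @ c[idx]               # dequantize, then undo the rotation
    print("cosine:", w @ w_hat / (np.linalg.norm(w) * np.linalg.norm(w_hat)))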
CV.YH @0xCVYH:
PolarQuant now quantizes IMAGE models. FLUX.2 klein 9B at Q5: 0.9986 similarity with the original. Practically identical. 121 layers quantized, 304 preserved in BF16. Hadamard + Lloyd-Max working on diffusion models. Someone asked for it today and it's already done. huggingface.co/caiovicentino1…
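For reference, a per-layer score like the 0.9986 above is typically just the cosine between the original and dequantized weights, flattened. A sketch; the load_tensor call and tensor name below are hypothetical placeholders, not the actual repo layout:

    import numpy as np

    def cosine_similarity(a, b):
        # Flatten both tensors and compare directions in float64.
        a = np.asarray(a, dtype=np.float64).ravel()
        b = np.asarray(b, dtype=np.float64).ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical usage; names are placeholders:
    # w_ref = load_tensor("FLUX.2-klein-BF16", "double_blocks.0.img_mlp.0.weight")
    # w_pq5 = load_tensor("FLUX.2-klein-PQ5",  "double_blocks.0.img_mlp.0.weight")
    # print(cosine_similarity(w_ref, w_pq5))   # e.g. ~0.9986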
Draneil Mifa @DragonGroky:
@0xCVYH Cool, but why is it not a GGUF as usual on GitHub?
CV.YH @0xCVYH:
@DragonGroky Yes, fits perfectly on a 3090 24GB. PQ5 quality is actually better than standard Q8 at similar size because of the Hadamard rotation + Lloyd-Max centroids. 0.996 cosine similarity vs original. Give it a try and compare
Draneil Mifa @DragonGroky:
@MyopicRaccoon @LottoLabs @no_stp_on_snek I have a 3090 but it reaches only 20 tokens/s! Can you share your llama.cpp parameters? I also have the turboquant version.
llama-server -m models-d/Qwopus3.5-27B-v3-Q5_K_M.gguf -c 128000 \
  -ctk turbo3 -ctv turbo3 \
  -fa on -np 1 \
  --cache-ram 4096 \
  --ctx-checkpoints 4 \
  --fit on
Myopic Raccoon @MyopicRaccoon:
@LottoLabs Here's what I am running: Qwen3.5-27B-UD-Q5_K_XL.gguf with @no_stp_on_snek's turboquant (turbo4), 128K context (I could double that if I dropped to Q4), on a single RTX 3090/24GB. Getting a solid 31 tok/s and Hermes 0.70. Heaven.
Lotto @LottoLabs:
People yearn for Qwen 27B and the Hermes agent. Tailscale, a used desktop... you're cooking.
Eric ⚡️ Building...:
REQUESTED: Qwen3 30B Coder-Next? The community delivered 🔥
mradermacher/Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32-i1-GGUF
⚡ Distilled from 480B into 30B Coder
📈 SWE-Rebench / agentic coding
✅ Runs perfectly local with GGUF quants
Exactly what you wanted, consumer-hardware friendly: huggingface.co/mradermacher/Q…
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX I tried your config but I'm at about 20 tokens/sec. Is it because you have a 4090 and I have a 3090?
IRMC @IRMC16:
@DragonGroky @sudoingX
  - "--cache-type-k"
  - "q4_0"
  - "--cache-type-v"
  - "q4_0"
  - "--host"
  - "0.0.0.0"
  - "--port"
  - "8000"
  - "--alias"
  - "Qwen3.5-27B"
Sudo su @sudoingX:
people keep asking me what model to run on a single 3090. it's not even close. Qwen 3.5 27B dense Q4_K_M. undisputed.
kumikumi (Ankkala) @ankkala:
@sudoingX to be clear, which model / quantization did you run on the 3090?
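Rough arithmetic on why a dense 27B at Q4_K_M is the 3090 pick. Q4_K_M in llama.cpp averages a bit under 5 bits per weight; 4.8 below is an illustrative figure, not a measured one:

    # Back-of-envelope VRAM check for a dense 27B at Q4_K_M (assumed numbers).
    params = 27e9
    bits_per_weight = 4.8                              # assumed Q4_K_M average
    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"weights: ~{weights_gb:.1f} GB")            # ~16.2 GB
    print(f"headroom on a 24 GB card: ~{24 - weights_gb:.1f} GB for KV cache and overhead")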
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX I had the same expectations for Gemma-4 31B, but I saw that it is only better for chatting, not for coding. We have to wait for Qwen 3.6 ☺️
IRMC @IRMC16:
@DragonGroky @sudoingX I was waiting for Gemma-4-31B with TurboQuant cache to swap out Qwen3.5-27B. My expectation was that bigger is better. Apparently not for my use case. x.com/TeksEdge/statu… x.com/leftcurvedev_/…
left curve dev @leftcurvedev_:
everyone is wondering the same thing: qwen3.5 27b or gemma 4 31b? new benchmarks from @ArtificialAnlys are out, let's dig into the numbers:
💻 coding index > gemma wins. surprisingly it was the best and scored 42; it managed to handle more coding tasks successfully than qwen, very interesting! 👀
🤖 agentic index > qwen destroys gemma. when it comes to tool calls, multi-step reasoning, and autonomous task execution, there's no need to talk about it: qwen is the absolute winner in the category, scoring 55 (!) vs 41 for gemma 🤯
👑 the winner > qwen3.5 27b stays undefeated. gemma could have been an amazing contender, but in agentic tasks it's just too far behind compared to what qwen has to offer; makes no sense to use it if your tasks are heavy and need reasoning.
what are your thoughts?
ハカセ アイ(Ai-Hakase)🐾最新トレンドAIのためのX 🐾:
③ Complete in one click, in Japanese! The ultimate fully automatic workflow 🤖 If you're thinking "setup sounds difficult...", don't worry! Combine it with the latest AI, Qwen 3.5, and all you have to do is type "a girl walking at dusk" in Japanese ✨ The AI automatically writes the optimal English prompt and camera-work directions for you. Leave the tedious calculations to the AI and make your best video!
ハカセ アイ(Ai-Hakase)🐾最新トレンドAIのためのX 🐾:
[LTX-2.3 awakened] The shock of "3-Pass Sampling", which makes footage dramatically come alive! 🚀✨ Struggling with stiff motion in video-generation AI? A cheat-level trick has arrived that pushes quality to the limit with no change in generation time! 👇️ Explained in this thread! #AI動画 #LTX2 #動画生成AI #生成AI
Draneil Mifa @DragonGroky:
@loktar00 For a 3090, are these good parameters for using TurboQuant?
export PATH=/ia/llama-cpp-turboquant-cuda/build/bin:$PATH
llama-server -m models-d/Qwopus3.5-27B-v3-Q4_K_M.gguf \
  -c 132000 -n 4096 \
  --cache-type-k q8_0 --cache-type-v turbo3 \
  -fa on -np 1 -ngl 99
Loktar 🇺🇸 @loktar00:
TurboQuant is getting ported to llama.cpp, and if it actually delivers 6x memory compression with zero accuracy loss... your 24GB 3090 basically becomes a 144GB card for KV cache. The local inference cost equation keeps getting more insane every week.
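The "24GB becomes 144GB" line is just 24 x 6, but the useful number is tokens of context per gigabyte of cache. A sketch with assumed dimensions for a generic dense ~27B; layer count, KV heads, and head size are illustrative, not a published architecture:

    # KV cache bytes per token: K and V, per layer, per KV head, per head dim.
    n_layers, n_kv_heads, head_dim = 48, 8, 128        # assumed dimensions
    fp16_kv_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # ~192 KiB
    print(f"fp16 KV: {fp16_kv_per_token / 1024:.0f} KiB/token")
    for name, factor in [("fp16", 1), ("q4-ish", 4), ("6x (claimed)", 6)]:
        tokens = 8e9 / (fp16_kv_per_token / factor)
        print(f"{name:12s} -> ~{tokens / 1000:.0f}K tokens in an 8 GB cache budget")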
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX Really good, 34 t/s. Would you mind sharing the llama.cpp command? I'm at 17 t/s with a 3090; I guess I should be able to reach a similar speed to yours?
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX Thanks. Are you also on Win11 with WSL Linux for TurboQuant? This is my old conf:
llama-server -m models/Qwen3.5-27B-Q4_K_M.gguf --alias "Qwen3.5-27B-Q4_K_M" \
  --n-gpu-layers 99 \
  -c 262144 \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --port 8000 --host 0.0.0.0 \
  -fa on
IRMC @IRMC16:
@DragonGroky @sudoingX For me, 196608 is the sweet spot. Larger contexts get flushed into RAM, dropping performance.
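The "sweet spot" behavior follows from the same arithmetic: once weights plus KV cache outgrow VRAM, llama.cpp spills to system RAM and throughput drops. Reusing the assumed figures from the sketches above, so this is a ballpark, not a prediction for this exact model:

    weights_gb = 16.2            # assumed 27B @ ~4.8 bits/weight (see above)
    kv_per_token_q4 = 49152      # bytes/token: the fp16 figure above, ~4x smaller at q4_0
    max_ctx = (24e9 - weights_gb * 1e9) / kv_per_token_q4
    print(f"max context before spilling: ~{max_ctx / 1000:.0f}K tokens")
    # ~159K with these guesses; the same ballpark as IRMC's observed 196608.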
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX How many tokens/sec do you get with your setup? I was using Q4_K_M with 200k context length but switched to Q5 with TurboQuant and reduced the context length.
IRMC @IRMC16:
@DragonGroky @sudoingX My Qwen3.5-27B-Q4_K_M implementation (third compose) has a context length of 196608. It runs flawlessly with OpenClaw, OpenCode/GSD, and my home-brewed LangGraph agents, such as a Docling/LangExtract agent. I was hunting 512k with TQ3_4S.
Draneil Mifa @DragonGroky:
@sudoingX Using Hermes and coPaw, I get better results with coPaw (LLM: Qwen 27B Q5 TurboQuant). Running on a 3090.
Sudo su @sudoingX:
if you're still on openclaw bloatware in april 2026, you're not a builder. you're a tourist who has zero idea how fast things are accelerating. try hermes agent for yourself; take that step and make the switch today, anon. and i'm not the one calling it bloat: that came from builders in my DMs and timeline. bloat is their word, not mine. if you have already made the switch, let people know your experience so more builders on bloated tools become aware. you deserve a better tool in this fast-changing field.
Teknium (e/λ) @Teknium:
Hermes Agent is the third fastest growing GitHub repo this week!
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX I tried Hermes and it was not so good; it got lost in a loop. Now I use coPaw (Alibaba's agent) and it works really well: Python code generating an HTML dashboard from different API sources.
Draneil Mifa @DragonGroky:
@IRMC16 @sudoingX I have a 3090 and also run everything locally. For now I'm using Qwopus3.5 27B Q5_K_M and it gets about 15-20 t/s with 130k context length.
Draneil Mifa @DragonGroky:
Using Qwopus3 Q5 with Hermes is not so good: Hermes quickly runs in a loop trying to patch. With coPaw I don't have this problem; it looks much better and doesn't loop.
Draneil Mifa @DragonGroky:
@sudoingX On my 3090 I'm running Q5_K_M:
llama-server -m models-d/Qwopus3.5-27B-v3-Q5_K_M.gguf --alias "Qwen3.5-27B-Q5-Qwopus3" \
  -c 128000 \
  -ctk turbo3 -ctv turbo3 \
  --port 8000 --host 0.0.0.0 \
  -fa on -np 1 \
  --cache-ram 4096 \
  --ctx-checkpoints 4 \
  --fit on \
  --reasoning off
Ahmad @TheAhmadOsman:
You like Chinese open-source models? Then use Qwen 3.5 27B. You like American open-source models? Then use Gemma 4 31B. Both can run easily on consumer hardware at home, and they're state-of-the-art models.
Draneil Mifa @DragonGroky:
@TheAhmadOsman Personally I don't care whether it is Chinese, American, French, or whatever; I just want an LLM that runs on my 3090 and can generate quality code like Sonnet 4.5 or Opus 4.6. For now I am using Qwopus3 Q5 with the TurboQuant llama.cpp version, but I'm happy to try Gemma if it's better.