bgeneto

122 posts

@netobge

Joined November 2022

139 Following · 9 Followers

bgeneto@netobge·23h

@Tech2Wild Here it is... a ready-to-run docker compose file: send.bitwarden.com/#aAwaBfG8tE2I8…

Tech2Wild@Tech2Wild·1d

@netobge Can you send me that recipe

Tech2Wild@Tech2Wild·1d

Hmm wondering which is better: Gemma 4 12B or Qwen 3.6 35B-A3? 🤔

bgeneto@netobge·2d

@acadictive This 30x price increase should be considered illegal by any means!

Ehsan@acadictive·4d

Dear GitHub Copilot team, I am happy to announce that I successfully burned all of my monthly tokens in under 3 days thanks to your garbage new pricing model. I'd also like to inform you that I won't be renewing my subscription or adding more budget. Best, A former customer.

bgeneto@netobge·2d

@acadictive Burned in the first day. You are a hero! @GitHubCopilot new way of counting tokens is completely unfair.

bgeneto@netobge·5d

@TheAhmadOsman Qwen3.7 Max is not open-source/open-weights, right? So OP is probably right. Let's see how M3 stacks up against Kimi, though.

Ahmad@TheAhmadOsman·5d

Kimi K2.6 is still the current opensource SoTA model

bgeneto@netobge·30 May

@danieltvela @The_Only_Signal It really is... but then what would we call Qwen3.6 35B A3B (called Qwen3.6 Flash by Alibaba) running at 160 tps on a single RTX 3090 with 262k context in vLLM, versus 37 tps for the 27B?

Daniel T. Vela@danieltvela·30 May

@The_Only_Signal Qwen3.6-27B is a miracle

Mike Bradley@The_Only_Signal·30 May

Spent the past 48 hours benchmarking Step3.7-Flash vs Qwen3.5-397B. Conclusion: People don’t appreciate Qwen3.6-27B enough.

bgeneto@netobge·30 May

@leo_linsky @The_Only_Signal Interesting to see Qwen3.6 Flash, a 35B A3B MoE, ahead of the dense 27B while all the hype goes to the much slower 27B.

Leo Linsky@leo_linsky·30 May

@The_Only_Signal Qwen 3.6 27B is such a ridiculous outlier in our testing that we had to re-evaluate our whole methodology. Data holds up. Incredible amount of intelligence AND agentic tool progress packed into a small param count. Data at gertlabs.com/rankings

bgeneto@netobge·26 May

@CarlosZarattini Let me see if I understand your position: so the figure that debunks this lie once and for all says that fewer than 3% of beneficiaries hold formal jobs... Wow, 3% really is a debunking...

Carlos Zarattini@CarlosZarattini·24 May

This statement against Bolsa Família is absurd and uninformed. It is unacceptable that the obvious still has to be repeated. Bolsa Família is, in fact, a driver of social mobility. The program puts food on the table for nearly 50 million Brazilians and helps entire families get through poverty with dignity. Between 2023 and 2024, 8.6 million people left poverty and 1.9 million left extreme poverty in Brazil. And the figure that debunks this lie once and for all: in 2024, Bolsa Família beneficiaries filled 1.2 million formal jobs. Anyone who says Bolsa Família makes people "complacent" simply disregards the reality of the Brazilian people. poder360.com.br/poder-economia…

bgeneto@netobge·13 May

@cjzafir Have you published your model? Where?

CJ Zafir@cjzafir·12 May

Qwen 3.5 has the best SLMs to fine-tune! Its 4B model is really smart if you train it on a well structured dataset.
I fine-tuned the model on a 135M dataset generated by Codex 5.5 + DeepSeek v4 Pro. I achieved 96%+ accurate results with Qwen 3.5 4B. And 95% on Qwen 3.5 2B (that only requires 3.5GB RAM).
For context, on the same pipeline:
> Sonnet 4.6 achieved 89%
> GPT 5.4 Mini achieved 85%
> Haiku 4.5 achieved 72%
I don't trust evals, so I ran a 7000+ row hard-boundary test, and the results of Qwen 3.5 were consistent. A 4B fine-tuned model beating a 20x bigger model in accuracy and latency is no joke.
It cost me $173 in total to generate the dataset and cover the cloud GPU cost to fine-tune both models.
I said this before, and I'll say it again: not everything requires a 1T-parameter LLM. We need ELMs (Expert Language Models) that are specialized for one domain only. ELMs > LLMs.
I'll be writing more about how SLM fine-tuning works. So stay tuned.

bgeneto@netobge·13 May

@TeksEdge Meanwhile, anyone with a much cheaper RTX 3090 can run Intel/Qwen3.6-35B-A3B-int4-mixed-AutoRound with vLLM at 150 t/s without MTP, using an fp8 KV cache and 128k context.
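
The setup described in this reply can be sketched as a vLLM launch command. This is a minimal sketch, assuming the standard `vllm serve` CLI with its `--max-model-len` and `--kv-cache-dtype` options. The helper name `build_vllm_command` is hypothetical, and 131072 tokens is an assumed reading of "128k context"; the post does not list the author's actual flags.

```python
import shlex

# Model name as quoted in the post.
MODEL = "Intel/Qwen3.6-35B-A3B-int4-mixed-AutoRound"


def build_vllm_command(model: str,
                       context_tokens: int = 128 * 1024,
                       kv_cache_dtype: str = "fp8") -> list[str]:
    """Build an argv list for `vllm serve` (hypothetical helper).

    context_tokens defaults to 131072, an assumed value for "128k".
    """
    return [
        "vllm", "serve", model,
        "--max-model-len", str(context_tokens),
        "--kv-cache-dtype", kv_cache_dtype,
    ]


command = build_vllm_command(MODEL)
# Print a copy-pasteable shell command; nothing is launched here.
print(shlex.join(command))
```

Building the argument list separately from running it keeps the sketch runnable on a machine without a GPU; the printed line can be pasted into a shell where vLLM is installed.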

David Hendrickson@TeksEdge·12 May

🤯 Unsloth released the fastest Qwen3.6-27B MTP GGUF I've tested. Time to upgrade. Compared to the previous GGUF, Q4/Q6 XL versions are 👀 ~55% faster! On a single RTX 5090: ✅ 114 tok/s — UD-IQ2_M (MTP) ✅ 93 tok/s — UD-Q4_K_XL (MTP) ✅ 75 tok/s — UD-Q6_K_XL (MTP) 💨Fastest MTP quant is 3.3x faster than the old Q8_0 baseline (35 tps) 262K context + tool calling. All on one 5090. * compiled from the MTP PR branch ('am17an:mtp-clean', build b9117-ebe4fca4b)
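
The "3.3x faster" figure in the post above can be checked against the quoted numbers. A small sketch, using only the tok/s values stated in the post; the variable and function names are illustrative.

```python
BASELINE_TPS = 35.0  # old Q8_0 baseline quoted in the post

# tok/s figures quoted in the post for a single RTX 5090 (MTP builds).
mtp_tps = {
    "UD-IQ2_M": 114.0,
    "UD-Q4_K_XL": 93.0,
    "UD-Q6_K_XL": 75.0,
}


def speedup(tps: float, baseline: float = BASELINE_TPS) -> float:
    """Return throughput relative to the baseline."""
    return tps / baseline


speedups = {quant: round(speedup(tps), 1) for quant, tps in mtp_tps.items()}
for quant, factor in speedups.items():
    print(f"{quant}: {factor}x over the Q8_0 baseline")
```

The fastest quant works out to 114 / 35, which rounds to the 3.3x stated in the post.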

bgeneto@netobge·3 May

@loktar00 27B quality with 35B tps would be a dream with 3090. But all those inflated numbers with 27B and 3090 are unreal and quality compromised... 35B with vLLM is so stable that I stopped searching for a better alternative with a single 3090.

Loktar 🇺🇸@loktar00·2 May

I wish 3.6 35B was just a little better.. the speeds I'm getting are insane.

bgeneto@netobge·3 May

@malikwas1f @largePrawn I did... Several times, since day 0 and also today. No more than 60 tps with 2x3090. Much better tps with a single GPU and Qwen3.6 35B (130 tps), without even using spec dec.

noname@malikwas1f·2 May

@netobge @largePrawn Those numbers are real life and real time. Go check out the repo and do it yourself.

Tony Ge@largePrawn·2 May

Hitting 140 tok/s on Qwen 3.6 27B running vLLM with 2x 3090s using the following @malikwas1f's repo github.com/noonghunna/clu… Literally just pointed claude at it and walked away. Came back to a 2.5x speed bump 🤯🤯🤯

bgeneto@netobge·2 May

@rafaon3 @luksamuk Your A3B is slow; try Intel/Qwen3.6 35B AutoRound with vLLM. I get 130 tok/s with it and 60 tok/s with Qwen3.6 27B, but I don't use it because these models think too much and 60 tps becomes extremely slow for coding with 128k tokens.

rafaon3@rafaon3·2 May

@luksamuk I run the A3B at 90 tok/s on the 3090 and the 27B at 41 tok/s. I don't understand it, but I'll take it.

Lucas@luksamuk·2 May

So far, the coding champion here, on an RTX 3050 with 6GB of VRAM, has been Qwen 3.6 35B-A3B. Quantization: UD-Q3_K_L. The MoE architecture helps with speed; unmatched quality; good tradeoff with speed. I don't doubt the 27B does better, but it's painfully slow (a limitation on my end).

bgeneto@netobge·29 Apr

@EnioViterbo What moral standing does he have to talk like that? What a crazy world: an STF justice pretending that pocketing 80+ million is fine and life goes on. Only in Brazil.

Enio Viterbo@EnioViterbo·28 Apr

For God's sake. Get a grip, Alexandre. Justice Alexandre de Moraes used the trial of a case brought by Congressman Gustavo Gayer against another congressman and simply started taking veiled jabs at Romeu Zema. A complete misuse of purpose. Disrespect for public money. Disrespect for the law and for criminal procedure. Disrespect for the STF. Justices Alexandre de Moraes and Gilmar Mendes need to learn that just because there is a microphone on the bench does not mean they can say whatever they want. A trial session for cases is FOR THE CASES. It is not for singing. It is not for reciting poetry. It is not for sending political messages. If they want a microphone and a podium for sending political messages, they should run for Congress.

bgeneto@netobge·28 Apr

@MemoryReboot_ I'm getting 130-160 tok/s with one RTX 3090 and Intel AutoRound Qwen3.6 35B A3B without any spec dec. So you certainly have a regression in speed here.

Mass@MemoryReboot_·27 Apr

DFlash benchmarks on dual RTX 3090
Qwen3.6-35B-A3B AWQ-INT4 + DFlash drafter on vLLM nightly, TP=2
Tried different num_speculative_tokens to find what works:
- n=4: 96.6 tok/s
- n=8: 96.0 tok/s
- n=15 (z-lab's recommended): 20-40 tok/s
n=4 is a sweet spot
For comparison, Qwen3.6-35B-A3B Q6 on llama.cpp gives me 102 tok/s on the same hardware ☹️ What am I doing wrong?
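
The sweep reported in the post above can be summarized in a few lines. A sketch using only the quoted figures; the n=15 result was given as a 20-40 tok/s range, so its midpoint is used here as an assumption, and all names are illustrative.

```python
# tok/s per num_speculative_tokens value, as quoted in the post.
# n=15 was reported as a 20-40 tok/s range; the midpoint is an assumption.
dflash_tps = {4: 96.6, 8: 96.0, 15: (20 + 40) / 2}

# Q6 on llama.cpp, same hardware, as quoted in the post.
LLAMA_CPP_TPS = 102.0

# Pick the setting with the highest measured throughput.
best_n = max(dflash_tps, key=dflash_tps.get)
best_tps = dflash_tps[best_n]

# Shortfall of the best DFlash run relative to the llama.cpp figure.
gap_pct = (LLAMA_CPP_TPS - best_tps) / LLAMA_CPP_TPS * 100

print(f"best n={best_n} at {best_tps} tok/s")
print(f"{gap_pct:.1f}% below the llama.cpp figure")
```

Even the best setting stays a few percent under the llama.cpp number, which is the regression the post is asking about.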

bgeneto@netobge·25 Apr

@spiritbuun Still slower... I forgot to mention that the GGUF model used is lmstudio-community/Qwen3.6-27B-GGUF
cmake -B build --fresh \
  -DGGML_CUDA=ON \
  -DGGML_NATIVE=ON \
  -DGGML_CUDA_FA=ON \
  -DGGML_CUDA_FA_ALL_QUANTS=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES=86

buun@spiritbuun·25 Apr

@netobge Can you repull from master, build, and try it again? Might have fixed it

buun@spiritbuun·23 Apr

Pushed: DFlash implementation for llama-cpp. buun-llama-cpp/llama-server -m Qwen3.6-27B.gguf -md dflash-draft-q4_k_m.gguf --spec-type dflash

bgeneto@netobge·25 Apr

@spiritbuun
./build/bin/llama-server \
  -m ~/models/Qwen3.6-27B-Q4_K_M.gguf -md ~/models/dflash-draft-3.6-q4_k_m.gguf --spec-type dflash \
  --reasoning on \
  --reasoning-budget -1 \
  --ctx-size 32000 \
  --fit off \
  -ngl 99 -ngld 99 \
  --flash-attn on \
  -ctk q8_0 \
  -ctv q8_0...

buun@spiritbuun·25 Apr

@netobge Can you paste me your llama flags so I can reproduce?

bgeneto@netobge·25 Apr

@unbug @elliotarledge Yes. buun's llama-cpp has DFlash support, but no luck for me: it worked, but it was slower.

unbug@unbug·25 Apr

@elliotarledge Any news for llamacpp?

Elliot Arledge@elliotarledge·25 Apr

DFlash is the future of inference. huggingface.co/z-lab/Qwen3.6-…

bgeneto@netobge·25 Apr

@PelicanInvasion @elliotarledge Not for me, no speed gains with the already fast 35B MoE

PelicanInvasion@PelicanInvasion·25 Apr

@elliotarledge How fast on 5090 GPUs? Any work on Qwen 3.6 35b for even more speed?
