El Deffo

3.1K posts

@eldeffo
Joined September 2010
551 Following · 7 Followers

El Deffo@eldeffo·
@bnjmn_marie there's some talk about broken chat templates in 3.6 on hf
Benjamin Marie@bnjmn_marie·
Currently digging into my Qwen3.6 27B evals and some results are… weird. It's clearly better than Qwen3.5 27B on some tasks, but worse on others, especially instruction-following. I also couldn't reproduce their published GPQA Diamond score: in my setup, Qwen3.5 is significantly ahead. When I see stuff like this, I usually check Artificial Analysis, and they seem to get similar results: Qwen3.5 >> Qwen3.6 on this one. I'll share more next week once the full analysis is done, but right now Qwen3.6 27B doesn't look like the obvious pick some people are making it out to be. Gemma 4 31B, or even Qwen3.5, can still be much better depending on whether you care about accuracy, efficiency, or the specific task.
left curve dev@leftcurvedev_·
@JamesNumb3rs I'm using the UD-IQ3_XXS gguf for both 27B and 35B. Only have a single RTX 5080 (16 GB). Try to use q8_0 kv cache (or no quant); I'm using q4_0 and can feel it doesn't help.
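(For reference, a rough back-of-the-envelope of what KV-cache quantization buys in VRAM; the layer/head/context figures below are hypothetical, not the real 27B config:)

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elem.
# All architecture numbers below are assumed for illustration only.
n_layers, n_kv_heads, head_dim, ctx = 48, 8, 128, 36_000   # hypothetical
bytes_per_elem = {"f16": 2.0, "q8_0": 1.06, "q4_0": 0.56}  # approx GGML cell sizes

for cache_type, b in bytes_per_elem.items():
    size_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * b / 1e9
    print(f"{cache_type}: ~{size_gb:.1f} GB of KV cache at {ctx} tokens")
```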
left curve dev@leftcurvedev_·
🥊 Qwen3.6 35B A3B vs Claude Sonnet 4.5
Making them fight on the same prompt
🐋 "Whale Song" challenge
HTML canvas, no libraries
Holy shit
Poslední skaut™@Posledniskaut·
Ľuboš Blaha, my favorite stand-up comedian
Ettore Di Giacinto@mudler_it·
APEX quantizations of more models ongoing! Meanwhile, playing with Qwen 3.5… the impact of APEX vs Unsloth Dynamic quant on quality is clearly visible IMO, at least in some areas. I know we need more numbers before drawing conclusions, but this isn't about numbers. Just check out a simple prompt: "create an html page of a rotating cube in SVG."
Left: Unsloth Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf (48.7 GB, ~32 tok/s) → flat square (?????)
Right: APEX Qwen3.5-35B-A3B-APEX-I-Quality.gguf (22.8 GB, ~53 tok/s) → ✨
El Deffo@eldeffo·
@bnjmn_marie even with -np X and perhaps several server copies at the same time? maybe it would be nice to have a proper comparison of the engines too.
Benjamin Marie@bnjmn_marie·
@eldeffo vLLM is much faster at scale when running hundreds of concurrent queries
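(One way to sanity-check that claim is to hit each engine through its OpenAI-compatible HTTP endpoint at increasing concurrency; a minimal sketch, with the URL, model name, and prompt as placeholders:)

```python
import asyncio, time
import httpx  # pip install httpx

URL = "http://localhost:8000/v1/completions"  # llama-server or vLLM, both expose this API
PAYLOAD = {"model": "placeholder-model", "prompt": "Hello", "max_tokens": 64}

async def one_request(client):
    r = await client.post(URL, json=PAYLOAD, timeout=300)
    r.raise_for_status()

async def bench(concurrency, total=200):
    async with httpx.AsyncClient() as client:
        sem = asyncio.Semaphore(concurrency)
        async def worker():
            async with sem:
                await one_request(client)
        t0 = time.perf_counter()
        await asyncio.gather(*(worker() for _ in range(total)))
        print(f"concurrency={concurrency}: {total / (time.perf_counter() - t0):.1f} req/s")

for c in (1, 8, 64, 256):
    asyncio.run(bench(c))
```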
Benjamin Marie@bnjmn_marie·
List of quantized Gemma 4 31B I'm evaluating:
- Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB)
- cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB)
- RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB)
- nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB)
- RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB)
→ yes, NVIDIA's NVFP4 checkpoint is as large as an FP8 checkpoint. This is what happens when you don't quantize the attention layers of a dense model.
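(The size gap is mostly arithmetic: weights left in 16-bit quickly dominate the file. A toy estimate; the 30/70 attention/MLP split below is an assumption, not Gemma 4's actual layout:)

```python
# Toy checkpoint-size estimate for a ~31B-parameter dense model.
# The attention share is an assumed figure for illustration only.
total_params = 31e9
attn_frac = 0.30  # assumed fraction of parameters in attention layers

fp8_all   = total_params * 1.0 / 1e9   # everything at 1 byte/param
nvfp4_all = total_params * 0.5 / 1e9   # everything at ~0.5 byte/param
nvfp4_bf16_attn = (total_params * attn_frac * 2.0          # attention kept in bf16
                   + total_params * (1 - attn_frac) * 0.5) / 1e9

print(f"all FP8:                 ~{fp8_all:.1f} GB")
print(f"all NVFP4:               ~{nvfp4_all:.1f} GB")
print(f"NVFP4 w/ bf16 attention: ~{nvfp4_bf16_attn:.1f} GB")
```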
El Deffo@eldeffo·
@bnjmn_marie generally the most interesting question is what can you fit into 11GB, 15GB, 23GB, 31GB... past that, it's just macs and rtx pros, and those can run almost anything anyway.
El Deffo@eldeffo·
@bnjmn_marie how so? llama.cpp with quants is consistently faster than vLLM, at least every time I tried. also, maybe the test battery could be reduced: small models - Q4_K_M, maybe IQ4_NL & IQ3_XXS, + some smarter Q2s on 200B+ models? those are probably the only ones that really need to be tested
El Deffo@eldeffo·
@LenSeaside @0xSero you can fit 27B UD-IQ3_XXS into 12GB, but only if you connect the display to the motherboard.
Len Seaside@LenSeaside·
@0xSero I would really appreciate a 12GB level for all the 3060 owners please. Are we better off with 9B variants or trying the 2.5 bit Unsloth 27B version? Or the A3B?
0xSero@0xSero·
Best models to run on your hardware level
I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better:
Multimodal
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen) huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents) huggingface.co/nvidia/Nemotro…
- Mine hehe huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series
El Deffo@eldeffo·
@LenSeaside @stevibe 27B UD-IQ3_XXS [-ngl 65 + 36K Q4 kv cache] gets 1100-1200 pp, 36-37 t/s, but only if you connect the display to the motherboard/CPU's iGPU; that gets you 1-3 GB of VRAM back.
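(Roughly how that fits into 12 GB; every figure below is an assumed ballpark, not a measurement:)

```python
# Back-of-the-envelope VRAM budget for a ~27B IQ3_XXS model on a 12 GB card.
gpu_vram_gb      = 12.0
model_weights_gb = 9.3   # assumed IQ3_XXS file size for a ~27B dense model
kv_cache_gb      = 1.7   # assumed 36K-token KV cache quantized to Q4
compute_buf_gb   = 0.8   # assumed scratch buffers / CUDA context overhead
desktop_gb       = 0.0   # add ~1-3 if the desktop still renders on this GPU

used = model_weights_gb + kv_cache_gb + compute_buf_gb + desktop_gb
print(f"~{used:.1f} GB of {gpu_vram_gb} GB used -> fits: {used <= gpu_vram_gb}")
```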
Len Seaside@LenSeaside·
@stevibe Can you include 9B in your analyses please? I only have a 12GB GPU. I would be very interested to know if it's much worse or not that bad etc. and what I can do to help it along in terms of where it fails.
stevibe@stevibe·
"122B has to be smarter than 27B" I showed 4 UI components to three Qwen3.5 models and asked them to recreate them from a screenshot alone: - 27B (dense) - 35B-A3B (MoE) - 122B-A10B (MoE) Same screenshot. Same prompt. Same task. Which one do you think nailed it?
El Deffo@eldeffo·
@0xSero if llama.cpp had an API to move layers in and out of VRAM, with this info the performance gains could be quite substantial
0xSero@0xSero·
I made a dataset from every AI chat and session I ever made, and passed it to Qwen3.5-35B
- 7.6% of the model handled 50% of my requests
- 27.5% handled over 80% of my requests
That means I can technically get any model down to 7.6% (3.8% in fp8)
someone give me 1 million pls
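(A sketch of how a figure like "7.6% of the model handled 50% of my requests" could be derived from per-module activation counts; the counts here are made up:)

```python
import numpy as np

# Hypothetical per-module activation counts collected while replaying the chat
# dataset (e.g. how often each expert/block was routed to). Made-up data.
rng = np.random.default_rng(0)
counts = rng.pareto(1.5, size=1000)      # heavy-tailed usage across 1000 modules

order = np.argsort(counts)[::-1]         # most-used modules first
cum = np.cumsum(counts[order]) / counts.sum()

for target in (0.5, 0.8):
    n_needed = int(np.searchsorted(cum, target)) + 1
    print(f"{n_needed / len(counts):.1%} of modules cover {target:.0%} of activations")
```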
El Deffo@eldeffo·
@Ben68638515 @jonasgahrstore @bundeskanzler good buddy, Art. 69(a) clearly states that insane people shouldn't post on the interwebz without professional supervision, & your posts are in clear violation. please rectify your misconduct immediately. also - check yourself before you wreck yourself. it's not looking good.
Hans-Benjamin Braun@Ben68638515·
@eldeffo @jonasgahrstore @bundeskanzler You are free to present scientific arguments against every single one of the 10 pieces of geophysical evidence. Until you have done that, your post represents intentional misinformation violating Art. 70(c) of the ICC Rome Statute.
Jonas Gahr Støre@jonasgahrstore·
A strong partnership between Norway and Germany will be even stronger. On energy, industry, defence, climate, space and support to Ukraine. Thank you @bundeskanzler Friedrich Merz for a substantive meeting in Berlin. It all amounts to mutual European security. (Photo: Uwe Koch)
Hans-Benjamin Braun@Ben68638515·
Did you in your conversation with BK Merz also address the fact that it was your very own, Norwegian tax funded seismic agency NORSAR (joint venture with US Los Alamos Nuke Lab) which deliberately covered up the nuclear nature of the destruction of Nordstream: Indeed, Nordstream was nuked (sic!) as part of the US/NATO extortion racket: US LNG export capacity increased from ~0 in '16 to that of Nordstream 1+2 in '20 (within few percent) before Nordstream was destroyed by a Mini-Nuke under US/NATO auspices exactly on Donetsk Referendum Day, serving as covert shock wave attack towards Kaliningrad as evidenced by seismic measurements. This was indeed a US/NATO masterstroke: The covert nuclear nature of this attack subjugated DE and DK unconditionally to US/NATO orders, and coerced SW & FI into NATO. A summary of my evidence that proved the nuclear (!) destruction of Nordstream 1+2 beyond the shadow of a doubt was presented to the UN Security Council on Sept 26 2023 (SC/15422). Instead of following up on my report, responsibility was offloaded to authorities of Sweden, Denmark and Germany with the former two promptly aborting their investigations shortly afterwards, while the matter was - to this very day - intentionally buried by DE. x.com/Ben68638515/st…
El Deffo@eldeffo·
@Princip_on @RALee85 buddy, even the glorious soviet union lost like half of the many wars it started. and russia is no soviet union.
Gavrilo Princip@Princip_on·
@RALee85 Even if it's 10x worse, the Russians will prevail over Ukraine. That's the goal and nothing will stop them. Looking back at russian history, even the most deranged russophobes can't picture a scenario where the SMO can be turned around in Ukraine's favor.
Rob Lee@RALee85·
“Guided by President Vladimir Putin's crack team of economic officials, Russia has reported better average growth than the euro zone over the past four years despite being hit by some 24,000 Western sanctions. But high interest rates, higher taxes, rising prices and a $20-per-barrel discount for Russian oil are taking their toll - even in Moscow, a vast urban area of 22 million people that has been largely insulated from the worst impact of Europe's deadliest war since World War Two. "To let" signs are prominent in retail spaces across the capital. Sales of new light commercial vehicles and trucks, a good indicator for the health of the retail and construction industry, fell by 38% to 147,000 units in 2025 and have continued to fall in the first weeks of 2026, Autostat said. Data from Sberbank, which as Russia's biggest bank sees the ripples of expenditure across the economy, showed that the fall in the number of catering outlets in January was the biggest since 2021 and that restaurant spending hit the lowest in three years in November-early December 2025.” reuters.com/business/russi…
Tomas Kouba@tomaskouba·
@eldeffo @Posledniskaut Except that you have the future tense wrong. Some of us saw it clearly two years ago already. Only the speed really came as a surprise.
Poslední skaut™@Posledniskaut·
A big thank-you to the Slovaks. You could have made us eat those two years of our sneering now, but you didn't. On the contrary, you're showing enormous sympathy. Thanks once again.
El Deffo@eldeffo·
@Posledniskaut and besides that, we're tired enough of our own assholes; there's no energy left for yours. enjoy it. it's going to be a ride.
El Deffo@eldeffo·
@Posledniskaut don't worry, those 2 years of Czech assholery are safely stored in memory. and we long ago got used to Czechs completely losing track of the difference between reality and their ridiculous notions of their own superiority. that you act like big shots while the toilet is out in the hallway is nothing new.
El Deffo@eldeffo·
@BarronTNews_ didn't stun anyone, no one in the EU gives a damn about this clown. also, he will 💯% end up in jail for all the corruption after the next election.
ⁿᵉʷˢ Barron Trump 🇺🇸
🚨 HOLY CRAP. Slovakia PM Robert Fico just STUNNED the EU after speaking with Donald Trump and Marco Rubio about Greenland. “Trump is clearly pursuing the nation-state interests of the United States. If the EU acted the same way, we would be in a completely different position.” BOOM. 🔥 And then he went for the jugular. “World leaders do not take the EU fully seriously. That’s because of our nonsensical climate targets and our suicidal migration policy.” That’s the truth they’re terrified to say out loud. America First works. National interest works. Strength works. This guy is BASED. 💯💯
Robin Ebers | AI Coach for Founders
just tried running GLM 4.7 Flash locally in Open Code
terrible experience
I've got 96 GB memory, the model ran so slowly it was unbearable
anyone saying open source models running on your computer are the future, they're either lying to you or insanely delusional lol