El Deffo

3.1K posts

@eldeffo
Joined September 2010
551 Following · 7 Followers

El Deffo@eldeffo·
@bnjmn_marie there's some talk about broken chat templates in 3.6 on hf
Benjamin Marie@bnjmn_marie·
Currently digging into my Qwen3.6 27B evals and some results are… weird. It's clearly better than Qwen3.5 27B on some tasks, but worse on others, especially instruction-following. I also couldn't reproduce their published GPQA Diamond score: in my setup, Qwen3.5 is significantly ahead. When I see stuff like this, I usually check Artificial Analysis, and they seem to get similar results: Qwen3.5 >> Qwen3.6 on this one. I'll share more next week once the full analysis is done, but right now Qwen3.6 27B doesn't look like the obvious pick some people are making it out to be. Gemma 4 31B, or even Qwen3.5, can still be much better depending on whether you care about accuracy, efficiency, or the specific task.
left curve dev@leftcurvedev_·
@JamesNumb3rs I'm using the UD-IQ3_XXS gguf for both 27B and 35B. Only have a single RTX 5080 (16 GB). Try to use q8_0 kv cache (or no quant); I'm using q4_0 and can feel it doesn't help.
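(For reference, a rough back-of-the-envelope of what KV-cache quantization buys in VRAM; the layer/head/context figures below are hypothetical, not the real 27B config:)

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elem.
# All architecture numbers below are assumed for illustration only.
n_layers, n_kv_heads, head_dim, ctx = 48, 8, 128, 36_000   # hypothetical
bytes_per_elem = {"f16": 2.0, "q8_0": 1.06, "q4_0": 0.56}  # approx GGML cell sizes

for cache_type, b in bytes_per_elem.items():
    size_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * b / 1e9
    print(f"{cache_type}: ~{size_gb:.1f} GB of KV cache at {ctx} tokens")
```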
left curve dev@leftcurvedev_·
🥊 Qwen3.6 35B A3B vs Claude Sonnet 4.5
Making them fight on the same prompt
🐋 "Whale Song" challenge
HTML canvas, no libraries
Holy shit
Poslední skaut™@Posledniskaut·
Ľuboš Blaha, my favorite stand-up comedian
Ettore Di Giacinto@mudler_it·
APEX quantizations of more models ongoing! Meanwhile, playing with Qwen 3.5… the impact of APEX vs Unsloth Dynamic quant on quality is clearly visible IMO, at least in some areas. I know we need more numbers before drawing conclusions, but this isn't about numbers. Just check out a simple prompt: "create an html page of a rotating cube in SVG."
Left: Unsloth Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf (48.7 GB, ~32 tok/s) → flat square (?????)
Right: APEX Qwen3.5-35B-A3B-APEX-I-Quality.gguf (22.8 GB, ~53 tok/s) → ✨
El Deffo@eldeffo·
@bnjmn_marie even with -np X and perhaps several server copies at the same time? maybe it would be nice to have a proper comparison of the engines too.
Benjamin Marie@bnjmn_marie·
@eldeffo vLLM is much faster at scale when running hundreds of concurrent queries
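(One way to sanity-check that claim is to hit each engine through its OpenAI-compatible HTTP endpoint at increasing concurrency; a minimal sketch, with the URL, model name, and prompt as placeholders:)

```python
import asyncio, time
import httpx  # pip install httpx

URL = "http://localhost:8000/v1/completions"  # llama-server or vLLM, both expose this API
PAYLOAD = {"model": "placeholder-model", "prompt": "Hello", "max_tokens": 64}

async def one_request(client):
    r = await client.post(URL, json=PAYLOAD, timeout=300)
    r.raise_for_status()

async def bench(concurrency, total=200):
    async with httpx.AsyncClient() as client:
        sem = asyncio.Semaphore(concurrency)
        async def worker():
            async with sem:
                await one_request(client)
        t0 = time.perf_counter()
        await asyncio.gather(*(worker() for _ in range(total)))
        print(f"concurrency={concurrency}: {total / (time.perf_counter() - t0):.1f} req/s")

for c in (1, 8, 64, 256):
    asyncio.run(bench(c))
```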
Benjamin Marie@bnjmn_marie·
List of quantized Gemma 4 31B I'm evaluating:
- Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB)
- cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB)
- RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB)
- nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB)
- RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB)
→ yes, NVIDIA's NVFP4 checkpoint is as large as an FP8 checkpoint. This is what happens when you don't quantize the attention layers of a dense model.
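(The size gap is mostly arithmetic: weights left in 16-bit quickly dominate the file. A toy estimate; the 30/70 attention/MLP split below is an assumption, not Gemma 4's actual layout:)

```python
# Toy checkpoint-size estimate for a ~31B-parameter dense model.
# The attention share is an assumed figure for illustration only.
total_params = 31e9
attn_frac = 0.30  # assumed fraction of parameters in attention layers

fp8_all   = total_params * 1.0 / 1e9   # everything at 1 byte/param
nvfp4_all = total_params * 0.5 / 1e9   # everything at ~0.5 byte/param
nvfp4_bf16_attn = (total_params * attn_frac * 2.0          # attention kept in bf16
                   + total_params * (1 - attn_frac) * 0.5) / 1e9

print(f"all FP8:                 ~{fp8_all:.1f} GB")
print(f"all NVFP4:               ~{nvfp4_all:.1f} GB")
print(f"NVFP4 w/ bf16 attention: ~{nvfp4_bf16_attn:.1f} GB")
```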
El Deffo@eldeffo·
@bnjmn_marie generally the most interesting question is what can you fit into 11GB, 15GB, 23GB, 31GB... past that, it's just macs and rtx pros, and those can run almost anything anyway.
El Deffo@eldeffo·
@bnjmn_marie how so? llama.cpp with quants is consistently faster than vLLM, at least every time I tried. also, maybe the test battery could be reduced: small models - Q4_K_M, maybe IQ4_NL & IQ3_XXS, + some smarter Q2s on 200B+ models? those are probably the only ones that really need to be tested
El Deffo@eldeffo·
@LenSeaside @0xSero you can fit 27B UD-IQ3_XXS into 12GB, but only if you connect the display to the motherboard.
Len Seaside@LenSeaside·
@0xSero I would really appreciate a 12GB level for all the 3060 owners please. Are we better off with 9B variants or trying the 2.5 bit Unsloth 27B version? Or the A3B?
0xSero@0xSero·
Best models to run on your hardware level
I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better:
Multimodal
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen) huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents) huggingface.co/nvidia/Nemotro…
- Mine hehe huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series
El Deffo@eldeffo·
@LenSeaside @stevibe 27B UD-IQ3_XXS [-ngl 65 + 36K Q4 kv cache] gets 1100-1200 pp, 36-37 t/s, but only if you connect the display to the motherboard/CPU's iGPU; that gets you 1-3 GB of VRAM back.
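(Roughly how that fits into 12 GB; every figure below is an assumed ballpark, not a measurement:)

```python
# Back-of-the-envelope VRAM budget for a ~27B IQ3_XXS model on a 12 GB card.
gpu_vram_gb      = 12.0
model_weights_gb = 9.3   # assumed IQ3_XXS file size for a ~27B dense model
kv_cache_gb      = 1.7   # assumed 36K-token KV cache quantized to Q4
compute_buf_gb   = 0.8   # assumed scratch buffers / CUDA context overhead
desktop_gb       = 0.0   # add ~1-3 if the desktop still renders on this GPU

used = model_weights_gb + kv_cache_gb + compute_buf_gb + desktop_gb
print(f"~{used:.1f} GB of {gpu_vram_gb} GB used -> fits: {used <= gpu_vram_gb}")
```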
Len Seaside@LenSeaside·
@stevibe Can you include 9B in your analyses please? I only have a 12GB GPU. I would be very interested to know if it's much worse or not that bad etc. and what I can do to help it along in terms of where it fails.
stevibe@stevibe·
"122B has to be smarter than 27B" I showed 4 UI components to three Qwen3.5 models and asked them to recreate them from a screenshot alone: - 27B (dense) - 35B-A3B (MoE) - 122B-A10B (MoE) Same screenshot. Same prompt. Same task. Which one do you think nailed it?
El Deffo@eldeffo·
@0xSero if llama.cpp had an API to move layers in and out of VRAM, with this info the performance gains could be quite substantial
0xSero@0xSero·
I made a dataset from every AI chat and session I ever made, and passed it to Qwen3.5-35B
- 7.6% of the model handled 50% of my requests
- 27.5% handled over 80% of my requests
That means I can technically get any model down to 7.6% (3.8% in fp8)
someone give me 1 million pls
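(A sketch of how a figure like "7.6% of the model handled 50% of my requests" could be derived from per-module activation counts; the counts here are made up:)

```python
import numpy as np

# Hypothetical per-module activation counts collected while replaying the chat
# dataset (e.g. how often each expert/block was routed to). Made-up data.
rng = np.random.default_rng(0)
counts = rng.pareto(1.5, size=1000)      # heavy-tailed usage across 1000 modules

order = np.argsort(counts)[::-1]         # most-used modules first
cum = np.cumsum(counts[order]) / counts.sum()

for target in (0.5, 0.8):
    n_needed = int(np.searchsorted(cum, target)) + 1
    print(f"{n_needed / len(counts):.1%} of modules cover {target:.0%} of activations")
```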
El Deffo@eldeffo·
@Ben68638515 @jonasgahrstore @bundeskanzler good buddy, Art. 69(a) clearly states that insane people shouldn't post on the interwebz without professional supervision, & your posts are in clear violation. please rectify your misconduct immediately. also - check yourself before you wreck yourself. it's not looking good.
Hans-Benjamin Braun@Ben68638515·
@eldeffo @jonasgahrstore @bundeskanzler You are free to present scientific arguments against every single one of the 10 pieces of geophysical evidence. Until you have done that, your post represents intentional misinformation violating Art. 70(c) of the ICC Rome Statute.
Jonas Gahr Støre@jonasgahrstore·
A strong partnership between Norway and Germany will be even stronger. On energy, industry, defence, climate, space and support to Ukraine. Thank you @bundeskanzler Friedrich Merz for a substantive meeting in Berlin. It all amounts to mutual European security. (Photo: Uwe Koch)
Hans-Benjamin Braun@Ben68638515·
Did you in your conversation with BK Merz also address the fact that it was your very own, Norwegian tax funded seismic agency NORSAR (joint venture with US Los Alamos Nuke Lab) which deliberately covered up the nuclear nature of the destruction of Nordstream: Indeed, Nordstream was nuked (sic!) as part of the US/NATO extortion racket: US LNG export capacity increased from ~0 in '16 to that of Nordstream 1+2 in '20 (within few percent) before Nordstream was destroyed by a Mini-Nuke under US/NATO auspices exactly on Donetsk Referendum Day, serving as covert shock wave attack towards Kaliningrad as evidenced by seismic measurements. This was indeed a US/NATO masterstroke: The covert nuclear nature of this attack subjugated DE and DK unconditionally to US/NATO orders, and coerced SW & FI into NATO. A summary of my evidence that proved the nuclear (!) destruction of Nordstream 1+2 beyond the shadow of a doubt was presented to the UN Security Council on Sept 26 2023 (SC/15422). Instead of following up on my report, responsibility was offloaded to authorities of Sweden, Denmark and Germany with the former two promptly aborting their investigations shortly afterwards, while the matter was - to this very day - intentionally buried by DE. x.com/Ben68638515/st…
El Deffo@eldeffo·
@Princip_on @RALee85 buddy, even the glorious soviet union lost like half of the many wars it started. and russia is no soviet union.
Gavrilo Princip@Princip_on·
@RALee85 Even if it's 10x worse, the Russians will prevail over Ukraine. That's the goal and nothing will stop them. Looking back at russian history, even the most deranged russophobes can't picture a scenario where the SMO can be turned around in Ukraine's favor.
Rob Lee@RALee85·
“Guided by President Vladimir Putin's crack team of economic officials, Russia has reported better average growth than the euro zone over the past four years despite being hit by some 24,000 Western sanctions. But high interest rates, higher taxes, rising prices and a $20-per-barrel discount for Russian oil are taking their toll - even in Moscow, a vast urban area of 22 million people that has been largely insulated from the worst impact of Europe's deadliest war since World War Two. "To let" signs are prominent in retail spaces across the capital. Sales of new light commercial vehicles and trucks, a good indicator for the health of the retail and construction industry, fell by 38% to 147,000 units in 2025 and have continued to fall in the first weeks of 2026, Autostat said. Data from Sberbank, which as Russia's biggest bank sees the ripples of expenditure across the economy, showed that the fall in the number of catering outlets in January was the biggest since 2021 and that restaurant spending hit the lowest in three years in November-early December 2025.” reuters.com/business/russi…
Tomas Kouba@tomaskouba·
@eldeffo @Posledniskaut Except that you have the future tense wrong. Some of us saw it clearly two years ago already. Only the speed really came as a surprise.
Poslední skaut™@Posledniskaut·
A big thank-you to the Slovaks. You could have made us eat those two years of our sneering now, but you didn't. On the contrary, you're showing enormous sympathy. Thanks once again.
El Deffo@eldeffo·
@Posledniskaut and besides that, we're tired enough of our own assholes; there's no energy left for yours. enjoy it. it's going to be a ride.
El Deffo@eldeffo·
@Posledniskaut don't worry, those 2 years of Czech assholery are safely stored in memory. and we long ago got used to Czechs completely losing track of the difference between reality and their ridiculous notions of their own superiority. that you act like big shots while the toilet is out in the hallway is nothing new.
El Deffo@eldeffo·
@BarronTNews_ didn't stun anyone, no one in the EU gives a damn about this clown. also, he will 💯% end up in jail for all the corruption after the next election.
ⁿᵉʷˢ Barron Trump 🇺🇸
🚨 HOLY CRAP. Slovakia PM Robert Fico just STUNNED the EU after speaking with Donald Trump and Marco Rubio about Greenland. “Trump is clearly pursuing the nation-state interests of the United States. If the EU acted the same way, we would be in a completely different position.” BOOM. 🔥 And then he went for the jugular. “World leaders do not take the EU fully seriously. That’s because of our nonsensical climate targets and our suicidal migration policy.” That’s the truth they’re terrified to say out loud. America First works. National interest works. Strength works. This guy is BASED. 💯💯
Robin Ebers | AI Coach for Founders
just tried running GLM 4.7 Flash locally in Open Code
terrible experience
I've got 96 GB memory, the model ran so slowly it was unbearable
anyone saying open source models running on your computer are the future, they're either lying to you or insanely delusional lol