Eduardo Gonzalez

3.8K posts

Eduardo Gonzalez

@wm_eddie

Founder of @xpressai, maker of AI infrastructure tools. Co-Author of the Japanese book “Learning DL by Implementing Applications” Also on sigmoid.

Himeji-shi, Hyogo Katılım Aralık 2007

907 Takip Edilen778 Takipçiler

Eduardo Gonzalez@wm_eddie·7 May

@ktosopl My son got into Starfox 64 recently. It really is a masterpiece of a game. Almost 30 years later and it is still a lot of fun and still looks great.

English

Konrad ‘ktoso’ Malawski 🐟🏴‍☠️🇺🇦@ktosopl·7 May

New starfox looks absolutely great 😊 should be a fun remake, I’ve played that game to death way back then.

English

332

Eduardo Gonzalez retweetledi

Artificial Analysis@ArtificialAnlys·30 Nis

Alibaba's Qwen3.6 27B is the new open weights leader under 150B parameters scoring 46 on the Artificial Analysis Intelligence Index, but uses ~3.7x the output tokens and costs ~21x more than Gemma 4 31B (39) to run the full Intelligence Index @Alibaba_Qwen has released two open weights models in the Qwen3.6 family: Qwen3.6 27B (Dense, 46 on the Intelligence Index) and Qwen3.6 35B A3B (MoE, 43). The MoE variant has 36B total parameters but only activates 3B per forward pass. Both are Apache 2.0 licensed, support 262K context, include native multimodal input, and use the unified thinking/non-thinking hybrid architecture. Unlike Qwen3.5, Alibaba has not released larger Qwen3.6 models as open weights - Qwen3.6 Plus and Qwen3.6 Max Preview remain proprietary, so the Qwen3.6 open weights family is currently all under 50B models. All scores below are for reasoning mode. The Intelligence Index is our synthesis metric incorporating 10 evaluations covering agentic tasks, coding, and scientific reasoning. Key takeaways: ➤ Qwen3.6 27B is the most intelligent open weights model under 150B parameters. At 46 on the Intelligence Index, Qwen3.6 27B is ahead of Qwen3.6 35B A3B (43), Qwen3.5 27B (42), and Gemma 4 31B (39). It is also ahead of larger open weights models including NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), Qwen3.5 122B A10B (42) and gpt-oss-120b (high, 33). In native BF16 precision, the 27B takes ~56GB to store the weights, fitting on a single H100, and in 4-bit quantization the weights fit on consumer hardware with 16GB+ of RAM ➤ Qwen3.6 35B A3B is the most intelligent open weights model with ~3B active parameters, 6 points ahead of Qwen3.5 35B A3B (37) and 13 points ahead of GLM-4.7-Flash (30). Other ~3B active peers include Gemma 4 26B A4B (31), Qwen3 Coder Next (80B total, 28), and NVIDIA Nemotron Cascade 2 30B A3B (28) ➤ AA-Omniscience improvement is driven entirely by abstention rather than accuracy. Qwen3.6 27B's hallucination rate falls from 80% to 48% versus Qwen3.5 27B, while accuracy is roughly flat - consistent with our finding that AA-Omniscience accuracy typically correlates with total parameter count and Qwen3.6 27B retains the same 27B parameter count as its predecessor. The 35B A3B shows the same pattern whereby hallucination drops from 84% to 50% while accuracy remains equivalent ➤ Token usage is up across both models versus Qwen3.5 and significantly higher than Gemma 4 31B. Qwen3.6 27B used ~144M output tokens to run the Intelligence Index (~1.5x Qwen3.5 27B at 98M, ~3.7x Gemma 4 31B at 39M). Qwen3.6 35B A3B used ~143M (~1.4x Qwen3.5 35B A3B at 100M, ~3.7x Gemma 4 31B) ➤ The 27B got materially more expensive while the 35B A3B is roughly flat versus predecessor. Per-token pricing on Alibaba Cloud moved differently, with the 27B going from $0.30/$2.40 to $0.60/$3.60 while the 35B A3B (Reasoning) remains nearly flat at $0.248/$1.485 (vs $0.25/$2.00 for Qwen3.5 35B A3B). Qwen3.6 27B costs ~$659 to run the Intelligence Index, ~2.2x Qwen3.5 27B (~$299) and ~21x Gemma 4 31B (~$31 at median third-party pricing of $0.14/$0.40 per 1M input/output tokens). Qwen3.6 35B A3B costs ~$280, roughly tied with Qwen3.5 35B A3B (~$302) and ~9x Gemma 4 31B ➤ Qwen3.6 27B is competitive with leading models on agentic real-world work tasks despite its size. At 1414 Elo on GDPval-AA, Qwen3.6 27B is ahead of recent open weights peers Qwen3.6 35B A3B (1297), Qwen3.5 27B (1157) and Gemma 4 31B (1115), but trails larger open weights leaders including DeepSeek V4 Pro (Reasoning, Max Effort, 1554) and GLM-5.1 (Reasoning, 1535). It matches DeepSeek V4 Flash (Reasoning, High Effort, 1414) at 284B total parameters, and sits roughly in line with GPT-5.4 mini (xhigh, 1436) and Muse Spark (1421). ➤ Non-reasoning variants remain equivalent versus Qwen3.5. Qwen3.6 27B (Non-reasoning, 37) is effectively tied with Qwen3.5 27B (Non-reasoning, 37); Qwen3.6 35B A3B (Non-reasoning, 32) is equivalent to Qwen3.5 35B A3B (Non-reasoning, 31). The Qwen3.6 generation gains are concentrated in reasoning mode Other information: ➤ Context window: 262K tokens (equivalent to Qwen3.5) ➤ License: Apache 2.0 ➤ Multimodality: Native vision input (text and image), text output ➤ API pricing (Alibaba Cloud): Qwen3.6 27B: $0.60/$3.60, Qwen3.6 35B A3B (Reasoning): $0.248/$1.485 ➤ Availability: Available on Alibaba Cloud first-party API. Qwen3.6 35B A3B is available on several third-party APIs such as @DeepInfra, @parasail_io, @clarifai and @novita_labs

English

596

55.7K

Eduardo Gonzalez retweetledi

MLT & AI Communities@__MLT__·17 Nis

Anatomy of a CLAW is up! Watch it here youtube.com/watch?v=X5380O…

YouTube

English

933

Eduardo Gonzalez@wm_eddie·16 Mar

CEOs with a stack of AI agents = Dark Helmet playing with his dolls in Spaceballs. youtu.be/LMxTFqPET5I?si…

YouTube

English

128

Eduardo Gonzalez retweetledi

Kilian Lieret@KLieret·9 Şub

Everyone talks about AGI, but you change the formatting of toolcall outputs a bit and SWE-bench performance drops by 5%

English

265

22.4K

Eduardo Gonzalez@wm_eddie·29 Oca

Damn, the thing doesn't even understand uv anymore.

English

Eduardo Gonzalez@wm_eddie·29 Oca

Well opus-4-5 is completely lobotomized now... ⏺ The module-level import sys at the top of files might be causing issues with import ordering. Let me remove them:

English

Eduardo Gonzalez@wm_eddie·13 Oca

@alexgraveley I was thinking the same thing. But the main problem I see is discovery. How will the agent know how to use the different files properly…

English

Alex Graveley@alexgraveley·12 Oca

Agents in 2026: Plan9 all the things!

English

2.3K

Eduardo Gonzalez retweetledi

Colors of Web3 and Entrepreneurship@ColorsofWeb3pod·10 Oca

8/8 We unpacked all of this with @wm_eddie (co-founder/CEO of @xpressai) on Colors of Web3 & Entrepreneurship. Watch/listen: youtu.be/gDPgmOsPefY

YouTube

English

Eduardo Gonzalez@wm_eddie·9 Oca

Anthropic is playing a dumb game blocking other clients. If Claude Code worked on my servers I’d use it. But it just crashes on boot. OpenCode just works. And my own harness is way better for long term memories…

English

123

Eduardo Gonzalez@wm_eddie·28 Ara

Shogi is chess with necromancy.

English

Eduardo Gonzalez@wm_eddie·5 Ara

@HeroeDeUnaMano Assuming it’s real.

English

134

OneHandedHero@HeroeDeUnaMano·5 Ara

@wm_eddie get me this! Only in Japan!

Genki✨@Genki_JPN

Samus Aran arm canon cushion that you can also use as a pillow!

English

Eduardo Gonzalez@wm_eddie·5 Ara

@HeroeDeUnaMano I’m on it.

English

110

Eduardo Gonzalez@wm_eddie·1 Ara

This is the most interesting part of the DeepseekV3.2 paper IMHO. Very close to something I've been meaning to try for a long time.

English

239

Eduardo Gonzalez@wm_eddie·26 Kas

@_m0se_ ./build/bin/llama-server -m models/qwen3-vl-24b-reap-Q4_K_M.gguf --mmproj models/qwen3-vl-24b-mmproj-bf16.gguf --cpu-moe -c 32768 を利用すればギリギリ８gbの2070Superで使えます。良いですねこれ。

日本語

130

OpenMOSE@_m0se_·26 Kas

Qwen3-VL-REAP-24B-A3B-GGUF GGUFバージョンも作りました。 imatrix版です。 cpu-moeをうまく使えば、8GB GPUにのると思います huggingface.co/OpenMOSE/Qwen3…

日本語

785

Eduardo Gonzalez retweetledi

ぬこぬこ / NUKO 🇯🇵@nukonuko·6 Kas

GMK at 渋谷ベルゴ maps.app.goo.gl/qM4wMBuoPoy29Q…

日本語

1.8K

Eduardo Gonzalez@wm_eddie·5 Kas

@jzawodn I also ran into this. The latest version works fine. Which is interesting. Wonder what happened there.

English

Jeremy Zawodny@jzawodn·5 Kas

Unexpected.

English

204

Eduardo Gonzalez@wm_eddie·31 Eki

@abacaj This is one of the reasons I use SambaNova. They don’t quantize the weights. The difference is huge. They are only superficially equivalent. If only SambaNova supported more models.

English

1.1K

anton@abacaj·30 Eki

Run gpt-oss-20b on openrouter get 32/100 on benchmark. Run gpt-oss-20b on vllm with h200s get 83/100 on benchmark. What are these providers doing? Deepinfra terrible results

English

573

65.5K

Eduardo Gonzalez@wm_eddie·27 Eki

Hmm... I think it may be very important that we do not train models on Asimov's work.

English

113

Eduardo Gonzalez@wm_eddie·24 Eki

@YouKnowEno I got the silicone sport band for this very reason. Has enough holes that even if it bothers me I can move it back far enough to not touch the MacBook.

English

Eno@YouKnowEno·24 Eki

how people work on their macbooks with a watch on? the sounds and feeling of metal scraping metal drives me nuts.

English

1.3K

Keşfet

@ktosopl @Alibaba_Qwen @DeepInfra @parasail_io @clarifai @novita_labs @alexgraveley @xpressai