yuxinlu1

Gemma 4 12B Coder is here and it's a game changer for local code generation. This GGUF model packs Google's latest Gemma 4 architecture into a compact 12B size, perfect for running on consumer hardware. It's fine-tuned for chain-of-thought reasoning, making it ideal for developers who want fast, private coding assistance without the cloud.

This is the most hilarious thing I saw and did today: I ran gemma-4-12B-coder-fable5-composer2.5-v1-GGUF locally with 8 GB of VRAM at 20+ tok/sec.

Anthropic's Claude Fable 5 launched June 9. By June 12 it was banned. I can't access it. You can't either.

But here's the twist: I'm running a model trained on its chain of thought at 20 tok/s on my RTX 4060 8GB. Locally. Offline. No cloud. No export control.

Enter: Gemma4-12B-Coder GGUF (Q4_K_M)

Base: Google's gemma-4-12B-it
Fine-tuned on verifiable Python CoT data:
- Primary: Composer 2.5 real reasoning traces (only passing solutions kept)
- Auxiliary: Fable 5 used to redo the hard cases Composer missed

Every training example's reasoning led to code that actually ran. No hallucinated logic.

llama.cpp flags:
-m gemma4-coding-Q4_K_M.gguf -cnv -ngl 44 -c 64000 -v
(Hugging Face model link in comments)

Flag breakdown:
- -ngl 44 → offload 44 layers to the GPU (tune this for your VRAM)
- -c 64000 → 64,000-token (~64K) context window
- -cnv → conversation/chat mode
- -v → verbose output

The irony writes itself. Anthropic spent weeks telling the world Fable 5 (Mythos) is too powerful to release. Then released it. Then got banned from serving it, even to their own researchers.

Meanwhile, a Gemma 4 12B fine-tune trained on Fable 5's reasoning runs fully offline on my mid-range consumer GPU. No API. No cloud. Just me and llama.cpp.

This is why local AI matters. Check out the model link in the comments. How's your experience been with this model?
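If you want to tweak those flags (for example, lowering -ngl on a card with less VRAM), a tiny Python wrapper around the same command can help. This is only a sketch under assumptions: the binary name `llama-cli` (what recent llama.cpp builds call the chat CLI) and the helper `build_llama_cli_args` are mine, not the poster's; the model file name is the one given in the post. In llama.cpp, -ngl is short for --n-gpu-layers, -c for --ctx-size, -cnv for --conversation and -v for --verbose.

```python
"""Build (and optionally launch) the llama.cpp command from the post.

Assumptions: the llama.cpp CLI binary is named `llama-cli` and is on PATH;
`build_llama_cli_args` is a hypothetical helper, not part of llama.cpp.
"""
import shlex
import shutil
import subprocess
import sys

MODEL_PATH = "gemma4-coding-Q4_K_M.gguf"  # file name as given in the post


def build_llama_cli_args(model_path: str, gpu_layers: int = 44,
                         ctx_size: int = 64000, verbose: bool = True,
                         binary: str = "llama-cli") -> list[str]:
    """Return the argv list for a chat-mode llama.cpp run."""
    args = [
        binary,
        "-m", model_path,         # model file to load
        "-cnv",                   # conversation/chat mode
        "-ngl", str(gpu_layers),  # layers offloaded to the GPU; lower it if VRAM runs out
        "-c", str(ctx_size),      # context window in tokens (64000 is ~64K)
    ]
    if verbose:
        args.append("-v")         # verbose logging
    return args


if __name__ == "__main__":
    argv = build_llama_cli_args(MODEL_PATH)
    print(shlex.join(argv))       # always show the exact command first
    if "--run" in sys.argv[1:]:   # only launch llama.cpp when asked explicitly
        if shutil.which(argv[0]) is None:
            sys.exit(f"{argv[0]} not found on PATH; build llama.cpp first")
        subprocess.run(argv, check=True)
```

Running the script just prints the command; pass `--run` to actually launch it. If the model doesn't fit, lowering `gpu_layers` keeps more layers on the CPU (slower, but it loads), and a smaller `ctx_size` shrinks the KV cache, which grows with the context length.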
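The post says only Composer 2.5 traces whose solutions passed were kept, so every example's reasoning ends in code that actually runs. A filtering step like that can be sketched in Python by executing each candidate together with its tests and keeping only the clean exits. This is not the poster's pipeline (the post doesn't show one): the `Example` record, its fields and the helpers `passes` / `filter_verified` are hypothetical, and "passing" is assumed to mean the code plus its asserts exit with status 0 in a fresh interpreter within a timeout.

```python
"""Keep only training examples whose code passes its tests.

A sketch of execution-based filtering, not the poster's actual pipeline.
The `Example` fields (`reasoning`, `code`, `tests`) are hypothetical.
"""
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Example:
    reasoning: str  # chain-of-thought trace (hypothetical field)
    code: str       # candidate solution the trace ends in
    tests: str      # assert statements that must pass


def passes(example: Example, timeout: float = 10.0) -> bool:
    """Run code + tests in a fresh Python process; True only on exit code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(example.code + "\n\n" + example.tests + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # hangs and very slow code count as failures
        return result.returncode == 0


def filter_verified(examples: list[Example]) -> list[Example]:
    """Drop every example whose code does not run and pass its tests."""
    return [ex for ex in examples if passes(ex)]


if __name__ == "__main__":
    data = [
        Example("add two numbers", "def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
        Example("buggy add", "def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
    ]
    kept = filter_verified(data)
    print(f"kept {len(kept)} of {len(data)}")  # kept 1 of 2
```

A separate process only isolates crashes and hangs; it is not a security sandbox, so model-generated code at scale would need a container or similar isolation.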

Help us shape the next GLM release: what should we prioritize most?

Just got my brand-new Mac signed by Sam Altman

We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

SubQ, a new type of AI model, claims it is 50x faster and 20x cheaper than Opus 4.7 and GPT 5.5. They also say it performs INSANELY WELL on benchmarks and has a 12M context window. This would be earth-shattering, if true: Anthropic's and OpenAI's valuations would go to zero 😱

Grok