
Studio Zamudio One
@StudioZamudio
Studio Zamudio One -- Forging unforgettable and innovative game experiences and building the next generation.

@loftwah I’ve used AI; on its own, it cannot produce anything “acceptable” in terms of a production-ready application.

Spent the past two days learning and testing the new @googlegemma DiffusionGemma 26B on the WIP llama.cpp diffusion branch (PR #24427) on my 3060. Unlike traditional LLMs that generate one token at a time, DiffusionGemma generates a whole block of tokens and repeatedly refines it over multiple denoising steps.

Setup:
- RTX 3060 12GB
- diffusiongemma-26B-A4B-it-Q4_K_M.gguf (@UnslothAI quant)
- `-ngl 15` (anything higher OOMs)
- Peak VRAM: ~10.4GB

Prompt: "Write a Python quicksort implementation"

512-token generation (2 diffusion blocks):
12 steps -> 12.1s (42.4 canvas tok/s)
18 steps -> 17.9s (28.6 canvas tok/s)
24 steps -> 24.2s (21.2 canvas tok/s)
32 steps -> 32.0s (16.0 canvas tok/s)
48 steps -> 37.2s (13.8 canvas tok/s)

A few observations:
- Lower step counts were noticeably worse in quality
- Denoising cost stayed almost perfectly flat at ~0.5s/step
- Peak VRAM stayed around 10.4GB across all runs
- It's not that bad, even with most of the model offloaded to the CPU
- Diminishing returns started appearing around the 24–32 step range

Still early days, but it's pretty wild seeing a 26B diffusion language model running locally in llama.cpp.
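The "generate a whole block, then refine it over denoising steps" idea can be illustrated with a toy masked-diffusion decoding loop. This is a minimal sketch of one generic technique: start fully masked, then commit the most confident positions on each step. It is not DiffusionGemma's or llama.cpp's actual sampler. `denoise_block`, `toy_predict` and `TARGET` are hypothetical names, and the "model" is a stand-in that always knows the answer. Committed tokens are never re-masked here, although some samplers do allow that.

```python
from typing import Callable, List, Optional, Tuple

# A position that has not been decided yet.
MASK: Optional[str] = None

# Hypothetical model interface: given the current canvas, return a
# (token, confidence) guess for every position in the block.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def denoise_block(block_len: int, steps: int,
                  predict: Predictor) -> Tuple[List[Optional[str]], int]:
    """Fill a fully masked block over at most `steps` refinement passes.

    Returns the final canvas and the number of model passes used.
    """
    canvas: List[Optional[str]] = [MASK] * block_len
    passes = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(canvas) if tok is MASK]
        if not masked:
            break
        guesses = predict(canvas)  # one pass over the whole block
        passes += 1
        # Spread the remaining masked positions evenly over the steps left,
        # so the last step always commits everything still masked.
        k = -(-len(masked) // (steps - step))  # ceiling division
        # Commit the k most confident masked positions; the rest stay masked.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in best:
            canvas[i] = guesses[i][0]
    return canvas, passes


# Toy "model" for demonstration only: it always knows the target tokens and
# is more confident about earlier positions, so those are committed first.
TARGET = "def quicksort ( xs ) : return xs".split()


def toy_predict(canvas: List[Optional[str]]) -> List[Tuple[str, float]]:
    n = len(canvas)
    return [(TARGET[i % len(TARGET)], 1.0 - i / n) for i in range(n)]


if __name__ == "__main__":
    for steps in (2, 4, 8):
        out, passes = denoise_block(len(TARGET), steps, toy_predict)
        print(steps, passes, " ".join(tok for tok in out if tok is not None))
```

Each step is one pass over the full block, so in this sketch wall-clock time grows roughly linearly with the step count. That fits a roughly constant per-step cost. With fewer steps, more tokens are committed per pass on less settled context, which is one plausible reason low step counts hurt quality.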

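The throughput and per-step figures can be re-derived from the raw timings. This minimal Python sketch makes two readings of the post that are assumptions, not documented definitions. First, "canvas tok/s" is taken to mean the 512-token canvas divided by wall-clock time. Second, the step count is taken to apply per diffusion block, so each run performs steps × 2 denoising passes.

```python
# Timings copied from the post: (denoising steps, wall-clock seconds).
RUNS = [(12, 12.1), (18, 17.9), (24, 24.2), (32, 32.0), (48, 37.2)]

CANVAS_TOKENS = 512  # total tokens generated per run
BLOCKS = 2           # diffusion blocks per run


def canvas_tok_per_s(seconds: float) -> float:
    """Assumed definition: whole canvas size divided by wall-clock time."""
    return CANVAS_TOKENS / seconds


def seconds_per_step(steps: int, seconds: float) -> float:
    """Assumes `steps` is per block, so total denoising passes = steps * BLOCKS."""
    return seconds / (steps * BLOCKS)


if __name__ == "__main__":
    for steps, secs in RUNS:
        print(f"{steps:>2} steps: {canvas_tok_per_s(secs):5.1f} tok/s, "
              f"{seconds_per_step(steps, secs):.3f} s/step")
```

Under these assumptions, the tok/s values match the reported ones to within rounding. The 12 to 32 step runs all work out to about 0.5 s per pass. The 48-step run comes out near 0.39 s per pass, so the flat-cost observation holds cleanly for the shorter runs under this reading.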

Update on Fable 5:
> Anthropic staff have flown to Washington
> Ongoing talks are happening with the Trump administration
> Both sides are eager to resolve the dispute

Something will probably come out tomorrow. This is also likely why we haven't heard from @AnthropicAI yet, despite the "24hrs" promise.