Crown
28.4K posts

Crown
@ciruai
Local LLM Min Maxer. AI is about the workflow, not the model. AMD Local LLM Group: https://t.co/0wQDCDXlzO









@barackomaba @sudoingX @JozsefSzalma This is correct :)




Kimi-K2.7-Code is the new open-source SoTA for coding & agentic workflows



Excited to launch Luce KVFlash. We've been working harder than ever with @davideciffa to bring better DX for local AI. Today, long context has a second memory bill nobody budgets for: the KV cache. On Qwen3.6-27B at 256K it costs 4.6 GiB of VRAM and drags decode down to 13 tok/s, because every new token reads the whole thing. KVFlash keeps a small pool of KV on the GPU, auto-sized to your VRAM, and pages cold 64-token chunks to host RAM, bit-exact and recallable. Decode holds a flat 38.6 tok/s from 64K to the native 256K on a 3090, 2.9x the full-cache decode speed at 256K, with 72 MiB resident and benchmark accuracy unchanged.
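For readers who want a feel for the paging idea in the post above, here is a minimal toy sketch in Python. It is not KVFlash's implementation: the class name `PagedKVCache`, its methods, and the least-recently-used eviction policy are all assumptions made for illustration, and plain dictionaries stand in for GPU and host memory. Only the 64-token chunk size comes from the post.

```python
from collections import OrderedDict

CHUNK_TOKENS = 64  # chunk size quoted in the post


class PagedKVCache:
    """Hypothetical toy model: a fixed-size 'GPU' pool of KV chunks that
    spills the least recently used chunk to 'host' storage when full.
    LRU is an assumption; the post does not state the eviction policy."""

    def __init__(self, gpu_chunks):
        if gpu_chunks < 1:
            raise ValueError("the GPU pool needs room for at least one chunk")
        self.gpu_chunks = gpu_chunks
        self.gpu = OrderedDict()  # chunk_id -> bytes, least recently used first
        self.host = {}            # chunk_id -> bytes, paged-out chunks

    def write(self, chunk_id, data):
        # A freshly written chunk is hot, so it lives in the GPU pool.
        self.host.pop(chunk_id, None)
        self.gpu[chunk_id] = data
        self.gpu.move_to_end(chunk_id)
        self._evict()

    def read(self, chunk_id):
        # Recall a cold chunk from host storage if it was paged out.
        # Raises KeyError if the chunk was never written.
        if chunk_id not in self.gpu:
            self.gpu[chunk_id] = self.host.pop(chunk_id)
        self.gpu.move_to_end(chunk_id)
        data = self.gpu[chunk_id]
        self._evict()
        return data

    def _evict(self):
        # Move the coldest chunks to host until the pool fits its budget.
        # Bytes are moved unchanged, so a recalled chunk is bit-exact.
        while len(self.gpu) > self.gpu_chunks:
            cold_id, cold_data = self.gpu.popitem(last=False)
            self.host[cold_id] = cold_data


if __name__ == "__main__":
    cache = PagedKVCache(gpu_chunks=2)
    for i in range(3):
        cache.write(i, bytes([i]) * CHUNK_TOKENS)
    print("resident:", list(cache.gpu), "paged out:", list(cache.host))
```

The point of the sketch is that the resident pool stays at a fixed size no matter how many chunks are written, which matches the post's claim that VRAM use stays small as context grows. A real system would also have to decide which chunks each decode step needs, which this toy does not model.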

@sudoingX @JozsefSzalma He's wrong. The best way to use it is to set the VRAM limit to 0.5 GB and then set GTT to the full 128 GB. You get fully shared memory with no performance decrease. (I'm only reserving a small amount to keep from OOMing.)



listen up ROCm and Vulkan builders. @FrameworkPuter just shipped me strix halo desktop, 128GB unified, landing on my desk tuesday. everyone keeps asking what actually runs on this thing beyond vendor charts and forum guesses. so i'm going to answer it properly. starting with big MoE models since massive total params on light active is the whole point of 128GB unified. if there's a specific model or quant you want tested on strix halo, reply and it goes in the queue.

Announcement: We're going to ablate this model: prefeitura-rio/Rio-3.5-Open-397B (based on Qwen3.5-397B-A17B). If the ablation succeeds, we will release the BF16 weights. If you're interested, please follow us for first-hand news! huggingface.co/prefeitura-rio…


Ever wanted bootleg raindrop + token usage analytics for droid? Look no further than github.com/ain3sh/droid-s…


Local AI is the future, I agree. I see it the same way streaming (local and cloud) became the future. Family and friends thought I was a wizard in the early 2000s for having a computer hooked to my TV and watching rips. They think I'm a wizard now for having AI rigs at home. Everyone can stream locally or over the cloud with tiny stick devices, and no one thinks you're a wizard anymore; the same will happen with AI. It's fun to have massive systems right now, but the cat's out of the bag: eventually local AI, and AI in general, will be a normal household appliance embedded into devices. We've already won, they just haven't realized it yet.


the one box i was missing just landed anon. this is the @FrameworkPuter desktop with amd's strix halo, ryzen ai max+ 395, 128gb of unified memory, up to 96 of it addressable as vram. amd and framework sent it over for honest testing, no strings attached, and i've been waiting on this one specifically. here's why it matters. i've run local ai on basically everything, a 150 dollar drawer card, a 3090, a 5090, the dgx spark, datacenter h200s. the one gap was always the accessible big memory tier on the amd side, and this fills it. 128gb unified at roughly half the price of the nvidia equivalent, the sovereignty box for people who want to run real models without a datacenter budget. booting it today. and the question i actually want answered is the one nobody answers straight: what does this thing really run? same bar i hold every other card to. amd, nvidia, apple, measured, never vibes. let's find out what it's got.








