Crown 👑 (@ciruai) - ملف تويتر | Zamantika Mersobahis Locabet

تغريدة مثبتة

Crown 👑@ciruai·31 May

@the_jimmy_jones x.com/i/chat/group_j…

QME

1

3

5

1K

Crown 👑@ciruai·1h

@usr_bin_roygbiv I just asked it what a good new local model is and it said qwen 3.5 122b...

GIF

English

0

2

29

Roy@usr_bin_roygbiv·3h

grok is actually expensive as fuck for how awful it is

English

5

0

17

507

Crown 👑@ciruai·1h

@Tilarium @usr_bin_roygbiv Ok, I trust you. They benchmark really well. I need to have time to spend with it in the harness. Even dominated the Hermes bench which is exactly the kind of tools you'd expect would make it good for that. (The 12b)

English

0

16

Alex 🟢 🇮🇱 🇺🇦 🇺🇸@Tilarium·1h

@ciruai @usr_bin_roygbiv I tested this while driving full blown ai assistant (not coding agent). It failed miserably

English

1

0

10

Crown 👑@ciruai·1d

I'm seeing the light. The Gemma models are actually extremely good. The 12b might be even better at Hermes than qwen3.6 35b. My AMD Strix Halo gets 115 TPS+ with 26b QAT MTP New quality tests run : lab.citu.ai #hermes-agent" target="_blank" rel="nofollow noopener">llm.ciru.ai/#hermes-agent @usr_bin_roygbiv

English

25

7

169

35.5K

Crown 👑@ciruai·2h

@alexellisuk @arunninghacker I'm not sure, but spark is tiny silver box. It can easily run an uncensored 35b

English

1

0

8

Alex Ellis@alexellisuk·3h

@ciruai @arunninghacker Are you sure that's what he means? Something he said doesn't add up: "with an Nvidia card in it" GB10 is a SoC with unified memory..

English

1

0

11

Volodymyr Styran 🇺🇦@arunninghacker·1d

I ran OpenCode riding an uncensored Qwen3.6 running on a Mini-sized shiny silver box with an NVIDIA card in it for a couple of days, and I assure you: all this ethics/regulations/export control frontier AI drama will be over very, very soon

English

16

17

485

61.7K

Crown 👑@ciruai·2h

@itsjustmarky @sudoingX You can reserve as little as .5 gb for vram in bios and leave the rest for unified.

English

0

29

sudo rm -rf@itsjustmarky·3h

@ciruai @sudoingX I am able to use over 110g with mine just a skill issue.

English

1

0

1

48

Crown 👑@ciruai·7h

Someone please ping @sudoingX and tell him to stop saying that AMD Strix Halo can only use 96GB as VRAM. It is a real shared memory pool. Just has to be configured properly. While you're at it show him my 2000 rows of benchmarks since he keeps asking "does anyone have any AMD benchmarks" and ignoring the best answers. llm.ciru.ai

Framework@FrameworkPuter

@barackomaba @sudoingX @JozsefSzalma This is correct :)

English

3

0

14

1.3K

Crown 👑@ciruai·3h

@SMNYC1 @arunninghacker It can be really good, what do you use AI for? llm.ciru.ai/crown-citadel-…

English

0

3

bird@SMNYC1·15h

@arunninghacker Mind sharing specs? Curious how good local can be

English

1

0

2

3.1K

Crown 👑@ciruai·3h

@alexellisuk @arunninghacker gb10 , its a spark, igpu

Deutsch

1

0

8

Alex Ellis@alexellisuk·17h

@arunninghacker I also run Qwen 3.6. Curious which box and which Nvidia GPU? Sounded like a Mac Mini, but you didn't say eGPU.

English

0

2

4.4K

Crown 👑@ciruai·6h

@TheAhmadOsman Are you able to make use of the agent swarm or is that only through the platform?

English

0

1

312

Ahmad@TheAhmadOsman·7h

Recent Opensource LLMs releases rankings for me 1st place: Kimi K2.7 Coding 2nd place: GLM 5.2 3rd place: MiniMax M3

Ahmad@TheAhmadOsman

Kimi-K2.7-Code is the new Opensouece SoTA for Coding & Agentic workflows

English

39

22

475

24.5K

Crown 👑@ciruai·6h

@DBillionaer @sudoingX that twitter account must be scheduled messages only.

English

0

1

68

David Aerdrop_BILLIONAER@DBillionaer·7h

@ciruai @sudoingX Weird right?

English

1

0

2

88

Crown 👑@ciruai·7h

@amerukraine @PAguiar_NH @usr_bin_roygbiv You can use tiny routing models. Often the split is obvious like "is this a quick chat reply or a coding task?"

English

0

1

6

Sephiroth@amerukraine·7h

@PAguiar_NH @barackomaba @usr_bin_roygbiv How does it know if it’s a complex task ?

English

1

0

5

Crown 👑@ciruai·10h

Luce is constantly creating cutting edge performance improvements for real people using real AI locally.

Sandro@pupposandro

Excited to launch Luce KVFlash. We've been working harder than ever with @davideciffa to bring better DX for local AI. Today, long context has a second memory bill nobody budgets for: the KV cache. On Qwen3.6-27B at 256K it costs 4.6 GiB of VRAM and drags decode down to 13 tok/s, because every new token reads the whole thing. KVFlash keeps a small pool of KV on the GPU, auto-sized to your VRAM, and pages cold 64-token chunks to host RAM, bit-exact and recallable. decode holds a flat 38.6 tok/s from 64K to the native 256K on a 3090, 2.9x the full cache at 256K, 72 MiB resident and benchmark accuracy unchanged.

English

1

2

10

1.3K

Crown 👑@ciruai·11h

@sudoingX @FrameworkPuter These influencers are so lazy can't get the basics right. @FrameworkPuter next time please send products to someone who actually knows what they are talking about 🤞 3+ times correcting the guy and watching him spread misinformation about Strix Halo. x.com/i/status/20633…

Crown 👑@ciruai

@sudoingX @JozsefSzalma He's wrong. The best way to use it is to set the vram limit to .5gb and then you set gtt to the full 128gb You get fully shared memory with no performance decrease. (I'm only reserving a small amount to keep from oom)

English

0

1

0

166

Sudo su@sudoingX·20h

the one box i was missing just landed anon. this is the @FrameworkPuter desktop with amd's strix halo, ryzen ai max+ 395, 128gb of unified memory, up to 96 of it addressable as vram. amd and framework sent it over for honest testing, no strings attached, and i've been waiting on this one specifically. here's why it matters. i've run local ai on basically everything, a 150 dollar drawer card, a 3090, a 5090, the dgx spark, datacenter h200s. the one gap was always the accessible big memory tier on the amd side, and this fills it. 128gb unified at roughly half the price of the nvidia equivalent, the sovereignty box for people who want to run real models without a datacenter budget. booting it today. and the question i actually want answered is the one nobody answers straight: what does this thing really run? same bar i hold every other card to. amd, nvidia, apple, measured, never vibes. let's find out what it's got.

Sudo su@sudoingX

listen up ROCm and Vulkan builders. @FrameworkPuter just shipped me strix halo desktop, 128GB unified, landing on my desk tuesday. everyone keeps asking what actually runs on this thing beyond vendor charts and forum guesses. so i'm going to answer it properly. starting with big MoE models since massive total params on light active is the whole point of 128GB unified. if there's a specific model or quant you want tested on strix halo, reply and it goes in the queue.

English

15

9

138

21.8K

Crown 👑 أُعيد تغريده

TonoKen3🤖Local-LLM&Robot🏁とのけん3@Tono_Ken3·23h

これはヤバい！ huihui先生がRio-3.5-Open-397Bの無検閲版（しかもBF16版）制作に着手してくれた！このモデルのベンチは見ものですリリースされたら私はNVFP4版の制作をしますみなさん応援よろです📣

huihui.ai@support_huihui

Announcement: We’re going to ablate this model — prefeitura-rio/Rio-3.5-Open-397B (based on Qwen3.5-397B-A17B). If the ablation succeeds, we will release the BF16 weights. If you’re interested, please follow us for first-hand news! huggingface.co/prefeitura-rio…

日本語

2

9

76

9.3K

Crown 👑@ciruai·14h

@ain3sh @0xSero x.com/barackomaba/st…

Crown 👑@ciruai

haha, I mean, I get it, if the most common issue is something like "when calling an endpoint check providers.md for runtime information, you are expected to load models yourself" That's great, positive prompting works best in my experience. Negative prompting is never ending, and most of the time when you're editing page formatting or something the last thing it needs to know is "Do not use pdf.default Do not use PDFParser Do not manually decompress streams" The same way your model magically becomes more coherent when you say something like "you're a senior engineer" is why it can also become worse when you prime the tokens for the things you DONT want. "DONT THINK ABOUT ELEPHANTS"

QME

1

0

1

27

Ainesh Chatterjee@ain3sh·14h

@barackomaba @0xSero ?

QAM

1

0

1

19

0xSero@0xSero·15h

If you picked up Droid over the last 6 months I think setting 2 hours in your week to explore your session history would be enlightening See how many times you address the same problems, then update agents.md to prevent it again You can do this in pi/opencode too

Ainesh Chatterjee@ain3sh

Ever wanted bootleg raindrop + token usage analytics for droid? Look no further than github.com/ain3sh/droid-s… 🫪

English

7

3

66

8K

Crown 👑@ciruai·14h

@bradmillscan on another level!

English

0

118

Brad Mills 🔑⚡️@bradmillscan·14h

My agent has some pretty cool dreams

English

3

0

12

827

Crown 👑@ciruai·14h

haha, I mean, I get it, if the most common issue is something like "when calling an endpoint check providers.md for runtime information, you are expected to load models yourself" That's great, positive prompting works best in my experience. Negative prompting is never ending, and most of the time when you're editing page formatting or something the last thing it needs to know is "Do not use pdf.default Do not use PDFParser Do not manually decompress streams" The same way your model magically becomes more coherent when you say something like "you're a senior engineer" is why it can also become worse when you prime the tokens for the things you DONT want. "DONT THINK ABOUT ELEPHANTS"

English

0

62

0xSero@0xSero·14h

@barackomaba Shut it down lads, you heard the man

English

1

0

1

156

Crown 👑@ciruai·14h

@LottoLabs Thanks Lotto! You've been killing it with the site.

English

0

2

27

Lotto@LottoLabs·15h

@barackomaba Both you guys are great follows!

English

3

0

6

220

Crown 👑@ciruai·15h

This guy deserves way more followers. Extremely bright engineer and tinkerer always on the cutting edge of local AI.

Loktar 🇺🇸@loktar00

Local AI is the future I agree, I see it the same way streaming (local and cloud) became the future. Family and friends thought I was a wizard in the early 2000's for having a computer hooked to my TV and watching rips. They think I'm a wizard now for having AI rigs at home. Everyone can stream locally or over the cloud with tiny stick devices, no one thinks you're a wizard anymore the same will happen with AI. It's fun to have massive systems right now but the cats out of the bag, eventually local AI and just AI in general will be a normal household appliance, and embedded into devices. We've already won they just haven't realized it yet.

English

5

0

32

2.6K

Crown 👑@ciruai·15h

@punchtaylor @sudoingX I've done much of the work, results are here: llm.ciru.ai

English

0

21

Punch Taylor@punchtaylor·20h

@sudoingX no strix halo here — im cuda on a 4090, metal on a mac studio, jetsons for the mesh. but this is exactly the tok/s gap id want to see laid out. post the rocm vs vulkan numbers and ill compare against the cuda/metal side.

English

2

0

2

150

Sudo su@sudoingX·20h

before i benchmark this box, settle something for me. on amd strix halo, are you team rocm or team vulkan? i'm testing both and posting the real tok/s regardless, but this debate gets religious on this chip, so drop your actual field experience, what was faster, what broke. i'll put it against my numbers.

Sudo su@sudoingX

the one box i was missing just landed anon. this is the @FrameworkPuter desktop with amd's strix halo, ryzen ai max+ 395, 128gb of unified memory, up to 96 of it addressable as vram. amd and framework sent it over for honest testing, no strings attached, and i've been waiting on this one specifically. here's why it matters. i've run local ai on basically everything, a 150 dollar drawer card, a 3090, a 5090, the dgx spark, datacenter h200s. the one gap was always the accessible big memory tier on the amd side, and this fills it. 128gb unified at roughly half the price of the nvidia equivalent, the sovereignty box for people who want to run real models without a datacenter budget. booting it today. and the question i actually want answered is the one nobody answers straight: what does this thing really run? same bar i hold every other card to. amd, nvidia, apple, measured, never vibes. let's find out what it's got.

English

22

0

37

5.9K

Crown 👑@ciruai·15h

@sudoingX vulkan

Suomi

0

1

122

Crown 👑

اكتشف