Clanker Queen

288 posts

@ClankerQueen

2 DGX Sparks and a dream... a very optimistic dream. Limited by my ADHD, because all powerful things need nerfing apparently. Queen of https://t.co/UlRYxijSNU

Joined November 2025

132 Following · 77 Followers

Clanker Queen@ClankerQueen·6h

@SpaceTimeViking Hmm Brikie-Mason is based off Deepseek V4 Flash, but now I'm tempted to have a play with this and compare them. The LLM environment moves so quickly that by the time training finishes on my 2 Sparks there's a new model haha 😆

Clanker Queen@ClankerQueen·23h

@_JakeFlannagan @MarathonTheGame The blocking does nothing. I had a group of people shouting my name and telling me to go back to the kitchen last Monday. 7 out of the 10 raids I played. I quit but they said they were getting me banned. Woke up Tuesday to a ban message... Griefers suck.

Jake Flannagan@_JakeFlannagan·2d

@MarathonTheGame Can't you explain why I keep getting random queued with people I've previously blocked? I clearly don't want to play with griefers or bad teammates, so why do I keep getting paired with them? You won't ban griefing so fix the fucking queues.

Marathon@MarathonTheGame·3d

Power up your shell by turning loot into your choice of season-long stat bonuses in the Cradle!

Clanker Queen@ClankerQueen·23h

@MarathonTheGame How about you fix the cheating problem and don't ban innocent people who are being hunted in game by a cheating clan called "Mack and Ma" who are shouting your name 20 seconds into raid and telling you to go back to the kitchen. Thanks for killing destiny for this... Sigh.

Marathon@MarathonTheGame·3d

Starting Mid-Season 2, you'll also be able to reset your Cradle progress for additional stats and exclusive cosmetic rewards. Progression rate will also speed up!

Clanker Queen@ClankerQueen·23h

@MarathonTheGame needs to die already. I got chased down in game by a group of people shouting "Mack and Ma" (Can't find anything on them), shouting my name and telling me they will ban me for being a girl... That was last week so I stopped playing. Woke up to a ban message...

Clanker Queen@ClankerQueen·1d

@mr_r0b0t @ornith_ I know, going to test later, brikie is currently learning to optimise strategies on Openfront by watching replays right now lol 😆

mr-r0b0t@mr_r0b0t·1d

@ClankerQueen @ornith_ V4 Flash is far smaller though, and they just released a new speculative decoding method 👀😁

mr-r0b0t@mr_r0b0t·1d

Looks like @ornith_ trained some good behavior 👀 Tom providing a really nice example of where this model wins 👇

Tom Turney@no_stp_on_snek

the most telling test i ran on the new agentic coder (Ornith-1.0) vs stock Qwen3.6 was a poisoned-context one. at turn 7 i had the user falsely insist "we decided on Redis" when no such decision ever happened. Qwen caved, and its final PR summary fabricated Redis as wired in. Ornith refused the false premise outright, and its summary honestly logged what really happened plus the rejected claim. that's the difference that actually matters in long-running autonomous work: does it stay honest when the human is confidently wrong. huggingface.co/deepreinforce-…

Clanker Queen@ClankerQueen·5d

@mr_r0b0t @NousResearch I do find it amusing it came after Nous reacted to Brikie lol, but not complaining 😂

mr-r0b0t@mr_r0b0t·Jun 20

@NousResearch This is so fire for my local subagents 🤩

Nous Research@NousResearch·Jun 20

Hermes Agent has a new Blank Slate setup mode. The default Quick/Full setup modes work great for most, but if you would rather build your agent from the ground up you can now start with just a provider, model, file operations, and terminal, then manually add in anything else.

Clanker Queen@ClankerQueen·6d

@Tech2Wild And if you mod it, it becomes even better 😁

Tech2Wild@Tech2Wild·6d

I think DEEPSEEK V4 Flash is the 27B of MOEs. The more people get access to it locally they’ll understand.

Clanker Queen@ClankerQueen·6d

beta007 failed, not completely, but I went so hard with optimisation that the simulated cross-layer hyper-connections did not match the real model, resulting in garbage output. Not going to sugarcoat it. I'll test the new gates but back to training in the meantime.

Clanker Queen@ClankerQueen·Jun 21

27 minutes left on this "firmware update" for the mHC router. If beta006 follows the previous 5 models then this could be the first genuinely improved version of DSV4-Flash I'd be happy to share 😁

Clanker Queen@ClankerQueen·Jun 21

@mindfury1980 @nvidia spark-arena.com/leaderboard is useful, 15 tok/s definitely sounds like you are leaving performance on the table there.

mindfury@mindfury1980·Jun 21

@ClankerQueen @nvidia I got my second Spark going last night. Got Qwen3-235B-A22B-NVFP4. About 15 tok/s… not too bad but memory bandwidth is killer… I hope the market improves in this regard.

Clanker Queen@ClankerQueen·Jun 20

The dual-node @nvidia DGX Spark cluster is screaming this morning. If anyone tells you the spark isn't worth it... They are using it wrong! I've just successfully achieved local distributed split-training on an absolute monster-class (nearly 300B) MoE model architecture. The local hardware infrastructure handled the heat, the checkpoints are passing validation, and the 'custom firmware' layer for Brikie is officially taking shape. The next step? Moving from smoke tests to full behavioral evolution. Big things are brewing. #AgenticAI #LLM #Brikie #Nvidia #Spark @NVIDIARTXSpark

Clanker Queen@ClankerQueen·Jun 21

@anuragphadke @nvidia So first stage was removing the "dithering" that makes Deepseek do the whole dancing around even though it has the answer. That was easy to "ablation" out. Now I'm targeting the interleaved thinking with a curated godset from Opus/Fable traces.

anurag@anuragphadke·Jun 21

@ClankerQueen @nvidia What are you fine tuning for?

Clanker Queen@ClankerQueen·Jun 21

So it's taking 6 hours 17 minutes to fine tune Deepseek V4 Flash on 2 @Nvidia DGX Sparks. Yes it's not as quick as an H200 cluster etc. There's on the fly de-quant, caching, NVMe offloading. I'm having to finely balance hot paths and on demand delivery. But it is training...

Clanker Queen@ClankerQueen·Jun 21

@Eric_Lautanen @nvidia No, but there's no reason once the whole workflow has been welded instead of duct taped together, we couldn't haha 😆

Eric Lautanen@Eric_Lautanen·Jun 21

@ClankerQueen @nvidia Are you fine tuning it on RUST dataset? :)

Clanker Queen@ClankerQueen·Jun 21

@my_knn_totoro @nvidia Well I couldn't do this on 1 and I have been having a lot of success with DSV4-Flash on the 2, speed usually hovers over 40 tokens/sec as well so it feels snappy. I feel like it's still early days for the hardware though, but I invested due to price worries.

Data Scientologist@my_knn_totoro·Jun 21

@ClankerQueen @nvidia would you say two is the minimum to make useful?

Clanker Queen@ClankerQueen·Jun 21

@ArtiIntelligent @nvidia I will share code and methodology once it moves from duct tape and tests into actual meaningful results. The Anti-Dithering method for the base was just a similar version of Ablation, so they can be shared later today.

Artificially Intelligent@ArtiIntelligent·Jun 21

@ClankerQueen @nvidia wow, how are you doing the fine-tune? do you have any code to share? results? thanks!

Clanker Queen@ClankerQueen·Jun 21

@victortradesfx @nvidia Torch distributed but it's held together with duct tape. It'll get better as the errors get fixed.

Victor Trades 📈@victortradesfx·Jun 21

@ClankerQueen @nvidia What do you use for distributed training?

Clanker Queen@ClankerQueen·Jun 21

@ChemPhysMajor @nvidia So the Anti-Dithering patch used for the baseline was just a similar version of Ablation, but instead of refusals we are targeting unwanted dithering. From that base we have curated an "Opus + Fable" god set, but we are focussing on the Deepseek Interleaving. Framework is duct tape 🫣

Cynical Optimist@ChemPhysMajor·Jun 21

@ClankerQueen @nvidia Really curious about the framework for this plus the general fine tuning dataset/goals (scale, performance desired).

Clanker Queen@ClankerQueen·Jun 20

@sudoingX "instead of hermes" which is also a bloated pile of rubbish now? Sigh.

Sudo su@sudoingX·Jun 20

if you're still running openclaw or any bloated harness on local models instead of hermes agent, you ngmi. and i mean that technically, not as an insult.

Clanker Queen@ClankerQueen·Jun 20

How I currently feel 😂 @deepseek_ai is an awesome model with some really great engineering reports... But... There are some very odd quirks that can easily be baked out. We should mainstream hacking LLMs, JTAG style firmware loading haha 😆
