Vash
@ShadowySuperDot
2.9K posts · Joined June 2009 · 1.9K Following · 202 Followers
Vash @ShadowySuperDot
@sudoingX Guides on setups, use cases, best models etc.
0 replies · 0 reposts · 0 likes · 6 views
Sudo su @sudoingX
what would you value most from nousresearch for the hermes agent community?
23 replies · 3 reposts · 22 likes · 9K views
Vash @ShadowySuperDot
@outsource_ Wait, so now it works with native Hermes? No longer need to use your "fork"?
1 reply · 0 reposts · 1 like · 166 views
Eric ⚡️ Building... @outsource_
hermes-workspace.com ⭐️ Connects to Main HermesAgent (no fork)

What's new:
🚨 Portable mode: works with any OpenAI-compatible endpoint
🧠 Enhanced mode: full sessions, memory, skills
🎉 Expandable tool cards: watch your agent's tool calls 👀
Vision/image support end-to-end
💻 Streaming fixes, duplicate message bugs killed
🏆 Mobile nav overhaul

3 ways to run it:
- My fork (full features): github.com/outsourc-e/her…
- vanilla hermes-agent (basic chat)
- any /v1/chat/completions endpoint

site: hermes-workspace.com
repo: github.com/outsourc-e/her…
PR incoming to merge our gateway additions upstream 👀
16 replies · 16 reposts · 135 likes · 8.3K views
Vash @ShadowySuperDot
@SmartMoneyCrpto @thisorthat17 The idea is more that a quantum breach without proper preparation will fuck everybody, not only BTC
0 replies · 0 reposts · 1 like · 22 views
SmartMoneyCrypto @SmartMoneyCrpto
@thisorthat17 Not really, $BTC is less than 2 trillion market cap. $NVIDIA is one company and pushed over 5T. $GOLD pushed over 40 trillion. BTC is a relatively small asset to affect the tradfi market like that.
1 reply · 0 reposts · 4 likes · 250 views
SmartMoneyCrypto @SmartMoneyCrpto
When that first piece of $BTC moves from Satoshi's wallet, Bitcoin will likely be down over 50% on the day. Save this tweet. #quantum
6 replies · 7 reposts · 44 likes · 1.7K views
Vash @ShadowySuperDot
@0xSero This is great for opencode, no? They can implement smart features from Claude code now?
1 reply · 0 reposts · 0 likes · 250 views
Vash @ShadowySuperDot
@stevibe 27B seems like the sweet spot once again.
1 reply · 0 reposts · 6 likes · 176 views
stevibe @stevibe
How well can Qwen3.5 models debug code? I built BugFind-15: 15 buggy snippets across Python, JS, Rust, and Go. A Docker sandbox compiles and validates every fix. Two trap scenarios where the code is correct and the model must resist "fixing" it.

Tested every Qwen3.5 size from 0.8B to 397B, plus Jackrong's popular distilled model (V2). The 0.8B scored 5%. The 2B scored 10%. At 4B, debugging ability jumps to 69%.

The hardest scenario: BF-03, a Rust trap. The code compiles fine: format! borrows, it doesn't move. Not a single model figured this out. From 0.8B to 397B, every one of them "fixed" a bug that doesn't exist.

Category C (subtle bugs: mutable defaults, integer overflow, slice aliasing) was 100% across every model 4B and above. Category D (red herring resistance) told the real story: can it resist fixing code that isn't broken? No model scored above 90%.

Small models can't debug. Mid-size models fix obvious bugs but fall for traps. Large models fix the hard bugs but still invent problems that don't exist.
31 replies · 14 reposts · 280 likes · 34.3K views
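The BF-03 trap described above hinges on a real Rust detail: `format!` captures its arguments by reference, so a `String` passed to it is not moved and stays usable afterwards. A minimal illustration of the idea (my own snippet, not taken from BugFind-15):

```rust
fn main() {
    let s = String::from("hello");

    // format! takes its arguments by reference,
    // so `s` is borrowed here, NOT moved.
    let msg = format!("{} world", s);

    // Still valid: if `s` had been moved, this line
    // would be a compile error, not a bug to "fix".
    println!("{}", s);
    println!("{}", msg);
}
```

A model that pattern-matches "value used after being passed somewhere" onto "use after move" will invent a bug here, which is exactly the red-herring failure the benchmark measures.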
Vash @ShadowySuperDot
@0xSero I would love to try it out!
0 replies · 0 reposts · 0 likes · 2 views
Ahmad @TheAhmadOsman
@ShadowySuperDot Some of my workflows have a best-of-n-runs criterion; this is my overthinker model that breaks ties for me
1 reply · 0 reposts · 2 likes · 229 views
Ahmad @TheAhmadOsman
Current models rotation (mix of API & local):
> GPT 5.4 Pro (Subscription)
> MiniMax M2.7 (API) / M2.5 (local)
> GLM 5.1 (API) / 4.7 (local)
> Kimi K2.5
> Qwen 3.5 397B MoE
> Qwen 3.5 27B Dense

Quoting One Man Army @onemanarmy85:
@TheAhmadOsman @MildlyMagical What are your go-to models, Ahmad?

22 replies · 12 reposts · 275 likes · 15.8K views
stevibe @stevibe
Really? Worth a try!

Quoting X_Learning969 @XLearning969:
@stevibe hey guys thought i'd share. this model passed everything. 35b fine tune/merge. runs faster than 27b as well. found the creator by luck. great guy: nightmedia/Qwen3.5-35B-A3B-Holodeck-qx86-hi-mlx

4 replies · 2 reposts · 42 likes · 6.9K views
Vash @ShadowySuperDot
@gammichan @baldicular Qwen3.5 27B on 32GB VRAM (CUDA) is perfectly fine for openclaw/Hermes etc.
0 replies · 0 reposts · 1 like · 50 views
Gammichan @gammichan
The 1st-order conclusion of this is that it's good for AI providers because it reduces their costs. The 2nd-order conclusion is that it enables people to run some quite powerful models on just a 32GB MacBook, and they'll no longer need to pay AI providers.

I think we're getting close to the point where local models are good enough for most people, and I'm unaware of any kind of moat OpenAI/Anthropic has to prevent them from leaving. At least Google has an ecosystem around it, so they can provide unique value that ties into those services.

research.google/blog/turboquan…
15 replies · 0 reposts · 52 likes · 11.8K views
Vash @ShadowySuperDot
@0xSero I'm using 27B so I guess I'm already in the sweet spot. Cheers.
2 replies · 0 reposts · 2 likes · 760 views
0xSero @0xSero
@ShadowySuperDot
- Qwen3.5-27B
- Qwen3.5-35B
- GLM-4.7-Flash
- Cascade-30B
- Nemotron-30B
- Zeta-2-8B
3 replies · 0 reposts · 28 likes · 4.4K views
0xSero @0xSero
Best models to run on your hardware:
—— 64 GB ——
- Qwen3-coder-next-80B-4bit (coding, Claude code, general agent)
- Qwen3.5-122B-reap (browser use, multimodal, tool calling, general agent)
—— 96 GB ——
- GLM-4.6V (multimodal and tool calls)
- Hermes-70B (Jailbroken)
- Nemotron-120B-Super (openclaw)
- Mistral-4-Small (general agent)
—— 192 GB ——
All of these are excellent top-tier LLMs and approach Sonnet in capabilities:
- Step-3.5-Flash
- Qwen3.5-397B-REAP
- MiniMax-M2.5 (soon M2.7)
- GLM-4.7-Reap

Quoting 0xSero @0xSero:
Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy.
---- 8 GB ----
Autocomplete for coding (like Cursor Tab):
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style:
- huggingface.co/nvidia/NVIDIA-…
---- 16 GB ----
Here things get better. Multimodal:
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…
---- 24 GB ----
- The best model you can get (thanks Qwen): huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents): huggingface.co/nvidia/Nemotro…
- Mine hehe: huggingface.co/0xSero/Qwen-3.…
I'm doing a weekly series

172 replies · 243 reposts · 3.3K likes · 471.9K views
Vash @ShadowySuperDot
@0xSero Any recs for 32gb?
0 replies · 0 reposts · 2 likes · 551 views
0xSero @0xSero
Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy. [weekly 8 GB / 16 GB / 24 GB list, quoted in full above]
221 replies · 374 reposts · 3.7K likes · 571.3K views
Vash @ShadowySuperDot
@stevibe Oh, flash attn on and np 1
0 replies · 0 reposts · 0 likes · 41 views
Vash @ShadowySuperDot
@stevibe Yo, so I ran 27B UD-Q4_K_XL, ctx 132k, KV cache K8/V8, temp 1, top-p 0.95, top-k 20, min-p 0, presence penalty 1.5, repeat penalty 1.0, and it passed every test. Sweet.
1 reply · 0 reposts · 0 likes · 145 views
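For anyone wanting to reproduce a setup like the one above, those settings map roughly onto llama.cpp's llama-server options. This is a sketch under assumptions: the model path is a placeholder, and flag names are from recent llama.cpp builds and may differ by version:

```shell
# Hypothetical llama-server invocation approximating the settings above.
# Model filename is a placeholder; flag spellings vary across llama.cpp versions.
llama-server \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 132000 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0 \
  --presence-penalty 1.5 --repeat-penalty 1.0
```

The quantized KV cache (q8_0 for both K and V) is what "K8V8" refers to in this thread; repeat penalty 1.0 means repetition penalty is effectively disabled.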
stevibe @stevibe
Qwen3.5-27B went 15/15 on our tool-calling benchmark. But which quant should you actually run? Tested Unsloth's Q2_K_XL all the way to Q8_K_XL.

TL;DR:
Q8: 15/15 ✅
Q6: 15/15 ✅
Q5: 14/15
Q4: 14/15
Q3: 14/15
Q2: 13/15

Q6 is the sweet spot. Same perfect score as Q8, smaller footprint. Also, the results scale almost linearly; seems like ToolCall-15 is actually measuring something real.
52 replies · 78 reposts · 907 likes · 60.5K views
Vash @ShadowySuperDot
@stevibe Great work! Sorry if this is laid out somewhere, but did you quantize the KV cache? I run 27B Q4 with K8/V8 and have been impressed with tool calling.
1 reply · 0 reposts · 1 like · 192 views
Vash @ShadowySuperDot
@sudoingX @0xSero I have a 5090, have already tried to join but have been pending for a few days already :(
0 replies · 0 reposts · 0 likes · 20 views
Sudo su @sudoingX
i just became a mod of x/LocalLLaMA. if you're running local models on your own hardware and want in, the community is open. pinned and highlighted on my profile. approving members starting today.

drop your setup below and i'll get you in. 3060, 3090, 4090, 5090, AMD, whatever you're running. all welcome. if you're hitting issues with hermes agent, llama.cpp, model selection, configs, i'm here. let's make local AI accessible for everyone.

Quoting Sudo su @sudoingX:
let me get you started in local AI and bring you to the edge. if you have a GPU or are thinking about diving into the local LLM rabbit hole, the first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:
@TheAhmadOsman: the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped, you hear it from him first. his content alone will keep you ahead of most.
@0xSero: a one-man army when it comes to model compression, novel quantization research, and new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.
@Teknium: maker of Hermes Agent, the agent i use every day, from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me, follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, get stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.

327 replies · 43 reposts · 816 likes · 60.9K views
Vash @ShadowySuperDot
@sudoingX Do you have any suggestions for a 5090? I've been running 27B UD-Q4_K_XL at 250k context with the KV cache at Q8, and it's good, but I feel like there is probably something better I can squeeze out of this, since you are running the same on a 3090.
0 replies · 0 reposts · 0 likes · 72 views
Sudo su @sudoingX
anon, i highly encourage you to test this yourself. spend a few days with a local model on your own hardware. understand the tradeoffs. debug the configs. once you go GPU you never go back.
6 replies · 1 repost · 30 likes · 3.6K views
Sudo su @sudoingX
this is what i mean when i say i get blown away by small models every day. qwen 3.5 9B Q4 running autonomously on a 3060, iterating on the game. it discovered the browser was serving old cached static files. thought for itself. reasoned through the problem. added version parameters to force a reload. no prompt. no hint. it just knew.

these small surprises from a model of this size astonish me. where will we be 1 or 2 years from now? the acceleration is insane. this was not possible a year ago.
33 replies · 17 reposts · 371 likes · 18.5K views
Eric ⚡️ Building... @outsource_
Shipped 🚀 Hermes Workspace for the @NousResearch Hackathon ⚡️

Quoting Nous Research @NousResearch:
The Hermes Agent Hackathon Starts Now. Show us what Hermes Agent can do: build something unique, creative, and useful.
1st: $7,500
2nd: $2,000
3rd: $500
To enter, make a tweet tagging @NousResearch with a video demo and a brief writeup, then send the tweet link to the submissions channel in our Discord. Entries will be judged by Nous staff on creativity, usefulness, and presentation. Submissions are due EOD Sunday 03/16.

2 replies · 0 reposts · 5 likes · 246 views
Vash @ShadowySuperDot
@LottoLabs Do you mean running the same prompt 5 times, or iterating 5 times over the initial output?
0 replies · 0 reposts · 0 likes · 27 views
Lotto @LottoLabs
Qwen 3.5 27b could probably 5-shot anything lol
17 replies · 4 reposts · 214 likes · 17.9K views