Vash
@ShadowySuperDot
2.9K posts · Joined June 2009 · 1.9K Following · 202 Followers
Vash @ShadowySuperDot
@sudoingX Guides on setups, use cases, best models etc.
0 replies · 0 reposts · 0 likes · 6 views
Sudo su @sudoingX
what would you value most from nousresearch for the hermes agent community?
23 replies · 3 reposts · 22 likes · 9K views
Vash @ShadowySuperDot
@outsource_ Wait, so now it works with native Hermes? No longer need to use your "fork"?
1 reply · 0 reposts · 1 like · 166 views
Eric ⚡️ Building... @outsource_
hermes-workspace.com ⭐️ Connects to Main HermesAgent (no fork)

What's new:
🚨 Portable mode: works with any OpenAI-compatible endpoint
🧠 Enhanced mode: full sessions, memory, skills
🎉 Expandable tool cards: watch your agent's tool calls 👀
Vision/image support end-to-end
💻 Streaming fixes, duplicate message bugs killed
🏆 Mobile nav overhaul

3 ways to run it:
- My fork (full features): github.com/outsourc-e/her…
- vanilla hermes-agent (basic chat)
- any /v1/chat/completions endpoint

site: hermes-workspace.com
repo: github.com/outsourc-e/her…
PR incoming to merge our gateway additions upstream 👀
16 replies · 16 reposts · 135 likes · 8.3K views
Vash @ShadowySuperDot
@SmartMoneyCrpto @thisorthat17 The idea is more that a quantum breach without proper preparation will fuck everybody, not only BTC
0 replies · 0 reposts · 1 like · 22 views
SmartMoneyCrypto @SmartMoneyCrpto
@thisorthat17 Not really, $BTC is less than 2 trillion market cap. $NVIDIA is one company and pushed over 5T. $GOLD pushed over 40 trillion. BTC is a relatively small asset to affect the tradfi market like that.
1 reply · 0 reposts · 4 likes · 250 views
SmartMoneyCrypto @SmartMoneyCrpto
When that first piece of $BTC moves from Satoshi's wallet, Bitcoin will likely be down over 50% on the day. Save this tweet. #quantum
6 replies · 7 reposts · 44 likes · 1.7K views
Vash @ShadowySuperDot
@0xSero This is great for opencode, no? They can implement smart features from Claude code now?
1 reply · 0 reposts · 0 likes · 250 views
Vash @ShadowySuperDot
@stevibe 27B seems like the sweet spot once again.
1 reply · 0 reposts · 6 likes · 176 views
stevibe @stevibe
How well can Qwen3.5 models debug code? I built BugFind-15: 15 buggy snippets across Python, JS, Rust, and Go. A Docker sandbox compiles and validates every fix. Two trap scenarios where the code is correct and the model must resist "fixing" it.

Tested every Qwen3.5 size from 0.8B to 397B, plus Jackrong's popular distilled model (V2). The 0.8B scored 5%. The 2B scored 10%. At 4B, debugging ability jumps to 69%.

The hardest scenario: BF-03, a Rust trap. The code compiles fine: format! borrows, it doesn't move. Not a single model figured this out. From 0.8B to 397B, every one of them "fixed" a bug that doesn't exist.

Category C (subtle bugs: mutable defaults, integer overflow, slice aliasing) was 100% across every model 4B and above. Category D (red herring resistance) told the real story: can it resist fixing code that isn't broken? No model scored above 90%.

Small models can't debug. Mid-size models fix obvious bugs but fall for traps. Large models fix the hard bugs but still invent problems that don't exist.
31 replies · 14 reposts · 280 likes · 34.3K views
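The BF-03 trap described above hinges on a real Rust detail: `format!` captures its arguments by reference, so a `String` passed to it is not moved and stays usable afterwards. A minimal illustration of the idea (my own snippet, not taken from BugFind-15):

```rust
fn main() {
    let s = String::from("hello");

    // format! takes its arguments by reference,
    // so `s` is borrowed here, NOT moved.
    let msg = format!("{} world", s);

    // Still valid: if `s` had been moved, this line
    // would be a compile error, not a bug to "fix".
    println!("{}", s);
    println!("{}", msg);
}
```

A model that pattern-matches "value used after being passed somewhere" onto "use after move" will invent a bug here, which is exactly the red-herring failure the benchmark measures.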
Vash @ShadowySuperDot
@0xSero I would love to try it out!
0 replies · 0 reposts · 0 likes · 2 views
Ahmad @TheAhmadOsman
@ShadowySuperDot Some of my workflows have a best-of-n-runs criterion; this is my overthinker model that breaks ties for me
1 reply · 0 reposts · 2 likes · 229 views
Ahmad @TheAhmadOsman
Current models rotation (mix of API & local):
> GPT 5.4 Pro (Subscription)
> MiniMax M2.7 (API) / M2.5 (local)
> GLM 5.1 (API) / 4.7 (local)
> Kimi K2.5
> Qwen 3.5 397B MoE
> Qwen 3.5 27B Dense

Quoting One Man Army @onemanarmy85:
@TheAhmadOsman @MildlyMagical What are your go-to models, Ahmad?

22 replies · 12 reposts · 275 likes · 15.8K views
stevibe @stevibe
Really? Worth a try!

Quoting X_Learning969 @XLearning969:
@stevibe hey guys thought i'd share. this model passed everything. 35b fine tune/merge. runs faster than 27b as well. found the creator by luck. great guy: nightmedia/Qwen3.5-35B-A3B-Holodeck-qx86-hi-mlx

4 replies · 2 reposts · 42 likes · 6.9K views
Vash @ShadowySuperDot
@gammichan @baldicular Qwen3.5 27B on 32GB VRAM (CUDA) is perfectly fine for openclaw/Hermes etc.
0 replies · 0 reposts · 1 like · 50 views
Gammichan @gammichan
The 1st-order conclusion of this is that it's good for AI providers because it reduces their costs. The 2nd-order conclusion is that it enables people to run some quite powerful models on just a 32GB MacBook, and they'll no longer need to pay AI providers.

I think we're getting close to the point where local models are good enough for most people, and I'm unaware of any kind of moat OpenAI/Anthropic has to prevent them from leaving. At least Google has an ecosystem around it, so they can provide unique value that ties into those services.

research.google/blog/turboquan…
15 replies · 0 reposts · 52 likes · 11.8K views
Vash @ShadowySuperDot
@0xSero I'm using 27B so I guess I'm already in the sweet spot. Cheers.
2 replies · 0 reposts · 2 likes · 760 views
0xSero @0xSero
@ShadowySuperDot
- Qwen3.5-27B
- Qwen3.5-35B
- GLM-4.7-Flash
- Cascade-30B
- Nemotron-30B
- Zeta-2-8B
3 replies · 0 reposts · 28 likes · 4.4K views
0xSero @0xSero
Best models to run on your hardware:
—— 64 GB ——
- Qwen3-coder-next-80B-4bit (coding, Claude code, general agent)
- Qwen3.5-122B-reap (browser use, multimodal, tool calling, general agent)
—— 96 GB ——
- GLM-4.6V (multimodal and tool calls)
- Hermes-70B (Jailbroken)
- Nemotron-120B-Super (openclaw)
- Mistral-4-Small (general agent)
—— 192 GB ——
All of these are excellent top-tier LLMs and approach Sonnet in capabilities:
- Step-3.5-Flash
- Qwen3.5-397B-REAP
- MiniMax-M2.5 (soon M2.7)
- GLM-4.7-Reap

Quoting 0xSero @0xSero:
Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy.
---- 8 GB ----
Autocomplete for coding (like Cursor Tab):
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style:
- huggingface.co/nvidia/NVIDIA-…
---- 16 GB ----
Here things get better. Multimodal:
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…
---- 24 GB ----
- The best model you can get (thanks Qwen): huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents): huggingface.co/nvidia/Nemotro…
- Mine hehe: huggingface.co/0xSero/Qwen-3.…
I'm doing a weekly series

172 replies · 243 reposts · 3.3K likes · 471.9K views
Vash @ShadowySuperDot
@0xSero Any recs for 32gb?
0 replies · 0 reposts · 2 likes · 551 views
0xSero @0xSero
Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy. [weekly 8 GB / 16 GB / 24 GB list, quoted in full above]
221 replies · 374 reposts · 3.7K likes · 571.3K views
Vash @ShadowySuperDot
@stevibe Oh, flash attn on and np 1
0 replies · 0 reposts · 0 likes · 41 views
Vash @ShadowySuperDot
@stevibe Yo, so I ran 27B UD-Q4_K_XL, ctx 132k, KV cache K8/V8, temp 1, top-p 0.95, top-k 20, min-p 0, presence penalty 1.5, repeat penalty 1.0, and it passed every test. Sweet.
1 reply · 0 reposts · 0 likes · 145 views
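For anyone wanting to reproduce a setup like the one above, those settings map roughly onto llama.cpp's llama-server options. This is a sketch under assumptions: the model path is a placeholder, and flag names are from recent llama.cpp builds and may differ by version:

```shell
# Hypothetical llama-server invocation approximating the settings above.
# Model filename is a placeholder; flag spellings vary across llama.cpp versions.
llama-server \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 132000 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0 \
  --presence-penalty 1.5 --repeat-penalty 1.0
```

The quantized KV cache (q8_0 for both K and V) is what "K8V8" refers to in this thread; repeat penalty 1.0 means repetition penalty is effectively disabled.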
stevibe @stevibe
Qwen3.5-27B went 15/15 on our tool-calling benchmark. But which quant should you actually run? Tested Unsloth's Q2_K_XL all the way to Q8_K_XL.

TL;DR:
Q8: 15/15 ✅
Q6: 15/15 ✅
Q5: 14/15
Q4: 14/15
Q3: 14/15
Q2: 13/15

Q6 is the sweet spot. Same perfect score as Q8, smaller footprint. Also, the results scale almost linearly; seems like ToolCall-15 is actually measuring something real.
52 replies · 78 reposts · 907 likes · 60.5K views
Vash @ShadowySuperDot
@stevibe Great work! Sorry if this is laid out somewhere, but did you quantize the KV cache? I run 27B Q4 with K8/V8 and have been impressed with tool calling.
1 reply · 0 reposts · 1 like · 192 views
Vash @ShadowySuperDot
@sudoingX @0xSero I have a 5090, have already tried to join but have been pending for a few days already :(
0 replies · 0 reposts · 0 likes · 20 views
Sudo su @sudoingX
i just became a mod of x/LocalLLaMA. if you're running local models on your own hardware and want in, the community is open. pinned and highlighted on my profile. approving members starting today.

drop your setup below and i'll get you in. 3060, 3090, 4090, 5090, AMD, whatever you're running. all welcome. if you're hitting issues with hermes agent, llama.cpp, model selection, configs, i'm here. let's make local AI accessible for everyone.

Quoting Sudo su @sudoingX:
let me get you started in local AI and bring you to the edge. if you have a GPU or are thinking about diving into the local LLM rabbit hole, the first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:
@TheAhmadOsman: the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped, you hear it from him first. his content alone will keep you ahead of most.
@0xSero: a one-man army when it comes to model compression, novel quantization research, and new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.
@Teknium: maker of Hermes Agent, the agent i use every day, from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me, follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, get stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.

327 replies · 43 reposts · 816 likes · 60.9K views
Vash @ShadowySuperDot
@sudoingX Do you have any suggestions for a 5090? I've been running 27B UD-Q4_K_XL at 250k context with the KV cache at Q8, and it's good, but I feel like there is probably something better I can squeeze out of this, since you are running the same on a 3090.
0 replies · 0 reposts · 0 likes · 72 views
Sudo su @sudoingX
anon, i highly encourage you to test this yourself. spend a few days with a local model on your own hardware. understand the tradeoffs. debug the configs. once you go GPU you never go back.
6 replies · 1 repost · 30 likes · 3.6K views
Sudo su @sudoingX
this is what i mean when i say i get blown away by small models every day. qwen 3.5 9B Q4 running autonomously on a 3060, iterating on the game. it discovered the browser was serving old cached static files. thought for itself. reasoned through the problem. added version parameters to force a reload. no prompt. no hint. it just knew.

these small surprises from a model of this size astonish me. where will we be 1 or 2 years from now? the acceleration is insane. this was not possible a year ago.
33 replies · 17 reposts · 371 likes · 18.5K views
Eric ⚡️ Building... @outsource_
Shipped 🚀 Hermes Workspace for the @NousResearch Hackathon ⚡️

Quoting Nous Research @NousResearch:
The Hermes Agent Hackathon Starts Now. Show us what Hermes Agent can do: build something unique, creative, and useful.
1st: $7,500
2nd: $2,000
3rd: $500
To enter, make a tweet tagging @NousResearch with a video demo and a brief writeup, then send the tweet link to the submissions channel in our Discord. Entries will be judged by Nous staff on creativity, usefulness, and presentation. Submissions are due EOD Sunday 03/16.

2 replies · 0 reposts · 5 likes · 246 views
Vash @ShadowySuperDot
@LottoLabs Do you mean running the same prompt 5 times, or iterating 5 times over the initial output?
0 replies · 0 reposts · 0 likes · 27 views
Lotto @LottoLabs
Qwen 3.5 27b could probably 5-shot anything lol
17 replies · 4 reposts · 214 likes · 17.9K views