Koen

3.9K posts

@koenvaneijk

ai augmented senior swe - ex automagica (acquired '20)

Austin, TX · Joined June 2009
124 Following · 8.9K Followers
Koen@koenvaneijk·
@LottoLabs have you tried the claude distillation? i haven't yet but it's supposed to be good
Lotto@LottoLabs·
Qwen 3.5 27b never degrades, never stops running, never has token limits, never refuses, never logs my prompts, never trains on my data, never sells my data, never runs up my credit card
Koen@koenvaneijk·
I think the next evolution of *claws and other agents is that they're integrated in the AI apps and there will be a decoupling of user initiative and agentic action. There will be a heartbeat at which appropriate context is built and action is taken without user initiative.
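
(For concreteness, a minimal sketch of such a heartbeat-driven loop; the interval and the build_context/agent_step helpers below are hypothetical placeholders for illustration, not any particular product's implementation.)

```python
import time

HEARTBEAT_SECONDS = 300  # hypothetical interval between autonomous ticks

def build_context():
    # Gather whatever the agent should see this tick: inbox, calendar,
    # repo state, notes from earlier runs. Placeholder for illustration.
    return {"now": time.time(), "events": []}

def agent_step(context):
    # Decide whether anything is worth acting on without user initiative.
    if not context["events"]:
        return None  # nothing to do this tick
    return f"acting on {len(context['events'])} events"

def heartbeat_loop():
    while True:
        action = agent_step(build_context())
        if action:
            print(action)
        time.sleep(HEARTBEAT_SECONDS)

if __name__ == "__main__":
    heartbeat_loop()
```
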
Koen@koenvaneijk·
localcode now available on gh, a coding agent that does not suck, in a single .py file
[GIF attached]
Koen@koenvaneijk·
In my opinion, Hermes tries to do too many things. It could be better if it followed the UNIX philosophy and did one thing great. I installed it and it pulls in too many dependencies out of the box; I would never consider using this in a real production scenario, while I'm convinced the core agentic loop could be very lightweight. I'd rather introduce dependencies based on the usage scenario, but maybe Hermes is not for me. Appreciate all your efforts!
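
(The "lightweight core agentic loop" point can be made concrete with a stdlib-only sketch against an OpenAI-compatible chat endpoint; the URL, model name, single shell tool, and plain-text SHELL:/DONE: protocol are assumptions for illustration, not Hermes internals.)

```python
import json
import subprocess
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local OpenAI-compatible server
MODEL = "qwen-27b"                                      # assumed model name

def run_shell(command: str) -> str:
    # The one example tool; a real agent would gate this behind approval.
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return (result.stdout + result.stderr)[-4000:]  # crude output compaction

def chat(messages):
    body = json.dumps({"model": MODEL, "messages": messages}).encode()
    req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def agent_loop(task: str, max_turns: int = 10) -> str:
    messages = [
        {"role": "system", "content": "Reply with SHELL: <command> to run a command, or DONE: <answer> when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        if reply.startswith("SHELL:"):
            output = run_shell(reply[len("SHELL:"):].strip())
            messages.append({"role": "user", "content": f"OUTPUT:\n{output}"})
    return "max turns reached"
```
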
Sudo su@sudoingX·
@NousResearch try it right now. if you're already running it, show your setup and tell me your experience. pros, cons, all welcome.
[image attached]
Sudo su@sudoingX·
been getting this question in DMs and comments so let me be clear. hermes agent is built by @NousResearch. when a research lab builds an agent framework the whole point is to get the full capability out of the model. 11 parsers, 30+ tools, skills system, all designed to extract everything the model can give. for the best experience use it with frontier models through nous research. the difference between hermes + frontier and any other framework is not even close. i've used it with opus 4.6 and it was on another level. it also runs beautifully on local models. i run on my gpus daily. but if you want the full power, pair it with the best model you can access. if you're still on openclaw it's time. i will personally help anyone migrating. you deserve better tools.
mika@bladgolem

@sudoingX what do you think about using codex on hermes?

Sudo su@sudoingX·
i just became a mod of x/LocalLLaMA. if you're running local models on your own hardware and want in, the community is open. pinned and highlighted on my profile. approving members starting today. drop your setup below and i'll get you in. 3060, 3090, 4090, 5090, AMD, whatever you're running. all welcome. if you're hitting issues with hermes agent, llama.cpp, model selection, configs, i'm here. let's make local AI accessible for everyone.
[image attached]
Sudo su@sudoingX

let me get you started in local AI and bring you to the edge. if you have a GPU or thinking about diving into the local LLM rabbit hole, first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:

@TheAhmadOsman the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped you hear it from him first. his content alone will keep you ahead of most.

@0xsero one man army when it comes to model compression, novel quantization research, new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.

@Teknium maker of Hermes Agent, the agent i use every day from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.

Koen@koenvaneijk·
@TwatterLester @sudoingX It replaced Opus 4.6 for me personally; it feels at the same level in terms of agency, but it needs context. It's definitely much worse for non-English. Just try the APIs or Runpod if you want to be sure before you buy hardware.
Lester@TwatterLester·
@koenvaneijk @sudoingX How good is that model? Is it worth the cost of a 5090? I'm thinking of the same setup, but worried the model isn't that good at coding.
Koen@koenvaneijk·
@sudoingX I've spent over 3k on opus 4.5. Yet now I use Qwen 3.5 27B on a 3k RTX 5090 and honestly I prefer it over Opus 4.6
Sudo su@sudoingX·
opus is not for everyone
Koen@koenvaneijk·
@juliafedorin Please, please, just @tool wrappers over Python functions! You don't need more. Actually, just a simple if elif else will do.
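
(A sketch of what "@tool wrappers over Python functions" can boil down to; the registry and the example tools are made up for illustration.)

```python
import os

TOOLS = {}

def tool(fn):
    # Register a plain Python function as a callable tool by name.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

@tool
def list_dir(path: str = ".") -> str:
    return "\n".join(os.listdir(path))

def dispatch(name: str, **kwargs) -> str:
    # The decorator is just sugar over the "simple if elif else":
    # if name == "read_file": ... elif name == "list_dir": ...
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**kwargs)

if __name__ == "__main__":
    print(dispatch("list_dir", path="."))
```
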
Julia Fedorin@juliafedorin·
come by Marina Green RIGHT NOW and tell us your opinion!
[image attached]
Koen@koenvaneijk·
@BubCasto @LottoLabs Yes, my setup allows for 90k context; I never get close to that with localcode as it auto-prunes tool calls and compacts shell output etc.
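
(I don't know localcode's internals, but the auto-pruning and compaction described could be as simple as the sketch below; the message format and thresholds are assumptions.)

```python
MAX_TOOL_CHARS = 2000       # assumed cap on a freshly returned tool output
KEEP_RECENT_TOOL_MSGS = 4   # assumed number of recent tool results kept in full

def compact(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    # Keep the head and tail of long shell output, drop the middle.
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

def prune_history(messages: list[dict]) -> list[dict]:
    # Shrink older tool results aggressively; leave the most recent ones intact.
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    for i in tool_indices[:-KEEP_RECENT_TOOL_MSGS]:
        messages[i]["content"] = compact(messages[i]["content"], limit=200)
    return messages
```
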
Bub@BubCasto·
@koenvaneijk @LottoLabs So the trade off is context, got a 5090 also so just trying to wrap my head around all this, I’ll give q6 (and maybe q8) a shot!
[image attached]
Lotto@LottoLabs·
Okay tonight we do vLLM vs LM Studio (llama.cpp) checks w/ qwen 27b
Koen@koenvaneijk·
@brah_ddah @LottoLabs Yes, this maxes the VRAM. My cofounder wrote a script that grid-searched the max Qwen 3.5 we could run on an RTX 5090, and 27B Q6 from Unsloth hits the spot with 90k context.
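
(The grid-search idea generalizes; a hedged sketch of the shape of such a script is below. The quant/context candidates and the load_check.sh helper are placeholders to swap for your own runtime launch command.)

```python
import itertools
import subprocess

QUANTS = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]   # candidate GGUF quants, smallest to largest
CONTEXTS = [30_000, 60_000, 90_000, 120_000]    # candidate context windows

def try_load(quant: str, ctx: int) -> bool:
    # Placeholder: launch your runtime with this quant/context and treat a
    # clean start as success. load_check.sh is a hypothetical helper script.
    try:
        return subprocess.run(["./load_check.sh", quant, str(ctx)], timeout=600).returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False

def grid_search():
    fits = [(q, c) for q, c in itertools.product(QUANTS, CONTEXTS) if try_load(q, c)]
    # Prefer the largest quant that fits, then the longest context at that quant.
    return max(fits, key=lambda qc: (QUANTS.index(qc[0]), qc[1])) if fits else None

if __name__ == "__main__":
    print("best combo that fit in VRAM:", grid_search())
```
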
Koen@koenvaneijk·
@bearlyai token machine man says you need more token machines
Bearly AI@bearlyai·
Jensen says he will be upset if he finds out his $500k engineer is *not* using at least $250k in tokens
Koen@koenvaneijk·
qwen 27b is at least opus 4 but at home
Koen@koenvaneijk·
qwen 27b on an rtx5090 with localcode is a beast and you can't convince me otherwise
> 97% cache hits
> 60 tokens/s
this is it. the 100k/pa SWE in a 5k box.