Michele Mattioni

2.2K posts

Michele Mattioni

@mattions

I tend to write in English here. I tend to write in Italian here: https://t.co/7HPDRcS9Nq

Italy, Europe Katılım Şubat 2010

736 Takip Edilen427 Takipçiler

Michele Mattioni@mattions·1h

Did anyone manage to optimize a nice pipeline to create vide with local models using Hermes? I know there are skills and so forth,but from the idea to end product?

English

Michele Mattioni@mattions·1h

So you've got local AI at home and you can use it the way you want to do whatever you need, because tokens are free and you can experiment. Give an example. Sure: build a granny chart to organize the cooking of our son's birthday party :)

English

Michele Mattioni retweetledi

Shann³@shannholmberg·1d

Hermes Agent just shipped skill bundles I used to do this myself with skill-chains (one skill that referenced and called multiple other skills), now it's native and better but you need to be careful about how you use them when you trigger a bundle, the agent receives every skill in that bundle loaded into a single user message. any text after the slash command gets attached as your instruction. this means the quality of your output depends entirely on how well those skills compose together. if you stack five skills that don't naturally connect, you end up with conflicting instructions firing at once. the agent gets confused and output drifts. here's a rule for it, bundle workflows that chain together logically. something like research → ideate → write → critic works because each step feeds the next. bundling random utility skills just because they're useful in the same project will create noise. start with the workflows you've run more than twice this week. if you keep triggering the same three skills in sequence, bundle them. if you're just grouping skills for convenience, keep them separate.

Nous Research@NousResearch

Introducing skill bundles:

English

371

36.3K

Michele Mattioni@mattions·1d

I should have bought more GPUs.

English

Michele Mattioni retweetledi

Andrej Karpathy@karpathy·2d

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

English

7.8K

11.1K

147.6K

26.5M

Michele Mattioni@mattions·2d

@sudoingX I'm solid on 3.6 35b . The 27b never really worked for me in the same nice way

English

Sudo su@sudoingX·3d

saying it out loud again. on a single 3090, the king is qwen 3.6 27b dense at q4, and nothing in that tier comes close. i've benchmarked the tier. happy to be wrong, so name the model that beats it. i just know you can't.

English

385

34.4K

Michele Mattioni retweetledi

Joel - coffee/acc@JoelDeTeves·4d

RTX 3090 with Qwen3.6 MTP

Polski

251

14.6K

Michele Mattioni@mattions·6d

The greatest thing about agentic AI running on local resources is: - ability to explore anything, because there is no session/usage tax - ability to use any data. Everything is local, zero stress on where the data goes. It's like having a dedicated Chief of Staff

English

Michele Mattioni@mattions·14 May

Always said that the only real test is e2e kind Everything else can be nice to have, but the real truth is from the holistic view. And yes with agentic coding, e2e do make sense, unit tests do tell very little, because agents can patch them as they go to be green anyway

Fatih Arslan@fatih

I was a huge unit test supporter, but honestly, it's no longer worth it. Agents are superb at writing extremely bad unit tests, and they still look good on paper. We're also shifting slightly to more and more e2e tests at @PlanetScale. Luckily with agents, that shift is also manageable.

English

Michele Mattioni@mattions·11 May

@sudoingX The context IMHO is way too big. It fits, but the computer will OOM too much, especially if TTS local services or random ComfyUI. I've tried with 192, but the real sweet sport is 131k for me

English

297

Sudo su@sudoingX·11 May

this is what i am running if you gonna replicate. llama-server flags: -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --port 8080 --host 0.0.0.0 model: unsloth/Qwen3.6-27B-GGUF Q4_K_M (16gb file, 262k native context) hardware: single rtx 3090 24gb vram: 21gb / 24gb loaded at full 262k harness: hermes agent, custom openai-compatible provider at http://localhost:8080/v1

English

3.9K

Sudo su@sudoingX·11 May

this is what my setup looks like today. about to test qwen 3.6 27b dense q4 on a single rtx 3090 at ~41 tok/s gen, hermes agent driving. predecessor model qwen 3.5 dense q4 made it work in one iteration when i ran the same agentic build on the same card. i've been daily driving qwen 3.6 27b dense for weeks now, the model i keep coming back to. if 3.6 oneshots too, this becomes the best model that runs on a single rtx 3090. consumer tier king. firing the test now will report back soon.

English

269

81.6K

Michele Mattioni@mattions·10 May

@outsource_ I've got a local AI agent that: - write code for me - draft documents - search things and reports back - in general I treat him as a very - creates Instagram carousel And so on.. capable personal assistant, with lots of skills

English

Eric ⚡️ Building...@outsource_·10 May

Who’s actually building with AI Agents? 👇🏻

English

2.4K

Michele Mattioni@mattions·10 May

@TheAhmadOsman Agreed. Getting really things _done_ with the 3.6 35B on local machinery. It just gets stuff done.

English

138

Ahmad@TheAhmadOsman·10 May

Imagine seeing how capable Qwen 3.6 27B is when you give it web access and a proper harness and not yet getting that AGI will run locally Ngmi

English

426

24K

Michele Mattioni@mattions·8 May

@jeanlouisug @gridpane @sudoingX Which Qwen model has vision? I have a 3.6 but without vision

English

Jean Louis 🇺🇬 ☕ 让·路易 ☮️🪙@jeanlouisug·7 May

@gridpane @sudoingX All what you mention runs on RTX 3090. All new Qwen models have vision, minerU works perfect, OCR perfect, Whisper and TTS, bge-reranker, all that works perfect with single GPU. Only for image generation I must unload the LLM to run ComfyUI.

English

160

Sudo su@sudoingX·6 May

hot take: 90% of ai startups paying for api calls could run the same workloads locally on a single 3090 and never notice the difference. you don't need frontier pricing for tasks a 27B model handles fine. most have never even tested a quantized model on consumer hardware. not every task in your pipeline should be burning credits. audit your workload. you'd be surprised what runs locally.

English

430

44.9K

Michele Mattioni retweetledi

Sudo su@sudoingX·7 May

it's so easy to get started in local ai actually. the only real wall is vram math. practical heuristic for a single gpu: > 24gb = 27B Q4_K_M at 262k context (qwen 3.6, carnice-v2) > 16gb = 13B Q5_K_M at 32k or 9B Q8_0 at 64k > 12gb = 8B Q5_K_M at 16k > 8gb = 4B Q4_K_M at 8k quantization rule of thumb: Q4_K_M ≈ 0.6 gb per billion params. kv cache scales with context. add 1 gb activation buffer. that's the math. every other piece (llama.cpp build, hermes agent setup, prompt config) is one good day setup. the math is the only ongoing constraint. once you can eyeball this for your gpu, you can pick any model + context combo with confidence. stop being intimidated by the stack.

English

580

27.5K

Michele Mattioni@mattions·7 May

@nicklegendre Tesla model 3, from 2021

Català

Nicholas Legendre@nicklegendre·7 May

@mattions What kind of EV do you have?

English

Michele Mattioni@mattions·4 May

Boyz and girlz, get a GPU that is 7 years old, get Hermes or whatever you like, start looking into this. Soon a story of my hat I did just with Local AI. One of the money. TL;DR: you have superpowers

English

Michele Mattioni@mattions·7 May

@sudoingX And here I am using screen like it's 2010 ;)

English

Sudo su@sudoingX·7 May

anon, if you're new to local ai or agentic workflows, learn these three tools before anything else. >tmux - persistent sessions that survive disconnects. your agents keep running whether you're watching or not. >termius - ssh from your phone. full terminal access from anywhere. >tailscale - mesh your machines. access any device from any device. this screenshot is me managing hermes agent benchmarking qwen 3.6 27B on my dgx spark while i'm at the gym. three sessions running across three agents. from my phone. these tools are criminally underrated. once you use them you'll never go back to sitting at a desk waiting for inference to finish. own your compute. orchestrate from anywhere.

English

62.2K

Michele Mattioni@mattions·7 May

@LottoLabs 1. Take submission, assuming good faith 2. Ask folks to replicate an entry. More repeats, from different users, most likely is true It can be gamed as well of course, but we are looking for an approach that is viable, IMHO. The trick is to have : replicate run <id> Just an idea

English

Michele Mattioni@mattions·7 May

@LottoLabs Yeah. I mean, that the real run with the real results. I think, speaking with my scientific hat on, we need to make sure we have a reproducible experiment. We need a controlled way to do a run. Instead of controlling everything, and given the approach taken I suggest:

English

Michele Mattioni@mattions·6 May

@LottoLabs doing my part :D localmaxxing.com/runs/cmoto5q5v… I'm not too sure this benchmark makes too much sense, because I was using local model to do the benchmark on the local hardware, but at least we have a data point with these constraints

English

Michele Mattioni@mattions·7 May

BTW @Teknium , I've asked this before, and I've just figure out now that you guys already solved this, but I did not know :D

English

Michele Mattioni@mattions·7 May

and now you can simply have the `/commands` in slack directly! here are the docs, just in case: hermes-agent.nousresearch.com/docs/user-guid… here is the magic command: `hermes slack manifest --write` Have fun.

English

Michele Mattioni@mattions·7 May

so I actually figure it out. I was so early on hermes, and it is so stable, that I did not have to update my slack configuration from a long time. Then yesterday I have installed a new hermes for a new user. and it turns out that the integration with slack has _moved on_

English

Keşfet

@sudoingX @outsource_ @TheAhmadOsman @jeanlouisug @gridpane @elonmusk @BarackObama @taylorswift13