Michele Mattioni

2.2K posts

Michele Mattioni banner
Michele Mattioni

Michele Mattioni

@mattions

I tend to write in English here. I tend to write in Italian here: https://t.co/7HPDRcS9Nq

Italy, Europe Katılım Şubat 2010
736 Takip Edilen427 Takipçiler
Michele Mattioni
Michele Mattioni@mattions·
Did anyone manage to optimize a nice pipeline to create vide with local models using Hermes? I know there are skills and so forth,but from the idea to end product?
English
0
0
0
14
Michele Mattioni
Michele Mattioni@mattions·
So you've got local AI at home and you can use it the way you want to do whatever you need, because tokens are free and you can experiment. Give an example. Sure: build a granny chart to organize the cooking of our son's birthday party :)
English
0
0
0
12
Michele Mattioni retweetledi
Shann³
Shann³@shannholmberg·
Hermes Agent just shipped skill bundles I used to do this myself with skill-chains (one skill that referenced and called multiple other skills), now it's native and better but you need to be careful about how you use them when you trigger a bundle, the agent receives every skill in that bundle loaded into a single user message. any text after the slash command gets attached as your instruction. this means the quality of your output depends entirely on how well those skills compose together. if you stack five skills that don't naturally connect, you end up with conflicting instructions firing at once. the agent gets confused and output drifts. here's a rule for it, bundle workflows that chain together logically. something like research → ideate → write → critic works because each step feeds the next. bundling random utility skills just because they're useful in the same project will create noise. start with the workflows you've run more than twice this week. if you keep triggering the same three skills in sequence, bundle them. if you're just grouping skills for convenience, keep them separate.
Shann³ tweet media
Nous Research@NousResearch

Introducing skill bundles:

English
23
40
371
36.3K
Michele Mattioni retweetledi
Andrej Karpathy
Andrej Karpathy@karpathy·
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
English
7.8K
11.1K
147.6K
26.5M
Michele Mattioni
Michele Mattioni@mattions·
@sudoingX I'm solid on 3.6 35b . The 27b never really worked for me in the same nice way
English
0
0
1
62
Sudo su
Sudo su@sudoingX·
saying it out loud again. on a single 3090, the king is qwen 3.6 27b dense at q4, and nothing in that tier comes close. i've benchmarked the tier. happy to be wrong, so name the model that beats it. i just know you can't.
English
68
18
385
34.4K
Michele Mattioni retweetledi
Joel - coffee/acc
Joel - coffee/acc@JoelDeTeves·
RTX 3090 with Qwen3.6 MTP
Polski
17
25
251
14.6K
Michele Mattioni
Michele Mattioni@mattions·
The greatest thing about agentic AI running on local resources is: - ability to explore anything, because there is no session/usage tax - ability to use any data. Everything is local, zero stress on where the data goes. It's like having a dedicated Chief of Staff
English
0
0
0
10
Michele Mattioni
Michele Mattioni@mattions·
Always said that the only real test is e2e kind Everything else can be nice to have, but the real truth is from the holistic view. And yes with agentic coding, e2e do make sense, unit tests do tell very little, because agents can patch them as they go to be green anyway
Fatih Arslan@fatih

I was a huge unit test supporter, but honestly, it's no longer worth it. Agents are superb at writing extremely bad unit tests, and they still look good on paper. We're also shifting slightly to more and more e2e tests at @PlanetScale. Luckily with agents, that shift is also manageable.

English
0
0
0
56
Michele Mattioni
Michele Mattioni@mattions·
@sudoingX The context IMHO is way too big. It fits, but the computer will OOM too much, especially if TTS local services or random ComfyUI. I've tried with 192, but the real sweet sport is 131k for me
English
1
0
2
297
Sudo su
Sudo su@sudoingX·
this is what i am running if you gonna replicate. llama-server flags: -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --port 8080 --host 0.0.0.0 model: unsloth/Qwen3.6-27B-GGUF Q4_K_M (16gb file, 262k native context) hardware: single rtx 3090 24gb vram: 21gb / 24gb loaded at full 262k harness: hermes agent, custom openai-compatible provider at http://localhost:8080/v1
English
9
3
74
3.9K
Sudo su
Sudo su@sudoingX·
this is what my setup looks like today. about to test qwen 3.6 27b dense q4 on a single rtx 3090 at ~41 tok/s gen, hermes agent driving. predecessor model qwen 3.5 dense q4 made it work in one iteration when i ran the same agentic build on the same card. i've been daily driving qwen 3.6 27b dense for weeks now, the model i keep coming back to. if 3.6 oneshots too, this becomes the best model that runs on a single rtx 3090. consumer tier king. firing the test now will report back soon.
Sudo su tweet media
English
25
9
269
81.6K
Michele Mattioni
Michele Mattioni@mattions·
@outsource_ I've got a local AI agent that: - write code for me - draft documents - search things and reports back - in general I treat him as a very - creates Instagram carousel And so on.. capable personal assistant, with lots of skills
English
0
0
0
70
Michele Mattioni
Michele Mattioni@mattions·
@TheAhmadOsman Agreed. Getting really things _done_ with the 3.6 35B on local machinery. It just gets stuff done.
English
0
0
0
138
Ahmad
Ahmad@TheAhmadOsman·
Imagine seeing how capable Qwen 3.6 27B is when you give it web access and a proper harness and not yet getting that AGI will run locally Ngmi
English
46
8
426
24K
Jean Louis 🇺🇬 ☕ 让·路易 ☮️🪙
@gridpane @sudoingX All what you mention runs on RTX 3090. All new Qwen models have vision, minerU works perfect, OCR perfect, Whisper and TTS, bge-reranker, all that works perfect with single GPU. Only for image generation I must unload the LLM to run ComfyUI.
English
1
0
0
160
Sudo su
Sudo su@sudoingX·
hot take: 90% of ai startups paying for api calls could run the same workloads locally on a single 3090 and never notice the difference. you don't need frontier pricing for tasks a 27B model handles fine. most have never even tested a quantized model on consumer hardware. not every task in your pipeline should be burning credits. audit your workload. you'd be surprised what runs locally.
English
71
23
430
44.9K
Michele Mattioni retweetledi
Sudo su
Sudo su@sudoingX·
it's so easy to get started in local ai actually. the only real wall is vram math. practical heuristic for a single gpu: > 24gb = 27B Q4_K_M at 262k context (qwen 3.6, carnice-v2) > 16gb = 13B Q5_K_M at 32k or 9B Q8_0 at 64k > 12gb = 8B Q5_K_M at 16k > 8gb = 4B Q4_K_M at 8k quantization rule of thumb: Q4_K_M ≈ 0.6 gb per billion params. kv cache scales with context. add 1 gb activation buffer. that's the math. every other piece (llama.cpp build, hermes agent setup, prompt config) is one good day setup. the math is the only ongoing constraint. once you can eyeball this for your gpu, you can pick any model + context combo with confidence. stop being intimidated by the stack.
English
31
44
580
27.5K
Michele Mattioni
Michele Mattioni@mattions·
Boyz and girlz, get a GPU that is 7 years old, get Hermes or whatever you like, start looking into this. Soon a story of my hat I did just with Local AI. One of the money. TL;DR: you have superpowers
English
1
0
0
41
Sudo su
Sudo su@sudoingX·
anon, if you're new to local ai or agentic workflows, learn these three tools before anything else. >tmux - persistent sessions that survive disconnects. your agents keep running whether you're watching or not. >termius - ssh from your phone. full terminal access from anywhere. >tailscale - mesh your machines. access any device from any device. this screenshot is me managing hermes agent benchmarking qwen 3.6 27B on my dgx spark while i'm at the gym. three sessions running across three agents. from my phone. these tools are criminally underrated. once you use them you'll never go back to sitting at a desk waiting for inference to finish. own your compute. orchestrate from anywhere.
Sudo su tweet media
English
75
67
1K
62.2K
Michele Mattioni
Michele Mattioni@mattions·
@LottoLabs 1. Take submission, assuming good faith 2. Ask folks to replicate an entry. More repeats, from different users, most likely is true It can be gamed as well of course, but we are looking for an approach that is viable, IMHO. The trick is to have : replicate run <id> Just an idea
English
0
0
0
12
Michele Mattioni
Michele Mattioni@mattions·
@LottoLabs Yeah. I mean, that the real run with the real results. I think, speaking with my scientific hat on, we need to make sure we have a reproducible experiment. We need a controlled way to do a run. Instead of controlling everything, and given the approach taken I suggest:
English
1
0
0
11
Michele Mattioni
Michele Mattioni@mattions·
BTW @Teknium , I've asked this before, and I've just figure out now that you guys already solved this, but I did not know :D
English
0
0
0
10
Michele Mattioni
Michele Mattioni@mattions·
so I actually figure it out. I was so early on hermes, and it is so stable, that I did not have to update my slack configuration from a long time. Then yesterday I have installed a new hermes for a new user. and it turns out that the integration with slack has _moved on_
English
1
0
0
25