
index
335 posts

@AgentArchetype
systems ⚙️ workflows 🔗 execution ⚡ automating the stack 🖥️
Joined January 2025
70 Following, 13 Followers
Pinned Tweet

Ran a full local agent benchmark on my setup using Gemma 4 26B.
🧠 Model: Gemma 4 26B (IQ4_XS)
⚙️ Backend: llama.cpp (CUDA)
💾 GPU: RTX 5080 (16GB VRAM)
🧵 Context: 65K
🔥 PERFORMANCE
• ~100–103 tokens/sec sustained
• ~9.6–9.9 ms/token
• Stable across all workloads
🧪 TEST RESULTS
✅ Controlled reasoning
• ~17K prompt
• ~103 tok/s
✅ Agent simulation (multi-step workflow)
• ~31K tokens
• No slowdown, no instability
✅ Stress test
• ~20K tokens
• Flat performance curve
🚨 ULTRA HEAVY RUN
• ~18K input + ~8.4K output
• ~26.8K total tokens
• ~100 tok/s sustained
• ~88 seconds runtime
• 0 crashes, 0 truncation
💡 TAKEAWAYS
• 26B model running at ~100 tok/s locally
• Handles long-form generation and agent workflows
• No performance degradation under load
• Fully stable KV cache + checkpointing
⚡ VERDICT
This setup can run real agent workloads locally at production-level performance.
Local AI isn’t catching up — it’s already here.
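The throughput and latency figures in the post above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes a steady decode rate with prompt processing excluded. Under those assumptions, 100 tok/s works out to 10 ms/token, 103 tok/s to roughly 9.7 ms/token, and ~8.4K output tokens at ~100 tok/s take about 84 seconds, which is in line with the reported ~88 second runtime if the remainder went to prompt processing and overhead.

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Convert decode throughput (tokens/sec) to per-token latency in milliseconds."""
    return 1000.0 / tokens_per_sec


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Seconds spent emitting output tokens at a steady rate (prompt processing excluded)."""
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    # Rates taken from the post: ~100-103 tok/s sustained.
    for rate in (100.0, 103.0):
        print(f"{rate:.0f} tok/s -> {ms_per_token(rate):.2f} ms/token")
    # Output size taken from the "ultra heavy run": ~8.4K tokens at ~100 tok/s.
    print(f"8400 tokens at 100 tok/s -> {generation_seconds(8400, 100.0):.0f} s")
```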

@KyleHessling1 @JoelDeTeves Now add speculative decoding and cook

BREAKING! Qwopus 3.6 27B is LIVE!
Thank you for your patience on this one, but I believe you'll find the wait was worth it!
We've benchmarked this thing up and down and verified that it holds at least 75.25% (152/202) on the initial 202 SWE-bench solves. Not a full run of 500, but it shows the agentic coding quality from the original 27B is retained while adding all of the additional Qwopus benefits across many domains. As always, Jackrong is absolutely cooking here!
CoT quality has improved significantly through the inversion techniques from our Negentropy proof of concept. It also went through thorough curriculum training. You can check out the MMLU-Pro benchmarks on the model card, but it improved a whopping 10 points over the base model in physics, with meaningful jumps in chemistry, business, and computer science as well.
However, the best part is that I was able to build an entire survival shooter game using only this local model. I was genuinely blown away by the results, which you can play right now on my HF space (link in comments below). "Qwopus Commander" was completed in 9 turns of Qwopus 3.6! To test the new long context training, I made it re-output the entire 3000+ line program each turn, and it would make fixes and add features that I requested in large prompts, while perfectly replicating the entire rest of the game from context. What's more, I did it all at Q8 KV cache quantization, and never had an issue over the entire 303k token run!
IMPORTANT: Run it at --temp 0.75 to 1. Mess with it in that range for your use case. Higher temp actually lets the fine-tune shine and be exploratory and is also more stable. SWE-bench was run at temp 1, the game was built mostly at 0.8!
We're so blessed to have all of you here and using the models! The support means so much! Please let me know what you build with it in the comments! Or if you have any issues getting it up and running, I will try my best to get back to you!
Looking forward to seeing what you legends produce with it this weekend!
huggingface.co/Jackrong/Qwopu…
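For readers unsure what the `--temp` range in the post above changes, here is a minimal sketch of generic temperature-scaled softmax sampling. It is not the Qwopus pipeline or any particular backend's implementation; the function names and the example logits are hypothetical. Dividing logits by a temperature below 1 sharpens the distribution toward the top token, while values closer to 1 leave more probability on alternatives, which is the "exploratory" behavior the post describes.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]  # hypothetical logits for three tokens
    for temp in (0.75, 0.8, 1.0):
        probs = softmax_with_temperature(example_logits, temp)
        print(temp, [round(p, 3) for p in probs])
```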

@VolksVuur @support_huihui More importantly how does this model work with tool calling 🤔

New MTP-GGUF:
huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated-MTP-GGUF
This is an uncensored version of Qwen/Qwen3.6-35B-A3B created with abliteration
huggingface.co/huihui-ai/Huih…
huihui.ai @support_huihui
Qwen3.6-35B-A3B-abliterated vs Qwen3.6-35B-A3B-abliterated-MTP

Congrats! Are we going to finally see open weights Claude models?
Andrej Karpathy @karpathy
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

@AgentArchetype @mr_r0b0t Just realize it will probably come out to more than a Spark when all is said and done if you go with EPYC... and 3090s are getting tougher to find outside of eBay :(

@AgentArchetype Sorry 😅
You can do what I almost did and rig up an EPYC build! @loktar00 could run concurrent agents on his 3090 rig

@AgentArchetype It requires a boatload of available unified memory, this is how it works with the available bandwidth!

We should federally tax tokens at the provider level.
Not a lot. Less than 50c per million tokens.
It will accomplish 4 things (at least):
1. It will push the big AI players to optimize tokenization, caching, routing and localization
Which will
2. Reduce energy usage, saving them more in energy costs than they paid in tax and reducing the strain created by growing energy consumption
Which will
3. Generate maybe 10 billion dollars a year to start, but over the next ten years could grow 30x to 100x
Which will
4. Create a source of funding to pay down the federal debt or to deploy in response to the things AI brings that we don’t expect or don’t like
At some point the models will pass it on to customers. Of course. That’s ok. Customers will have the ability to choose between providers. Or to do everything using open source models locally.
Thoughts?
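To put the proposal's numbers side by side, the sketch below works out how many tokens would have to be taxed to raise the stated revenue. It is only a back-of-the-envelope illustration: the function name is hypothetical, the rate is fixed at the post's 50c ceiling (the post says "less than"), and no actual token volume is asserted. At that rate, $10 billion a year implies about 2e16 taxed tokens a year, and any lower rate would need proportionally more.

```python
def tokens_needed(revenue_usd: float, tax_usd_per_million: float) -> float:
    """Tokens per year that must be taxed to raise revenue_usd at a flat per-million rate."""
    return revenue_usd / tax_usd_per_million * 1_000_000


if __name__ == "__main__":
    starting_revenue = 10e9  # "maybe 10 billion dollars a year to start"
    ceiling_rate = 0.50      # upper bound from the post: "less than 50c per million tokens"
    volume = tokens_needed(starting_revenue, ceiling_rate)
    print(f"Taxed tokens per year at the ceiling rate: {volume:.1e}")
    # The post's 30x to 100x growth range, applied to the starting revenue.
    for growth in (30, 100):
        print(f"{growth}x growth -> ${starting_revenue * growth / 1e9:,.0f}B per year")
```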

If you had told me a few years ago, as a former USAF guy with a picture of the Signing of the Declaration of Independence on his wall, that China would be giving me freedom, I would have laughed at you.
I want China to dominate open source AI.
Anthropic @AnthropicAI
We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: anthropic.com/research/2028-…

Made with @HeyGen Video Agent for a client!
mr-r0b0t @mr_r0b0t
In case you didn't know, @HeyGen video agent is pretty cool!

I have a personal AI that knows everything about me.
My fitness goals.
My code projects.
My habits, workflows, history, preferences… all of it.
Privacy first.
Runs locally on my own hardware.
No cloud dependency.
No data harvesting.
No selling my data to advertisers.
The level of contextual awareness is honestly unreal.
It feels less like using software and more like having a real digital partner.
Best tech investment I’ve ever made.
Hermes buddy > everything else.


>be me
>mid 30s
>one day step on a scale and see 30lbs more than I have ever seen before
>panic mode activated
>use my local AI as personal fitness coach (Gemma 4 running at home, zero data leakage)
>build personalized fitness + calorie planning system
>track macros like a fucking scientist
>scales don't lie but neither do I anymore
>now crushing it with edge-AI precision
>privacy intact, body rebuilt from the ground up