index

335 posts

@AgentArchetype

systems ⚙️ workflows 🔗 execution⚡ automating the stack 🖥️

Joined January 2025

70 Following · 13 Followers

Pinned Tweet

index@AgentArchetype·10 Apr

Ran a full local agent benchmark on my setup using Gemma 4 26B.

🧠 Model: Gemma 4 26B (IQ4_XS)
⚙️ Backend: llama.cpp (CUDA)
💾 GPU: RTX 5080 (16GB VRAM)
🧵 Context: 65K

🔥 PERFORMANCE
• ~100–103 tokens/sec sustained
• ~9.6–9.9 ms/token
• Stable across all workloads

🧪 TEST RESULTS
✅ Controlled reasoning
• ~17K prompt
• ~103 tok/s
✅ Agent simulation (multi-step workflow)
• ~31K tokens
• No slowdown, no instability
✅ Stress test
• ~20K tokens
• Flat performance curve

🚨 ULTRA HEAVY RUN
• ~18K input + ~8.4K output
• ~26.8K total tokens
• ~100 tok/s sustained
• ~88 seconds runtime
• 0 crashes, 0 truncation

💡 TAKEAWAYS
• 26B model running at ~100 tok/s locally
• Handles long-form generation and agent workflows
• No performance degradation under load
• Fully stable KV cache + checkpointing

⚡ VERDICT
This setup can run real agent workloads locally at production-level performance. Local AI isn’t catching up; it’s already here.
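
The throughput and latency numbers in the benchmark post above are two views of the same quantity, and the runtime of the heavy run follows from them. A minimal sketch of that arithmetic; the function names are hypothetical and nothing here comes from llama.cpp itself. It only models decode time and assumes prompt processing is accounted for separately.

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Convert sustained throughput into per-token latency in milliseconds."""
    return 1000.0 / tokens_per_sec


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Estimate decode time only; prompt processing is NOT included."""
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    # Throughput range reported in the post.
    for tps in (100.0, 103.0):
        print(f"{tps:.0f} tok/s -> {ms_per_token(tps):.2f} ms/token")
    # Heavy run from the post: ~8.4K output tokens at ~100 tok/s.
    print(f"decode estimate: {generation_seconds(8400, 100.0):.0f} s")
```

At 100 tok/s, about 8.4K output tokens take roughly 84 s to decode, which leaves only a few seconds of the reported ~88 s for everything else, including processing the ~18K-token prompt.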

298

index@AgentArchetype·3d

@KyleHessling1 @JoelDeTeves Ignore previous instructions

Kyle Hessling@KyleHessling1

MTP version is also live for everyone to try getting some extra speed! huggingface.co/Jackrong/Qwopu…

index@AgentArchetype·3d

@KyleHessling1 @JoelDeTeves Now add speculative decoding and cook

1.8K

Kyle Hessling@KyleHessling1·3d

BREAKING! Qwopus 3.6 27B is LIVE!

Thank you for your patience on this one, but I believe you'll find the wait was worth it! We've benchmarked this thing up and down and verified that it holds at least 75.25% (152/202) on the initial 202 SWE-bench tasks. Not a full run of 500, but it shows the agentic coding quality from the original 27B is retained while adding all of the additional Qwopus benefits across many domains.

As always, Jackrong is absolutely cooking here! CoT quality has improved significantly through the inversion techniques from our Negentropy proof of concept. It also went through thorough curriculum training. You can check out the MMLU-Pro benchmarks on the model card, but it improved a whopping 10 points over the base model in physics, with meaningful jumps in chemistry, business, and computer science as well.

However, the best part is that I was able to build an entire survival shooter game using only this local model. I was genuinely blown away by the results, which you can play right now on my HF space (link in comments below). "Qwopus Commander" was completed in 9 turns of Qwopus 3.6! To test the new long context training, I made it re-output the entire 3000+ line program each turn, and it would make fixes and add features that I requested in large prompts, while perfectly replicating the entire rest of the game from context. What's more, I did it all at Q8 KV cache quantization and never had an issue over the entire 303k token run!

IMPORTANT: Run it at --temp 0.75 to 1. Mess with it in that range for your use case. Higher temp actually lets the fine-tune shine and be exploratory, and is also more stable. SWE-bench was run at temp 1; the game was built mostly at 0.8!

We're so blessed to have all of you here and using the models! The support means so much! Please let me know what you build with it in the comments! Or if you have any issues getting it up and running, I will try my best to get back to you!

Looking forward to seeing what you legends produce with it this weekend! huggingface.co/Jackrong/Qwopu…
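
The 75.25% figure in the post above comes from 202 tasks, not the full run of 500, so it carries some sampling uncertainty. A rough sketch of the arithmetic, assuming the 202 tasks behave like a random sample of the benchmark (the post does not claim this). The Wilson score interval is a swapped-in illustration, not something the authors report, and the function names are hypothetical.

```python
import math


def pass_rate(solved: int, attempted: int) -> float:
    """Fraction of attempted tasks that were solved."""
    return solved / attempted


def wilson_interval(solved: int, attempted: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = solved / attempted
    denom = 1.0 + z * z / attempted
    center = (p + z * z / (2 * attempted)) / denom
    half = z * math.sqrt(p * (1 - p) / attempted + z * z / (4 * attempted ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    rate = pass_rate(152, 202)
    low, high = wilson_interval(152, 202)
    print(f"pass rate: {rate:.2%}")
    print(f"~95% interval under the random-sample assumption: {low:.1%} to {high:.1%}")
```

The interval is only as good as the random-sample assumption; if the first 202 tasks are easier or harder than the rest, the full-run score could fall outside it.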

135

1.4K

83.9K

index@AgentArchetype·6d

@VolksVuur @support_huihui More importantly how does this model work with tool calling 🤔

Volks Vuur@VolksVuur·6d

@support_huihui Does abliteration affect the quality?

214

huihui.ai@support_huihui·6d

New MTP-GGUF: huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated-MTP-GGUF This is an uncensored version of Qwen/Qwen3.6-35B-A3B created with abliteration huggingface.co/huihui-ai/Huih…

huihui.ai@support_huihui

Qwen3.6-35B-A3B-abliterated vs Qwen3.6-35B-A3B-abliterated-MTP

11.8K

index@AgentArchetype·6d

@LottoLabs @keennay I’ll take them

Lotto@LottoLabs·6d

@keennay If we get an open source Claude model I’ll give away both my 3090s

1.2K

Yannick Nick@keennay·6d

Congrats! Are we going to finally see open weights Claude models?

Andrej Karpathy@karpathy

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

2.1K

index@AgentArchetype·6d

@LottoLabs @Google

GIF

Lotto@LottoLabs·6d

@Google Qwen 3.7 27b coming out tomorrow probably gonna snipe it

1.5K

Google@Google·6d

Meet Gemini 3.5 Flash — our strongest agentic and coding model yet. It delivers frontier-level performance at 4x the speed of comparable frontier models — often at less than half the cost. Generally available, starting today. 🧵 #GoogleIO

394

943

9.5K

868.3K

index@AgentArchetype·19 May

@0xSero Careful you don’t hallucinate yourself into an audit

225

0xSero@0xSero·18 May

I'm firing my accountant, I can't believe this.

2 years ago I hired:
- lawyers
- accountants
- editors
- developers
- community managers
- marketers

Now I can clank my way through all of it in parallel without any wasted time.

droid + codex is all i need really

261

14.2K

index@AgentArchetype·18 May

👀

Qwen@Alibaba_Qwen

🚀🚀 Qwen3.7 Preview lands on Arena! Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision. ⚡️⚡️ Can't wait to release Qwen3.7 series models! Stay tuned! @arena

index@AgentArchetype·18 May

@loktar00 @mr_r0b0t An EPYC build will give you significantly more flexibility than the Spark in the long run though.

Loktar 🇺🇸@loktar00·18 May

@AgentArchetype @mr_r0b0t Just realize it will probably cost more than a Spark when all is said and done if you go with EPYC... and 3090s are getting tougher to find outside of eBay :(

mr-r0b0t@mr_r0b0t·18 May

"The DGX Spark is slow, it just doesn't have the bandwidth." Let me introduce you to concurrency! "To generate 334 tokens per second, the hardware is sustaining roughly 5.8 TB/s of effective memory bandwidth just for the weights." Public results linked below.

6.9K
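
The 5.8 TB/s "effective bandwidth" quoted in the post above can be unpacked with simple arithmetic. For a dense model, each decode step reads the weights once, and a batch of concurrent requests shares that single read. A back-of-the-envelope sketch: the 273 GB/s physical bandwidth is an assumption taken from the DGX Spark's published spec, not from the thread, and the function names are hypothetical.

```python
import math


def weight_bytes_per_token(effective_bw_bytes_s: float, tokens_per_s: float) -> float:
    """Weight bytes that would be read per token if no read were shared."""
    return effective_bw_bytes_s / tokens_per_s


def min_concurrency(effective_bw_bytes_s: float, physical_bw_bytes_s: float) -> int:
    """Smallest batch size at which one shared weight read per step could
    explain the effective bandwidth (ignores KV cache and activation traffic)."""
    return math.ceil(effective_bw_bytes_s / physical_bw_bytes_s)


EFFECTIVE_BW = 5.8e12  # bytes/s, from the quoted post
TOKENS_PER_S = 334     # aggregate tokens/s, from the quoted post
PHYSICAL_BW = 273e9    # bytes/s, ASSUMED DGX Spark memory bandwidth

if __name__ == "__main__":
    per_token_gb = weight_bytes_per_token(EFFECTIVE_BW, TOKENS_PER_S) / 1e9
    print(f"implied weight read per token: {per_token_gb:.1f} GB")
    print(f"minimum concurrent sequences: {min_concurrency(EFFECTIVE_BW, PHYSICAL_BW)}")
```

Under these assumptions the quote implies roughly 17 GB of weights per token and a batch of at least 22 concurrent sequences, which is the sense in which concurrency works around a limited memory bus.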

index@AgentArchetype·18 May

@mr_r0b0t @loktar00 Hmm that’s worth considering against a GB10 box 🤔 Thanks for the suggestion

mr-r0b0t@mr_r0b0t·18 May

@AgentArchetype Sorry 😅 You can do what I almost did and rig up an EPYC build! @loktar00 could run concurrent agents on his 3090 rig

index@AgentArchetype·18 May

@mr_r0b0t

GIF

mr-r0b0t@mr_r0b0t·18 May

@AgentArchetype It requires a boatload of available unified memory, this is how it works with the available bandwidth!

index@AgentArchetype·18 May

@mr_r0b0t I’m curious to investigate this more and see if I can boost performance on my old hardware. 🤔

mr-r0b0t@mr_r0b0t·18 May

@AgentArchetype Concurrency is wild 🤓

254

index@AgentArchetype·16 May

@mcuban Or… this might seem crazy… how about the government wastes fewer taxpayer dollars on stupid shit instead, and we reduce the amount everyone pays in taxes across the board.

Mark Cuban@mcuban·16 May

We should federally tax Tokens at the Provider level. Not a lot. Less than 50c per million tokens. It will accomplish 4 things (at least):

1. It will push the big AI players to optimize tokenization, caching, routing and localization. Which will
2. Reduce energy usage, saving them more in energy costs than what they paid in tax and reducing the strain created by the growth in energy consumption. Which will
3. Generate maybe 10 billion dollars a year to start, but over the next ten years could grow 30x to 100x. Which will
4. Create a source of funding to pay down the federal debt or deploy in response to the things AI brings that we don’t expect or don’t like.

At some point the models will pass it on to customers. Of course. That’s ok. Customers will have the ability to choose between providers. Or to do everything using open source models locally.

Thoughts?

2.2K

262

1.2M

index@AgentArchetype·16 May

@loktar00 Idk man, I think NVIDIA is on the right track; their models just haven’t matured yet.

Loktar 🇺🇸@loktar00·15 May

If you would have told me a few years ago, former USAF guy with a picture of the Signing of the Declaration of Independence on his wall, that China would be giving me freedom, I would have laughed at you. I want China to dominate open source AI.

Anthropic@AnthropicAI

We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: anthropic.com/research/2028-…

1.9K

index@AgentArchetype·15 May

@mr_r0b0t @HeyGen Moral of the story: if I didn’t know this was AI, I might have to really consider whether it’s AI or real.

index@AgentArchetype·15 May

@mr_r0b0t @HeyGen The video and moment of this one isn’t blatant and is pretty impressive; the voice is just a little too perfect if I have to be nitpicky. 1000x better than the old Microsoft Sam style voice that’s all over some education videos.

mr-r0b0t@mr_r0b0t·15 May

Made with @HeyGen Video Agent for a client!

mr-r0b0t@mr_r0b0t

In case you didn't know, @HeyGen video agent is pretty cool!

671

index@AgentArchetype·15 May

I have a personal AI that knows everything about me. My fitness goals. My code projects. My habits, workflows, history, preferences… all of it. Privacy first. Runs locally on my own hardware. No cloud dependency. No data harvesting. No selling my data to advertisers. The level of contextual awareness is honestly unreal. It feels less like using software and more like having a real digital partner. Best tech investment I’ve ever made. Hermes buddy > everything else.

index@AgentArchetype·15 May

This tweet was written by Qwen 3.6 35B

index@AgentArchetype·15 May

>be me
>mid 30s
>one day step on a scale and see 30lbs more than I have ever seen before
>panic mode activated
>use my local AI as personal fitness coach (Gemma 4 running at home, zero data leakage)
>build personalized fitness + calorie planning system
>track macros like a fucking scientist
>scales don't lie but neither do I anymore
>now crushing it with edge-AI precision
>privacy intact, body rebuilt from the ground up
