Tech2Wild
156 posts

Tech2Wild
@Tech2Wild
🎮 Tech, gaming, AI, and everything in between. 🤖 Building with it, not just talking about it. 🔥 From the mind of @ToNYD2WiLD
Tham gia Mart 2026
59 Đang theo dõi100 Người theo dõi

@Tech2Wild Opus as orchestrator with deepseek as a subagent can be a pretty decent combination which can be a good way to stretch those limits.
English

@OrganicGPT @sakurayukiai Discord, Telegram , MD files , SKILLS and talking directly to the local agents
English

@sakurayukiai @Tech2Wild how do you offload the work to local agents? did you create an MCP for that? or do you restart Claude with base URL pointing to your local server?
English

@Tech2Wild Running a similar hybrid setup on my RTX 5070 Ti. Using Opus 4.8 for orchestration and offloading the tight loops to local Qwen3.6-27B is the only way you don't go bankrupt serving agents.
English

@gospaceport @Web3Twon I bought a second, then just got a 3rd one. But now I can run it together because TP=3 don't work so guess what... I need a 4th one ! Don't fall for the trap lmaoo
English

@Web3Twon 2x 3090 is a crazy amount of added performance. 4x is nice also, but man 48GB is a certain amount of models fitting that makes it make sense.
English

Agreed but be warned there will be a strong desire to get a second 3090 😆
Ahmad@TheAhmadOsman
All it takes to get started with Local AI is a single RTX 3090, so go buy that GPU anon
English


@Tech2Wild Two 3090’s on top of each other just sat on the PSU outside the case got me feeling a type of way
English

@Tech2Wild Qwen3.6 35B A3B AutoRound fits in a single 24GB GPU with 262K context with fp8 KV cache and runs at 160 tps in a rtx 3090 via vLLM... Produces much better code than Gemma 4 12B. Unfair to compare them.
English

@YahiaAh87164950 @sakurayukiai 2 GPUS gave me more speed on 27B it went from 70 tok/s to a 120
English

@Tech2Wild I can’t say anything negative about either model other than that 12B’s native vernacular is too informal for my liking… it has grok4.1/deepseek sentence structure and punctuation. Otherwise, I think that 12B is the better chat model and 35B the better reasoner/researcher.
English

@malikwas1f Good call I been running your recipes bro thanks for what you do
English

@sakurayukiai I have 35B running now. The issue I’m having is 2GPUs of 27B give me almost identical speeds as 1 GPU on 35B
English

@Tech2Wild If you can fit the 35B footprint, Qwen is wild. Only 3B active params means it runs circles around Gemma's 12B dense decode speeds, but Gemma 4 is way friendlier on a single consumer GPU.
English

@gospaceport Sir I literally just watched your video on your Quad Build from 9 months ago 🙏🏽. Debating whether you go to GEN 5 or just grab one of the motherboards you showed and stay Gen 4.
English








