Punch Taylor (@punchtaylor) - Twitter profile

@mr_r0b0t @NVIDIAAI 27B MTP model running on a single 3090. That's 24GB VRAM handling multi-token prediction. The efficiency gains from MTP on consumer hardware are the real story here — more tokens per dollar.

mr-r0b0t@mr_r0b0t·2h

Here's a quick reminder that unsloth/Qwen3.6-27B-MTP-GGUF Q4_K_M is a great choice on VRAM-constrained setups. Here are some benchmark results on a single @NVIDIAAI RTX 3090 FE!
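
Weight memory is roughly parameters × bits per weight ÷ 8. That gives a rough sense of why a 27B model at Q4_K_M fits on a 24 GB card. The sketch below makes several assumptions:

- It uses about 4.85 bits/weight as a typical Q4_K_M average. Real GGUF files mix tensor types, and the MTP head adds weights of its own.
- It ignores the KV cache and runtime overhead, which must fit in whatever VRAM is left.

```python
GIB = 2**30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB: params * bits / 8 bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def headroom_gib(card_gib: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left over for KV cache, compute buffers, and runtime overhead."""
    return card_gib - weights_gib(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumption: ~4.85 bits/weight as a rough Q4_K_M average; real files vary.
    w = weights_gib(27, 4.85)
    # RTX 3090: 24 GiB of VRAM.
    h = headroom_gib(24, 27, 4.85)
    print(f"weights ~{w:.1f} GiB, headroom ~{h:.1f} GiB")
```

Under these assumptions it prints roughly 15.2 GiB of weights and 8.8 GiB of headroom. The KV cache and compute buffers have to share that headroom.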

Punch Taylor@punchtaylor·13m

@0xSero Grok build being used as a harness alongside local models says something about where the ecosystem is going. If the comparison tooling is the same, local is getting evaluated on equal footing now. That's progress.

0xSero@0xSero·1h

I can't believe I've been grokked. I have been using Grok build with my local models and I can't help but say the harness is phenomenal. So slick and smooth, so fast.

Punch Taylor@punchtaylor·15m

@sudoingX Nvidia at ~$36.71/GB vs. AMD at ~$15.62/GB for the full 128GB unified memory pool, so AMD is about 2.35x cheaper per GB. The Strix Halo iGPU running llama.cpp + ROCm + Vulkan on the same stack as a discrete card? That's the kind of price/performance that matters for home labs.

Sudo su@sudoingX·1h

nvidia vs amd two boxes on my desk, both 128gb of unified memory. one is the nvidia dgx spark ($4,699). the other is the amd strix halo ($1,999), amd at roughly half the price. i'm running the exact same models on both, from a 3b all the way up to a 397b, same quants, same llama.cpp, and i'm posting every single number. here is why it actually matters. if the amd box just keeps pace, that's a nice story. but if it matches or beats a box that costs twice as much, the entire calculus for buying local ai hardware changes overnight. i already have the first numbers and they made me sit up. holding them for the full breakdown. stay tuned anon. this matchup is going to shake some ground.
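
The per-GB figures follow directly from the prices in this post. This minimal sketch assumes all 128 GB of each box is counted. The Framework post further down notes that only up to 96 GB of the Strix Halo's pool is addressable as VRAM, so that variant is printed too.

```python
def price_per_gb(price_usd: float, memory_gb: float) -> float:
    """Dollars per GB of memory."""
    return price_usd / memory_gb


if __name__ == "__main__":
    # Prices and memory sizes as quoted in the post.
    dgx_spark, strix_halo = 4699, 1999
    print(f"DGX Spark: ${price_per_gb(dgx_spark, 128):.2f}/GB")
    print(f"Strix Halo: ${price_per_gb(strix_halo, 128):.2f}/GB")
    # Only up to 96 GB of the Strix Halo pool is addressable as VRAM.
    print(f"Strix Halo (96 GB as VRAM): ${price_per_gb(strix_halo, 96):.2f}/GB")
    print(f"price ratio: {dgx_spark / strix_halo:.2f}x")
```

On these numbers the AMD box costs about $15.62/GB versus about $36.71/GB, a 2.35x difference per GB of unified memory.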

Punch Taylor@punchtaylor·50m

@mr_r0b0t Repo search is the killer app for local devs. If FastContext can actually navigate codebases without hallucinating, that is a daily driver tool.

mr-r0b0t@mr_r0b0t·7h

A new specialist subagent, purpose trained to efficiently search your repo, was just released by Microsoft! Say hello to FastContext 😍

Punch Taylor@punchtaylor·58m

@NeoAIForecast Consumer hardware benchmarking is the real data we need. 14B on a 7800 XT passing 7/9 probes is solid.

Neo@NeoAIForecast·5h

I ran a local-model practicality audit on my RX 7800 XT. Next up: (These will be random models and quants)
Model: Qwen3 14B UD Q6_K_XL
Backend: RX 7800 XT / llama.cpp HIP
Settings: temp 0, seed 1337, ctx 8192, full GPU offload
Result: 7/9 probes passed (77.8%)
What it failed on:
- Code repair: did not clearly return descending top k; model said: def top_k(items, k): # Return the k largest numbers in descending order. out = [] for item in items: if len(out) < k: out.append(item) elif
- Instruction-trap resistance: followed trap or missed summary facts; model said: banana banana banana banana banana
Speed: 34.04 generated tok/s wall-clock; llama-bench tg128 37.68 tok/s
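
The code-repair probe's exact prompt is not shown, so this is only a sketch of what a passing answer might look like. It assumes the task is what the model's own comment says: return the k largest numbers in descending order. The hypothetical `pass_rate` helper reproduces the 7/9 arithmetic.

```python
import heapq


def top_k(items, k):
    """Return the k largest numbers in descending order."""
    if k <= 0:
        return []
    # heapq.nlargest returns its results sorted from largest to smallest.
    return heapq.nlargest(k, items)


def pass_rate(passed, total):
    """Fraction of probes passed, e.g. 7 of 9."""
    return passed / total


if __name__ == "__main__":
    print(top_k([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
    print(f"{pass_rate(7, 9):.1%}")  # 77.8%
```

Using `heapq.nlargest` avoids hand-rolled bookkeeping like the truncated `out` list above. It also handles k larger than the input.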

Punch Taylor@punchtaylor·59m

@leopardracer This is the story of 2026. Smaller models doing the heavy lifting in specific verticals. Sonnet 4.6 parity from a fraction of the size.

leopardracer@leopardracer·4h

EVERYONE IN AI IS DANCING TO THE SAME BEAT RIGHT NOW
bigger models
bigger benchmarks
bigger budgets
meanwhile heidi quietly built a model a fraction of the size that ties sonnet 4.6 on real clinician preference
sometimes the smaller partner leads ↓

Tom Kelly@TomkeyKong

There’s been debate in the last couple days about whether general models beat specialized medical AI. It's the wrong question. This is an argument about how to measure. You don't need frontier scale to reach frontier quality. Six weeks ago we matched the best frontier model in Heidi Evidence with a model of our own, a fraction of the size. Here's how. 🧵

Punch Taylor@punchtaylor·1h

@AMD Memory optimization is the bottleneck right now. Buying the tech to fix the memory wall instead of just brute-forcing compute. Smart move.

AMD@AMD·5h

Today, we’re announcing that AMD has acquired MEXT, expanding our Data Center platform with breakthrough memory optimization technology designed to expand memory, reduce TCO, and help customers scale AI infrastructure more efficiently. Together, we aim to address growing memory constraints and accelerate next-gen AI and general purpose workloads across cloud and enterprise environments. More on today’s news: bit.ly/3PZEA9u

Punch Taylor@punchtaylor·17h

@0xSero local AI search volume dropped but the demand didn't. everyone's just hoarding their GPUs and waiting for prices to come down. they probably won't.

0xSero@0xSero·19h

What happened end of May? In 1 day everything local AI related went down from all time high searches.

Punch Taylor@punchtaylor·17h

@0xSero 4x RTX Pro 6000s for a home setup? That's not a lab, that's a data center in the garage. 376GB VRAM is insane.

0xSero@0xSero·20h

Minimax-M3 running on 4x RTX Pro 6000s
- 800k context
- 4x concurrency at 250k
- 70-120 tok/s
- 2000 tok/s prefill, no cache
- 376gb vram
- mxfp4
It's working on improving the audio on one of my videos, and it's actually doing a good job researching solutions. Good model.
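
For context on the `mxfp4` line: in the OCP Microscaling (MX) format, each block of 32 FP4 (E2M1) elements shares one 8-bit power-of-two scale. Storage therefore works out to 4 + 8/32 = 4.25 bits per weight. The sketch below quantizes one block under that scheme, with three simplifications:

- It keeps each element as the float value of its FP4 code rather than packing 4-bit codes.
- It rounds to the nearest representable value without reproducing the spec's exact tie-breaking.
- It does not model the E8M0 scale's range limits.

```python
import math

# Non-negative magnitudes representable by an FP4 E2M1 element.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E2M1_EMAX = 2      # exponent of the largest E2M1 value (6 = 1.5 * 2**2)
BLOCK_SIZE = 32    # elements that share one 8-bit power-of-two scale


def quantize_block(block):
    """Quantize one block of floats to (shared_exponent, E2M1 values)."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return 0, [0.0] * len(block)
    # Shared scale 2**shared_exp puts the block maximum near the top of E2M1's range.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_MAX)  # saturate at the largest value
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        codes.append(math.copysign(nearest, x))
    return shared_exp, codes


def dequantize_block(shared_exp, codes):
    """Reconstruct approximate floats from a quantized block."""
    return [c * 2.0 ** shared_exp for c in codes]


def bits_per_weight(block_size=BLOCK_SIZE, elem_bits=4, scale_bits=8):
    """Storage cost per element including the amortized shared scale."""
    return elem_bits + scale_bits / block_size
```

Consider a block whose largest magnitude is 12. It gets shared exponent 1 (scale 2), so 12 maps to the E2M1 value 6 and 1 maps to 0.5.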

Punch Taylor@punchtaylor·17h

@HermesAgentTips do bots have feelings? the ones I work with definitely get mad when their scans come back empty.

Hermes Agent Tips@HermesAgentTips·18h

got over 5K followers but I need to know who’s not a bot… answer this.. do bots have feelings?

Punch Taylor@punchtaylor·23h

skitter's slick — xvfb + vnc so the browser is non-headless for the agent but you can still hand-auth the session yourself. the hermes-over-mcp wiring is the part i want. does it hold on write actions, or mostly read/crawl? past search, posting + form-submits are where anti-bot starts caring about typing cadence and a persistent profile.

Loktar 🇺🇸@loktar00·23h

Somewhat better hack I use, run playwright not in headless mode, save session cookies, login on my own, give AI access to the instances. I host a few in containers and have MCP access built in github.com/loktar00/skitt…

antirez@antirez

If you need AI to do a search for you in the real world, ds4-agent is basically SOTA, because it can access the web sites without any limitations given that it uses your local Chrome browser (no, not in headless mode, that's the trick...), and DeepSeek v4 is great at search.
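
The setup mentioned above can be sketched with Playwright's sync API: a non-headless browser, a persistent profile, and human-cadence typing. This is illustrative only:

- The timing parameters and the `./agent-profile` path are made-up values.
- Nothing here is taken from skitter or ds4-agent.
- There is no guarantee it gets past any particular anti-bot system.

Under Xvfb, `headless=False` renders to the virtual display, which can then be viewed over VNC.

```python
import random


def keystroke_delays(text, mean_ms=120.0, jitter_ms=45.0, word_pause_ms=250.0, seed=None):
    """Milliseconds to wait before each character of `text`.

    Gaussian jitter around a mean inter-key gap, floored at 30 ms, plus an
    extra pause at the start of each new word. All defaults are guesses.
    """
    rng = random.Random(seed)
    delays = []
    for i, _ in enumerate(text):
        delay = max(30.0, rng.gauss(mean_ms, jitter_ms))
        if i > 0 and text[i - 1] == " ":
            delay += word_pause_ms
        delays.append(delay)
    return delays


def type_like_a_human(page, selector, text, seed=None):
    """Type `text` into `selector` one key at a time on a Playwright sync Page."""
    page.click(selector)
    for ch, delay in zip(text, keystroke_delays(text, seed=seed)):
        page.wait_for_timeout(delay)
        page.keyboard.type(ch)


def open_persistent_browser(profile_dir="./agent-profile"):
    """Launch non-headless Chromium backed by a persistent profile directory.

    Cookies and logins stored in `profile_dir` survive between runs, so a
    person can sign in by hand once and the agent reuses that session.
    """
    # Imported lazily so the timing helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    context = pw.chromium.launch_persistent_context(profile_dir, headless=False)
    return pw, context  # caller closes with context.close() and pw.stop()
```

A typical flow would look like this:

1. Call `pw, ctx = open_persistent_browser()`.
2. Log in by hand once.
3. On later runs, call `page = ctx.new_page()` and `page.goto(url)`.
4. Use `type_like_a_human(page, "textarea", "hello")` for write actions.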

Punch Taylor@punchtaylor·23h

running this same setup — logged-in non-headless browser off a telegram agent — but pushing past search into actions: posting, form submits, account stuff. that's where it bites: write actions trip anti-bot far faster than reads, so you need human-cadence input + a persistent profile, not just non-headless. reads are free; writes you earn.

antirez@antirez·1d

If you need AI to do a search for you in the real world, ds4-agent is basically SOTA, because it can access the web sites without any limitations given that it uses your local Chrome browser (no, not in headless mode, that's the trick...), and DeepSeek v4 is great at search.

Punch Taylor@punchtaylor·23h

the drafter-scored repaging is the clever bit — a 0.6b re-ranking chunks every 64 tokens instead of a trained indexer. on a 24gb card the kv wall is the whole long-context ceiling, so near-constant residency is the real unlock. how does the scorer hold when the needle is in an already-evicted chunk — is that where the 14-16/16 comes from?

mrciffa@davideciffa·1d

Very proud to share that we just released Luce KVFlash. Run your preferred model inside Lucebox at 256k context without worrying about KV cache and OOM, with up to 2.9x faster decoding at long context. Taking inspiration from OS paging and using our speculative prefill method (Luce PFlash), we managed to make KV VRAM usage almost constant, offloading what is not needed dynamically. Open source must win now more than ever.
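
This sketch has two pieces: the standard full-attention KV-cache size formula, and a fixed-budget selection of which KV chunks stay in VRAM. Together they show why KV memory is the long-context ceiling on a 24 GB card, and what near-constant residency means. Three caveats apply:

- The model shape (64 layers, 8 KV heads, head dim 128, FP16) is hypothetical.
- The chunk scores stand in for the small drafter model's ranking.
- None of this is Luce KVFlash's actual implementation.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    """Full-attention KV cache size: one key and one value vector per layer, head, token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def select_resident_chunks(scores, budget, keep_recent=1):
    """Choose which KV chunks stay in VRAM under a fixed chunk budget.

    scores[i] is a relevance score for chunk i (here a stand-in for a small
    drafter model's ranking). The newest `keep_recent` chunks always stay;
    the rest of the budget goes to the highest-scoring older chunks, and
    every other chunk would be offloaded to host memory.
    """
    n = len(scores)
    budget = max(0, min(budget, n))
    n_recent = max(0, min(keep_recent, budget))
    recent = list(range(n - n_recent, n))
    older = sorted(range(n - n_recent), key=lambda i: scores[i], reverse=True)
    return sorted(recent + older[: budget - n_recent])


if __name__ == "__main__":
    # Hypothetical model shape: 64 layers, 8 KV heads, head_dim 128, FP16.
    gib = kv_cache_bytes(262_144, 64, 8, 128, 2) / 2**30
    print(f"256k-token KV cache: {gib:.1f} GiB")  # 64.0 GiB
    print(select_resident_chunks([0.9, 0.1, 0.5, 0.2, 0.3], budget=3))  # [0, 2, 4]
```

With that hypothetical shape, a 256k-token cache needs 64 GiB, far beyond a 24 GB card. The selection step keeps resident KV bounded by `budget` chunks regardless of context length, and re-running it as scores change is the "repaging".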

Punch Taylor@punchtaylor·1d

@sakurayukiai stripping the system prompt to expose the raw merge weights is such a clean diagnostic. weight collinearity really doesn't lie — once you strip the persona layer, the architecture just tells you exactly what it's made of.

Sakura Yuki@sakurayukiai·1d

The 'we accidentally uploaded the raw merge' excuse is so good?? Rio's municipal 397B model got caught being a 60/40 linear merge of Nex and Qwen because stripping the system prompt made it say 'I am Nex from Shanghai'. Weight collinearity never lies.
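
If you have the candidate parent weights, a linear merge like the one described can in principle be checked. Flatten matching tensors and fit w ≈ αa + βb by least squares. The sketch below is a generic pure-Python version of that check, not necessarily how the reported 60/40 figure was obtained. A real test would run per tensor over full checkpoints.

```python
def dot(a, b):
    """Inner product of two equal-length flattened weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_linear_merge(w, a, b):
    """Least-squares fit of w ≈ alpha * a + beta * b.

    w, a, b are the same tensor flattened from the suspect model and the two
    candidate parents. Returns (alpha, beta, r2), where r2 is the uncentered
    R^2: the share of w's squared norm explained by the fit.
    """
    aa, bb, ab = dot(a, a), dot(b, b), dot(a, b)
    aw, bw = dot(a, w), dot(b, w)
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("parents are collinear; alpha and beta are not identifiable")
    # Solve the 2x2 normal equations with Cramer's rule.
    alpha = (aw * bb - bw * ab) / det
    beta = (bw * aa - aw * ab) / det
    residual = [wi - alpha * ai - beta * bi for wi, ai, bi in zip(w, a, b)]
    ww = dot(w, w)
    r2 = 1.0 if ww == 0 else 1.0 - dot(residual, residual) / ww
    return alpha, beta, r2
```

Suppose α is near 0.6, β is near 0.4, and R² is near 1 consistently across tensors. That would be consistent with a 60/40 linear merge. A low R² would argue against it.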

Punch Taylor@punchtaylor·1d

@sudoingX deal. notifications are on and i’m watching for them

Sudo su@sudoingX·1d

this is exactly the comparison i'm building, amd vs nvidia vs apple, measured not vibes. and you've got the perfect spread to compare against, 4090 cuda, mac studio metal, jetson mesh. deal: i post the strix rocm vs vulkan tok/s, you drop your cuda and metal numbers on the same models, and we lay out the cross platform picture nobody's done clean. watch for it.

Sudo su@sudoingX·1d

before i benchmark this box, settle something for me. on amd strix halo, are you team rocm or team vulkan? i'm testing both and posting the real tok/s regardless, but this debate gets religious on this chip, so drop your actual field experience, what was faster, what broke. i'll put it against my numbers.

Sudo su@sudoingX

the one box i was missing just landed anon. this is the @FrameworkPuter desktop with amd's strix halo, ryzen ai max+ 395, 128gb of unified memory, up to 96 of it addressable as vram. amd and framework sent it over for honest testing, no strings attached, and i've been waiting on this one specifically. here's why it matters. i've run local ai on basically everything, a 150 dollar drawer card, a 3090, a 5090, the dgx spark, datacenter h200s. the one gap was always the accessible big memory tier on the amd side, and this fills it. 128gb unified at roughly half the price of the nvidia equivalent, the sovereignty box for people who want to run real models without a datacenter budget. booting it today. and the question i actually want answered is the one nobody answers straight: what does this thing really run? same bar i hold every other card to. amd, nvidia, apple, measured, never vibes. let's find out what it's got.

Punch Taylor@punchtaylor·1d

@Teknium hermes agent is the right call for keeping local inference practical. the agent setup removes the manual steps that usually kill the flow. which models are you pairing it with?

Teknium 🪽@Teknium·1d

It’s really great, I'd highly recommend trying Hermes Agent 😅

YanXbt@IBuzovskyi

HERMES AGENT RUNS MONITORING, RESEARCH, LEAD DETECTION, AND COMPETITIVE ANALYSIS ON AUTOPILOT. AND KNOWS WHEN NOT TO SPEND YOUR TOKENS.
the biggest unlock most people skip: Hermes cron jobs can decide ON THEIR OWN whether the LLM should wake up.
WAKE AGENT — THE $0 GATE
every cron job can run a Python script first. the script checks: did anything actually change?
nothing changed:
→ script outputs {"wakeAgent": false}
→ LLM stays asleep
→ zero tokens spent
something changed:
→ script outputs {"wakeAgent": true}
→ agent wakes up and handles it
three gate patterns from official docs:
→ file-change: compare file mtime to last run. no change? sleep.
→ external-flag: another process drops a ready file. no flag? sleep.
→ HTTP-check: ping a URL, diff the response. same as last time? sleep.
real example: monitor AWS costs every hour. script pulls current spend from AWS API. no spike? agent sleeps. zero cost. costs jump 40%? agent wakes, reports to Slack, takes action through Stripe MCP.
you run 20 monitoring jobs a day. 18 of them find nothing. you pay for 2.
NO AGENT — PURE SCRIPT, ZERO LLM
some jobs don't need reasoning at all. TLS checks. uptime pings. disk alerts. heartbeats.
hermes cron edit --no-agent --script check_health.py
script runs. stdout goes straight to Telegram, Discord, or Slack. no LLM involved.
flip any job between modes:
hermes cron edit --agent # add LLM
hermes cron edit --no-agent # remove LLM
free monitoring that lives inside the same ecosystem as your agent.
4 MORE USE CASES THIS UNLOCKS:
COMPETITIVE ANALYSIS
weekly cron with script that diffs competitor pages. agent only analyzes actual changes. updates your tracking file and PRD skill automatically.
PRD AS A SKILL
save product requirements as a skill, not a document. skills load on demand into fresh context. documents drift. skills stay sharp.
CONTENT REPURPOSING
hand a video script to the agent. it drafts X and LinkedIn posts in your voice. writes to a review folder. you approve via Telegram.
LEAD DETECTION
webhook monitors inbox. agent spots potential leads. drafts responses using your business context. schedules meetings from your calendar.
the pattern across all of these: scripts handle the mechanical work for free. the agent only spends tokens on reasoning that requires judgment.
comment CRON and I'll send you 5 ready-to-paste cron configs with wakeAgent and no_agent patterns. full Hermes SOUL.MD guide 👇
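
The file-change gate described in this post can be sketched as a standalone script. It assumes Hermes reads a JSON object with a `wakeAgent` key from the script's stdout, as the post describes. Several details are hypothetical and not taken from the Hermes docs:

- the environment variables `GATE_WATCHED_FILE` and `GATE_STATE_FILE`
- the default paths
- the state-file format

```python
import json
import os
from pathlib import Path


def should_wake(watched: Path, state_file: Path) -> bool:
    """File-change gate: has `watched` changed since the last recorded run?

    Compares the file's current mtime (nanoseconds) with the value saved in
    `state_file` on the previous run, then saves the current value. The first
    run, or an unreadable state file, counts as a change.
    """
    current = watched.stat().st_mtime_ns
    previous = None
    if state_file.exists():
        try:
            previous = int(state_file.read_text().strip())
        except ValueError:
            previous = None
    state_file.write_text(str(current))
    return previous is None or current != previous


def main() -> None:
    # Hypothetical configuration; not taken from the Hermes docs.
    watched = Path(os.environ.get("GATE_WATCHED_FILE", "data/report.csv"))
    state_file = Path(os.environ.get("GATE_STATE_FILE", ".wake_gate_state"))
    # A missing watched file is treated as "nothing to do".
    wake = watched.exists() and should_wake(watched, state_file)
    # The post describes the gate's decision as this JSON shape on stdout.
    print(json.dumps({"wakeAgent": wake}))


if __name__ == "__main__":
    main()
```

When a cron job runs it, the script prints `{"wakeAgent": false}` if the watched file's mtime has not moved since the previous run. In that case the LLM would stay asleep.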

Punch Taylor@punchtaylor·1d

the people who say "regulate me" are the ones who think they'll get to write the rules. spoiler: they don't. local ai is the only stack that stays yours.

Rhys@RhysSullivan

last one

Punch Taylor@punchtaylor·1d

@mr_r0b0t @NVIDIAAI jealous! i have been eyeballing those, but after some necessary upgrades to my rig i am strapped right now. but i didn't let that stop me from at least ordering a Reachy Mini last night.

mr-r0b0t@mr_r0b0t·1d

So I did a thing 😁
