

David Hendrickson (@TeksEdge)
CEO & Founder | PhD | Startup Advisor | @Columbia | Author, Generative Software Engineering https://t.co/9oqvHuTX5f

⁉️ So get this: AMD is making a bold move to own the affordable personal inferencing market by launching a Mini PC in June, a 128GB shared-memory inferencing box 🎇 They call it the Halo Box.

🧾 It's a Ryzen AI MAX+ 395 (16 Zen 5 cores + 40 RDNA 3.5 CUs + XDNA 2 NPU)
✅ Up to 128GB LPDDR5X-8533 unified memory
✅ Full ROCm support + day-0 AI model optimization
🧪 Built for local AI development (up to ~200B param models)
📈 A direct shot at NVIDIA’s $4,699 DGX Spark, and it could cost $2,000–$3,000 (in line with what comparable boxes sell for today)

🤔 Why launch now, during the RAM shortage? While memory makers divert capacity to HBM for AI data centers (spiking LPDDR5X prices and pushing NVIDIA to raise the DGX Spark’s price by $700), AMD is moving to lock up the affordable, high-memory AI mini-PC segment before the crisis worsens.

💡 My speculation: AMD could be using its contracts, relationships, and strategic priority to secure better memory access than many traditional OEMs, giving it an edge in launching the Halo Box during the shortage. Smart timing or risky bet?

🔥 This is AMD aggressively fighting for the local AI developer market.
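A quick sanity check on that "up to ~200B params" claim: weight memory is roughly params × bits-per-weight ÷ 8, plus runtime overhead. The sketch below is mine; the 4-bit quantization width and 10% overhead figure are my assumptions, not AMD's numbers.

```python
def model_memory_gb(params_b: float, bits_per_weight: float,
                    overhead: float = 0.10) -> float:
    """Rough unified-memory footprint for an LLM's weights.

    params_b: parameter count in billions
    bits_per_weight: quantization width (16 = BF16, 4 = Q4, etc.)
    overhead: fudge factor for KV cache, activations, runtime buffers
    """
    weight_gb = params_b * bits_per_weight / 8  # bytes per param = bits / 8
    return weight_gb * (1 + overhead)

# A ~200B dense model at 4-bit just squeezes into 128GB unified memory...
print(f"200B @ 4-bit:  {model_memory_gb(200, 4):.0f} GB")   # ~110 GB
# ...while the same model at BF16 is far out of reach.
print(f"200B @ 16-bit: {model_memory_gb(200, 16):.0f} GB")  # ~440 GB
# The 60-100GB quant models mentioned below fit comfortably:
print(f"120B @ 4.5-bit: {model_memory_gb(120, 4.5):.0f} GB")  # ~74 GB
```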

This is a great example of the difficult position local inferencers face when running quant models 60–100GB in size. The DGX Spark, at $4,700 retail (now), is the only reasonable option (vs. $10K–$14K alternatives), but it’s slooooow.

🏆 LLMStats just dropped a fresh leaderboard update. This is my trusted ranking.

📊 The "TrueSkill" composite score is the real deal: the most conservative, battle-tested "Uber benchmark" in the game (μ − 3σ across GPQA, SWE-Bench, coding arenas & more).

👀 Current Standings

🏆 Overall #1: Claude Mythos Preview (@AnthropicAI) — 70.1
Unreleased monster. 94.6% on GPQA Diamond. This thing is going to be an absolute banger 🚀

🥇 Best Open Weights: Kimi K2.6 (@moonshot) — 58.7
Undisputed leader among open models right now. 90.5% GPQA + only $0.95/M tokens. Insanely good value 💎

Quick Hits
🏆 Gemini 3.1 Pro → dominating coding arenas
👑 Llama 4 Scout → 10M-context king
⚡ Mercury 2 → fastest model at 1,720 tok/s

🔥 Bottom line: If you care about real capability per dollar, Kimi K2.6 is the one to watch in the open-source world right now. And when Mythos drops… the game changes.
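For anyone curious how a μ − 3σ composite works mechanically, here's a minimal sketch using the open-source `trueskill` Python package. The model names and match outcomes are invented for illustration; I'm not claiming this is LLMStats' actual pipeline.

```python
import trueskill  # pip install trueskill

env = trueskill.TrueSkill(draw_probability=0.0)

# Hypothetical head-to-head "matches" across benchmarks: (winner, loser).
# Purely illustrative data, not LLMStats results.
matches = [
    ("mythos", "kimi"), ("mythos", "gemini"),
    ("kimi", "gemini"), ("mythos", "kimi"),
]

ratings = {}
for winner, loser in matches:
    rw = ratings.setdefault(winner, env.create_rating())
    rl = ratings.setdefault(loser, env.create_rating())
    ratings[winner], ratings[loser] = env.rate_1vs1(rw, rl)

# Conservative "uber score": mu - 3*sigma penalizes uncertainty, so a
# model only ranks high once it has won consistently, not luckily.
for name, r in sorted(ratings.items(),
                      key=lambda kv: kv[1].mu - 3 * kv[1].sigma,
                      reverse=True):
    print(f"{name:8s} mu={r.mu:6.2f} sigma={r.sigma:5.2f} "
          f"score={r.mu - 3 * r.sigma:6.2f}")
```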

Running DeepSeek V4 from @deepseek_ai on @vllm_project? Upgrade to v0.20.1 — 10+ bug fixes and optimizations, fully tested and verified by the open-source community! A huge thank-you to @FireworksAI_HQ, @baseten, @novita, @lightseekorg, @daocloud, @nvidia, @redhatai, and more for helping report, fix, and verify the stability and speed of vLLM. 🙏

🔧 DeepSeek V4 Productionization Reliability:
• Persistent top-k cooperative deadlock at TopK=1024
• AOT compile cache import error
• Repeated RoPE cache initialization
• Non-streaming tool-call type conversion (DSV3.2/V4)
• torch inductor error on V4

⚡ Optimizations:
• Multi-stream pre-attention GEMM + configurable knob
• BF16 / MXFP8 all-to-all on FlashInfer one-sided comm
• PTX `cvt` for faster FP32 → FP4 conversion
• Integrated `head_compute_mix_kernel` for head computation

📖 Full notes → github.com/vllm-project/v…
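If you want to kick the tires after upgrading, a minimal offline-inference script with vLLM's standard Python API looks like the sketch below. The model ID is a placeholder (the post doesn't give the exact Hugging Face repo), and the parallelism setting is an assumption for a multi-GPU box.

```python
# pip install -U "vllm==0.20.1"  # the release named in the post above
from vllm import LLM, SamplingParams

# NOTE: "deepseek-ai/DeepSeek-V4" is a placeholder model ID, not a
# confirmed repo name -- check the vLLM release notes for the real one.
llm = LLM(
    model="deepseek-ai/DeepSeek-V4",
    tensor_parallel_size=8,   # shard across 8 GPUs; tune for your setup
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts routing."], params)
print(outputs[0].outputs[0].text)
```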

How slow does a 128B DENSE model run locally? Qwen3 27B and Gemma 31B are the popular dense models everyone tests. But what happens when you 4x the params? Mistral Medium 3.5 128B, side-by-side on 4x4090 vs 4x5090 vs RTX PRO 6000 vs DGX Spark:

🔴 4x4090: 12.06 tok/s decode, 680ms TTFT
🟢 4x5090: 19.57 tok/s decode, 572ms TTFT
🟡 PRO 6000: 18.12 tok/s decode, 538ms TTFT
🟣 DGX Spark: 2.58 tok/s decode, 2243ms TTFT
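To translate those numbers into wall-clock time: end-to-end latency is roughly TTFT + tokens ÷ decode rate. Quick sketch below, with a 1,000-token response length as my assumed workload.

```python
# Rough end-to-end latency: TTFT + generated_tokens / decode_rate.
# Hardware numbers are the ones quoted in the post above; the
# 1,000-token response length is an assumption for illustration.
results = {
    "4x4090":    (12.06, 0.680),   # (decode tok/s, TTFT seconds)
    "4x5090":    (19.57, 0.572),
    "PRO 6000":  (18.12, 0.538),
    "DGX Spark": (2.58,  2.243),
}

GEN_TOKENS = 1_000

for rig, (decode_tps, ttft_s) in results.items():
    total_s = ttft_s + GEN_TOKENS / decode_tps
    print(f"{rig:10s} ~{total_s:6.1f}s for {GEN_TOKENS} tokens")
# DGX Spark: ~390s vs ~52s on 4x5090 -- that's the "slooooow" above.
```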