Devon James ☀️
@DevonRJames
19.8K posts

Co-Inventor @OpenIndexProto | CTO @Alexandria | formerly sales @Apple, VFX artist @hensoncompany @Sony & @wbpictures, Infantry @USMC & Technical Dir @web3wg

California · Joined June 2007
6K Following · 5K Followers
Devon James ☀️ reposted
Tom Turney @no_stp_on_snek
the original TurboQuant paper tested on A100 with models up to 8B. 6 days later, a bunch of strangers on the internet had it built and running on:
- Apple Silicon M1 through M5
- NVIDIA 3080 Ti through DGX Spark Blackwell
- AMD RX 6800 XT and 9070
- a 10-year-old Tesla P40
- an 8GB MacBook Air
- models from 3.8B to 70B across 6 architecture families
- 30+ independent testers

along the way we found new optimizations the paper didn't cover and failure modes it didn't test. the fact that a loose group of people across the world can read a paper, build implementations from scratch, stress-test across hardware none of us could individually afford, and push the research further in under a week is genuinely one of the best things about this era. the tools and the community make it possible. open source is something else.
33 replies · 241 reposts · 2.3K likes · 60.3K views
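TurboQuant's actual scheme isn't spelled out in the thread, but a minimal sketch of the kind of kernel those 30+ testers were porting across hardware, assuming a generic blockwise 4-bit quantizer (the block size and scale format here are illustrative, not the paper's), looks like this:

# Hypothetical sketch: generic blockwise 4-bit quantization round-trip.
# TurboQuant's real algorithm is not described in the tweet; this only
# illustrates the style of kernel being reimplemented and stress-tested.
import numpy as np

def quantize_block_q4(x: np.ndarray, block: int = 32):
    """Quantize a 1-D fp32 tensor to 4-bit ints with one fp16 scale per block."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0       # map into [-7, 7]
    scale[scale == 0] = 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)  # 4-bit range
    return q, scale.astype(np.float16)

def dequantize_block_q4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_block_q4(w)
err = np.abs(w - dequantize_block_q4(q, s)).mean()
print(f"mean abs round-trip error: {err:.4f}")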
Brian Roemmele @BrianRoemmele
Robot on @ShawnRyan762. Faster and faster, we are approaching the “iPhone moment” of the century.
30 replies · 33 reposts · 286 likes · 23.6K views
Devon James ☀️ reposted
BuBBliK @k1rallik
Solo dev reverse-engineered Google's billion-dollar algorithm in 7 days.

Google published the paper that crashed memory stocks worldwide. Then shipped zero code. Tom Turney read the math, opened his terminal, and built the whole thing with Claude - then made it faster than Google promised.

Day 1-3: Core algorithms, 141 tests, Python prototype
Day 3-5: C port into llama.cpp, Metal GPU kernels
Day 5-7: Speed optimization from 739 to 2747 tok/s

That's a 3.7x speedup through pure engineering:
> fp32 → fp16 WHT
> half4 vectorized butterfly ops
> graph-side rotation
> block-32 storage layout

Then he added his own research on top:
> Sparse V: skip 90% of value decompressions at long context
> Asymmetric K/V: keep keys precise, compress values harder
> Temporal decay: old tokens get lower precision automatically

Result: 35B model running on a MacBook with 4.6x compressed cache. 613 GitHub stars in a week. Google still hasn't released their own code.
BuBBliK@k1rallik

x.com/i/article/2037…

143 replies · 797 reposts · 6.4K likes · 924.2K views
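The "fp32 → fp16 WHT" and "vectorized butterfly ops" above refer to a fast Walsh-Hadamard transform. The real work was half4-vectorized Metal kernels; this is only a minimal Python sketch of the O(n log n) butterfly structure, assuming nothing about the actual llama.cpp port:

# Sketch of the fast Walsh-Hadamard transform (FWHT) butterfly mentioned
# in the tweet; the production version lives in Metal GPU kernels.
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sum lane
            x[i + h:i + 2 * h] = a - b  # butterfly: difference lane
        h *= 2
    return x

v = np.random.randn(8).astype(np.float32)
# The WHT is its own inverse up to a 1/n factor:
assert np.allclose(fwht(fwht(v)) / len(v), v, atol=1e-4)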
Devon James ☀️ reposted
sui ☄️ @birdabo
🚨SOMEONE REINVENTED HOW TEXT RENDERS ON THE WEB AND IT'S ABSOLUTELY INSANE. the goated dev behind react, reasonML, and midjourney's frontend just dropped Pretext: a tiny typescript library that measures and lays out text 500x faster than the DOM. he trained models against real browser rendering for weeks until the output matched safari, chrome, and firefox exactly. the demos are insane!! hundreds of thousands of text boxes at 120fps. magazine layouts and chat bubbles that actually wrap right. engineers from Vercel, Remix, Figma, and shadcn all cosigned. this is the kind of open source that makes you want to be a better dev. here are some cool demos from the past 24hrs👇
Cheng Lou@_chenglou

My dear front-end developers (and anyone who’s interested in the future of interfaces): I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept): Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow

166 replies · 1.1K reposts · 12K likes · 1.6M views
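Pretext's real API isn't shown in the thread; this is only a minimal Python sketch of the underlying idea of measuring text from a cached per-font advance-width table and wrapping greedily, with made-up widths standing in for the model Cheng Lou trained against real browsers:

# Hypothetical sketch of DOM-free text layout: measure text with a cached
# advance-width table instead of asking the browser, then greedily wrap.
# The widths below are invented numbers, not any real font's metrics.
ADVANCE = {"default": 7.2, "i": 3.1, "l": 3.3, "m": 11.0, "w": 10.4, " ": 3.6}

def measure(text: str) -> float:
    """Sum cached advance widths: no DOM, no reflow."""
    return sum(ADVANCE.get(ch, ADVANCE["default"]) for ch in text)

def wrap(text: str, max_width: float) -> list[str]:
    lines, line = [], ""
    for word in text.split():
        candidate = word if not line else line + " " + word
        if measure(candidate) <= max_width or not line:
            line = candidate
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)
    return lines

for ln in wrap("layout entire pages without touching the DOM", 120.0):
    print(f"{measure(ln):6.1f}px  {ln}")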
Xumas @xumas_iq
Saddam Hussein's last public appearance before the Fall of Baghdad. This footage was filmed in Baghdad, 15 minutes away from American troops.
32 replies · 143 reposts · 742 likes · 169.6K views
Devon James ☀️ reposted
David Hendrickson @TeksEdge
👀 Could burning the entire LLM (weights, attention layers, and everything else) straight onto a chip and board lower cost and speed up inference by hardwiring LLMs? YES ✅, and it's already being done. Taalas HC1 is using these ASIC "LLM burners" right now: 17k+ tokens/sec on Llama 3.1 8B, ultra-low power, rumored cost ~$300-400 for a PCIe card, 100% offline. Medium models (such as Qwen 3.5-27B) are slated for lab testing in Spring '26. If sold to the public, this could bring local hyper-token AI from sci-fi to your desktop. ⚡🪪🚀
David Hendrickson@TeksEdge

🎗️ "Medium-Sized" LLM Burners Coming Soon! 🔥 This Could Make Local HyperToken Generation a Reality. ⚡️ NVIDIA’s worst nightmare? 😱 ⚙️ Application-Specific Hardware Taalas new PCIe ASIC board would burn the entire medium-sized Qwen 3.5-27B LLM straight into silicon 🤯 (already doing it with small models) Taalos said medium models on ASIC would be available in their lab by Spring '26. 💭Imagine: 🚫 No more loading weights 🚀 ~10,000 Tokens Per Second locally (Llama 3.1 8B already @ 17,000 tps) 💻 Standard PC slot, ultra-low power (10x less) 🔋 🌍 100% offline with no cloud, no GPU farm 💰 Reddit unit cost rumor $300 to $400 🖥️ Imagine HyperToken generation on your desktop. 🤖 AI agents that think at light speed. ⚡️ Are you ready? 👀

15 replies · 11 reposts · 143 likes · 13K views
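A back-of-envelope check makes the weights-in-silicon claim plausible: serving a dense 8B model at 17,000 tok/s means touching every weight 17,000 times per second. This sketch assumes int8-equivalent storage and one full weight read per token (the actual Taalas design is not public):

# Rough arithmetic behind the 17k tok/s figure for Llama 3.1 8B.
params = 8e9
bytes_per_weight = 1        # assumption: int8-equivalent on-die storage
tokens_per_sec = 17_000

bandwidth = params * bytes_per_weight * tokens_per_sec   # bytes/sec
print(f"required effective bandwidth: {bandwidth / 1e12:.0f} TB/s")
# ~136 TB/s, orders of magnitude beyond any PCIe card's DRAM,
# which is why the weights would have to live on the die itself.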
Andrej Karpathy @karpathy
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it's so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 30.6K likes · 3.2M views
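The both-directions check is easy to script. A minimal sketch, assuming an OpenAI-compatible endpoint; the model name and prompt wording are placeholders, not Karpathy's workflow:

# Ask the same model to argue both directions of a claim, per the tweet's
# advice: form your own opinion from the spread, and watch for sycophancy.
from openai import OpenAI

client = OpenAI()
claim = "My blog post's thesis goes here."  # placeholder

for stance in ("strongest case FOR", "strongest case AGAINST"):
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Make the {stance} this claim, no hedging:\n{claim}"}],
    )
    print(f"--- {stance} ---")
    print(reply.choices[0].message.content)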
Devon James ☀️ @DevonRJames
@0xSero you skipped from 24 to 64, what would you recommend for 48 (across 2 4090s, so prolly not a fully usable 48)?
1 reply · 0 reposts · 3 likes · 407 views
0xSero @0xSero
Best models to run on your hardware:

—— 64 GB ——
- Qwen3-coder-next-80B-4bit (coding, Claude Code, general agent)
- Qwen3.5-122B-reap (browser use, multimodal, tool calling, general agent)

—— 96 GB ——
- GLM-4.6V (multimodal and tool calls)
- Hermes-70B (jailbroken)
- Nemotron-120B-Super (openclaw)
- Mistral-4-Small (general agent)

—— 192 GB ——
All of these are excellent top-tier LLMs and approach Sonnet in capabilities:
- Step-3.5-Flash
- Qwen3.5-397B-REAP
- MiniMax-M2.5 (soon M2.7)
- GLM-4.7-Reap
0xSero@0xSero

Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better:
Multimodal
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen) huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents) huggingface.co/nvidia/Nemotro…
- Mine hehe huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series

166 replies · 232 reposts · 3.2K likes · 446.6K views
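A rough way to sanity-check these tiers, and Devon's 48 GB question above: size the weights at the quantized bit width, then leave headroom for KV cache and activations. The 15% overhead factor is a rule of thumb of mine, not 0xSero's method, and real usage depends on context length and runtime:

# Rule-of-thumb VRAM estimator for quantized local models.
def fits(params_b: float, bits: float, vram_gb: float, overhead: float = 1.15) -> bool:
    need = params_b * bits / 8 * overhead   # GB, since params are in billions
    print(f"{params_b:>6.1f}B @ {bits}-bit -> ~{need:.0f} GB (budget {vram_gb} GB)")
    return need <= vram_gb

fits(80, 4, 48)   # Devon's 2x4090: ~46 GB, fits on paper but little KV headroom
fits(70, 4, 96)   # Hermes-70B at the 96 GB tier: ~40 GB, plenty of room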
Devon James ☀️ @DevonRJames
On April 1, 2003, Iraq's information minister went on TV and said, "They are nowhere near the airport… they are lost in the desert… they cannot read a compass. They are nowhere near Baghdad! This is silly!" We found this pretty funny to hear. The next day the airport was taken. My platoon got there on April 4, I think. We crossed the Diyala river into Sadr City the night of April 7. The statues in Baghdad started getting pulled down on April 8. It's pretty standard for the losing side, when facing overwhelming defeat, to lie through their teeth to buy as much time as possible, so they're ready to go into hiding the moment the regime collapses.
0 replies · 0 reposts · 0 likes · 162 views
Devon James ☀️ reposted
Elon Musk @elonmusk
Over 500 rocket landings now
15.6K replies · 34.3K reposts · 420.3K likes · 69.3M views
Devon James ☀️ reposted
ComfyUI @ComfyUI
Upgrading your RAM is now unnecessary. Introducing our new ComfyUI Dynamic VRAM optimization. Running local models is now possible on even the most memory-constrained hardware. Read more here: blog.comfy.org/p/dynamic-vram…
84 replies · 318 reposts · 2.9K likes · 445.6K views
Devon James ☀️ reposted
am.will @LLMJunky
Two incredible innovations in the local AI space in a span of three days. I am so excited. ComfyUI just shipped "Dynamic VRAM" and it seems like a big deal for anyone running models locally.

The problem: large AI models can have many GB of weights. If your system lacks the necessary RAM, you'd normally hit memory crashes or grind to a halt on the page file.

Instead of loading the entire model into memory at once, ComfyUI now reads the model file piece by piece directly from your SSD. Only the specific parts needed for the current step get pulled into memory. Everything else stays on disk until it's actually called for.

On the GPU side, they built a smart system that loads weight data at the exact moment it's needed. If your GPU runs out of space, it doesn't crash. It uses a temporary workaround to finish the calculation, then cleans up after itself. It also keeps track of what didn't fit so it doesn't waste time trying to reload things that won't fit again.

The other big improvement is for workflows that use multiple models. Previously, swapping between models would pile everything into system memory and bog your machine down. Now when a model gets swapped out of the GPU, it just goes back to the "read from disk when needed" state instead of sitting in RAM.

The result: a 56GB model can now run on a machine with only 32GB of memory. No crashes, no slowdowns from swap. Available now for Nvidia GPUs on Windows and Linux, with AMD support on the way. No idea how fast this is, but this seems incredible. Cannot wait to get my workstation going.
ComfyUI@ComfyUI

19 replies · 35 reposts · 411 likes · 49.6K views
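ComfyUI's actual implementation isn't shown here, but the "read from disk when needed" behavior am.will describes maps onto memory-mapping. A minimal CPU-side sketch, assuming mmap as a stand-in: pages fault in lazily on first touch, whereas the real system streams weights to VRAM with an OOM fallback:

# Lazy weight access via mmap: nothing is read from the SSD until a layer's
# pages are actually touched. This is an illustration, not ComfyUI's code.
import mmap
import numpy as np

def open_weights(path: str, shape: tuple, dtype=np.float16) -> np.ndarray:
    """Map a raw weight file without loading it; pages fault in on first use."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return np.frombuffer(mm, dtype=dtype).reshape(shape)

# Usage sketch (file and shapes are hypothetical):
# w = open_weights("model.bin", (n_layers, d, d))
# y = x @ w[layer_idx]   # faults in just that layer's pages from disk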
Didicoy the Kunt @Didicoy_Tonttu
@johnnymaga
Trump - 6'3"
Rubio - 5'9"
Burgum - 6'1"
Why is Trump the shortest person in frame?
11 replies · 1 repost · 4 likes · 6.9K views
johnny maga @johnnymaga
Burgum on Venezuela: I literally think they're going to put up a statue of President Trump
Trump: That would be a great honor
*2 mins of updates later*
Burgum: Their oil now flows to our refineries
Trump: Forget that. When are they going to do the statue? 😭
357 replies · 2.7K reposts · 27.6K likes · 1.5M views