Luke Wright
@lukewrightmain
10.9K posts
Husband | Dad | @cellhasher | @LitecoinLabs | @Litecoin $BELLS
Hong Kong · Joined September 2018
2.3K Following · 12.4K Followers
Nainsi Dwivedi @NainsiDwiv50980:
🚨 Someone just did the "impossible"… They ran a ~400B-parameter AI model on a laptop. No cloud, no data center, just a 48GB MacBook 🤯 A dev fed Claude Code with:
• @karpathy autoresearch repo
• Apple's "LLM in a Flash" paper
• Goal: run Qwen3.5 397B locally
And it actually worked. → ~1 token/sec → ~21GB RAM → rest streamed from SSD. This isn't a flex, this is a shift. We're entering a world where your laptop can run models that once needed entire server farms. It's not about more compute anymore, it's about smarter systems 🚀
Nainsi Dwivedi tweet media
Suryansh Tiwari@Suryanshti777

x.com/i/article/2034…

100 replies · 179 reposts · 1.2K likes · 203.5K views
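For context on the numbers in the post above: a rough back-of-envelope sketch of why a ~397B-parameter model needs SSD streaming on a 48GB machine, and why ~1 token/sec is the right order of magnitude. Every constant here is an assumption for illustration, not a figure from the actual run.

```python
# Back-of-envelope for the "400B model on a laptop" claim above.
# All constants are assumptions, not measurements from the original run.

PARAMS = 397e9           # model size claimed in the post
BYTES_PER_PARAM = 0.5    # assumes ~4-bit quantization
RAM_RESIDENT_GB = 21     # RAM usage reported in the post

# Total weight storage far exceeds the 48 GB MacBook's RAM, so most
# weights must live on SSD and be streamed in on demand.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
assert weights_gb > 48   # ~198 GB of weights vs 48 GB of RAM

# "LLM in a Flash"-style inference only touches a fraction of the
# weights per token (e.g. active experts in a sparse model), so the
# token rate is bounded by SSD read bandwidth over that working set.
ACTIVE_FRACTION = 0.03   # assumption: ~3% of weights read per token
SSD_GBPS = 6.0           # assumption: sustained NVMe read on Apple silicon

bytes_per_token = PARAMS * ACTIVE_FRACTION * BYTES_PER_PARAM
tokens_per_sec = SSD_GBPS * 1e9 / bytes_per_token
print(f"~{weights_gb:.0f} GB of weights, SSD-bound rate ~{tokens_per_sec:.1f} tok/s")
```

Under these assumptions the SSD, not the CPU or GPU, is the bottleneck, which is consistent with the ~1 token/sec and ~21GB resident RAM the post reports.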
Luke Wright @lukewrightmain:
@0xSero Hey, check out @Cellhasher and let me know! Our development team here has been running phones in clusters for LLMs, even bigger-parameter models, for a while. Perhaps we can share some of the resources and repos we've been working on with you!
0 replies · 0 reposts · 3 likes · 258 views
0xSero @0xSero:
Putting out a wish to the universe. I need more compute; if I can get more, I will make sure every machine, from a small phone to a bootstrapped RTX 3090 node, can run frontier intelligence fast with minimal intelligence loss. I have hit page 2 of Hugging Face, released 3 model-family compressions, and got GLM-4.7 on a MacBook: huggingface.co/0xsero. My beast just isn't enough, and I've already spent $2K USD renting GPUs on top of credits provided by Prime Intellect and Hotaisle. ——— If you believe in what I do, help me get this to Nvidia; maybe they will bless me with the pewter to keep making local AI more accessible 🙏
0xSero tweet media
Michael Dell 🇺🇸@MichaelDell

Jensen Huang is loving the new Dell Pro Max with GB300 at NVIDIA GTC.💙 They asked me to sign it, but I already did 😉

159 replies · 430 reposts · 3.6K likes · 731.8K views
Luke Wright @lukewrightmain:
@paoloardoino Nice work! @Cellhasher can't seem to get the same reach, despite slightly better performance than QVAC Fabric in benchmarks.
0 replies · 0 reposts · 0 likes · 45 views
Paolo Ardoino 🤖 @paoloardoino:
The Tether AI breakthrough news got good reach.
Paolo Ardoino 🤖 tweet media
Paolo Ardoino 🤖 @paoloardoino:

Tether AI breakthrough

The Tether AI team just released a new version of QVAC Fabric that includes the world's first cross-platform BitNet LoRA framework, enabling billion-parameter AI training and inference on consumer GPUs and smartphones.

Background: Microsoft's BitNet uses a one-bit architecture to dramatically compress models. Traditional LLMs operate on full-precision computation, where weights are stored as complex, high-resolution numbers. BitNet's innovation is to shrink these weights into a tiny ternary range of only -1, 0, and 1, significantly reducing memory usage and computation. LoRA is a parameter-efficient fine-tuning technique that reduces the number of trainable parameters by up to 99%. Together they slash memory and compute requirements. Yet BitNet has mostly been limited to CPU or NVIDIA CUDA backends, and has lacked support for LoRA fine-tuning.

Enter QVAC Fabric: the unlock. With QVAC Fabric LLM, BitNet LoRA fine-tuning and inference work cross-platform for the first time, across GPU vendors and operating systems, using Vulkan and Metal backends. That means support for AMD, Intel, Apple Metal, and mobile GPUs. And for the first time ever, BitNet inference runs efficiently on smartphones using mobile GPUs. On flagship devices, GPU inference is 2 to 11 times faster than CPU while using up to 90% less memory than full-precision models.

The biggest unlock: QVAC Fabric LLM supports BitNet LoRA fine-tuning on heterogeneous GPUs. Our team demonstrated this by fine-tuning models up to 3.8 billion parameters on flagship phones such as the Pixel 9, S25, and iPhone 16, and up to 13-billion-parameter models on the iPhone 16.

GitHub repositories:
github.com/tetherto/qvac-… : general QVAC Fabric codebase
github.com/tetherto/qvac-… : QVAC Fabric's BitNet knowledge base, architecture docs, and pre-built binaries

What does it mean? What used to require dedicated GPUs now runs on consumer hardware.

This breakthrough is the first real-world signal of a local, private AI that can truly serve the people. And this is just the beginning. In the months and years ahead, Tether will relentlessly continue to invest significant resources and capital in researching and developing open-source intelligence that can scale and evolve on local devices, providing maximum utility and privacy to its users. The era of Stable Intelligence has just begun. Free as in freedom.

15 replies · 4 reposts · 210 likes · 22.8K views
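The ternary weight scheme described in the post above can be sketched in a few lines. This is an illustrative implementation of BitNet-style "absmean" quantization (weights scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}); it is not code from QVAC Fabric or Microsoft's BitNet.

```python
# Minimal sketch of BitNet-style ternary ("1.58-bit") quantization as
# described in the post above. Illustrative only, not Tether's code.

def absmean_quantize(weights):
    """Map each weight to {-1, 0, +1} plus one shared fp scale."""
    gamma = sum(abs(w) for w in weights) / len(weights)  # mean |w|
    # Scale by gamma, round to nearest integer, clip into {-1, 0, 1}.
    ternary = [max(-1, min(1, round(w / (gamma + 1e-8)))) for w in weights]
    return ternary, gamma

def dequantize(ternary, gamma):
    """Recover a coarse approximation of the original weights."""
    return [t * gamma for t in ternary]

w = [0.31, -0.02, -0.45, 0.07, 0.52, -0.28]
q, g = absmean_quantize(w)
print(q)  # → [1, 0, -1, 0, 1, -1]
```

Since each weight takes one of only three values, storage needs just log2(3) ≈ 1.58 bits per parameter instead of 16, which is where BitNet's memory savings come from; matrix multiplies also reduce to additions and subtractions.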
Luke Wright @lukewrightmain:
Benchmarked DeepSeek Coder 1.3B Q4_K_M on @Cellhasher vs llama.cpp vs @tether / @paoloardoino QVAC today. Single 5-year-old phone (Adreno 660, 4 threads). First number = prompt tok/s, second number = generation tok/s:
- llama.cpp (b8156): 66.35 / 25.88 tok/s
- qvac-fabric: 55.53 / 26.56 tok/s
- cellswarm-v2: 62.02 / 28.20 tok/s
🔥 Takeaways: llama.cpp still wins on prompt speed; QVAC gains slightly on generation; Cellhasher leads overall on generation throughput. Results are close, but Cellhasher edges it out where it matters. What model should we test next? Qwen3.5 coming once Tether supports it and QVAC rebases 👀
Luke Wright tweet media
0 replies · 0 reposts · 2 likes · 170 views
corbin @corbin_braun:
pitch me your startup with 0 words.
1.4K replies · 12 reposts · 791 likes · 140.3K views
alan ⚡💵 @0xalank:
Tether's AI breakthrough is spamming me about antidepressants when I ask it "Hi". @paoloardoino what did you feed this thing?
alan ⚡💵 tweet media
Paolo Ardoino 🤖 @paoloardoino:
[Quoted post: "Tether AI breakthrough" — full text above]

7 replies · 0 reposts · 8 likes · 759 views
David Motta @davidmotta:
@itsPaulAi Never thought I'd see the day this stuff runs smoothly on a phone. Wild how fast it's getting.
3 replies · 0 reposts · 1 like · 159 views
Paul Couvert @itsPaulAi:
Ok, that's absolutely insane?? Tether has just introduced the QVAC BitNet LoRA fine-tuning framework. You can now run and fine-tune (!) billion-parameter models ON YOUR PHONE:
- It cuts memory use by up to 90%
- They've fine-tuned a 13B model on an iPhone 16
- Runs 11x faster on a Galaxy S25 vs CPU
This changes everything for local AI. Again, it enables billion-parameter LLM fine-tuning on laptops, consumer GPUs, and modern phones. "What used to require dedicated GPUs now runs on consumer hardware." 100% open source.
Paolo Ardoino 🤖 @paoloardoino:
[Quoted post: "Tether AI breakthrough" — full text above]

13 replies · 8 reposts · 104 likes · 21.8K views
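The "up to 90% less memory" figure quoted above checks out arithmetically if you compare full-precision fp16 weights against ternary weights at ~1.58 bits each. This assumes weights dominate memory use; activations, KV cache, and LoRA adapters are ignored.

```python
# Rough check of the "up to 90% less memory" claim: a 13B-parameter
# model in fp16 vs ternary BitNet weights at log2(3) ≈ 1.58 bits each.
import math

params = 13e9
fp16_gb = params * 16 / 8 / 1e9                 # 2 bytes per weight
ternary_gb = params * math.log2(3) / 8 / 1e9    # ~1.58 bits per weight

saving = 1 - ternary_gb / fp16_gb
print(f"fp16: {fp16_gb:.1f} GB, ternary: {ternary_gb:.1f} GB, "
      f"saving: {saving:.0%}")
# → fp16: 26.0 GB, ternary: 2.6 GB, saving: 90%
```

This also explains why a 13B model becomes feasible on an iPhone 16: the weights shrink from ~26 GB to under 3 GB, within a flagship phone's memory.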
Luke Wright @lukewrightmain:
Paul! Funny enough, @Cellhasher has had its own fabric.cpp out for well over a year now for phones, and it currently runs faster than Tether's. It also works with 5-year-old devices: Tether's only works with Adreno 800-class phone GPUs and up, whereas there's still amazing headroom on old devices with ours. Nonetheless, Tether takes the cake on the latest generation!
1 reply · 0 reposts · 0 likes · 130 views
Luke Wright @lukewrightmain:
@hthieblot Let me know if @Cellhasher fits your thesis. We probably have the fastest AI interface and the ability to cluster devices for bigger-parameter models.
0 replies · 0 reposts · 0 likes · 41 views
Hubert Thieblot @hthieblot:
Looking for obsessed builders. I invest up to $250K first checks in: • Robotics, drones, space • Applied AI/ML, models • Dev tools and infra • Manufacturing & logistics, and more... DMs open, or just reply here with what you are building. Early > polished.
459 replies · 55 reposts · 1.4K likes · 101.9K views
Luke Wright @lukewrightmain:
To add to this: a model like Qwen 3.5 is too new; @tether and @paoloardoino's QVAC Fabric cannot load it currently, so I cannot run benchmark tests.
Luke Wright tweet media
0 replies · 0 reposts · 0 likes · 64 views
Luke Wright @lukewrightmain:
Here are the @tether results tested against vanilla llama.cpp on @Cellhasher. DeepSeek Coder 1.3B — Tether tests on Cellhasher (single phone, Adreno 660). Please note: @tether / @paoloardoino's work would take the cake on devices with the newest-generation chipsets; I'll benchmark those as well, as devices with Adreno 800+ are what their work is made for. This run is strictly CPU. Cellhasher's own fabric.cpp also takes the cake on these 5-year-old devices (multi-phone cluster).
Results:
Prompt (pp64): 66.35 tok/s (std) vs 55.53 tok/s (qvac) → std +19%
Prompt (pp256): 67.10 tok/s (std) vs 55.96 tok/s (qvac) → std +20%
Gen (tg128): 25.88 tok/s (std) vs 26.56 tok/s (qvac) → qvac +2.6%
Gen (tg256): 25.75 tok/s (std) vs 26.44 tok/s (qvac) → qvac +2.7%
Takeaways: llama.cpp (b8156) is ~20% faster on prompt processing (newer optimizations); qvac-fabric slightly wins on token generation (~2-3%) from REPACK + Flash Attention; the net effect is basically noise at the system level.
Luke Wright tweet media
1 reply · 0 reposts · 3 likes · 185 views
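The percentage deltas in the benchmark post above can be re-derived from its own raw tok/s figures; a quick sketch (the std/qvac labels follow the post, where std = llama.cpp b8156):

```python
# Re-deriving the percentage deltas from the raw tok/s numbers in the
# benchmark post above (std = llama.cpp b8156, qvac = qvac-fabric).

runs = {
    "pp64":  (66.35, 55.53),
    "pp256": (67.10, 55.96),
    "tg128": (25.88, 26.56),
    "tg256": (25.75, 26.44),
}

for name, (std, qvac) in runs.items():
    if std > qvac:
        print(f"{name}: std +{std / qvac - 1:.0%}")   # pp64 +19%, pp256 +20%
    else:
        print(f"{name}: qvac +{qvac / std - 1:.1%}")  # tg128 +2.6%, tg256 +2.7%
```

The recomputed values match the post exactly, so the reported deltas are internally consistent with the raw throughput numbers.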
Luke Wright @lukewrightmain:
@paoloardoino This only works for Gen 3 mobile chips, btw (which excludes any mobile device made in 2024 or earlier). When is something coming for the older devices? If you want a fabric that works for all devices, older devices included, let me know.
0 replies · 0 reposts · 0 likes · 126 views
Paolo Ardoino 🤖 @paoloardoino:
Tether AI QVAC Fabric LLM sauce 🍅🫙 — source code repo: github.com/tetherto/qvac-…
Paolo Ardoino 🤖 @paoloardoino:
[Quoted post: "Tether AI breakthrough" — full text above]

7 replies · 6 reposts · 104 likes · 14.1K views
Luke Wright @lukewrightmain:
@paoloardoino Also to note, we'd love for you to test out our fabric.cpp solution for scaling across multiple Android devices.
0 replies · 0 reposts · 0 likes · 52 views
Luke Wright @lukewrightmain:
@paoloardoino @Cellhasher On mobile device GPUs: we have done extensive testing, and we have our own process that doesn't rely on GPU usage on devices at least 5 years old (Androids). But on new devices' NPUs we are seeing 30-50 tok/s from clusters on 80B models at 4-bit quant, without tuning them.
0 replies · 0 reposts · 0 likes · 25 views
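For scale on the cluster claim above: a back-of-envelope count of how many phones an 80B-parameter model at 4-bit quantization would need. The per-phone memory budget is my assumption for illustration, not a Cellhasher figure.

```python
# Back-of-envelope for the cluster claim above: phones needed to hold
# an 80B-parameter model at 4-bit quantization. The per-phone budget
# is an assumption, not a figure from the post.
import math

params = 80e9
bytes_per_param = 0.5      # 4-bit quantization
phone_budget_gb = 8        # assumption: usable RAM per flagship phone

model_gb = params * bytes_per_param / 1e9          # weight footprint
phones_needed = math.ceil(model_gb / phone_budget_gb)
print(f"~{model_gb:.0f} GB of weights → at least {phones_needed} phones")
# → ~40 GB of weights → at least 5 phones
```

This counts weights only; in practice a cluster also needs headroom for the KV cache and the OS, plus interconnect bandwidth to move activations between devices, so real deployments would likely use more phones than this lower bound.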
Luke Wright @lukewrightmain:
Hey Paolo! @Cellhasher has been doing this with their own models on phones for quite some time! Perhaps let's connect. I'm an old-school dino-coin dev (Litecoin), worked on Ordinals theory (stablecoins on Litecoin), and also built the software backend for Cellhasher's AI, where phone clusters can run even 30B models, some 80B models, and newer phones with NPUs run 100B+ models :)
0 replies · 0 reposts · 2 likes · 254 views
Paolo Ardoino 🤖 @paoloardoino:
[Original "Tether AI breakthrough" announcement — full text quoted above]
Tether@tether

Tether’s QVAC Launches World’s First Cross-Platform BitNet LoRA Framework to Enable Billion-Parameter AI Training and Inference on Consumer GPUs and Smartphones Learn more: tether.io/news/tethers-q…

136 replies · 171 reposts · 1.4K likes · 315.6K views
Bitcoin News @BitcoinNewsCom:
NEW: Tether just unveiled a major breakthrough in local AI. Its new QVAC Fabric lets powerful AI models run directly on your smartphone or laptop, no data centers or expensive hardware required. Key points: • Runs on iPhone, Android, and desktop • Up to 90% less memory needed • Faster performance than traditional setups • No reliance on NVIDIA GPUs or the cloud AI is moving from big servers to your pocket, opening the door to faster, cheaper, and more private intelligence.
16 replies · 74 reposts · 360 likes · 32.8K views