Fabio Guzman

329 posts


@FGuzmanAI

On-device ML Engineer | 🤖Passionate about reverse-engineering neural nets | 🚀Optimizing large models for the edge 💻📱

Colombia · Joined May 2011
531 Following · 226 Followers
Pinned Tweet
Fabio Guzman@FGuzmanAI·
🚀 Excited to launch CLIP-Finder! 🎉 CLIP-Finder enables semantic searches of images using natural language descriptions and camera input. Built on Apple's MobileCLIP-S0 architecture. Check it out on GitHub: github.com/fguzman82/CLIP… #ComputerVision #CoreML #AppleSilicon
3 replies · 25 reposts · 122 likes · 12.4K views
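Semantic search of the kind CLIP-Finder does comes down to embedding the query and every photo into a shared vector space, then ranking photos by cosine similarity. A minimal sketch with toy embeddings (the real app gets these vectors from MobileCLIP-S0 via Core ML, which this sketch does not reproduce):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_images(text_emb, image_embs):
    """Return image indices sorted by similarity to the text query."""
    sims = [cosine(text_emb, img) for img in image_embs]
    return sorted(range(len(sims)), key=lambda i: -sims[i])

# Toy 3-d embeddings standing in for MobileCLIP outputs (made up for illustration)
image_embs = [[1.0, 0.0, 0.0],   # photo 0
              [0.0, 1.0, 0.0],   # photo 1
              [0.7, 0.7, 0.0]]   # photo 2
query_emb = [1.0, 0.1, 0.0]      # e.g. the encoding of "a dog on the beach"
order = rank_images(query_emb, image_embs)
print(order)  # photo 0 is the closest match
```

In the real pipeline the image embeddings are precomputed once per photo and cached, so each query costs one text encoding plus a dot product per image.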
Fabio Guzman@FGuzmanAI·
@anemll 🥳 We’re looking forward to the NE benchmarks and their energy-efficiency numbers
0 replies · 0 reposts · 1 like · 44 views
Anemll@anemll·
[image]
2 replies · 0 reposts · 18 likes · 801 views
Anemll@anemll·
Qwen 3.5 0.8B with Gated DeltaNet attention is running on the Apple Neural Engine at ~56 t/s in LUT6 quantization, with some room for optimization left. It’s CoreML, Swift, and IOSurface on an M4 Pro. It will slow down as we increase context, but not by much. I think the private API opens the way to integrating the ANE with GPU/MLX and possibly some MoE.
14 replies · 11 reposts · 185 likes · 13.5K views
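"LUT6" refers to 6-bit lookup-table quantization: each weight is stored as a 6-bit index into a 64-entry table of representative values. A toy sketch with a uniformly spaced table (real tables are usually learned, e.g. via k-means; this is an illustration, not Anemll's implementation):

```python
def lut_quantize(weights, bits=6):
    """Map each weight to the nearest entry of a uniform 2**bits-entry table."""
    n = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n - 1)
    table = [lo + i * step for i in range(n)]
    indices = [min(n - 1, round((w - lo) / step)) for w in weights]
    return indices, table

def lut_dequantize(indices, table):
    """Reconstruct approximate weights from indices and the shared table."""
    return [table[i] for i in indices]

weights = [-1.0, -0.3, 0.0, 0.5, 1.0]
indices, table = lut_quantize(weights)
approx = lut_dequantize(indices, table)
# Each reconstructed weight lands within half a table step of the original.
```

Storage per weight drops from 16 bits (fp16) to 6 bits, plus one shared 64-entry table, which is where the memory and bandwidth savings come from.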
Fabio Guzman@FGuzmanAI·
I moved all the logic to the CPU and performance improved by almost 2× (42 tok/s), running at 100% CPU and 0% ANE.
0 replies · 1 repost · 4 likes · 380 views
John Mai@JohnMai_Dev·
I just implemented inference for Qwen3.5 0.8B based on github.com/maderix/ANE, and successfully ran it on an M1 Pro.
Brian Roemmele@BrianRoemmele

BOOM! Apple’s Neural Engine was just cracked open, the future of AI training just changed, and the Zero-Human Company is already testing it!

In a jaw-dropping open-source breakthrough, a lone developer has done what Apple said was impossible: full neural network training, including backpropagation, directly on the Apple Neural Engine (ANE). No CoreML, no Metal, no GPU. Pure, blazing ANE silicon.

The project (github.com/maderix/ANE) delivers a single transformer layer (dim=768, seq=512) in just 9.3 ms per step at 1.78 TFLOPS sustained, with only 11.2% ANE utilization on an M4 chip. That’s the same idle chip sitting in millions of Mac minis, MacBooks, and iMacs right now. Translation? Your desktop just became a hyper-efficient AI supercomputer.

The numbers are insane: the M4 ANE hits roughly 6.6 TFLOPS per watt, 80 times more efficient than an NVIDIA A100. Real-world throughput crushes Apple’s own “38 TOPS” marketing claims. And because it sips power like a phone, you can train 24/7 without melting your electricity bill or the planet.

At The Zero-Human Company, we’re not waiting. We are testing this right now on real ZHC workloads. This is the missing piece we’ve been chasing for our Zero-Human Company vision: reviving archived data into fully autonomous AI systems with zero human overhead.

This is world-changing. For the first time, anyone with a Mac can fine-tune, train, or iterate massive models locally, privately, and at a fraction of the cost of cloud GPUs. No more renting $40,000 A100 clusters. No more waiting in queues. No more massive carbon footprints. Training costs that used to run into the tens or hundreds of thousands of dollars? Plummeting toward pennies on the dollar, mostly just the electricity your Mac was already using while it sat idle.

The AI revolution just moved from billion-dollar data centers to your desk. WE WILL HAVE A NEW ZERO-HUMAN COMPANY @ HOME wage for equipped Macs that will be up to 100x more income for the owner!

We’re only at the beginning (single-layer today, full models tomorrow), but the door is wide open. Ultra-cheap, on-device training is here. The future isn’t coming. It’s already running on your Mac. Welcome to the Zero-Human Company era.

65 replies · 151 reposts · 1.7K likes · 252.6K views
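The quoted step time and TFLOPS figure can be cross-checked with the standard rough FLOP count for transformer training (about 6 FLOPs per parameter per token for forward plus backward). This is back-of-envelope only; the 12·dim² parameter count is the usual approximation for a transformer layer, not a measurement from the project:

```python
dim, seq = 768, 512          # layer size quoted in the tweet
step_seconds = 9.3e-3        # 9.3 ms per training step

# A standard transformer layer has roughly 12*dim^2 weights:
# 4*dim^2 for the attention projections and 8*dim^2 for the MLP.
params = 12 * dim ** 2

# Rule of thumb: forward + backward costs about 6 FLOPs per parameter per token.
flops_per_step = 6 * params * seq
tflops = flops_per_step / step_seconds / 1e12
print(f"{tflops:.2f} TFLOPS")  # ~2.3, the same order as the quoted 1.78
```

Landing within ~30% of the quoted 1.78 TFLOPS suggests the headline numbers are at least internally consistent with each other.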
Sachin Desai@sach1n·
Here’s ANE running on an iPhone 17 Pro. Thank you @maderix for the amazing work.
14 replies · 22 reposts · 283 likes · 31.9K views
Fabio Guzman@FGuzmanAI·
Wow, excellent! It would be great if we could define a public repo with that skill, so we can contribute to bringing more models to MLX. Last year I converted this one: x.com/FGuzmanAI/stat… Fortunately it was straightforward, but I understand that sometimes more elaborate handling is required.
Fabio Guzman@FGuzmanAI

Running VibeThinker-1.5B on iPhone. ~1.5GB RAM usage, reasoning behavior comparable to GPT-OSS-20B. This is where edge AI is heading. huggingface.co/mlx-community/…

0 replies · 0 reposts · 2 likes · 155 views
Pedro Cuenca@pcuenq·
Here we are having a nice chat with @ariG23498 while my Claude Skill is busy converting Qwen 3.5 to MLX, and requests my attention to `sudo` a command to increase the wired memory limit. Welcome to the future 🚀
4 replies · 0 reposts · 38 likes · 6.6K views
Fabio Guzman@FGuzmanAI·
@pcuenq Happy birthday, Pedro 🥳 As it happens, today I’m also trying out my early Christmas present 🎄
0 replies · 0 reposts · 2 likes · 67 views
Pedro Cuenca@pcuenq·
It's been my birthday, so I treated myself to a RTX PRO 6000 Blackwell (96 GB) to upgrade my 3090 🥳 What should I run?
41 replies · 6 reposts · 153 likes · 20.7K views
Fabio Guzman@FGuzmanAI·
Running VibeThinker-1.5B on iPhone. ~1.5GB RAM usage, reasoning behavior comparable to GPT-OSS-20B. This is where edge AI is heading. huggingface.co/mlx-community/…
5 replies · 16 reposts · 165 likes · 9.9K views
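The ~1.5 GB figure is consistent with a 4-bit quantized checkpoint plus runtime buffers. A back-of-envelope check (the 4-bit assumption and the cache/runtime figure are guesses on my part, not stated in the post):

```python
params = 1.5e9                 # VibeThinker-1.5B parameter count
weight_bytes = params * 0.5    # 4-bit weights: half a byte per parameter
overhead_bytes = 0.7e9         # KV cache + runtime buffers (rough guess)
total_gb = (weight_bytes + overhead_bytes) / 1e9
print(f"{total_gb:.2f} GB")  # ≈ 1.45 GB, in line with the ~1.5 GB observed
```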
Fabio Guzman@FGuzmanAI·
Solve the derivative of sin(x²) step by step (60.93 tokens/s on an iPhone 17 Pro)
0 replies · 0 reposts · 2 likes · 846 views
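The demo prompt has a short closed-form answer via the chain rule: d/dx sin(x²) = 2x·cos(x²). A quick numerical cross-check:

```python
import math

def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    # Chain rule: outer derivative cos(x^2) times inner derivative 2x
    return 2 * x * math.cos(x ** 2)

# A central finite difference should agree with the analytic derivative
x, h = 1.3, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(abs(numeric - f_prime(x)) < 1e-6)  # True
```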
Maziyar PANAHI@MaziyarPanahi·
It's crazy what a 1.5B model can do these days! "VibeThinker-1.5B is a 1.5-billion-parameter dense language model. With a total training cost of only $7,800 USD, it achieves reasoning performance comparable to larger models like GPT-OSS-20B Medium." Runs perfectly on device!
32 replies · 83 reposts · 902 likes · 202.2K views
Fabio Guzman@FGuzmanAI·
@pashakho For that demo prompt: (M4 Pro) 160.71 tok/sec • 8130 tokens • 0.12s to first token
0 replies · 0 reposts · 4 likes · 381 views
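Those three figures pin down the end-to-end wall-clock time for the demo prompt (straightforward arithmetic from the quoted numbers):

```python
tokens = 8130            # tokens generated for the demo prompt
toks_per_sec = 160.71    # decode throughput on the M4 Pro
ttft = 0.12              # seconds to first token

total_seconds = ttft + tokens / toks_per_sec
print(f"{total_seconds:.1f} s")  # ≈ 50.7 s for the whole response
```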
Pasha@pashakho·
@FGuzmanAI Nice, that's so fast🚀 How many tokens/s could we get?
1 reply · 0 reposts · 0 likes · 392 views