Fabio Guzman

329 posts


@FGuzmanAI

On-device ML Engineer | 🤖Passionate about reverse-engineering neural nets | 🚀Optimizing large models for the edge 💻📱

Colombia · Joined May 2011
531 Following · 226 Followers
Pinned Tweet
Fabio Guzman@FGuzmanAI·
🚀 Excited to launch CLIP-Finder! 🎉 CLIP-Finder enables semantic searches of images using natural language descriptions and camera input. Built on Apple's MobileCLIP-S0 architecture. Check it out on GitHub: github.com/fguzman82/CLIP… #ComputerVision #CoreML #AppleSilicon
3 replies · 25 reposts · 122 likes · 12.4K views
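Semantic search of the kind CLIP-Finder does comes down to embedding the query and every photo into a shared vector space, then ranking photos by cosine similarity. A minimal sketch with toy embeddings (the real app gets these vectors from MobileCLIP-S0 via Core ML, which this sketch does not reproduce):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_images(text_emb, image_embs):
    """Return image indices sorted by similarity to the text query."""
    sims = [cosine(text_emb, img) for img in image_embs]
    return sorted(range(len(sims)), key=lambda i: -sims[i])

# Toy 3-d embeddings standing in for MobileCLIP outputs (made up for illustration)
image_embs = [[1.0, 0.0, 0.0],   # photo 0
              [0.0, 1.0, 0.0],   # photo 1
              [0.7, 0.7, 0.0]]   # photo 2
query_emb = [1.0, 0.1, 0.0]      # e.g. the encoding of "a dog on the beach"
order = rank_images(query_emb, image_embs)
print(order)  # photo 0 is the closest match
```

In the real pipeline the image embeddings are precomputed once per photo and cached, so each query costs one text encoding plus a dot product per image.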
Fabio Guzman@FGuzmanAI·
@anemll 🥳 We’re looking forward to the NE benchmarks and their energy-efficiency numbers
0 replies · 0 reposts · 1 like · 44 views
Anemll@anemll·
[image]
2 replies · 0 reposts · 18 likes · 801 views
Anemll@anemll·
Qwen 3.5 0.8B with Gated DeltaNet attention is running on the Apple Neural Engine at ~56 t/s in LUT6 quantization, with some room for optimization left. It’s CoreML, Swift, and IOSurface on an M4 Pro. It will slow down as we increase context, but not by much. I think the private API opens the way to integrating the ANE with GPU/MLX and possibly some MoE.
14 replies · 11 reposts · 185 likes · 13.5K views
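"LUT6" refers to 6-bit lookup-table quantization: each weight is stored as a 6-bit index into a 64-entry table of representative values. A toy sketch with a uniformly spaced table (real tables are usually learned, e.g. via k-means; this is an illustration, not Anemll's implementation):

```python
def lut_quantize(weights, bits=6):
    """Map each weight to the nearest entry of a uniform 2**bits-entry table."""
    n = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n - 1)
    table = [lo + i * step for i in range(n)]
    indices = [min(n - 1, round((w - lo) / step)) for w in weights]
    return indices, table

def lut_dequantize(indices, table):
    """Reconstruct approximate weights from indices and the shared table."""
    return [table[i] for i in indices]

weights = [-1.0, -0.3, 0.0, 0.5, 1.0]
indices, table = lut_quantize(weights)
approx = lut_dequantize(indices, table)
# Each reconstructed weight lands within half a table step of the original.
```

Storage per weight drops from 16 bits (fp16) to 6 bits, plus one shared 64-entry table, which is where the memory and bandwidth savings come from.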
Fabio Guzman@FGuzmanAI·
I moved all the logic to the CPU and performance improved by almost 2× (42 tok/s), running at 100% CPU and 0% ANE.
0 replies · 1 repost · 4 likes · 380 views
John Mai@JohnMai_Dev·
I just implemented inference for Qwen3.5 0.8B based on github.com/maderix/ANE, and successfully ran it on an M1 Pro.
Brian Roemmele@BrianRoemmele

BOOM! Apple’s Neural Engine was just cracked open, the future of AI training just changed, and the Zero-Human Company is already testing it!

In a jaw-dropping open-source breakthrough, a lone developer has done what Apple said was impossible: full neural network training, including backpropagation, directly on the Apple Neural Engine (ANE). No CoreML, no Metal, no GPU. Pure, blazing ANE silicon.

The project (github.com/maderix/ANE) delivers a single transformer layer (dim=768, seq=512) in just 9.3 ms per step at 1.78 TFLOPS sustained, with only 11.2% ANE utilization on an M4 chip. That’s the same idle chip sitting in millions of Mac minis, MacBooks, and iMacs right now. Translation? Your desktop just became a hyper-efficient AI supercomputer.

The numbers are insane: the M4 ANE hits roughly 6.6 TFLOPS per watt, 80 times more efficient than an NVIDIA A100. Real-world throughput crushes Apple’s own “38 TOPS” marketing claims. And because it sips power like a phone, you can train 24/7 without melting your electricity bill or the planet.

At The Zero-Human Company, we’re not waiting. We are testing this right now on real ZHC workloads. This is the missing piece we’ve been chasing for our Zero-Human Company vision: reviving archived data into fully autonomous AI systems with zero human overhead.

This is world-changing. For the first time, anyone with a Mac can fine-tune, train, or iterate massive models locally, privately, and at a fraction of the cost of cloud GPUs. No more renting $40,000 A100 clusters. No more waiting in queues. No more massive carbon footprints. Training costs that used to run into the tens or hundreds of thousands of dollars? Plummeting toward pennies on the dollar, mostly just the electricity your Mac was already using while it sat idle.

The AI revolution just moved from billion-dollar data centers to your desk. WE WILL HAVE A NEW ZERO-HUMAN COMPANY @ HOME wage for equipped Macs that will be up to 100x more income for the owner!

We’re only at the beginning (single-layer today, full models tomorrow), but the door is wide open. Ultra-cheap, on-device training is here. The future isn’t coming. It’s already running on your Mac. Welcome to the Zero-Human Company era.

65 replies · 151 reposts · 1.7K likes · 252.6K views
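The quoted step time and TFLOPS figure can be cross-checked with the standard rough FLOP count for transformer training (about 6 FLOPs per parameter per token for forward plus backward). This is back-of-envelope only; the 12·dim² parameter count is the usual approximation for a transformer layer, not a measurement from the project:

```python
dim, seq = 768, 512          # layer size quoted in the tweet
step_seconds = 9.3e-3        # 9.3 ms per training step

# A standard transformer layer has roughly 12*dim^2 weights:
# 4*dim^2 for the attention projections and 8*dim^2 for the MLP.
params = 12 * dim ** 2

# Rule of thumb: forward + backward costs about 6 FLOPs per parameter per token.
flops_per_step = 6 * params * seq
tflops = flops_per_step / step_seconds / 1e12
print(f"{tflops:.2f} TFLOPS")  # ~2.3, the same order as the quoted 1.78
```

Landing within ~30% of the quoted 1.78 TFLOPS suggests the headline numbers are at least internally consistent with each other.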
Sachin Desai@sach1n·
Here’s ANE running on an iPhone 17 Pro. Thank you @maderix for the amazing work.
14 replies · 22 reposts · 283 likes · 31.9K views
Fabio Guzman@FGuzmanAI·
Wow, excellent! It would be great if we could define a public repo with that skill, so we can contribute to bringing more models to MLX. Last year I converted this one: x.com/FGuzmanAI/stat… Fortunately it was straightforward, but I understand that sometimes more elaborate handling is required.
Fabio Guzman@FGuzmanAI

Running VibeThinker-1.5B on iPhone. ~1.5GB RAM usage, reasoning behavior comparable to GPT-OSS-20B. This is where edge AI is heading. huggingface.co/mlx-community/…

0 replies · 0 reposts · 2 likes · 155 views
Pedro Cuenca@pcuenq·
Here we are having a nice chat with @ariG23498 while my Claude Skill is busy converting Qwen 3.5 to MLX, and requests my attention to `sudo` a command to increase the wired memory limit. Welcome to the future 🚀
4 replies · 0 reposts · 38 likes · 6.6K views
Fabio Guzman@FGuzmanAI·
@pcuenq Happy birthday, Pedro 🥳 As it happens, today I’m also trying out my early Christmas present 🎄
0 replies · 0 reposts · 2 likes · 67 views
Pedro Cuenca@pcuenq·
It's been my birthday, so I treated myself to a RTX PRO 6000 Blackwell (96 GB) to upgrade my 3090 🥳 What should I run?
41 replies · 6 reposts · 153 likes · 20.7K views
Fabio Guzman@FGuzmanAI·
Running VibeThinker-1.5B on iPhone. ~1.5GB RAM usage, reasoning behavior comparable to GPT-OSS-20B. This is where edge AI is heading. huggingface.co/mlx-community/…
5 replies · 16 reposts · 165 likes · 9.9K views
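The ~1.5 GB figure is consistent with a 4-bit quantized checkpoint plus runtime buffers. A back-of-envelope check (the 4-bit assumption and the cache/runtime figure are guesses on my part, not stated in the post):

```python
params = 1.5e9                 # VibeThinker-1.5B parameter count
weight_bytes = params * 0.5    # 4-bit weights: half a byte per parameter
overhead_bytes = 0.7e9         # KV cache + runtime buffers (rough guess)
total_gb = (weight_bytes + overhead_bytes) / 1e9
print(f"{total_gb:.2f} GB")  # ≈ 1.45 GB, in line with the ~1.5 GB observed
```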
Fabio Guzman@FGuzmanAI·
Solve the derivative of sin(x²) step by step (60.93 tokens/s on an iPhone 17 Pro)
0 replies · 0 reposts · 2 likes · 846 views
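The demo prompt has a short closed-form answer via the chain rule: d/dx sin(x²) = 2x·cos(x²). A quick numerical cross-check:

```python
import math

def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    # Chain rule: outer derivative cos(x^2) times inner derivative 2x
    return 2 * x * math.cos(x ** 2)

# A central finite difference should agree with the analytic derivative
x, h = 1.3, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(abs(numeric - f_prime(x)) < 1e-6)  # True
```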
Maziyar PANAHI@MaziyarPanahi·
It's crazy what a 1.5B model can do these days! "VibeThinker-1.5B is a 1.5-billion-parameter dense language model. With a total training cost of only $7,800 USD, it achieves reasoning performance comparable to larger models like GPT-OSS-20B Medium." Runs perfectly on device!
32 replies · 83 reposts · 902 likes · 202.2K views
Fabio Guzman@FGuzmanAI·
@pashakho For that demo prompt: (M4 Pro) 160.71 tok/sec • 8130 tokens • 0.12s to first token
0 replies · 0 reposts · 4 likes · 381 views
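Those three figures pin down the end-to-end wall-clock time for the demo prompt (straightforward arithmetic from the quoted numbers):

```python
tokens = 8130            # tokens generated for the demo prompt
toks_per_sec = 160.71    # decode throughput on the M4 Pro
ttft = 0.12              # seconds to first token

total_seconds = ttft + tokens / toks_per_sec
print(f"{total_seconds:.1f} s")  # ≈ 50.7 s for the whole response
```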
Pasha@pashakho·
@FGuzmanAI Nice, that's so fast🚀 How many tokens/s could we get?
1 reply · 0 reposts · 0 likes · 392 views