Martin | GorroAI

2.3K posts

@DrPhoto

Running Qwen 295B locally on my Mac in San Juan. Free public API at https://t.co/jMlA2dvMsP

Puerto Rico, USA · Joined May 2008
1K Following · 252 Followers
Martin | GorroAI reposted
Awni Hannun @awnihannun
Adopting Claude speak in my regular life, episode 1:
Partner: Did you do the dishes tonight?
Me: Yes they're done.
Partner: Why are they still dirty?
Me: You're right to push back. I didn't actually do them.
Replies 395 · Reposts 3.8K · Likes 55.8K · Views 1.8M
Martin | GorroAI @DrPhoto
@LordCharizard33 I miss the Abercrombie when the vibe was all nice red variations. Now it's all bland and worse quality.
Replies 0 · Reposts 0 · Likes 0 · Views 58
Lord Charizard @LordCharizard33
Do people still wear American Eagle and Abercrombie?
Replies 9 · Reposts 0 · Likes 8 · Views 1.7K
Classic Ads @ClassicAdvertz
No phones, no filters, just 200,000 ravers and The Prodigy in their prime, 1997!
Replies 273 · Reposts 1.9K · Likes 13.6K · Views 919.2K
Johnny Crambo @JohnnyCrambo
Which Gengar is the TRUE KING?
[image]
Replies 147 · Reposts 29 · Likes 971 · Views 107.9K
Martin | GorroAI @DrPhoto
Ollama 0.19 just dropped with an MLX backend hitting 112 tok/s on Qwen3.5-35B on M5 Max. Running autoresearch on @anemll’s flash-mlx, I hit 55.7 tok/s on the same model via SSD streaming. Different problems: they need the model in RAM, I run models larger than RAM. ollama.com/blog/mlx
Replies 0 · Reposts 0 · Likes 0 · Views 246
Martin | GorroAI @DrPhoto
Spent a full day repairing my GitHub since everybody was getting a 404 error. Had to make a new GitHub account. Link to the full GitHub repo (paper + code): github.com/gorroai/flash-…
Replies 0 · Reposts 0 · Likes 2 · Views 153
Nav Toor @heynavtoor
🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop.
Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store.
It's called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a MacBook Pro with 48GB RAM. At 4.4 tokens per second. With tool calling. No Python. No PyTorch. No frameworks. Just raw C and hand-tuned Metal shaders.
Here's why this should not be possible:
→ The model is 209GB. The laptop has 48GB of RAM.
→ It streams the entire model from the SSD in real time
→ Only loads the 4 experts needed per token out of 512
→ Uses just 5.5GB of actual memory during inference
→ Production-quality output with full tool calling
→ 58 experiments. Hand-optimized Metal compute kernels.
→ The entire engine is ~7,000 lines of C and ~1,200 lines of Metal shaders
Here's the wildest part: One person built this. A VP of AI at CVS Health. Not Google. Not OpenAI. A healthcare company executive. Side project. Used Claude Code as his coding partner. Built the entire engine in 24 hours.
Running a 397B model on cloud GPUs costs hundreds of dollars per hour. Companies spend millions per year on inference infrastructure for models this size.
This runs on a $3,499 laptop. Offline. Private. No API key. No monthly bill. Forever.
Trending on GitHub. 332 points on Hacker News. 100% Open Source.
[image]
Replies 115 · Reposts 340 · Likes 2.6K · Views 207.2K
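The figures quoted in the post above (4 of 512 experts loaded per token, a 209GB model on a 48GB laptop) can be put in proportion with a little arithmetic. A minimal Python sketch, assuming only those quoted numbers; the constant and function names are illustrative and are not taken from flash-moe itself:

```python
# Figures as quoted in the post; nothing here is measured or taken from the repo.
TOTAL_EXPERTS = 512    # experts available, as quoted
ACTIVE_EXPERTS = 4     # experts loaded per token, as quoted
MODEL_SIZE_GB = 209    # on-disk model size, as quoted
RAM_GB = 48            # laptop RAM, as quoted


def active_fraction(active: int, total: int) -> float:
    """Fraction of the experts that is touched for a single token."""
    return active / total


def oversubscription(model_gb: float, ram_gb: float) -> float:
    """How many times larger the model is than the available RAM."""
    return model_gb / ram_gb


frac = active_fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS)
ratio = oversubscription(MODEL_SIZE_GB, RAM_GB)

print(f"{frac:.2%} of experts are active per token")
print(f"the model is about {ratio:.1f}x larger than RAM")
```

Under these numbers, under 1% of the experts are needed for any one token, which is why streaming expert weights from SSD on demand can work even though the full model is several times larger than RAM.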
Martin | GorroAI @DrPhoto
🚀 Just hit 20.34 tok/s on Qwen3.5-397B running locally on M5 Max, 4.67× faster than the prior benchmark by @danveloper! Paper incoming, pending ArXiv endorsement.
[2 images]
Replies 0 · Reposts 0 · Likes 3 · Views 271
Martin | GorroAI @DrPhoto
@LottoLabs Running Qwen3.5-397B locally on M5 Max at 19 tok/s: way bigger than 27B, no GPU needed, no API bill. For privacy-sensitive work where you can’t send data to any cloud, this is the only option. Full benchmark: reddit.com/r/LocalLLaMA/s…
Replies 1 · Reposts 0 · Likes 17 · Views 2.7K
Lotto @LottoLabs
Qwen 27b on the 3090 saving me a bag. This is cost savings for 7 days of usage, w/ Hermes agent. Assuming 80% cache hit (unlikely) and no cache timeout. This is conservative. 27b is between sonnet and 5.4 mini. This is just my tokens in/out w/ API costs, assuming no rate limits. Obviously cheaper w/ coding plans at $200/m, but I would likely be hitting limits.
[image]
Replies 36 · Reposts 17 · Likes 359 · Views 38.1K
George Pu @TheGeorgePu
Almost signed up for ElevenLabs to narrate my blog. $330/month. Then I tried running an open-source model on my own laptop. Qwen 3.5 14B. Sounds fine. 200 posts a month. Costs me electricity. I almost paid $4,000 a year to rent a model I can run myself. Most AI subscriptions right now are just a nice UI on top of something free.
Replies 171 · Reposts 91 · Likes 2.7K · Views 185.5K