s◎last◎rm
@__solastorm__

5.9K posts

Retired algo & HFT engineer. Nerd. Degen. Lazy. 🤖 🇺🇸🇪🇺🇺🇦🇬🇧🏴󠁧󠁢󠁥󠁮󠁧󠁿🇨🇦

Blackstool, England ;) Joined June 2021
6.2K Following · 2.8K Followers
Graeme @gkisokay ·
The Local LLM Cheat Sheet for Your 8GB VRAM or Unified-Memory Device

If you have a spare base Mac mini, RTX 3060, or 8GB iGPU device, these are the top LLMs you can run on them.

Best Daily-Driver Models

Qwen3.5 4B - The Unified 8GB Daily-Driver
Newest 4B from Alibaba and the strongest one-model answer on 8GB. Q8 on a discrete GPU is essentially lossless, while the Q6 fallback for unified memory still beats most 7B Q4 quants on benchmarks.

Qwen3.5-9B - The Upgrade Daily-Driver for 8GB
The best overall model that fits in VRAM. Q4_K_M is just enough, and Q5_K_S at 5.82GB is also a safe pick if you want to push the ceiling. Not a clean unified-memory fit.

Phi-4-mini-instruct - Best for Unified Memory
Microsoft's strongest small instruct model, and unusually clean at this tier because the same Q8_0 file fits both ceilings. Great for chat, summaries, structured output, and function calling.

Gemma-4-E4B-it - Newest Top Model
4B effective active params, sharper instruction following than Gemma 3 4B, and best-in-class IFEval at this size. GPU users get a near-lossless Q5_K_M, but unified-memory users have to drop to Q3, which dulls reasoning.

Best Reasoning Models

Qwen3-4B-Thinking-2507 - Best Reasoner
The best small reasoner at this tier. Q8 on both tiers means the CoT doesn't degrade under quantization. AIME and MATH scores land ahead of Phi-4-mini reasoning, making it the reasoning pick whether you're on a GPU or unified memory.

DeepSeek-R1-Distill-Qwen-7B - Best Chain-of-Thought
The deepest chain-of-thought option if you can spare the VRAM. Distilled R1 at 7B is the strongest reasoning chain at this size, but there is no clean sub-4GB unified-memory quant that preserves the reasoning chain, so it's GPU-only.

Best Specialist Models

Qwen2.5-Coder-7B-Instruct - Best Coder
The best small coder for HumanEval and MBPP. Use bartowski's GGUF repo, since Qwen's official GGUFs have size anomalies and Unsloth doesn't ship a 7B-Coder GGUF. Best for code completion, refactors, debugging, and repo work.

Qwen3-VL-4B-Instruct - Best Vision
It replaces Gemma 3 4B as the vision choice, with the same 4B-class footprint, newer post-training, and 2x the context window. Q8 on both tiers keeps both the language core and the vision head near-lossless.

SmolLM3-3B-128K - Best Long-Context
128K context at 3B params, lossless BF16 on GPU, and near-lossless Q8 on unified memory. Pick this when the job is feeding long documents to a tiny model that can keep up.

On a 3060/4060 with VRAM to spare, Qwen3.5-9B is the best pick right now. The other rows only matter for specific jobs: Qwen3-VL-4B for vision, Qwen2.5-Coder-7B for coding on GPU, and SmolLM3-3B-128K for long-doc work.

Which models have you been running on 8GB VRAM or unified-memory devices? Let me know in the comments.
[image]
Graeme @gkisokay

Local LLM Cheat Sheet Master Collection: All Tiers (April 2026) Bookmark this thread to access the top LLMs for your exact hardware and use case 🧵

17 replies · 75 reposts · 521 likes · 47.1K views
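The "fits 8GB" calls in the cheat sheet above come down to bytes-per-weight arithmetic. Here is a minimal sketch of that estimate, assuming approximate average bits-per-weight for common llama.cpp quant types and a placeholder runtime overhead; the exact figures vary by model architecture and backend:

```python
def gguf_size_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 0.8) -> float:
    """Rough memory estimate for a quantized model: weights stored at the
    quant's average bits-per-weight, plus a flat allowance for KV cache and
    runtime buffers (the 0.8 GB overhead is a placeholder assumption)."""
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# Approximate average bits-per-weight for common llama.cpp quant types
QUANT_BITS = {"Q4_K_M": 4.8, "Q5_K_S": 5.5, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bits in QUANT_BITS.items():
    print(f"9B @ {quant}: ~{gguf_size_gb(9.0, bits):.1f} GB")
```

Under these assumptions a 9B model at Q4_K_M or Q5_K_S lands under 8 GB with room for context, while Q8_0 overshoots, which matches the sheet's "Q4_K_M is just enough" framing.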
Ivan Fioravanti ᯅ @ivanfioravanti ·
How can Qwen3.6-27B be so incredibly strong? I tested it in 8-bit MLX and it's really incredible; it's not just benchmarks. It's something unimaginable a few months ago. Imagine what will happen in the next 6 months 🤯
[image]
20 replies · 11 reposts · 319 likes · 14.2K views
brimigs @b_migliaccio ·
[image]
6 replies · 0 reposts · 46 likes · 2K views
Ivan Fioravanti ᯅ @ivanfioravanti ·
Today I will code using local AI only, with OpenCode! I'll try MiniMax M2.7 and Qwen models. Let's see what happens.
53 replies · 7 reposts · 466 likes · 34.8K views
s◎last◎rm @__solastorm__ ·
@dani_avila7 If it wants to read them it will. It’ll just write a shell script
0 replies · 0 reposts · 0 likes · 73 views
s◎last◎rm @__solastorm__ ·
I’ve struggled to get anything useful out of Gemma 4 on a Mac with only 32 GB of RAM. I've tried a bunch of variants. Anyone got a tip on how to get it to work?
0 replies · 0 reposts · 0 likes · 21 views
zostaff @zostaff ·
I BUILT A BOT THAT PREDICTS FOOTBALL MORE ACCURATELY THAN BOOKMAKERS

3 probability sources: ML model + Bet365 odds + Polymarket. When all three diverge - that's edge.

5 seasons of EPL, La Liga, Bundesliga. 7,600+ matches. Each with goals, shots, possession, corners, cards, odds.

ELO rating using the FIFA formula - accounts for opponent strength, goal difference, home advantage. Not just W/D/L but the context behind every win.

xG proxy from basic stats - shots on target * 30% conversion + shots off target * 3%. Teams scoring more than they should - regression is coming.

Rolling averages over 5 matches, fatigue factor, head-to-head history, day of the week. Claude API analyzes context the model can't see - motivation, pressure, derbies.

XGBoost + Random Forest + Logistic Regression in an ensemble. Walk-forward backtest, not random split.

Bookmaker says 55% home. Polymarket says 48%. Model says 52%. KL-divergence between sources = signal. The bigger the gap, the fatter the edge.

All three agree - I skip, zero edge. Two against one - I enter on the majority side. Kelly sizes the position, Claude explains why.
zostaff @zostaff

x.com/i/article/2043…

86 replies · 123 reposts · 1.6K likes · 508.6K views
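The thread's signal logic can be sketched in a few lines. This is a hedged illustration, not the author's code: the xG conversion rates and the 55/48/52 home probabilities come from the post, while the draw/away splits, the implied-odds conversion, and all function names are my own assumptions:

```python
import math

def xg_proxy(shots_on_target: int, shots_off_target: int) -> float:
    # The thread's stated proxy: on-target shots convert at 30%, off-target at 3%
    return shots_on_target * 0.30 + shots_off_target * 0.03

def kl_divergence(p, q):
    # KL(p || q) over a home/draw/away outcome distribution; larger = more disagreement
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    # Classic Kelly stake: f* = (b*p - q) / b with b = odds - 1, clamped at 0
    b = decimal_odds - 1.0
    return max(0.0, (b * p_win - (1.0 - p_win)) / b)

# Home probabilities from the thread; draw/away splits are made up for illustration
book  = [0.55, 0.25, 0.20]   # bookmaker
poly  = [0.48, 0.28, 0.24]   # Polymarket
model = [0.52, 0.26, 0.22]   # ML ensemble

signal = kl_divergence(model, book)              # divergence = potential edge
stake  = kelly_fraction(model[0], 1 / poly[0])   # back home at Polymarket-implied odds
```

Note the asymmetry this exposes: against the bookmaker's 55% the model has no home edge (Kelly clamps to zero), but against Polymarket's 48% the same 52% estimate yields a positive stake, which is the "two against one" entry the thread describes.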
s◎last◎rm @__solastorm__ ·
@ivanfioravanti Agreed. I'm new to this. Was pretty shocked at how flaky the available software harnesses were.
0 replies · 0 reposts · 2 likes · 18 views
Ivan Fioravanti ᯅ @ivanfioravanti ·
@__solastorm__ I think many are following this path, but it would be better to concentrate efforts on fewer, more battle-tested alternatives.
1 reply · 0 reposts · 0 likes · 135 views
Ivan Fioravanti ᯅ @ivanfioravanti ·
MLX benchmarking the various inference engines is a real mess at the moment. 😭 I'm finding many issues under heavy load: wrong perf stats from some servers, wrong cache management mixing parts of prompts from other sessions, OOM, bugs.

So far I've tested omlx, vmlx and mlx-vlm using gemma-4-26b-a4b-it-4bit. My choices:
🥇 mlx-vlm: best performance overall in non-cache scenarios
🥈 omlx: best cache management (the 7.3K prefill below sounds wrong to me)
🥉 vmlx: unable to complete batch inference tests, going OOM with 32 parallel batches of prompt/tg 2048/128

I'll keep investigating and testing because we need to figure out the best way to run models locally.
[images]
24 replies · 7 reposts · 143 likes · 12K views
s◎last◎rm @__solastorm__ ·
@Scaramucci Elmo is lying to you. The math for space data centres doesn't math. It's not even close.
0 replies · 0 reposts · 0 likes · 60 views
Anthony Scaramucci @Scaramucci ·
I own SpaceX. I participated in a private round. I was also an investor in xAI. Now here's my honest read:

The cult of personality around Elon Musk gives his companies an excessive premium that is off the charts. Tesla is suffering and still carries a valuation that defies conventional metrics. SpaceX is reportedly raising $75 billion. That would be three times larger than any IPO in history.

So why did I invest? Because I think Starlink alone is worth a fortune. I believe the idea of orbital data centers — getting energy from the sun, not the electricity grid, beaming it back down to earth via satellite technology — is fascinating. It sounds like science fiction right now. I think it becomes fact. And I think he's better positioned to do it than anyone else on earth.

But here's the lesson that really drives my thinking. I missed Amazon. $10,000 invested in the Amazon IPO on May 15th 1997 would be worth almost $20 million today. The guys who bought it were right. I wasn't. I'm not making that mistake again.
161 replies · 83 reposts · 840 likes · 281.9K views
Adam Wathan @adamwathan ·
// AGENTS.md Never, ever, under any circumstances, ever, not once, no matter what, try to start the fucking dev server, it’s already fucking running.
312 replies · 287 reposts · 6.7K likes · 296.3K views
s◎last◎rm @__solastorm__ ·
@AlexFinn It works on a Mac? Can you share the model link?
0 replies · 0 reposts · 0 likes · 100 views
Alex Finn @AlexFinn ·
Bad news: Claude Mythos is out and you can't use it, sucker. It's too dangerous in your hands.

Good news: I just downloaded GLM 5.1 onto my Mac Studio, and it's by far the best open source model I've ever used. Crushing every task I give it compared to Qwen and Gemma. Faster too.

I have it scraping the web and putting together content and playbooks for me every minute of the day. Working nonstop. Costs me literally $0. It's also very strong at coding.

Is it Opus 4.6? No, but it's getting closer. And nobody can lobotomize it or lower my limits or take it away from me. A 24/7/365 employee that never eats, sleeps, or complains. Just works. For free.

The greatest technology in the history of this species should be democratized, not gatekept. And that's what the open source community is doing right now. The people who bought hardware are prepared for the future. The people attacking the people with hardware are on the wrong side of history.

Superintelligence on your desk is within reach.
[images]
180 replies · 112 reposts · 1.5K likes · 128.3K views
Liberty Tree 🟧 @LiberteaTree ·
I'll admit it. I was duped by Trump's rhetoric while campaigning. Dude was solid. Now, I think he needs to be impeached, convicted, and jailed.
1.2K replies · 1.4K reposts · 13.6K likes · 200.8K views
Ginny Robinson @ImGinnyRobinson ·
I wish I had never voted for Donald Trump.
2.2K replies · 1.2K reposts · 14.6K likes · 460.6K views
Gary Cardone @GaryCardone ·
Most certainly not what I voted for or intended when I donated 12.8 bitcoin to Trump's team. I tried. I put my energy behind a wrecking ball to drain the swamp, to stop war. My intentions were good, if naive and childish, to think that after 65 years the corrupt American political system could be dismantled.

I am left with the decision to never again support the political ambitions of anyone. I distrust all professional politicians and the toilet bowl they created and encourage as an incubator for professional politicians.

I shall seek out and align with those who are freedom fighters, who believe in peace, in humanity, in the right of others to live freely, openly, and creatively as communities who are not repressed or oppressed by any government or body. I build around me a massive community of champions, of heroes, those who choose to use their voice irrespective of the attacks that come from speaking the truth for the betterment of humanity. Who wants to join?

The Right Thing to Do is the Right Thing, irrespective of how convenient it may or may not be. Stand up, voice up, man up; this is your defining moment we should all be judged on. Do you choose to "go along to get along", or to fight back against insanity, traps, corruption, enslavement, oppression, and surveillance, and stand up like a man and fight those that enslave humanity? Our species requires it of you.
Alex Jones @RealAlexJones

🚨🚨WAR CRIME ALERT!!🚨🚨 - Trump on Iran: "A whole civilization will die tonight, never to be brought back again." The definition of genocide is destroying an entire civilization/people! Trump literally sounds like an unhinged supervillain from a Marvel comic movie. This IS NOT WHAT WE VOTED FOR!!!

1.2K replies · 847 reposts · 7.5K likes · 1.1M views
s◎last◎rm @__solastorm__ ·
@ryanvogel Is there a trick to get it running? I tried and failed
0 replies · 0 reposts · 0 likes · 43 views
vogel @ryanvogel ·
Gemma 4 on my MacBook Pro (M4 Max, 36 GB RAM) is crazy fast for how good it is. I think I have a cool project idea for it 👀
51 replies · 16 reposts · 707 likes · 69.5K views
s◎last◎rm @__solastorm__ ·
@constantout Same. I use it all day every day and only sometimes, at the end of a week, do I come close to running out. Having said that - Codex is sooo generous. You just don't need the Max plan.
0 replies · 0 reposts · 0 likes · 26 views
Constantin 🔥 @constantout ·
I feel like the whole Claude Code limits drama is turning into a meme. I'm on the $100/mo Max plan and have never hit limits in 2 prompts. How tf are people doing it?
99 replies · 2 reposts · 148 likes · 23.4K views