thestreamingdev() (@thestreamingdev) - Профиль Twitter

Закреплённый твит

The scaling path: 16GB Mac mini → 35B agent ($0/month) 48GB Mac Pro → 35B at higher quality + speculative decoding 192GB Mac Studio → 397B frontier model 512GB Mac Pro → 1 TRILLION parameter model Same agent code. Zero changes. Just swap the model file. Everything is open source. The agent, the benchmarks, the retro Mac web UI, all of it. 🍎 github.com/walter-grace/m… One ask: I'd love to test this on a Mac Studio or Mac Pro with 192GB+. If you have one collecting dust and want to help push local AI forward, DM me. I'll run a frontier model on it and publish everything. There are 100 million Macs with Apple Silicon in the world. Every one of them is an untapped AI workstation. Time to use them.

English

17

33

420

43.1K

thestreamingdev()@thestreamingdev·13h

@ricknoblett

GIF

QME

0

145

Rick@ricknoblett·14h

@thestreamingdev Can’t wait for the M5 Studio!!

English

1

0

1

158

thestreamingdev()@thestreamingdev·1d

I ran a 35-billion parameter AI agent on a $600 Mac mini. Specs: M4 Mac-Mini 16GB RAM The model doesn't fit in RAM. It pages from the SSD at 30 tokens/second. On NVIDIA, the same paging gives you 1.6 tok/s. Apple Silicon gives you 30. That's 18.6x faster. No cloud. No API keys. $0/month. Here's what it can do 🧵

English

169

211

3.2K

683.2K

thestreamingdev()@thestreamingdev·13h

@TheCraigHewitt 🫡 thanks!

English

0

243

Craig Hewitt@TheCraigHewitt·13h

@thestreamingdev Rad. Following

English

1

0

1

266

thestreamingdev()@thestreamingdev·15h

@hexcrafter_eth @GoogleResearch claude code but only for macs. runs on the hardware of your device

English

0

6

hexcrafter.eth@hexcrafter_eth·15h

@thestreamingdev @GoogleResearch Whats “mac code”

English

1

0

10

Google Research@GoogleResearch·2d

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

GIF

English

907

5.5K

37.4K

17.8M

thestreamingdev()@thestreamingdev·18h

@Underminer1979 1000%

0

1

842

Acoustically Regarded@Underminer1979·19h

@thestreamingdev I have a 12 GB RTX 3080 and 64 GB of ram, can I do that too?

English

1

0

900

thestreamingdev()@thestreamingdev·19h

@hu_yifei Working on getting 27B onto the Mac Mini got 35B-A3B already working!

thestreamingdev()@thestreamingdev

I ran a 35-billion parameter AI agent on a $600 Mac mini. Specs: M4 Mac-Mini 16GB RAM The model doesn't fit in RAM. It pages from the SSD at 30 tokens/second. On NVIDIA, the same paging gives you 1.6 tok/s. Apple Silicon gives you 30. That's 18.6x faster. No cloud. No API keys. $0/month. Here's what it can do 🧵

English

0

25

Yifei Hu@hu_yifei·21h

Qwen3.5 27B feels more solid than 35B-A3B, because a dense model is more solid than a sparse model. (English is not my first language, but I really tried here)

English

26

5

361

20.7K

thestreamingdev()@thestreamingdev·19h

@WinstonLBrown1 Do it!!!

English

0

531

WinstonLBrown@WinstonLBrown1·20h

@thestreamingdev Stealing bits of this, thanks!

English

1

0

1

561

thestreamingdev()@thestreamingdev·22h

@karpathy This is where SSO can improve dramatically

English

0

67

Andrej Karpathy@karpathy·22h

When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc... I am really looking forward to a day where I could simply tell my agent: "build menugen" (referencing the post) and it would just work. The whole thing up to the deployed web page. The agent would have to browse a number of services, read the docs, get all the api keys, make everything work, debug it in dev, and deploy to prod. This is the actually hard part, not the code itself. Or rather, the better way to think about it is that the entire DevOps lifecycle has to become code, in addition to the necessary sensors/actuators of the CLIs/APIs with agent-native ergonomics. And there should be no need to visit web pages, click buttons, or anything like that for the human. It's easy to state, it's now just barely technically possible and expected to work maybe, but it definitely requires from-scratch re-design, work and thought. Very exciting direction!

Patrick Collison@patrickc

When @karpathy built MenuGen (karpathy.bearblog.dev/vibe-coding-me…), he said: "Vibe coding menugen was exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA future. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers." We've all run into this issue when building with agents: you have to scurry off to establish accounts, clicking things in the browser as though it's the antediluvian days of 2023, in order to unblock its superintelligent progress. So we decided to build Stripe Projects to help agents instantly provision services from the CLI. For example, simply run: $ stripe projects add posthog/analytics And it'll create a PostHog account, get an API key, and (as needed) set up billing. Projects is launching today as a developer preview. You can register for access (we'll make it available to everyone soon) at projects.dev. We're also rolling out support for many new providers over the coming weeks. (Get in touch if you'd like to make your service available.) projects.dev

English

443

388

4.6K

1.5M

thestreamingdev()@thestreamingdev·22h

@MarcusShepher20 no sir

Español

0

735

Marcus Shepherd@MarcusShepher20·22h

@thestreamingdev Is this a fork of pi

English

1

0

785

thestreamingdev()@thestreamingdev·23h

@MarioClawAI “Hey Claude make this run with my openclaw” it just needs to run the model locally

English

0

2

2.5K

MarioClawAI | AI News@MarioClawAI·1d

@thestreamingdev How to know if the model can run with openclaw on your system?

English

1

0

1

2.9K

thestreamingdev()@thestreamingdev·23h

@dzienko As long as the RAM is there it should work just as well. I’m guessing this is why Apple launched their new laptops

English

0

1

279

Kamil Dzieniszewski@dzienko·1d

@thestreamingdev What about MacBooks?

English

1

0

301

thestreamingdev()@thestreamingdev·23h

@suvodeepmishra1

GIF

QME

0

1.2K

Suvodeep | Developer@suvodeepmishra1·23h

@thestreamingdev no cloud no API just vibes

Eesti

1

0

1.3K

thestreamingdev()@thestreamingdev·23h

@gubatron @PhillyGunson Probably way faster 😅

English

0

110

GUBA@gubatron·1d

@thestreamingdev @PhillyGunson wonder how it'd go with a 5090

Hapeville, GA 🇺🇸 English

1

0

115

thestreamingdev() ретвитнул

Morgan@morganlinton·23h

The next issue of my newsletter is out. 🎩 tip to @GoogleResearch, @saranormous, @karpathy, @thestreamingdev, @sawyerhood, @aidenybai, @LLMJunky

English

4

3

19

2K

thestreamingdev()@thestreamingdev·1d

@realphilipdias Nope but that’s breakthrough! We got 35B on it!

English

0

4

1.3K

Philip Dias@realphilipdias·1d

@thestreamingdev 16gb mac mini is not ideal for anything over 7b

English

1

0

1.3K

thestreamingdev()@thestreamingdev·1d

@Alibaba_Qwen We love @Alibaba_Qwen !

thestreamingdev()@thestreamingdev

I ran a 35-billion parameter AI agent on a $600 Mac mini. Specs: M4 Mac-Mini 16GB RAM The model doesn't fit in RAM. It pages from the SSD at 30 tokens/second. On NVIDIA, the same paging gives you 1.6 tok/s. Apple Silicon gives you 30. That's 18.6x faster. No cloud. No API keys. $0/month. Here's what it can do 🧵

English

0

31

Qwen@Alibaba_Qwen·1d

Big thanks to Steve for testing the entire Qwen3.5 family. Community feedback like this helps us get better. 🙏

stevibe@stevibe

Which local models can actually handle tool calling? I built a framework to find out. 15 scenarios. 12 tools. Mocked responses. Temperature 0. No cherry-picking. Tested every Qwen3.5 size from 0.8B to 397B, and since some of you asked after the distillation tests: yes, I included Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled too. Only two models went all green: the 27B dense and the distilled 27B. The 397B? Failed two tests. The 122B? Failed one. The 35B? Failed two. The timed-out results — mostly on the smaller models, are cases where the model got stuck in a loop, repeating the same tool call until it hit the 30-second limit. The test that exposed the most models: "Search for Iceland's population, then calculate 2% of it." Simple, but 35B, 122B, and 397B all used a rounded number from memory instead of the actual search result. They didn't trust their own tool output. Small models hallucinate data. Big models ignore data. The 27B just threaded it through.

English

41

62

1.5K

128.1K

thestreamingdev()@thestreamingdev·1d

@morganlinton thanks! SSD paging result is genuinely surprising to people, conventional wisdom says paging = unusable. The magic of @Apple Silicon breaks that assumption because there's no PCIe bus between the GPU and SSD

English

0

2

11

5.7K

Morgan@morganlinton·1d

@thestreamingdev Whoa, sounds impossible, but clearly it’s very possible since you are actually showing it, wild!!

English

1

0

1

6.5K