thestreamingdev()

4.5K posts

@thestreamingdev

All things AI and coding while streaming. DM for consulting.

Joined January 2022
564 Following · 1.7K Followers
Pinned Tweet
thestreamingdev()
thestreamingdev()@thestreamingdev·
I ran a 35-billion-parameter AI agent on a $600 Mac mini.

Specs: M4 Mac mini, 16 GB RAM.

The model doesn't fit in RAM. It pages from the SSD at 30 tokens/second. On NVIDIA, the same paging gives you 1.6 tok/s. Apple Silicon gives you 30. That's 18.6x faster.

No cloud. No API keys. $0/month.

Here's what it can do 🧵
169
213
3.2K
690.1K
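A quick back-of-envelope check of the premise that the model can't sit fully in RAM. The sizing function and the bits-per-weight figure are illustrative assumptions for this sketch, not the author's measurements:

```python
# Back-of-envelope check of why a 35B model can't fit in 16 GB of RAM.
# Rough rule: quantized size ≈ params * bits_per_weight / 8; this ignores
# embeddings, the KV cache, and runtime buffers, which only make it worse.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# "4-bit" quants typically land near 4.5 bits/weight once scales are included.
size_4bit = model_size_gb(35, 4.5)
print(f"35B @ ~4.5 bpw: {size_4bit:.1f} GB")  # ~19.7 GB, over a 16 GB Mac's RAM
```

So even before the OS and the runtime take their share, the weights alone overflow 16 GB, which is why the thread leans on SSD paging.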
TBPN
TBPN@tbpn·
Linear CEO @karrisaarinen says throwing out SaaS entirely can send companies on a "long journey" back to the exact same workflows:

"This idea of like, 'let's throw everything out, no SaaS at all'... I think that's an option. But then you start inventing stuff back from first principles, and you maybe end up in the same spot again."

"Everyone's running their agent, and now they have 10 agents running. So then they put them on a Kanban board, which Linear has, and then it's like, 'Oh, now I invented agent orchestration.' Like, no, you invented the Kanban board, which has been around for like 30 years."

"So, kudos to connecting those two topics. But some things don't need to be reinvented. Some things can still work. But if you just throw everything out, you kind of start from the beginning, and then I think it's a long journey to figure everything out again."
0
6
165
33.8K
thestreamingdev()
thestreamingdev()@thestreamingdev·
Actually, it is possible! I built a custom MLX engine that streams FFN weights from SSD instead of loading the full model into RAM. The 27B (16.1 GB, full 4-bit quality) runs on any 16 GB Mac; only 5.5 GB stays in memory. Measured 0.18 tok/s on our M4. Slow, but coherent output with no compression artifacts. Code: github.com/walter-grace/m…

That said, for actual daily use I'd run the Qwen3.5-35B-A3B GGUF at IQ2_M (10.6 GB) through llama.cpp. It fits entirely in your 16 GB, runs at 30 tok/s on the M4 (a bit less on an M2), and is a better model than the 27B. Web search, shell commands, reasoning: all working. That agent is in the main repo! Let me know if you get it going. There's a Claude file in there too; just ask Claude to help set it up.
0
0
0
4
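The weight-streaming idea in the reply above can be sketched in a few lines. This is a hypothetical NumPy illustration of the access pattern (memory-map the checkpoint and touch one layer at a time), not the author's actual MLX engine; the file layout, `make_fake_checkpoint`, and all dimensions are invented for the demo:

```python
# Sketch of streaming FFN weights from SSD instead of holding them in RAM:
# keep the checkpoint memory-mapped and let the OS page in only the layer
# that is about to be used. Illustrative only, not the real MLX engine.
import numpy as np

def make_fake_checkpoint(path: str, n_layers: int, dim: int) -> None:
    # Write n_layers square FFN weight matrices back to back in one flat file.
    w = np.memmap(path, dtype=np.float16, mode="w+", shape=(n_layers, dim, dim))
    w[:] = 0.01  # stand-in values; a real checkpoint holds trained weights
    w.flush()

def ffn_forward_streaming(path: str, x: np.ndarray,
                          n_layers: int, dim: int) -> np.ndarray:
    # mode="r" maps the file without loading it; slicing w[i] faults in only
    # that layer's pages, so resident memory stays near one layer's worth.
    w = np.memmap(path, dtype=np.float16, mode="r", shape=(n_layers, dim, dim))
    for i in range(n_layers):
        x = x @ w[i]  # the OS pages this layer in from SSD on first touch
    return x

make_fake_checkpoint("ffn.bin", n_layers=4, dim=8)
y = ffn_forward_streaming("ffn.bin", np.ones(8, dtype=np.float16), 4, 8)
```

The trade-off is exactly what the reply describes: resident memory drops to roughly one layer plus activations, while throughput becomes bound by SSD read bandwidth rather than RAM bandwidth.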
Lean Kin Prak
Lean Kin Prak@LeanKinPrazli·
@0xSero Nice! I wanted to run the Qwen 27B on my M2 16GB but failed. That's not possible, right? I mean, with compression etc.
2
0
4
1.4K
0xSero
0xSero@0xSero·
Qwen3.5-35B compressed 20% with a ~1% performance drop on average. Now you can fit this (4-bit) with full context on 24 GB of VRAM (~$700, or 1x 3090) huggingface.co/0xSero/Qwen-3.…
47
46
861
29.9K
Dan Romero
Dan Romero@dwr·
@zachterrell57 How do you convert? And preserve complex layouts like tables and other graphics with data?
4
0
0
575
Google Research
Google Research@GoogleResearch·
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
924
5.5K
37.6K
18.1M
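For context on why KV-cache compression like this matters: the cache grows linearly with context length and can rival the weights in size. A rough sizing sketch, where the layer/head/context figures are illustrative assumptions rather than TurboQuant's targets or any specific model's config:

```python
# KV-cache size: one key and one value entry per layer, per KV head,
# per head channel, per token of context. Figures below are illustrative.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 covers keys and values; bytes_per_elem=2 is fp16.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

base = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, seq_len=128_000)
print(f"fp16 cache: {base:.1f} GB; after 6x compression: {base / 6:.1f} GB")
# At 128K context this hypothetical cache is ~25 GB; 6x shrinks it to ~4 GB.
```

That is the difference between a long-context session fitting on one consumer GPU or not, which is why cache compression gets billed as an efficiency win.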
Yifei Hu
Yifei Hu@hu_yifei·
Qwen3.5 27B feels more solid than 35B-A3B, because a dense model is more solid than a sparse model. (English is not my first language, but I really tried here)
26
5
380
22.6K
Andrej Karpathy
Andrej Karpathy@karpathy·
When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc...

I am really looking forward to a day where I could simply tell my agent: "build menugen" (referencing the post) and it would just work. The whole thing, up to the deployed web page. The agent would have to browse a number of services, read the docs, get all the API keys, make everything work, debug it in dev, and deploy to prod. This is the actually hard part, not the code itself.

Or rather, the better way to think about it is that the entire DevOps lifecycle has to become code, in addition to the necessary sensors/actuators of the CLIs/APIs with agent-native ergonomics. And there should be no need to visit web pages, click buttons, or anything like that for the human.

It's easy to state, it's now just barely technically possible and expected to work maybe, but it definitely requires from-scratch re-design, work and thought. Very exciting direction!
Patrick Collison@patrickc

When @karpathy built MenuGen (karpathy.bearblog.dev/vibe-coding-me…), he said: "Vibe coding menugen was an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA furniture. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers."

We've all run into this issue when building with agents: you have to scurry off to establish accounts, clicking things in the browser as though it's the antediluvian days of 2023, in order to unblock its superintelligent progress.

So we decided to build Stripe Projects to help agents instantly provision services from the CLI. For example, simply run:

$ stripe projects add posthog/analytics

And it'll create a PostHog account, get an API key, and (as needed) set up billing.

Projects is launching today as a developer preview. You can register for access (we'll make it available to everyone soon) at projects.dev. We're also rolling out support for many new providers over the coming weeks. (Get in touch if you'd like to make your service available.)

projects.dev

467
417
5K
1.7M
thestreamingdev()
thestreamingdev()@thestreamingdev·
@MarioClawAI "Hey Claude, make this run with my openclaw." It just needs to run the model locally.
0
0
2
2.6K
thestreamingdev()
thestreamingdev()@thestreamingdev·
@dzienko As long as the RAM is there it should work just as well. I’m guessing this is why Apple launched their new laptops
0
0
1
291
thestreamingdev()
thestreamingdev()@thestreamingdev·
@morganlinton Thanks! The SSD paging result is genuinely surprising to people; conventional wisdom says paging = unusable. Apple Silicon breaks that assumption because there's no PCIe bus between the GPU and SSD.
0
2
12
5.8K
Morgan
Morgan@morganlinton·
@thestreamingdev Whoa, sounds impossible, but clearly it’s very possible since you are actually showing it, wild!!
1
0
2
6.5K