notallthere

202 posts

@notallthere_net

Let’s talk about local computer vision LLMs

hell · Joined March 2009
43 Following · 11 Followers
notallthere@notallthere_net·
@LottoLabs I started down this path, so let me give you a warning now: no. Start with an RTX PRO 5000 48GB (GDDR7).
Lotto@LottoLabs·
It’s very simple:
- Find a 3090 or two
- Get any mobo that supports 2 PCIe x16 slots (at least x16/x4 for lanes)
- Get a 1200W+ PSU
- Buy the cheapest DDR4 RAM, 64GB+ (you’re not using it anyways)
- Install Linux, vLLM, llama.cpp, SGLang, Tailscale
- Download any flavour of Qwen 3.7 27B
You are now localmaxxing
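Whether a 27B model fits on one or two 24GB cards (such as RTX 3090s) comes down to simple arithmetic: the weights take roughly parameters × bits per weight / 8 bytes, plus room for the KV cache and runtime buffers. A minimal sketch, assuming a flat 20% overhead factor (a hypothetical rule of thumb, not a measured figure) and ignoring how a particular engine splits layers across GPUs:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, gpus: int,
         vram_per_gpu_gb: float, overhead: float = 0.20) -> tuple[float, float, bool]:
    """Return (needed_gb, available_gb, fits) for a model split across GPUs.

    overhead is a hypothetical allowance for KV cache, activations and runtime
    buffers; real usage depends on context length and the inference engine.
    """
    needed = weight_gb(params_billion, bits_per_weight) * (1 + overhead)
    available = gpus * vram_per_gpu_gb
    return needed, available, needed <= available


# A 27B model on one or two 24GB cards.
for bits in (16, 8, 4):
    for gpus in (1, 2):
        needed, available, ok = fits(27, bits, gpus, 24)
        verdict = "fits" if ok else "does not fit"
        print(f"27B @ {bits:>2}-bit on {gpus}x 24GB: need ~{needed:.1f} GB of {available} GB -> {verdict}")
```

Under this rule of thumb a 4-bit 27B model fits on a single 24GB card, 8-bit needs both cards, and 16-bit needs more than 48GB.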
notallthere@notallthere_net·
M5 Max 128GB RAM laptop: there is no way possible, used, new, or refurbished, not a Spark, not an RTX card, not anything, to get models that require 70GB+ of VRAM loaded and running at the tok/s this laptop does. NOTHING. These laptops will be gone soon, mark my words.
notallthere@notallthere_net·
@theo He’s the kind of guy that re-reads his own posts
Theo - t3.gg@theo·
He really doesn't like when you weaponize his own behavior against him. You guys definitely should spam his replies, pointing out the contradictory nature of his assertions. That would be terrible.
Theo - t3.gg@theo·
This Gary Marcus is going around insisting that nobody will debate him about AI. I shared my thoughts and got blocked. Pic unrelated.
David Hendrickson@TeksEdge·
🚨 🤯 @xyster did it! This is insane! Imagine running MiniMax M2.7 locally! One of the best open source models running locally on 4x @Intel B70 ARC Pros w/128GB of VRAM @ 83 tps! While not cheap, $4K will get you 4 Intel cards, while 1x @nvidia RTX-5090 32GB will cost $5K or an RTX-6000 w/96GB costs $10K, but neither will run MiniMax M2.7.
Steve💙🇨🇦@xyster

The fresh new B70 PCIe 4.0 build is running MiniMax now. Stock was about 13 tps; currently 83 tps (decode) after applying my optimizations. It's a bit short of the 93 I had on the PCIe 5.0 mobo, but I might have missed some patches. Sanity check tests passed. See reply for more

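The price comparison in the post is easier to read as dollars per GB of VRAM. A quick sketch using only the prices and capacities quoted in the post (not verified market prices); $/GB says nothing about bandwidth, interconnect, or software maturity, and the same B70 cards only went from about 13 tps stock to 83 tps after the reported optimizations:

```python
# Prices and VRAM capacities as quoted in the post; not verified market prices.
OPTIONS = [
    ("4x Intel B70 (128GB total)", 4000, 128),
    ("1x RTX 5090 (32GB)", 5000, 32),
    ("1x RTX 6000 (96GB)", 10000, 96),
]


def dollars_per_gb(price_usd: float, vram_gb: float) -> float:
    """Purchase price divided by total VRAM."""
    return price_usd / vram_gb


# Cheapest VRAM first.
for name, price, vram in sorted(OPTIONS, key=lambda o: dollars_per_gb(o[1], o[2])):
    print(f"{name:<28} ${price:>6,}  {vram:>4} GB  ${dollars_per_gb(price, vram):7.2f}/GB")

# Decode speedup reported for the same B70 build: stock (~13 tps) vs. optimized (83 tps).
speedup = 83 / 13
print(f"decode speedup from the reported optimizations: ~{speedup:.1f}x")
```

With these figures the four-card B70 build works out to about $31/GB, versus roughly $104/GB for the 96GB card and $156/GB for the 5090.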
notallthere@notallthere_net·
Do you watch YouTube videos on AI? The Problem: 95% of the videos about AI on YouTube are people that are just using AI to make videos on YouTube and optimize their YouTube strategy. They’re not building what you’re building.
Sudo su@sudoingX·
dgx spark has the most underserved ecosystem for what the hardware is capable of. complete white space.
notallthere@notallthere_net·
@BardIonson @sudoingX Yeah, that’s nice, but you can remote in, and at the prices right now you would be better off going the RTX route.
Kaito@KaiXCreator·
Why does a MacBook often feel more powerful than Windows even with the same RAM?
notallthere@notallthere_net·
@adamdotdev Mid-halving, same old moron posts. Funny to me: grown men that jump on hype and pretend it’s a crystal ball. We’ve been here since 2009 and we will be here until 2.9M BTC.
Adam@adamdotdev·
So crazy to me that there are still adult men talking publicly about crypto/blockchain/ethereum/etc things
notallthere@notallthere_net·
Before you do this, realize they take up 3 slots, if you’re lucky. Look for 2-slot cards, make sure they aren’t passively cooled, make sure you have room for your RAID card, and make sure you have enough RAM. It takes planning. Also, I’m not saying don’t do it; I’m saying you’re going to spend a minimum of $6K before you have the right setup.
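The planning warning above is mostly about physical space: a card's cooler can cover neighboring slots, not just the one it plugs into. A minimal sketch of that check, assuming a hypothetical board with PCIe slots at given positions and hypothetical card widths; real boards also differ in lane wiring, spacing to the case bottom, and cooler clearance, so treat this as a planning aid, not a compatibility guarantee:

```python
from typing import Optional


def place_cards(pcie_slots: list[int], case_slots: int,
                card_widths: list[int]) -> Optional[list[int]]:
    """Greedily place cards, in the given order, into physical PCIe slots.

    pcie_slots:  positions of the board's PCIe slots, counted in expansion-slot
                 units from the top of the board (1 = top position).
    case_slots:  number of expansion-slot openings in the case.
    card_widths: how many slot positions each card's cooler and bracket cover.

    A card in position p with width w covers positions p .. p + w - 1.
    Returns the chosen position for each card, or None if they don't all fit.
    """
    slots = sorted(pcie_slots)
    covered_until = 0  # last position covered by the previously placed card
    placement = []
    i = 0
    for width in card_widths:
        # Skip slots hidden under the previous card's cooler.
        while i < len(slots) and slots[i] <= covered_until:
            i += 1
        # No slot left, or the card would hang past the case's last opening.
        if i == len(slots) or slots[i] + width - 1 > case_slots:
            return None
        placement.append(slots[i])
        covered_until = slots[i] + width - 1
        i += 1
    return placement


# Hypothetical board: four x16-length slots at every other position, 8-slot case.
board = [1, 3, 5, 7]
print(place_cards(board, 8, [3, 3, 1]))  # two triple-slot GPUs + RAID card -> None
print(place_cards(board, 8, [2, 2, 1]))  # two dual-slot GPUs + RAID card -> [1, 3, 5]
```

Each card is placed as high as possible, which is the best choice for a fixed order; because the order is fixed, it can be worth trying a few orderings (for example, the RAID card first).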
Ahmad@TheAhmadOsman·
Gentle reminder that all you need to start with Local AI is:
- 2x RTX 3090s (pick up for $700-$900 on r/hardwareswap)
- Qwen 3.6 27B / Gemma 4 31B
- Your favorite agent (Claude Code / OpenCode / etc)
- Self-hosted SearXNG for web access
And you got yourself Opus 4.5 at home
notallthere@notallthere_net·
@TheAhmadOsman Before you do this, realize they take up 3 slots, if you’re lucky. It takes planning, and it also drives the price up. I’m not saying don’t do it; I’m saying you’re going to spend a minimum of $6K before you have the right setup.
Evis Drenova@evisdrenova·
The Cursor fall-off is going to be studied for decades. I don't know any engineer who uses them anymore. Not to say that others don't, but it's obvious that they're no longer on the tech frontier. Still, a $60b outcome in 4 years is nothing to sneeze at...
notallthere@notallthere_net·
@moshhamedani Claude Code was good in 2025. Can’t even believe anyone still uses it over Codex; the difference is not even comparable. I don’t trust Claude with ANYTHING important.
Mosh@moshhamedani·
I haven’t used Codex yet, but I see some people have been favoring it over Claude Code. Can you share some insight on why you think it’s better?
Jack V Lloyd@jackvlloyd·
You can judge how intelligent someone is by how much they hate data centers. The greater the hate, the lower the intelligence.
notallthere@notallthere_net·
@0xSero I would rather pay for three 6000s, 96GB each; can’t deal with the slowness of Sparks.
0xSero@0xSero·
Cheapest competitive build on Nvidia
2 Sparks = $8,000
Total specs
- 256GB
- 8TB
- 546GB/s memory bandwidth
- tons of flops
---
Models:
- Deepseek-v4-flash
- MiMo-v2.5-flash fp4
- MiniMax-M2.7
- Qwen3.5-397b-reap
Flaws:
- Low mem bandwidth
- You need 2 for best perf
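The "low mem bandwidth" flaw matters because single-stream decode is usually memory-bound: each generated token has to read the active weights from memory, so bandwidth divided by bytes read per token gives a rough upper bound on tokens per second. A minimal sketch, assuming the 546 GB/s aggregate figure from the post could be fully used (splitting a model across two boxes adds interconnect overhead in practice), ignoring KV-cache reads and compute limits, and using a hypothetical MoE model with 10B active parameters:

```python
def decode_tps_ceiling(bandwidth_gb_s: float, active_params_billion: float,
                       bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Each token reads every *active* weight once; for a mixture-of-experts
    model only the routed experts count, not the total parameter count.
    KV-cache traffic, compute, and interconnect overhead are ignored.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# 546 GB/s is the figure quoted for two Sparks; 10B active params is hypothetical.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: <= {decode_tps_ceiling(546, 10, bits):.0f} tok/s")
```

The ceiling scales linearly with bandwidth and inversely with bytes per token, which is why lower-bit weights and fewer active parameters both raise it.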
notallthere@notallthere_net·
@steipete I do the total OPPOSITE. I’ve been working for 8 months on trying to achieve results with small amounts so they can run on small LLMs. CONSTRAINTS produce true diamonds; look at Kimi K2.6. I think abundance ultimately produces worse achievements.
Peter Steinberger 🦞@steipete·
People freaking out over my AI spend. What nobody sees: part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: how would we build software in the future if tokens don't matter?

We constantly run ~100 Codex instances in the cloud, reviewing every PR and every issue. If a fix on main lands, @clawsweeper will eventually find that 6-month-old issue and close it with an exact reference. We run Codex on every commit to review for security issues (as it's far too easy to miss them). We run Codex to de-duplicate issues, find clusters, and send reports on the most pressing issues.

We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video, and post a before/after of the fix on the PR. There's a Codex instance that watches new issues and, if an issue fits our documented vision well, automatically creates a PR for it (which another Codex then reviews). We have Codex running that scans comments for spam and blocks people. We have Codex instances running that verify performance benchmarks and report regressions into Discord. We have agents that listen in on our meetings and proactively start work, e.g. creating PRs for new features while we are still discussing them.

We built clawpatch.ai to split all our projects into functional units to review and find bugs and regressions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities. All that automation allows us to run this project extremely lean.
Kaito@KaiXCreator·
Is Claude Opus really the best model for coding right now?
notallthere@notallthere_net·
@ctnzr @grok With this tech, is there a path to running Kimi K2 on three 96GB GDDR7 cards?
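Whether Kimi K2 fits on three 96GB cards is mostly a question of average bits per weight. A back-of-the-envelope sketch, assuming roughly 1 trillion total parameters for Kimi K2 (all experts must be resident in memory even though only a fraction are active per token) and a hypothetical 10% of VRAM reserved for KV cache and buffers:

```python
def max_bits_per_weight(total_vram_gb: float, total_params_billion: float,
                        reserve_fraction: float = 0.10) -> float:
    """Largest average bits/weight that lets all weights fit in VRAM.

    reserve_fraction is a hypothetical share of VRAM kept free for the KV
    cache, activations and runtime buffers.
    """
    usable_gb = total_vram_gb * (1 - reserve_fraction)
    # GB -> bits, divided by the parameter count (both in units of 1e9).
    return usable_gb * 8 / total_params_billion


vram = 3 * 96    # three 96GB cards = 288 GB
params = 1000    # assumption: ~1T total parameters
print(f"max ~{max_bits_per_weight(vram, params):.2f} bits/weight")
for bits in (8, 4, 2):
    print(f"{bits}-bit weights: ~{params * bits / 8:.0f} GB vs {vram} GB of VRAM")
```

Under these assumptions the budget is roughly 2 bits per weight: an 8-bit copy (~1,000 GB) or a 4-bit copy (~500 GB) would not fit in 288 GB without keeping part of the model outside VRAM.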
Bryan Catanzaro@ctnzr·
We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and also pretrained in NVFP4. Accelerated computing means we rethink every aspect of the AI stack looking for new opportunities to improve efficiency.
How To AI@HowToAI_

NVIDIA has done the impossible and nobody's talking about it. They trained a 12 BILLION parameter LLM in 4-bit precision on 10 trillion tokens.

For years, the AI industry has been stuck. If you wanted to train a world-class AI, you had to use 16-bit or 8-bit precision. Going lower, to 4-bit, was a death sentence for the model. It would become unstable, "hallucinate" its own math, and eventually collapse.

But NVIDIA proved that "impossible" was just a math problem. They used a new format called NVFP4. Instead of a standard, rigid structure, NVFP4 uses "micro-scaling." It groups numbers into tiny blocks and applies individual scaling factors to each one. It’s like giving the AI a pair of high-definition glasses for its own data, allowing it to see fine details even with 75% less memory.

The result is a total paradigm shift:
- 2× to 3× faster arithmetic performance.
- 50% reduction in memory usage.
- Near-zero loss in intelligence.

The researchers compared the 4-bit model against a massive 8-bit baseline. The curves are identical. On MMLU, GSM8K, and coding benchmarks, the "tiny" 4-bit version performed within 0.1% of the more expensive model.

This is an economic earthquake. Training a frontier model used to require tens of thousands of GPUs and months of time. NVIDIA just showed we can get the same results with half the hardware and a fraction of the electricity.

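The "micro-scaling" idea in the quoted post can be shown with a small simulation: split a tensor into blocks, give each block its own scale, and round each scaled value to the nearest 4-bit floating-point (E2M1) value. A simplified sketch, assuming 16-element blocks and keeping each block scale as a full-precision float; the actual NVFP4 format also stores the block scale in FP8 (E4M3) and applies a per-tensor scale, and hardware rounding rules may differ from the tie-breaking used here:

```python
import math
import random

# Non-negative magnitudes representable in FP4 E2M1 (sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # elements that share one scale factor


def nearest_e2m1(v: float) -> float:
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_VALUES, key=lambda e: abs(abs(v) - e))
    return math.copysign(mag, v)


def quantize_blockwise(x: list[float]) -> tuple[list[float], list[float]]:
    """Simulate block-scaled FP4: returns (FP4 values, one scale per block)."""
    if len(x) % BLOCK:
        raise ValueError("length must be a multiple of BLOCK")
    codes, scales = [], []
    for start in range(0, len(x), BLOCK):
        block = x[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest FP4 value (6.0).
        scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
        scales.append(scale)
        codes.extend(nearest_e2m1(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes: list[float], scales: list[float]) -> list[float]:
    return [c * scales[i // BLOCK] for i, c in enumerate(codes)]


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1024)]
codes, scales = quantize_blockwise(x)
x_hat = dequantize_blockwise(codes, scales)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
rel_err = err / math.sqrt(sum(a * a for a in x))
print(f"blocks: {len(scales)}, relative error: {rel_err:.3f}")
```

Counting an 8-bit scale per 16 values, storage comes to about 4 + 8/16 = 4.5 bits per element, so the saving relative to 16-bit weights is slightly below the nominal 75%.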