notallthere

202 posts

@notallthere_net

Let’s talk about local computer vision LLMs

hell Joined March 2009

43 Following · 11 Followers

notallthere@notallthere_net·12h

@LottoLabs I started down this path, so let me give u a warning now. No. Start with a 5000 Pro 48GB GDDR7

Lotto@LottoLabs·18h

It’s very simple
Find a 3090 or two
Get any mobo that supports 2 pcie x16 ports (at least x16x4 for lanes)
Get a 1200W+ PSU
Buy the cheapest ddr4 ram 64gb+ (you’re not using it anyways)
Install Linux, vLLM, Llama.cpp, SGlang, tailscale
Download any flavour of qwen 3.7 27b
You are now localmaxxing

notallthere@notallthere_net·1d

M5 Max 128 GB RAM laptop. There is no way possible, used, new, or refurbished, not a Spark, not an RTX, not anything, to get models loaded that require 70GB+ of VRAM at the tok/s that this laptop does. NOTHING. These laptops will be gone soon, mark my words

notallthere@notallthere_net·1d

@theo He’s the kind of guy that re-reads his own posts

Theo - t3.gg@theo·2d

He really doesn't like when you weaponize his own behavior against him. You guys definitely should spam his replies, pointing out the contradictory nature of his assertions. That would be terrible.

Theo - t3.gg@theo·2d

This Gary Marcus is going around insisting that nobody will debate him about AI. I shared my thoughts and got blocked. Pic unrelated.

notallthere@notallthere_net·1d

@TeksEdge @xyster @intel @nvidia TOKs or stfu

David Hendrickson@TeksEdge·1d

🚨 🤯 @xyster did it! This is insane! Imagine running MiniMax M2.7 locally! One of the best open source models running locally on 4x @Intel B70 ARC Pros w/128GB of VRAM @ 83 tps! While not cheap, $4K will get you 4 Intel cards, while 1x @nvidia RTX-5090 32GB will cost $5K and an RTX-6000 w/96GB costs $10K, but neither will run MiniMax M2.7.

Steve💙🇨🇦@xyster

The fresh new B70 PCIe 4.0 build is running Minimax now. Stock was about 13-tps; currently 83-tps (decode) after applying my optimizations. It's a bit short of the 93 I had on the PCIe 5.0 mobo, but I might have missed some patches. Sanity check tests passed. See reply for more
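The price comparison in David Hendrickson's post above comes down to dollars per GB of VRAM. A minimal sketch of that arithmetic, using only the prices and capacities quoted in the post (they are the poster's figures and are not verified here; the `options` table and `usd_per_gb` helper are names made up for this illustration):

```python
# Prices and VRAM sizes exactly as quoted in the post (unverified).
options = {
    "4x Intel B70 ARC Pro": {"price_usd": 4000, "vram_gb": 128},
    "1x RTX-5090": {"price_usd": 5000, "vram_gb": 32},
    "1x RTX-6000": {"price_usd": 10000, "vram_gb": 96},
}


def usd_per_gb(price_usd, vram_gb):
    """Cost of one GB of VRAM for a given build."""
    return price_usd / vram_gb


cost_per_gb = {name: usd_per_gb(**spec) for name, spec in options.items()}

# Print the cheapest VRAM first.
for name, cost in sorted(cost_per_gb.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f} per GB of VRAM")
```

This only compares capacity per dollar. It says nothing about tokens per second, which is what the reply to this post is asking for.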

notallthere@notallthere_net·5d

Do you watch YouTube videos on AI? The Problem: 95% of the videos about AI on YouTube are people that are just using AI to make videos on YouTube and optimize their YouTube strategy. They’re not building what you’re building.

notallthere@notallthere_net·5d

@OliDietzel @sudoingX At 20k ur almost at 200GB VRAM, plus agents and subagents. DONT U WANT SUBAGENTS!?

Oli Dietzel 🗽@OliDietzel·6d

@notallthere_net @sudoingX 5000 < 20000 :)

Sudo su@sudoingX·6d

dgx spark has the most underserved ecosystem for what the hardware is capable of. complete white space.

notallthere@notallthere_net·5d

@J33P4 @sudoingX 20x slower

𝗝 𝟯 𝟯 𝗣 𝟰 | 𝗷𝟯𝟯𝗽𝟰.𝗲𝘁𝗵@J33P4·5d

@notallthere_net @sudoingX 4x cheaper perhaps?

notallthere@notallthere_net·5d

@BardIonson @sudoingX Yeah that’s nice but you can remote in and at the prices right now you would be better off going the rtx route

𝙱å𝚛𝚍 𝙸𝚘𝚗𝚜𝚘𝚗@BardIonson·5d

@notallthere_net @sudoingX I am interested because of its portability for travel and quietness for edge computing

notallthere@notallthere_net·5d

@bsatyarthi @KaiXCreator Yep memory is inside the cpu or something right?

Badal Satyarthi@bsatyarthi·5d

@KaiXCreator just two words - “vertical integration”

Kaito@KaiXCreator·6d

Why does a MacBook often feel more powerful than Windows even with the same RAM?

notallthere@notallthere_net·6d

@adamdotdev Mid-halving, same old moron posts. Funny to me, grown men that jump on hype and pretend it’s a crystal ball. We've been here since 2009 and we will be here until 2.9m BTC

Adam@adamdotdev·6d

So crazy to me that there are still adult men talking publicly about crypto/blockchain/ethereum/etc things

notallthere@notallthere_net·6d

Before u do this, realize they take up 3 slots if ur lucky. Look for 2-slot cards, make sure they aren’t passive, make sure u have room for ur RAID card, make sure u have enough RAM. It takes planning. Also, not saying don’t do it, saying that you're going to spend minimum 6k before u have the right setup

Ahmad@TheAhmadOsman·6d

Gentle reminder that all you need to start with Local AI is:
- 2x RTX 3090s (pick up for $700-$900 on r/hardwareswap)
- Qwen 3.6 27B / Gemma 4 31B
- Your favorite agent (Claude Code / OpenCode / etc)
- Self-hosted SearXNG for web access
And you got yourself Opus 4.5 at home

notallthere@notallthere_net·6d

@TheAhmadOsman Before u do this, realize they take up 3 slots if ur lucky. It takes planning, also clinks shit up in price. Not saying don’t do it, saying that you're going to spend minimum 6k before u have the right setup

notallthere@notallthere_net·6d

@evisdrenova “Fell off” because u are an Elon hater

Evis Drenova@evisdrenova·6d

The cursor fall-off is going to be studied for decades. I don't know any engineer who uses them anymore. Not to say that others don't, but it's obvious that they're no longer on the tech frontier. Still, a $60b outcome in 4 years is nothing to sneeze at...

notallthere@notallthere_net·18 May

@moshhamedani Claude Code was good in 2025. Can’t even believe anyone still uses it over Codex, the difference is not even comparable. I don’t trust Claude with ANYTHING important

Mosh@moshhamedani·16 May

I haven’t used Codex yet but I see some have been favoring it over Claude Code. Can you share some insight on why you think it’s better?

notallthere@notallthere_net·16 May

@jackvlloyd Agreed

Jack V Lloyd@jackvlloyd·15 May

You can judge how intelligent someone is by how much they hate data centers. The greater the hate, the lower the intelligence.

notallthere@notallthere_net·16 May

@0xSero I would rather pay for 3 6000s, 96GB each. Can’t deal with the slowness of Sparks

0xSero@0xSero·16 May

Cheapest competitive build on Nvidia
2 sparks = 8000$
Total specs
- 256gb
- 8tb
- 546gb/s memory bandwidth
- tons of flops
---
models:
- Deepseek-v4-flash
- MiMo-v2.5-flash fp4
- MiniMax-M2.7
- Qwen3.5-397b-reap
Flaws:
- Low mem bandwidth
- You need 2 for best perf

notallthere@notallthere_net·16 May

@steipete I do the total OPPOSITE. I’ve been working for 8 months on trying to achieve things on small amounts so they can run on small LLMs. CONSTRAINTS produce true diamonds, look at Kimi K2.6. I think abundance ultimately produces worse achievements

Peter Steinberger 🦞@steipete·16 May

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter?
We constantly run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6 month old issue and close it with an exact reference.
We run codex on every commit to review for security issues (as it's far too easy to miss).
We run codex to de-duplicate issues and find clusters and send reports for the most pressing issues.
We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video and post before/after fix on the PR.
There's codex that watches new issues and, if one fits our documented vision well, automatically creates a PR for it (which another codex then reviews).
We have codex running that scans comments for spam and blocks people.
We have codex instances running that verify performance benchmarks and report regressions into Discord.
We have agents that listen in on our meetings and proactively start work, e.g. create PRs for new features while we discuss them.
We build clawpatch.ai to split all our projects into functional units to review and find bugs and regressions.
We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities.
All that automation allows us to run this project extremely lean.

notallthere@notallthere_net·16 May

@KaiXCreator It was in 2025

Kaito@KaiXCreator·15 May

Is Claude Opus really the best model for coding right now ?

notallthere@notallthere_net·16 May

@ctnzr @grok With this tech, is there a path to running Kimi K2 on three 96GB GDDR7 cards?

Bryan Catanzaro@ctnzr·16 May

We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and also pretrained in NVFP4. Accelerated computing means we rethink every aspect of the AI stack looking for new opportunities to improve efficiency.

How To AI@HowToAI_

NVIDIA has done the impossible and nobody's talking about it. They trained a 12 BILLION parameter LLM in 4-bit precision on 10 trillion tokens.
For years, the AI industry has been stuck. If you wanted to train a world-class AI, you had to use 16-bit or 8-bit precision. Going lower to 4-bit was a death sentence for the model. It would become unstable, "hallucinate" its own math, and eventually collapse. But NVIDIA proved that "impossible" was just a math problem.
They used a new format called NVFP4. Instead of a standard, rigid structure, NVFP4 uses "micro-scaling." It groups numbers into tiny blocks and applies individual scaling factors to each one. It’s like giving the AI a pair of high-definition glasses for its own data, allowing it to see fine details even with 75% less memory.
The result is a total paradigm shift:
- 2× to 3× faster arithmetic performance.
- 50% reduction in memory usage.
- Near-zero loss in intelligence.
The researchers compared the 4-bit model against a massive 8-bit baseline. The curves are identical. On MMLU, GSM8K, and coding benchmarks, the "tiny" 4-bit version performed within 0.1% of the more expensive model.
This is an economic earthquake. Training a frontier model used to require tens of thousands of GPUs and months of time. NVIDIA just showed we can get the same results with half the hardware and a fraction of the electricity.
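The "micro-scaling" idea described in the post above (small blocks of numbers, each with its own scaling factor) can be illustrated with a toy quantizer. This is a simplified sketch, not NVIDIA's implementation: the block size of 16, the float scale, and all function names are assumptions made for illustration, and the real format reportedly also stores scales in 8 bits and adds a tensor-level scale, which the sketch skips.

```python
# Magnitudes a 4-bit E2M1 float can represent (the sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed block size for this sketch


def quantize_block(block):
    """Return (scale, codes): one scale per block, one signed grid value per element."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    scale = max_abs / FP4_GRID[-1]  # map the block's largest magnitude onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(FP4_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def quantize(values, block_size=BLOCK_SIZE):
    """Split values into blocks and quantize each block independently."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    out = []
    for scale, codes in blocks:
        out.extend(scale * c for c in codes)
    return out


# One block of small values followed by one block of large values.
values = [0.01 * i for i in range(16)] + [100.0 * i for i in range(16)]

per_block = dequantize(quantize(values))                    # one scale per 16 values
one_scale = dequantize(quantize(values, block_size=32))     # one scale for everything

print("small values, per-block scale:", per_block[:4])
print("small values, single scale:   ", one_scale[:4])
```

With a single shared scale, the large values set the scale and the small block rounds to zero. With per-block scales, the small block keeps its own resolution, which is the effect the post attributes to micro-scaling.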
