majabbar

5.5K posts

majabbar

@MindLedger

Realist, AI Augumented Human..... 🤔

✨️ Entrou em Şubat 2012

198 Seguindo379 Seguidores

majabbar@MindLedger·2d

@TheAhmadOsman What about gen 4 gpus on gen 5 slots? I have installed 2x 3090ti on 2x Gen 5 slots with x8x8 bifurcation.

English

Ahmad@TheAhmadOsman·2d

the basics: PCIe lanes, or the highways GPUs use for data transfer > you've probably seen stuff like "PCIe 4.0 x16" thrown around > in AI/Hardware/LLM build threads so, what's PCIe actually? > it stands for "Peripheral Component Interconnect Express" > it's how your GPU, SSD, or any addon card > talks (transfers data) to your CPU > via high-speed lanes packed into your Motherboard > "x16" = number of PCIe lanes (more lanes = more total bandwidth) > "4.0" = the generation (each Gen doubles bandwidth per lane) > "PCIe" = the name of the interface standard --- with every PCIe "Gen" generation > speeds usually double > > Gen 3: ~1 GB/s > > Gen 4: ~2 GB/s > > Gen 5: ~4 GB/s > > Gen 6: ~8 GB/s each PCIe lane > is a full-duplex wire pair > one pair for send, one for receive > when you plug a GPU > into x16 Gen 4 PCIe slot > you're assigning 16 lanes of > Gen 4 speed for data transfer > to and from your GPU > that's 32 GB/s each direction > that's also 4 times faster than your NVMe SSD btw --- if you're curious > Gen 3: 1 GB/s per lane > > 16 GB/s in one direction (read OR write) > > 32 GB/s total bandwidth (read + write, aka "full duplex") > Gen 4: 2 GB/s per lane > x16 slot = 32 GB/s one way > > 64 GB/s combined (both directions) > Gen 5: 4 GB/s per lane > x16 slot = 64 GB/s one way > > 128 GB/s both ways > Gen 6: 8 GB/s per lane > x16 slot = 128 GB/s one way > > 256 GB/s both ways --- why lanes (and Gen) actually matter - inference & training > single GPU inference > all your Tensors and model weights cross the PCIe bus > x16 Gen 4 = 32 GB/s both ways > drop to x8, you're at half that; x4, you're throttled hard > single GPU training > Dataloader and Checkpoint Writes hit the PCIe even more > less lanes = GPU sitting around waiting for data > multi-GPU inference > CPU only has so many lanes to hand out > > gaming mobos? > > usually x16 GPU_1, but drop to x4 for GPU_2 > > this starves bandwidth, even for GPU_1 > > Threadripper Pro/Epyc? > > full x16 to every slot - no bottlenecks > multi-GPU training > Gradients and Activations need to move fast between GPUs > no NVLink? they're stuck riding pcie > bottleneck the lanes and your "8 GPUs" run like 3 > proper x16 lanes (and preferably NVLink) actually let you scale --- > bandwidth cheat sheet > 4090 on Gen 4 x16 = 32 GB/s > drop to Gen 3 x8 = 8 GB/s > Threadripper = 72 lanes > > 2x-4x GPUs at x16 Gen 4 > Threadripper Pro = 128 lanes > > 4x-6x GPUs at x16 Gen 5 > Epyc Genoa = 128–160 lanes > > 6x-10x GPUs at x16 Gen 5 > Intel i9 or AMD Ryzen? 16-24 lanes > > 1 GPU at x16 or 2 with bottlenecks --- next up in this series: > Retimers, Redrivers, and all the weird stuff nobody warns you about > Bifurcation > Gen 3 risers > Chipset vs CPU lanes > PCIe Switches > eGPU traps > other rookie mistakes --- —Buy a GPU, The Movement

English

147

5.5K

majabbar@MindLedger·4d

@TheAhmadOsman Tomorrow is a big day for me inshallah. I'm going to fix my new psu, motherboard, and 2x 309ti with nvlink. It's all because of you. 🙏

English

198

Ahmad@TheAhmadOsman·4d

LLMs will get locked to apps - No API access - “For safety reasons” Anthropic, OpenAI, Google, etc optimize for vendor lock-in & data collection Run your AI models locally > Opensource > Open weights > Your hardware When you don’t own the model you are the product

Peter Steinberger 🦞@steipete

Anthropic now blocks first-party harness use too 👀 claude -p --append-system-prompt 'A personal assistant running inside OpenClaw.' 'is clawd here?' → 400 Third-party apps now draw from your extra usage, not your plan limits. So yeah: bring your own coin 🪙🦞

English

309

14.8K

majabbar@MindLedger·5d

@TheAhmadOsman Opensource all the way to AGI 🤪

English

Ahmad@TheAhmadOsman·5d

Anthropic is doing research on opensource models The same Anthropic has never released an opensource model LOL, LMAO even

Ahmad@TheAhmadOsman

@AnthropicAI So you guys like open-weights too? Any plans to release an opensource model for the community?

English

261

12.7K

majabbar@MindLedger·6d

@TheAhmadOsman @AnjneyMidha @CS153Systems @LisaSu @satyanadella @garrytan @sdianahu @matiii @andi_blatt @mabb0tt @karpathy @LiamFedus @amppublic It should be a must.

English

Ahmad@TheAhmadOsman·6d

@AnjneyMidha @CS153Systems @LisaSu @satyanadella @garrytan @sdianahu @matiii @andi_blatt @mabb0tt @karpathy @LiamFedus @amppublic Let me know if you ever need someone to speak on local inference or hardware :)

English

828

Anjney Midha@AnjneyMidha·2 Nis

We have now updated the @CS153Systems site with our second batch of speakers including @LisaSu @satyanadella @garrytan @sdianahu @matiii and @andi_blatt Next week we'll unveil some surprises that are big wins for students and the broader ecosystem

English

426

102.8K

majabbar@MindLedger·6d

@TheAhmadOsman THANKS

English

464

Ahmad@TheAhmadOsman·6d

DROP EVERYTHING > install Harbor > harbor pull unsloth/gemma-4-31B-it-GGUF:Q4_K_M > harbor up llamacpp searxng webui > open Open WebUI > load Gemma 4 Now your local model has a UI, web search, and a sandboxed stack

English

140

1.8K

94.1K

majabbar@MindLedger·6d

@TheAhmadOsman Wow. Useful .. very useful ...

English

207

Ahmad@TheAhmadOsman·6d

x.com/i/article/2040…

ZXX

218

1.5K

178.5K

majabbar@MindLedger·6d

@TheAhmadOsman Very informative.

English

Ahmad@TheAhmadOsman·6d

Fundamentals of LLMs: MoE vs Dense > many popular releases have been sparse MoEs > so when a dense model drops, everyone starts asking why it feels so much slower > that’s the cost of full activation > Dense = tokens run through every parameter of the model weifhts > MoE = tokens selectively activate a subset of the parameters of the model weights > Dense models (Qwen 3.5 27B, Gemma 4 31B) > every parameter fires on every token > ~27B ops per token, every time > MoE models (MiniMax M2, Kimi K2.5) > router + many experts > per token: activate top-k (usually 2) > the rest do nothing > this one design choice changes everything > inference speed > Dense is slower: all weights, every token > MoE is faster: a 675B model might only run ~40B active params > big model, small compute footprint > memory / VRAM > Dense: lower usage, only store what you execute (~140GB for 70B BF16) > MoE: all experts must live in memory (Kimi K2.5 is ~600GB in NVFP4) > compute / FLOPs > Dense: high compute burn per token > MoE: cheap per token, expensive to host in memory though

English

325

26.6K

majabbar@MindLedger·6d

@TheAhmadOsman @sudoingX When is the discord? I have so much to ask during my 0 to Hero journey and may be may regularly share my milestones. I started with 1 3090ti, just got the second 3090ti and am now deciding on a new motherboard and stuff, etc.

English

465

Ahmad@TheAhmadOsman·6d

GPUs >> Unified Memory (e.g. Mac Studio)

mike@mike_4131

@TheAhmadOsman should we secure GPUs or is a Mac Studio 512gb enough?

Italiano

189

25.6K

majabbar@MindLedger·2 Nis

@TheAhmadOsman RTX Pro 6000 is the way to go. A hard wish, though.......

English

143

Ahmad@TheAhmadOsman·2 Nis

got this in my inbox from a NVIDIA contact > RTX 5090 vs Mac Studio M3 Ultra this further highlights how Dense models are best served on GPUs (2.7x perf. jump) p.s. this is without accounting for concurrency (parallel agents) & most probably there’s a lot more juice to squeeze

English

3.3K

Ahmad@TheAhmadOsman·2 Nis

look how Gemma 4 31B (Dense) performs so close to models that are 13x-20x its size

Ahmad@TheAhmadOsman

i still believe in dense models

English

480

33K

majabbar@MindLedger·2 Nis

@TheAhmadOsman @VitalikButerin Spot-on

English

Ahmad@TheAhmadOsman·2 Nis

@VitalikButerin you should buy a few GPUs, vitalik x.com/TheAhmadOsman/…

Ahmad@TheAhmadOsman

My house has 33 GPUs. > 21x RTX 3090s > 4x RTX 4090s > 4x RTX 5090s > 4x Tenstorrent Blackhole p150a Before AGI arrives: Acquire GPUs. Go into debt if you must. But whatever you do, secure the GPUs.

English

vitalik.eth@VitalikButerin·2 Nis

My self-sovereign / local / private / secure LLM setup, April 2026 vitalik.eth.limo/general/2026/0…

English

574

630

4.8K

majabbar@MindLedger·2 Nis

@TheAhmadOsman So far, all of your predictions are coming true.

English

Ahmad@TheAhmadOsman·2 Nis

This prediction is going to age like fine wine

Ahmad@TheAhmadOsman

INCREDIBLE Qwen3.6-Plus just dropped > the gap to Opus 4.5 is getting… uncomfortable > (and hopefully Alibaba will release the weights soon) > look at agent + coding benchmarks > Terminal-Bench 2.0 > Qwen: 61.6 > Opus: 59.3 > Qwen wins > SWE-bench Pro > Qwen: 56.6 > Opus: 57.1 > basically tied > SWE-bench Verified > Qwen: 78.8 > Opus: 80.9 > within striking distance > SWE Multilingual > Qwen: 73.8 > Opus: 77.5 > still behind > but not by much > Claw Eval (pass^3) > Qwen: 58.7 > Opus: 59.6 > again… basically tied > NL2Repo (long horizon repo work) > Qwen: 37.9 > Opus: 43.2 > gap exists but no doubt it is shrinking fast > QwenWebBench (actual frontend / artifacts) > Qwen: 1502 > Opus: 1518 > that’s basically parity > overall > equal or better in some areas > within single-digit % in most > closing fastest in agent workflows > and that last part matters most > because the future isn’t: > “who answers trivia better” > it’s: > “who can run systems end-to-end” > and on that axis > opensource labs are already breathing down Opus’ neck > this is a real shift in economics > once you’re > ~95% as good > massively cheaper > fully controllable I really hope the folks at Alibaba keep backing open source and we get the weights for this one soon

English

195

11.8K

majabbar@MindLedger·1 Nis

@TheAhmadOsman They'll 'opensource' it as they have done with Claude Code 😉

English

151

Ahmad@TheAhmadOsman·1 Nis

is Anthropic gonna opensource this or are we going to have to wait until their next leak?

Boris Cherny@bcherny

Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal). Try it: CLAUDE_CODE_NO_FLICKER=1 claude

English

7.3K

majabbar@MindLedger·1 Nis

@sudoingX Ask Claude Code 🤪

English

Sudo su@sudoingX·1 Nis

fuck. i just lost a thought.

English

2.1K

majabbar@MindLedger·1 Nis

@TheAhmadOsman Yours would be the 1st, I suppose. 👍

English

Ahmad@TheAhmadOsman·1 Nis

PREDICTION 2026-2027 will bring a new era for opensource AI An era that will be DOMINATED by American opensource labs pushing the frontier of open models >The gap between close & open models will get narrower, not widen as many speculate This tweet is for history, bookmark it

English

201

46.7K

majabbar@MindLedger·1 Nis

@TheAhmadOsman Time is near, I suppose.

English

majabbar@MindLedger·1 Nis

@TheAhmadOsman Well, RTX 6000 is something I could dream for 🤪

English

144

Ahmad@TheAhmadOsman·1 Nis

This will probably be great for Large single GPUs (e.g. RTX PRO 6000) You’re limited to 40Gpbs initially (during model loading) but then once the model is fully loaded on the GPU it should be extremely faster than Unified Memory speeds for inference

the tiny corp@__tinygrad__

If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...

English

208

20.1K

majabbar retweetou

Ahmad@TheAhmadOsman·31 Mar

Opensource is Anthropic’s Lord Voldemort 336 days ago Anthropic’s sent me a DMCA takedown for my opensource fork of Claude Code today Claude Code source code got leaked and is cloned and forked 10000s of times lol

Ahmad@TheAhmadOsman

LOL Anthropic sent me a DMCA for my fork of anon-kode

English

350

21.8K

majabbar@MindLedger·31 Mar

@elonmusk Why?