majabbar
@MindLedger
5.5K posts
Realist, AI Augmented Human… 🤔
✨ Joined February 2012
198 Following · 379 Followers
majabbar @MindLedger ·
@TheAhmadOsman What about Gen 4 GPUs on Gen 5 slots? I have installed 2x 3090 Ti on 2x Gen 5 slots with x8/x8 bifurcation.
0 replies · 0 reposts · 0 likes · 52 views
Ahmad @TheAhmadOsman ·
the basics: PCIe lanes, or the highways GPUs use for data transfer

> you've probably seen stuff like "PCIe 4.0 x16" thrown around in AI/hardware/LLM build threads

so, what's PCIe actually?
> it stands for "Peripheral Component Interconnect Express"
> it's how your GPU, SSD, or any add-on card talks (transfers data) to your CPU via high-speed lanes packed into your motherboard
> "x16" = number of PCIe lanes (more lanes = more total bandwidth)
> "4.0" = the generation (each gen doubles bandwidth per lane)
> "PCIe" = the name of the interface standard

---

with every PCIe generation, per-lane speed usually doubles:
> Gen 3: ~1 GB/s
> Gen 4: ~2 GB/s
> Gen 5: ~4 GB/s
> Gen 6: ~8 GB/s

each PCIe lane is a full-duplex wire pair: one pair to send, one to receive

when you plug a GPU into an x16 Gen 4 slot, you're assigning 16 lanes of Gen 4 speed for data transfer to and from your GPU
> that's 32 GB/s each direction
> that's also ~4x faster than your NVMe SSD, btw

---

if you're curious:
> Gen 3: 1 GB/s per lane > x16 slot = 16 GB/s one way (read OR write) > 32 GB/s total bandwidth (read + write, aka "full duplex")
> Gen 4: 2 GB/s per lane > x16 slot = 32 GB/s one way > 64 GB/s combined (both directions)
> Gen 5: 4 GB/s per lane > x16 slot = 64 GB/s one way > 128 GB/s both ways
> Gen 6: 8 GB/s per lane > x16 slot = 128 GB/s one way > 256 GB/s both ways

---

why lanes (and gen) actually matter for inference & training:

single-GPU inference
> all your tensors and model weights cross the PCIe bus
> x16 Gen 4 = 32 GB/s each way
> drop to x8 and you're at half that; at x4 you're throttled hard

single-GPU training
> dataloader and checkpoint writes hit PCIe even more
> fewer lanes = GPU sitting around waiting for data

multi-GPU inference
> the CPU only has so many lanes to hand out
> gaming mobos? usually x16 for GPU_1, but they drop to x4 for GPU_2, which starves bandwidth, even for GPU_1
> Threadripper Pro / Epyc? full x16 to every slot, no bottlenecks

multi-GPU training
> gradients and activations need to move fast between GPUs
> no NVLink? they're stuck riding PCIe
> bottleneck the lanes and your "8 GPUs" run like 3
> proper x16 lanes (and preferably NVLink) actually let you scale

---

bandwidth cheat sheet
> 4090 on Gen 4 x16 = 32 GB/s; drop to Gen 3 x8 = 8 GB/s
> Threadripper = 72 lanes > 2x-4x GPUs at x16 Gen 4
> Threadripper Pro = 128 lanes > 4x-6x GPUs at x16 Gen 5
> Epyc Genoa = 128-160 lanes > 6x-10x GPUs at x16 Gen 5
> Intel i9 or AMD Ryzen? 16-24 lanes > 1 GPU at x16, or 2 with bottlenecks

---

next up in this series:
> retimers, redrivers, and all the weird stuff nobody warns you about
> bifurcation
> Gen 3 risers
> chipset vs CPU lanes
> PCIe switches
> eGPU traps
> other rookie mistakes

---

—Buy a GPU, The Movement
Ahmad tweet media
12 replies · 12 reposts · 147 likes · 5.5K views
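The bandwidth figures in the thread above follow one simple rule, so they're easy to sanity-check. A minimal sketch, assuming the usual ~1 GB/s-per-lane Gen 3 baseline that doubles each generation (real-world throughput is a bit lower due to encoding and protocol overhead):

```python
# Approximate PCIe one-way bandwidth: ~1 GB/s per lane at Gen 3,
# doubling with each generation (encoding/protocol overhead ignored).
def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    per_lane = 1.0 * 2 ** (gen - 3)  # GB/s per lane, Gen 3 baseline
    return per_lane * lanes

print(pcie_bandwidth_gbs(4, 16))  # x16 Gen 4 -> 32.0 GB/s one way
print(pcie_bandwidth_gbs(3, 8))   # x8  Gen 3 -> 8.0 GB/s one way
print(pcie_bandwidth_gbs(5, 8))   # x8  Gen 5 -> 32.0 GB/s one way
```

Note the last line: x8 at Gen 5 matches x16 at Gen 4, which is why x8/x8 bifurcation on Gen 5 slots (as asked earlier in the thread) is much less of a penalty for Gen 4 cards than it sounds.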
majabbar @MindLedger ·
@TheAhmadOsman Tomorrow is a big day for me inshallah. I'm going to fit my new PSU, motherboard, and 2x 3090 Ti with NVLink. It's all because of you. 🙏
1 reply · 0 reposts · 2 likes · 198 views
Ahmad @TheAhmadOsman ·
LLMs will get locked to apps
> No API access
> "For safety reasons"

Anthropic, OpenAI, Google, etc. optimize for vendor lock-in & data collection

Run your AI models locally
> Opensource
> Open weights
> Your hardware

When you don't own the model, you are the product
Peter Steinberger 🦞@steipete

Anthropic now blocks first-party harness use too 👀 claude -p --append-system-prompt 'A personal assistant running inside OpenClaw.' 'is clawd here?' → 400 Third-party apps now draw from your extra usage, not your plan limits. So yeah: bring your own coin 🪙🦞

29 replies · 25 reposts · 309 likes · 14.8K views
Ahmad @TheAhmadOsman ·
Anthropic is doing research on opensource models. The same Anthropic has never released an opensource model. LOL, LMAO even.
Ahmad tweet media
Ahmad @TheAhmadOsman

@AnthropicAI So you guys like open-weights too? Any plans to release an opensource model for the community?

16 replies · 12 reposts · 261 likes · 12.7K views
Ahmad @TheAhmadOsman ·
DROP EVERYTHING
> install Harbor
> harbor pull unsloth/gemma-4-31B-it-GGUF:Q4_K_M
> harbor up llamacpp searxng webui
> open Open WebUI
> load Gemma 4

Now your local model has a UI, web search, and a sandboxed stack
Ahmad tweet media
71 replies · 140 reposts · 1.8K likes · 94.1K views
Ahmad @TheAhmadOsman ·
Fundamentals of LLMs: MoE vs Dense

> many popular releases have been sparse MoEs, so when a dense model drops, everyone starts asking why it feels so much slower
> that's the cost of full activation

> Dense = tokens run through every parameter of the model weights
> MoE = tokens selectively activate a subset of the parameters of the model weights

Dense models (Qwen 3.5 27B, Gemma 4 31B)
> every parameter fires on every token
> ~27B ops per token, every time

MoE models (MiniMax M2, Kimi K2.5)
> router + many experts
> per token: activate the top-k experts (usually 2); the rest do nothing

this one design choice changes everything:

inference speed
> Dense is slower: all weights, every token
> MoE is faster: a 675B model might only run ~40B active params
> big model, small compute footprint

memory / VRAM
> Dense: lower usage, you only store what you execute (~140 GB for 70B BF16)
> MoE: all experts must live in memory (Kimi K2.5 is ~600 GB in NVFP4)

compute / FLOPs
> Dense: high compute burn per token
> MoE: cheap per token, expensive to host in memory though
Ahmad tweet media
29 replies · 21 reposts · 325 likes · 26.6K views
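The "router + top-k experts" mechanism described above can be sketched in a few lines. This is a toy illustration with made-up sizes, not any real model's architecture: a router scores every expert per token, and only the top-k winners execute, so compute per token scales with k, not with the number of experts:

```python
import numpy as np

# Toy top-k MoE routing for a single token (all sizes are assumptions).
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

x = rng.standard_normal(d_model)                   # one token's hidden state
router_w = rng.standard_normal((n_experts, d_model))
experts = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

logits = router_w @ x                              # router score per expert
top = np.argsort(logits)[-top_k:]                  # indices of the top-k experts
gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the winners

# only the top-k experts run; the other experts contribute nothing this token
y = sum(g * (experts[i] @ x) for g, i in zip(gates, top))
print(y.shape)  # (8,)
```

The point of the thread falls out of the code: all `n_experts` weight matrices must sit in memory, but only `top_k` of them are multiplied per token.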
majabbar @MindLedger ·
@TheAhmadOsman @sudoingX When is the Discord? I have so much to ask during my zero-to-hero journey, and maybe I'll regularly share my milestones. I started with one 3090 Ti, just got the second 3090 Ti, and am now deciding on a new motherboard and so on.
0 replies · 0 reposts · 3 likes · 465 views
Ahmad @TheAhmadOsman ·
GPUs >> Unified Memory (e.g. Mac Studio)
mike@mike_4131

@TheAhmadOsman should we secure GPUs or is a Mac Studio 512gb enough?

25 replies · 4 reposts · 189 likes · 25.6K views
majabbar @MindLedger ·
@TheAhmadOsman RTX Pro 6000 is the way to go. A hard wish, though…
0 replies · 0 reposts · 1 like · 143 views
Ahmad @TheAhmadOsman ·
got this in my inbox from an NVIDIA contact
> RTX 5090 vs Mac Studio M3 Ultra

this further highlights how dense models are best served on GPUs (2.7x perf jump)

p.s. this is without accounting for concurrency (parallel agents) & most probably there's a lot more juice to squeeze
Ahmad tweet media
5 replies · 2 reposts · 51 likes · 3.3K views
majabbar @MindLedger ·
@TheAhmadOsman They'll 'opensource' it as they have done with Claude Code 😉
0 replies · 0 reposts · 1 like · 151 views
Sudo su @sudoingX ·
fuck. i just lost a thought.
10 replies · 0 reposts · 24 likes · 2.1K views
Ahmad @TheAhmadOsman ·
PREDICTION

2026-2027 will bring a new era for opensource AI. An era that will be DOMINATED by American opensource labs pushing the frontier of open models.

> The gap between closed & open models will get narrower, not wider as many speculate.

This tweet is for history, bookmark it.
24 replies · 11 reposts · 201 likes · 46.7K views
Ahmad @TheAhmadOsman ·
This will probably be great for large single GPUs (e.g. RTX PRO 6000). You're limited to 40 Gbps initially (during model loading), but once the model is fully loaded on the GPU, it should be far faster than unified-memory speeds for inference.
the tiny corp@__tinygrad__

If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...

29 replies · 5 reposts · 208 likes · 20.1K views
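The "limited to 40 Gbps initially" point above is a one-time cost that's easy to quantify. A rough sketch, with an assumed ~40 GB model file (e.g. a large model at 4-bit) and ideal link speeds; real Thunderbolt/USB4 throughput lands well below the line rate, so treat these as best cases:

```python
# One-time model-load cost over different links (idealized, no overhead).
def load_seconds(model_gb: float, link_gbs: float) -> float:
    """Seconds to move model_gb gigabytes over a link of link_gbs GB/s."""
    return model_gb / link_gbs

model_gb = 40.0                       # assumed model file size
egpu_link = 40 / 8                    # 40 Gbps Thunderbolt/USB4 -> 5 GB/s
slot_link = 32.0                      # x16 Gen 4 slot, one way

print(load_seconds(model_gb, egpu_link))  # ~8.0 s over the eGPU link
print(load_seconds(model_gb, slot_link))  # ~1.25 s in a proper slot
```

Either way the load finishes in seconds; after that, inference runs entirely out of the GPU's own VRAM, which is the tweet's point.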
Elon Musk @elonmusk ·
Stand By Me
36.2K replies · 23.8K reposts · 331.8K likes · 81.2M views
majabbar @MindLedger ·
@TheAhmadOsman But we can download in good faith. Naivety can't be punished. 🤡
0 replies · 0 reposts · 0 likes · 41 views
Ahmad @TheAhmadOsman ·
thank you for leaking the Claude Code source code to the opensource community this time Dario 💙
22 replies · 37 reposts · 474 likes · 26K views