Raja Koduri

4.5K posts

@RajaXg

Create, Clean, Consume is my aspirational routine. My interests: math, computer graphics, silicon, software and music.

San Francisco · Joined December 2009

2.1K Following · 51.1K Followers

Pinned Tweet

Raja Koduri@RajaXg·15 Jul

x.com/i/article/1812…

Raja Koduri@RajaXg·3d

30 years of working with Dell from 2d Bitblt engines, 3d accelerators, GPUs, CPUs and now token factories, finally got a picture with the great man himself @MichaelDell at Dell Technologies World in Las Vegas

Raja Koduri@RajaXg·3d

Whenever I meet someone visiting Silicon Valley from Taiwan I jokingly ask if they brought me a wafer 😀 Friends from whalechip in Taiwan did one better: they sent me wafer-on-wafer, hybrid bonded logic and DRAM wafers. Thank you, Wenchung. I know we have been in "wafer scale" buzz since last week. Congratulations Cerebras! Andrew, Dhiraj, Gary, Julie and other friends and acquaintances. 👏👏 Easily one of the most courageous, innovative and hardworking silicon engineering teams in the world right now.

Raja Koduri retweeted

David Bennett@DavidBennett__·26 Apr

This is very cool! RRR x Takarazuka (famous Japanese all female theater group) You must have seen this @RajaXg, but what a brilliant collaboration. 👏🏻 @ssrajamouli

Raja Koduri@RajaXg·22 Apr

Good to see good old GPU shaders in action at Google Cloud Next '26 in Las Vegas!

Raja Koduri@RajaXg·12 Apr

@m13v_ North Fork, near Yosemite

Matt@m13v_·12 Apr

@RajaXg which center did you sit at? the "attention and equanimity" framing is perfect honestly. six courses in and that's still the core of what i'd tell anyone. the hard part nobody warns you about is maintaining that equanimity when you're back at your desk on monday

Raja Koduri@RajaXg·19 Mar

Finished my first 10-day Vipassana meditation retreat. Silent and fully disconnected. It was hard. I will link some resources in the thread to learn more about this. For the AI nerds, I would summarize my experience as "Attention and equanimity is all you need"

Raja Koduri retweeted

Omer Cheema@OmerCheeema·28 Mar

In 1969, the Apollo Guidance Computer (AGC) took humans to the Moon with a 2.048 MHz crystal clock (effective ~1.024 MHz internal), executing roughly 40,000–85,000 instructions per second. It had just 2,048 words of magnetic-core RAM (~4 KB) and 36,864 words of hand-woven core-rope ROM (~72 KB total memory). No fancy frameworks. No bloated buffers. Every word and cycle was sacred. Software in tight assembly handled real-time guidance, descent, and rendezvous, and even recovered gracefully from 1202 alarms during Apollo 11's landing.

My old professor (ex-NASA researcher) once yelled at me for initializing an unnecessary array. His point hit hard: "If a ~2 MHz machine with <100 KB of memory could land on the Moon, why do we waste resources so recklessly on far less demanding apps today?"

He's right. I've seen it dozens of times in software teams. Give engineers abundant RAM/CPU/GPU, and usage magically expands to fill it. Unneeded allocations, oversized buffers "just in case," heavy dependency trees, and telemetry bloat everywhere. Product managers who set hard memory optimization goals? Teams suddenly deliver wonders. Low-hanging fruit abounds. Yet it's rarely prioritized until costs or OOM errors bite.

This mirrors Raja's wisdom on innovation thriving in constrained environments. Scarcity forces elegance, reuse, and true mastery of the problem.

Fast-forward to today's AI stack, especially LLM inference: A 70B-parameter model in FP16 weighs ~140 GB just for weights. Add a 32K-token context with modest batching? The KV cache alone can explode to 40–80+ GB (or 300+ GB at 128K context with multiple users), often dwarfing the model itself. For Llama 3.1 70B at FP16, one long context request can eat tens of GB in cache before you even generate tokens. No wonder serving costs skyrocket and concurrency suffers.

Yet the fixes echo the AGC mindset.

Quantization (INT4/FP8/NF4): Shrinks a 70B model from ~140 GB (FP16) to ~35 GB with minimal quality loss. Up to 4x memory reduction + big speedups.

PagedAttention (vLLM): Treats KV cache like virtual memory with small blocks. Slashes fragmentation/waste from 60-80% down to <4%, unlocking 2–4x higher throughput and far more concurrent users on the same GPUs.

FlashAttention: Fuses ops and tiles computation for up to 10–20x memory savings on attention (especially long sequences) plus 2–4x speed.

Combine them and you serve dramatically more with the same hardware, or run bigger models/contexts without burning money.

The lesson from the Moon: constraints aren't enemies; they're forcing functions for better engineering. In AI (and software at large), deliberate discipline (profiling ruthlessly, enforcing budgets, questioning every allocation) still wins.

Key optimization areas delivering the biggest savings today: KV cache management (paged + quantized), model weight quantization, efficient attention kernels, mixed precision + checkpointing in training, and system-level profiling to kill unnecessary bloat.

Abundance made us sloppy. Scarcity (or enforced budgets) will make us brilliant again.
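The memory figures in the post above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the model shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128) is an assumption chosen to resemble Llama 3.1 70B, not something stated in the post. It counts raw tensor bytes and ignores allocator overhead, activations, and fragmentation.

```python
# Rough memory estimates for LLM inference (illustrative sketch).

GIB = 2**30


def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the model weights at a given precision."""
    return n_params * bits_per_param // 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed for the KV cache.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    corresponds to FP16.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch


if __name__ == "__main__":
    params = 70_000_000_000  # 70B parameters

    # Weights: FP16 versus a 4-bit quantized format.
    print("FP16 weights (GB):", weight_bytes(params, 16) / 1e9)
    print("INT4 weights (GB):", weight_bytes(params, 4) / 1e9)

    # KV cache for an ASSUMED Llama-3.1-70B-like shape.
    shape = dict(n_layers=80, n_kv_heads=8, head_dim=128)
    for seq_len in (32_768, 131_072):
        size = kv_cache_bytes(seq_len=seq_len, **shape)
        print(f"KV cache at {seq_len} tokens (GiB):", size / GIB)
```

Under these assumptions a single 128K-token request needs about 40 GiB of FP16 cache, and the total scales linearly with batch size, which is how multi-user serving reaches the hundreds of gigabytes the post mentions. A model without grouped-query attention (more KV heads) would be proportionally worse.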

Raja Koduri@RajaXg

I warned my memory friends a few months ago..there are tons of optimizations available across the whole stack to reduce memory capacity and bandwidth...as long as memory was relatively "cheap", we stay lazy...constraints unleash creativity..I hear the memory supply chain constraints won't be solved till 2030..prepare for a deluge of creativity..it hasn't been a week since Turbo quant... not only in software, but you will see some insanely cool hardware improvisations and new suppliers emerge to the top as well

Raja Koduri@RajaXg·28 Mar

@Midnight_Captl I’m saying watch for disruptions in the hierarchy..I’m bullish that there will be new winners in the memory hierarchy..bearish that the current leaderboard will be stable

Midnight Capital LLC@Midnight_Captl·28 Mar

@RajaXg You’re flip flopping. Are you saying you’re bearish memory pricing power or not 😵‍💫

Midnight Capital LLC@Midnight_Captl·28 Mar

The quoted tweet is complete nonsense. Memory needs will keep exploding. A lot of ill informed people thinking DRAM / HBM prices will fall... Sorry but keep dreaming Take a look at the NVDA gen / gen diagram below - which component looks like it will grow the most? MU / SK / Samsung pricing power isn't going anywhere

Raja Koduri@RajaXg

Raja Koduri@RajaXg·28 Mar

@Midnight_Captl Memory capacity and bandwidth will remain important, but who and how it will be served could change...producing new winners..

Midnight Capital LLC@Midnight_Captl·28 Mar

@RajaXg What did you warn your friends about?

Raja Koduri@RajaXg·28 Mar

@Midnight_Captl About the system architecture changes that could have more lasting impact in 3-4 years..

Raja Koduri@RajaXg·28 Mar

@benitoz There will be no change in demand...but will be fascinating to see how much more efficient software and hardware will get in memory utilization as the demand far exceeds supply...and don't discount left field ideas that change the system architecture..

Ben Pouladian@benitoz·27 Mar

@RajaXg Yes. Compression doesn’t kill demand. It makes bigger workloads viable

Raja Koduri@RajaXg·27 Mar

ComfyUI@ComfyUI

Upgrading your RAM is now unnecessary. Introducing our new ComfyUI Dynamic VRAM optimization. Running local models is now possible on even the most memory constrained hardware. Read more here: blog.comfy.org/p/dynamic-vram…

Raja Koduri@RajaXg·23 Mar

Plenty of DRAM supply at BiRite in SF

Raja Koduri retweeted

Alex Cheema@alexocheema·20 Mar

The stakes are higher than you can possibly imagine.

0xSero@0xSero

x.com/i/article/2034…

Raja Koduri@RajaXg·18 Mar

No supply constraints!

Raja Koduri retweeted

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·16 Mar

It begins. @RajaXg

Raja Koduri@RajaXg·16 Mar

We are working with all the usual suspects in the GPU-TPU-XPU-DRAM ecosystem to power these giga-watts. Been at it for several months, good to be able to share this ahead of GTC today. crnasia.com/india/news/202…

Raja Koduri retweeted

rajamouli ss@ssrajamouli·25 Feb

For years our stories pushed the limits of our canvas. Today we stretch it further. Excited to launch A&M MoCap Lab, India’s largest Motion Capture facility at @AnnapurnaStudios set up in collaboration with @mihiravisualabs and @Animatrik Looking forward for storytellers to explore its limitless potential in animation, live action, gaming and more.

Raja Koduri@RajaXg·17 Feb

Zettascale India is no longer just a vision. It’s becoming reality.

Back in 2020, I gave a talk laying out why India must build sovereign AI compute infrastructure at zettascale. 🎬 Watch my FICCI talk here: youtu.be/nd-EFhFlAr8?si…

Fast forward to today: AM Green Group has announced a $25 billion, 1 GW AI compute hub in Greater Noida. One of the largest AI infrastructure investments in India’s history. The facility will house nearly 500,000 high-performance chipsets and run entirely on 24/7 carbon-free energy from solar, wind, and pumped storage.

I’m proud to serve as an advisor to this effort. This is a bold and necessary ambition. But announcing it is only the beginning. The real work lies ahead: building the supply chains, attracting and developing talent, executing on the engineering, and delivering on the sustainability promise. There is a lot to do to turn this vision into reality.

Phase 1 targets 2028. Full capacity by 2030. The commitment is there. Now it’s about relentless execution. Let’s build it right. livemint.com/ai/am-group-25…

Will be at India AI Impact summit in Delhi this week

Raja Koduri retweeted

OXMIQ@realoxmiqlabs·12 Feb

@RajaXg explains why the GPU industry lacks the kind of standardized IP ecosystem that ARM created for CPUs, and why that matters for the future of AI acceleration. This is exactly the gap OXMIQ is addressing with our licensable chiplet-based GPU IP architecture. #OXMIQ
