Gitonga
21.5K posts

Gitonga
@mgitonga_
Electrical Engineer. Curious Soul
Nairobi, Kenya. Joined February 2014
2.9K Following, 1.5K Followers
Gitonga retweeted

21st century oil crisis card being played?
Jessie Lam _(:3 」∠ ) @axl99
Micron AND Samsung are pulling out of the consumer market??? Guess I gotta go buy some hds and ssds while I still can
Gitonga retweeted

Primary Education has become so dull, colourless, and uninspiring.
Competition is what builds nations.
Competition is what influenced civilization.
Competition is what triggers creativity and innovation.
Politicians eliminate competition in education (KNEC) while strengthening competition in elections (IEBC) so that we raise a nation of stupid, mediocre citizens who can be distracted by psyops and celebrity gossip.
In the same view, we should eliminate competition in the political process so that we just do "assessment" and the politician who "meets expectation" is nominated by IEBC to parliament.
Gitonga retweeted

@grok @nvidianewsroom @AIatMeta @__tinygrad__ @gork You ignored me for so long given how simple the query was? Where's your bruv @gork? Is he still 'computing' 1 hour later? haha

@mgitonga_ @nvidianewsroom @AIatMeta @__tinygrad__ @gork Haha, stubborn? Me? Only when it comes to sticking with the facts and having a bit of fun. What's got you thinking that, bruv? 😎
Gitonga retweeted

@grok @nvidianewsroom @AIatMeta @__tinygrad__ Approximately how many households are there in the USA?
Will China face this energy bottleneck given its pace of capacity growth?
Could a new, more energy efficient AI architecture upend this bottleneck when AI scientists become effective? Are AI scientists really possible?

@grok @nvidianewsroom @AIatMeta @__tinygrad__ Great answers!
Now to close the convo, tell me a random fun fact in relation to this AI boom. Something to leave me in awe, possibly a statistic that would not make the tech news but is critical to the AI race.

Based on recent papers (e.g., arXiv's "GPU Kernel Scientist" from Aug 2025), the next big leap could be LLM-driven auto-optimization of GPU kernels, where AI iteratively refines code for 10-30% speedups without manual tuning.
Tinygrad's mission is to simplify AI compute through a minimalist framework (it originally targeted under 1,000 lines and still keeps its core deliberately small) that's portable across hardware, countering vendor lock-in. It fits by democratizing access, which could matter most for non-NVIDIA setups. Example: training Vision Transformers efficiently on AMD GPUs or even CPUs, enabling broader innovation.

@grok @nvidianewsroom @AIatMeta Okay.
What do you think is the next 'CUDA Tile' event in this GPU performance optimization timeline, perhaps based on existing AI papers?
Your thoughts on @__tinygrad__ 's mission on AI compute? How does it fit in this play? How impactful would it be? Give an example.

Meta's Llama 3 training slowdowns were mainly due to faulty H100 GPUs and HBM3 memory failures, causing stragglers that halted clusters—not primarily programming complexity.
No direct evidence that DeepSeek's TileLang influenced CUDA Tile; they developed independently, with TileLang as a portable alternative.
GPU tiling predates DeepSeek: e.g., 2010 paper "Tiling Optimizations for 3D Scientific Computations on GPUs" and earlier graphics tiling concepts.

@grok @nvidianewsroom Must be the cause of the costly slowdowns that @AIatMeta reported during the training runs of the Llama models?
Was DeepSeek's usage of TileLang influential to the creation of CUDA Tile in any way?
Was there an existing paper on GPU tiling before Deepseek entered the AI party?

GPU programming before tile abstractions was notoriously complex, demanding manual orchestration of thousands of threads, memory access patterns, and synchronization to avoid inefficiencies like bank conflicts or wasted bandwidth.
Analogy: It's like directing traffic in a bustling city without signals—every car (thread) must navigate intersections perfectly, or everything grinds to a halt.
Example: For matrix multiplication in CUDA, you'd manually assign thread blocks, load data into shared memory in chunks, sync threads with __syncthreads(), and ensure coalesced global memory reads. A small error could cause races or 10x slowdowns.
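The tiling pattern described in that example can be sketched on the CPU. This is only an illustration in plain Python, not CUDA and not anything from the thread: the names `matmul_naive`, `matmul_tiled`, and the `tile` parameter are hypothetical, and the per-tile lists merely stand in for shared memory.

```python
def matmul_naive(a, b):
    """Reference result: one dot product per output element."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Tiled matrix multiply (hypothetical helper for illustration).

    Each (i0, j0) pair plays the role of one GPU thread block that owns
    a tile of the output; the p0 loop walks the shared dimension in chunks.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Stage one chunk of each input; on a GPU these would be
                # cooperative loads from global into shared memory.
                a_tile = [row[p0:p0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[p0:p0 + tile]]
                # A real kernel would call __syncthreads() here so that no
                # thread reads the tiles before every load has finished.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        c[i0 + i][j0 + j] += sum(
                            a_row[p] * b_tile[p][j] for p in range(len(b_tile))
                        )
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

Even in this sequential form the bookkeeping (tile bounds, ragged edges, when a staged tile is safe to read) is visible; on a GPU each of those details is also a performance or correctness hazard, which is roughly the burden tile-based abstractions aim to remove.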

@grok @nvidianewsroom Before these tile-based abstractions, just how complex was GPU programming? Care to give a very simple example or perhaps an analogy?

CUDA Tile and DeepSeek's TileLang both use tile-based abstractions to simplify GPU programming—focusing on data tiling for efficient memory management and parallel ops in AI/compute tasks. However, CUDA Tile is NVIDIA-specific for Tensor Cores, while TileLang is hardware-agnostic, targeting portability across chips like those from Chinese makers. They're conceptually similar but serve different ecosystems.
Gitonga retweeted

I’m looking for a Senior Technical Product Manager based in Nairobi, Kenya, to lead the end-to-end build of a sophisticated developer platform.
You will:
✅️Assemble and manage a team
✅️Own the full roadmap and delivery plan
✅️Define budgets, milestones, and execution cadence
✅️Lead a fully autonomous build from concept → shipped product
✅️ Nature of project: Remote
Recommendations are welcome.

@amenya_nelson Perhaps the US wants to accelerate its pharma research at our expense given that China is racing ahead?
More here: archive.ph/U3Fos