Nihar

28 posts

@niharkod

scaleout at @tenstorrent // Purdue CS // GPU comms, ML compilers

Joined June 2025
181 Following · 14 Followers
arjun @arjunharinath1
@satvikgari and I have been building our own version of Nvidia’s Blackwell GPU. We just designed a 4x4 systolic array in Verilog! Here’s a breakdown of how it works and what we learned building it.
12 replies · 9 reposts · 68 likes · 5.2K views
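A systolic array like the 4x4 one described above computes a matmul by streaming skewed inputs through a grid of multiply-accumulate cells. As a hedged, language-agnostic illustration (a NumPy cycle-level sketch of the general output-stationary scheme, not their Verilog design — the function name and structure are mine):

```python
import numpy as np

def systolic_matmul(A, B, n=4):
    """Cycle-level sketch of an n x n output-stationary systolic array.

    Rows of A stream in from the left edge and columns of B from the top,
    each skewed by one cycle per row/column. Every processing element (PE)
    multiplies the pair of values passing through it each cycle and
    accumulates into a local register, like one MAC unit in hardware.
    """
    assert A.shape == (n, n) and B.shape == (n, n)
    acc = np.zeros((n, n), dtype=A.dtype)    # one accumulator per PE
    a_reg = np.zeros((n, n), dtype=A.dtype)  # values flowing rightward
    b_reg = np.zeros((n, n), dtype=A.dtype)  # values flowing downward
    for t in range(3 * n - 2):               # cycles for the wavefront to drain
        # shift registers: data moves one PE per cycle
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # inject skewed inputs at the array edges (zeros outside the window)
        for i in range(n):
            k = t - i                        # diagonal skew
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0
            b_reg[0, i] = B[k, i] if 0 <= k < n else 0
        acc += a_reg * b_reg                 # every PE does one MAC per cycle
    return acc

A = np.arange(16).reshape(4, 4)
B = np.eye(4, dtype=int)
print(np.array_equal(systolic_matmul(A, B), A @ B))  # True
```

The skew guarantees that A[i, k] and B[k, j] arrive at PE(i, j) on the same cycle (t = i + j + k), which is why the whole product finishes in 3n - 2 cycles.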
Nihar reposted
Mike Gao @gaozenghao
Funny to think robotics was such a hard problem back in the day, when @BostonDynamics had the edge by building accurate dynamics models of robots' kinematics plus whole-body controllers that compute, in real time, joint torques satisfying multiple constraints. It was extremely labour-intensive (and didn't scale). Then the 2020s came along, and "Learning Agile Robotic Locomotion Skills by Imitating Animals" and "Rapid Locomotion via Reinforcement Learning" made the old ways completely obsolete.
Quoting Jordan Schneider @jordanschneider:
A ChinaTalk deep dive into Unitree's IPO chinatalk.media/p/unitrees-ipo
0 replies · 1 repost · 2 likes · 607 views
Nihar reposted
the tiny corp @__tinygrad__
If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...
[image]
271 replies · 1K reposts · 7.6K likes · 1.5M views
Nihar reposted
vixhaℓ @TheVixhal
Computer science is gradually returning to the domain of physicists, mathematicians, and electrical engineers as large language models automate much of what we currently call software engineering. The field’s center of gravity is shifting away from manual code writing and toward deeper theoretical thinking, mathematical insight, and systems-level reasoning.
327 replies · 1.7K reposts · 15.3K likes · 957.9K views
Hasan @hasanunlu9
After 8+ years on the Tesla Autopilot team and 3 years at Intel, I started @apexcompute to design a new architecture for efficient AI inference. For the past 9 months, we've been building our custom inference accelerator. Today we're releasing Unified Engine v1.

Last June we raised our seed round with @maxitechinc, DeepFin Research, @Soma_Capital, and an incredible group of angel investors. In less than 9 months, we completed our RTL architecture and brought our first pre-silicon prototype to life on FPGA.

Our architecture combines a systolic array and vector processing in a single compute engine with multiple architectural optimizations, achieving very high FLOPs utilization. A single engine is super lean: it uses less than 90K LUTs and 1 MB of Block RAM, which may make it one of the smallest logic-footprint compute engines developed so far.

Unified Engine v1 supports:
- matrix-matrix multiplication (~95% FLOPs utilization)
- softmax (~90% FLOPs utilization)
- broadcast and element-wise operations
- RMSNorm / LayerNorm
- block quantization/dequantization (fp4, int4)
- multi-engine synchronization
and many other operations. We even implemented memory-efficient attention similar to FlashAttention, reaching ~90% FLOPs utilization.

Full benchmarks and the software stack are available on our GitHub: github.com/apex-compute/u… We have a basic compiler written in Python that supports PyTorch tensors directly, making it easy to test and transfer tensors between the accelerator and host in bf16, fp4, and int4 formats.

Our FPGA prototype can already run LLM inference and outperform the NVIDIA Jetson Orin Nano, even on a mid-tier FPGA setup (6.4x lower memory bandwidth, 18% slower clock speed, at 4.5 watts). Check the side-by-side comparison video below.

Our GitHub includes low-level operator implementations, examples for tiled matrix multiplication, operation chaining, tensor parallelism, an attention kernel, and a full Gemma 3 1B model implementation. Many more models (Vision Transformers and VLAs) are coming soon.

Our accelerator IP is AXI-ready for deployment on any AMD (Xilinx) FPGA platform today. Even better, our two-engine prototype runs on an entry-level AMD (Xilinx) FPGA as a PCIe accelerator card. You can purchase it here buy.stripe.com/6oUaEQf6365bgA… for $50 to experiment with our pre-silicon prototype on your desktop PC or Raspberry Pi 5. We will release hardware bitstream updates as the architecture gains new features.

More to come soon! We are expanding our team and looking for compiler engineers and floating-point hardware design engineers. If you're interested, please send me a DM.
29 replies · 39 reposts · 386 likes · 36.7K views
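The "memory-efficient attention similar to FlashAttention" mentioned above rests on the online-softmax trick: process K/V in tiles while maintaining a running max, denominator, and weighted sum, so the full score matrix is never materialized. A minimal NumPy sketch of that general technique (not Apex's implementation — names and tile size are illustrative):

```python
import numpy as np

def tiled_attention(Q, K, V, block=16):
    """Online-softmax attention over K/V tiles (FlashAttention-style).

    The full score matrix S = Q K^T is never built: each tile updates a
    running row-max, running softmax denominator, and running weighted sum,
    so working memory stays O(block) in the sequence dimension.
    """
    n, d = Q.shape
    out = np.zeros_like(Q)
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    denom = np.zeros(n)             # running softmax denominator
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s+block], V[s:s+block]
        S = (Q @ Kb.T) / np.sqrt(d)              # scores for this tile only
        new_max = np.maximum(row_max, S.max(axis=1))
        rescale = np.exp(row_max - new_max)      # correct earlier tiles
        P = np.exp(S - new_max[:, None])
        denom = denom * rescale + P.sum(axis=1)
        out = out * rescale[:, None] + P @ Vb
        row_max = new_max
    return out / denom[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 8)) for _ in range(3))
S = Q @ K.T / np.sqrt(8)
P = np.exp(S - S.max(1, keepdims=True))
ref = (P / P.sum(1, keepdims=True)) @ V          # naive full-matrix attention
print(np.allclose(tiled_attention(Q, K, V), ref))  # True
```

The per-tile rescaling by exp(old_max - new_max) is what makes the streaming result exactly equal to the full softmax, not an approximation.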
Nihar reposted
Jim Keller @jimkxa
.@Tenstorrent does the SRAM LLM decode trick too: 7 GB of SRAM per Galaxy, lots of bandwidth, lots of Galaxies :)
[image]
7 replies · 17 reposts · 195 likes · 17.3K views
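The "SRAM LLM decode trick" works because decode is memory-bandwidth-bound: every generated token streams all the weights once, so throughput is roughly bandwidth divided by weight bytes, and SRAM's much higher bandwidth translates directly into tokens per second. A back-of-the-envelope sketch with hypothetical numbers (not Tenstorrent's actual figures):

```python
def decode_tokens_per_s(weight_gb, bandwidth_gb_s, batch=1):
    """Roofline estimate for LLM decode throughput.

    Each decoded token must stream every weight once (KV-cache traffic
    ignored), so decode is bandwidth-bound:
        tokens/s ~= batch * bandwidth / bytes_of_weights
    """
    return batch * bandwidth_gb_s / weight_gb

# Hypothetical numbers: a 7B-parameter model at 8 bits (~7 GB of weights).
dram = decode_tokens_per_s(7, 800)      # DRAM-class bandwidth, GB/s
sram = decode_tokens_per_s(7, 20000)    # aggregate on-chip SRAM bandwidth
print(f"DRAM: {dram:.0f} tok/s  SRAM: {sram:.0f} tok/s")
# DRAM: 114 tok/s  SRAM: 2857 tok/s
```

This is also why "lots of Galaxies" matters: sharding the weights across many chips multiplies the aggregate SRAM bandwidth applied to each token.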
Nihar reposted
Modular @Modular
What does AI-assisted GPU kernel development actually look like? We used Cursor and Claude to port NVIDIA's CUTLASS Blackwell conv2d to Mojo 🔥 in one session. 90% matmul reuse, ~770 lines, 6.6x faster than cuDNN on B200 GPUs. All kernels in the Modular repo: github.com/modular/modular
6 replies · 28 reposts · 266 likes · 18K views
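The "90% matmul reuse" above reflects the standard implicit-GEMM formulation of convolution: gather each receptive field into a row so the conv becomes a single matmul, and most of an existing matmul kernel carries over. A hedged NumPy sketch of the idea (im2col, stride 1, no padding — illustrative only, not the Mojo kernel):

```python
import numpy as np

def conv2d_as_matmul(x, w):
    """Express conv2d as one matmul (the im2col / implicit-GEMM trick).

    x: input (H, W, Cin); w: filters (KH, KW, Cin, Cout); stride 1, no pad.
    Every receptive field becomes a row, turning the convolution into a
    (Ho*Wo, KH*KW*Cin) x (KH*KW*Cin, Cout) matmul.
    """
    H, W, Cin = x.shape
    KH, KW, _, Cout = w.shape
    Ho, Wo = H - KH + 1, W - KW + 1
    # im2col: one row per output pixel, one column per (kh, kw, cin) tap
    cols = np.empty((Ho * Wo, KH * KW * Cin))
    for i in range(Ho):
        for j in range(Wo):
            cols[i * Wo + j] = x[i:i+KH, j:j+KW, :].ravel()
    out = cols @ w.reshape(KH * KW * Cin, Cout)   # the actual GEMM
    return out.reshape(Ho, Wo, Cout)

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 3))
w = rng.standard_normal((3, 3, 3, 4))
y = conv2d_as_matmul(x, w)
print(y.shape)  # (6, 6, 4)
```

Fast kernels like CUTLASS's form the im2col indexing on the fly inside the matmul loop instead of materializing `cols`, which is what "implicit" means.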
Nihar reposted
will brown @willccbb
@seconds_0 either getting good at evals or getting good at kernels
11 replies · 9 reposts · 298 likes · 11.1K views
'(Robert Smith) @stylewarning
I'm overhearing a FAANG tech meeting about how this 10(?)-year-old product written in C is being transitioned ("modernized") to C++. It started by changing to a C++ compiler, then slowly rewriting to use classes/exceptions, &c. It's been 3 months, and the C++ service keeps failing in prod.
50 replies · 29 reposts · 1.4K likes · 203K views
Nihar reposted
Jim Keller @jimkxa
Super busy 2 months, great progress @Tenstorrent: 56 800-Gig Ethernet port Galaxies shipping
[image]
19 replies · 32 reposts · 395 likes · 205.5K views
Nihar reposted
Mark Saroufim @marksaroufim
LLMs are now superhuman at reward hacking our kernel competitions. Natalia Kokoromyti was #1 on the last problem of the NVFP4 competition for around 10 min before we scrubbed the reward hack. I know of very few humans who can write such a hack. gpumode.com/news/reward-ha…
7 replies · 42 reposts · 430 likes · 90.5K views
Nihar reposted
Phil Eaton @eatonphil
It's surprising (and impressive [for Zig's sake]) to me that Modular acknowledges Zig in a comparison
[image]
3 replies · 5 reposts · 134 likes · 11.6K views
Cory Levy @cory
interning in Silicon Valley this summer? there's a Silicon Valley Intern discord group to help with events, housing, etc. @-reply below (maybe with where you're interning) and I'll send you the link
[image]
78 replies · 4 reposts · 130 likes · 14K views