@krishgarg and I built Shard, beating @GoogleDeepMind's TurboQuant on KV cache compression.
10x compression on Llama-3.1-8B-Instruct at 8K.
NIAH recall: 1.000.
Keys: RoPE-aware PCA + int4 fused attention.
Values: Hadamard + VQ.
Same needles. Less cache.
krishgarg.com/shard
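For a sense of scale, here is a rough back-of-the-envelope sketch of what 10x means at 8K. It assumes the publicly documented Llama-3.1-8B shape (32 layers, 8 KV heads, head dim 128) and an FP16 baseline; the function name is made up for illustration and this is not Shard's code.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Uncompressed KV cache size in bytes for one sequence.

    Defaults are an assumption: the published Llama-3.1-8B config with FP16 storage.
    """
    # Leading 2 = one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


fp16_8k = kv_cache_bytes(8192)   # FP16 baseline at 8K context
compressed_8k = fp16_8k / 10     # the 10x figure quoted in the post

print(f"FP16 cache @ 8K: {fp16_8k / 2**20:.1f} MiB")
print(f"10x compressed:  {compressed_8k / 2**20:.1f} MiB")
```

Under those assumptions the baseline works out to 128 KiB per token, so about 1 GiB per 8K sequence, and roughly 102 MiB after 10x compression.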
i just beat @GoogleDeepMind's turboquant
introducing Shard. 10x KV cache compression on Llama-3.1-8B. zero quality loss
- 10x @ 8K context, 11.2x @ 32K
- NIAH recall 1.000 across 4K-32K
- LongBench Δ ≈ 0 vs FP16
turboquant tops out at 4-6x at the same quality. we doubled it.
read more: krishgarg.com/shard
@kirrithan
Just finished @ycombinator demo day. It has truly been the most productive time of our lives. I've grown @GolpoAI with @KarShreyas to levels I could not even have imagined.
One year ago, I was sitting in my final high school class, so happy that my lifelong dream was finally coming true. I was headed to Stanford University for my four years of undergrad, followed by a master’s and PhD. It was the plan I had held onto since I was six. I couldn’t wait to live it.
But… life had other plans.
Fast forward a year, and I’ve dropped out of Stanford to build my own company, Golpo (YC S25), with my brother Shreyas Kar, who also dropped out of Stanford University (a bit brutal for our parents haha).
If you had told my high school self, or even me six months ago, that I’d walk away from the thing I’d dreamed of for over a decade, I would’ve laughed and said no way.
But some problems are too important to ignore.
I’ve loved videos since I was a kid. They were my go-to way to explain anything. Even if it took hours, it was worth it.
So when AI video platforms like Sora and Veo 3 came out, I was excited: finally, a faster way to turn ideas into clear, engaging videos.
But the industry took a very different path.
Instead of using AI video to make communication easier and more effective, the industry started chasing spectacle. The focus shifted toward making things look “impressive”, pouring in more and more compute to generate flashy, cinematic content.
But in doing so, the core purpose of video got lost. The outputs became expensive to produce, difficult to scale, and often felt shallow or disconnected from real use cases.
Don’t get me wrong, I love a cinematic video of a dancing cat. But those aren’t tools for learning, explaining, or sharing real ideas.
We’ve drifted away from video as a means of understanding and started treating it like a performance, when what we really need is clarity, not just wow.
That’s why I let go of the dream and joined Y Combinator to build Golpo (YC S25), with my brother Shreyas Kar, to reimagine AI video generation.
Because if we don’t build the version of AI video we want to see, one that helps people actually understand things, not just be impressed, we risk letting it drift too far in the wrong direction.
san francisco needs more fun
so i’m hosting the first ever manhunt/tag next sunday (july 27)
pull up for an amazing time
comment or dm me for the invite link!!