Rohan
5 posts

@_rohantuli
software engineering @uwaterloo, building @ https://t.co/iCR6O82LKi
Waterloo, Ontario · Joined November 2020
63 Following · 52 Followers

i just beat @GoogleDeepMind's turboquant
introducing Shard. 10x KV cache compression on Llama-3.1-8B. zero quality loss
- 10x @ 8K context, 11.2x @ 32K
- NIAH recall 1.000 across 4K-32K
- LongBench Δ ≈ 0 vs FP16
turboquant tops out at 4-6x at the same quality. we doubled it.
read more: krishgarg.com/shard
@kirrithan
Rohan reposted

i built a free way to index and search any docs locally (cli or mcp) with a memory layer
runs entirely on your machine via @ollama