shubham
3.3K posts

@bluequbit
building custom AI tools and multi-agent systems






I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory.

5 custom cuTile CUDA kernels ft:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA loads

Try it out: devtechjr.github.io/turboquant_cut…

s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry
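For anyone curious what "online softmax" means in the post above, here is a minimal pure-Python sketch of the general technique. It is not the author's cuTile kernel: the function names, the scalar values, and the `block_size` parameter are all hypothetical choices made for illustration. The idea is that scores arrive one block at a time, and a running max and running sum are rescaled as each block comes in, so the full score vector never has to be held in memory at once.

```python
import math

def online_softmax_attention(scores, values, block_size=2):
    """Softmax-weighted average of `values`, computed one block at a time.

    Illustrative only: real kernels work on tiles of vectors, not scalars.
    Assumes `scores` is non-empty and the same length as `values`.
    """
    running_max = float("-inf")
    running_sum = 0.0  # sum of exp(score - running_max) seen so far
    acc = 0.0          # sum of exp(score - running_max) * value seen so far
    for start in range(0, len(scores), block_size):
        block_s = scores[start:start + block_size]
        block_v = values[start:start + block_size]
        new_max = max(running_max, max(block_s))
        # Rescale earlier partial results to the new max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc *= scale
        for s, v in zip(block_s, block_v):
            w = math.exp(s - new_max)
            running_sum += w
            acc += w * v
        running_max = new_max
    return acc / running_sum

def reference_attention(scores, values):
    """Ordinary two-pass softmax-weighted average, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

Both functions should agree up to floating-point rounding regardless of `block_size`, which is why a fused attention kernel can stream over the KV cache in chunks.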





Introducing Steer AI. We made an AI that can't stop thinking about any concept you choose, by steering a model's internal representations at inference time. Ask it anything, and watch it bend reality around that concept. Available for one week only.






Introducing Lightreel - the first AI that doomscrolls for you

Analyzes 150,000+ TikTok UGC videos to answer any marketing question:
“What hooks are my competitors using?”
“Why did this video flop?”
“Find me 10 NYU creators with 1500 followers”

It helped me get 60 million views in 3 days

Try it out today!

(PS: Running UGC for free for one random person who retweets this. I'll make your app go viral)












