1/n While everybody’s been busy packing for #NeurIPS2023, our team at @graphcoreai has been busy with this beauty. Let me introduce:
✨SparQ Attention✨
TL;DR: a plug-and-play inference-time attention block for pre-trained LLMs that slashes the KV cache bandwidth needed 🧵