Lucien retweeted

Speed up your Gemma 4 workflows by up to 3x with Multi-Token Prediction (MTP) drafters.
Standard LLM inference is fundamentally memory-bandwidth bound, creating a latency bottleneck as billions of parameters travel from VRAM just to generate a single token. We're working to ease this bottleneck with MTP drafters for @googlegemma 4.
