Shreyas Havaldar
301 posts

@_toolazyto_
Presidential Fellow PhD Student @Columbia | @GoogleDeepMind | @IITHyderabad CS '22 | Causality & LLMs | Somewhere, something incredible is waiting to be known

Please share widely: Columbia's student-run Computer Science PhD Pre-Submission Application Review (PAR) Program is back in full swing! If you are interested in having current CS PhD students review your personal statement, please apply by November 10!! cs.columbia.edu/cscu-phd-par-p…

Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning, control and creativity. A quick dive into Gemini 2.5 Flash’s capabilities 🧵

🚀 Fast and accurate speculative decoding for long context?

🔎 Problem:
🔹 Standard speculative decoding struggles with long-context generation, because current draft models are weak at long context.
🔹 Finding the right draft model is tricky, since compatibility varies across models.

💡 Thoughts:
🔹 Why not use the target model itself as the draft, and then use approximations like quantization to make it faster?
🔹 Quantization keeps the draft better aligned with the target, which clearly improves the acceptance ratio.
🔹 No more tedious searching for a draft model.

⚠️ Challenge:
🔹 With the quantized target model as the draft, we would need to store a separate copy of the KV cache for the quantized model, which is very memory-intensive for large models.

🔑 Solution:
🔹 We propose a Hierarchical KV Cache for the quantized KV, so no separate KV storage is needed.
🔹 Bit-sharing between the target and draft models gives an equivalent representation with minimal overhead.

⚡ Results:
🔹 2.5× end-to-end generation speedup 🔥
🔹 2.88× kernel-level efficiency
🔹 >90% acceptance rate between the target and draft models
🔹 1.3× memory reduction

Paper: arxiv.org/abs/2502.10424
Code: github.com/SqueezeAILab/Q…

Joint work with: @HaochengXiUCB @adityastomar_ @coleman_hooper1 @sehoonkim418 @mchorton1991 (@Apple) @MahyarNajibi (@Apple) Michael Mahoney @KurtKeutzer @amir__gholami

🧵 [1/6]
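
The bit-sharing idea in the Solution step can be illustrated with a small sketch. This is not the QuantSpec implementation: it assumes a simple asymmetric min/max INT8 quantization per group of KV values, where the upper 4 bits of each code act as the draft's 4-bit view and the lower 4 bits are a residual the target adds back. The function names (`quantize_int8`, `split_nibbles`, `dequant_target`, `dequant_draft`) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch: one INT8-coded KV group shared by a 4-bit draft view and an 8-bit target view.
# Assumption: asymmetric min/max quantization per group (not necessarily the paper's exact scheme).


def quantize_int8(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 255]; return (codes, scale, zero_point)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [min(255, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def split_nibbles(codes: List[int]) -> Tuple[List[int], List[int]]:
    """Split each 8-bit code into an upper 4-bit part (draft) and a lower 4-bit residual."""
    upper = [c >> 4 for c in codes]
    lower = [c & 0xF for c in codes]
    return upper, lower


def dequant_target(upper: List[int], lower: List[int], scale: float, zero: float) -> List[float]:
    """Target view: recombine both nibbles into the full 8-bit code, then dequantize."""
    return [((u << 4) | l) * scale + zero for u, l in zip(upper, lower)]


def dequant_draft(upper: List[int], scale: float, zero: float) -> List[float]:
    """Draft view: read only the upper nibble and use the midpoint of its 16-code bucket."""
    return [((u << 4) + 7.5) * scale + zero for u in upper]


if __name__ == "__main__":
    kv = [0.1 * i - 1.0 for i in range(20)]  # toy stand-in for one group of K or V values
    codes, scale, zero = quantize_int8(kv)
    upper, lower = split_nibbles(codes)
    # Both views read the same stored nibbles: 8 bits per value in total,
    # instead of 8 bits (target copy) + 4 bits (separate draft copy).
    target_err = max(abs(a - b) for a, b in zip(dequant_target(upper, lower, scale, zero), kv))
    draft_err = max(abs(a - b) for a, b in zip(dequant_draft(upper, scale, zero), kv))
    print(f"target max error: {target_err:.4f}, draft max error: {draft_err:.4f}")
```

In this sketch the target's reconstruction error is at most half a quantization step and the draft's at most 8 steps, while both read the same stored bits, so no second KV copy is kept for the draft.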