Martin Genzel

126 posts

@MartinGenzel

Staff Machine Learning Researcher @MMerantix | Applied mathematician | Interested in Deep Learning, LLMs, Model Compression & Efficiency

Berlin, Germany · Joined June 2021
614 Following · 273 Followers
Pinned Tweet
Martin Genzel @MartinGenzel
The LLM compression community is obsessed with quantizing to ever lower bit-widths. But below 4 bits, returns start diminishing, requiring many tricks & hacks to prevent collapse. What if 8-bit + smarter compression beats the low-bit race? 📢 Thrilled to introduce EntQuant …
1
0
5
89
Martin Genzel @MartinGenzel
So what’s the price? Entropy coding adds some decode overhead at inference time. Inspired by the recent DFloat11 paper, we integrate a GPU-based ANS decoder into the forward pass, decoding weights on-the-fly, leading to a modest overhead compared to Marlin FP8 and BF16.
1
0
2
71
Martin Genzel retweeted
Dan Alistarh @DAlistarh
We're releasing the DASLab GGUF Quantization Toolkit! 🚀 First open-source toolkit bringing GPTQ + EvoPress to @ggerganov's GGUF format, enabling heterogeneous quantization based on importance. Result: Better models at the same file size. [1/5]
4
50
267
66.4K
Martin Genzel @MartinGenzel
📢 Excited to share our latest research @MMerantix on Any Compression of Foundation Models. We all know how intuitive and seamless image compression is: use a slider to specify your target size and get an instant preview. Our quest: Can compressing an LLM be just as easy? 🧵👇
1
3
9
850