Patrick Putzky đã retweet

… “Entropy-Coding Meets Quantization”, our latest research @MMerantix
⭐Compress LLMs down to 2 bits/param while retaining 8-bit precision
⭐Data-free, no calibration needed
⭐Compress a 70B model on-the-fly in <10min on a single GPU

English
