
Compress large language models faster with LLM Compressor v0.10. Distributed GPTQ delivers multi-GPU support, compressed-tensors offloading handles models that exceed memory capacity, and GPTQ FP4 microscale support brings advanced quantization.
red.ht/47IlaeH


