

Did some half-baked experiments with GPU power limits to see how they affect inference performance on Minimax M2.5.

TL;DR:
- The unconstrained 350W/GPU limit on 6x RTX 3090 gave the best performance and, perhaps counterintuitively, was also the most efficient
- Minimax doesn't use all the power I give it. I attribute that to MoE requiring fewer operations per token, but idk
- Nerfing your system in the name of a lower power bill might not actually help you

Blog: llmgarage.ai/power-limit-to…
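For anyone who wants to poke at this themselves: the post doesn't show the exact commands, but per-GPU power caps are normally set with nvidia-smi, so treat the following as a sketch (the 350 W and 275 W values and the 6-GPU layout are just illustrative, matching the box described above):

```shell
# Enable persistence mode so the configured limit survives between CUDA contexts
sudo nvidia-smi -pm 1

# Cap all GPUs at 350 W (the card's stock limit on an RTX 3090)
sudo nvidia-smi -pl 350

# Or cap a single GPU, e.g. nerf GPU 0 to 275 W for an A/B comparison
sudo nvidia-smi -i 0 -pl 275

# Watch actual draw vs. the configured limit while inference runs,
# refreshing every second -- this is where you'd see the model not
# using all the power you give it
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv -l 1
```

Comparing power.draw against power.limit during generation is the quick way to check the second bullet: if draw sits well below the cap, lowering the cap mostly just clips the bursts rather than saving steady-state watts.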





















