
⚡🔧 PyTorch inference optimization just got a lot simpler.

Introducing AITune — NVIDIA's new library that automatically finds the fastest inference backend for any PyTorch model. It covers TensorRT, Torch Inductor, TorchAO, and more, benchmarks all of them on your model and hardware, and picks the winner. No guessing, no manual tuning.

The production path (ahead-of-time): AITune profiles all backends, validates correctness automatically, and serializes the best one as an .ait artifact — compile once, zero warmup on every redeploy. That's something torch.compile alone doesn't give you. Pipelines are also supported — each submodule gets tuned independently.

The fast path (just-in-time): set an environment variable and run your script unchanged. No code changes, no setup — AITune auto-discovers modules and optimizes them. Good for quick exploration before committing to AOT.

Not competing with vLLM or TRT-LLM — it fills the gap for everything else: diffusion, CV, speech, embeddings. Works on any PyTorch model.

Check it out: github.com/ai-dynamo/aitu…
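For intuition, the core benchmark-and-pick idea can be sketched in plain Python. Everything here is a hypothetical stand-in, not AITune's actual API: the backend functions mimic compiled variants of the same model, and `pick_fastest` plays the role of the tuner.

```python
import time

# Hypothetical stand-ins for inference backends. In reality AITune
# benchmarks real backends (TensorRT, Torch Inductor, TorchAO, ...)
# on your model and hardware.
def backend_slow(x):
    time.sleep(0.01)  # simulate a slower backend
    return x * 2

def backend_fast(x):
    return x * 2

def pick_fastest(backends, sample, reference, repeats=5):
    """Validate each candidate against a reference output, time it,
    and return the name of the fastest correct backend."""
    timings = {}
    for name, fn in backends.items():
        if fn(sample) != reference:  # correctness check before timing
            continue
        start = time.perf_counter()
        for _ in range(repeats):
            fn(sample)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)

backends = {"slow": backend_slow, "fast": backend_fast}
print(pick_fastest(backends, sample=3, reference=6))  # → fast
```

The real library then serializes the winner as an .ait artifact so the search and warmup cost is paid once, not on every redeploy.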




















