
🥳 Excited to share our latest work: "Diff-A-Riff"! 🥁 A latent diffusion model that generates instrumental accompaniments for any musical input, tailored for music producers. It's faster, lighter, and delivers superior audio quality.

Controllable via text or audio references. 48 kHz sample rate, (pseudo-)stereo, ~3 GB of memory, and it generates 90 seconds of music in 6 seconds. Trained on a single GPU.

📜 arxiv.org/pdf/2406.08384
🎶 sonycslparis.github.io/diffariff-comp…

🎸 "Diff-A-Riff" adapts to any musical input, following the artist's unique style.
🎛️ Optional controls: text prompts, audio references, an interpolation slider, pseudo-stereo width, and loop intensity.
🎚️ It produces state-of-the-art audio quality, indistinguishable from real data by human raters, and runs at unprecedented speed.
🧠 "Diff-A-Riff" is smaller and more efficient than previous models thanks to its Consistency Autoencoder, making it accessible and practical for a wide range of applications.

Big shoutout to my outer space colleagues: Javier Nistal, the Machine in "machine learning" 🚄, Marco Pasini, the neural net whisperer 🤫, Cyran Aouameur, the troubleshootah 🛠️, Maarten Grachten, aka MaartenGPT 🤖.

#Teamwork #AI #MusicTech #Innovation @latentspaces @marco_ppasini @cyranaouameur @SonyCSLMusic
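For the curious, here's what those speed numbers work out to — a quick back-of-the-envelope check using only the figures quoted above (the variable names are mine, not from the paper):

```python
# Back-of-the-envelope check of the quoted generation speed.
# All numbers come from the announcement; this is just arithmetic,
# not the model itself.
SAMPLE_RATE_HZ = 48_000   # 48 kHz output, as stated
GEN_TIME_S = 6            # seconds of compute per clip
AUDIO_LEN_S = 90          # seconds of audio produced per clip

realtime_factor = AUDIO_LEN_S / GEN_TIME_S
samples_per_clip = SAMPLE_RATE_HZ * AUDIO_LEN_S

print(f"~{realtime_factor:.0f}x faster than real time")       # ~15x
print(f"{samples_per_clip:,} samples per 90 s (mono) clip")   # 4,320,000
```

In other words, at roughly 15× real time, the model synthesizes over four million audio samples per channel in those 6 seconds.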










