Stefan Lattner

191 posts

@deeplearnmusic

Sony Computer Science Laboratories, Paris

Paris, France · Joined May 2018
198 Following · 1.6K Followers
Stefan Lattner@deeplearnmusic·
🔊 New Paper Alert — Training-Free Inference-Time Timbre Transfer
Check out our latest #ICASSP paper on timbre transfer in music audio! 🎶
➡️ Diffusion Timbre Transfer Via Mutual Information Guided Inpainting, by Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini & George Fazekas
📜 Paper: arxiv.org/pdf/2601.01294
🥁 Audio demos: anon-audio-demo-25.github.io/audio_demo/
In this work, we rethink timbre transfer as an inference-time editing problem — and show that you don't need to retrain or fine-tune heavy models to change the instrumental color of a piece while preserving its musical structure.
🎯 What's New?
Instead of training separate models or adding control modules for each instrument:
✅ We start from a pre-trained latent diffusion model and steer it on the fly using two simple controls:
• Mutual-information-guided noise injection: add noise only in the latent channels most informative of timbre.
• Early-step clamping: "lock in" melody and rhythm by restoring structure-dominant channels during denoising.
This lightweight, training-free procedure lets you control timbre without sacrificing the original melody, harmony, or rhythm — and it works with text or audio conditioning (e.g., CLAP).
✨ Why It Matters
🎵 Practical music production tools for re-orchestration and sound design
🛠️ Efficient editing with no added model training
🔍 A framework that could extend beyond timbre to other label-driven audio edits
📌 Compatible with strong diffusion backbones and generative audio models
@SonyCSLMusic @SonyCSLParis
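The two inference-time controls above can be illustrated with a minimal NumPy toy. This is a conceptual sketch, not the paper's actual implementation: the function name, the MI threshold, and the clamp schedule are all assumptions made for illustration.

```python
import numpy as np

def timbre_transfer_step(latent, structure_ref, timbre_mi, step, n_steps,
                         noise_scale=1.0, mi_threshold=0.5, clamp_frac=0.3,
                         rng=None):
    # Hypothetical sketch of one denoising-loop edit combining the two
    # controls described in the tweet. `timbre_mi` holds a per-channel
    # mutual-information score with the timbre label.
    rng = rng or np.random.default_rng(0)
    timbre_mask = timbre_mi > mi_threshold  # timbre-informative channels
    out = latent.copy()
    # MI-guided noise injection: perturb only the timbre channels.
    noise = rng.normal(scale=noise_scale, size=latent.shape)
    out[timbre_mask] += noise[timbre_mask]
    # Early-step clamping: during the first clamp_frac of the schedule,
    # restore structure-dominant channels from the source latent.
    if step < clamp_frac * n_steps:
        out[~timbre_mask] = structure_ref[~timbre_mask]
    return out
```

The key idea is that both operations act on existing channels of a pre-trained model's latent, so no retraining or extra control module is needed.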
Stefan Lattner reposted
Alain Riou@howariou·
We just released the full training code, as well as our best pretrained model! 🎉
Feel free to use our SOTA checkpoint in your own project with 3 lines of code, or to retrain on your own data using our Lightning+Hydra+Dora codebase ⚡️🐍
🌐 github.com/sony/sampleid
Alain Riou@howariou

Eminem sampled Aerosmith, 50 Cent sampled Nina Simone, everybody sampled Chic... Many great songs sampled existing ones! Detecting this is the topic of our latest paper with @serrjoa at @SonyAI Barcelona 😎 tl;dr: multi-track dataset + few tricks = +18% boost over SOTA 🚀 1/N

Stefan Lattner@deeplearnmusic·
🎶 Internship Opportunity — Sony CSL Paris Music Team
Focus: Artist-centric music AI
Location: Paris, France
Duration: 3-6 months
Start date: January-June 2026 (flexible)

About the Team
The Music Team at Sony Computer Science Laboratories (CSL) Paris is working on the future of AI-assisted music creation. We explore how AI can empower creativity rather than replace it. Our research spans human-AI collaboration, interactive audio generation, live performance tools, and music cognition, all centered on artists' real-world workflows and creative needs.

Internship Mission
We're seeking a creative and technically skilled research intern to contribute to the development of artist-centric AI tools. You will work with our research team to design, implement, and test cutting-edge machine learning algorithms, which will form the basis of tools that enhance artistic control and exploration.

Candidate Profile
Essential:
- Strong Python skills
- Experience with a machine learning framework (PyTorch strongly preferred)
- Comfortable using the command line in Linux environments
- Background in at least one of: audio signal processing / acoustics / music theory / music cognition
- Experience training modern neural networks
Strongly preferred:
- Familiarity with the design of modern transformers
- Familiarity with the major classes of modern generative models (autoregressive, diffusion/flow-based)
- Currently pursuing a Master's or PhD in computer science, AI, EE, music technology, or equivalent
Bonus:
- A strong interest in music, whether you play it, produce it, or listen to a lot of it
- Publications in relevant conferences or journals (ISMIR, ICASSP, AES, ICLR, NeurIPS, ICML)

👉 How to Apply
Send your CV and a short motivation statement to cslmusicteam@sony.com.
Stefan Lattner@deeplearnmusic·
🔥 New @ISMIRConf paper alert: CoDiCodec — a unified neural audio codec producing both continuous embeddings and discrete tokens from the same model.
👉 pip install codicodec
It outperforms existing continuous and discrete codecs in audio quality (FAD, FAD_clap)!
...by the one and only @marco_ppasini 💪
CoDiCodec offers:
- Continuous (~11 Hz) + discrete (2.38 kbps) latents
- FSQ-dropout: improves continuous decoding while keeping discrete tokens useful
- Autoregressive & parallel decoding
@SonyCSLParis @SonyCSLMusic #codec
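The FSQ-dropout bullet above can be unpacked with a generic finite-scalar-quantization toy. This is a conceptual sketch only — it is not CoDiCodec's code or API, and the function names, level count, and dropout scheme are assumptions for illustration.

```python
import numpy as np

def fsq(z, levels=5):
    # Finite Scalar Quantization: bound each dimension with tanh,
    # then round onto `levels` uniformly spaced values in [-1, 1].
    # The grid index per dimension is the discrete token.
    half = (levels - 1) / 2
    return np.round(np.tanh(z) * half) / half

def fsq_dropout(z, p=0.5, rng=None):
    # FSQ-dropout (concept): per call, pass downstream either the
    # continuous latent or its quantized version, so a single decoder
    # learns to handle both continuous embeddings and discrete tokens.
    rng = rng or np.random.default_rng(0)
    return fsq(z) if rng.random() < p else np.tanh(z)
```

Training the decoder on both branches is what lets one model serve continuous and discrete use cases at once.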
Stefan Lattner@deeplearnmusic·
🎶 New ISMIR 2025 paper! "Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces" by Mathias Rose Bjare (JKU), Stefan Lattner (Sony CSL), and Gerhard Widmer (JKU/LIT AI Lab).
We explore how surprisal — the unexpectedness of musical events — can be modeled directly from audio using autoregressive diffusion models (ADMs).
💡 What we did:
- Compared surprisal from diffusion models vs. Generative Infinite-Vocabulary Transformers (GIVT).
- Evaluated across tasks: monophonic pitch surprisal (expectation) & segment boundary detection (structural surprise).
- Tested surprisal at different noise levels in the diffusion process to see how musical features emerge at multiple granularities.
🔥 Key takeaways:
- Diffusion surprisal beats GIVT in modeling pitch expectation & boundary detection.
- Mid-level noise surprisal captures pitch-level expectations while suppressing timbre-related variation.
- Surprisal curves align with human-like musical segmentation, showing potential as proxies for perceptual surprise.
Why it matters: understanding musical surprisal links computational modeling with human perception and cognition — with applications in AI composition, real-time music interaction, and brain-music studies.
📜 Paper: arxiv.org/abs/2508.05306
💻 Code: github.com/SonyCSLParis/a…
#DiffusionModels #Surprisal #MIR @ISMIRConf @SonyCSLParis @SonyCSLMusic
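At its core, the surprisal of a musical event is just the negative log-probability the model assigns to the observed frame. For a Gaussian predictive distribution this has a closed form — a minimal sketch, assuming a diagonal Gaussian (the paper's actual noise-space likelihood under an ADM is more involved):

```python
import numpy as np

def gaussian_surprisal(x, mu, sigma):
    # Surprisal -log p(x) of an observed latent frame x under a
    # Gaussian prediction N(mu, sigma^2), summed over dimensions.
    # Low surprisal = expected event; high surprisal = surprising event.
    var = sigma ** 2
    nll = 0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return float(np.sum(nll))
```

In the paper's setting, `mu` and `sigma` would come from the autoregressive diffusion model's prediction at a chosen noise level; peaks in the resulting per-frame surprisal curve are then compared against segment boundaries.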
Stefan Lattner@deeplearnmusic·
🎉 New paper announcement: "Assessing the Alignment of Audio Representations with Timbre Similarity Ratings" (accepted at ISMIR 2025) 🎶
We asked a simple question: how close are today's audio embeddings to the way humans perceive timbre?
- We evaluated 18 hand-crafted & learned representations (MFCC, Music2Latent, Encodec, JTFS, CLAP, etc.) against 2,614 human similarity ratings spanning 21 classic "timbre-space" datasets.
- We also tested a new sound-matching model and introduced an open-source Python package for easy benchmarking.
👉 Key takeaways:
- Style embeddings (Gatys/Huang) extracted from CLAP and our sound-matching model *achieve the highest agreement* with human judgments, topping all other metrics.
- Music2Latent representations beat most of the others in (non-rank-based) relative distances.
- MFCCs are still competitive, sometimes beating more complex models.
- The results pave the way for perceptually grounded metrics in generative audio, sample retrieval & instrument modeling.
Big thanks to @tiianhk (QMUL) & Charalampos Saitis (QMUL)!
📜 Read the paper: arxiv.org/abs/2507.07764 (repo link inside the paper — contributions welcome!)
#Timbre #AudioAI #MIR #ISMIR2025 #Research @SonyCSLMusic @SonyCSL
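The agreement scores behind these takeaways boil down to rank-correlating embedding distances with human dissimilarity ratings. A minimal Spearman sketch (tie handling omitted; the function names are mine, not the released package's API):

```python
import numpy as np

def rank(a):
    # Simple ranks (no tie handling) for Spearman correlation.
    order = np.argsort(a)
    r = np.empty_like(order)
    r[order] = np.arange(len(a))
    return r

def spearman_alignment(embed_dists, human_dissim):
    # Rank-correlate model embedding distances with human
    # dissimilarity ratings over the same sound pairs: +1 means
    # the embedding orders pairs exactly as listeners do.
    rx = rank(embed_dists) - rank(embed_dists).mean()
    ry = rank(human_dissim) - rank(human_dissim).mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

A rank-based score is the natural choice here because human similarity ratings are ordinal: only the ordering of pairs is comparable across listeners, not the raw scale.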
Stefan Lattner reposted
Stefan Lattner@deeplearnmusic·
🔥 Visit our talks and posters at #ICASSP2025! 👀

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Wednesday, April 9 (2:00 pm): Deep generative models I

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 (11:30 am): Applied Signal Processing Systems

Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner, G. Widmer
Wednesday, April 9 (11:30 am): Music analysis II

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 (2:00 pm): Music analysis I

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Tuesday, April 8 (2:00 pm): Music analysis I

@howariou @tiianhk @latentspaces @SonyCSLMusic @SonyCSLParis
Stefan Lattner@deeplearnmusic·
🌟 My keynote at the @c4dm workshop, "Models of Musical Signals: Representation, Learning & Generation", is now on YouTube, giving an overview of developments in self-supervised learning for audio since 2020, low-level representation learning, audio (stem) generation, and much more 🧵👇
youtube.com/watch?v=ixHfBP…
@SonyCSLMusic @SonyCSLParis