Stefan Lattner

191 posts

@deeplearnmusic

Sony Computer Science Laboratories, Paris

Paris, France · Joined May 2018
198 Following · 1.6K Followers
Stefan Lattner@deeplearnmusic·
🔊 New Paper Alert — Training-Free Inference-Time Timbre Transfer
Check out our latest #ICASSP paper on timbre transfer in music audio! 🎶
➡️ Diffusion Timbre Transfer Via Mutual Information Guided Inpainting, by Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini & George Fazekas
📜 Paper: arxiv.org/pdf/2601.01294
🥁 Audio demos: anon-audio-demo-25.github.io/audio_demo/
In this work, we rethink timbre transfer as an inference-time editing problem — and show that you don't need to retrain or fine-tune heavy models to change the instrumental color of a piece while preserving its musical structure.
🎯 What's New?
Instead of training separate models or adding control modules for each instrument:
✅ We start from a pre-trained latent diffusion model and steer it on the fly using two simple controls:
• Mutual-information-guided noise injection: add noise only in the latent channels most informative of timbre.
• Early-step clamping: "lock in" melody and rhythm by restoring structure-dominant channels during denoising.
This lightweight, training-free procedure lets you control timbre without sacrificing the original melody, harmony, or rhythm — and it works with text or audio conditioning (e.g., CLAP).
✨ Why It Matters
🎵 Practical music production tools for re-orchestration and sound design
🛠️ Efficient editing with no added model training
🔍 A framework that could extend beyond timbre to other label-driven audio edits
📌 Compatible with strong diffusion backbones and generative audio models
@SonyCSLMusic @SonyCSLParis
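The two inference-time controls above can be illustrated with a minimal NumPy toy. This is a conceptual sketch, not the paper's actual implementation: the function name, the MI threshold, and the clamp schedule are all assumptions made for illustration.

```python
import numpy as np

def timbre_transfer_step(latent, structure_ref, timbre_mi, step, n_steps,
                         noise_scale=1.0, mi_threshold=0.5, clamp_frac=0.3,
                         rng=None):
    # Hypothetical sketch of one denoising-loop edit combining the two
    # controls described in the tweet. `timbre_mi` holds a per-channel
    # mutual-information score with the timbre label.
    rng = rng or np.random.default_rng(0)
    timbre_mask = timbre_mi > mi_threshold  # timbre-informative channels
    out = latent.copy()
    # MI-guided noise injection: perturb only the timbre channels.
    noise = rng.normal(scale=noise_scale, size=latent.shape)
    out[timbre_mask] += noise[timbre_mask]
    # Early-step clamping: during the first clamp_frac of the schedule,
    # restore structure-dominant channels from the source latent.
    if step < clamp_frac * n_steps:
        out[~timbre_mask] = structure_ref[~timbre_mask]
    return out
```

The key idea is that both operations act on existing channels of a pre-trained model's latent, so no retraining or extra control module is needed.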
Stefan Lattner reposted
Alain Riou@howariou·
We just released the full training code, as well as our best pretrained model! 🎉
Feel free to use our SOTA checkpoint in your own project with 3 lines of code, or to retrain on your own data using our Lightning+Hydra+Dora codebase ⚡️🐍
🌐 github.com/sony/sampleid
Alain Riou@howariou

Eminem sampled Aerosmith, 50 Cent sampled Nina Simone, everybody sampled Chic... Many great songs sampled existing ones! Detecting this is the topic of our latest paper with @serrjoa at @SonyAI Barcelona 😎 tl;dr: multi-track dataset + few tricks = +18% boost over SOTA 🚀 1/N

Stefan Lattner@deeplearnmusic·
🎶 Internship Opportunity — Sony CSL Paris Music Team
Focus: Artist-centric music AI
Location: Paris, France
Duration: 3-6 months
Start date: January-June 2026 (flexible)

About the Team
The Music Team at Sony Computer Science Laboratories (CSL) Paris is working on the future of AI-assisted music creation. We explore how AI can empower creativity rather than replace it. Our research spans human-AI collaboration, interactive audio generation, live performance tools, and music cognition, all centered on artists' real-world workflows and creative needs.

Internship Mission
We're seeking a creative and technically skilled research intern to contribute to the development of artist-centric AI tools. You will work with our research team to design, implement, and test cutting-edge machine learning algorithms, which will form the basis of tools that enhance artistic control and exploration.

Candidate Profile
Essential:
- Strong Python skills
- Experience with a machine learning framework (PyTorch strongly preferred)
- Comfortable using the command line in Linux environments
- Background in at least one of: audio signal processing / acoustics / music theory / music cognition
- Experience training modern neural networks
Strongly preferred:
- Familiarity with the design of modern transformers
- Familiarity with the major classes of modern generative models (autoregressive, diffusion/flow-based)
- Currently pursuing a Master's or PhD in computer science, AI, EE, music technology, or equivalent
Bonus:
- A strong interest in music, whether you play it, produce it, or listen to a lot of it
- Publications in relevant conferences or journals (ISMIR, ICASSP, AES, ICLR, NeurIPS, ICML)

👉 How to Apply
Send your CV and a short motivation statement to cslmusicteam@sony.com.
Stefan Lattner@deeplearnmusic·
🔥 New @ISMIRConf paper alert: CoDiCodec — a unified neural audio codec producing both continuous embeddings and discrete tokens from the same model.
👉 pip install codicodec
It outperforms existing continuous and discrete codecs in audio quality (FAD, FAD_clap)!
...by the one and only @marco_ppasini 💪
CoDiCodec offers:
- Continuous (~11 Hz) + discrete (2.38 kbps) latents
- FSQ-dropout: improves continuous decoding while keeping discrete tokens useful
- Autoregressive & parallel decoding
@SonyCSLParis @SonyCSLMusic #codec
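The FSQ-dropout bullet above can be unpacked with a generic finite-scalar-quantization toy. This is a conceptual sketch only — it is not CoDiCodec's code or API, and the function names, level count, and dropout scheme are assumptions for illustration.

```python
import numpy as np

def fsq(z, levels=5):
    # Finite Scalar Quantization: bound each dimension with tanh,
    # then round onto `levels` uniformly spaced values in [-1, 1].
    # The grid index per dimension is the discrete token.
    half = (levels - 1) / 2
    return np.round(np.tanh(z) * half) / half

def fsq_dropout(z, p=0.5, rng=None):
    # FSQ-dropout (concept): per call, pass downstream either the
    # continuous latent or its quantized version, so a single decoder
    # learns to handle both continuous embeddings and discrete tokens.
    rng = rng or np.random.default_rng(0)
    return fsq(z) if rng.random() < p else np.tanh(z)
```

Training the decoder on both branches is what lets one model serve continuous and discrete use cases at once.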
Stefan Lattner@deeplearnmusic·
🎶 New ISMIR 2025 paper! "Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces" by Mathias Rose Bjare (JKU), Stefan Lattner (Sony CSL), and Gerhard Widmer (JKU/LIT AI Lab).
We explore how surprisal — the unexpectedness of musical events — can be modeled directly from audio using autoregressive diffusion models (ADMs).
💡 What we did:
- Compared surprisal from diffusion models vs. Generative Infinite-Vocabulary Transformers (GIVT).
- Evaluated across tasks: monophonic pitch surprisal (expectation) & segment boundary detection (structural surprise).
- Tested surprisal at different noise levels in the diffusion process to see how musical features emerge at multiple granularities.
🔥 Key takeaways:
- Diffusion surprisal beats GIVT in modeling pitch expectation & boundary detection.
- Mid-level noise surprisal captures pitch-level expectations while suppressing timbre-related variation.
- Surprisal curves align with human-like musical segmentation, showing potential as proxies for perceptual surprise.
Why it matters: understanding musical surprisal links computational modeling with human perception and cognition — with applications in AI composition, real-time music interaction, and brain-music studies.
📜 Paper: arxiv.org/abs/2508.05306
💻 Code: github.com/SonyCSLParis/a…
#DiffusionModels #Surprisal #MIR @ISMIRConf @SonyCSLParis @SonyCSLMusic
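At its core, the surprisal of a musical event is just the negative log-probability the model assigns to the observed frame. For a Gaussian predictive distribution this has a closed form — a minimal sketch, assuming a diagonal Gaussian (the paper's actual noise-space likelihood under an ADM is more involved):

```python
import numpy as np

def gaussian_surprisal(x, mu, sigma):
    # Surprisal -log p(x) of an observed latent frame x under a
    # Gaussian prediction N(mu, sigma^2), summed over dimensions.
    # Low surprisal = expected event; high surprisal = surprising event.
    var = sigma ** 2
    nll = 0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return float(np.sum(nll))
```

In the paper's setting, `mu` and `sigma` would come from the autoregressive diffusion model's prediction at a chosen noise level; peaks in the resulting per-frame surprisal curve are then compared against segment boundaries.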
Stefan Lattner@deeplearnmusic·
🎉 New paper announcement: "Assessing the Alignment of Audio Representations with Timbre Similarity Ratings" (accepted at ISMIR 2025) 🎶
We asked a simple question: how close are today's audio embeddings to the way humans perceive timbre?
- We evaluated 18 hand-crafted & learned representations (MFCC, Music2Latent, Encodec, JTFS, CLAP, etc.) against 2,614 human similarity ratings spanning 21 classic "timbre-space" datasets.
- We also tested a new sound-matching model and introduced an open-source Python package for easy benchmarking.
👉 Key takeaways:
- Style embeddings (Gatys/Huang) extracted from CLAP and our sound-matching model *achieve the highest agreement* with human judgments, topping all other metrics.
- Music2Latent representations beat most of the others in (non-rank-based) relative distances.
- MFCCs are still competitive, sometimes beating more complex models.
- The results pave the way for perceptually grounded metrics in generative audio, sample retrieval & instrument modeling.
Big thanks to @tiianhk (QMUL) & Charalampos Saitis (QMUL)!
📜 Read the paper: arxiv.org/abs/2507.07764 (repo link inside the paper — contributions welcome!)
#Timbre #AudioAI #MIR #ISMIR2025 #Research @SonyCSLMusic @SonyCSL
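The agreement scores behind these takeaways boil down to rank-correlating embedding distances with human dissimilarity ratings. A minimal Spearman sketch (tie handling omitted; the function names are mine, not the released package's API):

```python
import numpy as np

def rank(a):
    # Simple ranks (no tie handling) for Spearman correlation.
    order = np.argsort(a)
    r = np.empty_like(order)
    r[order] = np.arange(len(a))
    return r

def spearman_alignment(embed_dists, human_dissim):
    # Rank-correlate model embedding distances with human
    # dissimilarity ratings over the same sound pairs: +1 means
    # the embedding orders pairs exactly as listeners do.
    rx = rank(embed_dists) - rank(embed_dists).mean()
    ry = rank(human_dissim) - rank(human_dissim).mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

A rank-based score is the natural choice here because human similarity ratings are ordinal: only the ordering of pairs is comparable across listeners, not the raw scale.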
Stefan Lattner reposted
Stefan Lattner@deeplearnmusic·
🔥 Visit our talks and posters at #ICASSP2025! 👀

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
M. Pasini, S. Lattner, G. Fazekas
Wednesday, April 9 (2:00 pm): Deep generative models I

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
M. Grachten, J. Nistal
Friday, April 11 (11:30 am): Applied Signal Processing Systems

Estimating Musical Surprisal in Audio
M. Bjare, G. Cantisani, S. Lattner, G. Widmer
Wednesday, April 9 (11:30 am): Music analysis II

Hybrid Losses for Hierarchical Embedding Learning
H. Tian, S. Lattner, B. McFee, C. Saitis
Tuesday, April 8 (2:00 pm): Music analysis I

Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
A. Riou, S. Lattner, A. Gagneré, G. Hadjeres, G. Peeters
Tuesday, April 8 (2:00 pm): Music analysis I

@howariou @tiianhk @latentspaces @SonyCSLMusic @SonyCSLParis
Stefan Lattner@deeplearnmusic·
🌟 My keynote at the @c4dm workshop, "Models of Musical Signals: Representation, Learning & Generation", is now on YouTube, giving an overview of developments in self-supervised learning for audio since 2020, low-level representation learning, audio (stem) generation, and much more 🧵👇
youtube.com/watch?v=ixHfBP…
@SonyCSLMusic @SonyCSLParis