Pedro Sarmento
2.3K posts

Pedro Sarmento
@umpedronosapato
AI & Music Data Scientist @moises_ai | prev. @c4dm
Joined June 2014
1.9K Following · 2.8K Followers
Pedro Sarmento retweeted

SCIENCE PAPER DROPPED
big ups @zacknovack 🥳 & the @harmonai_org team
This paper explores an inexpensive method to add custom control (e.g. pitch) to a pretrained audio diffusion model, without retraining the model
arxiv.org/abs/2603.04366
Pedro Sarmento retweeted

Some really great insights here about the differences between masked and uniform-state discrete diffusion.
Both continuous diffusion and uniform-state discrete diffusion for modelling categorical data seem to be making a bit of a comeback recently. Entropy is all you need🙃
Dimitri von Rütte@dvruette
there, I said it. diffusion LLMs are the future! I'll be back in a couple of years to collect my "I told you so" award.
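The masked vs uniform-state distinction comes down to how each forward (noising) process corrupts tokens. Below is a minimal sketch of the two standard corruption kernels in the D3PM-style formulation, assuming a per-token keep probability `alpha_t`. `MASK_ID`, the function names and the toy sequence are hypothetical, chosen only for illustration. A masked position carries no token identity at all, while a uniformly resampled position carries a plausible-looking but possibly wrong token that the denoiser has to learn to correct.

```python
import random

MASK_ID = -1  # hypothetical sentinel id standing in for the [MASK] token


def masked_forward(x0, alpha_t, rng):
    # Absorbing ("masked") corruption: keep each token with probability alpha_t,
    # otherwise replace it with MASK_ID.
    return [tok if rng.random() < alpha_t else MASK_ID for tok in x0]


def uniform_forward(x0, alpha_t, vocab_size, rng):
    # Uniform-state corruption: keep each token with probability alpha_t,
    # otherwise resample it uniformly from the vocabulary
    # (the resample can land back on the original token).
    return [tok if rng.random() < alpha_t else rng.randrange(vocab_size) for tok in x0]


rng = random.Random(0)
x0 = [3, 1, 4, 1, 5, 9, 2, 6]
print(masked_forward(x0, 0.5, rng))       # corrupted sequence; exact output depends on the seed
print(uniform_forward(x0, 0.5, 10, rng))  # corrupted sequence; exact output depends on the seed
```

At `alpha_t = 0` the masked chain collapses to an all-`MASK_ID` sequence, while the uniform chain yields uniformly random tokens.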

@CasebeerJonah @__gzhu__ @zhepeiw03 @NicholasJBryan Really interesting work! Will there be a link where we can listen to some results or check the code?

GenAE: An audio autoencoder engineered for generative modeling.
To appear at ICASSP 2026.
w/ @__gzhu__ @zhepeiw03 @NicholasJBryan
arXiv: arxiv.org/abs/2602.15749
Video: youtu.be/gDIIuLb0cf0


Coming soon to @nonety_pe: draw to engrave the music. Built on top of everyone's favorite drawing tool, @tldraw, and a simple 300k-parameter CNN.
Pedro Sarmento retweeted

Excited to announce our ICASSP 2026 paper "Stemphonic: All-at-once Flexible Multi-stem Music Generation" !
w/ @__gzhu__, @j_p_caceres, @huangcza, and @NicholasJBryan
🔊Demo stemphonic-demo.vercel.app
📰Paper arxiv.org/abs/2602.09891
More details in🧵
Pedro Sarmento retweeted

The ACE-Step 1.5 paper can be confusing.
In this post I share its main ideas: artintech.substack.com/p/ace-step-15-…
1. DIFFUSION MODEL: supports multiple tasks.
2. LANGUAGE MODEL: reprompting & semantic tokens generation.
3. DATA PREPARATION: 27M songs.
4. OPEN WEIGHTS: supports LoRAs.
Pedro Sarmento retweeted

Yes! This ATTM Grand Challenge brings the fair-play & affordability we've been longing for in TTM research: from-scratch training on fixed academic data, prioritizing novel algorithmic or system design. We provide a MeanAudio baseline to get started easily. Join us! 🚀 #ICME2026
Hao-Wen (Herman) Dong 董皓文@hermanhwdong
📢 Happy to announce the ICME 2026 Grand Challenge on Academic Text-to-Music Generation!
- Official launch: Feb 10
- Registration deadline: Mar 20
- Submission deadline: April 23
Co-organizing with @affige_yang @HungyiLee2 @Lonian6 & Fang-Chih Hsieh
ntu-musicailab.github.io/ICME26-ATTM-Gr…
Pedro Sarmento retweeted

New paradigm from Kaiming He's team: Drifting Models!
With this approach, you can generate a perfect image in a single step.
The team trains a "drifting field" that smoothly moves samples toward equilibrium with the real data distribution.
The result? A one-step generator that sets a new SOTA on ImageNet 256x256, beating complex multi-step models.
Pedro Sarmento retweeted

🔊 New Paper Alert — Training-Free Inference-Time Timbre Transfer
Check out our latest #ICASSP paper on timbre transfer in music audio! 🎶
➡️ Diffusion Timbre Transfer Via Mutual Information Guided Inpainting; Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini & George Fazekas
📜 Paper: arxiv.org/pdf/2601.01294
🥁 Audio Demos: anon-audio-demo-25.github.io/audio_demo/
In this work, we rethink timbre transfer as an inference-time editing problem — and show that you don’t need to retrain or fine-tune heavy models to change the instrumental color of a piece while preserving its musical structure.
🎯 What’s New?
Instead of training separate models or adding control modules for each instrument:
✅ We start from a pre-trained latent diffusion model and steer it on the fly using two simple controls:
• Mutual-Information guided noise injection: add noise only in latent channels most informative of timbre.
• Early-step clamping: “lock in” melody and rhythm by restoring structure-dominant channels during denoising.
This lightweight, training-free procedure lets you control timbre without sacrificing the original melody, harmony or rhythm — and works with text or audio conditioning (e.g., CLAP).
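To make the two controls concrete, here is a minimal, hypothetical sketch of the procedure as described above, not the authors' implementation. Assumptions: the latent is a list of C channels × T frames; per-channel mutual-information scores w.r.t. timbre are given (how the paper estimates them is not covered in this post); `denoise_step` is a stand-in callable for one sampler step of a pretrained conditioned diffusion model; noising uses a standard variance-preserving update with a single `alpha_bar`. As a simplification, clamping restores the clean source channels, since in this sketch the structure channels are never noised; a real sampler might re-noise them to the current step instead.

```python
import math
import random


def select_timbre_channels(mi_scores, k):
    # Indices of the k latent channels with the highest (ASSUMED precomputed)
    # mutual-information score with respect to timbre.
    ranked = sorted(range(len(mi_scores)), key=lambda c: mi_scores[c], reverse=True)
    return ranked[:k]


def noise_selected_channels(z, channels, alpha_bar, rng):
    # Variance-preserving forward noising, z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps,
    # applied ONLY to the selected channels; other channels are copied as-is.
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [
        [a * v + s * rng.gauss(0.0, 1.0) for v in row] if c in channels else list(row)
        for c, row in enumerate(z)
    ]


def timbre_edit(z0, mi_scores, k, denoise_step, num_steps, clamp_steps, alpha_bar, rng):
    # denoise_step: HYPOTHETICAL callable (latent, step) -> latent standing in for
    # one step of a pretrained, text- or audio-conditioned diffusion sampler.
    timbre = set(select_timbre_channels(mi_scores, k))
    z = noise_selected_channels(z0, timbre, alpha_bar, rng)
    for step in range(num_steps):
        z = denoise_step(z, step)
        if step < clamp_steps:
            # Early-step clamping: overwrite the structure-dominant (non-timbre)
            # channels with the source latent so melody and rhythm stay fixed.
            z = [row if c in timbre else list(z0[c]) for c, row in enumerate(z)]
    return z


# Toy usage: 4 channels x 3 frames and a stub "denoiser" that just shrinks values.
rng = random.Random(0)
z0 = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [-1.0, 0.0, 1.0], [2.0, 2.0, 2.0]]
mi = [0.1, 0.8, 0.3, 0.9]  # made-up scores: channels 3 and 1 act as "timbre" channels
out = timbre_edit(z0, mi, k=2, denoise_step=lambda z, s: [[0.9 * v for v in row] for row in z],
                  num_steps=10, clamp_steps=10, alpha_bar=0.3, rng=rng)
print(out[0] == z0[0], out[2] == z0[2])  # True True: structure channels restored
```

Setting `clamp_steps` below `num_steps` clamps only the early steps, so later steps are free to refine every channel.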
✨ Why It Matters
🎵 Practical music production tools for re-orchestration and sound design
🛠️ Efficient editing with no added model training
🔍 A framework that could extend beyond timbre to other label-driven audio edits
📌 Compatible with strong diffusion backbones and generative audio models
@SonyCSLMusic @SonyCSLParis
Pedro Sarmento retweeted

Very cool to see this large dataset. Walking is good! Bummer about swimming though! My preferred exercise. I figure variability might be greater with swimming! @OncoAlert
Eric Topol@EricTopol
The @BMJMedicine was supposed to post this paper 2 hours ago but has failed to do so. It is being covered by other means: medicalxpress.com/news/2026-01-p…
Someday the link will be active! dx.doi.org/10.1136/bmjmed…
Pedro Sarmento retweeted

Carlos Hernandez-Olivan, Hendrik Vincent Koops, Hao Hao Tan, Elio Quinton, "Single-step Controllable Music Bandwidth Extension With Flow Matching," arxiv.org/abs/2601.14356

if you're into great music in general, and prog metal in particular, check out the new single by titan @jackjamesloth 🤘
servalprog.bandcamp.com/album/gone

@ArxivSound are we running out of cool acronyms? 🙃
arxiv.org/abs/2506.17815

Xinhao Mei, Gael Le Lan, Haohe Liu, Zhaoheng Ni, Varun Nagaraja, Yang Liu, Yangyang Shi, Vikas Chandra, "SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training," arxiv.org/abs/2601.12594
Pedro Sarmento retweeted

Coditany of Timeness --- In 2017 we made the 1st fully neural synthesized album. Ever. We put it on Bandcamp. 1M+ people listened. 100+ articles were written about it. It was research. It was art. It was anti-human black metal.
Today Bandcamp BANS ai 🚫
reddit.com/r/BandCamp/com…
Pedro Sarmento retweeted

Simon Rouard, Manu Orsini, Axel Roebel, Neil Zeghidour, Alexandre Défossez, "Continuous Audio Language Models," arxiv.org/abs/2509.06926
Pedro Sarmento retweeted

AI music doesn’t have to be ‘slop’. It can be Interactive!
Over the break, I wrote a post defining Interactive AI Music, a term that brings together my favorite artistic projects in AI and music.
artintech.substack.com/p/interactive-…
Pedro Sarmento retweeted

TIL torvalds is into audio effect programming github.com/torvalds/Audio…

