Pedro Sarmento

2.3K posts

@umpedronosapato

AI & Music Data Scientist @moises_ai | prev. @c4dm

Joined June 2014

1.9K Following · 2.8K Followers

Pedro Sarmento retweeted

Paul McCabe@mccabep·4d

My thanks to Alex Ruger and TapeOp for a really fun interview. I enjoyed getting to talk about Roland Future Design Lab and my career with Roland, plus a trip down my own memory lane! Behind The Gear with Roland's Paul McCabe tapeop.com/interviews/173… #tapeop via @tapeopmag

120

Pedro Sarmento retweeted

dadabots@dadabots·4d

yeahh open source is fun 😜🤟

dadabots@dadabots

🥳 Announcing Stable Audio 3 🍕
🏆 fastest music models ever
💻 runs on MacBookPro M-series
🧪 break it plz
🧠 LoRA finetune in < 1h
📷 Sm = faster, Medium = qualityer
⚡ 59x realtime on M5 Pro
One-liner fast install: curl -LsSf dadabots.com/_/sa3-mac | bash

4.2K

Pedro Sarmento retweeted

Dorien Herremans@dorienherremans·16 Apr

Fresh off teaching my graduate course on Multimodal Generative AI. I open-sourced the entire thing on GitHub: lectures + labs on vision/audio models, multimodal alignment, RAG, and agentic systems.🔥 Free for anyone building in this space. Link in comments #GenerativeAI

329

Pedro Sarmento retweeted

Nicholas Boffi@nmboffi·7 Apr

🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation.

read about it in the blog: one-step-lm.github.io/blog/
full details in the paper: arxiv.org/abs/2602.16813

we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed!

v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion, but without the factorization error that kills discrete methods at few steps.

beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale.

we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time, and it works even at 1-2 steps with our flow map model. simple, well-understood techniques from continuous flows just work incredibly well in practice for language.

we’re extremely excited about the future of this class of models. stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀

476

74.4K

Pedro Sarmento retweeted

hypebot@hypebot·20 Mar

Pro Musicians, Not Amateurs, Are Leading the AI Music Revolution, according to a new Water & Music x Moises Study bit.ly/4lFFvqX #aimusic #aimusicproduction #ai #musicians #musictech

258

Pedro Sarmento@umpedronosapato·21 Mar

a vague sense of prog, AI and friendship youtu.be/sumZL0Mx1-I?si…

YouTube

151

Pedro Sarmento retweeted

Chris Donahue@chrisdonahuey·19 Mar

Vibe coding is cool but have you tried vibe patching? Pure Vibes = Pure Data + MCP. Describe your sound, watch the patch appear 🌊 🔊👇

106

6.7K

Pedro Sarmento@umpedronosapato·5 Mar

music - AI - ethics 🎸

billboard@billboard

Charlie Puth Named Chief Music Officer at Ethical AI Platform Moises billboard.com/pro/charlie-pu…

305

Pedro Sarmento retweeted

dadabots@dadabots·5 Mar

SCIENCE PAPER DROPPED big ups @zacknovack 🥳 & the @harmonai_org team This paper explores an inexpensive method to add custom control (e.g. pitch) to a pretrained audio diffusion model, without retraining the model arxiv.org/abs/2603.04366

1.2K

Pedro Sarmento retweeted

Sander Dieleman@sedielem·2 Mar

Some really great insights here about the differences between masked and uniform-state discrete diffusion. Both continuous diffusion and uniform-state discrete diffusion for modelling categorical data seem to be making a bit of a comeback recently. Entropy is all you need🙃

Dimitri von Rütte@dvruette

there, I said it. diffusion LLMs are the future! I'll be back in a couple of years to collect my "I told you so" award.

220

26.5K

Pedro Sarmento@umpedronosapato·24 Feb

@CasebeerJonah @__gzhu__ @zhepeiw03 @NicholasJBryan Really interesting work! Will there be a link where we can listen to some results or check the code?

228

Jonah Casebeer@CasebeerJonah·24 Feb

GenAE: An audio autoencoder engineered for generative modeling. To appear at ICASSP 2026. w/ @__gzhu__ @zhepeiw03 @NicholasJBryan arXiv: arxiv.org/abs/2602.15749 Video: youtu.be/gDIIuLb0cf0

YouTube

12.3K

Pedro Sarmento@umpedronosapato·13 Feb

@Ashvala @nonety_pe @tldraw wow this is super cool!

170

Ash@Ashvala·13 Feb

Coming soon to @nonety_pe: draw to engrave the music. Built on top of everyone's favorite drawing tool, @tldraw, and a simple 300k parameter CNN.

8.3K

Pedro Sarmento retweeted

Shih-Lun (Sean) Wu@slseanwu·11 Feb

Excited to announce our ICASSP 2026 paper "Stemphonic: All-at-once Flexible Multi-stem Music Generation" ! w/ @__gzhu__, @j_p_caceres, @huangcza, and @NicholasJBryan 🔊Demo stemphonic-demo.vercel.app 📰Paper arxiv.org/abs/2602.09891 More details in🧵

3.4K

Pedro Sarmento retweeted

Jordi Pons@jordiponsdotme·10 Feb

The ACE-Step 1.5 paper can be confusing. In this post I share its main ideas: artintech.substack.com/p/ace-step-15-…
1. DIFFUSION MODEL: supports multiple tasks.
2. LANGUAGE MODEL: reprompting & semantic tokens generation.
3. DATA PREPARATION: 27M songs.
4. OPEN WEIGHTS: supports LoRAs.

2.7K

Pedro Sarmento retweeted

Yi-Hsuan Yang@affige_yang·9 Feb

Yes! This ATTM Grand Challenge brings the fair-play & affordability we've been longing for in TTM research: from-scratch training on fixed academic data, prioritizing novel algorithmic or system design. We provide a MeanAudio baseline to get started easily. Join us! 🚀 #ICME2026

Hao-Wen (Herman) Dong 董皓文@hermanhwdong

📢 Happy to announce the ICME 2026 Grand Challenge on Academic Text-to-Music Generation!
- Official launch: Feb 10
- Registration deadline: Mar 20
- Submission deadline: April 23
Co-organizing with @affige_yang @HungyiLee2 @Lonian6 & Fang-Chih Hsieh ntu-musicailab.github.io/ICME26-ATTM-Gr…

Pedro Sarmento retweeted

机器之心 JIQIZHIXIN@jiqizhixin·5 Feb

New paradigm from Kaiming He's team: Drifting Models! With this approach, you can generate a perfect image in a single step. The team trains a "drifting field" that smoothly moves samples toward equilibrium with the real data distribution. The result? A one-step generator that sets a new SOTA on ImageNet 256x256, beating complex multi-step models.

162

1.3K

319.9K

Pedro Sarmento retweeted

Yoshua Bengio@Yoshua_Bengio·3 Feb

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

376

1.1K

470.9K

Pedro Sarmento retweeted

Stefan Lattner@deeplearnmusic·4 Feb

🔊 New Paper Alert: Training-Free Inference-Time Timbre Transfer

Check out our latest #ICASSP paper on timbre transfer in music audio! 🎶

➡️ Diffusion Timbre Transfer Via Mutual Information Guided Inpainting; Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini & George Fazekas
📜 Paper: arxiv.org/pdf/2601.01294
🥁 Audio Demos: anon-audio-demo-25.github.io/audio_demo/

In this work, we rethink timbre transfer as an inference-time editing problem, and show that you don’t need to retrain or fine-tune heavy models to change the instrumental color of a piece while preserving its musical structure.

🎯 What’s New?
Instead of training separate models or adding control modules for each instrument:
✅ We start from a pre-trained latent diffusion model and steer it on the fly using two simple controls:
• Mutual-Information guided noise injection: add noise only in latent channels most informative of timbre.
• Early-step clamping: “lock in” melody and rhythm by restoring structure-dominant channels during denoising.

This lightweight, training-free procedure lets you control timbre without sacrificing the original melody, harmony or rhythm, and works with text or audio conditioning (e.g., CLAP).

✨ Why It Matters
🎵 Practical music production tools for re-orchestration and sound design
🛠️ Efficient editing with no added model training
🔍 A framework that could extend beyond timbre to other label-driven audio edits
📌 Compatible with strong diffusion backbones and generative audio models

@SonyCSLMusic @SonyCSLParis

Pedro Sarmento retweeted

Hope Rugo@hoperugo·24 Jan

Very cool to see this large dataset. Walking is good! Bummer about swimming though! My preferred exercise. I figure variability might be greater with swimming! @OncoAlert

Eric Topol@EricTopol

The @BMJMedicine was supposed to post this paper 2 hours ago but has failed to do so. It is being covered by other means medicalxpress.com/news/2026-01-p… Someday the link will be active! dx.doi.org/10.1136/bmjmed…

6.3K

Pedro Sarmento retweeted

arXiv Sound@ArxivSound·22 Jan

Carlos Hernandez-Olivan, Hendrik Vincent Koops, Hao Hao Tan, Elio Quinton, "Single-step Controllable Music Bandwidth Extension With Flow Matching," arxiv.org/abs/2601.14356

494
