Pedro Sarmento

2.3K posts

@umpedronosapato

AI & Music Data Scientist @moises_ai | prev. @c4dm

Joined June 2014
1.9K Following · 2.8K Followers
Pedro Sarmento retweeted
Paul McCabe @mccabep
My thanks to Alex Ruger and TapeOp for a really fun interview. I enjoyed getting to talk about Roland Future Design Lab and my career with Roland, plus a trip down my own memory lane! Behind The Gear with Roland's Paul McCabe tapeop.com/interviews/173… #tapeop via @tapeopmag
0
1
2
120
Pedro Sarmento retweeted
Dorien Herremans @dorienherremans
Fresh off teaching my graduate course on Multimodal Generative AI. I open-sourced the entire thing on GitHub: lectures + labs on vision/audio models, multimodal alignment, RAG, and agentic systems.🔥 Free for anyone building in this space. Link in comments #GenerativeAI
2
1
7
329
Pedro Sarmento retweeted
Nicholas Boffi @nmboffi
🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation.
read about it in the blog: one-step-lm.github.io/blog/
full details in the paper: arxiv.org/abs/2602.16813
we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed!
v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion, but without the factorization error that kills discrete methods at few steps.
beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale.
we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time, and it works even at 1-2 steps with our flow map model.
simple, well-understood techniques from continuous flows just work incredibly well in practice for language. we’re extremely excited about the future of this class of models. stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀
13
91
476
74.4K
Pedro Sarmento retweeted
Chris Donahue @chrisdonahuey
Vibe coding is cool but have you tried vibe patching? Pure Vibes = Pure Data + MCP. Describe your sound, watch the patch appear 🌊 🔊👇
7
16
106
6.7K
Pedro Sarmento retweeted
dadabots @dadabots
SCIENCE PAPER DROPPED
big ups @zacknovack 🥳 & the @harmonai_org team
This paper explores an inexpensive method to add custom control (e.g. pitch) to a pretrained audio diffusion model, without retraining the model
arxiv.org/abs/2603.04366
1
3
42
1.2K
Pedro Sarmento retweeted
Sander Dieleman @sedielem
Some really great insights here about the differences between masked and uniform-state discrete diffusion. Both continuous diffusion and uniform-state discrete diffusion for modelling categorical data seem to be making a bit of a comeback recently. Entropy is all you need🙃
Dimitri von Rütte @dvruette

there, I said it. diffusion LLMs are the future! I'll be back in a couple of years to collect my "I told you so" award.

6
19
220
26.5K
Ash @Ashvala
Coming soon to @nonety_pe: draw to engrave the music. Built on top of everyone's favorite drawing tool, @tldraw, and a simple 300k parameter CNN.
4
9
81
8.3K
Pedro Sarmento retweeted
Jordi Pons @jordiponsdotme
The ACE-Step 1.5 paper can be confusing. In this post I share its main ideas: artintech.substack.com/p/ace-step-15-…
1. DIFFUSION MODEL: supports multiple tasks.
2. LANGUAGE MODEL: reprompting & semantic tokens generation.
3. DATA PREPARATION: 27M songs.
4. OPEN WEIGHTS: supports LoRAs.
0
5
46
2.7K
Pedro Sarmento retweeted
Yi-Hsuan Yang @affige_yang
Yes! This ATTM Grand Challenge brings the fair-play & affordability we've been longing for in TTM research: from-scratch training on fixed academic data, prioritizing novel algorithmic or system design. We provide a MeanAudio baseline to get started easily. Join us! 🚀 #ICME2026
Hao-Wen (Herman) Dong 董皓文 @hermanhwdong

📢Happy to announce the ICME 2026 Grand Challenge on Academic Text-to-Music Generation!
- Official launch: Feb 10
- Registration deadline: Mar 20
- Submission deadline: April 23
Co-organizing with @affige_yang @HungyiLee2 @Lonian6 & Fang-Chih Hsieh
ntu-musicailab.github.io/ICME26-ATTM-Gr…

0
1
17
1K
Pedro Sarmento retweeted
机器之心 JIQIZHIXIN @jiqizhixin
New paradigm from Kaiming He's team: Drifting Models! With this approach, you can generate a perfect image in a single step. The team trains a "drifting field" that smoothly moves samples toward equilibrium with the real data distribution. The result? A one-step generator that sets a new SOTA on ImageNet 256x256, beating complex multi-step models.
15
162
1.3K
319.9K
Pedro Sarmento retweeted
Yoshua Bengio @Yoshua_Bengio
Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)
67
376
1.1K
470.9K
Pedro Sarmento retweeted
Stefan Lattner @deeplearnmusic
🔊 New Paper Alert: Training-Free Inference-Time Timbre Transfer
Check out our latest #ICASSP paper on timbre transfer in music audio! 🎶
➡️ Diffusion Timbre Transfer Via Mutual Information Guided Inpainting; Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini & George Fazekas
📜 Paper: arxiv.org/pdf/2601.01294
🥁 Audio Demos: anon-audio-demo-25.github.io/audio_demo/
In this work, we rethink timbre transfer as an inference-time editing problem, and show that you don’t need to retrain or fine-tune heavy models to change the instrumental color of a piece while preserving its musical structure.
🎯 What’s New?
Instead of training separate models or adding control modules for each instrument:
✅ We start from a pre-trained latent diffusion model and steer it on the fly using two simple controls:
• Mutual-Information guided noise injection: add noise only in latent channels most informative of timbre.
• Early-step clamping: “lock in” melody and rhythm by restoring structure-dominant channels during denoising.
This lightweight, training-free procedure lets you control timbre without sacrificing the original melody, harmony or rhythm, and works with text or audio conditioning (e.g., CLAP).
✨ Why It Matters
🎵 Practical music production tools for re-orchestration and sound design
🛠️ Efficient editing with no added model training
🔍 A framework that could extend beyond timbre to other label-driven audio edits
📌 Compatible with strong diffusion backbones and generative audio models
@SonyCSLMusic @SonyCSLParis
0
4
22
2K
Pedro Sarmento retweeted
Hope Rugo @hoperugo
Very cool to see this large dataset. Walking is good! Bummer about swimming though! My preferred exercise. I figure variability might be greater with swimming! @OncoAlert
Eric Topol @EricTopol

The @BMJMedicine was supposed to post this paper 2 hours ago but has failed to do so. It is being covered by other means: medicalxpress.com/news/2026-01-p… Someday the link will be active! dx.doi.org/10.1136/bmjmed…

0
7
32
6.3K
Pedro Sarmento retweeted
arXiv Sound @ArxivSound
Carlos Hernandez-Olivan, Hendrik Vincent Koops, Hao Hao Tan, Elio Quinton, "Single-step Controllable Music Bandwidth Extension With Flow Matching," arxiv.org/abs/2601.14356
0
3
5
494