Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ

224 posts

@arouditchenko

PhD student at MIT working on multi-modal and multilingual speech. I was an intern at @AIatMeta and @Apple MLR.

๊ฐ€์ž…์ผ Aralฤฑk 2016
566 ํŒ”๋กœ์ž‰467 ํŒ”๋กœ์›Œ
๊ณ ์ •๋œ ํŠธ์œ—
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Do you really need audio to fine-tune your Audio LLM? ๐Ÿค” Answer below: Introducing Omni-R1, a simple GRPO fineโ€‘tuning method for Qwen2.5โ€‘Omni on audio question answering. It sets new stateโ€‘ofโ€‘theโ€‘art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
3
34
148
8.8K
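Omni-R1's full recipe is in the linked paper; as a minimal sketch of the group-relative part of GRPO, assuming a hypothetical 0/1 correctness reward over a group of answers sampled for one question (the reward scheme and group size here are illustrative, not taken from the paper):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, the core of GRPO: normalize each sampled
    answer's reward by the mean/std of its group (no value network needed)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All answers scored the same: no learning signal for this group.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Hypothetical group: 4 sampled answers to one audio question,
# reward 1.0 if the chosen option is correct, 0.0 otherwise.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

In GRPO each answer's token log-probabilities are then weighted by its advantage, so correct answers in the group are reinforced relative to incorrect ones.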
Andrew Rouditchenko 🇺🇦 retweeted
Umberto Cappellazzo @Umberto_Senpai
How do AVSR models balance what they hear and what they see? Introducing Dr. SHAP-AV, the first large-scale Shapley-based analysis of modality contributions in audio-visual speech recognition. 6 SOTA models · 2 benchmarks · 3 analyses 🌍Project page: umbertocappellazzo.github.io/Dr-SHAP-AV/ 🧵👇
1
1
2
116
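Dr. SHAP-AV's actual pipeline is on the project page; with only two modalities (audio and visual) as players, the Shapley value has a simple closed form. A sketch with hypothetical coalition scores (the function name and numbers are illustrative, not from the project):

```python
def shapley_two_modalities(v):
    """Exact Shapley attribution for a two-player game (audio, visual).
    v maps a frozenset of modalities to the task score obtained when the
    model sees only those modalities."""
    empty = v[frozenset()]
    a = v[frozenset({"audio"})]
    s = v[frozenset({"visual"})]
    both = v[frozenset({"audio", "visual"})]
    # Average each modality's marginal contribution over both join orders.
    phi_audio = 0.5 * (a - empty) + 0.5 * (both - s)
    phi_visual = 0.5 * (s - empty) + 0.5 * (both - a)
    return {"audio": phi_audio, "visual": phi_visual}

# Hypothetical coalition scores for an AVSR model (e.g. 1 - WER).
scores = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.80,
    frozenset({"visual"}): 0.30,
    frozenset({"audio", "visual"}): 0.90,
}
phi = shapley_two_modalities(scores)
```

By construction the two attributions sum to the full model's score, so they give a principled split of performance between hearing and seeing.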
Andrew Rouditchenko 🇺🇦 retweeted
Puyuan Peng @PuyuanPeng
๐…๐ซ๐จ๐ฆ ๐ฎ๐ง๐ž๐ฆ๐ฉ๐ฅ๐จ๐ฒ๐š๐›๐ฅ๐ž ๐ฆ๐š๐ญ๐ก ๐ฎ๐ง๐๐ž๐ซ๐ ๐ซ๐š๐ โ†’ ๐ญ๐จ ๐Ÿ—,๐ŸŽ๐ŸŽ๐ŸŽ ๐†๐ข๐ญ๐‡๐ฎ๐› ๐ฌ๐ญ๐š๐ซ๐ฌ & ๐Ÿ’ ๐ซ๐ž๐ฌ๐ž๐š๐ซ๐œ๐ก ๐ฌ๐œ๐ข๐ž๐ง๐ญ๐ข๐ฌ๐ญ ๐จ๐Ÿ๐Ÿ๐ž๐ซ๐ฌ (๐Œ๐’๐‹, ๐ž๐ญ๐œ.) ๐Ÿ‘‰My journey of doing ๐๐ก๐ƒ ๐ข๐ง ๐€๐ˆ: tinyurl.com/5n7b7v36
3
1
19
856
Anmol Gulati @anmol01gulati
Honored to receive the Test of Time Award at Interspeech, recognizing Conformer as the most influential Interspeech paper of the last 5 years. Conformer was my very first paper at Google Brain (2020) and is the de facto speech encoder architecture in recognition systems worldwide. Story time 🧵
Anmol Gulati tweet media
12
12
291
785.2K
Andrew Rouditchenko 🇺🇦 retweeted
Jiawei (Joe) Zhou @jzhou_jz
๐ŸŽ™๏ธ Another #MultimodalAI workshop we are organizingโ€”this one zeroes in on speech & language foundation models! ๐Ÿ“š Dive into #SpeechAI, audio, and language tech. Learn how to build foundation models and hear from both academia and industry experts. ๐Ÿ—“Sep 4โ€“5, 2025 | @TTIC_Connect
Jiawei (Joe) Zhou tweet media
Shinji Watanabe @shinjiw_at_cmu

๐Ÿ“ข Excited to announce our 2-day workshop on "Foundations of Speech and Audio Foundation Models" at TTI Chicago, happening September 4โ€“5! ๐Ÿ”— Info & registration: sites.google.com/view/speech-aiโ€ฆ ๐Ÿ“ Poster submissions welcome! Join us for talks, discussions, and community building!

1
2
23
4.2K
William Chen @chenwanch1
What is it with speech reviewers on openreview? In my past 3 submissions (EMNLP 24, ICML 25, EMNLP 25), I have gotten only 1 reply to a rebuttal, out of a total of 11 reviews. Very frustrating, esp since they ask for more results and analyses that take a lot of time/compute.
2
0
33
2.3K
Andrew Rouditchenko 🇺🇦 retweeted
Peyman Milanfar @docmilanfar
If your PhD advisor dressed like this, you probably didn't use neural nets in your thesis
Peyman Milanfar tweet media
31
32
949
81.4K
Andrew Rouditchenko 🇺🇦 retweeted
yobibyte @y0b1byte
Finally, after all these years of being mocked, ffmpeg enthusiasts win!
yobibyte tweet media
59
325
5.4K
356.3K
Andrew Rouditchenko 🇺🇦 retweeted
Heng-Jui Chang @hjchang87
๐Ÿ’กBridging speech, sound, & music representations with one universal model? We introduce USAD โœ… ๐Ÿ“š Distills knowledge from domain-specific SSL models ๐ŸŽฏ Matches expert models across speech/audio/music tasks ๐Ÿ“„ arxiv.org/abs/2506.18843 ๐Ÿง‘โ€๐Ÿ’ป huggingface.co/MIT-SLS/USAD-Bโ€ฆ
Heng-Jui Chang tweet media (4 images)
0
9
34
2K
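USAD's actual objective is described in the linked paper; a minimal sketch of the multi-teacher distillation idea, matching one student's features against several frozen domain teachers with a plain squared-error loss (the names, feature dimensions, and loss choice here are assumptions, not the USAD implementation):

```python
def multi_teacher_distill_loss(student, teachers):
    """Mean squared error between a student's features and each domain
    teacher's features, averaged over all teachers and dimensions."""
    total, n = 0.0, 0
    for domain, feats in teachers.items():
        assert len(feats) == len(student), f"dim mismatch for {domain}"
        total += sum((s - t) ** 2 for s, t in zip(student, feats))
        n += len(feats)
    return total / n

# Hypothetical 4-dim features from frozen speech/sound/music teachers
# for the same input clip.
teachers = {
    "speech": [0.9, 0.1, 0.0, 0.2],
    "sound":  [0.5, 0.5, 0.1, 0.0],
    "music":  [0.2, 0.8, 0.3, 0.1],
}
loss = multi_teacher_distill_loss([0.5, 0.5, 0.1, 0.1], teachers)
```

Minimizing such a loss pushes one universal encoder toward all domain experts at once, which is the premise behind a single model covering speech, sound, and music.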
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Congrats to Edson for leading our Contrastive Audio-Visual Masked Autoencoders 2.0 Project (CAV-MAE Sync), accepted at #CVPR2025! Check out Edson's thread for more details โฌ‡๏ธ
Edson Araujo @edsonroteia

๐Ÿš€ Excited to announce our #CVPR2025 paper: CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment! We introduce a simple yet effective method for improved audio-visual learning. ๐Ÿ”— Project: edsonroteia.github.io/cav-mae-sync/ ๐Ÿงต (1/7)๐Ÿ‘‡

0
0
6
368
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Link to the MMAU leaderboard (Massive Multi-Task Audio Understanding and Reasoning Benchmark) - it should hopefully be updated soon with Omni-R1: sakshi113.github.io/mmau_homepage/…
0
0
6
297
Andrew Rouditchenko 🇺🇦 retweeted
arXiv Sound @ArxivSound
"Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities," George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, Gakuto Kurata, Ha… ift.tt/QPsxkH2
0
1
14
1.5K