Andrew Rouditchenko 🇺🇦 (@arouditchenko)
224 posts
PhD student at MIT working on multi-modal and multilingual speech. I was an intern at @AIatMeta and @Apple MLR.

Joined December 2016
566 Following · 467 Followers
Pinned Tweet
Andrew Rouditchenko 🇺🇦 (@arouditchenko)
Do you really need audio to fine-tune your Audio LLM? 🤔 Answer below: Introducing Omni-R1, a simple GRPO fine‑tuning method for Qwen2.5‑Omni on audio question answering. It sets new state‑of‑the‑art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
3 replies · 34 reposts · 148 likes · 8.8K views
Andrew Rouditchenko 🇺🇦 retweeted
Umberto Cappellazzo (@Umberto_Senpai)
How do AVSR models balance what they hear and what they see? Introducing Dr. SHAP-AV, the first large-scale Shapley-based analysis of modality contributions in audio-visual speech recognition. 6 SOTA models · 2 benchmarks · 3 analyses 🌐 Project page: umbertocappellazzo.github.io/Dr-SHAP-AV/ 🧵👇
1 reply · 1 repost · 2 likes · 116 views
Andrew Rouditchenko 🇺🇦 retweeted
Puyuan Peng (@PuyuanPeng)
From unemployable math undergrad → to 9,000 GitHub stars & 4 research scientist offers (MSL, etc.) 👉 My journey of doing a PhD in AI: tinyurl.com/5n7b7v36
3 replies · 1 repost · 19 likes · 856 views
Anmol Gulati (@anmol01gulati)
Honored to receive the Test of Time Award at Interspeech for Conformer, recognized as the most influential Interspeech paper of the last five years. Conformer was my very first paper at Google Brain (2020) and is the de facto speech encoder architecture in recognition systems worldwide. Story time 🧵
12 replies · 12 reposts · 291 likes · 785.2K views
Andrew Rouditchenko 🇺🇦 retweeted
Jiawei (Joe) Zhou (@jzhou_jz)
🎙️ Another #MultimodalAI workshop we are organizing—this one zeroes in on speech & language foundation models! 📚 Dive into #SpeechAI, audio, and language tech. Learn how to build foundation models and hear from both academia and industry experts. 🗓Sep 4–5, 2025 | @TTIC_Connect
Shinji Watanabe@shinjiw_at_cmu

📢 Excited to announce our 2-day workshop on "Foundations of Speech and Audio Foundation Models" at TTI Chicago, happening September 4–5! 🔗 Info & registration: sites.google.com/view/speech-ai… 📝 Poster submissions welcome! Join us for talks, discussions, and community building!

1 reply · 2 reposts · 23 likes · 4.2K views
William Chen (@chenwanch1)
What is it with speech reviewers on openreview? In my past 3 submissions (EMNLP 24, ICML 25, EMNLP 25), I have gotten only 1 reply to a rebuttal, out of a total of 11 reviews. Very frustrating, esp since they ask for more results and analyses that take a lot of time/compute.
2 replies · 0 reposts · 33 likes · 2.3K views
Andrew Rouditchenko 🇺🇦 retweeted
Peyman Milanfar (@docmilanfar)
If your PhD advisor dressed like this, you probably didn't use neural nets in your thesis
31 replies · 32 reposts · 949 likes · 81.4K views
Andrew Rouditchenko 🇺🇦 retweeted
yobibyte (@y0b1byte)
Finally, after all these years of being mocked, ffmpeg enthusiasts win!
59 replies · 325 reposts · 5.4K likes · 356.3K views
Andrew Rouditchenko 🇺🇦 retweeted
Heng-Jui Chang (@hjchang87)
💡Bridging speech, sound, & music representations with one universal model? We introduce USAD ✅ 📚 Distills knowledge from domain-specific SSL models 🎯 Matches expert models across speech/audio/music tasks 📄 arxiv.org/abs/2506.18843 🧑‍💻 huggingface.co/MIT-SLS/USAD-B…
0 replies · 9 reposts · 34 likes · 2K views
Andrew Rouditchenko 🇺🇦 (@arouditchenko)
Congrats to Edson for leading our Contrastive Audio-Visual Masked Autoencoders 2.0 Project (CAV-MAE Sync), accepted at #CVPR2025! Check out Edson's thread for more details ⬇️
Edson Araujo@edsonroteia

🚀 Excited to announce our #CVPR2025 paper: CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment! We introduce a simple yet effective method for improved audio-visual learning. 🔗 Project: edsonroteia.github.io/cav-mae-sync/ 🧵 (1/7)👇

0 replies · 0 reposts · 6 likes · 368 views
Andrew Rouditchenko 🇺🇦 (@arouditchenko)
Link to the MMAU leaderboard (Massive Multi-Task Audio Understanding and Reasoning Benchmark), which should hopefully be updated soon with Omni-R1: sakshi113.github.io/mmau_homepage/…
0 replies · 0 reposts · 6 likes · 297 views
Andrew Rouditchenko 🇺🇦 retweeted
arXiv Sound (@ArxivSound)
"Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities," George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, Gakuto Kurata, Ha… ift.tt/QPsxkH2
0 replies · 1 repost · 14 likes · 1.5K views