Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ

224 posts

@arouditchenko

PhD student at MIT working on multi-modal and multilingual speech. I was an intern at @AIatMeta and @Apple MLR.

๊ฐ€์ž…์ผ Aralฤฑk 2016
566 ํŒ”๋กœ์ž‰467 ํŒ”๋กœ์›Œ
๊ณ ์ •๋œ ํŠธ์œ—
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Do you really need audio to fine-tune your Audio LLM? ๐Ÿค” Answer below: Introducing Omni-R1, a simple GRPO fineโ€‘tuning method for Qwen2.5โ€‘Omni on audio question answering. It sets new stateโ€‘ofโ€‘theโ€‘art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
3
34
148
8.8K
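Omni-R1's full recipe is in the linked paper; as a minimal sketch of the group-relative part of GRPO, assuming a hypothetical 0/1 correctness reward over a group of answers sampled for one question (the reward scheme and group size here are illustrative, not taken from the paper):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, the core of GRPO: normalize each sampled
    answer's reward by the mean/std of its group (no value network needed)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All answers scored the same: no learning signal for this group.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Hypothetical group: 4 sampled answers to one audio question,
# reward 1.0 if the chosen option is correct, 0.0 otherwise.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

In GRPO each answer's token log-probabilities are then weighted by its advantage, so correct answers in the group are reinforced relative to incorrect ones.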
Andrew Rouditchenko 🇺🇦 retweeted
Umberto Cappellazzo @Umberto_Senpai
How do AVSR models balance what they hear and what they see? Introducing Dr. SHAP-AV, the first large-scale Shapley-based analysis of modality contributions in audio-visual speech recognition. 6 SOTA models · 2 benchmarks · 3 analyses 🌍Project page: umbertocappellazzo.github.io/Dr-SHAP-AV/ 🧵👇
1
1
2
116
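Dr. SHAP-AV's actual pipeline is on the project page; with only two modalities (audio and visual) as players, the Shapley value has a simple closed form. A sketch with hypothetical coalition scores (the function name and numbers are illustrative, not from the project):

```python
def shapley_two_modalities(v):
    """Exact Shapley attribution for a two-player game (audio, visual).
    v maps a frozenset of modalities to the task score obtained when the
    model sees only those modalities."""
    empty = v[frozenset()]
    a = v[frozenset({"audio"})]
    s = v[frozenset({"visual"})]
    both = v[frozenset({"audio", "visual"})]
    # Average each modality's marginal contribution over both join orders.
    phi_audio = 0.5 * (a - empty) + 0.5 * (both - s)
    phi_visual = 0.5 * (s - empty) + 0.5 * (both - a)
    return {"audio": phi_audio, "visual": phi_visual}

# Hypothetical coalition scores for an AVSR model (e.g. 1 - WER).
scores = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.80,
    frozenset({"visual"}): 0.30,
    frozenset({"audio", "visual"}): 0.90,
}
phi = shapley_two_modalities(scores)
```

By construction the two attributions sum to the full model's score, so they give a principled split of performance between hearing and seeing.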
Andrew Rouditchenko 🇺🇦 retweeted
Puyuan Peng @PuyuanPeng
๐…๐ซ๐จ๐ฆ ๐ฎ๐ง๐ž๐ฆ๐ฉ๐ฅ๐จ๐ฒ๐š๐›๐ฅ๐ž ๐ฆ๐š๐ญ๐ก ๐ฎ๐ง๐๐ž๐ซ๐ ๐ซ๐š๐ โ†’ ๐ญ๐จ ๐Ÿ—,๐ŸŽ๐ŸŽ๐ŸŽ ๐†๐ข๐ญ๐‡๐ฎ๐› ๐ฌ๐ญ๐š๐ซ๐ฌ & ๐Ÿ’ ๐ซ๐ž๐ฌ๐ž๐š๐ซ๐œ๐ก ๐ฌ๐œ๐ข๐ž๐ง๐ญ๐ข๐ฌ๐ญ ๐จ๐Ÿ๐Ÿ๐ž๐ซ๐ฌ (๐Œ๐’๐‹, ๐ž๐ญ๐œ.) ๐Ÿ‘‰My journey of doing ๐๐ก๐ƒ ๐ข๐ง ๐€๐ˆ: tinyurl.com/5n7b7v36
3
1
19
856
Anmol Gulati @anmol01gulati
Honored to receive the Test of Time Award at Interspeech, recognizing Conformer as the most influential Interspeech paper of the last 5 years. Conformer was my very first paper at Google Brain (2020) and is the de facto speech encoder architecture in recognition systems worldwide. Story time 🧵
Anmol Gulati tweet media
12
12
291
785.2K
Andrew Rouditchenko 🇺🇦 retweeted
Jiawei (Joe) Zhou @jzhou_jz
๐ŸŽ™๏ธ Another #MultimodalAI workshop we are organizingโ€”this one zeroes in on speech & language foundation models! ๐Ÿ“š Dive into #SpeechAI, audio, and language tech. Learn how to build foundation models and hear from both academia and industry experts. ๐Ÿ—“Sep 4โ€“5, 2025 | @TTIC_Connect
Jiawei (Joe) Zhou tweet media
Shinji Watanabe @shinjiw_at_cmu

๐Ÿ“ข Excited to announce our 2-day workshop on "Foundations of Speech and Audio Foundation Models" at TTI Chicago, happening September 4โ€“5! ๐Ÿ”— Info & registration: sites.google.com/view/speech-aiโ€ฆ ๐Ÿ“ Poster submissions welcome! Join us for talks, discussions, and community building!

1
2
23
4.2K
William Chen @chenwanch1
What is it with speech reviewers on openreview? In my past 3 submissions (EMNLP 24, ICML 25, EMNLP 25), I have gotten only 1 reply to a rebuttal, out of a total of 11 reviews. Very frustrating, esp since they ask for more results and analyses that take a lot of time/compute.
2
0
33
2.3K
Andrew Rouditchenko 🇺🇦 retweeted
Peyman Milanfar @docmilanfar
If your PhD advisor dressed like this, you probably didn't use neural nets in your thesis
Peyman Milanfar tweet media
31
32
949
81.4K
Andrew Rouditchenko 🇺🇦 retweeted
yobibyte @y0b1byte
Finally, after all these years of being mocked, ffmpeg enthusiasts win!
yobibyte tweet media
59
325
5.4K
356.3K
Andrew Rouditchenko 🇺🇦 retweeted
Heng-Jui Chang @hjchang87
๐Ÿ’กBridging speech, sound, & music representations with one universal model? We introduce USAD โœ… ๐Ÿ“š Distills knowledge from domain-specific SSL models ๐ŸŽฏ Matches expert models across speech/audio/music tasks ๐Ÿ“„ arxiv.org/abs/2506.18843 ๐Ÿง‘โ€๐Ÿ’ป huggingface.co/MIT-SLS/USAD-Bโ€ฆ
Heng-Jui Chang tweet media (4 images)
0
9
34
2K
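USAD's actual objective is described in the linked paper; a minimal sketch of the multi-teacher distillation idea, matching one student's features against several frozen domain teachers with a plain squared-error loss (the names, feature dimensions, and loss choice here are assumptions, not the USAD implementation):

```python
def multi_teacher_distill_loss(student, teachers):
    """Mean squared error between a student's features and each domain
    teacher's features, averaged over all teachers and dimensions."""
    total, n = 0.0, 0
    for domain, feats in teachers.items():
        assert len(feats) == len(student), f"dim mismatch for {domain}"
        total += sum((s - t) ** 2 for s, t in zip(student, feats))
        n += len(feats)
    return total / n

# Hypothetical 4-dim features from frozen speech/sound/music teachers
# for the same input clip.
teachers = {
    "speech": [0.9, 0.1, 0.0, 0.2],
    "sound":  [0.5, 0.5, 0.1, 0.0],
    "music":  [0.2, 0.8, 0.3, 0.1],
}
loss = multi_teacher_distill_loss([0.5, 0.5, 0.1, 0.1], teachers)
```

Minimizing such a loss pushes one universal encoder toward all domain experts at once, which is the premise behind a single model covering speech, sound, and music.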
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Congrats to Edson for leading our Contrastive Audio-Visual Masked Autoencoders 2.0 Project (CAV-MAE Sync), accepted at #CVPR2025! Check out Edson's thread for more details โฌ‡๏ธ
Edson Araujo @edsonroteia

๐Ÿš€ Excited to announce our #CVPR2025 paper: CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment! We introduce a simple yet effective method for improved audio-visual learning. ๐Ÿ”— Project: edsonroteia.github.io/cav-mae-sync/ ๐Ÿงต (1/7)๐Ÿ‘‡

0
0
6
368
Andrew Rouditchenko ๐Ÿ‡บ๐Ÿ‡ฆ
Link to the MMAU leaderboard (Massive Multi-Task Audio Understanding and Reasoning Benchmark) - it should hopefully be updated soon with Omni-R1: sakshi113.github.io/mmau_homepage/…
0
0
6
297
Andrew Rouditchenko 🇺🇦 retweeted
arXiv Sound @ArxivSound
"Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities," George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, Gakuto Kurata, Ha… ift.tt/QPsxkH2
0
1
14
1.5K