Anurag Kumar

213 posts

@AcouIntel

Research Scientist, @GoogleDeepMind | Prev: @AIatMeta | CMU @SCSatCMU | @IITKanpur | Audio/Speech, Multimodal AI

Cambridge, MA · Joined June 2016
290 Following · 2.1K Followers
Pinned Tweet
Anurag Kumar @AcouIntel
Looking forward to @NeurIPSConf #NeurIPS2024 next week; I will be there from Dec 11th to 15th. Join our Audio Imagination Workshop on Dec 14th for engaging discussions on all things in the audio generation space. We have an exciting list of papers and speakers. audio-imagination.com
2 replies · 0 reposts · 15 likes · 9.1K views
Anurag Kumar @AcouIntel
We are looking for reviewers for @ieeeICASSP 2026 in the AASP areas. We received quite a few more papers this cycle. If you don't currently review for ICASSP, please consider doing so. Fill out the form below: docs.google.com/forms/d/e/1FAI…
1 reply · 5 reposts · 7 likes · 3K views
Anurag Kumar retweeted
Google DeepMind @GoogleDeepMind
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
152 replies · 705 reposts · 4.3K likes · 1.1M views
Anurag Kumar @AcouIntel
(2) XRIR: Hearing Anywhere in Any Environment. A key problem in neural RIR estimation has been cross-room generalization. We attempt to address this and introduce a large-scale dataset, ACOUSTICROOMS, with 300,000 high-fidelity RIRs simulated from 260 diverse rooms.
1 reply · 0 reposts · 0 likes · 133 views
Anurag Kumar @AcouIntel
A couple of papers at @CVPR #CVPR2025. (1) VisAH: Learning to Highlight Audio by Watching Movies. How do you transform poorly mixed audio into well-balanced audio? VisAH learns to leverage visual cues by training on movies, which naturally provide the required supervision.
1 reply · 0 reposts · 11 likes · 604 views
Anurag Kumar retweeted
Nando de Freitas @NandoDF
RL is not all you need, nor attention nor Bayesianism nor free energy minimisation, nor an age of first person experience. Such statements are propaganda. You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive benchmarks and data, tons of research and engineering on generative models, data mixtures, ablations, RL/self-training, etc. And we will probably need lots of people working hard to figure out safety, causal world models, awareness, models that create abstractions comparable to infinity and zero and use these to predict the existence of things like black holes and suggest experiments to verify such hypotheses, or come up with novel engineering designs to generate energy more efficiently, robotics, etc.

It takes thousands of people and many ideas. In the end some simple ideas might become obvious, but such obviousness only happens in retrospect. Yes, there is a bitter lesson, but if we had followed it, we'd still be doing linear regression with RL. Let's not oversimplify, but rather honour the research and engineering of thousands of people.

Also, people keep rewriting history. When our language understanding start-up (darkbluelabs) was acquired by Google about 10 years ago, we joined DeepMind, where the AGI documents were all about concepts, RL, and episodic memories, and made it clear that there was no room for language. To be honest, back then such a position wasn't so crazy. Now it seems silly, but only because of the benefit of hindsight.

There are no 1 or 10 heroes in the history of AI. There are many thousands of hard-working students, profs, engineers, operations and support people, product folks, managers, even hedge funds among others. Let's honour the whole community and not just CEOs or the philosophers of Bayes, RL, deep learning, etc.

I look forward to learning from the next generation and seeing what they will achieve. To them: don't buy the existing narratives blindly, innovate. Remember that just like mathematics, AI will advance one grave at a time.
30 replies · 194 reposts · 1.4K likes · 114.1K views
Anurag Kumar @AcouIntel
(2) Reexamining the Efficacy of MetricGAN for Speech Enhancement. Led by @realHaibinWu. Showcases some crucial limitations of MetricGAN and proposes training tricks to address them. (Already presented, but check out the paper.) tinyurl.com/y8yxde5r (3/3)
0 replies · 0 reposts · 0 likes · 207 views
Anurag Kumar @AcouIntel
(1) Advancing Active Speaker Detection for Egocentric Videos. Led by @huh_jaesung. SOTA for active speaker detection in challenging egocentric videos. Session: Machine learning for multimodal data I, Apr 11, 11:30 am - 1:00 pm. tinyurl.com/3pr959xa (2/3)
1 reply · 0 reposts · 2 likes · 248 views
Anurag Kumar @AcouIntel
@ieeeICASSP is finally happening at a place I don't need a visa to travel to 😀, but I am not able to attend this year #ICASSP2025. If you are there, check out these two papers I co-authored. (1/3)
1 reply · 0 reposts · 5 likes · 336 views
Anurag Kumar @AcouIntel
Career Update: Excited to join Google DeepMind @GoogleDeepMind to continue working on audio/speech/multimodal AI. I left Meta @Meta after more than 6 years, and I will definitely miss working with some amazing friends and colleagues. Super thankful for all the fun collaborations.
45 replies · 26 reposts · 1.6K likes · 107.9K views
Anurag Kumar retweeted
Shrestha Mohanty @shremoha
So happy to share that our work has been accepted to @SIGIRConf. Thank you to my amazing collaborators! @NegarEmpr, Andrea Tupini, Yuxuan Sun, @Tviskaron, @artemZholus, @Cote_Marc, and @julia_kiseleva. Pre-print: arxiv.org/pdf/2407.08898
Negar Arabzadeh @NegarEmpr

What a way to wrap up @IgluContest! Our paper “IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents” accepted to @SIGIRConf, including: 1) a rich multi-modal dataset, 2) a data collection tool, and 3) an online eval framework. #SIGIR2025

2 replies · 2 reposts · 12 likes · 6.2K views
Anurag Kumar retweeted
arXiv Sound @ArxivSound
“Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment,” Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar, ift.tt/5JkZ0Gp
0 replies · 2 reposts · 8 likes · 2.3K views
Anurag Kumar @AcouIntel
The paper explores how LLMs can be used to effectively contextualize excerpts from conversations to improve understandability, readability, and other factors, and to reduce misinterpretations.
0 replies · 0 reposts · 2 likes · 822 views
Anurag Kumar retweeted
Shrestha Mohanty @shremoha
Excited to share our work at @coling2025! While I couldn’t attend in person, @jad_kabbara will be presenting today at the 1:30 PM poster session. Come by to learn how we’re using LLMs to improve understanding in social conversations! #COLING2025 #NLProc
2 replies · 4 reposts · 17 likes · 1.7K views
Anurag Kumar retweeted
arXiv Sound @ArxivSound
“SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text,” Haohe Liu, Gael Le Lan, Xinhao Mei, Zhaoheng Ni, Anurag Kumar, Varun Nagaraja, Wenwu Wang, Mark D. Plumbley, Yangyang Shi, Vikas Chandra, ift.tt/sxluwgt
0 replies · 3 reposts · 12 likes · 1.6K views
Anurag Kumar @AcouIntel
It was exciting to see the amazing turnout at our Audio Imagination Workshop @NeurIPSConf #NeurIPS2024. Grateful to all the invited speakers, panelists, authors, and participants for the interesting presentations, discussions, and engagement. audio-imagination.com
2 replies · 2 reposts · 30 likes · 2K views