Sonal Kumar

99 posts

@realsonalkumar

CS PhD @ UMD

Joined November 2015

196 Following · 51 Followers

Sonal Kumar@realsonalkumar·1d

As speech and audio language models rapidly evolve, we need stronger conversations around architectures, data, evaluation, and safety for audio-first AI. SALMA brings together researchers working across audio LLMs. Consider submitting or sharing with anyone in this space!

SALMA Workshop Chairs@SALMAworkshop26

📢 Call for Papers is out - salma-workshop.github.io/salma-2026/ We invite submissions to SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026.🎙️ 🗓️ Direct Submission: July 27, 2026 ARR Commitment: August 26, 2026 Please spread the word and consider submitting!

169

Sonal Kumar reposted

Utkarsh Tyagi@utkarsh4430·6d

1/ New from @ScaleAILabs: Rubrics (a.k.a. checklists) have become the default reward interface for RL on open-ended tasks without final verifiable answers. But most rubric RL still relies on static aggregation: fixed human weights over criteria, summed into one scalar reward. We show that this conflates what should matter in the final answer with what can actually teach the current policy. arxiv.org/abs/2605.20164

8.2K

Sonal Kumar reposted

GAMMA UMD@gammaumd·19 May

Congratulations to all GAMMA members graduating this year! 🎓 We were happy to celebrate together at the GAMMA graduation gathering and see so many lab members, alumni, and friends come back for this special week. Wishing everyone the very best for the upcoming hooding ceremonies and their next chapter! 🎉

712

Sonal Kumar reposted

Bryan Catanzaro@ctnzr·28 Apr

Today we're releasing Nemotron 3 Nano Omni. Audio, Video, Image, Text ➡️ Text Ask questions about all your data. Amazing efficiency powered by the Nemotron Hybrid SSM MoE architecture. State of the art multimodal intelligence.

352

26K

Sonal Kumar reposted

DailyPapers@HuggingPapers·19 Apr

Audio Flamingo Next A next-generation open audio-language model by NVIDIA with 30-minute audio support and time-grounded reasoning. Trained on 1M+ hours, it outperforms larger models on speech, sound, and music understanding.

5.7K

Sonal Kumar reposted

Jonah Casebeer@CasebeerJonah·24 Feb

GenAE: An audio autoencoder engineered for generative modeling. To appear at ICASSP 2026. w/ @__gzhu__ @zhepeiw03 @NicholasJBryan arXiv: arxiv.org/abs/2602.15749 Video: youtu.be/gDIIuLb0cf0

12.3K

Sonal Kumar reposted

Justin Salamon@justin_salamon·18 Feb

This is big. SOTA audio reasoning. SOTA video reasoning. SOTA audio captioning. SOTA sound event detection. Better than Gemini. Better than Qwen. TAC: Timestamped Audio Captioning 📑 paper: lnkd.in/getEz5xU 🌐 website with more demos: lnkd.in/gdw5TTuS

261

19.3K

Sonal Kumar@realsonalkumar·19 Feb

More Demos: sonalkum.github.io/tacmodel/ Paper: arxiv.org/pdf/2602.15766 Special thanks to my mentors at Adobe - @pseetharaman @urinieto @justin_salamon and others who made this possible.

Sonal Kumar@realsonalkumar·19 Feb

🧠 Describe-then-reason: A TAC(-V) and LLM (text-only reasoner) cascade that leverages these grounded captions to reason over video and audio. This approach beats native multimodal models on benchmarks like MMAR, MMAU-Pro, VideoHolmes, and Daily-Omni.

Sonal Kumar@realsonalkumar·19 Feb

Excited to announce our most recent work TAC: Timestamped Audio Captioning, which captures every detail in the audio with precise timestamps. 🛠️ We propose a strategy to generate complex, polyphonic audio mixtures which enables precise temporal grounding for overlapping events.

105

Sonal Kumar@realsonalkumar·29 Nov

Excited to attend #NeurIPS2025 next week to present our work - Audio Flamingo 3. AF3 will be presented at Exhibit Hall C/D/E, booth #1903 on Thursday, December 4, from 4:30 p.m. PST to 7:30 p.m. PST. If you are attending, let's connect and discuss Audio Intelligence.

Sonal Kumar reposted

Audio and Speech Processing Papers@AudioAndSpeech·20 Aug

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence. arxiv.org/abs/2508.13992

371

Sonal Kumar reposted

UMD Science@UMDscience·22 May

To our graduate students—we are so proud of you. You worked diligently to become experts in your fields. You made critical discoveries. You mentored the next generation of scientists in and out of the classroom. Welcome to our community of 60,000 #ScienceTerp alumni! #UMDgrad

Sonal Kumar reposted

Huck Yang@huckiyang·13 May

🎶Join the first-ever global Audio QA and content Reasoning 🔊🕵️ challenge task 5 in @DCASE_Challenge 2025 - by June 15, 2025 with your own AudioLMs / Multimodal LMs! a joint work w/ @NVIDIAAI @umdcs @Adobe @SeoulNatlUni @StudyatUSTC @SreyanG @realsonalkumar @ramani_d @dmanocha @ZhifengKong @urinieto @gunheekim @RafaelValleArt @ctnzr 👩‍💻 Multi-domain Audio QA Dataset: huggingface.co/datasets/Peace… 📗 DCASE-25-Task-5 Arxiv Report: arxiv.org/pdf/2505.07365 🇪🇸 11th DCASE 2025: dcase.community/challenge2025/…

1.6K

Sonal Kumar@realsonalkumar·2 Apr

Our team at @umdcs, in collaboration with @nvidia, @SeoulNatlUni and USTC, is excited to introduce Task 5: Audio Question Answering for the first time at DCASE. Focused on advancing audio understanding in LALMs. Explore the details and participate here: dcase.community/challenge2025/…

DCASE Challenge@DCASE_Challenge

📢 DCASE 2025 challenge is now officially launched! 🎉 You'll find more info about the tasks on the challenge website 👇 dcase.community/challenge2025/

397

Sonal Kumar@realsonalkumar·20 Mar

@AravSrinivas @PPLXDevs What are your thoughts on giving some API credits to pro users? 😁

221

Aravind Srinivas@AravSrinivas·20 Mar

Update dropping tomorrow. Follow @PPLXDevs

284

33.7K

Sonal Kumar@realsonalkumar·20 Mar

It’s wonderful to see big names using MMAU! Reported scores on MMAU test-mini look great 😲, and it would give better insights to see performance on the full test set for a more comprehensive evaluation here: eval.ai/web/challenges…

arXiv Sound@ArxivSound

"Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering," Gang Li, Jizhong Liu, Heinrich Dinkel, Yadong Niu, Junbo Zhang, Jian Luan, ift.tt/iFSunDw

282

Sonal Kumar reposted

steven@Tu7uruu·11 Mar

🦩 NVIDIA just released Audio Flamingo 2, an audio model that understands non-speech sounds, non-verbal speech, and music, achieving state-of-the-art performance across over 20 benchmarks with only 3 billion parameters. > Excels in tasks like temporal reasoning, attribute identification, and contextual sound event analysis. > Capable of comprehending audio segments up to 5 minutes in length, enabling deeper analysis of extended content. > Outperforms larger proprietary models despite its smaller size, having been trained exclusively on public datasets. > Introduces AudioSkills for expert audio reasoning and LongAudio for long audio understanding, advancing the field of audio-language modeling.

101

672

44.4K

Sonal Kumar@realsonalkumar·7 Mar

Our new work on Large Audio Language models which aims to not only push the boundaries of audio understanding but also enable reasoning on long audios. #ai #arxiv #ResearchPapers

arXiv Sound@ArxivSound

"Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities," Sreyan Ghosh, Zhifeng Kong, Sonal Kumar, S Sakshi, Jaehyeon Kim, Wei Ping, Rafael Valle, Dinesh Manocha, Bryan Catanzaro, ift.tt/KysoHqG
