Sonal Kumar

99 posts

@realsonalkumar

CS PhD @ UMD

Joined November 2015
196 Following · 51 Followers
Sonal Kumar @realsonalkumar
As speech and audio language models rapidly evolve, we need stronger conversations around architectures, data, evaluation, and safety for audio-first AI. SALMA brings together researchers working across audio LLMs. Consider submitting or sharing with anyone in this space!
SALMA Workshop Chairs @SALMAworkshop26

📢 Call for Papers is out: salma-workshop.github.io/salma-2026/
We invite submissions to SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026. 🎙️
🗓️ Direct Submission: July 27, 2026
ARR Commitment: August 26, 2026
Please spread the word and consider submitting!

Replies 0 · Reposts 0 · Likes 1 · Views 169
Sonal Kumar reposted
Utkarsh Tyagi @utkarsh4430
1/ New from @ScaleAILabs: Rubrics (a.k.a. checklists) have become the default reward interface for RL on open-ended tasks without final verifiable answers. But most rubric RL still relies on static aggregation: fixed human weights over criteria, summed into one scalar reward. We show that this conflates what should matter in the final answer with what can actually teach the current policy. arxiv.org/abs/2605.20164
Replies 2 · Reposts 21 · Likes 73 · Views 8.2K
Sonal Kumar reposted
GAMMA UMD @gammaumd
Congratulations to all GAMMA members graduating this year! 🎓 We were happy to celebrate together at the GAMMA graduation gathering and see so many lab members, alumni, and friends come back for this special week. Wishing everyone the very best for the upcoming hooding ceremonies and their next chapter! 🎉
Replies 0 · Reposts 4 · Likes 17 · Views 712
Sonal Kumar reposted
Bryan Catanzaro @ctnzr
Today we're releasing Nemotron 3 Nano Omni.
Audio, Video, Image, Text ➡️ Text
Ask questions about all your data.
Amazing efficiency powered by the Nemotron Hybrid SSM MoE architecture.
State of the art multimodal intelligence.
Replies 11 · Reposts 54 · Likes 352 · Views 26K
Sonal Kumar reposted
DailyPapers @HuggingPapers
Audio Flamingo Next
A next-generation open audio-language model by NVIDIA with 30-minute audio support and time-grounded reasoning. Trained on 1M+ hours, it outperforms larger models on speech, sound, and music understanding.
Replies 1 · Reposts 17 · Likes 96 · Views 5.7K
Sonal Kumar reposted
Justin Salamon @justin_salamon
This is big.
SOTA audio reasoning. SOTA video reasoning. SOTA audio captioning. SOTA sound event detection. Better than Gemini. Better than Qwen.
TAC: Timestamped Audio Captioning
📑 paper: lnkd.in/getEz5xU
🌐 website with more demos: lnkd.in/gdw5TTuS
Replies 9 · Reposts 28 · Likes 261 · Views 19.3K
Sonal Kumar @realsonalkumar
🧠 Describe-then-reason: A TAC(-V) and LLM (text-only reasoner) cascade that leverages these grounded captions to reason over video and audio. This approach beats native multimodal models on benchmarks like MMAR, MMAU-Pro, VideoHolmes, and Daily-Omni.
Replies 1 · Reposts 0 · Likes 2 · Views 41
Sonal Kumar @realsonalkumar
Excited to announce our most recent work TAC: Timestamped Audio Captioning, which captures every detail in the audio with precise timestamps. 🛠️ We propose a strategy to generate complex, polyphonic audio mixtures which enables precise temporal grounding for overlapping events.
Replies 1 · Reposts 0 · Likes 7 · Views 105
Sonal Kumar @realsonalkumar
Excited to attend #NeurIPS2025 next week to present our work - Audio Flamingo 3. AF3 will be presented at Exhibit Hall C/D/E, booth #1903 on Thursday, December 4, from 4:30 p.m. PST to 7:30 p.m. PST. If you are attending, let's connect and discuss Audio Intelligence.
Replies 0 · Reposts 0 · Likes 2 · Views 75
Sonal Kumar reposted
UMD Science @UMDscience
To our graduate students—we are so proud of you. You worked diligently to become experts in your fields. You made critical discoveries. You mentored the next generation of scientists in and out of the classroom. Welcome to our community of 60,000 #ScienceTerp alumni! #UMDgrad
Replies 0 · Reposts 8 · Likes 24 · Views 1K
Sonal Kumar reposted
Huck Yang @huckiyang
🎶 Join the first-ever global Audio QA and content Reasoning 🔊🕵️ challenge, Task 5 in @DCASE_Challenge 2025, by June 15, 2025 with your own AudioLMs / Multimodal LMs!
A joint work w/ @NVIDIAAI @umdcs @Adobe @SeoulNatlUni @StudyatUSTC @SreyanG @realsonalkumar @ramani_d @dmanocha @ZhifengKong @urinieto @gunheekim @RafaelValleArt @ctnzr
👩‍💻 Multi-domain Audio QA Dataset: huggingface.co/datasets/Peace…
📗 DCASE-25-Task-5 Arxiv Report: arxiv.org/pdf/2505.07365
🇪🇸 11th DCASE 2025: dcase.community/challenge2025/…
Replies 0 · Reposts 4 · Likes 36 · Views 1.6K
Sonal Kumar @realsonalkumar
Our team at @umdcs, in collaboration with @nvidia, @SeoulNatlUni and USTC, is excited to introduce Task 5: Audio Question Answering for the first time at DCASE. The task focuses on advancing audio understanding in LALMs. Explore the details and participate here: dcase.community/challenge2025/…
DCASE Challenge @DCASE_Challenge

📢 DCASE 2025 challenge is now officially launched! 🎉 You'll find more info about the tasks on the challenge website 👇 dcase.community/challenge2025/

Replies 0 · Reposts 2 · Likes 5 · Views 397
Sonal Kumar @realsonalkumar
It’s wonderful to see big names using MMAU! Reported scores on MMAU test-mini look great 😲, and performance on the full test set would give better insight and a more comprehensive evaluation here: eval.ai/web/challenges…
arXiv Sound @ArxivSound

"Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering," Gang Li, Jizhong Liu, Heinrich Dinkel, Yadong Niu, Junbo Zhang, Jian Luan, ift.tt/iFSunDw

Replies 0 · Reposts 1 · Likes 4 · Views 282
Sonal Kumar reposted
steven @Tu7uruu
🦩 NVIDIA just released Audio Flamingo 2, an audio model that understands non-speech sounds, non-verbal speech, and music, achieving state-of-the-art performance across over 20 benchmarks with only 3 billion parameters.
> Excels in tasks like temporal reasoning, attribute identification, and contextual sound event analysis.
> Capable of comprehending audio segments up to 5 minutes in length, enabling deeper analysis of extended content.
> Outperforms larger proprietary models despite its smaller size, having been trained exclusively on public datasets.
> Introduces AudioSkills for expert audio reasoning and LongAudio for long audio understanding, advancing the field of audio-language modeling.
Replies 10 · Reposts 101 · Likes 672 · Views 44.4K