Sreyan Ghosh

406 posts

@SreyanG

Research @ Google DeepMind | Audio Intelligence

College Park, MD · Joined June 2014

318 Following · 340 Followers

Pinned Tweet

Sreyan Ghosh@SreyanG·15 Jul

We at @nvidia and @gammaumd are excited to release Audio Flamingo 3, the most powerful, open, and capable large audio-language model to date! Paper: arxiv.org/abs/2507.08128 Open-source model, code, and data: research.nvidia.com/labs/adlr/AF3/ Try it out here: huggingface.co/spaces/nvidia/…

1.6K

Sreyan Ghosh retweeted

SALMA Workshop Chairs@SALMAworkshop26·2d

📢 Call for Papers is out - salma-workshop.github.io/salma-2026/ We invite submissions to SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026.🎙️ 🗓️ Direct Submission: July 27, 2026 ARR Commitment: August 26, 2026 Please spread the word and consider submitting!

1.2K

Sreyan Ghosh retweeted

SALMA Workshop Chairs@SALMAworkshop26·2d

Excited to announce SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026 in Budapest, Hungary! 🎙️

589

Sreyan Ghosh retweeted

Huck Yang@huckiyang·5d

No more endless chasing AudioLM updates! 🏹🔊 Speech-Hands Agent has been accepted to @aclmeeting for oral & open-sourced, with a 5.69% WER as Voice-Agent and 77.37% Acc. on @DCASE_Challenge 25 AudioQA dev-set. Check our demo and open-claw PR 🦞arxiv.org/pdf/2601.09413 a joint work with Zhen Wan @ZhehuaiC @leoyerrrr @goelarushi27 @MXzBFhjFpS1jyMI @shinjiw_at_cmu @SreyanG @RafaelValleArt @RHachiuma @hirota_yusuke et al.

446

Sreyan Ghosh retweeted

Pratyush Kumar@pratykumar·19 May

Speaking tomorrow at Stanford about the opportunity to build deep tech in India. If you want to train models, build products, create population scale impact, or are just curious what we are up to then RSVP and show up - sarvam.ai/events/stanfor…

842

107.6K

Sreyan Ghosh retweeted

Pushmeet Kohli@pushmeet·8 May

The future of Math is mathematicians and AI agents working together. Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics. Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results. In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.

172

372

2.6K

312.5K

Sreyan Ghosh retweeted

IEEE ICASSP@ieeeICASSP·6 May

Dr. Tara Sainath, Distinguished Research Scientist, Google DeepMind, presents the first Industry Keynote: “Audio Processing with Large Language Models”

1.6K

Sreyan Ghosh retweeted

Google DeepMind@GoogleDeepMind·6 May

We’re partnering with the developers of @EveOnline to explore the next frontier of AI research in games. EVE's complex, player-driven universe is the perfect safe sandbox to test agents on memory, continual learning, and long-term planning. Find out more → goo.gle/4epQIdy

128

228

1.9K

210.2K

Sreyan Ghosh retweeted

steven@Tu7uruu·27 Apr

Today we launch smol-audio A collection of notebooks & scripts to build on cutting-edge local audio models ⚡️ Already in the cookbook: > Fine-tune Whisper / Parakeet / Voxtral / Granite Speech > Fine-tune Audio Flamingo 3 (full + LoRA) > Dialogue TTS with Dia-1.6B > Zero-shot

579

50.9K

Sreyan Ghosh retweeted

DailyPapers@HuggingPapers·29 Apr

NVIDIA just released Nemotron 3 Nano Omni on Hugging Face A 30B parameter multimodal foundation model with native text, image, video, and audio understanding.

2.6K

Sreyan Ghosh retweeted

OpenRouter@OpenRouter·28 Apr

NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning.

207

13.6K

Sreyan Ghosh retweeted

Unsloth AI@UnslothAI·28 Apr

NVIDIA releases Nemotron-3-Nano-Omni, a new 30B open multimodal MoE model. Nemotron-3-Nano-Omni-30B-A3B is the strongest omni model for its size and supports audio, video, image and text. Run on ~25GB RAM. GGUF: huggingface.co/unsloth/NVIDIA… Guide: unsloth.ai/docs/models/ne…

135

946

70.2K

Sreyan Ghosh retweeted

vLLM@vllm_project·28 Apr

🎉 Congrats to @NVIDIAAI on Nemotron 3 Nano Omni — a 30B hybrid Transformer-Mamba MoE (3B active) that unifies vision, audio, video, and text in a single reasoning loop. 256K context, FP8 / NVFP4 quantization, open weights. Day-0 support in vLLM — tool calling, reasoning, and efficient video sampling for long-video workloads, verified on NVIDIA GPUs. 🔗 vllm.ai/blog/nemotron-… 🔗 recipes.vllm.ai/nvidia/Nemotro…

NVIDIA AI@NVIDIAAI

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇

410

33.6K

Sreyan Ghosh retweeted

NVIDIA@nvidia·28 Apr

x.com/i/article/2049…

123

750

110.2K

Sreyan Ghosh retweeted

Bryan Catanzaro@ctnzr·28 Apr

Today we're releasing Nemotron 3 Nano Omni. Audio, Video, Image, Text ➡️ Text Ask questions about all your data. Amazing efficiency powered by the Nemotron Hybrid SSM MoE architecture. State of the art multimodal intelligence.

352

26K

Sreyan Ghosh retweeted

Piotr Żelasko@PiotrZelasko·28 Apr

Today we released Nemotron-3-Nano-Omni-30B-A3B - our first Omni model, with speech and audio understanding capabilities powered by parakeet-tdt-0.6b-v2 encoder. 🫡1st position on VoiceBench 🌏English only 🎙️5.95% WER on Open ASR Leaderboard 📽️Video+audio understanding

503

28.6K

Sreyan Ghosh@SreyanG·27 Apr

RT @huckiyang: Grateful to share four LM Post-training papers on Multimodal / Audio / Time-Series presented at @iclr_conf 2026, all open-s…

Sreyan Ghosh retweeted

Google DeepMind@GoogleDeepMind·23 Apr

This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres. 🧵

171

1.2K

165.5K

Sreyan Ghosh retweeted

arXiv Sound@ArxivSound·21 Apr

Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha, "Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation," arxiv.org/abs/2604.17656

268

Sreyan Ghosh retweeted

Google@Google·22 Apr

We’re introducing our eighth generation of TPUs. This time, we’re taking a dual chip approach: TPU 8t, optimized for training, and TPU 8i, optimized for inference. 💪TPU 8t achieves nearly three times the compute performance per pod over our previous generation, Ironwood. ⚡TPU

333

378

2.6K

255.4K

Sreyan Ghosh retweeted

GAMMA UMD@gammaumd·22 Apr

Congratulations to Sreyan Ghosh @SreyanG on his PhD defense at UMD! 🎉 His dissertation, “Advancing Audio Processing in the Age of Large Language Models,” pushes forward audio, music, and long-form multimodal understanding through open models, large-scale data, new benchmarks, and temporally grounded reasoning. Really exciting work with broad impact on audio-language and audio-visual AI. Huge congratulations, Sreyan!

2.3K
