Sreyan Ghosh

406 posts

@SreyanG

Research @ Google DeepMind | Audio Intelligence

College Park, MD · Joined June 2014
318 Following · 340 Followers
Sreyan Ghosh retweeted
SALMA Workshop Chairs (@SALMAworkshop26)
📢 Call for Papers is out: salma-workshop.github.io/salma-2026/
We invite submissions to SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026. 🎙️
🗓️ Direct Submission: July 27, 2026
ARR Commitment: August 26, 2026
Please spread the word and consider submitting!
Sreyan Ghosh retweeted
SALMA Workshop Chairs (@SALMAworkshop26)
Excited to announce SALMA 2026: Speech and Audio Language Models Workshop, co-located with EMNLP 2026 in Budapest, Hungary! 🎙️
Sreyan Ghosh retweeted
Pratyush Kumar (@pratykumar)
Speaking tomorrow at Stanford about the opportunity to build deep tech in India. If you want to train models, build products, create population-scale impact, or are just curious what we are up to, then RSVP and show up: sarvam.ai/events/stanfor…
Sreyan Ghosh retweeted
Pushmeet Kohli (@pushmeet)
The future of Math is mathematicians and AI agents working together. Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics. Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results. In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.
Sreyan Ghosh retweeted
IEEE ICASSP (@ieeeICASSP)
Dr. Tara Sainath, Distinguished Research Scientist, Google DeepMind, presents the first Industry Keynote: “Audio Processing with Large Language Models”
Sreyan Ghosh retweeted
Google DeepMind (@GoogleDeepMind)
We’re partnering with the developers of @EveOnline to explore the next frontier of AI research in games. EVE's complex, player-driven universe is the perfect safe sandbox to test agents on memory, continual learning, and long-term planning. Find out more → goo.gle/4epQIdy
Sreyan Ghosh retweeted
steven (@Tu7uruu)
Today we launch smol-audio: a collection of notebooks & scripts to build on cutting-edge local audio models ⚡️
Already in the cookbook:
> Fine-tune Whisper / Parakeet / Voxtral / Granite Speech
> Fine-tune Audio Flamingo 3 (full + LoRA)
> Dialogue TTS with Dia-1.6B
> Zero-shot…
Sreyan Ghosh retweeted
DailyPapers (@HuggingPapers)
NVIDIA just released Nemotron 3 Nano Omni on Hugging Face: a 30B-parameter multimodal foundation model with native text, image, video, and audio understanding.
Sreyan Ghosh retweeted
OpenRouter (@OpenRouter)
NVIDIA Nemotron™ 3 Nano Omni is live on OpenRouter. An open 30B-A3B multimodal model for agentic workflows: text, image, video, and audio in → text out, with a 256k context window and efficient MoE architecture for computer use, documents, and AV reasoning.
Sreyan Ghosh retweeted
Unsloth AI (@UnslothAI)
NVIDIA releases Nemotron-3-Nano-Omni, a new 30B open multimodal MoE model. Nemotron-3-Nano-Omni-30B-A3B is the strongest omni model for its size and supports audio, video, image and text. Run on ~25GB RAM.
GGUF: huggingface.co/unsloth/NVIDIA…
Guide: unsloth.ai/docs/models/ne…
Sreyan Ghosh retweeted
vLLM (@vllm_project)
🎉 Congrats to @NVIDIAAI on Nemotron 3 Nano Omni — a 30B hybrid Transformer-Mamba MoE (3B active) that unifies vision, audio, video, and text in a single reasoning loop. 256K context, FP8 / NVFP4 quantization, open weights. Day-0 support in vLLM — tool calling, reasoning, and efficient video sampling for long-video workloads, verified on NVIDIA GPUs. 🔗 vllm.ai/blog/nemotron-… 🔗 recipes.vllm.ai/nvidia/Nemotro…
Quoting NVIDIA AI (@NVIDIAAI):

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest efficiency, open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇

Sreyan Ghosh retweeted
Bryan Catanzaro (@ctnzr)
Today we're releasing Nemotron 3 Nano Omni.
Audio, Video, Image, Text ➡️ Text.
Ask questions about all your data. Amazing efficiency powered by the Nemotron Hybrid SSM MoE architecture. State-of-the-art multimodal intelligence.
Sreyan Ghosh retweeted
Piotr Żelasko (@PiotrZelasko)
Today we released Nemotron-3-Nano-Omni-30B-A3B, our first Omni model, with speech and audio understanding capabilities powered by the parakeet-tdt-0.6b-v2 encoder.
🫡 1st position on VoiceBench
🌏 English only
🎙️ 5.95% WER on Open ASR Leaderboard
📽️ Video+audio understanding
Sreyan Ghosh (@SreyanG)
RT @huckiyang: Grateful to share four LM Post-training papers on Multimodal / Audio / Time-Series presented at @iclr_conf 2026, all open-s…
Sreyan Ghosh retweeted
Google DeepMind (@GoogleDeepMind)
This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres. 🧵
Sreyan Ghosh retweeted
arXiv Sound (@ArxivSound)
Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha, "Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation," arxiv.org/abs/2604.17656
Sreyan Ghosh retweeted
Google (@Google)
We’re introducing our eighth generation of TPUs. This time, we’re taking a dual chip approach: TPU 8t, optimized for training, and TPU 8i, optimized for inference. 💪TPU 8t achieves nearly three times the compute performance per pod over our previous generation, Ironwood. ⚡TPU…
Sreyan Ghosh retweeted
GAMMA UMD (@gammaumd)
Congratulations to Sreyan Ghosh @SreyanG on his PhD defense at UMD! 🎉 His dissertation, “Advancing Audio Processing in the Age of Large Language Models,” pushes forward audio, music, and long-form multimodal understanding through open models, large-scale data, new benchmarks, and temporally grounded reasoning. Really exciting work with broad impact on audio-language and audio-visual AI. Huge congratulations, Sreyan!