Satvik Dixit

26 posts

@SatvikDixit9

Audio understanding and generation | Prev @CarnegieMellon @IITDelhi

SF · Joined March 2021
1.1K Following · 148 Followers
Satvik Dixit retweeted
Thinking Machines @thinkymachines
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
418 replies · 1.8K reposts · 14.5K likes · 6.7M views
Satvik Dixit retweeted
arXiv Sound @ArxivSound
Satvik Dixit, Koichi Saito, Zhi Zhong, Yuki Mitsufuji, Chris Donahue, "FoleyBench: A Benchmark For Video-to-Audio Models," arxiv.org/abs/2511.13219
0 replies · 2 reposts · 7 likes · 752 views
Satvik Dixit @SatvikDixit9
Excited to be at WASPAA 2025!
0 replies · 0 reposts · 5 likes · 458 views
Satvik Dixit retweeted
Albert Gu @_albertgu
I converted one of my favorite talks I've given over the past year into a blog post: "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit).
In a few days, we'll release what I believe is the next major advance for architectures.
27 replies · 116 reposts · 782 likes · 119.4K views
Satvik Dixit retweeted
Neil Zeghidour @neilzegh
Thanks @GoogleAI 🙏, I'm proud to see concepts introduced in this paper (RVQ-VAE, quantizer dropout) being still as relevant four years later, and in particular how the RVQ turned out to be a perfect fit for audio language models.
Google AI @GoogleAI

Congratulations to Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi for winning the IEEE Best Paper Award for "SoundStream: An End-to-End Natural Audio Codec"! arxiv.org/abs/2107.03312 #SPSAwards #IEEEAwards

3 replies · 13 reposts · 183 likes · 12.7K views
Satvik Dixit @SatvikDixit9
Excited to be at #ICASSP2025! If you’re working on or interested in audio language models, feel free to reach out. Also, come by our poster on audio caption evaluation at the SALMA Workshop tomorrow at 4 PM.
1 reply · 0 reposts · 6 likes · 251 views
Satvik Dixit retweeted
Neil Zeghidour @neilzegh
Trimodal training (text-audio-img) is challenging because you have a lot of unimodal data, some bimodal, and few to none with all 3 modalities, and combining them is not obvious. We propose a simple extension to Moshi that allows it to understand images.
kyutai @kyutai_labs

Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images! It sees, understands, and talks about images, naturally and out loud. Voice interaction with a compact model endowed with visual understanding opens up new applications, from audio description for the visually impaired to visual access to information. Try it out 👉 vis.moshi.chat Blog post 👉 kyutai.org/moshivis

4 replies · 5 reposts · 43 likes · 2.7K views
Satvik Dixit retweeted
𝚐𝔪𝟾𝚡𝚡𝟾
Mellow: a small audio language model for reasoning
2 replies · 22 reposts · 100 likes · 8.2K views
Satvik Dixit retweeted
arXiv Sound @ArxivSound
"Mellow: a small audio language model for reasoning," Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj, ift.tt/BFgXl2L
0 replies · 5 reposts · 25 likes · 3.4K views
Satvik Dixit retweeted
Satvik Dixit @SatvikDixit9
Key highlights:
1. GPT-4o outperforms human experts on VSC
2. Few-shot learning improves performance significantly
3. Applications in using VLMs for audio-based tasks, like audio caption augmentation
1 reply · 0 reposts · 1 like · 149 views
Satvik Dixit @SatvikDixit9
Come check out our poster at the NeurIPS Audio Imagination Workshop on "Vision Language Models Are Few-Shot Audio Spectrogram Classifiers". Unfortunately, I couldn't travel to present, but look for our poster in poster presentation session 2 at 4:15 PM Saturday.
1 reply · 0 reposts · 5 likes · 263 views