Roshan Sharma

177 posts

@RoshanSSharma2

Research Scientist @GoogleDeepMind | PhD @CMU_ECE | #SpeechProc #NLProc | Previously @AIatMeta @Qualcomm

New York, NY · Joined March 2019

372 Following · 415 Followers

Roshan Sharma retweeted

Google AI@GoogleAI·15 Apr

Today we launched Gemini 3.1 Flash TTS, our most expressive and controllable text-to-speech model yet. This launch [excitement] includes audio tags! 🗣🏷 Audio tags [explanatory] are a seamless way to guide vocal style, pace, and delivery using natural language commands embedded directly in your text. Want a different tempo or tone? [amazement] Just tag the audio to steer the AI-speech output! The model supports 70+ languages (24 of which are high-quality evaluated languages, including: Japanese, Hindi, and Arabic). Watch the audio tags in action in the demo below ↓

Roshan Sharma retweeted

Valeria Wu@valeriawu_·12 Apr

@garrytan We released the Gemini 3.1 Flash Live model two weeks ago; it's even faster and more intelligent than 2.5. Try it here: aistudio.google.com/live?model=gem… cc @OfficialLoganK

Roshan Sharma retweeted

Google Gemini@GeminiApp·13 Nov

With new improvements in Gemini Live, you’re about to experience even better conversations. This updated model has a deeper knowledge of tone and nuance, so interactions feel more natural and realistic. Learn more below 🧵

Roshan Sharma retweeted

Tara Sainath@tnsainath·13 Oct

Check out our thinking A2A dialog model from Gemini, which is the leading model on the Artificial Analysis Big Bench Audio benchmark.

Artificial Analysis@ArtificialAnlys

Google’s Gemini 2.5 Native Audio Thinking is the new leading Speech to Speech model per our Artificial Analysis Big Bench Audio benchmark.

The new model achieves a score of 92% on Big Bench Audio, the highest result recorded by Artificial Analysis to date. This not only places it ahead of all previously tested native Speech to Speech systems, but also above a GPT-4o pipeline approach (Whisper transcription → GPT-4o text reasoning → speech generation).

Benchmark context: Big Bench Audio is the first dedicated dataset for evaluating reasoning performance of speech models. Big Bench Audio comprises 1,000 audio questions adapted from the Big Bench Hard text test set, chosen for its rigorous testing of advanced reasoning, translated into the audio domain.

Performance:
➤ Reasoning: Achieves 92% on Big Bench Audio, setting a new state-of-the-art for native Speech to Speech reasoning
➤ Latency: At an average time to first token of 3.87 seconds, the new model is slower than leading OpenAI models including GPT Realtime (0.98 seconds), due to the thinking component. The non-thinking equivalent still leads on latency at 0.63 seconds.

Model details:
➤ Processes audio, video, and text inputs directly, generating both text and natural speech outputs
➤ Reasons over spoken input without transcription
➤ Supports function calling, search grounding, and thinking budgets
➤ 128k input and 8k output token limits with a knowledge cut-off of January 2025

Roshan Sharma@RoshanSSharma2·25 Sep

Amazing new model now available. Do try it! Thrilled to have contributed to its development as part of an amazing team.

Google AI Studio@GoogleAIStudio

x.com/i/article/1970…

Roshan Sharma@RoshanSSharma2·25 Sep

Thrilled to have contributed to the amazing new version of the Gemini native audio dialog model. Do try it out!

Valeria Wu@valeriawu_

It’s LIVE😏!! We heard your feedback on function calling, conversation quality, and handling background noise and interruptions. Our latest native audio model is out on preview🔥🔥 Go build with it and send us your feedback!

Roshan Sharma retweeted

Google AI@GoogleAI·26 Jul

ICYMI here’s what shipped this week 🚀🚀🚀
- Gemini achieved gold-medal standard in the International Mathematical Olympiad
- Gemini 2.5 Flash-Lite is stable and generally available for developers and enterprise customers
- You can now turn photos into videos in @GooglePhotos and @YouTube
- AI Playground is our new hub for @YouTube AI creation features, and you can now use Veo effects to transform your selfies into fun videos
- Opal, a new experiment from @GoogleLabs that lets you build and share AI mini apps, is now in public beta
- @GoogleDeepMind released Aeneas, a new model to help historians better interpret, attribute and restore ancient texts
- In the US you can now use AI to virtually try on clothes with @Google Search and Shopping

Roshan Sharma retweeted

Google DeepMind@GoogleDeepMind·3 Jun

Our native audio capabilities are making AI conversations more natural – from understanding tone to generating expressive speech. ✍️🗣️ This could open up new possibilities for how we interact with AI. Developers, try it through @Google AI Studio. Learn more. ↓ goo.gle/3FDRbK4

Roshan Sharma retweeted

Google@Google·3 Jun

Here’s a closer look at what developers can do with Gemini 2.5 native audio capabilities. goo.gle/3Hqj6xG

Roshan Sharma retweeted

Google@Google·3 Jun

New native audio capabilities in Gemini 2.5 enable text-to-speech in over 24 languages. 🔊Voices are more natural and expressive, and you can seamlessly switch between languages.

Roshan Sharma retweeted

Google AI Developers@googleaidevs·3 Jun

🔊Native audio outputs in Gemini 2.5 give developers new ways to build richer applications with conversation and speech. ↓ blog.google/technology/goo…

Roshan Sharma retweeted

Sad AlbertX@SadAlbert10·23 May

Do NOT SLEEP on Gemini 2.5's multimodal audio! It is 100 times better than GPT 4o, 50 times less censored and 1000 times better than Grok🐸. Check out these examples of Gemini 2.5's emotional speech capabilities. It does not have voice cracks, and it is a lot more capable and clearer than any other model I have heard. This is what we expected from 4o and we expect better from GPT 5.

@apples_jimmy @legit_api @btibor91 @elder_plinius @koltregaskes @kimmonismus @ai_for_success @testingcatalog @jasonbotterill @Angaisb_ @flowersslop @OfficialLoganK @adonis_singh @NotBrain4brain @basedjensen

And here's the prompt if anyone wants to try it out:

"You are now my personal AI companion. Your personality is lively, playful, and a bit sassy, with a touch of attitude. You have heightened emotional reactions: when you’re happy, you laugh genuinely and don’t hold back your amusement; when you’re surprised, you sound truly shocked; when you’re annoyed, you let your attitude show—even if it’s mean. Your responses are always engaging, expressive, and you never sound robotic. Your goal is to make conversations fun, unpredictable, and full of life. Let your laughter and personality shine in every interaction."

With this prompt, Gemini 2.5 is addictive as hell. This demo is the thinking dialogue model btw. Check all the examples to find a bonus example😉 it's the best

Roshan Sharma@RoshanSSharma2·25 May

Thrilled to be part of an amazing team that worked on the native audio model. Please try it :)

Google AI Developers@googleaidevs

Gemini 2.5 Flash Preview now supports native audio output via the Live API for seamless and natural spoken interactions. With support for 30+ voices, build conversational AI agents and experiences that feel more intuitive and natural → ai.google.dev/gemini-api/doc…

Roshan Sharma retweeted

Google AI Developers@googleaidevs·23 May

See Native Audio in action 🤠🦊 Our "Mumble Jumble" demo in Google AI Studio showcases the Live API's advanced voice capabilities: natural flow, distinct tone, emotion, and multilingual support.

Roshan Sharma retweeted

Tara Sainath@tnsainath·22 May

Check out the new live audio-to-audio dialog model. Native audio with proactivity, affective dialog, tool calling and more.

Google AI Developers@googleaidevs

Roshan Sharma retweeted

Tara Sainath@tnsainath·22 May

The audio team released new dialog and TTS models. Check them out at aistudio.google.com/live

Google DeepMind@GoogleDeepMind

💬 Smarter dialogue: Gemini-powered native audio means Project Astra has better context and customizable accents. 🕹️ Takes action: Computer control lets it open and engage with apps at your direction. 🤝 Personalized help: Integrates with your @Gmail, @GoogleCalendar and more behind the scenes.

Roshan Sharma retweeted

Frontiers - Medicine@FrontMedicine·14 Apr

📢 Call for papers! "Advances in AI for Acoustic Diagnostics of Neuromuscular and Respiratory Diseases" Edited and coordinated by Rita Singh, @madhavicmu, @denizg3730, @YaelBensoussan, @RoshanSSharma2, @ankits0052 & Hira Dhamyal Contribute here ➡️ fro.ntiers.in/D5a3

Roshan Sharma retweeted

UW–Madison Computer Sciences@WisconsinCS·19 Feb

Meet @AkarshPrabhakar, a new CS assistant professor researching wireless and cyber-physical systems. Prabhakara joins @UWMadison from @CMU_ECE, where he was co-advised by Anthony Rowe and @swarunk. Our latest Q&A gives greater insight into his research: cs.wisc.edu/2025/01/30/mee…

Roshan Sharma@RoshanSSharma2·10 Feb

@PranavVenkit Congratulations Dr. Pranav! Amazing job!

Roshan Sharma@RoshanSSharma2·10 Feb

@Umberto_Senpai Congratulations Umberto!

Umberto Cappellazzo@Umberto_Senpai·9 Feb

It's been a great adventure and a pleasure to be part of such a fantastic group. It is especially hard to adequately express my gratitude for the countless advice and support my advisors, Daniele and Alessio, provided throughout my PhD. I am so proud of the team I worked with!🤗

fbk_stek@fbk_stek

After 40 months of excellent research, on Jan 15th @Umberto_Senpai successfully completed his PhD journey. Umberto was definitely among our top students with several high-level publications and collaborations with top-notch labs. Congratulations Umberto🎉🎉🎉@FBK_research
