
Adrià Recasens

@arecasens
👨💻 Research Scientist @DeepMind 👀🔊 Multimodal 🗣️ Views are my own


Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵






See Native Audio in action 🤠🦊 Our "Mumble Jumble" demo in Google AI Studio showcases the Live API's advanced voice capabilities: natural flow, distinct tone, emotion, and multilingual support.

Gemini 2.5 Flash Preview now supports native audio output via the Live API for seamless and natural spoken interactions. With support for 30+ voices, build conversational AI agents and experiences that feel more intuitive and natural → ai.google.dev/gemini-api/doc…

This just in... the @NotebookLM hosts have some rather exciting news they'd like to share with you all:



🎉 All the winners of the Premios Paréntesis 2024! 🏆 👤 Person of the Year: Sam Altman. Spaniard of the Year: Mateo Valero, director of the BSC. 🚀 Best Entrepreneur: Anna Giralt, of Artefacto. Catalan of the Year: Adrià Recasens Continente. 📩 Details: bsniu.r.ag.d.sendibm3.com/mk/mr/sh/OycXx…

I’ve been exploring Gemini 2.0’s new native audio output capability, which is available for early testers. I’m a developer at Google Creative Lab, and wanted to share one of my favorite experiments so far, called ✨ VoiceCursor (🔊 sound on for video). Unlike traditional TTS, native audio lets you prompt the model with expressive styles, e.g. “Say this like a disgruntled pirate…” So I made ✨VoiceCursor… it lets you rapidly try different prompts. Just type, highlight your phrase, then hear it spoken in different ways! My code is open-sourced here: github.com/googlecreative… Here’s a thread 🧵


Prompt: "Bear writing the solution to 2x-1=0. But only the solution!"



Interested in working on Gemini pre-training? I'm hiring a research scientist to work on pre-training data @GoogleDeepMind in London: boards.greenhouse.io/deepmind/jobs/… I am unfortunately not at #NeurIPS2024 but feel free to reach out to ask questions or see the team at the booth there!

Gemini 2.0 Flash comes with native audio output, and it’s actually wild 🤯 We are working hard to roll this out quickly to more folks!


