Adrià Recasens

4.4K posts

@arecasens

👨‍💻 Research Scientist @DeepMind 👀🔊 Multimodal 🗣️ Views are my own

London, England · Joined July 2007
1.9K Following · 1.5K Followers
Pinned Tweet
Adrià Recasens @arecasens
This April I will be running the @LLHalf in memory of my nephew Guifré and in support of the work that @tommys does for those who suffered baby loss. If you are interested in contributing, here is my fundraising page! llhm.tommys.org/fundraising/ar…
0 replies · 4 retweets · 14 likes · 1.6K views
Adrià Recasens retweeted
Demis Hassabis @demishassabis
Gemini 3.1 Flash Live is our highest quality audio & voice model yet - and a big leap towards building next-gen voice-first agents. Lower latency, better precision, more natural interactions... try it now with Gemini Live in the @GeminiApp or build with it in @GoogleAIStudio!
Google DeepMind @GoogleDeepMind

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵

125 replies · 137 retweets · 1.5K likes · 286.9K views
Oriol Vinyals @OriolVinyalsML
On my way to Barcelona to receive a Doctor Honoris Causa from my alma mater, @la_UPC. Truly honored! 🎓 Join Thursday for my Master Class, "From AI to AGI: The Quest for True Intelligence." Hope to see you there! telecos.upc.edu/ca/esdevenimen… "Create an image at 41.4036° N, 2.1744° E, January 1st, 1983, 15:00 hours."
83 replies · 159 retweets · 1.5K likes · 254.1K views
XIAOYU BIE @BieXiaoyu
@arecasens @GoogleDeepMind Hi Adrià! I’m a postdoc working on generative AI for audio, and this role looks like a perfect fit with my background and interests. Mind if I DM you a couple of quick questions?
1 reply · 0 retweets · 0 likes · 138 views
Adrià Recasens @arecasens
We are shipping! 🚢🚢🚢 Native audio output is now available in the Live API. Very natural interaction with the option of using Google Search or thinking for more refined answers. Try it out in aistudio.google.com/live and let us know what you think!
Google AI Developers @googleaidevs

Gemini 2.5 Flash Preview now supports native audio output via the Live API for seamless and natural spoken interactions. With support for 30+ voices, build conversational AI agents and experiences that feel more intuitive and natural → ai.google.dev/gemini-api/doc…

0 replies · 3 retweets · 11 likes · 1.4K views
Adrià Recasens retweeted
Ankur Bapna @ankurbpn
Happy to see the first feature powered by Gemini native audio outputs ship out to public - especially since it's MASSIVELY multilingual. Lots more coming soon 😉
NotebookLM @NotebookLM

This just in... the @NotebookLM hosts have some rather exciting news they'd like to share with you all:

17 replies · 27 retweets · 330 likes · 20.8K views
Adrià Recasens retweeted
Oriol Vinyals @OriolVinyalsML
Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the @lmarena_ai leaderboard by a drastic 40 ELO margin. Only a handful of model releases have leaped ahead so strongly in ELO. 📈
ELO score differences map directly to win rate: e.g. a 400 ELO difference yields a ~91% win rate. Incredible that since 1.5, just a year ago, we jumped 200 ELO (300 since 1.0).
Here’s a fun example where Gemini 2.5 Pro writes code to create an animated swarm of colorful boids swimming in a rotating hexagon. 💫
Try the model for free today in AI Studio. It’s also available to Gemini Advanced users in @geminiapp. aistudio.google.com/app/prompts/ge…
Blog: goo.gle/4c3NitO
49 replies · 162 retweets · 1.1K likes · 211.7K views
Adrià Recasens retweeted
Alexander Chen @alexanderchen
New Gemini 2.0 modalities will enable entirely new interfaces! ✨ that's why I love this early experimentation my teammate @trudypainter is doing with native audio output in her VoiceCursor prototype. I've been playing with this UI and it really feels like a magical piece of paper coming to life. Excited for native audio to roll out more widely so more people can start experimenting.
Trudy Painter @trudypainter

I’ve been exploring Gemini 2.0’s new native audio output capability, which is available for early testers. I’m a developer at Google Creative Lab, and wanted to share one of my favorite experiments so far called ✨ VoiceCursor (🔊 sound on for video) Unlike traditional TTS, native audio lets you prompt the model with expressive styles, ie “Say this like a disgruntled pirate…” So I made ✨VoiceCursor… it lets you rapidly try different prompts. Just type, highlight your phrase, then hear it spoken in different ways! My code is open-sourced here: github.com/googlecreative… Here’s a thread 🧵

1 reply · 2 retweets · 14 likes · 1.5K views
Adrià Recasens retweeted
Google DeepMind @GoogleDeepMind
Gemini 2.0 Flash Experimental has the ability to produce native audio in a variety of styles and languages - all from scratch. 🗣️ Here’s how this is different to traditional text-to-speech systems ↓ aistudio.google.com/live
78 replies · 250 retweets · 1.4K likes · 125.7K views
Adrià Recasens retweeted
Antoine Yang @AntoineYang2
Gemini 2.0 Flash's video understanding is here 🚀 Think: search in videos via timecodes, extract text from moving camera footage, analyze screen recordings in real-time interactions with native audio out 🔊 Come and try it aistudio.google.com 😀 youtu.be/Mot-JEU26GQ?si…
2 replies · 10 retweets · 82 likes · 8.6K views
Adrià Recasens retweeted
Joost van Amersfoort @joost_v_amersf
A very interesting opportunity to work at the intersection of data and scaling. Paul's insights have been crucial to the success of Gemini 2.0 flash (and 1.5 and 1.0 and... 👀). He makes for an excellent mentor/manager! Come help us push the frontier further 🦾.
Paul Michel @pmichelX

Interested in working on Gemini pre-training? I'm hiring a research scientist to work on pre-training data @GoogleDeepMind in London: boards.greenhouse.io/deepmind/jobs/… I am unfortunately not at #NeurIPS2024 but feel free to reach out to ask questions or see the team at the booth there!

0 replies · 4 retweets · 21 likes · 2.8K views
Adrià Recasens retweeted
Alexander Chen @alexanderchen
"Say this in a whisper ..." 💬 Native audio output was definitely my favorite Gemini 2.0 demo to make. Being able to steer the voice so expressively with just prompts felt totally new. 🙂 x.com/googleaidevs/s…
3 replies · 12 retweets · 33 likes · 5.8K views
Adrià Recasens retweeted
Logan Kilpatrick @OfficialLoganK
Gemini 2.0 Flash comes with native audio output, and it’s actually wild 🤯 we are working hard to roll this out quickly to more folks!
144 replies · 196 retweets · 1.9K likes · 164.6K views