RJ Skerry-Ryan

1.1K posts

@rustyryan

🌮🤖 Speech and language modeling researcher. Principal SWE @ Google Deepmind. ♊🌊 Gemini Audio and Astra core team.

California, USA · Joined July 2007
1.3K Following · 1.1K Followers
Pinned Tweet
RJ Skerry-Ryan
RJ Skerry-Ryan@rustyryan·
My team in GDM Frontier AI is hiring (Mountain View). If you're a researcher interested in full-duplex modeling, multimodal LLMs (Gemini), modality gap, joint speech/text modeling, PGMs, streamable generative models, and representation learning for language modeling -- DM me!
English
1
11
105
17.6K
rohan anil
rohan anil@_arohan_·
There is no pre-training, post-training, or test-time training. There are only priors, updates, constraints, and compute budgets. There is only TRAINING. Last several years we shipped the org chart to fundamental optimization science.
English
22
35
539
66.7K
Skyzar🎴
Skyzar🎴@skyzarr_·
Well, I finished Alan Wake 2 and it was one of the greatest games I've ever played. One of the rare ones that fully exploits the possibilities of the medium, wrapped in an exceptional atmosphere and level design. It is to video games what Twin Peaks is to television.
French
8
8
84
3.2K
RJ Skerry-Ryan retweeted
IEEE ICASSP
IEEE ICASSP@ieeeICASSP·
Dr. Tara Sainath, Distinguished Research Scientist, Google DeepMind, presents the first Industry Keynote: “Audio Processing with Large Language Models”
English
1
5
28
1.5K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Google is the best company in the world
English
151
135
2.8K
244.4K
RJ Skerry-Ryan retweeted
Raphael Pisoni
Raphael Pisoni@ml_4rtemi5·
I never really considered how dangerous QK-norm actually is before working on RBF Attention. While solving some obvious issues, it can be the cause of some much less obvious ones.🧵
English
2
27
268
40.9K
RJ Skerry-Ryan retweeted
Demis Hassabis
Demis Hassabis@demishassabis·
Our most expressive and steerable TTS model yet! Designed to give builders granular control over AI-generated speech, Gemini 3.1 Flash TTS is really fun to play with! Available in preview today - for devs via the Gemini API & @GoogleAIStudio + for enterprises on Vertex AI
Logan Kilpatrick@OfficialLoganK

Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!

English
80
136
1.5K
141.6K
RJ Skerry-Ryan retweeted
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Google’s new Gemini 3.1 Flash TTS ranks #2 on the Artificial Analysis Speech Arena Leaderboard, ahead of ElevenLabs’ Eleven v3 and only behind Inworld TTS 1.5 Max
Gemini 3.1 Flash TTS represents a significant step forward for Google from previous TTS models, with notably increased naturalness of speech samples. The model now ranks just 4 Elo points behind the leading model on the Speech Arena, the tightest margin at the top of the leaderboard.
Key takeaways:
➤ Quality: Gemini 3.1 Flash TTS has an Elo of 1,211 based on over 1.7k arena appearances, placing it just 4 points behind the leading model (Inworld TTS 1.5 Max at 1,215) and 32 points ahead of Eleven v3 at 1,179
➤ Pricing: Model's Standard pricing is $36.6/1M characters, 3.7x more expensive than Inworld TTS 1.5 Max ($10/1M chars) but 4.7x cheaper than Eleven v3 ($172/1M chars). Expect to be lower for Batch pricing
➤ Speed: Model generation speed is 27.4 characters per second, compared to 138 chars/s for Inworld TTS 1.5 Max and 38.8 chars/s for Eleven v3
➤ Prompting: Features the ability to generate voices based on text prompting. Google's prompting strategy guide includes elements such as character persona, scene, style, pacing, and accent
See more details and listen to samples below ⬇️
English
8
25
355
29.4K
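The figures quoted in the Artificial Analysis post can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the leaderboard's ratings can be read with the standard Elo expected-score formula (base 10, scale 400), which the post does not state, and the function and variable names are hypothetical.

```python
# Ratings and prices copied from the post; everything else is an assumption.
ELO = {"Inworld TTS 1.5 Max": 1215, "Gemini 3.1 Flash TTS": 1211, "Eleven v3": 1179}
USD_PER_1M_CHARS = {"Inworld TTS 1.5 Max": 10.0, "Gemini 3.1 Flash TTS": 36.6, "Eleven v3": 172.0}


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (assumed here)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Rating gaps quoted in the post: 4 points to the leader, 32 points over Eleven v3.
gap_to_leader = ELO["Inworld TTS 1.5 Max"] - ELO["Gemini 3.1 Flash TTS"]
gap_over_eleven = ELO["Gemini 3.1 Flash TTS"] - ELO["Eleven v3"]

# What those gaps would mean as head-to-head preference rates under the assumed model.
leader_vs_gemini = elo_expected_score(ELO["Inworld TTS 1.5 Max"], ELO["Gemini 3.1 Flash TTS"])
gemini_vs_eleven = elo_expected_score(ELO["Gemini 3.1 Flash TTS"], ELO["Eleven v3"])

# Price ratios quoted in the post: 3.7x Inworld's price, 4.7x cheaper than Eleven v3.
gemini_over_inworld = USD_PER_1M_CHARS["Gemini 3.1 Flash TTS"] / USD_PER_1M_CHARS["Inworld TTS 1.5 Max"]
eleven_over_gemini = USD_PER_1M_CHARS["Eleven v3"] / USD_PER_1M_CHARS["Gemini 3.1 Flash TTS"]

print(f"Elo gaps: {gap_to_leader}, {gap_over_eleven}")
print(f"Expected scores: {leader_vs_gemini:.3f}, {gemini_vs_eleven:.3f}")
print(f"Price ratios: {gemini_over_inworld:.1f}x, {eleven_over_gemini:.1f}x")
```

Under that assumption, a 4-point gap corresponds to a head-to-head preference only slightly above 50%, which is consistent with the post calling it a tight margin.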
RJ Skerry-Ryan retweeted
Google AI
Google AI@GoogleAI·
Today we launched Gemini 3.1 Flash TTS, our most expressive and controllable text-to-speech model yet. This launch [excitement] includes audio tags! 🗣🏷 Audio tags [explanatory] are a seamless way to guide vocal style, pace, and delivery using natural language commands embedded directly in your text. Want a different tempo or tone? [amazement] Just tag the audio to steer the AI-speech output! The model supports 70+ languages (24 of which are high-quality evaluated languages, including: Japanese, Hindi, and Arabic). Watch the audio tags in action in the demo below ↓
English
117
308
2.3K
200K
RJ Skerry-Ryan retweeted
Sundar Pichai
Sundar Pichai@sundarpichai·
Introducing Gemini on Mac. It’s the first time we’re bringing the @Geminiapp to desktop. The team built this initial release with @Antigravity, and it went from an idea to a native Swift app prototype in a few days. More features on the way!
English
530
853
11.5K
888.7K
RJ Skerry-Ryan retweeted
Vilobh Meshram
Vilobh Meshram@vilobhmm·
Excited and thrilled to launch Gemini 3.1 Flash TTS 🚀🚀🚀 the latest text-to-speech model that delivers improved controllability, expressivity and quality with an impressive 1211 Quality ELO. Blog : blog.google/innovation-and….
English
0
4
11
1.1K
RJ Skerry-Ryan retweeted
John Carmack
John Carmack@ID_AA_Carmack·
A Canticle For Leibowitz is a classic early (1959) post-apocalypse novel where an order of monks preserved the last remnants of learning (the memorabilia) after a nuclear exchange turned the remains of society into book and scientist burners.
I first read it in the 80s as a mass market paperback that I somehow lost along the way. Other paperbacks from that time are yellow with age and getting brittle, but still readable.
I read it again in the late 2000s on a first edition Kindle. I eventually migrated to iPads for Kindle reading, but every couple years I would come across an old Kindle in a drawer, charge it up, and check out what I had been reading on it. They eventually stopped working entirely.
I’m just finishing reading a new Folio Society edition, printed on heavy, acid-free archival quality paper. If it doesn’t get soaked or burned, it could still be in good shape for centuries.
The ephemeral nature of digital storage does give me some pause. We can still read Sumerian tablets full of administrative trivia from four thousand years ago, but there are no known copies of some important software products from just fifty years ago.
I am a proud supporter of the Internet Archive!
English
162
434
3.7K
229.4K
vik
vik@vikhyatk·
@giffmana completely as in, to avoid cuda malloc in the forward pass during inference
English
2
0
21
5.9K
vik
vik@vikhyatk·
i have completely abandoned functional programming. everything is stateful and has side effects
English
55
11
1.1K
87.2K
RJ Skerry-Ryan retweeted
Demis Hassabis
Demis Hassabis@demishassabis·
@Steve_Yegge Maybe tell your buddy to do some actual work and to stop spreading absolute nonsense. This post is completely false and just pure clickbait.
English
299
456
13K
848.3K
rohan anil
rohan anil@_arohan_·
Don’t skip your determinism and numerics days. Interleave kernel days.
English
3
0
61
5.3K
Steve Yegge
Steve Yegge@Steve_Yegge·
I was chatting with my buddy at Google, who's been a tech director there for about 20 years, about their AI adoption. Craziest convo I've had all year.
The TL;DR is that Google engineering appears to have the same AI adoption footprint as John Deere, the tractor company.
Most of the industry has the same internal adoption curve: 20% agentic power users, 20% outright refusers, 60% still using Cursor or equivalent chat tool. It turns out Google has this curve too.
But why is Google so... average? How is it that a handful of companies are taking off like a spaceship, and the rest, including Google, are mired in inaction?
My buddy's observation was key here: There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org.
He says the problem is that they can't use Claude Code because it's the enemy, and Gemini has never been good enough to capture people's workflows like Claude has, so basically agentic coding just never really took off inside Google. They're all just plodding along, completely oblivious to what's happening out there right now.
Not only is Google not able to do anything about it, they don't seem to be aware of the problem at all. I'm having major flashbacks to fifty years ago as a kid at the La Brea Tar Pits, asking, "why can't they just climb out?"
My Google friend and I had this conversation over a month ago. I didn't share it because I wanted to look around a bit, and see if it's really as bad as all that. I've been talking to people from dozens of companies since then. And yeah. It's as bad as all that.
Google is about average. Some companies at the bottom have near-zero AI adoption and can't even get budget for AI. They may have moats and high walls, but the horde is coming for them all the same.
And then there are a few companies I've met recently who are *amazingly* leaned in to AI adoption. One category-leader company just cancelled IntelliJ for a thousand engineers. That's an incredibly bold move, one of many they're making towards agentic adoption. In my opinion, that company is setting themselves up for a _huge_ W.
As for the rest, well, it's the Great Siloing. Everyone's flying blind. With nobody moving companies, no company knows where they stand on the AI adoption curve. Nobody knows how they're doing compared to everyone else. Half of them just check a box: "We enabled {Copilot/Cursor} for everyone!" Cue smug celebrations. They think this is like getting SOC2 compliance, just a thing they turn on and now it's "solved." And they don't realize that they've done effectively nothing at all.
All because of a hiring freeze.
English
537
471
5.4K
2.8M
RJ Skerry-Ryan retweeted
Nat McAleese
Nat McAleese@__nmca__·
A full-scale US Waymo rollout would cost ~700 full-time jobs in the funeral care industry (by saving around 35 thousand young American lives per year). Will no one think of (some of) the morticians!
English
67
534
8.2K
238.8K