Ankur Bapna
@ankurbpn
547 posts
Conversational Audio models @Meta. Previously Gemini Native Audio @GoogleDeepmind.
Joined February 2014
656 Following · 1.1K Followers
Ankur Bapna reposted
rohan anil @_arohan_
Two truths and a lie: Adam is unmatched at optimizing frontier-scale models. I'm starting a new lab to accelerate frontier AI and am dedicating the next phase of my life to making it successful! Deep learning has new axes for compute scaling.
17 replies · 7 reposts · 207 likes · 18.1K views
Ankur Bapna reposted
Jeff Dean @JeffDean
We've been working on the Waxal dataset project since 2021, aiming to increase the amount of data available for African languages. This public speech dataset initially covers 27 Sub-Saharan African languages spoken by over 100 million speakers across more than 26 countries. 🌍
71 replies · 274 reposts · 1.6K likes · 193.2K views
Ankur Bapna reposted
Lewis Tunstall @_lewtun
We trained a tiny 4B model to reason for millions of tokens through IMO-level problems. Heaps excited to share our new blog post covering the full pipeline, from distilling the 🐳 to augmenting RL with a reasoning cache that unlocks extreme inference-time scaling for theorem proving. huggingface.co/spaces/lm-prov…
[image]
24 replies · 128 reposts · 824 likes · 160.5K views
Ankur Bapna reposted
Shashwat Goel @ShashwatGoel7
New blogpost: How to game the METR plot 🚨

In 2025, a single graph changed AGI timelines, investments, research priorities, model quality assessments, and much more. But if you squint harder, only 14 prompts shaped AI discourse over this year. That's all the data in the 1-4 hour horizon-length regime that matters. 🕵️

What's more? A majority of these are about cybersecurity capture-the-flag contests and training a machine learning model.

> Post-train your model on CTF and ML codebases
> Profit 📈! Its METR horizon length will increase.

Exactly what OpenAI has been targeting in its Codex model releases... And is Anthropic underperforming in the 2-4 hr range because it mostly consists of cybersecurity, which is dual-use for safety?

To be clear, I think it's an excellent idea to track horizon lengths instead of benchmark accuracy. But under the current modelling assumption of success probability being a logistic function of task length, SWAA+HCAST accuracy improvements alone might explain the exponential progress in horizon length. 🔎

In the blog, I show detailed evidence for why we need to stop overindexing on the METR plot. Share it with anyone you see making decisions based on where the latest model lands on it. shash42.substack.com/p/how-to-game-…
[image]
37 replies · 69 reposts · 765 likes · 206.3K views
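The modelling assumption the post critiques — success probability as a logistic function of (log) task length, with "horizon length" defined as the task length at which the model succeeds 50% of the time — can be sketched as follows (function and parameter names are mine, not METR's exact fit):

```python
import math

def p_success(task_minutes: float, horizon_minutes: float, slope: float = 1.0) -> float:
    """Logistic success model: probability of completing a task declines
    logistically in log2(task length), crossing 50% exactly at the
    model's 'horizon length'."""
    x = slope * (math.log2(horizon_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-x))

# At the horizon itself, success probability is exactly 0.5;
# shorter tasks are easier, longer tasks are harder.
at_horizon = p_success(60, 60)      # 0.5
short_task = p_success(10, 60)      # > 0.5
long_task = p_success(600, 60)      # < 0.5
```

Under this assumption, shifting accuracy on the short-task regime (SWAA+HCAST) moves the fitted 50% crossing point, which is the mechanism the post argues can inflate horizon length without genuine long-task capability.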
Ankur Bapna reposted
Heiga Zen (全 炳河) @heiga_zen
The latest "Gemini 2.5 Flash / Pro TTS" models are now available in preview on Google AI Studio. As a researcher who has watched this field for many years, and as a Googler, it is deeply gratifying to see new results keep building on products that I and my teammates helped develop. Gemini TTS is resolving, at a very high level, the trade-off between "naturalness" and "controllability" that has long been a major challenge in speech synthesis. 1. Context-aware pauses and pacing: rather than simply reading text aloud, the AI understands context such as tension or relief and adjusts speaking rate automatically, or as instructed. 2. Style expression as intended: Gemini TTS follows abstract tone instructions such as "cheerful" or "serious" with surprising fidelity. 3. Natural multi-speaker dialogue: it achieves smooth conversational back-and-forth while keeping each character consistent. Developers, please try the new Gemini TTS in AI Studio. x.com/GoogleAIStudio…
Quoting Google AI Studio @GoogleAIStudio: x.com/i/article/1998…
1 reply · 143 reposts · 688 likes · 373.2K views
Ankur Bapna reposted
Google AI Developers @googleaidevs
We’re launching Gemini 2.5 Flash and Pro Text-to-Speech (TTS) model updates 🚀 Improvements include:
- Emotional style and tone versatility
- Context-aware pacing control
- Improved multiple-speaker capabilities
Dive into the blog to learn how these advancements are giving developers more control over speech generation. blog.google/technology/dev…
64 replies · 204 reposts · 1.7K likes · 264.9K views
Ankur Bapna reposted
Chubby♨️ @kimmonismus
Google cooked so hard. Not gonna lie, this feels like the future is here. Now develop Google Glasses with enough battery power, a good chip, and a look like Ray-Bans, and you'll have an instant hit. 100%.
481 replies · 2.1K reposts · 17.4K likes · 3.1M views
Ankur Bapna reposted
Mostafa Dehghani @m__dehghani
Thinking (test-time compute) in pixel space... 🍌 Pro tip: always peek at the thoughts if you use AI Studio. Watching the model think in pictures is really fun!
[3 images]
21 replies · 78 reposts · 696 likes · 135.7K views
Ankur Bapna reposted
Jeff Dean @JeffDean
I’m really excited about our release of Gemini 3 today, the result of hard work by many, many people in the Gemini team and all across Google! 🎊 We’ve built many exciting new product experiences with it, as you’ll see today and in the coming weeks and months. You can find it today on @GeminiApp and AI Mode in Search. For developers, you can build with it now in @GoogleAIStudio and Vertex AI. blog.google/products/gemin… The model performs quite well on a wide range of benchmarks.
[image]
208 replies · 343 reposts · 3.4K likes · 398.5K views
Ankur Bapna reposted
Google @Google
Gemini Live’s new model updates are now available on the @GeminiApp on Android and iOS. Conversations are more adaptive and expressive, opening up new ways to learn and practice skills. Here are five ways you can try out these new updates:
- Tailor your learning. Ask Gemini to explain a topic in your lesson plan and then say, "Okay, speed up," to get a crash course on the way to your next class.
- You can now get tailored practice when learning a new language. Ask Gemini to quiz you on multiples of 10 in Korean, or practice casual greetings in Spanish. This allows you to gain real-world speaking experience in a low-risk setting.
- Practice for your next big moment, like job interviews, or prepare for tough conversations with Gemini's ability to respond to your situation.
- Hear stories come to life. Try asking Gemini to tell you about the Roman Empire from the perspective of Julius Caesar himself.
- Liven things up by asking Gemini to speak in a fun accent, like a cowboy accent when brainstorming ideas for a rodeo-themed birthday party.
95 replies · 107 reposts · 1.3K likes · 218.7K views
Ankur Bapna reposted
Artificial Analysis @ArtificialAnlys
Google’s Gemini 2.5 Native Audio Thinking is the new leading speech-to-speech model per our Artificial Analysis Big Bench Audio benchmark. The new model achieves a score of 92% on Big Bench Audio, the highest result recorded by Artificial Analysis to date. This not only places it ahead of all previously tested native speech-to-speech systems, but also above a GPT-4o pipeline approach (Whisper transcription → GPT-4o text reasoning → speech generation).

Benchmark context: Big Bench Audio is the first dedicated dataset for evaluating the reasoning performance of speech models. It comprises 1,000 audio questions adapted from the Big Bench Hard text test set, chosen for its rigorous testing of advanced reasoning, translated into the audio domain.

Performance:
➤ Reasoning: achieves 92% on Big Bench Audio, setting a new state of the art for native speech-to-speech reasoning
➤ Latency: at an average time to first token of 3.87 seconds, the new model is slower than leading OpenAI models, including GPT Realtime (0.98 seconds), due to the thinking component. The non-thinking equivalent still leads on latency at 0.63 seconds

Model details:
➤ Processes audio, video, and text inputs directly, generating both text and natural speech outputs
➤ Reasons over spoken input without transcription
➤ Supports function calling, search grounding, and thinking budgets
➤ 128k input and 8k output token limits, with a knowledge cutoff of January 2025
[image]
30 replies · 144 reposts · 875 likes · 148.5K views
Ankur Bapna reposted
NotebookLM @NotebookLM
🚨 Rolling out NEW audio overview formats:
- Deep Dive (default): a thorough examination of your sources
- Brief: 1-2 minute, bite-sized overviews
- Critique: an expert review, offering constructive feedback on your material
- Debate: a thoughtful debate between two hosts
108 replies · 378 reposts · 2.5K likes · 253.7K views
Ankur Bapna reposted
Heiga Zen (全 炳河) @heiga_zen
The text-to-speech models "Gemini 2.5 Pro TTS" and "Gemini 2.5 Flash TTS" have launched on Vertex AI.
- Generate speech with emotion specified in the prompt
- Fine-grained control of pitch and pace via text
At GA, streaming, multiple speakers, and 70+ locales are planned. cloud.google.com/text-to-speech…
0 replies · 40 reposts · 131 likes · 33.9K views
rohan anil @_arohan_
Being a parent is marked by fun things, and also by being sick together at the same time or in sequence. My kid got HFMD (hand, foot, and mouth disease) from attending preschool. I think I got it too.
3 replies · 0 reposts · 20 likes · 6K views
Ankur Bapna @ankurbpn
@tokumin Seems like a different group from the report :)
0 replies · 0 reposts · 0 likes · 106 views
Ankur Bapna reposted
NotebookLM @NotebookLM
GREAT NEWS for our multilingual users! Rolling out this week:
1) Video Overviews in all 80 supported languages
2) Short & Default length controls for non-English audio overviews, so you should start seeing longer AOs!
Because great ideas shouldn't get lost in translation 😘
215 replies · 381 reposts · 2.4K likes · 500.7K views