Alan Cowen
@AlanCowen

1.5K posts
Gemini Audio, Director RS @GoogleDeepMind. Founder @Hume_AI. Teaching AI to make people happy.

Joined February 2015
238 Following · 3.7K Followers
Pinned Tweet
Alan Cowen @AlanCowen
Life update: I’m now at @GoogleDeepMind. Feels surreal to not be at @hume_ai anymore, but the team there is in an incredible position to continue advancing cutting-edge voice AI research.

There are changes coming in how people interact with AI. There are changes coming in how labs will teach AI to make people happy. Voice intelligence is increasingly seen as a key unlock on both fronts, a thesis Hume was very early to adopt. So Hume is sharing the infrastructure for voice data, evals, and RL to help advance voice intelligence and AI empathy across the industry. It’s proving to be a much bigger opportunity than expected.

I’m excited for the Hume team and grateful to everyone who helped us get to this point. Will share more on what I’m working on soon.
English
39
16
586
56.3K
Alan Cowen @AlanCowen
@deedydas Either uses an internal tool that goes by a different name (like everything at Google) or is in a small minority of holdouts
English
0
0
1
655
Deedy @deedydas
Google Senior Staff Engineer to me: “Yeah, I have no clue what Claude Code / Codex is but I hear it’s all the rage. No, I don’t really care, I just need GOOG to hit $400 and keep this job for 2-3 more years so I can retire!”
English
220
78
4.7K
594.1K
Alan Cowen retweeted
Hume AI @hume_ai
Today we're releasing our first open source TTS model, TADA!

TADA (Text Audio Dual Alignment) is a speech-language model that generates text and audio in one synchronized stream to reduce token-level hallucinations and improve latency. This means:
→ Zero content hallucinations across 1,000+ test samples
→ 5x faster than similar-grade LLM-based TTS
→ Fits much longer audio: 2,048 tokens cover ~700 seconds with TADA vs. ~70 seconds in conventional systems
→ Free transcript alongside audio with no added latency
English
101
317
2.9K
260.4K
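As a back-of-the-envelope check (not an official figure from Hume), the context-length claim in the TADA announcement implies the following effective token rates; the 2,048-token and ~700 s / ~70 s numbers are taken directly from the tweet.

```python
# Token-rate arithmetic implied by the TADA announcement:
# 2,048 tokens span ~700 s of audio (TADA) vs ~70 s (conventional TTS).
TOKENS = 2048
TADA_SECONDS = 700
CONVENTIONAL_SECONDS = 70

tada_rate = TOKENS / TADA_SECONDS                  # tokens per second of audio
conventional_rate = TOKENS / CONVENTIONAL_SECONDS  # tokens per second of audio

print(f"TADA:         {tada_rate:.1f} tokens/s of audio")
print(f"Conventional: {conventional_rate:.1f} tokens/s of audio")
print(f"Audio-per-context advantage: {TADA_SECONDS / CONVENTIONAL_SECONDS:.0f}x")
```

In other words, the claimed ~10x gain in audio-per-context comes from TADA needing roughly 3 tokens per second of audio where a conventional system needs roughly 29.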
Alan Cowen retweeted
Hume AI @hume_ai
One performance, infinite voices. Voice Conversion is now live on Hume’s creator studio and API! Generate the same pacing, pronunciation, and intonation with one recording across any voice you choose. Hear it for yourself ⬇️
English
20
33
167
19.6K
David Lieb @dflieb
The AI device I think I want:
- AirPods form factor
- with wide angle cameras for context
- with always-on ChatGPT voice mode
- that knows when I’m talking vs other sounds (which it’s terrible at now)
- that can respond reactively or be proactive

I don’t think I want a display. I’ll have my phone in my pocket anyway.
English
47
8
229
105.3K
Alan Cowen @AlanCowen
Excited to be powering @NianticSpatial’s Dot
Hume AI @hume_ai

Today, @NianticSpatial released an update to their AR companion, Dot, at Snap's Lens Fest, with new voice capabilities powered by Hume AI. Dot's new interactive dialogue capabilities allow the AI companion to guide users through physical spaces, offering contextual information through natural, emotionally responsive dialogue.

Our CEO, @AlanCowen, sees the product as a glimpse into the future of computing. “At some point, everyone is going to have AR in their lives in some format,” Cowen says. “You’ll be talking to companions that help guide you through the world.”

Read more in today's @WIRED and in the article below!

English
1
2
16
3K
Alan Cowen retweeted
SambaNova Japan @SambaNovaAI_jp
Regarding our partnership with Hume, we were interviewed by the renowned AI researcher Ryo Shimizu @shi3z.
Japanese
0
3
14
3.2K
Alan Cowen @AlanCowen
@supertradeish @hume_ai It should be possible to reproduce any accent reliably using voice cloning. If you test a specific accent, let us know how it works!
English
0
0
0
26
Super Tradeish @supertradeish
@hume_ai Impressive improvements in speed and cost! How does Octave 2 handle regional accents within those 11+ languages?
English
1
0
1
281
Alan Cowen retweeted
Hume AI @hume_ai
Introducing Octave 2: our next-generation multilingual text-to-speech model

What’s new:
- Fluent in 11+ languages
- 40% faster (<200ms latency) & 50% cheaper than Octave 1
- Multi-speaker conversation
- More reliable pronunciation
- New voice conversion & phoneme editing capabilities

For the month of October, we’re offering 50% off our Creator plan - use code OCTAVE2 at checkout!
English
84
166
1.6K
7.1M
Alan Cowen @AlanCowen
@heyaytac Thanks for the feedback, we are working on it! It might already be better tomorrow
English
1
0
1
27
itouch @heyaytac
@AlanCowen Overall great progress, but just some small feedback: I have played around a bit with ev4 mini and it lacks a lot of nuance in the German language, especially when pronouncing words correctly. It also switches from German to English when repeating a phone number.
English
1
0
1
30
Alan Cowen @AlanCowen
we've been working on making voice AI faster and more realistic. we're hoping that scaling voice will give AI compatibility with the human psyche: every voice session is time-locked, interruptible, and rife with feedback. but first, we need higher quality voice experiences.
Hume AI @hume_ai

Introducing Octave 2: our next-generation multilingual text-to-speech model

What’s new:
- Fluent in 11+ languages
- 40% faster (<200ms latency) & 50% cheaper than Octave 1
- Multi-speaker conversation
- More reliable pronunciation
- New voice conversion & phoneme editing capabilities

For the month of October, we’re offering 50% off our Creator plan - use code OCTAVE2 at checkout!

English
3
0
13
1.2K
Alan Cowen retweeted
Robert Scoble @Scobleizer
Today @hume_ai is releasing its latest text-to-audio and audio-to-text AI models. Remember this company? It brings emotions into your audio in a way others don't. But now it's better, and here founder @AlanCowen gives me a deep dive into the audio/AI space and an update on their latest, coming later today.
English
12
9
105
22.6K
Alan Cowen @AlanCowen
@Noahpinion That’s because there’s a third possibility thought to be most likely: AI doesn’t plateau, but training costs shrink by orders of magnitude relative to inference costs (because the scale of deployment grows much faster than the scale of training)
English
0
0
2
432
Noah Smith 🐇🇺🇸🇺🇦🇹🇼
Leading AI labs treat model training expenses as capex. However, if AI continuously improves, training the next model is a cost that never goes away. That's closer to opex. And if AI plateaus, trailing firms catch up and the product becomes a commodity.
English
36
16
295
30.4K
Alan Cowen @AlanCowen
@willdepue They mean learning from production data with an objective function that approximates user satisfaction
English
0
0
1
288
will depue @willdepue
everyone who mentions ‘continual learning’ as a problem is usually just talking about sample efficiency. clearly, you should ‘continually learn’ by continually training trajectories back into the model! there’s no mystery: this just doesn’t work with low sample efficiency.
English
26
5
370
84.2K