Bin Yang

21 posts

@binyangderek

Founder & CEO @breezeblueX, building next-gen realtime interaction layer.

Toronto, Canada · Joined July 2014

598 Following · 102 Followers

Bin Yang@binyangderek·6d

@jordan_dearsley @Vapi_AI Glad to see a better TTS eval coming! How to get a new model participating in this?

Jordan Dearsley@jordan_dearsley·16 Jun

You can feel whether the voice on a call is a human or a machine before you can explain why. Today, @Vapi_AI is launching the Humanness Index™, a crowdsourced leaderboard for model humanness. You are the benchmark. Cast your first vote today: humannessindex.vapi.ai

7.4K

Bin Yang@binyangderek·7 Jun

We've all been there, and we've all passed that stage. IMO, what needs to be optimized is "situational TTS": generating speech that follows not just text, but role, intent, relationship, and scene. @BreezeBlueX is built for that purpose.

Neil Zeghidour@neilzegh

Our first commercial TTS model was optimized for WER and SSIM because that’s what research had taught us over the years to be the standard metrics. The first customer feedback we received unveiled the huge blind spots of these metrics, in particular on naturalness, rhythm, emphasis, question intonation, etc. Now our internal eval has dozens of criteria monitored on each model.

Bin Yang@binyangderek·5 Jun

@unilightwf fwiw, it's essentially a half-duplex model (speech in, text response out) with VAD integrated into the backbone as special tokens. They also defined some interesting interaction-related tasks (e.g., proactively responding to sounds).

128

Wen-Chin Huang@unilightwf·5 Jun

Maybe a cool idea but really difficult for me to understand the main contribution (both from the paper and the demo)

arXiv Sound@ArxivSound

Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, Ziyang Ma, Dongchao Yang, Mingbao Lin, Deheng Ye, Shuicheng Yan, Chunyan Miao, "Audio Interaction Model," arxiv.org/abs/2606.05121

1.6K

Bin Yang@binyangderek·31 May

We definitely need more "real-world" benchmarks like this for speech models, if we want speech models to actually work in real life.

Bek@beknabdik

gpt-realtime-2 is genuinely fast, and this demo is great. no demo runs long enough to show the ceiling though. we ran it 60 turns. around 5 minutes in it went silent: took our audio, returned zero bytes, no error. then the connection dropped. the spec promises 128k tokens of context. the real ceiling is ~5 minutes. the previous gpt-realtime ran the same prompts comfortably. Full report here 🧵

109

Bin Yang@binyangderek·26 May

We are brewing ☕️ a turbo version ⚡️ of the Bluebell TTS model for realtime applications. Turns out, properly benchmarking end-to-end TTFA (time-to-first-audio, from client text -> TTS provider server -> client intelligible speech) is nontrivial 👿 After sorting everything out (will open-source the benchmark tooling ofc), Bluebell-turbo-exp achieves the world's fastest TTFA 🥳 Guess who's second?

Bin Yang@binyangderek

~4 months in 2026 and we've seen 7 new voice design models released by different labs. Excited to see this new trend coming with @BreezeBlueX leading.

121

Bin Yang@binyangderek·16 May

Agree that it's the behavior that defines full-duplex, not architecture. Also, I would highlight that "forecasting" is an important capability of full-duplex systems as it significantly reduces the "perceived latency".

Desh Raj@rdesh26

x.com/i/article/2054…

168

Bin Yang@binyangderek·24 Apr

~4 months in 2026 and we've seen 7 new voice design models released by different labs. Excited to see this new trend coming with @BreezeBlueX leading.

405

Bin Yang@binyangderek·23 Apr

@AmpCode When will we have Opus 4.7 support?

Amp@AmpCode·5 Mar

There's a new oracle in Amp: GPT-5.4. ampcode.com/news/gpt-5.4-t…

351

62.4K

Bin Yang@binyangderek·16 Apr

@PandaTalk8 You could try our bluebell model (breeze.blue); the results won't disappoint you.

117

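Mr Panda@PandaTalk8·15 Apr

Looking for recommendations: which TTS offers the best value for money? Open source or closed source, paid or free, all are fine. By value for money I mean realistic and easy to use, at a price that is practically free.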

111

34.8K

Bin Yang@binyangderek·13 Apr

What's intriguing about this benchmark is that text-based models (GPT-5.2) can achieve a pass rate of 85%, 2x that of the current best voice model. The gap is HUGE.

Logan Kilpatrick@OfficialLoganK

Our latest Live model is #1 on Tau Voice Bench! Excited to see this new frontier of voice models cross the chasm of usability in production.

130

Bin Yang@binyangderek·10 Apr

Let's go beyond voice cloning to voice design!

Breeze Blue Studio@BreezeBlueX

x.com/i/article/2042…

Bin Yang@binyangderek·1 Apr

Can't wait to hear it!

Breeze Blue Studio@BreezeBlueX

iOS 27 beta 1 quietly updated Siri's voice engine. Context-aware tone. Emotional inflection. Natural laughter. This is the biggest Siri upgrade since launch. Not even kidding. #Apple50 #siri #iOS27

Bin Yang@binyangderek·12 Feb

@DidiKieran Nice post! The follow-up work of VA-VAE, VTP (arxiv.org/abs/2512.13687) is also highly relevant, where the tokenizer is trained from scratch with a joint contrastive, self-supervised, and reconstruction objective to remove the dependency on pretrained representations.

247

Kieran Didi@DidiKieran·12 Feb

Too many REPA / RAE / representation alignment papers lately? I was lost too, so I wrote a blog post that organizes the space into phases and zooms in on what actually matters for general/molecular ML. Curious what folks think - link below! 🔗 Blog: kdidi.netlify.app/blog/ml/2025-1…

534

79.3K

Bin Yang@binyangderek·12 Feb

@unilightwf From a tech perspective, human-preference-aligned TTS evaluation is hard. Do you have any proposals here?

281

Wen-Chin Huang@unilightwf·12 Feb

The open-source TTS scene: I'm grateful for it, but projects claim "high quality" and "fast" without publishing any evaluation results, which is baffling 😭

9.1K

Bin Yang@binyangderek·6 Sep

@eigensteve Hi, could you share your filming setup? This should be the common practice for all online courses/tutorials.

Steven Brunton@eigensteve·5 Sep

First new video after being back from sabbatical!! PDE 101: Separation of Variables... or how I learned to stop worrying and solve Laplace's equation. One of the most important concepts in all of partial differential equations: youtube.com/watch?v=VjWtMl…

172

1.5K

Bin Yang reposted

Raquel Urtasun@RaquelUrtasun·12 Oct

Today, together with my collaborators Andreas Geiger, Philip Lenz and Christoph Stiller, I was awarded the 2021 Everingham Prize for KITTI at #ICCV21, which enabled many breakthroughs in #SelfDrivingCars. Thank you, truly an honor!

332

Bin Yang@binyangderek·27 Aug

Check out our #ECCV2020 oral: Learning Lane Graph Representations for Motion Forecasting, which achieves the best results on the Argoverse benchmark. Live QA: 8/27 9pm EDT. Poster session: 8/27 9am/7pm EDT. Paper: arxiv.org/abs/2007.13732 @RaquelUrtasun @UberATG
