Justin Uberti

5.4K posts

@juberti

Head of Realtime AI @OpenAI. Created WebRTC. Past: CTO @ultravox_dot_ai, Distinguished Engineer @google (Stadia, Meet/Duo), AIM. Amateur mathematician/musician.

Seattle, WA · Joined February 2007
121 Following · 14.2K Followers
Pinned Tweet
Justin Uberti @juberti
Will be speaking at the Cerebral Valley Voice Summit on May 6 in SF, along with some other great folks in this space! cerebralvalleyvoice.com
Marcelo Pires @thesyncim
@juberti Decoder performance is actually on par with the C implementation; gopus is about 3-5% slower. Codex did pretty much everything, kinda crazy
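The 3-5% gap mentioned above could be measured with Go's built-in benchmark harness, which runs fine outside `go test`. A minimal sketch, where `decodeC` and `decodeGo` are hypothetical stand-ins (a real comparison would call libopus via cgo and the Go port's decode method on actual Opus packets):

```go
package main

import (
	"fmt"
	"testing"
)

// Stand-ins for the two decoders under comparison. These stubs just
// burn slightly different amounts of CPU so the harness has
// something to measure.
func decodeC(pkt []byte) {
	for i := 0; i < 1000; i++ {
		_ = pkt
	}
}

func decodeGo(pkt []byte) {
	for i := 0; i < 1050; i++ {
		_ = pkt
	}
}

// bench times one decoder over b.N iterations using the standard
// library's benchmark harness.
func bench(decode func([]byte)) testing.BenchmarkResult {
	pkt := make([]byte, 120) // a plausible 20 ms Opus frame
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			decode(pkt)
		}
	})
}

func main() {
	c := bench(decodeC)
	g := bench(decodeGo)
	fmt.Printf("C stub: %d ns/op, Go stub: %d ns/op (ratio %.2f)\n",
		c.NsPerOp(), g.NsPerOp(), float64(g.NsPerOp())/float64(c.NsPerOp()))
}
```

Swapping the stubs for the real decoders gives an apples-to-apples ns/op ratio on the same packets.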
Justin Uberti @juberti
GPT-5.5 has been a workhorse for our team. We worried about attacks on our C-based Opus audio decoder, so we asked 5.5 to rewrite it in Go, using the RFC 6716 spec. A dozen PRs and 10 KLOC later, the decoder passes test vectors and is almost complete. 🔥 github.com/pion/opus/issu…
Justin Uberti @juberti
@thesyncim Very cool, like the name. Would be good to compare performance against the test vectors!
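Checking a decoder against the RFC 6716 test vectors boils down to decoding each vector's packets and comparing the resulting PCM to the reference output. A toy sketch of that comparison, with a stub in place of the real decoder and a plain max-deviation check rather than the RFC's actual opus_compare quality metric:

```go
package main

import (
	"fmt"
	"math"
)

// decode is a stand-in for a real Opus decoder (e.g. the Go port's
// decode call on a packet from a test vector). Here it just returns
// the reference signal perturbed by a tiny fixed error, so the
// comparison logic below has something to check.
func decode(packet []byte, ref []float64) []float64 {
	out := make([]float64, len(ref))
	for i := range ref {
		out[i] = ref[i] + 1e-6
	}
	return out
}

// maxAbsDiff reports the largest per-sample deviation between the
// decoded PCM and the reference PCM.
func maxAbsDiff(got, want []float64) float64 {
	var worst float64
	for i := range got {
		if d := math.Abs(got[i] - want[i]); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	// Hypothetical reference PCM from one test vector; real vectors
	// are full audio files distributed alongside RFC 6716.
	ref := []float64{0.0, 0.25, -0.5, 0.125}
	got := decode(nil, ref)

	const tol = 1e-4 // loose bound; the RFC defines a weighted quality metric instead
	if maxAbsDiff(got, ref) <= tol {
		fmt.Println("vector PASS")
	} else {
		fmt.Println("vector FAIL")
	}
}
```

The same loop doubles as a throughput test: timing the decode calls over the full vector set gives performance numbers on realistic input.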
Talent Density @talentdensity
Nope, but just some example prompts: "always speak with a natural Australian accent" and it will always drift back to American, and "always use a vocabulary at a 5th grade reading level" and it will start saying things like "let's keep this simple", which was not the instruction
Justin Uberti @juberti
Not great to be called out by an AI OG about AVM, but he’s right that the recent capability gains of text models have been >> those of speech models, mostly by thinking harder. But at the same time we need speech models to be faster + more humanlike. The impossible just takes longer.
Andrej Karpathy @karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

Justin Uberti @juberti
@yi_ding yes that is an important part, think -> memoize -> repeat
Yi Ding -- prod/acc @yi_ding
I was surprised by how 4.1 outperformed reasoning=none in both latency and performance through at least 5.1? My original hypothesis of the yellow brick road towards AGI was spend inference tokens => distill into synthetic data => improve base model => spend inference tokens => etc but it looks like it hasn't turned out to be that easy, but once again, haven't seen the new tater yet. :-)
Justin Uberti @juberti
@yi_ding Sure, there have been large improvements in base models, but extended thinking is the thing that has brought about the step function capability gains.
Yi Ding -- prod/acc @yi_ding
@juberti Honestly, if you look at the performance of no-reasoning 5.x models vs. 4.1/4o you see the same trend. Reasoning has been a huge leap forward, but I would have expected the gains to have accrued faster to the base models also. Looking forward to 🥔
Justin Uberti @juberti
It's true, there's been great progress building the brain and now the hands, but to reach AGI we absolutely have to solve the ears and mouth.
Jordan Talks Everyday AI @EverydayAI_

@juberti or the annoyingly necessary small talk confirmations and delaying while it reasons, which is annoying AF. it's crazy to me that we're seemingly racing toward ASI full speed but no lab has figured out real-time voice that reasons over your data.

Justin Uberti @juberti
The folks at val.town (a low-friction cloud JS IDE/runtime) featured my hello-realtime sample app in their latest blog post. hello-realtime is a great way to start with the OpenAI Realtime API, and works on web and by phone (425-800-0042)! blog.val.town/talk-of-the-to…
Justin Uberti @juberti
One issue is that without upfront reasoning, sampling can put you on a bad (hallucinatory) path that’s hard to recover from. But nobody wants a speech model that takes several seconds to respond…
Jordan Talks Everyday AI @EverydayAI_

@juberti The biggest downfall: OpenAI’s text models rarely hallucinate with proper controls/CE, but voice models (presumably the 4o variant) legit struggle to tell the truth. That’s not just an OAI problem, obviously.

gabz @gabztodaro
@juberti Always is. But that’s the best part of life. May I send my resume to you?
Justin Uberti @juberti
We're looking for a creative iOS engineer to join our realtime AI team here at OpenAI Seattle to help build the future of human-AI interaction. If you know WebRTC, AVFoundation, and/or Core Audio and like open-ended challenges, apply at openai.com/careers/ios-so… or just DM!
Talent Density @talentdensity
@juberti The realtime 1.5 API is really not great: accents drop after one sentence, and agents still repeat words from the instructions. Hope a better update is coming soon
gabz @gabztodaro
@juberti I have 10+ years in iOS development, but I’m pursuing a master’s degree in Applied AI in Massachusetts. Can it be remote, at least until January 2027?
SENTINELITE @SENTINELITE
@juberti I’ll have my application on the desk by Monday. I’ve made multiple AVFoundation apps (ISO-recorder, lossless trimming app, etc.), plus I’ve rolled custom audio packages (most recently working on an OSS Swift package with a best-in-class API for T2S models). U.S. remote or Seattle?
Justin Uberti @juberti
A lot of familiar faces from last year's VapiCon, one of the benefits of a small and friendly voice AI community