Justin Uberti

5.4K posts

@juberti

Head of Realtime AI @OpenAI. Created WebRTC. Past: CTO @ultravox_dot_ai, Distinguished Engineer @google (Stadia, Meet/Duo), AIM. Amateur mathematician/musician.

Seattle, WA · Joined February 2007
121 Following · 14.2K Followers
Pinned Tweet
Justin Uberti @juberti
Will be speaking at the Cerebral Valley Voice Summit on May 6 in SF, along with some other great folks in this space! cerebralvalleyvoice.com
Marcelo Pires @thesyncim
@juberti Decoder performance is actually on par with the C implementation; gopus is about 3-5% slower. Codex did pretty much everything, kinda crazy
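The 3-5% gap mentioned above could be measured with Go's built-in benchmark harness, which runs fine outside `go test`. A minimal sketch, where `decodeC` and `decodeGo` are hypothetical stand-ins (a real comparison would call libopus via cgo and the Go port's decode method on actual Opus packets):

```go
package main

import (
	"fmt"
	"testing"
)

// Stand-ins for the two decoders under comparison. These stubs just
// burn slightly different amounts of CPU so the harness has
// something to measure.
func decodeC(pkt []byte) {
	for i := 0; i < 1000; i++ {
		_ = pkt
	}
}

func decodeGo(pkt []byte) {
	for i := 0; i < 1050; i++ {
		_ = pkt
	}
}

// bench times one decoder over b.N iterations using the standard
// library's benchmark harness.
func bench(decode func([]byte)) testing.BenchmarkResult {
	pkt := make([]byte, 120) // a plausible 20 ms Opus frame
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			decode(pkt)
		}
	})
}

func main() {
	c := bench(decodeC)
	g := bench(decodeGo)
	fmt.Printf("C stub: %d ns/op, Go stub: %d ns/op (ratio %.2f)\n",
		c.NsPerOp(), g.NsPerOp(), float64(g.NsPerOp())/float64(c.NsPerOp()))
}
```

Swapping the stubs for the real decoders gives an apples-to-apples ns/op ratio on the same packets.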
Justin Uberti @juberti
GPT-5.5 has been a workhorse for our team. We worried about attacks on our C-based Opus audio decoder, so we asked 5.5 to rewrite it in Go, using the RFC 6716 spec. A dozen PRs and 10 KLOC later, the decoder passes test vectors and is almost complete. 🔥 github.com/pion/opus/issu…
Justin Uberti @juberti
@thesyncim Very cool, like the name. Would be good to compare performance against the test vectors!
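Checking a decoder against the RFC 6716 test vectors boils down to decoding each vector's packets and comparing the resulting PCM to the reference output. A toy sketch of that comparison, with a stub in place of the real decoder and a plain max-deviation check rather than the RFC's actual opus_compare quality metric:

```go
package main

import (
	"fmt"
	"math"
)

// decode is a stand-in for a real Opus decoder (e.g. the Go port's
// decode call on a packet from a test vector). Here it just returns
// the reference signal perturbed by a tiny fixed error, so the
// comparison logic below has something to check.
func decode(packet []byte, ref []float64) []float64 {
	out := make([]float64, len(ref))
	for i := range ref {
		out[i] = ref[i] + 1e-6
	}
	return out
}

// maxAbsDiff reports the largest per-sample deviation between the
// decoded PCM and the reference PCM.
func maxAbsDiff(got, want []float64) float64 {
	var worst float64
	for i := range got {
		if d := math.Abs(got[i] - want[i]); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	// Hypothetical reference PCM from one test vector; real vectors
	// are full audio files distributed alongside RFC 6716.
	ref := []float64{0.0, 0.25, -0.5, 0.125}
	got := decode(nil, ref)

	const tol = 1e-4 // loose bound; the RFC defines a weighted quality metric instead
	if maxAbsDiff(got, ref) <= tol {
		fmt.Println("vector PASS")
	} else {
		fmt.Println("vector FAIL")
	}
}
```

The same loop doubles as a throughput test: timing the decode calls over the full vector set gives performance numbers on realistic input.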
Talent Density @talentdensity
Nope, but just some example prompts: "always speak with a natural Australian accent" and it will always drift back to American, and "always use a vocabulary at a 5th grade reading level" and it will start saying things like "let's keep this simple", which was not the instruction
Justin Uberti @juberti
Not great to be called out by an AI OG about AVM, but he’s right that the recent capability gains of text models have been >> those of speech models, mostly by thinking harder. But at the same time we need speech models to be faster + more humanlike. The impossible just takes longer.
Andrej Karpathy @karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

Justin Uberti @juberti
@yi_ding yes that is an important part, think -> memoize -> repeat
Yi Ding -- prod/acc @yi_ding
I was surprised by how 4.1 outperformed reasoning=none in both latency and performance through at least 5.1? My original hypothesis of the yellow brick road towards AGI was spend inference tokens => distill into synthetic data => improve base model => spend inference tokens => etc but it looks like it hasn't turned out to be that easy, but once again, haven't seen the new tater yet. :-)
Justin Uberti @juberti
@yi_ding Sure, there have been large improvements in base models, but extended thinking is the thing that has brought about the step function capability gains.
Yi Ding -- prod/acc @yi_ding
@juberti Honestly, if you look at the performance of no-reasoning 5.x models vs. 4.1/4o you see the same trend. Reasoning has been a huge leap forward, but I would have expected the gains to have accrued faster to the base models also. Looking forward to 🥔
Justin Uberti @juberti
It's true, there's been great progress building the brain and now the hands, but to reach AGI we absolutely have to solve the ears and mouth.
Jordan Talks Everyday AI @EverydayAI_

@juberti or the annoyingly necessary small talk confirmations and delaying while it reasons, which is annoying AF. it's crazy to me that we're seemingly racing toward ASI full speed but no lab has figured out real-time voice that reasons over your data.

Justin Uberti @juberti
The folks at val.town (a low-friction cloud JS IDE/runtime) featured my hello-realtime sample app in their latest blog post. hello-realtime is a great way to start with the OpenAI Realtime API, and works on web and by phone (425-800-0042)! blog.val.town/talk-of-the-to…
Justin Uberti @juberti
One issue is that without upfront reasoning, sampling can put you on a bad (hallucinatory) path that’s hard to recover from. But nobody wants a speech model that takes several seconds to respond…
Jordan Talks Everyday AI @EverydayAI_

@juberti The biggest downfall: OpenAI’s text models rarely hallucinate with proper controls/CE, but voice models (presumably the 4o variant) legit struggle to tell the truth. That’s not just an OAI problem, obviously.

gabz @gabztodaro
@juberti Always is. But that’s the best part of life. May I send my resume to you?
Justin Uberti @juberti
We're looking for a creative iOS engineer to join our realtime AI team here at OpenAI Seattle to help build the future of human-AI interaction. If you know WebRTC, AVFoundation, and/or Core Audio and like open-ended challenges, apply at openai.com/careers/ios-so… or just DM!
Talent Density @talentdensity
@juberti The realtime 1.5 API is really not great: accents drop after one sentence, and agents still repeat words from the instructions. Hope a better update is coming soon
gabz @gabztodaro
@juberti I have 10+ years in iOS development, but I’m pursuing a master’s degree in Applied AI in Massachusetts. Can it be remote, at least until January 2027?
SENTINELITE @SENTINELITE
@juberti I’ll have my application on the desk by Monday. I’ve made multiple AVFoundation apps (ISO-recorder, lossless trimming app, etc.), plus I’ve rolled custom audio packages (most recently working on an OSS Swift package with a best-in-class API for T2S models). U.S. remote or Seattle?
Justin Uberti @juberti
A lot of familiar faces from last year's VapiCon, one of the benefits of a small and friendly voice AI community