Justin Uberti

5.4K posts

@juberti

Head of Realtime AI @OpenAI. Created WebRTC. Past: CTO @ultravox_dot_ai, Distinguished Engineer @google (Stadia, Meet/Duo), AIM. Amateur mathematician/musician.

Seattle, WA · Joined February 2007

120 Following · 14K Followers

Pinned Tweet

Justin Uberti@juberti·25 Feb

We’ve integrated the ChatGPT Voice “orb” into the chat view in our latest iOS release, and you can toggle between integrated/fullscreen view. I find chatting with the orb feels more engaging, like there’s an actual thing you’re talking to. Thoughts?

346

36.7K

Justin Uberti@juberti·7 Mar

@jezell @stevendcoffey The need makes sense, just wondering if you can approximate it by continuing to append results to the same call.

Jesse Ezell@jezell·6 Mar

@juberti @stevendcoffey Imagine for example a tool call which runs a docker build. It’s a very long process. You actually want it to return results incrementally, not the whole chunk at the end for both context length issues and general agent awareness issues.

Jesse Ezell@jezell·6 Mar

@stevendcoffey @juberti I think there's a really big gap in the Responses API right now in that there is zero support for streaming tool calls. Now that there are websockets and things are moving more and more to be realtime with voice models on the way that change the interruptions model, the whole tool calls need to be request / response interactions thing is gonna get old fast. Sure, you can make some tools to do things like spin up background terminals and grep over their logs, but real systems are full of realtime events and signals and the current APIs just don't map to that world. I'm looking forward to whatever is the future where the Realtime API + Websocket Responses API are unified into something that can handle those workloads.

278
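
A minimal sketch of the idea discussed in this thread: a long-running tool returns results incrementally, and each partial result is appended to the same call rather than delivered as one chunk at the end. Everything here is hypothetical and for illustration only: `ToolCallLog`, `stream_tool_output`, `fake_docker_build`, and the `max_chars` budget are assumed names, not part of the Responses API or the Realtime API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class ToolCallLog:
    """Hypothetical container: every partial result shares one call_id."""
    call_id: str
    chunks: list[str] = field(default_factory=list)
    max_chars: int = 200  # assumed budget for how much output to keep in context

    def append(self, text: str) -> None:
        self.chunks.append(text)
        # Drop the oldest chunks once the budget is exceeded, always keeping the newest.
        while len(self.chunks) > 1 and sum(len(c) for c in self.chunks) > self.max_chars:
            self.chunks.pop(0)

    def snapshot(self) -> str:
        return "\n".join(self.chunks)


def fake_docker_build() -> Iterator[str]:
    """Stand-in for a long-running tool; yields its output one step at a time."""
    for step in range(1, 6):
        yield f"Step {step}/5 : done"


def stream_tool_output(
    call_id: str, lines: Iterable[str], max_chars: int = 200
) -> Iterator[tuple[str, str]]:
    """Yield (call_id, current view) after each new line of tool output."""
    log = ToolCallLog(call_id=call_id, max_chars=max_chars)
    for line in lines:
        log.append(line)
        yield log.call_id, log.snapshot()


if __name__ == "__main__":
    for call_id, view in stream_tool_output("call_1", fake_docker_build(), max_chars=40):
        print(call_id, "->", repr(view))
```

The bounded buffer is one possible answer to the context-length concern raised above: the agent sees progress as it happens, while older output is trimmed instead of accumulating without limit.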

Justin Uberti retweeted

Sebastien Bubeck@SebastienBubeck·5 Mar

GPT-5.4

806

93.2K

Justin Uberti@juberti·5 Mar

Agreed. I think of this year-by-year:
2022: GPT 3.5
2023: GPT-4
2024: Reasoners
2025: Coding agents

Ethan Mollick@emollick

From an AI user perspective, the four big leaps so far in ability:
1. GPT-3.5 (ChatGPT, November 2022)
2. GPT-4 (Spring 2023)
3. Reasoners (starts with o1-preview, but the real deal was o3, Spring 2025)
4. Workable agentic systems (Harness + good reasoner models, December 2025)

2.2K

Justin Uberti@juberti·26 Feb

@SantiagoAfonso @athyuttamre We hope so too 😅

Santiago Afonso@SantiagoAfonso·26 Feb

@athyuttamre @juberti My assumption here is that you cannot bring 5.2 xhigh (2-6 minutes of thinking) or pro level intelligence (10+ minutes thinking for the easiest answers) to realtime in the next 36 months. I hope I'm wrong!

Justin Uberti@juberti·26 Feb

@wayne_culbreth Realtime API models are separate from the ChatGPT Voice models since 3p customers have somewhat different needs

671

Wayne Culbreth@wayne_culbreth·26 Feb

@juberti When will realtime-1.5 be on iOS?

744

Justin Uberti retweeted

Peter Bakkum@pbbakkum·25 Feb

gpt-realtime-1.5 is the best native audio model on the Scale AudioMultiChallenge benchmark -- this is a significant jump in capability by this measure. There are models that outperform it but they are reasoning models without native audio output.

179

24.4K

Justin Uberti@juberti·26 Feb

@adamac hmm. It should be instantaneous. That said, we are working on a number of optimizations to improve this flow (including the smoothness of the UX)

Adam MacBeth@adamac·26 Feb

@juberti In a new chat or the same chat. Doesn’t help that the animation is janky as that captures my attention.

Justin Uberti@juberti·26 Feb

@adamac Even in a new chat?

Adam MacBeth@adamac·26 Feb

@juberti The time it takes to connect (spinner) is a hindrance to using voice mode imo. Seems like it’s at least 3 seconds every time even when toggling back and forth. Why isn’t this instantaneous?

Justin Uberti@juberti·26 Feb

@fabrohl @LearnAI_MJ Yep! Or just tap it.

Fabian@fabrohl·26 Feb

@LearnAI_MJ @juberti You drag the circle conversation bubble up into the main view.

Justin Uberti@juberti·25 Feb

interesting! a bit uncanny but definitely shows that some amount of singing with gpt-realtime-1.5 is possible...

stephen 🌿@stevelizcano

@juberti been experimenting with things like this and realtime-1.5 it's pretty fun and makes it engaging

2.2K

Justin Uberti@juberti·25 Feb

Yeah. This is the number one AVM complaint I hear these days. The amazing progress from reasoning models has made the text/audio intelligence delta much more obvious.

CommonSenseOnMars@CommonSenseMars

@juberti Agree, nice option. Just hoping for a legit voice model intelligence update eventually. Tbf OpenAI has been ahead of all the other AI lab voice models for over 1.5yrs now which is impressive

2.6K

Justin Uberti@juberti·25 Feb

You can easily make the model whisper via prompting. Singing is another story, as there are nontechnical restrictions there. You can experiment at platform.openai.com/audio/realtime

Sava@savamusics

@juberti Hi Justin, I tested it early in the morning. Will the final model be more expressive in terms of its tone variation and voice volume? It doesn't slow down the speed of its speech as much as I would like and doesn't whisper. Also, it doesn't sing when asked to.

1.5K

Justin Uberti@juberti·24 Feb

@BenjieMalinao hello-realtime is using "marin"

180

Benjie Malinao@BenjieMalinao·24 Feb

@juberti what's the voice name?

260

Justin Uberti@juberti·23 Feb

Just released gpt-realtime-1.5 with improved intelligence, instruction following, and voice quality. Try it out today at hello-realtime.val.run (or call 425-800-0042)!

OpenAI Developers@OpenAIDevs

Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo

71.8K

Justin Uberti@juberti·23 Feb

APIs remain the same, so this is a seamless upgrade for existing gpt-realtime users.

1.5K

Justin Uberti@juberti·12 Feb

@davezatz can you give a more specific example?

Dave Zatz@davezatz·12 Feb

@juberti 15 mins today, no issue. Not that you asked, the vocalized and non-vocalized pauses are off-putting in some way I can't put my finger on. :)

Dave Zatz@davezatz·7 Feb

Slightly more convenient than hitting the action button to speak to Grok while commuting, as I do now. (ChatGPT and Gemini interrupt themselves over car speakers, so Grok is currently the only option.)

Mark Gurman@markgurman

NEW: Apple is preparing to allow voice-controlled artificial intelligence apps from other companies in CarPlay, a move that will let users query AI chatbots through its vehicle interface for the first time. bloomberg.com/news/articles/…

1.1K

Justin Uberti@juberti·22 Jan

Congrats to LiveKit on their Series C! There's enormous potential for Voice AI, as anyone who's had to wait on hold for support can immediately understand.

LiveKit@livekit

We learn to speak before we learn to read. Voice is the most natural interface we have. We just raised $100M to make building voice AI as easy as a web app.

1.5K
