
FODUU
@foduu
We're an affordable web design company in India providing web and mobile app development services around the globe. #webdesign #webdevelopment #appdevelopment

So OpenAI will release a voice conversational system today? Apparently, it would be voice-in -> voice-out?*

It cuts the three-fold process of:
1. voice to text (speech recognition, e.g. Whisper)
2. text to text (an LLM to process the text, e.g. GPT-4)
3. text to speech (vocalise the text)

All three are compressed into one single process. The three-fold process is similar to what the OAI ChatGPT app on iPhone/Android currently uses for voice.

Where is open source with respect to this?

We already have seemingly strong audio LMs, i.e. models that take in audio and spit out text processed via an LLM. Examples: Gazelle, Qwen Audio, SALMONN, etc. And for text to speech, we do have strong small models like Parler TTS, Piper, FastSpeech 2 and so on.

Bottom line: we're not as far behind as one would think. IMO, even the three-fold process listed above can be optimised significantly so that it's no longer a bottleneck.

I really hope OAI releases something like this, as it'd be a good way to reactivate interest in audio for research labs and industry.

*My sources are just random Twitter posts I've seen on my timeline so far.
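
For anyone who wants the three-fold process spelled out, here is a rough Python sketch of how the stages chain together. Everything in it is an assumption for illustration: `CascadedVoicePipeline`, `fake_stt`, `fake_llm` and `fake_tts` are made-up names, not APIs from Whisper, GPT-4 or any TTS library, and the stubs just pass bytes and strings around so the flow is visible.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not real library APIs).
SpeechToText = Callable[[bytes], str]   # step 1: voice to text
TextToText = Callable[[str], str]       # step 2: text to text (LLM)
TextToSpeech = Callable[[str], bytes]   # step 3: text to speech


@dataclass
class CascadedVoicePipeline:
    """Chains the three stages; each stage waits for the previous one."""
    stt: SpeechToText
    llm: TextToText
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)    # audio -> transcript
        reply_text = self.llm(transcript)  # transcript -> reply text
        return self.tts(reply_text)        # reply text -> audio


# Stub stages so the sketch runs without any model; real ones would
# wrap an ASR model, an LLM and a TTS model respectively.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because the stages run one after another, their latencies add up, which is the part a single voice-in -> voice-out model would cut. Swapping any stub for a real model only requires matching the stage signature.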

We’ll be streaming live on openai.com at 10AM PT Monday, May 13 to demo some ChatGPT and GPT-4 updates.
