Saúl Ibarra Corretgé
@saghul
Open Source and Real Time Communications pundit. Self-appointed Chief Jitsi Evangelist. Aspiring maker. DMs are open. 🐘: @[email protected]


xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime on our Big Bench Audio benchmark.

The new model scores 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5 Flash Native Audio Thinking. It is @xAI’s first public Speech to Speech API, bringing more competition to the space. The model supports tool calling, and xAI says it is ready for use in voice assistants, phone agents, and interactive voice applications.

Benchmark context: Big Bench Audio is the first dedicated dataset for evaluating the reasoning performance of speech models. It comprises 1,000 audio questions adapted from the Big Bench Hard text test set (chosen for its rigorous testing of advanced reasoning) and translated into the audio domain.

Performance:
➤ Reasoning: Achieves 92.3% on Big Bench Audio, setting a new state of the art for native Speech to Speech reasoning.
➤ Latency: With an average time to first token of 0.78 seconds, it is the third fastest model on our leaderboard, behind Google’s Gemini 2.5 Flash Native Audio Dialog and Gemini 2.5 Flash Live.
➤ Price: Simple pricing of 5 cents per minute connected, equivalent to $3 per hour of audio.

Key features:
➤ Tool calling: Use built-in tools such as web search and RAG-powered search, or define your own tools with JSON Schema.
➤ Telephony: Connect to Session Initiation Protocol (SIP) providers such as Twilio and Vonage.
➤ Multilingual: Converse in over 100 languages, with 5 voices to choose from.

Congratulations @xai and @elonmusk on this impressive release!

Google’s Nano Banana Pro is by far the best image generation AI out there. I gave it a picture of a question and it solved it correctly, in my actual handwriting. Students are going to love this. 😂