
John Robb
31.1K posts

@johnrobb
The Global Guerrillas Report -- Sense-making frameworks. War-Tech-Politics
Book: Brave New War
Patreon: https://t.co/y1d9WwM6EU
Substack: https://t.co/lTSdS1iHmB







Nvidia is now worth more than the GDP of every country in the world except the U.S. and China

Reading scores, 3rd graders to 8th graders, 2015 to 2025. This is a national tragedy.

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios.
Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost.
About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark, which measures intelligence, and our Conversational Dynamics (Full Duplex Bench subset) benchmark, which measures conversational naturalness. Scores are the average of three independent pass@1 trials.
We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains:
➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints
➤ Retail (114 scenarios): e.g., disputing a charge, processing a return
➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem
Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator.
Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min).
Speech to Speech is a fast-evolving modality, and we expect movement in rankings as we continue to add new models with these capabilities and as model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️
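
The scoring described in the post above (a deterministic success check per scenario, then the average of three independent pass@1 trials) can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the function names, the exact form of the success check, and the sample outcomes are hypothetical, and this is not the 𝜏²-bench evaluator itself.

```python
from statistics import mean


def task_succeeded(expected_actions, agent_actions, expected_db, final_db):
    """Deterministic check (assumed form): every expected action was taken
    and the final database state matches the expected state."""
    actions_ok = all(action in agent_actions for action in expected_actions)
    return actions_ok and final_db == expected_db


def pass_at_1(outcomes):
    """Share of scenarios resolved on a single attempt within one trial."""
    return sum(outcomes) / len(outcomes)


def benchmark_score(trials):
    """Average of the per-trial pass@1 values."""
    return mean(pass_at_1(outcomes) for outcomes in trials)


# Hypothetical outcomes for four scenarios across three independent trials.
trials = [
    [True, False, True, False],   # trial 1: pass@1 = 0.50
    [True, True, False, False],   # trial 2: pass@1 = 0.50
    [False, False, True, False],  # trial 3: pass@1 = 0.25
]
print(f"{benchmark_score(trials):.1%}")  # 41.7%
```

Averaging over repeated trials smooths out run-to-run variation in a single pass@1 measurement; how the published figures are aggregated across the three domains is not stated in the post, so the sketch does not attempt it.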

*TESLA SHARES EXTEND DECLINE TO 5%

NEW: Google reportedly in talks with SpaceX for a rocket launch deal to put orbital data centers in space.

GM just laid off hundreds of IT workers to hire those with stronger AI skills | Kirsten Korosec, TechCrunch
General Motors has laid off more than 10% of its IT department, or about 600 salaried employees, in a deliberate skills swap: clearing out workers whose expertise no longer fits and making room for some with AI-focused backgrounds. GM confirmed to TechCrunch that it had conducted layoffs; they were first reported by Bloomberg News. In an emailed statement, the automaker framed the layoffs as a means to prepare it for the future, without providing specifics. “GM is transforming its Information Technology organization to better position the company for the future,” the company said.
These layoffs are not all permanent headcount reductions. A person familiar with the layoffs told TechCrunch that the company is still hiring people for roles in its IT department, but for different skills. The most sought-after capabilities are AI-native development, data engineering and analytics, cloud-based engineering, agent and model development, prompt engineering, and new AI workflows. In practical terms, GM is looking for people who know how to build with AI from the ground up (designing the systems, training the models, and engineering the pipelines), not just use AI as a productivity tool.
GM has laid off white-collar employees in several departments over the past 18 months, as it focuses its resources on high-priority initiatives, including AI. In August 2024, for example, the company cut about 1,000 software workers. The software workforce has undergone significant change since Sterling Anderson, co-founder of the autonomous trucking startup Aurora and a veteran of the autonomous vehicle industry, was hired in May 2025 as chief product officer.
Last November, three top executives left the company’s software team as Anderson pushed to consolidate GM’s disparate technology businesses into one organization: Baris Cetinok, senior vice president of software and services product management; Dave Richardson, senior vice president of software and services engineering; and Barak Turovsky, a former VP at Cisco who spent just nine months as GM’s chief AI officer.
GM has since moved to fill the gap with new AI-focused hires. It hired Behrad Toghi, who previously worked at Apple, in October as AI lead. The company also brought on Rashed Haq as its vice president of autonomous vehicles. Haq spent five years at Cruise, the self-driving vehicle company acquired and later shuttered by GM, as its head of AI and robotics.
For the industry, GM's restructuring is a signal of what enterprise AI adoption actually looks like in practice -- not just adding AI tools on top of existing teams, but deliberately rebuilding the workforce from the ground up. The specific capabilities it's hiring for -- agent development, model engineering, AI-native workflows -- point directly at where large-enterprise demand is heading.
techcrunch.com/2026/05/11/gm-…


it's insane that our government doesn't know how many people are within its borders. it's insane someone can be murdered in a public place and we can't immediately arrest the killer. if you're against public surveillance you are objectively pro crime and pro illegal immigration




