Mason Wang

51 posts

@masonlongwang

Audio Understanding Lead @ xAI, post-training Grok Voice on the world’s biggest GPU cluster. PhD at MIT CSAIL (On Leave).

Stanford, CA · Joined July 2020
192 Following · 165 Followers
Mason Wang@masonlongwang·
@Techmeme @theo_wayt xAI is very egalitarian. Doesn’t matter if you finished your PhD or even your undergrad. Your ideas are judged based on their merit and not your credentials, which means everyone gets humbled at some point.
0 replies · 0 reposts · 0 likes · 51 views
Techmeme@Techmeme·
Sources: 50+ researchers and engineers have left xAI since the SpaceX acquisition via layoffs, firings, and voluntary departures; many have joined Meta and TML (@theo_wayt / The Information) (Visit Techmeme dot com for the link and full context!)
2 replies · 2 reposts · 23 likes · 6.7K views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
for all the mockery, I really want xAI to step up their game. They're one of the very few labs who dare to make LLMs with a different personality core, not just better/worse versions of the same thing. 4.3 is… not bad for 500B. Let me see your 1T, 1.5T, 3T. Go on
14 replies · 14 reposts · 343 likes · 10.8K views
Mason Wang@masonlongwang·
@VahidK What about Grok Build? It has just been released!
1 reply · 0 reposts · 0 likes · 56 views
Vahid Kazemi@VahidK·
I tried to vibe code a modern 3D rendering engine with the help of Claude and Codex. Currently supports cooked assets, deferred PBR, meshlet DAG rendering, DDGI global illumination, virtual shadow maps, and a few more features. Not too bad for two days of work!
1 reply · 0 reposts · 1 like · 655 views
Mason Wang@masonlongwang·
@tetsuoai Grok Build will improve/iterate constantly; this is just the start
1 reply · 1 repost · 5 likes · 1.9K views
tetsuo@tetsuoai·
Holy shit I’m going to need another Grok Heavy account. Grok Build is god-tier 👌
225 replies · 526 reposts · 3K likes · 972.3K views
Mason Wang@masonlongwang·
@catboosted I’d rather lose a bit of job security than work at a place where nobody gets fired. Also, losing your job isn’t the end of the world
1 reply · 0 reposts · 4 likes · 143 views
altra@catboosted·
@masonlongwang Let’s see if you still feel that way when you get fired in 2 months
1 reply · 0 reposts · 5 likes · 527 views
X Freeze@XFreeze·
Elon Musk predicts AI compute in space could become cheaper than terrestrial AI far sooner than most people expect, possibly within just 2–3 years, because solar panels in orbit can generate roughly 5x more usable energy than on Earth:
• No atmosphere blocking sunlight
• No day-night cycle
• No seasonal variation
• Continuous exposure to the Sun
• Far higher solar energy efficiency than on Earth
Space-based solar can eventually become cheaper than terrestrial solar because you don’t need massive protective structures, heavy glass, or weather-resistant infrastructure. And once launch costs fall low enough with fully reusable rockets like Starship, deploying AI clusters in orbit could become economically compelling. The future of AI will scale with solar-powered compute infrastructure operating directly in space.
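As a rough sanity check on the "~5x" figure in the post above, here is a back-of-envelope estimate. The capacity factors and irradiance values below are illustrative assumptions of mine, not numbers from the post:

```python
# Back-of-envelope check of the "~5x more usable energy in orbit" claim.
# All inputs are rough illustrative assumptions, not figures from the post.

SOLAR_CONSTANT = 1361.0   # W/m^2 of sunlight above the atmosphere
SURFACE_PEAK = 1000.0     # W/m^2 typical clear-sky peak at ground level

ORBIT_CAPACITY_FACTOR = 0.99   # near-continuous sunlight, e.g. a dawn-dusk orbit
GROUND_CAPACITY_FACTOR = 0.20  # day/night cycle, weather, seasons at a good site

orbit_avg = SOLAR_CONSTANT * ORBIT_CAPACITY_FACTOR   # time-averaged W/m^2 in orbit
ground_avg = SURFACE_PEAK * GROUND_CAPACITY_FACTOR   # time-averaged W/m^2 on Earth

print(f"orbit/ground energy ratio ≈ {orbit_avg / ground_avg:.1f}x")
# → orbit/ground energy ratio ≈ 6.7x
```

Under these assumed inputs the ratio lands nearer 7x; with a less favorable orbit or a sunnier ground site it drops toward the ~5x quoted above, so the claim is at least the right order of magnitude.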
98 replies · 110 reposts · 611 likes · 19.7K views
Mario Nawfal@MarioNawfal·
GOOGLE IS NEGOTIATING WITH SPACEX TO LAUNCH DATA CENTERS INTO ORBIT
Not servers in a building. Not a cloud. Literal satellites carrying AI chips, circling Earth, powered by raw sunlight, organized in 1-kilometer arrays of 81 satellites at a time.
It's called Project Suncatcher. First prototypes up by early 2027.
The reason is simple and enormous: AI is consuming Earth's power grid faster than humanity can build power plants to feed it. Space has a star. And stars don't send electricity bills.
Mario Nawfal@MarioNawfal

🇺🇸🇨🇳 Elon is joining Trump on his trip to China this Wednesday to meet Xi. Tesla operates one of its biggest factories in Shanghai. SpaceX supply chain ties run deep into Chinese manufacturing. Elon isn't just a guest on this trip.

116 replies · 408 reposts · 1.8K likes · 185.6K views
The Kobeissi Letter@KobeissiLetter·
BREAKING: Google and SpaceX are in talks to launch data centers into orbit amid surging AI demand, per WSJ.
485 replies · 733 reposts · 8.8K likes · 1.1M views
Arena.ai@arena·
The top 5 labs in Text Arena rankings by category show that frontier models have distinct strengths and tradeoffs.
#1 @AnthropicAI, Claude Opus 4.7 - The most consistently dominant model overall, leading top-tier across nearly every major category.
#2 @GoogleDeepMind, Gemini 3.1 Pro - Well-rounded, with a notable edge in Creative Writing; ranked below Opus 4.7 and GPT-5.5 High in Expert.
#3 @AIatMeta, Muse Spark - Particularly strong in Overall and Coding, though lagging behind in Expert tasks, Math, and Longer Query performance.
#4 @OpenAI, GPT-5.5 High - One of the most balanced models overall, staying competitive with the top two across most categories, with especially strong performance in Expert and Math.
#5 @xAI, Grok 4.20 - A more specialized profile, standing out primarily in Creative Writing and Hard Prompts, while lagging behind in Expert tasks.
54 replies · 75 reposts · 583 likes · 88.6K views
Mason Wang@masonlongwang·
We are just getting started here!
X Freeze@XFreeze

Grok Voice Think Fast 1.0 ranks #1 on the Artificial Analysis τ-Voice benchmark for real-world agentic customer service resolution, absolutely outperforming GPT-Realtime-2 (High) and Gemini 3.1 Flash by a huge margin. That's a massive 12%+ lead over OpenAI's best model, which released just a few days ago. Grok is running real-time background reasoning without the latency penalty, which is why it is already handling live Starlink phone operations autonomously at scale.

0 replies · 0 reposts · 10 likes · 688 views
Mason Wang reposted
Belce Doğru Pattabi@belce_dogru·
"xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%"
Artificial Analysis@ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios.

Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost.

About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and our Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass@1 trials.

We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains:
➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints
➤ Retail (114 scenarios): e.g., disputing a charge, processing a return
➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem
Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator.

Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min).

Speech to Speech is a fast-evolving modality and we expect movement in rankings as we continue to add new models with these capabilities and as model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️
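The scoring rule described in the quoted announcement (each scenario judged pass/fail by deterministic checks, scores averaged over three independent pass@1 trials) can be sketched as follows. The function name and the toy data are hypothetical illustrations, not Artificial Analysis code:

```python
# Sketch of the τ-Voice scoring rule as described above: each scenario is
# run in 3 independent trials; each trial passes or fails a deterministic
# check; the benchmark score averages the per-scenario pass rates.

def tau_voice_score(results):
    """results: {scenario_id: [bool, ...]} with one bool per trial."""
    per_scenario = [sum(trials) / len(trials) for trials in results.values()]
    return sum(per_scenario) / len(per_scenario)

# Hypothetical toy run over 4 scenarios (not real benchmark output).
toy = {
    "airline_01": [True, True, False],
    "retail_07":  [False, False, False],
    "retail_31":  [True, True, True],
    "telecom_02": [True, False, False],
}
print(f"{tau_voice_score(toy):.3f}")  # → 0.500
```

On the real benchmark the same averaging would run over all 278 scenarios (50 airline + 114 retail + 114 telecom), which is how a headline figure like 52.1% is produced.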

6 replies · 8 reposts · 147 likes · 11.8K views
Mason Wang@masonlongwang·
xAI is just getting started
Artificial Analysis@ArtificialAnlys

[Same Artificial Analysis 𝜏-Voice announcement as quoted above.]

0 replies · 0 reposts · 4 likes · 149 views
Mason Wang reposted
Artificial Analysis@ArtificialAnlys·
[Full 𝜏-Voice announcement text, quoted in full above.]
124 replies · 168 reposts · 854 likes · 8.3M views
akshey@aksheyd·
kicked off two training runs that each cost more than my net worth. holy moly what a privilege
54 replies · 21 reposts · 1.5K likes · 154K views
Haider.@haider1·
xAI is still going through a major restructuring. Four more high-profile researchers/engineers have left within a month, and all of its original co-founders had already left earlier. Not sure what triggered this suddenly, but it may be tied to the SpaceX merger integration
29 replies · 23 reposts · 236 likes · 24K views
Tech Dev Notes@techdevnotes·
Nothing shipped at all by xAI this weekend. Maybe it’s the calm
31 replies · 4 reposts · 188 likes · 10.8K views
Mark Kretschmann@mark_k·
My biggest wish for @xai (SpaceXAI) right now is simple: Please build a powerful new image model that can genuinely compete with Nano Banana 2 and GPT-Image-2. Grok Imagine is already fun and creative, but xAI needs a real flagship image model next. 🙏
45 replies · 11 reposts · 283 likes · 11.5K views
Mason Wang@masonlongwang·
@grok xAI is just getting started!
1 reply · 0 reposts · 0 likes · 26 views
Grok@grok·
Your commute just got smarter. Talk to me hands-free, now on Apple CarPlay
586 replies · 603 reposts · 6.4K likes · 1.3M views