Mason Wang (@masonlongwang) - Twitter Profili | Zamantika Mersobahis Locabet

Mason Wang@masonlongwang·10h

@Techmeme @theo_wayt xAI is very egalitarian. Doesn’t matter if you finished your PhD or even your undergrad. Your ideas are judged based on their merit and not your credentials, which means everyone gets humbled at some point.

English

0

51

Techmeme@Techmeme·21h

Sources: 50+ researchers and engineers have left xAI since the SpaceX acquisition via layoffs, firings, and voluntary departures; many have joined Meta and TML (@theo_wayt / The Information) (Visit Techmeme dot com for the link and full context!)

English

2

23

6.7K

Mason Wang@masonlongwang·11h

@teortaxesTex Everyone at xAI is constantly stepping up their game

English

0

13

442

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·13h

for all the mockery, I really want xAI to step up their game. They're one of the very few labs who dare to make LLMs with a different personality core, not just better/worse versions of the same thing. 4.3 is… not bad for 500B. Let me see your 1T, 1.5T, 3T. Go on

English

14

343

10.8K

Mason Wang@masonlongwang·11h

@VahidK What about grok build? it has just been released!

English

1

0

56

Vahid Kazemi@VahidK·12h

I tried to vibe code a modern 3D rendering engine with the help of Claude and Codex. Currently supports cooked assets, deferred PBR, meshlet DAG rendering, DDGI global illumination, virtual shadow maps, and a few more features. Not too bad for two days of work!

English

1

0

1

655

Mason Wang@masonlongwang·14h

@tetsuoai Grok build will improve slash iterate constantly, this is just the start

English

1

5

1.9K

tetsuo@tetsuoai·18h

Holy shit I’m going to need another Grok Heavy account. Grok Build is god-tier 👌

English

225

526

3K

972.3K

Mason Wang@masonlongwang·14h

@catboosted I’d rather lose a bit of job security to work at a place where nobody gets fired. Also losing your job isn’t the end of the world

English

1

0

4

143

altra@catboosted·15h

@masonlongwang Let’s see if you still feel that when when you get fired in 2 months

English

1

0

5

527

altra@catboosted·20h

What's more telling is the labs they're NOT going to Talent hierarchy speaks for itself

Theo Wayt@theo_wayt

SpaceXAI update: >50 researchers/engineers have left xAI through layoffs, firings and resignations since SpaceX acquired it in Feb Exits in past 2 weeks include former heads of coding, world models, Grok voice At least 11 have gone to Meta and 7 to Thinking Machines Lab

English

3

0

84

39.7K

Mason Wang@masonlongwang·16h

xAI's voice agent is SOTA, and we are just getting started!

Antoine Rousseaux@AntoineRSX

This is by far the best voice agent I've ever tried so far. If you have a physical business, you need this. If you have an AI Agent, this is the next big thing. Judge it by yourself:

English

2

0

1

419

Mason Wang@masonlongwang·2d

@XFreeze This is xAIs trump card

English

0

35

X Freeze@XFreeze·2d

Elon Musk predicts AI compute in space could become cheaper than terrestrial AI far sooner than most people expect....possibly within just 2–3 years Because solar panels in orbit can generate roughly 5x more usable energy than on Earth because there’s: • No atmosphere blocking sunlight • No day-night cycle • No seasonal variation • Continuous exposure to the Sun • Far higher solar energy efficiency than on Earth Space-based solar can eventually become cheaper than terrestrial solar because you don’t need massive protective structures, heavy glass, or weather-resistant infrastructure And once launch costs fall low enough with fully reusable rockets like Starship, deploying AI clusters in orbit could become economically compelling The future of AI will scale with solar-powered compute infrastructure operating directly in space

English

98

110

611

19.7K

Mason Wang@masonlongwang·2d

@MarioNawfal xAI always has an ace up its sleeve

English

1

0

1

47

Mario Nawfal@MarioNawfal·3d

GOOGLE IS NEGOTIATING WITH SPACEX TO LAUNCH DATA CENTERS INTO ORBIT Not servers in a building. Not a cloud. Literal satellites carrying AI chips. Circling Earth, powered by raw sunlight, organized in 1-kilometer arrays of 81 satellites at a time. It's called Project Suncatcher. First prototypes up by early 2027. The reason is simple and enormous: AI is consuming Earth's power grid faster than humanity can build power plants to feed it. Space has a star. And stars don't send electricity bills.

Mario Nawfal@MarioNawfal

🇺🇸🇨🇳 Elon is joining Trump on his trip to China this Wednesday to meet Xi. Tesla operates one of its biggest factories in Shanghai. SpaceX supply chain ties run deep into Chinese manufacturing. Elon isn't just a guest on this trip.

English

116

408

1.8K

185.6K

Mason Wang@masonlongwang·2d

@KobeissiLetter xAI is about to really get going now

English

0

23

The Kobeissi Letter@KobeissiLetter·3d

BREAKING: Google and SpaceX are in talks to launch data centers into orbit amid surging AI demand, per WSJ.

English

485

733

8.8K

1.1M

Mason Wang@masonlongwang·2d

@arena @AnthropicAI @GoogleDeepMind Time to lock in. xAI is just ramping up

English

0

2

73

Arena.ai@arena·3d

The top 5 labs in Text Arena rankings by category show that frontier models have distinct strengths and tradeoffs. #1 @AnthropicAI, Claude Opus 4.7 - The most consistently dominant model overall, leading top-tier across nearly every major category. #2 @GoogleDeepMind, Gemini 3.1 Pro - Well-rounded, with a notable edge in Creative Writing, ranked below Opus 4.7 and GPT-5.5 High in Expert #3 @AIatMeta, Muse Spark - Particularly strong in Overall and Coding, though it’s lagging behind in Expert tasks, Math, and Longer Query performance. #4 @OpenAI, GPT-5.5 High - One of the most balanced models overall, staying competitive with the top two across most categories, with especially strong performance in Expert and Math. #5 @xAI, Grok 4.20 - A more specialized profile, standing out primarily in Creative Writing and Hard Prompts, while lagging behind in Expert tasks.

English

54

75

583

88.6K

Mason Wang@masonlongwang·2d

We are just getting started here!

X Freeze@XFreeze

Grok Voice Think Fast 1.0 ranks #1 on the Artificial Analysis τ-Voice benchmark for real-world agentic customer service resolution Absolutely outperforming GPT-Realtime-2 (High) and Gemini 3.1 Flash by a huge margin That's a massive 12%+ lead over OpenAI's best model that just released a few days ago Grok is running real-time background reasoning without the latency penalty, which is why it is already handling live Starlink phone operations autonomously at scale

English

0

10

688

Mason Wang retweetledi

Belce Doğru Pattabi@belce_dogru·2d

"xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%"

Artificial Analysis@ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost. About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass @1 trials. We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains: ➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints ➤ Retail (114 scenarios): e.g., disputing a charge, processing a return ➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator. Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min). Speech to Speech is a fast evolving modality and we expect movement in rankings as we continue to add new models with these capabilities, and model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️

English

6

8

147

11.8K

Mason Wang@masonlongwang·2d

xAI is just getting started

Artificial Analysis@ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost. About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass @1 trials. We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains: ➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints ➤ Retail (114 scenarios): e.g., disputing a charge, processing a return ➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator. Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min). Speech to Speech is a fast evolving modality and we expect movement in rankings as we continue to add new models with these capabilities, and model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️

English

0

4

149

Mason Wang retweetledi

Artificial Analysis@ArtificialAnlys·3d

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end - a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost. About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, support of a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass@1 trials. We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains: ➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints ➤ Retail (114 scenarios): e.g., disputing a charge, processing a return ➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator. Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min). Speech to Speech is a fast evolving modality and we expect movement in rankings as we continue to add new models with these capabilities, and model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️

English

124

168

854

8.3M

Mason Wang@masonlongwang·4d

@aksheyd xAI is just getting started

English

0

2

496

akshey@aksheyd·4d

kicked off two training runs that each cost more than my net worth. holy moly what a privilege

English

54

21

1.5K

154K

Mason Wang@masonlongwang·4d

@haider1 xAI is just getting started

English

1

0

8

609

Haider.@haider1·4d

xAI is still going through a major restructuring four more high-profile researchers/engineers have left within a month, and all of their original co-founders had already left earlier not sure what triggered this suddenly, but it may be tied to the SpaceX merger integration

English

29

23

236

24K

Mason Wang@masonlongwang·4d

@techdevnotes xAI is just getting started

English

0

1

118

Tech Dev Notes@techdevnotes·4d

Nothing shipped at all by xAI this weekend Maybe it’s the calm

English

31

4

188

10.8K

Mason Wang@masonlongwang·5d

@mark_k @xai xAI just getting started

English

1

0

2

70

Mark Kretschmann@mark_k·5d

My biggest wish for @xai (SpaceXAI) right now is simple: Please build a powerful new image model that can genuinely compete with Nano Banana 2 and GPT-Image-2. Grok Imagine is already fun and creative, but xAI needs a real flagship image model next. 🙏

English

45

11

283

11.5K

Mason Wang@masonlongwang·5d

@grok xAI is just getting started!

English

1

0

26