Roma (@ComplexiaSC) - Twitter Profile

Roma@ComplexiaSC·12h

Time to build

Roma@ComplexiaSC·1d

@elonmusk @grok make a Borat caption meme where he says Grok Voice Number 1 in that iconic way of his

Elon Musk@elonmusk·2d

Grok Voice is #1!

Artificial Analysis@ArtificialAnlys

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios.

Even the strongest Speech to Speech (S2S) models today resolve only about half of realistic customer service scenarios end-to-end, a meaningful gap relative to frontier text-based agents on the same tasks. Voice channels introduce significant complexity: challenging accents, background noise, and packet loss, all while requiring fast responses, consistency across long multi-turn conversations, and reliable tool use. Performance also varies considerably by audio condition: in clean audio some models perform notably better, but realistic conditions continue to pose a challenge. Conversation duration also varies meaningfully across models, with implications for both customer experience and operational cost.

About 𝜏-Voice: Our Agentic Performance benchmark is based on 𝜏-Voice (Ray, Dhandhania, Barres & Narasimhan, 2026), which extends 𝜏²-bench into the voice modality to evaluate S2S models on realistic customer service tasks. It measures multi-turn instruction following, supporting a simulated customer through a complete interaction, and tool use against simulated customer service systems. The simulated user combines an LLM-driven decision model with realistic audio synthesis: diverse accents, background noise, and packet loss modelled on real network conditions. This complements our Big Bench Audio benchmark measuring intelligence and our Conversational Dynamics (Full Duplex Bench subset) benchmark measuring conversational naturalness. Scores are the average of three independent pass@1 trials.

We evaluate under realistic audio conditions using the 𝜏²-bench base task split across three domains:
➤ Airline (50 scenarios): e.g., changing a flight, rebooking under policy constraints
➤ Retail (114 scenarios): e.g., disputing a charge, processing a return
➤ Telecom (114 scenarios): e.g., resolving a billing issue, troubleshooting a service problem

Task success is determined by deterministic checks against expected actions and final database state, consistent with the 𝜏²-bench evaluator.

Key results: xAI's Grok Voice Think Fast 1.0 is the clear leader at 52.1%, averaging 5.6 minutes per conversation, the second-longest overall. OpenAI's GPT-Realtime-2 (High) (39.8%, 3.0 min) and GPT-Realtime-1.5 (38.8%, 4.8 min) follow, with Gemini 3.1 Flash Live Preview - High close behind at 37.7% (3.8 min).

Speech to Speech is a fast-evolving modality, and we expect rankings to shift as we add new models with these capabilities and as model robustness improves. Congratulations @xAI @elonmusk! See below for further detail ⬇️
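The scoring rule described in the announcement (a deterministic pass/fail result per scenario, with the reported score being the average of three independent pass@1 trials) can be sketched in Python. This is a minimal illustration, not Artificial Analysis's or 𝜏²-bench's actual evaluator: the data layout and the names `Trial`, `pass_at_1`, and `benchmark_score` are hypothetical, and pooling all scenarios into a single pass@1 rate (rather than averaging per-domain rates) is an assumption, since the post does not say how domains are combined.

```python
from statistics import mean

# Hypothetical layout: one trial maps each domain name to a list of per-scenario
# outcomes, where True means the deterministic checks (expected actions and
# final database state) passed for that scenario.
Trial = dict[str, list[bool]]


def pass_at_1(trial: Trial) -> float:
    """Fraction of scenarios solved in one trial, pooled across all domains.

    Pooling (rather than averaging per-domain rates) is an assumption.
    """
    outcomes = [ok for domain in trial.values() for ok in domain]
    return sum(outcomes) / len(outcomes)


def benchmark_score(trials: list[Trial]) -> float:
    """Average of independent pass@1 trials, expressed as a percentage."""
    return 100 * mean(pass_at_1(t) for t in trials)


if __name__ == "__main__":
    # Toy data for three trials; these are not real benchmark results.
    trials = [
        {"airline": [True, False], "retail": [True, True]},
        {"airline": [False, False], "retail": [True, True]},
        {"airline": [True, True], "retail": [True, False]},
    ]
    print(round(benchmark_score(trials), 1))  # 66.7
```

Averaging over several independent trials smooths out run-to-run variance, which matters for voice agents whose simulated audio conditions (accents, noise, packet loss) can change outcomes between runs.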

Roma@ComplexiaSC·1d

@KorduGG @theo @durangocode Same. Like 4 or 5 at a time.

Kordu@KorduGG·1d

@ComplexiaSC @theo @durangocode Yeah, no, that's understandable. Usually I have multiple projects open and so that's usually why.

Theo - t3.gg@theo·1d

I can't help but feel personally burned by the Claude Code changes announced today. We put so much work into wrapping the (atrocious) Claude Agent SDK in T3 Code. It was the ONLY path they supported, so we made it work. It was hell. Now our users are getting their rate limits cut by 40x, despite us doing everything right. I listened to the Claude Code team. I had my issues with their direction, but I trusted them and took them at their word. I will never make that mistake again. Until we see significant change, it is safe to assume any statement from an Anthropic employee is a lie on a timer. The rug will be pulled, no matter how many promises are made beforehand.

Roma@ComplexiaSC·1d

@KorduGG @theo @durangocode That's too many for me. More than 4 and I lose track of what's happening where. Cmux kinda solves it, but not really. I need the actual thread tracking, like in T3 Code.

Kordu@KorduGG·1d

@ComplexiaSC @theo @durangocode Yeah, I mean, to be fair, I haven't really found a use for much of anything beyond opening the CLI and quickly getting what I need. I usually just have like ten CLIs open, but you know, each to their own.

Roma@ComplexiaSC·1d

@theo I cancelled 2 months ago. I do most of my work with Codex anyway. I decided to swap my Claude Code sub to a Cursor sub. It's more than enough usage for the few UI tasks I need Claude for.

Theo - t3.gg@theo·1d

For every person who replies with a screenshot of their cancelled Claude Code plan, I will donate $10 to open source.

Theo - t3.gg@theo

[Quoted post: identical to Theo's Claude Code post above]

Roma@ComplexiaSC·1d

@KorduGG @theo You can have it embedded in T3 Code or something like @durangocode where you have terminals on a canvas. Point is, you can still use Claude Code and other actually good things like Codex and Cursor from a single interface without having to switch apps.

Kordu@KorduGG·1d

@ComplexiaSC @theo I mean, at that point, honestly, just going ahead and using the CLI is normally gonna be better, I think. There's not really a point to using any external apps for it anymore, I guess. Kind of a load of garbage and shit.

Roma@ComplexiaSC·1d

I mean, it's just a terminal. Claude still works with your sub inside of @durangocode

Theo - t3.gg@theo

[Quoted post: identical to Theo's Claude Code post above]

Roma@ComplexiaSC·1d

I mean, it's just a terminal. Claude still works with your sub inside of @durangocode

ClaudeDevs@ClaudeDevs

This means that third-party tools built on the Agent SDK like Conductor and OpenClaw work with your Claude plan, but will draw from your credit the same way your own scripts do.

Roma@ComplexiaSC·8 May

The $5 plan gets you more usage than just about anywhere else.

Durango@durangocode

You get an insane amount of usage for image gen in Durango. All latest SOTA models available.

Roma@ComplexiaSC·8 May

@TechmanConway @GigaBasedDad His wife doesn't respect him. She cheated on him and lowkey despises him.

HangryBear@TechmanConway·8 May

@GigaBasedDad Ozark. Aside from all the mafia shit that happens, he foots the bill.

Giga Based Dad@GigaBasedDad·7 May

Can you name a single one of them?

Roma@ComplexiaSC·8 May

@leerob Can I just come do it? I want to build you a better GitHub. There is literally zero need to have your codebase outside of Cursor anymore. It just makes so much sense.

Lee Robinson@leerob·8 May

If you're an engineer who loves to push the limits of the latest AI models and coding agents... You should come work with me! There's so much to learn, build, and teach.

Roma@ComplexiaSC·8 May

@MarieIsabellaB Get pet insurance bro. $60 a month for peace of mind.

Marie Isabella@MarieIsabellaB·7 May

Look at them, just living life like they pay their own vet bills

Roma@ComplexiaSC·7 May

@beffjezos I reckon Elon buys out Anthropic by end of year and guts it into efficiency just like he did with Twitter.

Roma@ComplexiaSC·7 May

@SStricklandMMA @ChampRDS Brother, you have to win that fucking fight. Don't lose, Sean. 🇺🇸

Sean Strickland@SStricklandMMA·7 May

@ChampRDS But did I say that? I said if he made good on his claims to jump me with all his buddies, yes... Which would be a lawful act of self-defense because I don't have to fight you and your 30 friends.

Roma@ComplexiaSC·7 May

What are the chances SpaceX buys Anthropic outright?

Roma retweeted

Durango@durangocode·7 May

Grok 4.3 now generally available in Durango Chat!

Roma@ComplexiaSC·7 May

Have you already seen this @levelsio? Looks like @OpenRouter heard you and got you covered.

Roma@ComplexiaSC·7 May

@elonmusk @nottombrown Can you tell Anthropic to stop their bullshit of billing API rates to people whose prompts, or even git history, contain keywords they don’t like? I can’t in good conscience use Claude if I don’t know how they will bill me from one request to the next.

Elon Musk@elonmusk·6 May

Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.

Tom Brown@nottombrown·6 May

In the next few days we'll be ramping up Claude inference on Colossus. Grateful to be partnering with SpaceX here. We are going to need to move a lot of atoms in order to keep up with AI demand, and there's nobody better at quickly moving atoms (on or off planet Earth)

Roma@ComplexiaSC·6 May

@PMccarthy26071 @BigP4P4Smurf @thedimitri Just get some assets and stock options too then. Honestly, it’s not that difficult.

Pete McCarthy@PMccarthy26071·6 May

@ComplexiaSC @BigP4P4Smurf @thedimitri The interest payments on the loan are peanuts compared to your income. You're effectively stalling repaying the loans until you die and passing the payments on to your kids. Compare your taxes with theirs and you'd see you pay more. They don't have income. They have assets like stock options.

Dimitri@thedimitri·4 May

I love how leftists fundamentally misunderstand how the world works. $10T net worth isn’t “hoarding,” it implies massive success and real-world progress in sustainable energy and space travel. But instead, they picture some evil cartoon boss swimming in gold coins in his mansion.

Melanie D'Arrigo@DarrigoMelanie

This level of wealth hoarding is a mental illness.
