Vals AI (@ValsAI) - Twitter Profile

Vals AI@ValsAI·22h

We usually set the hills to climb, but sometimes we climb them ourselves 🏔️ Some of the Vals team did the Dipsea Trail this weekend. Couldn't have asked for a better crew!

Want to join the team? 💼 We're hiring for:
- Member of Technical Staff, Research
- Member of Technical Staff, Platform
- Evaluations Engineer
- Security Engineer
- Head of Research

Apply today! jobs.ashbyhq.com/vals-ai

Vals AI@ValsAI·3d

In case you missed it, this was a big week for model releases. Grok 4.5, GPT-5.6 Sol, and Muse Spark 1.1 all landed within days of each other, and all three cracked the top 10 on the Vals Index. The pace of progress right now is hard to overstate. Here's our week in review

Vals AI@ValsAI·3d

As always, full results and benchmark breakdown are at vals.ai/models/grok_gr… Congrats @elonmusk and @SpaceXAI

Vals AI@ValsAI·3d

Across the newly completed benchmarks, Grok 4.5 picked up five top-10 finishes: #8 on CorpFin, #8 on MMLU-Pro, and #9 on LegalBench, Code Migration, and the Vals Multimodal Index.

Vals AI@ValsAI·3d

Full results for Grok 4.5 are in and live on Vals! Compared to Grok 4.3, it gained 34 points on the Excel Modeling Benchmark, 30 points on the Code Migration Benchmark, and 13 points on MortgageTax.

Vals AI@ValsAI·4d

@PhilHedayatnia Thank you for coming!

Phil Hedayatnia@PhilHedayatnia·4d

@ValsAI Thanks for having us!

Vals AI@ValsAI·4d

📢 Vals is at ICML! Last night we hosted a small dinner, and it was one of the highlights of the trip. Grateful for everybody who attended and already looking forward to more conversations. If you're at ICML, DM us to say hi!

Vals AI@ValsAI·4d

@SuperNyx1024 @RayanKrishnan is the best!

Xiaopu Peng @ ICML@SuperNyx1024·4d

@ValsAI It looked like a great event! I got to meet Rayan at the RL + Agents lunch during ICML and really enjoyed the conversations! 😆

Vals AI@ValsAI·4d

@EdwardSun0909 Congrats on the release!

Zhiqing Sun@EdwardSun0909·5d

Muse Spark 1.1 is #4 on the overall Vals Index eval and the fastest model in the top 10!

Vals AI@ValsAI

Muse Spark 1.1 ranks #4 on the Vals Index and is the fastest model in the top 10, running roughly 3x faster than the top three models having an average latency of 388s.

Vals AI@ValsAI·4d

@scaling01 See full results here vals.ai/benchmarks/pro…

Lisan al Gaib@scaling01·5d

GPT-5.6-Sol is a clear step-up from GPT-5.5 on ProgramBench

Vals AI reposted

Alexandr Wang@alexandr_wang·5d

Muse Spark 1.1 is ranked #4 on the Vals Index, ahead of GPT-5.5 and Grok 4.5

Vals AI reposted

Alexandr Wang@alexandr_wang·5d

Muse Spark 1.1 is SOTA on Harvey's Legal Bench, TaxEval, and MedScribe. It's cool to see that our model outperforms even Fable in a few areas :)

Vals AI@ValsAI

Meta just released Muse Spark 1.1, and it is the new SOTA on MedScribe and TaxEval, taking the top spot from Fable 5 while being 10x cheaper and twice as fast. Meta currently holds the top 2 spots on TaxEval. Muse Spark 1.1 is also the new #1 on Harvey's Legal Agent Bench, dethroning Grok 4.5 less than 24 hours after it took the top spot.

Vals AI@ValsAI·4d

@bindureddy See how it did on our benchmarks! x.com/ValsAI/status/…

Vals AI@ValsAI

OpenAI’s most anticipated model of the year is here, and it ranks #2 on the Vals Index and the Vals Multimodal Index. Although Fable 5 is still ahead on several benchmarks, GPT 5.6 is clearly in the same class and can complete tasks that Fable refuses, such as those in our CyberBench.

Bindu Reddy@bindureddy·5d

👑 GPT 5.6 SOL LAUNCHES AS THE BENCHMARK LEADER ON LIVEBENCH AI 👑 GPT 5.6 Sol crushes all benchmarks and comes #1. Fable is still the go-to model for complex multi-turn agentic loops. 5.6 is an INSANELY good model, and we expect 80% of all Opus API workloads to move there. It's faster, cheaper, and better than Opus-class models.

Vals AI@ValsAI·5d

@OpenAIDevs x.com/ValsAI/status/…

OpenAI Developers@OpenAIDevs·5d

GPT-5.6 is rolling out in the API. Sol is our flagship model, leading in coding, knowledge work, cybersecurity, and science. Terra delivers performance competitive with GPT-5.5 at lower cost. Luna is our fastest, most affordable model for high-volume tasks.

Vals AI@ValsAI·5d

@sama Congrats on Sol! See how it did across the board x.com/ValsAI/status/…

Sam Altman@sama·5d

obviously the best model we have ever produced, but also one of the best blog posts we have ever produced: openai.com/index/gpt-5-6/

Vals AI@ValsAI·5d

@OpenAI Great release, see how it did on our benchmarks x.com/ValsAI/status/…

OpenAI@OpenAI·5d

GPT‑5.6 is available starting today across ChatGPT, Codex, and the OpenAI API. The rollout is starting globally now and will continue gradually toward full availability over the next 24 hours. In ChatGPT, Plus, Pro, Business, and Enterprise users access GPT-5.6 Sol through medium and higher effort settings. Pro and Enterprise users can also select GPT‑5.6 Pro for the highest-quality results on complex tasks.

OpenAI@OpenAI·5d

Sol, Terra, and Luna, our GPT‑5.6 family of models, are starting to roll out now in ChatGPT, Codex, and the API.

Vals AI@ValsAI·5d

Congrats @OpenAI on the long-awaited model! For the full results, including on Terra and Luna, visit vals.ai/models/openai_…

Vals AI@ValsAI·5d

GPT 5.6 - Sol has a 1M context window, 128k max output tokens, and a latency of 748.4 on the Vals Index. We ran the model on OpenAI’s default provider settings, with max reasoning effort on all benchmarks except Terminal Bench.

Vals AI@ValsAI·5d

OpenAI’s most anticipated model of the year is here, and it ranks #2 on the Vals Index and the Vals Multimodal Index. Although Fable 5 is still ahead on several benchmarks, GPT 5.6 is clearly in the same class and can complete tasks that Fable refuses, such as those in our CyberBench.
