Vals AI

943 posts

Vals AI banner
Vals AI

Vals AI

@ValsAI

Public LLM Evaluation // https://t.co/FjWabQY2jk @8vc @BloombergBeta @pearvc

San Francisco, CA Katılım Mart 2024
243 Takip Edilen7.7K Takipçiler
Vals AI
Vals AI@ValsAI·
One last Minimax M2.7 result for you all - it has broken 25% on Vibe Code Bench. This is a benchmark we created in-house, testing a model's ability to write an application completely from scratch. It is the only Chinese model to do so so far.
Vals AI tweet media
Vals AI@ValsAI

Full Minimax results now available!

English
3
3
93
7.3K
Vals AI
Vals AI@ValsAI·
Results on the remaining benchmarks, including VibeCodeBench, will be released soon. Congrats to @MiniMax_AI on the release 🚀
English
0
0
7
1.6K
Vals AI
Vals AI@ValsAI·
All benchmarks were run with temperature=1, top_p=0.95, and max_tokens=196,608 using the official MiniMax API.
English
1
0
7
1.9K
Vals AI
Vals AI@ValsAI·
This marks a significant jump from Minimax M2.5 (+6.74) on the Vals Index, which itself was an improvement over Minimax M2.1. This has been an extremely rapid rate of improvement by the lab over the last three months.
English
1
1
21
3.3K
Vals AI
Vals AI@ValsAI·
Initial results are in for Minimax 2.7, and it comes in at #12 overall on the Vals Index. If the weights are released, it will be #2 on the open-weight index (only 0.5% behind #1).
Vals AI tweet media
English
11
25
335
30.7K
AiBattle
AiBattle@AiBattle_·
GPT-5.4 Mini and GPT-5.4 Nano are now on the API Pricing: GPT-5.4 Mini: - Input: $0.75 / Input MTok - Output: $4.50 / Output MTok GPT-5.4 Nano: - Input: $0.20 / Input MTok - Output: $1.25 / Output MTok
AiBattle tweet media
English
10
6
174
10.6K
Lisan al Gaib
Lisan al Gaib@scaling01·
OpenAI introduces GPT-5.4 mini and nano
Lisan al Gaib tweet media
Română
2
2
82
2.8K
OpenAI
OpenAI@OpenAI·
GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…
OpenAI tweet media
English
538
682
6.2K
1.5M
Vals AI
Vals AI@ValsAI·
The model was ran with high reasoning. Full results are available on Vals AI!
English
0
0
1
513
Vals AI
Vals AI@ValsAI·
Like 5.4 Mini, it performs especially well for its size on VibeCodeBench, but unlike Mini, it is unable to perform ProofBench tasks to the same standard.
English
1
0
1
666
Vals AI
Vals AI@ValsAI·
OpenAI has also released 5.4 Nano today, which comes in at #18 on the Vals Index.
Vals AI tweet media
English
2
4
40
2.6K
Vals AI
Vals AI@ValsAI·
Full results are available on Vals AI. Congrats to @OpenAI on the release!
English
0
0
2
1.2K
Vals AI
Vals AI@ValsAI·
The model was ran with temperature xhigh across all benchmarks except Terminal Bench 2, which used high reasoning. Results for the Vals Index are finalized, but we are still confirming scores for the latest parameters on certain benchmarks.
English
2
0
2
1.5K
Vals AI
Vals AI@ValsAI·
GPT 5.4 Mini comes in at #13 on the Vals Index - equivalent performance to GPT 5 🚀
Vals AI tweet media
English
4
3
64
82.4K