Raphi-2Code

11.9K posts


@R2Cdev_

Joined January 2024
651 Following · 252 Followers
Pinned Tweet
Raphi-2Code @R2Cdev_
GPT-5.4 Pro is a lot better at composing music than GPT-5.2 Pro.
0 replies · 0 reposts · 4 likes · 147 views
Raphi-2Code reposted
ミツキヨ(Mitsukiyo) @mitsukiyo_5
It's now the strongest MacBook on Earth.
[image]
24 replies · 85 reposts · 1.4K likes · 104K views
Raphi-2Code reposted
Harshith @HarshithLucky3
No AGI in 2026
2 replies · 2 reposts · 11 likes · 4.3K views
Raphi-2Code reposted
David Shapiro (L/0) @DaveShapi
People who are wrong:
- Degrowthers
- Decels
- Doomers
- "AI is a bubble"
- "AI is hitting a wall"
- "Data centers are bad"
They're ALL WRONG.
30 replies · 19 reposts · 204 likes · 3.8K views
Raphi-2Code reposted
Tech Dev Notes @techdevnotes
New Grok Imagine account on X
[image]
20 replies · 13 reposts · 129 likes · 15.9K views
Raphi-2Code reposted
Angel 🌼 @Angaisb_
Spring is here! Finally got rid of the ❄️ emoji.
[image]
7 replies · 1 repost · 41 likes · 5.3K views
Raphi-2Code reposted
Theo - t3.gg @theo
Since OpenAI dropped gpt-oss-120b, Mistral has released 4 models that are worse than gpt-oss-120b.
Artificial Analysis @ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index.

@MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes. In reasoning mode, Mistral Small 4 scores 27 on the Artificial Analysis Intelligence Index, a 12-point improvement from Small 3.2 (15), making it among the most intelligent models Mistral has released, surpassing Mistral Large 3 (23) and matching the proprietary Magistral Medium 1.2 (27). However, it lags open weights peers with similar total parameter counts such as gpt-oss-120B (high, 33), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), and Qwen3.5 122B A10B (Reasoning, 42).

Key takeaways:
➤ Reasoning and non-reasoning modes in a single model: Mistral Small 4 supports configurable hybrid reasoning with reasoning and non-reasoning modes, rather than the separate reasoning variants Mistral has previously released with its Magistral models. In reasoning mode, the model scores 27 on the Artificial Analysis Intelligence Index; in non-reasoning mode it scores 19, a 4-point improvement from its predecessor Mistral Small 3.2 (15).
➤ More token efficient than peers of similar size: At ~52M output tokens, Mistral Small 4 (Reasoning) uses fewer tokens to run the Artificial Analysis Intelligence Index than reasoning models such as gpt-oss-120B (high, ~78M), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, ~110M), and Qwen3.5 122B A10B (Reasoning, ~91M). In non-reasoning mode, the model uses ~4M output tokens.
➤ Native support for image input: Mistral Small 4 is a multimodal model, accepting image input as well as text. On our multimodal evaluation, MMMU-Pro, Mistral Small 4 (Reasoning) scores 57%, ahead of Mistral Large 3 (56%) but behind Qwen3.5 122B A10B (Reasoning, 75%). Neither gpt-oss-120B nor NVIDIA Nemotron 3 Super 120B A12B supports image input. All models support text output only.
➤ Improvement in real-world agentic tasks: Mistral Small 4 scores an Elo of 871 on GDPval-AA, our evaluation based on OpenAI's GDPval dataset that tests models on real-world tasks across 44 occupations and 9 major industries, with models producing deliverables such as documents, spreadsheets, and diagrams in an agentic loop. This is more than double the Elo of Small 3.2 (339) and close to Mistral Large 3 (880), but behind gpt-oss-120B (high, 962), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 1021), and Qwen3.5 122B A10B (Reasoning, 1130).
➤ Lower hallucination rate than peer models of similar size: Mistral Small 4 scores -30 on AA-Omniscience, our evaluation of knowledge reliability and hallucination, where scores range from -100 to 100 (higher is better) and a negative score indicates more incorrect than correct answers. This is ahead of gpt-oss-120B (high, -50), Qwen3.5 122B A10B (Reasoning, -40), and NVIDIA Nemotron 3 Super 120B A12B (Reasoning, -42).

Key model details:
➤ Context window: 256K tokens (up from 128K on Small 3.2)
➤ Pricing: $0.15/$0.60 per 1M input/output tokens
➤ Availability: Mistral first-party API only. At native FP8 precision, Mistral Small 4's 119B parameters require ~119GB to self-host the weights (more than the 80GB of HBM3 memory on a single NVIDIA H100)
➤ Modality: Image and text input with text output only
➤ Licensing: Apache 2.0 license

55 replies · 9 reposts · 837 likes · 52.1K views
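The self-hosting and pricing figures in the quoted post above lend themselves to a quick back-of-envelope check. A minimal Python sketch, assuming FP8 means one byte per parameter and ignoring KV-cache and activation overhead:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Approximate memory for model weights alone, in GB (1B params * 1 byte = 1 GB)."""
    return params_billions * bytes_per_param

def output_cost_usd(output_tokens: float, price_per_mtok: float) -> float:
    """API cost for a given number of output tokens at a per-1M-token price."""
    return output_tokens / 1e6 * price_per_mtok

# Mistral Small 4: 119B parameters at native FP8.
print(weight_memory_gb(119))                  # 119.0 GB -> more than one H100's 80GB of HBM3
# ~52M output tokens to run the Intelligence Index, at $0.60 per 1M output tokens.
print(round(output_cost_usd(52e6, 0.60), 2))  # 31.2
```

So running the full Intelligence Index in reasoning mode would cost roughly $31 in output tokens alone at the listed price.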
Raphi-2Code @R2Cdev_
And on ARC-AGI-1, GPT-5.4 High is a lot better and cheaper than Mini xHigh!
[image]
0 replies · 0 reposts · 0 likes · 17 views
Raphi-2Code reposted
Chris @chatgpt21
GPT-5.4 Mini/Nano on ARC-AGI-2
GPT-5.4 Mini:
- xHigh: 19%
- High: 13%
- Med: 4%
- Low: 1%
GPT-5.4 Mini is 3× cheaper per token, but used 3× more reasoning tokens and performed 3× worse than GPT-5.4 High.
[image]
12 replies · 7 reposts · 128 likes · 13K views
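The price/usage tradeoff in the post above can be made concrete with toy numbers: if a model is 3× cheaper per token but burns 3× as many reasoning tokens, the per-task cost is a wash. The prices and token counts below are illustrative assumptions, not real OpenAI figures:

```python
def task_cost(price_per_mtok: float, tokens_used: float) -> float:
    """Cost of one task: per-1M-token price times tokens consumed."""
    return price_per_mtok * tokens_used / 1e6

# Hypothetical numbers: "High" at $3/1M tokens using 1M tokens,
# "Mini" at a third of the price but three times the tokens.
big = task_cost(price_per_mtok=3.0, tokens_used=1_000_000)
mini = task_cost(price_per_mtok=1.0, tokens_used=3_000_000)
print(big == mini)  # True: the per-token discount is eaten by the extra reasoning tokens
```

Under these assumptions the cheaper model delivers worse scores at the same effective cost per task.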
Raphi-2Code reposted
Tech Dev Notes @techdevnotes
xAI has removed the grok-code-fast-1 model from the Available Models list in the console.
[image]
11 replies · 3 reposts · 100 likes · 7.2K views
Raphi-2Code reposted
Ivan Fioravanti ᯅ @ivanfioravanti
MLX benchmark results for Qwen3.5-35B-A3B-4bit: M5 Max vs M3 Ultra! The winner is the M5 Max! 🥇 Details for each run in 🧵
[image]
8 replies · 4 reposts · 63 likes · 3.8K views
Raphi-2Code reposted
Acer @AcerFur
I love that GPT-5.4 Pro can write me 200 pages of a PDF in two days… and it's only ~60% done, oof.
2 replies · 2 reposts · 139 likes · 10.7K views
Raphi-2Code reposted
Espen JD @Snixtp
I've seen very little discussion about how the new GPT-5.4 mini and nano compare to 5.3-Codex-Spark, so I did some research and created this. I focused only on official OpenAI sources, as there's a lot of misinformation out there. Hope you enjoy :)
Espen JD @Snixtp
x.com/i/article/2034…

0 replies · 1 repost · 3 likes · 119 views