Henrick

357 posts

Henrick

@reyhenrick

Founder

São Paulo, Brasil · Joined May 2018
289 Following · 116 Followers
Alexander Yue @Alezander907
For browser tasks, Qwen 3.7 max is a +15% improvement over Qwen 3.6 plus, now matching gemini-3-flash and mimo 2.5 pro. However, without caching it is more expensive than Opus 4.7! Even with caching set up, I would still recommend GLM 5.1 or Gemini 3.1 pro at this tier.
English
7
7
105
8.3K
Logan Kilpatrick @OfficialLoganK
Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.
English
288
99
1.7K
471K
Henrick @reyhenrick
@ChujieZheng When will 3.7 Plus or 3.7 Max get image/video support?? :'(
English
0
0
1
57
Henrick @reyhenrick
@ChujieZheng When will it be on the API? And does the Plus version have image input support??
English
0
0
0
31
Chujie Zheng @ChujieZheng
For Qwen3.7-Max, we have invested far more compute into RL training than ever before. Its top-tier AA score confirms the resulting general and agentic capabilities. This is just the start. We will firmly push forward RL scaling to build more powerful Qwen models. Stay tuned!
Artificial Analysis @ArtificialAnlys

Alibaba’s new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3.6 Max Preview (51.8). While Alibaba still trails models from OpenAI, Anthropic and Google, Qwen3.7 Max is the closest they have been to the frontier.

Qwen3.7 Max is @Alibaba_Qwen's latest proprietary flagship, scoring 56.6 on the Intelligence Index, a 4.8 point gain over Qwen3.6 Max Preview (51.8) released in April. Qwen3.7 Max continues Alibaba's pattern, in place since Qwen2.5 Max (January 2025), of releasing Max and Plus models as closed weights while the rest of the Qwen line remains open weights. The leading open weights Qwen on the Intelligence Index is Qwen3.6 27B (Reasoning, 45.8) released in April 2026, and the leading open weights MoE Qwen is Qwen3.5 397B A17B (Reasoning, 45.0) released in February 2026.

Key takeaways for the reasoning variant:
➤ The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding. CritPt +9.7 p.p (3.7% to 13.4%), HLE +9.2 p.p (28.9% to 38.1%), TerminalBench Hard +6.9 p.p (43.9% to 50.8%) and GDPval-AA +42 Elo (1504 to 1546). Scores on other benchmarks in the Intelligence Index are flat compared to Qwen3.6 Max Preview.
➤ A significant share of the Intelligence Index gain is driven by higher abstention on AA-Omniscience, not higher accuracy. Qwen3.7 Max's accuracy on AA-Omniscience dropped 7.6 p.p (37.7% to 30.1%), while its hallucination rate dropped 21.3 p.p (44.2% to 22.9%). The model is choosing not to answer more questions rather than recalling more facts. Because hallucination rate and accuracy both feed into the Intelligence Index, the hallucination reduction is one of the larger single contributors to the +4.8 point gain on the Intelligence Index.
➤ Qwen3.7 Max used 96.7M output tokens to run the Intelligence Index, ~31% more than Qwen3.6 Max Preview (73.9M). It sits mid-pack on frontier token usage: above GPT-5.5 (high, 44.5M) and Gemini 3.1 Pro Preview (57.3M), below Claude Opus 4.7 (Adaptive Reasoning, Max Effort, 112M), Kimi K2.6 (166M) and DeepSeek V4 Pro (Reasoning, Max Effort, 187M).

Key model details:
➤ Context window: 1M tokens (up from 256K on Qwen3.6 Max Preview)
➤ Multimodality: Text input and output only
➤ Pricing: Yet to be announced (Qwen3.6 Max Preview is priced at $1.30/$7.80 per 1M input/output tokens on the @alibaba_cloud first-party API)
➤ Licensing: Proprietary, closed weights
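
The deltas quoted in the Artificial Analysis post can be re-derived from the before/after figures it gives. A minimal sketch, assuming only the numbers in the post; the names `SCORES`, `deltas` and `relative_increase` are illustrative and not part of any Artificial Analysis tooling:

```python
# Figures as quoted in the Artificial Analysis post (reasoning variant).
# Each entry maps a metric to (Qwen3.6 Max Preview, Qwen3.7 Max).
SCORES = {
    "Intelligence Index": (51.8, 56.6),
    "CritPt (%)": (3.7, 13.4),
    "HLE (%)": (28.9, 38.1),
    "TerminalBench Hard (%)": (43.9, 50.8),
    "GDPval-AA (Elo)": (1504, 1546),
    "AA-Omniscience accuracy (%)": (37.7, 30.1),
    "AA-Omniscience hallucination rate (%)": (44.2, 22.9),
}


def deltas(scores):
    """Absolute change (new minus old) per metric, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


def relative_increase(old, new):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100


DELTAS = deltas(SCORES)
# Output tokens used to run the index: 73.9M before, 96.7M now.
TOKEN_INCREASE_PCT = relative_increase(73.9, 96.7)

for name, change in DELTAS.items():
    print(f"{name}: {change:+}")
print(f"Output tokens: {TOKEN_INCREASE_PCT:+.1f}%")
```

Every recomputed difference agrees with the post, and the token increase comes out at roughly 30.9%, consistent with the quoted ~31%.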

English
76
45
996
74.9K
Tibo @thsottiaux
@ajambrosino Is garbage on the menu tomorrow?
English
30
3
315
89.6K
Andrew Ambrosino @ajambrosino
it's important, i think, to resist the temptation to ship garbage
English
75
43
986
235.8K
Artificial Analysis @ArtificialAnlys
Cohere launches open weights model Command A+ that achieves 37 on the Artificial Analysis Intelligence Index

The release of Command A+ places @Cohere in line with Claude 4.5 Haiku on the Intelligence Index, and just above NVIDIA Nemotron 3 Super and Gemini 3.1 Flash-Lite.

Key Takeaways:
➤ Command A+ ranks first on AA-Omniscience Non-Hallucination at 86%, ~3 percentage points ahead of the next-best model. Its AA-Omniscience Accuracy is 9%, so the headline AA-Omniscience score lands at -4, demonstrating a similar archetype to Claude 4.5 Haiku, where the model knows its limits
➤ On Cohere’s API, Command A+ (~281 output tokens per second) is faster than several comparable open-weights and small to mid-sized proprietary models (e.g., GPT-5.4 nano, Claude 4.5 Haiku, and Grok 4.3), but still slower than Gemini 3.1 Flash-Lite Preview, which outputs 304 tokens per second
➤ Command A+ trails its peer set on scientific reasoning (HLE ~11%, GPQA Diamond ~76%) and on coding (Terminal-Bench Hard ~25%, SciCode ~38%), consistent with gaps on the hardest science and agentic coding benchmarks
➤ It supports visual reasoning and scores 63% on MMMU-Pro (between Claude 4.5 Haiku at 59% and GPT-5.4 nano (xhigh) at 65%)
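
The post does not spell out how 9% accuracy and 86% non-hallucination combine into a headline score of -4. A rough sketch of one reading that reproduces it, under two loudly stated assumptions that are not confirmed by the post: the hallucination rate is the share of non-correct responses that were answered incorrectly rather than abstained on, and the headline score is correct percentage minus incorrect percentage. The function name `omniscience_index` is hypothetical:

```python
def omniscience_index(accuracy_pct, hallucination_rate_pct):
    """Headline score under the assumptions stated in the lead-in."""
    # ASSUMPTION: hallucination rate = incorrect / (all non-correct responses).
    non_correct_pct = 100.0 - accuracy_pct
    incorrect_pct = non_correct_pct * hallucination_rate_pct / 100.0
    # ASSUMPTION: headline score = correct % minus incorrect %.
    return accuracy_pct - incorrect_pct


# Command A+ figures from the post: 9% accuracy, 86% non-hallucination,
# i.e. a 14% hallucination rate.
COMMAND_A_PLUS = omniscience_index(9.0, 100.0 - 86.0)
print(round(COMMAND_A_PLUS, 1))
```

Under these assumptions the result is about -3.7, which rounds to the -4 quoted in the post; a model that abstains often can therefore score near zero despite low accuracy.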
English
13
25
257
33.9K
David Hendrickson @TeksEdge
🚨 Qwen3.7-Max benchmarks have finally appeared, and they signal a "code red" for @AnthropicAI. 📊 Qwen3.7-Max beats Opus-4.6 Max in 57% (24 out of 42) of comparable benchmarks, with large leads in several key areas:
IMOAnswerBench: 90.0 vs 75.3
Apex: 44.5 vs 34.5
IFBench: 79.1 vs 62.5
MRCR-v2 128k: 90.4 vs 84.0
PolyMATH: 86.5 vs 80.2
It scores particularly well in STEM reasoning, instruction following, long-context understanding, and multilingual tasks. Qwen models are very good and, it seems, are no longer just competitive: they’re pulling ahead in multiple domains. 🔥
English
22
25
338
25.4K
BRUNO @itarema2
@eixopolitico Curiously, those 20% will be the biggest millionaires/billionaires in the class
Português
2
2
113
7.4K
Eixo Político @eixopolitico
🇺🇸 Harvard approves a cap on top grades at the institution: professors can now give an "A" to at most 20% of the students in each class. The measure aims to curb a supposed "grade inflation" at the university, where 2 out of every 3 grades were an "A" in 2024, compared with just 35% in 2012.
Português
49
19
1.5K
510.9K
Mikeysee @mikeysee
Suuuuper weird results with the new @GeminiApp 3.5 Flash model and the @convex evals. For the first time across 100+ models, I have seen the model consistently do WORSE when given the guidelines?! I have no idea what's going on here?!
English
15
11
262
34.4K
China AInews @LunchangG7603
Qwen3.7-Max IS OUT 🚀 In 35h autonomous core tests with 1000+ tool calls, it maintained coherent reasoning and achieved a 10× speedup. It can:
Build frontend prototypes + complex multi-file projects
Automate workflows with MCP + multi-agent collaboration
Support mainstream frameworks
English
20
70
774
54.6K
thi @thiagochareti
This film is the perfect mix of COSMIC HORROR and FOLK HORROR. By the way, as far as I'm concerned they could adapt every Adam Nevill book; he has this bizarre-creatures vibe that I love so much
Português
76
625
10.1K
660.4K
Matheus @matheuscaseca
Tatá Werneck impersonating Sandra Annenberg while Edu can't hold it together and makes her break character hahahahaha #EduETatá
Português
195
1.3K
37.5K
2.6M