BlackAndW


Grok Build 0.1 ranks #15 and Grok 4.3 (High) ranks #17 on the new Agent Arena leaderboard. Grok Build 0.1 improves meaningfully over Grok 4.3 on bash capability. It is slightly less steerable and more prone to tool hallucinations, but it appears to complete tasks successfully more often overall.
Agent Arena ranks models on real-world agentic tasks using a causal tracing methodology. A model's net improvement indicates how much better or worse it performs than the average model.
The thread breaks down how each @xAI model scored across 5 signals, drawn from real tasks submitted by a global community of users.


my AI subscriptions right now
- Codex $200 (oss program)
- Claude $100
- Grok Build $300
- Cursor $20
OrcDev @orcdev
how many AI subscriptions do you have?

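@pans65437 Nice to meet you. I'm a Chinese person who loves China. I look forward to getting to know you. I've been studying Chinese for more than twenty years. ❤️ Love from China.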

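@iroha_ni_AI Hello, teacher. I'd like to ask how you keep characters consistent from one generation to the next. Every image I generate looks different from the previous ones. Even when I upload a reference image and add constraints in the prompt, you can still tell at a glance that they're two different people.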