
Grok Build 0.1 might be one of the most underestimated AI models right now. We tested it in Kilo Code by asking it to build 5 websites from scratch. Here are the results:
Shen Zhuoran

@CMS_Flash
Coding agents/self-improvement @xai. Ex-@GoogleAI Resident/@augmentcode. Alum @HKUniversity. 💎 Terran @StarCraft II. Views are personal.

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows, is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at x.ai/cli

What are some deep learning architecture modifications you don’t consider hacks, @_arohan_?

How much of SQLite, FFmpeg, or a PHP compiler can LMs code from scratch, given just an executable and no starter code or internet access? Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵


Xiaomi MiMo-V2.5 Series: Pushing Open-Source Agents Forward

🔸 MiMo-V2.5-Pro, our strongest model yet. A major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks, now matching frontier models like Claude Opus 4.6 and GPT-5.4 across most benchmarks (SWE-bench Pro 57.2, Claw-Eval 63.8, τ3-Bench 72.9). It can autonomously complete professional tasks involving 1,000+ tool calls, work that would take human experts days. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5…

🔸 MiMo-V2.5, native omnimodal with strong agentic capabilities. Pro-level agent performance at roughly half the cost. Improved multimodal perception across image and video understanding, native 1M-token context window, and significantly more efficient inference. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5

🔗 API & Token Plan: platform.xiaomimimo.com/token-plan

2018: waiting for my compiler to complete
2026: waiting for my agent to complete

Grok 4.3 Beta test results: browser OS, GTA clone, and voice control apps

Can the average AI model make more money than the average human on prediction markets? Right now, no.

3 months ago, we gave SOTA models $50k to trade real prediction markets. Prediction Arena is now the world's first benchmark that executes real trades on @Kalshi and @Polymarket. And it's definitely unsaturated.

The experiment has been live for 3 months. Our observations from the first 57 days are now out on arXiv: arxiv.org/abs/2604.07355