Shen Zhuoran

1K posts

@CMS_Flash

Coding agents/self-improvement @xai. Ex-@GoogleAI Resident/@augmentcode. Alum @HKUniversity. 💎 Terran @StarCraft II. Views are personal.

San Jose, CA, United States · Joined February 2016
231 Following · 3.3K Followers
Shen Zhuoran reposted
Boyuan Zheng @boyuan__zheng
Excited to see people try Grok Build for web dev. Our team has put a lot of effort into improving its aesthetics and functionality, and more exciting features are to be expected with the recursive self-improvement loop. It’s still an early beta, and feedback is very welcome. Please try it out and let us know where we can improve.
Kilo @kilocode

Grok Build 0.1 might be one of the most underestimated AI models right now. We tested it in Kilo Code by asking it to build 5 websites from scratch. Here are the results:

246
116
1.5K
31.7M
Boyuan Zheng @boyuan__zheng
@CMS_Flash @xai Will miss you, bro🥲 Really enjoyed working with you on web dev and brainstorming recursive self-improvement. All the best!
1
0
1
111
Shen Zhuoran @CMS_Flash
I left @xai last week. It was quite a journey. I learned a lot technically and made some lifelong friends.
I was fortunate to have spent almost all of my xAI tenure on a single, long-term project: optimizing Grok's ability to one-shot complex, complete, and polished web apps. I am proud of this team. We contributed one of xAI's most successful alignment recipes, which has since been replicated across other teams. We scaled another innovative recipe to incredible scales, for which Grok 4.3 gave a tip-of-the-iceberg preview. I am excited to see its full impact revealed in future releases.
Looking forward, we are now at a critical point in the history of AI. Coding is on track to become a solved problem, and AI is on the brink of its first total disruption of a major industry. Models show the first signs of substantial participation in their own development. The singularity might be on the horizon. Today might be analogous to the eves of AlexNet and GPT-3, or perhaps more profound. I am excited for what is to come.
32
12
395
23.5K
JaySym @JaySym_Ai
Thanks, Cosmos. I no longer need to leave my MacBook open all day long.
1
0
1
264
Sebastian Andreas Nikolaus @HoneybadgerFan4
@CMS_Flash @xai Interested in onboarding the handyman to the AI Age? Also interested in a 2v2 - I can bring a decent 6 pool rush to the table
1
0
1
143
Shen Zhuoran reposted
Boyuan Zheng @boyuan__zheng
We put a lot of work into making it smarter and stronger at coding, especially for a small model. Excited for what's coming next with bigger models training at Colossus 2.
xAI @xai

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows, is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at x.ai/cli

8
6
140
6.7K
Shen Zhuoran @CMS_Flash
End-to-end eval is the way to scale to the future. For general coding, it will be whole-repo generation for production-scale libraries, tested by unit tests. For app development, it will be one-shotting production-grade apps, like Amazon, X, or Workday, tested by agentic grading.
John Yang @jyangballin

How much of SQLite, FFmpeg, or a PHP compiler can LMs code from scratch, given just an executable and no starter code or internet access? Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

1
1
10
3.2K
Design Arena @Designarena
MiMo-V2.5 by @XiaomiMiMo and @Xiaomi has been added to Design Arena! Built for complex agent and coding tasks, with strong visual reasoning, precise chart understanding, and deep multimodal capabilities.
Xiaomi MiMo @XiaomiMiMo

Xiaomi MiMo-V2.5 Series: Pushing Open-Source Agents Forward
🔸 MiMo-V2.5-Pro, our strongest model yet. A major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks, now matching frontier models like Claude Opus 4.6 and GPT-5.4 across most benchmarks (SWE-bench Pro 57.2, Claw-Eval 63.8, τ3-Bench 72.9). It can autonomously complete professional tasks involving 1,000+ tool calls, work that would take human experts days. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5…
🔸 MiMo-V2.5, native omnimodal with strong agentic capabilities. Pro-level agent performance at roughly half the cost. Improved multimodal perception across image and video understanding, native 1M-token context window, and significantly more efficient inference. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5
🔗 API & Token Plan: platform.xiaomimimo.com/token-plan

2
6
67
3.2K
Shen Zhuoran @CMS_Flash
Cool work of the team is finally out in the world. This is a very early preview, and much more is to come around:
- One-shotting complex web apps;
- Pure vibe coding;
- Self-improvement between browser use and web app development.
The last point is critical and unique to web dev, because there is an intelligence gap between using a web app and building a web app that we can exploit, in theory leading to indefinite scaling of self-improvement.
BijanBowen @Ominousind

Grok 4.3 Beta browser OS test results, GTA clone, and voice control apps

10
15
282
34.8K
Shen Zhuoran reposted
Nat McAleese @__nmca__
A full-scale US Waymo rollout would cost ~700 full-time jobs in the funeral care industry (by saving around 35 thousand young American lives per year). Will no one think of (some of) the morticians!
66
532
8.2K
239.2K
Shen Zhuoran @CMS_Flash
@skcd42 Software -> game engine -> video gen, sounds about right
0
0
1
94
skcd @skcd42
every well written software looks more and more like a game engine
45
48
1.7K
101.4K