Shen Zhuoran

1K posts

@CMS_Flash

Coding agents/self-improvement @xai. Ex-@GoogleAI Resident/@augmentcode. Alum @HKUniversity. 💎 Terran @StarCraft II. Views are personal.

San Jose, CA, United States · Joined February 2016

231 Following · 3.3K Followers

Shen Zhuoran reposted

Boyuan Zheng@boyuan__zheng·4d

Excited to see people try Grok Build for web dev. Our team has put a lot of effort into improving its aesthetics and functionality, and more exciting features are expected with the recursive self-improvement loop. It’s still early beta, and feedback is very welcome. Please try it out and let us know where we can improve.

Kilo@kilocode

Grok Build 0.1 might be one of the most underestimated AI models right now. We tested it in Kilo Code by asking it to build 5 websites from scratch. Here are the results:

Shen Zhuoran@CMS_Flash·19 May

@boyuan__zheng @xai Finish our push for the best web dev model in the world 🔥🔥🔥!

Boyuan Zheng@boyuan__zheng·19 May

@CMS_Flash @xai Will miss you, bro🥲 Really enjoy working with you on web dev and brainstorming recursive self-improvement. All the best!

Shen Zhuoran@CMS_Flash·16 May

I left @xai last week. It was quite a journey. I learned a lot technically and made some lifelong friends.
I was fortunate to have spent almost all of my xAI tenure on a single, long-term project: optimizing Grok's ability to one-shot complex, complete, and polished web apps. I am proud of this team. We contributed one of xAI's most successful alignment recipes, which has been replicated by other teams. We scaled another innovative recipe to incredible scales, for which Grok 4.3 gave a tip-of-the-iceberg preview. I am excited to see its full impact revealed in future releases.
Looking forward, we are now at a critical point in the history of AI. Coding is on track to become a solved problem and AI is on the brink of its first total disruption of a major industry. Models show the first signs of substantial participation in their own development. The singularity might be on the horizon. Today might be analogous to the eves of AlexNet and GPT-3, or perhaps more profound. I am excited for what is to come.

Shen Zhuoran@CMS_Flash·19 May

@JaySym_Ai How to get access to Cosmos?

JaySym@JaySym_Ai·18 May

Thanks, Cosmos. I no longer need to leave my macbook open all day long

Shen Zhuoran@CMS_Flash·17 May

@zhmeishi @xai It was a memorable journey pushing on web dev together with you!

Zhenmei Shi@zhmeishi·16 May

@CMS_Flash @xai Wish u all the best, Zhuoran!

Shen Zhuoran@CMS_Flash·17 May

@OpsoFacto @xai No.

Opso Facto@OpsoFacto·17 May

@CMS_Flash @xai Are you at spacexai

Shen Zhuoran@CMS_Flash·17 May

@HoneybadgerFan4 @xai Sure, lemme know when you want to play!

Sebastian Andreas Nikolaus@HoneybadgerFan4·16 May

@CMS_Flash @xai Interested in onboarding the handyman to the AI Age? Also interested in a 2v2 - I can bring a decent 6 pool rush to the table

Shen Zhuoran reposted

Boyuan Zheng@boyuan__zheng·15 May

We put a lot of work into making it smarter and stronger at coding, especially for a small model. Excited for what's coming next with bigger models training at Colossus 2.

xAI@xai

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at x.ai/cli

Shen Zhuoran@CMS_Flash·15 May

@ar0cket1 Why?

ar0cket1@ar0cket1·15 May

@CMS_Flash imo residuals are a skill issue

Shen Zhuoran@CMS_Flash·15 May

1. Residual connections
2. Attention mechanism
3. Linear attention

Core Automation@CoreAutoAI

What are deep learning architecture modifications you don’t consider hacks @_arohan_

Shen Zhuoran@CMS_Flash·7 May

End-to-end eval is the way to scale to the future. For general coding, it will be whole-repo generation for production-scale libraries, tested by unit tests. For app development, it will be one-shotting production-grade apps, like Amazon, X, or Workday, tested by agentic grading.

John Yang@jyangballin

How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

Shen Zhuoran@CMS_Flash·30 Apr

@Designarena @XiaomiMiMo @Xiaomi No scores yet?

Design Arena@Designarena·30 Apr

MiMo-V2.5 by @XiaomiMiMo and @Xiaomi has been added to Design Arena! Built for complex agent and coding tasks, with strong visual reasoning, precise chart understanding, and deep multimodal capabilities.

Xiaomi MiMo@XiaomiMiMo

Xiaomi MiMo-V2.5 Series: Pushing Open-Source Agents Forward
🔸 MiMo-V2.5-Pro, our strongest model yet. A major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks, now matching frontier models like Claude Opus 4.6 and GPT-5.4 across most benchmarks (SWE-bench Pro 57.2, Claw-Eval 63.8, τ3-Bench 72.9). It can autonomously complete professional tasks involving 1,000+ tool calls, work that would take human experts days. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5…
🔸 MiMo-V2.5, native omnimodal with strong agentic capabilities. Pro-level agent performance at roughly half the cost. Improved multimodal perception across image and video understanding, native 1M-token context window, and significantly more efficient inference. Tech Blog: mimo.xiaomi.com/blog/mimo-v2.5
🔗 API & Token Plan: platform.xiaomimimo.com/token-plan

Shen Zhuoran@CMS_Flash·24 Apr

Agents are pretty much just English compilers.

skcd@skcd42

2018: waiting for my compiler to complete
2026: waiting for my agent to complete

Shen Zhuoran reposted

Boyuan Zheng@boyuan__zheng·18 Apr

Incredible to see what we've been cooking this past month — and this is only the beginning. Stronger models and richer capabilities are on the way. What excites me most: what happens when the web dev model hits a singularity and unlocks Computer Use Agent self-play? Proud to be part of the team building this🚀

BijanBowen@Ominousind

Grok 4.3 Beta browser os test result, gta clone and voice control apps

Shen Zhuoran reposted

Beibin Li@beibin79·18 Apr

Good job team!~ @ahmnav @CMS_Flash @arnogau @boyuan__zheng @ebecker_xai @1kevin_h @michaelbzhu @zhmeishi

BijanBowen@Ominousind

Grok 4.3 Beta browser os test result, gta clone and voice control apps

Shen Zhuoran@CMS_Flash·19 Apr

Cool work of the team is finally out in the world. This is a very early preview, and much more is to come around:
- One-shotting complex web apps;
- Pure vibe coding;
- Self-improvement between browser use and web app development.
The last point is critical and unique to web dev, because there is an intelligence gap between using a web app and building a web app that we can exploit, in theory leading to indefinite scaling of self-improvement.

BijanBowen@Ominousind

Grok 4.3 Beta browser os test result, gta clone and voice control apps

Shen Zhuoran@CMS_Flash·17 Apr

Coming soon!

Elon Musk@elonmusk

@HighlyUnspoken @cb_doge @Similarweb @grok Beta release of Grok Build app & terminal next week

Shen Zhuoran reposted

Nat McAleese@__nmca__·13 Apr

A full-scale US Waymo rollout would cost ~700 full-time jobs in the funeral care industry (by saving around 35 thousand young American lives per year). Will no one think of (some of) the morticians!

Shen Zhuoran@CMS_Flash·14 Apr

@skcd42 Software -> game engine -> video gen, sounds about right
