Andrew Zhai

564 posts

@ZhaiAndrew

ML @ cursor. ex- founder @thealisa_com (acq.), ml @pinterest. @stanford @berkeley grad.

San Francisco, CA · Joined August 2015

253 Following · 949 Followers

Andrew Zhai@ZhaiAndrew·3d

it's a very good model. and really cheap!

Artificial Analysis@ArtificialAnlys

Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases.

@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).

Key results for Composer 2.5 in Cursor CLI:

➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82

➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code

➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)

➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens

Model details:

➤ Base model: Continued training on @Kimi_Moonshot's open weights Kimi K2.5 as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning

➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)

➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI, an externally accessible API is not available

Congratulations @cursor_ai and @mntruell on the impressive release!
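For readers who want to check the "~10x", "~60x" and "6x" figures in the post above, here is a minimal Python sketch that recomputes them from the quoted prices. The dictionary keys and the function names `cost_ratio` and `token_price_ratio` are made up for this illustration; only the dollar amounts come from the post.

```python
# Per-task costs in USD, as quoted in the Artificial Analysis post.
COST_PER_TASK = {
    "Claude Opus 4.7 (max)": 4.10,
    "GPT-5.5 (xhigh)": 4.82,
    "Composer 2.5 Fast": 0.44,
    "Composer 2.5 standard": 0.07,
}

# Token prices in USD per million (input, output) tokens, as quoted in the post.
TOKEN_PRICE = {
    "Composer 2.5 Fast": (3.00, 15.00),
    "Composer 2.5 standard": (0.50, 2.50),
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one agent costs per task than another."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


def token_price_ratio(a: str, b: str) -> tuple[float, float]:
    """Ratio of (input, output) token prices between two variants."""
    (a_in, a_out), (b_in, b_out) = TOKEN_PRICE[a], TOKEN_PRICE[b]
    return a_in / b_in, a_out / b_out


if __name__ == "__main__":
    for big in ("Claude Opus 4.7 (max)", "GPT-5.5 (xhigh)"):
        for small in ("Composer 2.5 Fast", "Composer 2.5 standard"):
            print(f"{big} vs {small}: {cost_ratio(big, small):.1f}x")
    print("Fast vs standard, per task:",
          f"{cost_ratio('Composer 2.5 Fast', 'Composer 2.5 standard'):.1f}x")
    print("Fast vs standard, token prices (in, out):",
          token_price_ratio("Composer 2.5 Fast", "Composer 2.5 standard"))
```

With the quoted numbers, the higher-effort agents come out at roughly 9x to 11x the per-task cost of Composer 2.5 Fast and roughly 59x to 69x the cost of the standard variant, which matches the post's rounded "~10-60x" range. The Fast-to-standard ratio is exactly 6x on token prices and a little above 6x on cost per task.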

Andrew Zhai reposted

Michael Truell@mntruell·4d

Composer 2.5 is now the most-chosen model in Cursor. We're giving everyone 10x usage for the rest of the day. Enjoy!

Cursor@cursor_ai

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.

Andrew Zhai@ZhaiAndrew·5d

this was a fun one! a lot of research went into it, and internally we think the model is quite good. composer 2.5 is my daily driver now - please give it a try!

Cursor@cursor_ai

Andrew Zhai reposted

Cursor@cursor_ai·29 Apr

We’re introducing the Cursor SDK so you can build agents with the same runtime, harness, and models that power Cursor. Run agents from CI/CD pipelines, create automations for end-to-end workflows, or embed agents directly inside your products.

Andrew Zhai reposted

TBPN@tbpn·22 Apr

BREAKING: SpaceX has secured the right to acquire Cursor for $60B later this year

Andrew Zhai reposted

Michael Truell@mntruell·22 Apr

Excited to partner with the SpaceX team to scale up Composer. A meaningful step on our path to build the best place to code with AI.

SpaceX@SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

Andrew Zhai reposted

World Labs@theworldlabs·7 Apr

We're excited to be rolling out two model updates today! Marble 1.1: Improves lighting and contrast, with a major reduction in visual artifacts. Marble 1.1-Plus: Our new model built for scale. Create larger, more complex environments than ever before.

Andrew Zhai reposted

Cursor@cursor_ai·3 Apr

We're doubling Composer 2 usage through the end of this weekend. We recommend trying it out in our new interface, available in Cursor 3. Enjoy!

Cursor@cursor_ai

We’re introducing Cursor 3. It is simpler, more powerful, and built for a world where all code is written by agents, while keeping the depth of a development environment.

Andrew Zhai@ZhaiAndrew·2 Apr

Cursor 3.0!

Cursor@cursor_ai

We’re introducing Cursor 3. It is simpler, more powerful, and built for a world where all code is written by agents, while keeping the depth of a development environment.

Andrew Zhai@ZhaiAndrew·2 Apr

@liu8in really cool!

Bin Liu@liu8in·2 Apr

seedance 2.0 is officially on HeyGen and consistent characters is solved — together with your HeyGen avatar: any scene, any motion. just mindblown nuf said - I made this video fully with SD2 & HeyGen:

Andrew Zhai@ZhaiAndrew·29 Mar

Been dogfooding new agentic-first cursor cursor.com/glass on a personal side project for the last 4 days. It's incredibly powerful, doing what took weeks for a company to do in a few days by myself. Some tips:

1. Use composer 2 for as much as possible. 200+ TPS speed with frontier-level intelligence is just 🧑‍🍳😗. Opus still good for making more readable plans
2. Ask composer to generate (image) variations of your UX. cursor image generation for product UX exploration is underrated
3. Ask composer to test your product with computer use in the browser. It works surprisingly well and enables the agent to have the full context on bugs that you're seeing

Cursor glass is fun! Agentic first with all the cursor controls you need to be precise.

(disclaimer: I work at cursor but this is just my personal opinion)

Andrew Zhai reposted

Cursor@cursor_ai·26 Mar

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.

Andrew Zhai reposted

Leandro von Werra@lvwerra·24 Mar

Auto-research for ML training models is all the rage now, but underrated is: auto-research for data! Sure, you can squeeze out a bit of model performance by optimizing hyperparameters, but code agents can effortlessly do data work that has been very labour intensive and required a lot of attention to a lot of details:

> download data from many different data sources
> bring all the data sources into uniform format
> do detailed EDA: find patterns and outliers
> look at 100s of samples and take detailed notes
> make beautiful infographics rather than mpl plots
> iterate on data filtering by looking at more samples
> make simple pipelines robust and scalable

It's now possible to write data pipelines for dozens of data sources in hours that would have taken weeks of reading many docs, debugging APIs and data formats, wrangling outliers and missing data.

A few weeks ago we gave Claude access to the CPU partition of our cluster and it iteratively refined filters to retrieve a domain subset of FineWeb. This would have taken me 2-3 days to work through while it took Claude just a few hours with almost no babysitting and with a nice logbook.

Thus the long tail of small, niche data sources becomes more accessible and can be aggregated to even larger high quality datasets for cool applications. Data has been fuelling LLM progress more than model architecture innovations, so I am very excited about this!

Andrew Zhai@ZhaiAndrew·26 Mar

@joshua_xu_ congrats @joshua_xu_ !

Joshua Xu@joshua_xu_·25 Mar

HeyGen made Fast Company's Most Innovative Companies list for 2026. We built it for introverts. For people who hate cameras. For people who had something to say but no easy way to say it. 31 million people signed up so far. Turns out there were a lot of us.

Andrew Zhai@ZhaiAndrew·25 Mar

excited to share how we trained composer 2! we poured a ton of research into getting the model to frontier-level coding, and this report includes many details we hope will be valuable to the community!

Cursor@cursor_ai

We're releasing a technical report describing how Composer 2 was trained.

Andrew Zhai reposted

Mike@grabbou·24 Mar

We evaluated Composer 2 in our React Native evals, and I'll say this: the @cursor_ai team is cooking 🧑‍🍳

Andrew Zhai reposted

Cursor@cursor_ai·23 Mar

Cursor can now search millions of files and find results in milliseconds. This dramatically speeds up how fast agents complete tasks. We're sharing how we built Instant Grep, including the algorithms and tradeoffs behind the design.
