Andrew Zhai

564 posts

@ZhaiAndrew

ML @ cursor. ex-founder @thealisa_com (acq.), ml @pinterest. @stanford @berkeley grad.

San Francisco, CA · Joined August 2015
253 Following · 949 Followers
Andrew Zhai @ZhaiAndrew
it's a very good model. and really cheap!
Artificial Analysis @ArtificialAnlys

Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn't clear for past releases.

@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14-point gain over Composer 2 (48). This puts it in third place among our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).

Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82.
➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code.
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m).
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens.

Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open-weights Kimi K2.5, as with Composer 2, with Cursor reporting ~85% of total compute coming from its own additional training and reinforcement learning.
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor).
➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI; no externally accessible API is available.

Congratulations @cursor_ai and @mntruell on the impressive release!
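The cost multiples in the post can be re-derived from the quoted per-task prices. A minimal Python sketch, using only the dollar figures stated above; the dictionary keys are hypothetical labels chosen for illustration, not identifiers from Artificial Analysis or Cursor.

```python
# Per-task costs in USD, as quoted in the Artificial Analysis post.
# Keys are illustrative labels only.
costs = {
    "opus_4_7_max": 4.10,           # Claude Opus 4.7 (max) in Claude Code
    "gpt_5_5_xhigh": 4.82,          # GPT-5.5 (xhigh reasoning) in Codex
    "composer_2_5_fast": 0.44,      # Composer 2.5 Fast
    "composer_2_5_standard": 0.07,  # Composer 2.5 standard
}


def multiple(expensive: str, cheap: str) -> float:
    """Return how many times more `expensive` costs per task than `cheap`."""
    return costs[expensive] / costs[cheap]


top_agents = ["opus_4_7_max", "gpt_5_5_xhigh"]
for variant in ["composer_2_5_fast", "composer_2_5_standard"]:
    ratios = [multiple(agent, variant) for agent in top_agents]
    print(f"top agents vs {variant}: {min(ratios):.1f}x to {max(ratios):.1f}x the cost")

# Fast vs standard: per task, and per million input/output tokens.
print(f"Fast/standard per task: {multiple('composer_2_5_fast', 'composer_2_5_standard'):.1f}x")
print(f"Fast/standard per token: {3.00 / 0.50:.1f}x input, {15.00 / 2.50:.1f}x output")
```

Run as-is, this gives roughly 9.3x to 11.0x against Fast and 58.6x to 68.9x against standard, which the post rounds to "~10x" and "~60x". Fast costs about 6.3x standard per task, while its per-token prices are exactly 6x in both directions.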

Andrew Zhai reposted
Cursor @cursor_ai
We’re introducing the Cursor SDK so you can build agents with the same runtime, harness, and models that power Cursor. Run agents from CI/CD pipelines, create automations for end-to-end workflows, or embed agents directly inside your products.
Andrew Zhai reposted
TBPN @tbpn
BREAKING: SpaceX has secured the right to acquire Cursor for $60B later this year
Andrew Zhai reposted
Michael Truell @mntruell
Excited to partner with the SpaceX team to scale up Composer. A meaningful step on our path to build the best place to code with AI.
SpaceX @SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

Andrew Zhai reposted
World Labs @theworldlabs
We're excited to be rolling out two model updates today! Marble 1.1: Improves lighting and contrast, with a major reduction in visual artifacts. Marble 1.1-Plus: Our new model built for scale. Create larger, more complex environments than ever before.
Bin Liu @liu8in
seedance 2.0 is officially on HeyGen and consistent characters is solved — together with your HeyGen avatar: any scene, any motion. just mindblown nuf said - I made this video fully with SD2 & HeyGen:
Andrew Zhai @ZhaiAndrew
Been dogfooding the new agentic-first cursor (cursor.com/glass) on a personal side project for the last 4 days. It's incredibly powerful: by myself, in a few days, I'm doing what would take a company weeks.

Some tips:
1. Use composer 2 for as much as possible. 200+ TPS speed with frontier-level intelligence is just 🧑‍🍳😗. Opus is still good for making more readable plans.
2. Ask composer to generate (image) variations of your UX. cursor image generation for product UX exploration is underrated.
3. Ask composer to test your product with computer use in the browser. It works surprisingly well and gives the agent full context on the bugs you're seeing.

Cursor glass is fun! Agentic-first, with all the cursor controls you need to be precise.

(disclaimer: I work at cursor, but this is just my personal opinion)
Andrew Zhai reposted
Cursor @cursor_ai
Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.
Andrew Zhai reposted
Leandro von Werra @lvwerra
Auto-research for ML training models is all the rage now, but underrated is: auto-research for data! Sure, you can squeeze out a bit of model performance by optimizing hyperparameters, but code agents can effortlessly do data work that has been very labour intensive and required attention to a lot of details:
> download data from many different data sources
> bring all the data sources into a uniform format
> do detailed EDA: find patterns and outliers
> look at 100s of samples and take detailed notes
> make beautiful infographics rather than mpl plots
> iterate on data filtering by looking at more samples
> make simple pipelines robust and scalable

It's now possible to write data pipelines for dozens of data sources in hours, work that would have taken weeks of reading docs, debugging APIs and data formats, and wrangling outliers and missing data.

A few weeks ago we gave Claude access to the CPU partition of our cluster, and it iteratively refined filters to retrieve a domain subset of FineWeb. This would have taken me 2-3 days to work through, while it took Claude just a few hours with almost no babysitting, and it kept a nice logbook.

Thus the long tail of small, niche data sources becomes more accessible and can be aggregated into even larger high-quality datasets for cool applications. Data has been fuelling LLM progress more than model architecture innovations, so I am very excited about this!
Joshua Xu @joshua_xu_
HeyGen made Fast Company's Most Innovative Companies list for 2026. We built it for introverts. For people who hate cameras. For people who had something to say but no easy way to say it. 31 million people signed up so far. Turns out there were a lot of us.
Andrew Zhai reposted
Mike @grabbou
We evaluated Composer 2 in our React Native evals, and I'll say this: the @cursor_ai team is cooking 🧑‍🍳
Andrew Zhai reposted
Cursor @cursor_ai
Cursor can now search millions of files and find results in milliseconds. This dramatically speeds up how fast agents complete tasks. We're sharing how we built Instant Grep, including the algorithms and tradeoffs behind the design.