Random Libertarian Tech Lead
26.2K posts

@someRandomDev5
Reason over feelings, whenever the two come into conflict.
Pronouns: Who/cares
Joined February 2018
67 Following 422 Followers

@someRandomDev5 @fynnso yeah good point, what form of fine-tuning do you think they ended up using?

@wesbos ... And there are some leaks which seem to suggest that Cursor's Composer model is no different; it seems that it may just be a fine-tuned version of Kimi K2.5:
Fynn @fynnso
was messing with the OpenAI base URL in Cursor and caught this
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so composer 2 is just Kimi K2.5 with RL
at least rename the model ID

@wesbos Opus and GPT 5.4 are both highly intelligent.
But Opus has the edge in understanding people, and GPT 5.4 has the edge in understanding systems.
Nearly every open-weights model is basically just a distilled fine-tuned Frankenstein of GPT, Claude, and Gemini.

Composer 2 vs Opus 4.6 vs GPT 5.4 - a totally unscientific test
> Create a Twitter clone. Use Better Auth, Vite, Sqlite, Drizzle, Typescript, and React with Tanstack Start.
Each one took ~5 mins in plan mode. Each had access to a browser to test.
Composer: 5 mins $6.04 1,250 LOC
Opus: 19 mins $10.43 1,000 LOC
GPT: 22 mins $14.15 2,000 LOC
Opus seems to use the cache WAY more than Composer, so it's not really 10x more expensive.
The Composer app ran first try. The other two needed a bit of CORS debugging but did work.
The code across all three was extremely similar.
All three were done inside Cursor, so Claude Code / Codex may have produced different results.
Models used:
Composer 2.0 (regular, not fast)
Opus 4.6 Medium Thinking
GPT 5.4 Medium Thinking
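
For anyone who wants to check the "not really 10x" remark against the numbers in this test, here is a rough sketch of the arithmetic. The figures are copied from the post above; the function names and the cost-per-1,000-LOC metric are just illustrative choices, and a single unscientific run like this says nothing about pricing in general.

```python
# Figures copied from the post: (minutes, dollars, lines of code).
runs = {
    "Composer": (5, 6.04, 1250),
    "Opus": (19, 10.43, 1000),
    "GPT": (22, 14.15, 2000),
}


def cost_ratio(name, baseline="Composer"):
    """How many times more this run cost than the baseline run."""
    return runs[name][1] / runs[baseline][1]


def dollars_per_kloc(name):
    """Dollars spent per 1,000 lines of generated code (a crude metric)."""
    _, dollars, loc = runs[name]
    return dollars / (loc / 1000)


for name in runs:
    print(
        f"{name}: {cost_ratio(name):.2f}x Composer's cost, "
        f"${dollars_per_kloc(name):.2f} per 1,000 LOC"
    )
```

In this one run, Opus comes out at roughly 1.7x the cost of Composer and GPT at roughly 2.3x, which is well short of a 10x gap.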


Wes Bos @wesbos
Cursor just launched Composer 2 - their own model. It's 10× cheaper than Opus 4.6 and supposed to rival it. I've been using it for a few days. I don't have any skewed graphs to show you, but from a pure vibes POV I can tell you it's pretty good™. My litmus test right now is whether it can build a 3D-printable model with Manifold CAD. I built a GIF zoetrope generator and it did fantastic.

@wesbos If you want an interesting test that actually demonstrates brownfield development (without requiring too much set up on your own part), ask the model to expand or modify the behavior of a large complicated open-source project that you actively use.

@wesbos That said, this greenfield test does do a fairly good job of demonstrating how well these models can replace design agency work. If all you're building are business-card-like or brochure-like landing pages, then sure, this test is fine.

@fynnso In this scenario is it actually fine-tuning or is it a LoRA adapter?
I’m a little new to these concepts but very curious.
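
For readers who are also new to the distinction: full fine-tuning updates every entry of a weight matrix, while a LoRA (Low-Rank Adaptation) adapter freezes the original matrix W and trains two small matrices B and A, so the effective weight becomes W + scale * (B @ A). The sketch below only illustrates that idea. The layer sizes and rank are hypothetical, and nothing here says which approach Cursor actually used.

```python
def full_finetune_params(d_out, d_in):
    """Trainable values when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable values for a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def merged_weight(W, B, A, scale=1.0):
    """Effective weight W + scale * (B @ A), using plain nested lists."""
    rows, cols, rank = len(W), len(W[0]), len(A)
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Hypothetical 4096 x 4096 layer with a rank-8 adapter.
print(full_finetune_params(4096, 4096))  # 16777216
print(lora_params(4096, 4096, 8))        # 65536
```

Either way the result is a modified model. The difference is how many parameters are trained and whether the change ships as a separate small adapter or as a full set of new weights.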

was messing with the OpenAI base URL in Cursor and caught this
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so composer 2 is just Kimi K2.5 with RL
at least rename the model ID

Cursor @cursor_ai
Composer 2 is now available in Cursor.

@BlackHC TLDR: How did you know that software was good before the era of AI? Pure vibes. How do you know it's good after the era of AI? Still pure vibes, nothing has changed.

@BlackHC Again, prompt engineering is an actual skill set and there's actually a lot of nuance to it despite people ridiculing it.
"Building an intuitive sense of the things that the LLM can actually infer accurately through language" is a key part of that.

