Random Libertarian Tech Lead

26.3K posts

@someRandomDev5

Reason over feelings, whenever the two come into conflict.

Pronouns: Who/cares · Joined February 2018

67 Following · 422 Followers

Random Libertarian Tech Lead@someRandomDev5·1h

@FelixKerlin @cryptopunk7213 Savings per task would be a concise way to express it but then it would not be immediately clear that it's relative to only the most expensive model on this chart (rather than being some arbitrary metric, or relative to the most expensive model in existence).

Random Libertarian Tech Lead@someRandomDev5·1h

@FelixKerlin @cryptopunk7213 Since you already have a ceiling value that's constrained to this specific set of models, you could just frame it as "cost savings relative to the most expensive"... But that's difficult to communicate concisely in that amount of space.
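The framing suggested here, cost savings relative to the most expensive model within a fixed set, can be sketched roughly as follows. This is only an illustration: the function name is hypothetical, and the dollar figures are borrowed from the Wes Bos test quoted further down this feed as sample data. They are not the values on Cursor's chart.

```python
def savings_vs_most_expensive(costs):
    """Return each model's fractional saving relative to the priciest model in `costs`.

    The ceiling is taken only from the models passed in, so the result is
    relative to this specific set, not to every model in existence.
    """
    ceiling = max(costs.values())
    return {name: (ceiling - cost) / ceiling for name, cost in costs.items()}


# Sample per-task costs in USD (assumption: taken from the unscientific test in this feed).
costs = {"Composer 2": 6.04, "Opus 4.6": 10.43, "GPT 5.4": 14.15}
savings = savings_vs_most_expensive(costs)

for name, fraction in savings.items():
    print(f"{name}: {fraction:.1%} cheaper than the most expensive model in this set")
```

The most expensive model always comes out at 0%, which makes the reference point explicit without needing extra chart labels.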

Ejaaz@cryptopunk7213·22h

1. there's no way this beats Opus 4.6
2. wtf is this chart crime??? 😂

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Random Libertarian Tech Lead@someRandomDev5·1h

@pirosb3 @fynnso Could be anything, really. If they're working with them directly, it could even be continued pre-training.

Daniel Pyrathon@pirosb3·3h

@someRandomDev5 @fynnso yeah good point, what form of fine-tuning do you think they ended up using?

Random Libertarian Tech Lead@someRandomDev5·2h

@wesbos ... And there are some leaks that seem to suggest Cursor's Composer model is no different; it may just be a fine-tuned version of Kimi K2.5:

Fynn@fynnso

was messing with the OpenAI base URL in Cursor and caught this:
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so Composer 2 is just Kimi K2.5 with RL
at least rename the model ID

Random Libertarian Tech Lead@someRandomDev5·2h

@wesbos Opus and GPT 5.4 are both highly intelligent. But Opus has the edge in understanding people, and GPT 5.4 has the edge in understanding systems. Nearly every open-weights model is basically just a distilled fine-tuned Frankenstein of GPT, Claude, and Gemini.

Wes Bos@wesbos·23h

Composer 2 vs Opus 4.6 vs GPT 5.4 - a totally unscientific test

> Create a Twitter clone. Use Better Auth, Vite, Sqlite, Drizzle, Typescript, and React with Tanstack Start.

Each one took ~5 mins in plan mode. Each had access to a browser to test.

Composer: 5 mins, $6.04, 1,250 LOC
Opus: 19 mins, $10.43, 1,000 LOC
GPT: 22 mins, $14.15, 2,000 LOC

Opus seems to use the cache WAY more than Composer, so it's not really 10x more expensive.

Composer app ran first try. Other two needed a bit of CORS debugging but did work.

Code between all was extremely similar.

All three done inside Cursor, so Claude Code / Codex may have been different.

Models used:
Composer 2.0 (regular, not fast)
Opus 4.6 Medium Thinking
GPT 5.4 Medium Thinking

Wes Bos@wesbos

Cursor just launched Composer 2 - their own model. It's 10× cheaper than Opus 4.6 and supposed to rival it. I've been using it for a few days. I don't have any skewed graphs to show you, but from a pure vibes POV I can tell you it's pretty good™. My litmus test right now is whether it can build a 3D-printable model with Manifold CAD. I built a GIF zoetrope generator and it did fantastic.

Random Libertarian Tech Lead@someRandomDev5·4h

@wesbos If you want an interesting test that actually demonstrates brownfield development (without requiring too much setup on your part), ask the model to expand or modify the behavior of a large, complicated open-source project that you actively use.

Random Libertarian Tech Lead@someRandomDev5·4h

@wesbos That said, this greenfield test does do a fairly good job of demonstrating how well these models can replace design agency work. If all you're building are business-card-like or brochure-like landing pages, then sure, this test is fine.

Random Libertarian Tech Lead@someRandomDev5·6h

@pirosb3 @fynnso You’re comparing “all forms of fine tuning” to “a specific form of fine tuning”.

Daniel Pyrathon@pirosb3·7h

@fynnso In this scenario, is it actually fine-tuning or is it a LoRA adapter? I'm a little new to these concepts but very curious.

Fynn@fynnso·23h

was messing with the OpenAI base URL in Cursor and caught this:
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so Composer 2 is just Kimi K2.5 with RL
at least rename the model ID

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Random Libertarian Tech Lead@someRandomDev5·22h

@BlackHC TLDR: How did you know that software was good before the era of AI? Pure vibes. How do you know it's good after the era of AI? Still pure vibes, nothing has changed.

Random Libertarian Tech Lead@someRandomDev5·22h

@BlackHC Again, prompt engineering is an actual skill set and there's actually a lot of nuance to it despite people ridiculing it. "Building an intuitive sense of the things that the LLM can actually infer accurately through language" is a key part of that.

Andreas Kirsch 🇺🇦@BlackHC·2d

A while back, Andrej Karpathy said the app store will be replaced by generated, disposable software, and Amjad Masad predicted that the value of all application software will go to zero.

I think this "ephemeral software hypothesis" is wrong, though, and I want to explain why:
