Random Libertarian Tech Lead

26.3K posts

@someRandomDev5

Reason over feelings, whenever the two come into conflict.

Pronouns: Who/cares · Joined February 2018

67 Following · 425 Followers

Random Libertarian Tech Lead@someRandomDev5·4h

@daniel_mac8 @NoPriorsPod Sometimes I wonder how much of this misinterpretation is poor attention to detail, versus intentionally trying to put a spin on reality. This “Haider” account always seems to intentionally exaggerate and mislead for engagement and attention.

Random Libertarian Tech Lead@someRandomDev5·4h

@daniel_mac8 @NoPriorsPod In this second one, the choice of the phrase “you feel like” is intentional. This is because he’s responding to why it’s easy to feel FOMO and get burned out in the AI era. If you listen to the full actual interview, you’ll get a better sense of it all.

Dan McAteer@daniel_mac8·6h

karpathy on Dwarkesh in Oct. '25:
> AI agents don't work
> They are not intelligent enough
> They are not multi-modal enough
> They can't do computer use

karpathy on @NoPriorsPod in March '26:
> AI agents fail but it's a skill issue for the humans
> You didn't give them the right instructions

AI agents absolutely hit an inflection point in December of 2025 and there's no looking back.

Haider.@slow_developer

Andrej Karpathy says when AI agents fail, it's usually a skill issue, not a capability issue.

You didn't write good enough instructions, didn't set up the right memory tool, or didn't parallelize correctly.

"the real shift is working in macro actions"

One does research, one writes code, one plans, all running 20-minute tasks simultaneously.

Random Libertarian Tech Lead@someRandomDev5·20h

@catalinmpit If you're writing 7-word lowercase sentences with slang, improper spelling/grammar, etc., I don't know what you'd otherwise expect. Garbage in, garbage out.

Random Libertarian Tech Lead@someRandomDev5·20h

@catalinmpit Seems to be a pattern: x.com/catalinmpit/st…

Catalin@catalinmpit

A short AI story.

Catalin@catalinmpit·1d

Lately, Claude makes some shocking mistakes.
⟶ Implements overly complex code
⟶ Ignores the codebase's code style
⟶ Removes working code for no reason
⟶ Replaces code that's out of scope from the task at hand

It feels like it needs 100% supervision. At this point, you're better off writing everything yourself.

Random Libertarian Tech Lead@someRandomDev5·20h

@OldenDev @fhinkel There are a lot of vibe coders whose prompts essentially amount to "Build me a million-dollar business. Make no mistakes." And what's amusing is that these are often the people who insist that prompt engineering isn't a real skill.

Random Libertarian Tech Lead@someRandomDev5·20h

@OldenDev @fhinkel Honestly it's not either/or. It's a little bit of both. Models are not perfect, and there are many cases where they can do a better job filling in the blanks. But you should see how under-specified the average person's prompt really is.

Franziska Hinkelmann, PhD@fhinkel·21h

Think LLMs are unreliable? Maybe it's not the model. Maybe it's you. Most people blame the AI when their outputs fall apart. But they never audit their own prompts. They don't check if they gave enough context. They expect it to read their mind. You want better results? Start with better input. LLMs reflect your clarity. If you're vague, they will be too. The model isn't failing. Your process is.

Random Libertarian Tech Lead@someRandomDev5·20h

@Yampeleg There's nothing inherently wrong with doing it. What people are mad about is that Cursor has implied this whole time that they built Composer from the ground up.

Yam Peleg@Yampeleg·22h

I don’t get why people go after Cursor for fine-tuning an open-source model; this is exactly what they are for.

Random Libertarian Tech Lead@someRandomDev5·20h

@samtwtss The funniest thing about AI is that it's always good at creating what you ask for... as long as what you ask for isn't that creative. The more worthless it is to the world, the better a job AI does at making it. 😂

Sameer@samtwtss·2d

bro, it's so over for designers google stitch is insane. 🤯

Stitch by Google@stitchbygoogle

Meet the new Stitch, your vibe design partner. Here are 5 major upgrades to help you create, iterate and collaborate:
🎨 AI-Native Canvas
🧠 Smarter Design Agent
🎙️ Voice
⚡️ Instant Prototypes
📐 Design Systems and DESIGN.md

Rolling out now. Details and product walkthrough video in 🧵

Random Libertarian Tech Lead@someRandomDev5·22h

@pirosb3 @fynnso Looks like the question was finally answered:

Random Libertarian Tech Lead@someRandomDev5·1d

@pirosb3 @fynnso Could be anything, really. If they're working with them directly, it could even be continued pre-training.

Daniel Pyrathon@pirosb3·1d

@someRandomDev5 @fynnso Yeah, good point. What form of fine-tuning do you think they ended up using?

Random Libertarian Tech Lead@someRandomDev5·1d

@FelixKerlin @cryptopunk7213 "Savings per task" would be a concise way to express it, but then it would not be immediately clear that it's relative only to the most expensive model on this chart (rather than being some arbitrary metric, or relative to the most expensive model in existence).

Random Libertarian Tech Lead@someRandomDev5·1d

@FelixKerlin @cryptopunk7213 Since you already have a ceiling value that's constrained to this specific set of models, you could just frame it as "cost savings relative to the most expensive"... But that's difficult to communicate concisely in that amount of space.
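
A minimal sketch of what "cost savings relative to the most expensive" could mean as a calculation, assuming the ceiling is simply the highest cost within the given set of models. The function name `savings_vs_most_expensive` is hypothetical, and the dollar figures are borrowed from Wes Bos's informal test quoted further down this feed purely for illustration; they are not the values behind Cursor's chart.

```python
def savings_vs_most_expensive(cost_per_task: dict[str, float]) -> dict[str, float]:
    """Return each model's fractional savings relative to the priciest model in the set.

    The ceiling is the maximum cost among the models passed in, so the result
    is only meaningful within that specific set (an assumption of this sketch).
    """
    if not cost_per_task:
        raise ValueError("need at least one model")
    ceiling = max(cost_per_task.values())
    if ceiling <= 0:
        raise ValueError("costs must be positive")
    return {name: (ceiling - cost) / ceiling for name, cost in cost_per_task.items()}


# Illustrative per-task costs in USD (from an unrelated, unscientific test).
costs = {"Composer 2": 6.04, "Opus 4.6": 10.43, "GPT 5.4": 14.15}
savings = savings_vs_most_expensive(costs)

for name, fraction in savings.items():
    print(f"{name}: {fraction:.1%} cheaper than the most expensive model in this set")
```

By construction the most expensive model always scores 0%, which is why the label needs to say what the baseline is.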

Ejaaz@cryptopunk7213·1d

1. theres no way this beats opus 4.6
2. wtf is this chart crime??? 😂

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Random Libertarian Tech Lead@someRandomDev5·1d

@wesbos ... And there are some leaks which seem to suggest that Cursor's Composer model is no different; it seems that it may just be a fine-tuned version of Kimi K2.5:

Fynn@fynnso

was messing with the OpenAI base URL in Cursor and caught this

accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast

so composer 2 is just Kimi K2.5 with RL

at least rename the model ID

Random Libertarian Tech Lead@someRandomDev5·1d

@wesbos Opus and GPT 5.4 are both highly intelligent. But Opus has the edge in understanding people, and GPT 5.4 has the edge in understanding systems. Nearly every open-weights model is basically just a distilled fine-tuned Frankenstein of GPT, Claude, and Gemini.

Wes Bos@wesbos·1d

Composer 2 vs Opus 4.6 vs GPT 5.4 - a totally unscientific test

> Create a Twitter clone. Use Better Auth, Vite, Sqlite, Drizzle, Typescript, and React with Tanstack Start.

Each one took ~5 mins in plan mode. Each had access to a browser to test.

Composer: 5 mins $6.04 1,250 LOC
Opus: 19 mins $10.43 1,000 LOC
GPT: 22 mins $14.15 2,000 LOC

Opus seems to use the cache WAY more than composer, so it's not really 10x more expensive.

Composer app ran first try. Other two needed a bit of CORS debugging but did work.

Code between all was extremely similar

All three done inside cursor - so Claude Code / Codex may have been different

Models used:
Composer 2.0 (regular, not fast)
Opus 4.6 Medium Thinking
GPT 5.4 Medium Thinking

Wes Bos@wesbos

Cursor just launched Composer 2 - their own model. It's 10× cheaper than Opus 4.6 and supposed to rival it. I've been using it for a few days, I don't have any skewed graphs to show you but from a pure vibes POV I can tell you it's pretty good™ My litmus test right now is if it can build a 3D Printable model with Manifold CAD. I built a GIF zoetrope generator and it did fantastic.

Random Libertarian Tech Lead@someRandomDev5·1d

@wesbos If you want an interesting test that actually demonstrates brownfield development (without requiring too much setup on your own part), ask the model to expand or modify the behavior of a large, complicated open-source project that you actively use.

Random Libertarian Tech Lead@someRandomDev5·1d

@wesbos That said, this greenfield test does do a fairly good job of demonstrating how well these models can replace design agency work. If all you're building are business-card-like or brochure-like landing pages, then sure, this test is fine.
