Noah VanSickle

64 posts

Noah VanSickle

@vansickn

slopitecture

New York, NY Katılım Haziran 2023

152 Takip Edilen18 Takipçiler

Noah VanSickle@vansickn·2h

npm maintainers rn youtube.com/watch?v=u8qgeh…

YouTube

English

Noah VanSickle retweetledi

Mario Zechner@badlogicgames·16h

Step 1: give technical champions a neat tool to dork around with in their free time Step 2: wait for them to bring it into the enterprise, with or without approval, because it's useful Step 3: gain enterprise share rapidly Step 4: piss of the technical champions by fucking with their free time tool use and also degrade the quality of the tool by insisting on vibe coding everything Step 5: have your CEO constantly say all SWEs are on the chopping board in 6 months Step 6: find out you have no moat surprisedpikachu.pcx

Will Sentance@willsentance

posted on this back in march, but this will eventually become a study in a biz school somewhere. claude had the upper hand for the last two quarters due to their harness + model quality showing breakthroughs for production grade coding. they lost that lead almost overnight. heres why: 1. they treated their model as the moat, which wasnt sustainable as all OAI had to do was tune for code and release. the real moat for power users(the main consumer base + source for coding data) is price/perfomance and UX of the harness. OAI holds all compute and a comparable model so they get the price floor, simple as. 2. for some reason, anthropic decided to release a PR stint around Mythos with the implication that devs weren't to be trusted with such power, and its clear at this point it really was an attempt to declare their pivot away from the consumer to enterprise. this was also interpreted as a signal that anthropic wont be releasing SOTA to the consumer anymore, so users switched. OAI released a comparable model anyway and the world didn't implode, so, theres that too. 3. OAI bought all the talent for the harness they could over the last 12 months, Alex app, etc all got folded into one thing: make codex the best ever. All efforts in the company went towards this, instead of silently abandoning Claude Code users for enterprise like Anthropic is probably doing. 4. The claude code team is faced with hard choices, report the churn as a price/perfomance issue and take that up with execs, only to be told they cant budge, or try to find core UX issues that might win back some users. both choices are suboptimal and wont solve. core lesson: if you plan to abandon your core customer, be really careful how you execute that or you may end up in a canyon you cant cross

English

216

23.8K

Noah VanSickle@vansickn·16h

@itsyourcode @MattRogish Excited to read

English

PME@itsyourcode·16h

@MattRogish @vansickn Oh man yes I have so much to say about this. Planning a blog post on it soon actually

English

PME@itsyourcode·2d

Under-discussed problem right now with most frontier coding models. A leading contributor to slop and incidental complexity and daily pain Great read @vansickn !

English

1.6K

Noah VanSickle@vansickn·18h

Scared to use an em dash these days

English

Noah VanSickle@vansickn·18h

@micsolana Noticed this as well. Coworker recently said: "the quiet part out loud" which I think is a frequent codex-ism

English

118

Mike Solana@micsolana·1d

I have started to notice very smart people who use AI every day, typically as a kind of thought partner, have started to sound like AI, not only employing whatever popular new turn of phrase, but in this kind of bulleted cadence of speaking — a copy of a copy. concerning!

English

149

1.1K

107.4K

Noah VanSickle retweetledi

Tenobrus@tenobrus·1d

gpt 5.5 be like "the point is *not* <insanely dumb thing you never said or even implied>, it's <thing you directly asked it to explain and clearly understand>"

English

2.7K

72.3K

Noah VanSickle@vansickn·1d

@championswimmer @badlogicgames It’s sad to say that I’ve raised my voice to my agents once or twice. Almost always over this type of issue

English

Arnav Gupta@championswimmer·1d

@badlogicgames @vansickn One time I added to the prompt “my game is not released, I don’t care about migrating old save games to new version” and it still went ahead and thought “user said no migration let me write a function to discover old save games and delete them” like bruhhh 🥲

English

Noah VanSickle@vansickn·2d

> You're right and I just did it again. The shimmed Badge is the same trick with a different costume — a component that lies about what it is. Let me undo it and migrate callsites directly. Someone please tell me how I can get my agents to stop shimming

English

5.2K

Noah VanSickle@vansickn·1d

@badlogicgames @championswimmer @badlogicgames really enjoyed your ai engineer talk about Pi, watched it a few weeks ago and it’s stuck with me since. In your eyes is reducing this “backwards compatibility” problem a harness issue or a training issue? Or maybe harness until training catches up

English

Mario Zechner@badlogicgames·1d

@championswimmer @vansickn very true

English

Noah VanSickle@vansickn·1d

@MattRogish @itsyourcode So nice getting validation from you all on this topic Insanely annoying developing like this

English

Matt “Friend of the pod” Rogish 🇺🇸@MattRogish·1d

YES I have to spread that in all my prompts, garbage like: "implementation work must replace vestigial object-shape assumptions outright. Do not preserve compatibility in code APIs. Write database migrations. No `TODO`/`pending`/`xit`/`skip` markers, no "implementation deferred" stubs, no dead buttons or unreachable routes. Do not defer something until some "later phase". Do it now." yada yada yada

English

Noah VanSickle@vansickn·1d

@hthieblot I have 11 🫪

English

Hubert Thieblot@hthieblot·1d

The next billion-dollar founder has 9 followers on X rn. I will find you & fund you!

English

801

1.6K

81.1K

Noah VanSickle@vansickn·1d

@kauffinger What avenues could we use to verify codebase cleanliness? Tweaking RL rewards for really agressive linting? It’s interesting since this is a pretty subjective problem in terms of how you like your codebases set up. Makes me think more opinionated frameworks are the way to go

English

Konstantin@kauffinger·1d

It’s the RL as you say in the article. When we train on passing tests, that’s what the model does. When we train on small, verifiable tasks that need to be green… Well, the model is definitely not changing any of the old code if it doesn’t have to. Also, might be preferable to changing everything with each prompt, that would be horrible to work with.

English

Noah VanSickle retweetledi

Qui Vincit@vincit_amore·2d

It's quite impressive that even with how quickly and effectively LLMs can move now, my adhd is still able to overwhelm their action potential. Instead of getting caught up and staying on top it all better, I have just increased the amount of things on my plate I'm juggling until I'm back to the same relative baseline of being overwhelmed that I'm comfortable with.

English

165

Noah VanSickle@vansickn·1d

@vincit_amore That is so accurate lmfao I constantly find a way to stress myself out

English

Qui Vincit@vincit_amore·1d

@vansickn Similiar to my adhd observation here lol x.com/vincit_amore/s…

Qui Vincit@vincit_amore

English

Noah VanSickle@vansickn·2d

Does anyone feel like models get worse after a little bit? I have a bit of a honeymoon phase with each release and then it feels like it regresses. Is this quantization or a psychological phenomena

English

104

Noah VanSickle@vansickn·1d

@vincit_amore On the other hand it may just be providers saving compute, we’ve seen examples from Anthropic introducing adaptive thinking / model reasoning defaults all willy nilly

English

Noah VanSickle@vansickn·1d

@vincit_amore Think you’re probably right, and that I get “too comfortable” throwing harder and harder problems at it. Maybe it’s sort of a Peter Problem where my expectations of the model rise to the level of its incompetence.

English

Noah VanSickle@vansickn·1d

@kauffinger Wonder if it’s just a context problem or a training problem or a mix of both. Maybe it’s safer to have a model that does that instead of willy nilly breaking production? Regardless my speed of iteration is significantly impacted by this behavior

English

142

Konstantin@kauffinger·1d

@vansickn I'll iterate on a model and claude will add a migration each time instead of just re-migrating with an updated migration. You really have to repeat you're not live every single time.

English

230

Noah VanSickle@vansickn·1d

@karpathy I have a centralized nextjs app as a dashboard for my projects: almost like my own personalized prototyping layer. Hooked up to all my services for monitoring and visualizing

English

Andrej Karpathy@karpathy·1d

This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc. More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage: 1) raw text (hard/effortful to read) 2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default 3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default ...4,5,6,... n) interactive neural videos/simulations Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status… There are also improvements necessary and pending at the input. Audio nor text nor video alone are not enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen. TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.

Thariq@trq212

x.com/i/article/2052…

English

819

1.7K

16.8K

2.3M

Noah VanSickle@vansickn·2d

@kinglycrow @itsyourcode Insanely frustrating especially when you care about quality and building long term

English