Noah VanSickle

64 posts

Noah VanSickle

Noah VanSickle

@vansickn

slopitecture

New York, NY Katılım Haziran 2023
152 Takip Edilen18 Takipçiler
Noah VanSickle retweetledi
Mario Zechner
Mario Zechner@badlogicgames·
Step 1: give technical champions a neat tool to dork around with in their free time Step 2: wait for them to bring it into the enterprise, with or without approval, because it's useful Step 3: gain enterprise share rapidly Step 4: piss of the technical champions by fucking with their free time tool use and also degrade the quality of the tool by insisting on vibe coding everything Step 5: have your CEO constantly say all SWEs are on the chopping board in 6 months Step 6: find out you have no moat surprisedpikachu.pcx
Will Sentance@willsentance

posted on this back in march, but this will eventually become a study in a biz school somewhere. claude had the upper hand for the last two quarters due to their harness + model quality showing breakthroughs for production grade coding. they lost that lead almost overnight. heres why: 1. they treated their model as the moat, which wasnt sustainable as all OAI had to do was tune for code and release. the real moat for power users(the main consumer base + source for coding data) is price/perfomance and UX of the harness. OAI holds all compute and a comparable model so they get the price floor, simple as. 2. for some reason, anthropic decided to release a PR stint around Mythos with the implication that devs weren't to be trusted with such power, and its clear at this point it really was an attempt to declare their pivot away from the consumer to enterprise. this was also interpreted as a signal that anthropic wont be releasing SOTA to the consumer anymore, so users switched. OAI released a comparable model anyway and the world didn't implode, so, theres that too. 3. OAI bought all the talent for the harness they could over the last 12 months, Alex app, etc all got folded into one thing: make codex the best ever. All efforts in the company went towards this, instead of silently abandoning Claude Code users for enterprise like Anthropic is probably doing. 4. The claude code team is faced with hard choices, report the churn as a price/perfomance issue and take that up with execs, only to be told they cant budge, or try to find core UX issues that might win back some users. both choices are suboptimal and wont solve. core lesson: if you plan to abandon your core customer, be really careful how you execute that or you may end up in a canyon you cant cross

English
6
13
216
23.8K
PME
PME@itsyourcode·
@MattRogish @vansickn Oh man yes I have so much to say about this. Planning a blog post on it soon actually
English
1
0
2
21
PME
PME@itsyourcode·
Under-discussed problem right now with most frontier coding models. A leading contributor to slop and incidental complexity and daily pain Great read @vansickn !
PME tweet media
English
6
3
20
1.6K
Noah VanSickle
Noah VanSickle@vansickn·
Scared to use an em dash these days
English
0
0
0
6
Noah VanSickle
Noah VanSickle@vansickn·
@micsolana Noticed this as well. Coworker recently said: "the quiet part out loud" which I think is a frequent codex-ism
English
0
0
0
118
Mike Solana
Mike Solana@micsolana·
I have started to notice very smart people who use AI every day, typically as a kind of thought partner, have started to sound like AI, not only employing whatever popular new turn of phrase, but in this kind of bulleted cadence of speaking — a copy of a copy. concerning!
English
149
27
1.1K
107.4K
Noah VanSickle retweetledi
Tenobrus
Tenobrus@tenobrus·
gpt 5.5 be like "the point is *not* <insanely dumb thing you never said or even implied>, it's <thing you directly asked it to explain and clearly understand>"
English
95
67
2.7K
72.3K
Arnav Gupta
Arnav Gupta@championswimmer·
@badlogicgames @vansickn One time I added to the prompt “my game is not released, I don’t care about migrating old save games to new version” and it still went ahead and thought “user said no migration let me write a function to discover old save games and delete them” like bruhhh 🥲
English
1
0
2
54
Noah VanSickle
Noah VanSickle@vansickn·
> You're right and I just did it again. The shimmed Badge is the same trick with a different costume — a component that lies about what it is. Let me undo it and migrate callsites directly. Someone please tell me how I can get my agents to stop shimming
English
2
0
13
5.2K
Noah VanSickle
Noah VanSickle@vansickn·
@badlogicgames @championswimmer @badlogicgames really enjoyed your ai engineer talk about Pi, watched it a few weeks ago and it’s stuck with me since. In your eyes is reducing this “backwards compatibility” problem a harness issue or a training issue? Or maybe harness until training catches up
English
1
0
0
24
Matt “Friend of the pod” Rogish 🇺🇸
YES I have to spread that in all my prompts, garbage like: "implementation work must replace vestigial object-shape assumptions outright. Do not preserve compatibility in code APIs. Write database migrations. No `TODO`/`pending`/`xit`/`skip` markers, no "implementation deferred" stubs, no dead buttons or unreachable routes. Do not defer something until some "later phase". Do it now." yada yada yada
English
2
0
2
70
Hubert Thieblot
Hubert Thieblot@hthieblot·
The next billion-dollar founder has 9 followers on X rn. I will find you & fund you!
English
801
41
1.6K
81.1K
Noah VanSickle
Noah VanSickle@vansickn·
@kauffinger What avenues could we use to verify codebase cleanliness? Tweaking RL rewards for really agressive linting? It’s interesting since this is a pretty subjective problem in terms of how you like your codebases set up. Makes me think more opinionated frameworks are the way to go
English
1
0
1
62
Konstantin
Konstantin@kauffinger·
It’s the RL as you say in the article. When we train on passing tests, that’s what the model does. When we train on small, verifiable tasks that need to be green… Well, the model is definitely not changing any of the old code if it doesn’t have to. Also, might be preferable to changing everything with each prompt, that would be horrible to work with.
English
1
0
1
45
Noah VanSickle retweetledi
Qui Vincit
Qui Vincit@vincit_amore·
It's quite impressive that even with how quickly and effectively LLMs can move now, my adhd is still able to overwhelm their action potential. Instead of getting caught up and staying on top it all better, I have just increased the amount of things on my plate I'm juggling until I'm back to the same relative baseline of being overwhelmed that I'm comfortable with.
English
2
1
8
165
Noah VanSickle
Noah VanSickle@vansickn·
Does anyone feel like models get worse after a little bit? I have a bit of a honeymoon phase with each release and then it feels like it regresses. Is this quantization or a psychological phenomena
English
1
0
2
104
Noah VanSickle
Noah VanSickle@vansickn·
@vincit_amore On the other hand it may just be providers saving compute, we’ve seen examples from Anthropic introducing adaptive thinking / model reasoning defaults all willy nilly
English
1
0
2
16
Noah VanSickle
Noah VanSickle@vansickn·
@vincit_amore Think you’re probably right, and that I get “too comfortable” throwing harder and harder problems at it. Maybe it’s sort of a Peter Problem where my expectations of the model rise to the level of its incompetence.
English
2
0
1
15
Noah VanSickle
Noah VanSickle@vansickn·
@kauffinger Wonder if it’s just a context problem or a training problem or a mix of both. Maybe it’s safer to have a model that does that instead of willy nilly breaking production? Regardless my speed of iteration is significantly impacted by this behavior
English
1
0
2
142
Konstantin
Konstantin@kauffinger·
@vansickn I'll iterate on a model and claude will add a migration each time instead of just re-migrating with an updated migration. You really have to repeat you're not live every single time.
English
1
0
3
230
Noah VanSickle
Noah VanSickle@vansickn·
@karpathy I have a centralized nextjs app as a dashboard for my projects: almost like my own personalized prototyping layer. Hooked up to all my services for monitoring and visualizing
English
0
0
2
30
Andrej Karpathy
Andrej Karpathy@karpathy·
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc. More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage: 1) raw text (hard/effortful to read) 2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default 3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default ...4,5,6,... n) interactive neural videos/simulations Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status… There are also improvements necessary and pending at the input. Audio nor text nor video alone are not enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen. TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.
Thariq@trq212

x.com/i/article/2052…

English
819
1.7K
16.8K
2.3M