Morten Vassvik

4.5K posts

@vassvik

Simulation and rendering nerd. Co-founder and CTO @JangaFX. Working on EmberGen and more. Discord: vassvik @[email protected] @vassvik.bsky.social

Hellvik, Norway · Joined April 2009

1.3K Following · 3K Followers

Pinned Tweet

Morten Vassvik@vassvik·7 Dec

On this day 2 years ago I posted a tweet on what would eventually become a sparse version of EmberGen, which had already been in development for 3 years. A retrospective 🧵 on the journey so far:

550

117.9K

Morten Vassvik@vassvik·5d

@fatttalis @_sholtodouglas @trq212 Just so, even worse when it's filled with technobabble-and-jargon-and-references

Fatttalis@fatttalis·6d

@_sholtodouglas @vassvik @trq212 The verbosity wouldn't necessarily be an issue IMO (I prefer verbose models), but not when it's spent on disclaiming, hedging and preambling itself out of existence. Frustrating when you can cut almost entire paragraphs out and leave the actual content of the message untouched.

Sholto Douglas@_sholtodouglas·6d

When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. DMs open. If you can give me detail (e.g. specifics/transcripts) - it'll help a lot in finding out exactly what we need to do to improve the next model

1.2K

1.4K

388.6K

Morten Vassvik@vassvik·6d

@_sholtodouglas @trq212 See x.com/vassvik/status… and followups

Morten Vassvik@vassvik

@_sholtodouglas @trq212 Here's a quick experiment, using the "car wash" gotcha question as a contrived example - not because of the substance of the answers, but their characteristics. Ran with minimal setup to get as close to the weights as possible: gist.github.com/vassvik/442041…

Sholto Douglas@_sholtodouglas·6d

@vassvik @trq212 link me an example of where you found it overly verbose?

339

Morten Vassvik@vassvik·6d

@_sholtodouglas @trq212 This genuinely matters because
1. The tokenizer uses more tokens/char
2. It produces longer responses, compounding on the previous point
3. It's exhausting to read and interact with, causing user fatigue
4. Output tokens are the most expensive
5. Noise and waste in the context
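The compounding in points 1 and 2 is multiplicative: at a fixed price per output token, more tokens per character and more characters per response multiply together. A minimal sketch of that arithmetic follows; the function name and both input ratios are hypothetical and chosen only for illustration, not measured from any model.

```python
def relative_output_cost(tokens_per_char_ratio: float, length_ratio: float) -> float:
    """Output-token cost relative to a baseline, at the same per-token price.

    tokens_per_char_ratio: new tokenizer's tokens/char divided by the baseline's.
    length_ratio: new response length in characters divided by the baseline's.
    """
    return tokens_per_char_ratio * length_ratio


# Hypothetical figures, not measurements: 25% more tokens per character
# and responses that are 50% longer.
ratio = relative_output_cost(tokens_per_char_ratio=1.25, length_ratio=1.5)
print(f"relative output cost: {ratio:.3f}x")
```

Under these assumed inputs the two effects combine to more than either alone, which is the sense in which one point compounds on the other.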

Morten Vassvik@vassvik·6d

@_sholtodouglas @trq212 And no amount of claude.md or user prompt nudging truly makes it reverse course or follow any process, structure or rules to avoid its pathologies. I can show more if you want, but I feel like the example above already makes the point as a representative sample in my experience

Morten Vassvik@vassvik·14 May

@benjamincode And it can do this without having to re-access resources multiple times or littering the context with things it no longer needs, costing tokens, usage, time and money, and impacting quality since the model gets its attention siphoned off.

Morten Vassvik@vassvik·14 May

@benjamincode This way you can basically formulate the code review as a DAG of forked and resumed sessions that traverse the PR in context-optimal ways (only carrying forward what it needs), ultimately ending up in a final delivery session that has the full picture
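One way to picture the DAG idea in the post above is the toy model below. It is only a sketch under stated assumptions: `Session`, `fork`, and the example review steps are hypothetical names invented here, and nothing in it calls Claude Code or any real agent API. Each session starts from the notes its parents chose to carry forward, keeps its own scratch work local, and passes on only its findings.

```python
from dataclasses import dataclass, field
from typing import Callable

# (scratch, findings): scratch stays local, findings may flow downstream.
Work = Callable[[list[str]], tuple[list[str], list[str]]]


@dataclass
class Session:
    """Hypothetical stand-in for one forked or resumed review session."""
    name: str
    context: list[str]                               # everything this session saw
    carry: list[str] = field(default_factory=list)   # subset handed to children


def fork(name: str, parents: list[Session], work: Work) -> Session:
    # Inherit only what the parents carried; dedupe so diamond-shaped
    # DAGs do not repeat a shared ancestor's notes.
    inherited = list(dict.fromkeys(n for p in parents for n in p.carry))
    scratch, findings = work(inherited)
    return Session(name, context=inherited + scratch + findings,
                   carry=inherited + findings)


# Illustrative review steps for an imaginary PR.
def overview(notes):
    return ["read full diff (large)"], ["PR touches parser and cache"]

def review_parser(notes):
    return ["parser.py line-by-line notes"], ["parser: missing bounds check"]

def review_cache(notes):
    return ["cache.py line-by-line notes"], ["cache: ok"]

def deliver(notes):
    return [], [f"final report from {len(notes)} findings"]


root = fork("overview", [], overview)
parser = fork("parser", [root], review_parser)
cache = fork("cache", [root], review_cache)
final = fork("delivery", [parser, cache], deliver)
print(final.context)
```

The delivery session ends up with every finding but none of the line-by-line scratch notes, which is the "only carrying forward what it needs" property in miniature.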

Benjamin Code@benjamincode·14 May

For those who use Claude Code and/or Claude Desktop, this changes absolutely nothing. I struggled to get to the bottom of it, given how many people are losing it. Those of you who are complaining, what are your use cases? I feel like I'm missing out on some crazy stuff and am practically a Claude normie, given that this change has 0 impact on me... Why do I feel so alone in not being the least bit angry? Enlighten me, please!

Supersocks@iamsupersocks

Anthropic has probably just signed the end of the golden age of Claude Code over OAuth. Officially: starting June 15, paid Claude plans will have a dedicated monthly credit for programmatic usage. Translation: agentic usage is leaving the all-you-can-eat buffet. If an x20 plan has so far given the equivalent of several thousand dollars of API usage, and the new credit is around a few hundred, that is not an adjustment. It is a regime change. It was predictable: agents consume like infrastructure, not like chat. MiniMax and other labs have already shown that token cost always comes back in the end. For builders, the window is clear: until mid-June, build externally. After that, optimize, route, and maintain if you want to keep going with Claude. Claude will remain capable, but it risks becoming more of an occasional premium model than an engine dedicated to always-on agents outside the Claude ecosystem (and this privileged period may soon come to an end). As a reminder, a Max x20 plan gives the equivalent of $3,000 in API credits via Claude Code and, until recently, via Openclaw/hermès. For this use case there is still OpenAI, as long as the OAuth/Codex offer remains generous. As of May, Claude remains very strong but is probably no longer the default agentic engine.

235

122.6K

Morten Vassvik retweeted

Matt Pocock@mattpocockuk·13 May

This is the clarity we've been crying out for. But it's a poisoned chalice. This is a 10X cut to claude -p disguised as a monthly bonus. Anthropic is discouraging any kind of programmatic usage. And that's fine - no subsidy lasts forever. But it's time to try Codex.

ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

230

173

3.4K

289.3K

Morten Vassvik@vassvik·16 Apr

@trq212 Does the rewind summarize use the /compact mechanism? I've found there's generally a mixture of noise and signal in the summary no matter the hint, and it's a bit frustrating to only know whether the summary is any good after the fact, and I don't know if I can trust it


Thariq@trq212·16 Apr

x.com/i/article/2044…

289

8.5K

2.4M

Morten Vassvik@vassvik·11 Apr

@karpathy @yiningkarlli One more closer to the second group (or the same): People who uncritically brute force giant custom rule structures, agents and skills on top of already highly encumbered generic system prompts, thinking they've created this coherent and efficient process when it really isn't.


Morten Vassvik@vassvik·11 Apr

@karpathy @yiningkarlli With system prompts stripped of conflicts and reinforced with the relevant parts, plus some layers of persistent cooperating agents that avoid polluting each other's context (which conflicts with the system prompt), you can suddenly see how capable the current models actually are.


Andrej Karpathy@karpathy·9 Apr

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

1.2K

2.5K

20.8K

4.4M
