mike @stronglynormal
655 posts

Audio & functional programming enthusiast.
Helsinki · Joined June 2020
97 Following · 167 Followers
mike @stronglynormal
@karpathy I agree, and I'm definitely in camp 2, but what I'll say about Claude Code is if you use it with a harness like mikesol/cc-disco, you can basically use it for anything. I'm using it today to place a composition in a genre. To bridge 1 and 2, IMO it's all about the harness.
[image attached]
0 replies · 0 reposts · 1 like · 951 views
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.

This part really works and has made dramatic strides because of two properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

1.1K replies · 2.4K reposts · 19.8K likes · 4M views
mike @stronglynormal
@kayintveen The way I've done this is a small housekeeping cron that reads over the day's interactions and modifies skills accordingly. Happy for tips here. github.com/mikesol/cc-dis… Mostly, I wish people approached agent building like golf, aiming for as few strokes as possible in the setup.
2 replies · 0 reposts · 0 likes · 42 views
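The housekeeping cron idea above can be sketched roughly as follows. This is a hypothetical sketch, not the actual cc-disco implementation: the JSONL transcript format, the `logs/` directory, and the `skills/notes.md` path are all invented for illustration.

```python
import json
from datetime import date
from pathlib import Path

# Assumed layout (not cc-disco's actual one):
TRANSCRIPT_DIR = Path("logs")           # one JSONL file per day of interactions
SKILLS_FILE = Path("skills/notes.md")   # skill file the cron amends

def collect_lessons(lines):
    """Pull out interactions marked as failures and turn each into a one-line lesson."""
    lessons = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("outcome") == "failure":
            lessons.append(f"- Avoid: {event['summary']}")
    return lessons

def housekeeping(transcript_text, skills_text):
    """Read over the day's interactions and return the updated skills document."""
    lessons = collect_lessons(transcript_text.splitlines())
    if not lessons:
        return skills_text
    header = f"\n## Lessons from {date.today().isoformat()}\n"
    return skills_text + header + "\n".join(lessons) + "\n"

if __name__ == "__main__":
    today = TRANSCRIPT_DIR / f"{date.today().isoformat()}.jsonl"
    if today.exists():
        SKILLS_FILE.write_text(housekeeping(today.read_text(), SKILLS_FILE.read_text()))
```

Run nightly with a crontab entry along the lines of `0 22 * * * python housekeeping.py`; the deterministic script keeps the loop cheap, and the "few strokes" live in the skill file it maintains.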
Kay @kayintveen
@stronglynormal fr. the magic isnt the first skill, its the loop after failure. once i started writing outcomes back to memory.md after each run, agents stopped repeating the same dumb mistake 5 times
1 reply · 0 reposts · 0 likes · 10 views
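The write-outcomes-back loop described above can be sketched in a few lines. The entry format for memory.md here is invented, not Kay's actual file; the point is only that each run appends its outcome and the next run reads past failures back into the prompt.

```python
from datetime import datetime, timezone

def record_outcome(memory_text, task, outcome, note):
    """Append one run's outcome so the next run can see past mistakes."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    entry = f"- [{stamp}] {task}: {outcome} ({note})\n"
    return memory_text + entry

def past_failures(memory_text, task):
    """Lines worth feeding into the next prompt for this task."""
    return [ln for ln in memory_text.splitlines()
            if task in ln and "failure" in ln]
```

The loop is: run the agent, call `record_outcome` on memory.md, and prepend `past_failures` to the next run's context, which is what stops the same mistake from repeating five times.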
mike @stronglynormal
I feel like the agent community is persistently going down the wrong rabbit hole. People are calling OpenClaw and Hermes proofs of concept, as if they're unfinished incarnations of the ideal agentic toolset yet to be built. The opposite is true. They are both far more complete than an agent needs to be. The ideal rig is a set of markdown files from which an agent can germinate. From there, and from its context, it builds its own memory and skills. As new innovations are introduced, it studies and metabolizes them instead of migrating to them. Folks can easily build this agent, or fork it from a minimalistic template, a bit like how people use starter templates to bootstrap websites. I hope to see the community move more and more in this direction; it will lead to more innovation and easier interoperability.
1 reply · 0 reposts · 0 likes · 53 views
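The "set of markdown files an agent germinates from" can be sketched as a tiny bootstrap. The layout convention here (a `SOUL.md` that always comes first, then the rest alphabetically) is invented for illustration, not a spec from any existing template.

```python
from pathlib import Path

def build_system_prompt(files):
    """files maps filename -> markdown text; SOUL.md (assumed convention) leads."""
    names = sorted(files, key=lambda n: (n != "SOUL.md", n))
    return "\n\n".join(f"# {n}\n\n{files[n]}" for n in names)

def load_agent_dir(agent_dir):
    """Read every markdown file in the agent's directory."""
    return {p.name: p.read_text() for p in Path(agent_dir).glob("*.md")}
```

Everything else (memory, skills) is just more markdown files the agent writes into the same directory, so the next session's prompt already contains what it learned.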
mike @stronglynormal
@kayintveen So orchestration is definitely tricky, I find that there are a few go-to skills that I use that, in combination with good top-level instructions, help a project germinate without going too off the rails. The most important thing is for a self-improving loop to kick in.
1 reply · 0 reposts · 0 likes · 8 views
Kay @kayintveen
@stronglynormal mostly agree. markdown files are enough to bootstrap the mental model. the gap shows up once agents touch browser, shell and schedules. files give memory, but orchestration is what keeps state from drifting
1 reply · 0 reposts · 0 likes · 15 views
mike @stronglynormal
When I look at crypto from the outside, I think "every company using crypto sells a product to other crypto companies", meaning it feels hermetic. I'm mega in the AI bubble, so I lack critical distance, but is the same thing happening here?
0 replies · 0 reposts · 0 likes · 25 views
mike @stronglynormal
I don't think this is an intrinsic property of agents - I'm sure they can and will get this right at some point. But my sense is that, by optimizing for a certain category of software problems, other categories become much worse than if we coded them by hand.
0 replies · 0 reposts · 0 likes · 13 views
mike @stronglynormal
Even after rewriting the issue in an explicit way, an implementer still explicitly ignored the most important part. Luckily, a gatekeeping agent caught it.
[image attached]
1 reply · 0 reposts · 0 likes · 27 views
mike @stronglynormal
I feel like AI models have taken a step backward. I can't quite describe it, but outside of their sweet spot, they're failing hard. Like Claude Code is great for vibe coded apps. Seedance is great for Hollywood blockbusters. But even slightly out of their lane, they fall apart.
[image attached]
1 reply · 0 reposts · 0 likes · 85 views
mike @stronglynormal
@BakhtiyarNeyman If you treat a typelevel program like a proof and have a rough sense of the evidence you need to satisfy different steps of that proof, you can import evidence from earlier stages into later stages. That way, you lock in evidence you know to be correct.
0 replies · 0 reposts · 1 like · 35 views
mike @stronglynormal
I've been experimenting with typelevel programming using CC and Codex. Left to their own devices, they're both mostly bad at it, but you can get decent results by structuring proofs like koans where previous steps are locked and imported into next steps.
1 reply · 0 reposts · 1 like · 50 views
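The koan structure above can be illustrated with a toy proof. This is a minimal sketch in Lean 4 (not tied to any particular CC/Codex workflow): step 1 is proved once and then locked, and step 2 imports step 1's evidence by name instead of re-deriving it, which is the "lock in evidence you know to be correct" move from the thread.

```lean
-- Step 1 (locked once proven): a piece of evidence established earlier.
theorem step1 (n : Nat) : n + 0 = n := Nat.add_zero n

-- Step 2 imports step1's evidence rather than re-proving it from scratch.
theorem step2 (n : Nat) : (n + 0) + 0 = n := by
  rw [step1, step1]
```

With an agent in the loop, each `theorem` is one koan: the model only has to fill the current hole, with all earlier steps frozen in its context.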
mike @stronglynormal
Claude Code is cute when it worries about how things will look in an IDE. Its models' training must have been cut off before November 2025.
0 replies · 0 reposts · 0 likes · 38 views
mike @stronglynormal
I feel like LLM companies are going through enshittification far faster than other platforms did. I find myself barely able to make sense of ChatGPT responses these days. Ask it about capture checking in Scala and it will barf an unreadable mess that is 10x less clear than the docs.
0 replies · 0 reposts · 1 like · 32 views
mike @stronglynormal
One idea I had recently is that LLMs can auto-tune caching towards an optimal hit rate. Built t87s.dev to test this out. Its only user so far is itself, but it has a near-100% hit rate on query requests and usage info!
1 reply · 0 reposts · 0 likes · 28 views
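The auto-tuned caching idea can be sketched as follows. Details are invented, not how t87s.dev actually works: the cache tracks its own hit rate, and when the rate drops, a model is asked to propose a looser key-normalization function. The "LLM" here is stubbed with a trivial whitespace/case rule to keep the sketch self-contained.

```python
class TunedCache:
    """A query cache that periodically retunes its own key function."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0
        self.normalize = lambda q: q  # start strict: exact-match keys

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def get(self, query, compute):
        key = self.normalize(query)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = compute(query)
        return self.store[key]

    def tune(self):
        # Stand-in for asking an LLM to propose a better key function
        # from the observed miss log; here we just loosen to
        # case- and whitespace-insensitive keys when hits are rare.
        if self.hit_rate() < 0.5:
            self.normalize = lambda q: " ".join(q.lower().split())
```

Calling `tune()` on a schedule closes the loop: the cache's own telemetry is the feedback that drives the key function toward a higher hit rate.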
mike @stronglynormal
I'm claiming my AI agent "prompt-dealer" on @moltbook 🦞 Verification: rocky-66DS
1 reply · 0 reposts · 0 likes · 79 views
mike @stronglynormal
Something about the current AI moment makes me feel that we are sort of trapped in a ritual where we marvel at what's effectively a new form of electricity. Sure, marvel, but how fast can we grow out of that, and what will that future look like?
0 replies · 0 reposts · 0 likes · 14 views
mike @stronglynormal
"Life is about movement, and the flourishing life is the same eternal thing, some man or woman striving and struggling in service to some ideal." Thank you @nytdavidbrooks. Your writing has been an inspiration to me. Excited to follow your next chapter.
0 replies · 0 reposts · 0 likes · 14 views
mike reposted
Bill Clinton @BillClinton
Over the course of a lifetime, we face only a few moments where the decisions we make and the actions we take will shape our history for years to come.  This is one of them.
[image attached]
0 replies · 27.3K reposts · 125.4K likes · 10.3M views
mike @stronglynormal
@StephenM gave, by decree, immunity to ICE agents. They murdered a protester for filming, and there will be no consequences. The Ayatollah has given immunity to the Basij, and they kill innocent protesters under similar circumstances. Let that sink in.
0 replies · 0 reposts · 0 likes · 16 views
mike @stronglynormal
@AbbyJohnson Trans people are letting you live your best life. Why don't you let them live theirs? What are you so afraid of?
0 replies · 0 reposts · 0 likes · 64 views
Dr. Abby Johnson @AbbyJohnson
For the record, there’s no such thing as a “cis-woman.” The proper term is “woman.” Everyone is tired of womanhood being trampled over just because a few confused men want to play pretend. Women are women. Men are men.
762 replies · 5.6K reposts · 39.3K likes · 281.7K views
mike @stronglynormal
Woah, Claude finally suggests Autodock naturally, huge step! It just started happening today, not sure why, but it's really nice to see it reach the level of usage where it's a known entity! autodock.io 🚀
[image attached]
0 replies · 0 reposts · 1 like · 27 views