kmckiern

16 posts

@kmcki3rn


SF · Joined October 2023
126 Following · 6 Followers
Garry Tan @garrytan
OK Codex is GOAT at finding bugs and finding plan errors
Miles Brundage @Miles_Brundage
The most common failure mode I've observed with GPT-5.4 is misunderstanding the intent behind the prompt (but then doing a good job at what it thought the task was). I'm not sure if this is a regression or not, but it stands out by contrast w/ the task execution
kmckiern @kmcki3rn
@Dorialexander The issue I've found with subagents for annotation is divergences/distribution drift. It's slower, but I've gotten better results with one continuous pass across the full dataset per agent, e.g. the agent calibrates better the more samples it sees.
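The single continuous pass described above can be sketched as follows. This is a toy illustration, not kmckiern's actual setup: `annotate` is a hypothetical stand-in for an LLM labeling call, and the running length-threshold is a toy model of calibration against previously seen samples.

```python
def annotate(sample, history):
    # Hypothetical stand-in for an LLM call: label by length, thresholded
    # against the average length of everything the agent has seen so far,
    # so later judgments "calibrate" on earlier ones.
    avg = sum(len(s) for s, _ in history) / len(history) if history else 0
    return "long" if len(sample) > avg else "short"

def single_pass(dataset):
    # One agent, one continuous pass over the full dataset, carrying
    # its label history forward instead of fanning out to subagents.
    history = []
    for sample in dataset:
        history.append((sample, annotate(sample, history)))
    return history
```

Because every label is conditioned on the same growing history, there is no cross-subagent distribution drift to reconcile afterwards.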
Alexander Doria @Dorialexander
Data annotation is probably one of the few areas where the harness could be the product right now. Models have the capacity, even the meta-skill to orchestrate subagents for full pipelines, just badly controlled and implemented.
kmckiern @kmcki3rn
@karpathy One caveat is that the DeepWiki docs on one of my projects have several mistakes, and there's no way for me to fix them, unfortunately.
Andrej Karpathy @karpathy
On DeepWiki and the increasing malleability of software.

This starts partially as a post of appreciation for DeepWiki, which I routinely find very useful and which I think more people would find useful to know about. I went through a few iterations of use. Their first feature was auto-building wiki pages for GitHub repos (e.g. nanochat here) with quick Q&A: deepwiki.com/karpathy/nanoc… Just swap "github" to "deepwiki" in the URL for any repo and you can instantly Q&A against it. For example, yesterday I was curious about "how does torchao implement fp8 training?". I find that in *many* cases library docs can be spotty, outdated, and bad, but directly asking questions of the code via DeepWiki works very well. The code is the source of truth, and LLMs are increasingly able to understand it.

But then I realized that in many cases it's even more powerful not to be the direct (human) consumer of this information/functionality, but to give your agent access to DeepWiki via MCP. So e.g. yesterday I faced some annoyances using the torchao library for fp8 training, and I had the suspicion that the whole thing really shouldn't be that complicated (wait, shouldn't this be a Function like Linear, except with a few extra casts and 3 calls to torch._scaled_mm?), so I tried:

"Use DeepWiki MCP and Github CLI to look at how torchao implements fp8 training. Is it possible to 'rip out' the functionality? Implement nanochat/fp8.py that has identical API but is fully self-contained"

Claude went off for 5 minutes and came back with 150 lines of clean code that worked out of the box, with tests proving equivalent results, which allowed me to delete torchao as a repo dependency. And for some reason I still don't fully understand (I think it has to do with internals of torch compile), this simple version runs 3% faster.

The agent also found a lot of tiny implementation details that actually do matter, that I may have naively missed otherwise, and that would have been very hard for maintainers to keep docs about: tricks around numerics, dtypes, autocast, meta device, and torch compile interactions. So I learned a lot from the process too. This is now the default fp8 training implementation for nanochat: github.com/karpathy/nanoc…

Anyway, TLDR: I find this combo of DeepWiki MCP + GitHub CLI quite powerful for "ripping out" any specific functionality from any GitHub repo and targeting it at the very specific use case you have in mind, and it actually kind of works now in some cases. Maybe you don't download, configure, and take a dependency on a giant monolithic library; maybe you point your agent at it and rip out the exact part you need. Maybe this informs how we write software more generally, to actively encourage this workflow, e.g. building more "bacterial code": code that is less tangled, more self-contained, more dependency-free, more stateless, and much easier to rip out of the repo (x.com/karpathy/statu…).

There are obvious downsides and risks to this, but it is fundamentally a new option that was not possible or economical before (it would have cost too much time); now, with agents, it is. Software might become a lot more fluid and malleable. "Libraries are over, LLMs are the new compiler" :). And does your project really need its 100MB of dependencies?
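The URL trick the post opens with ("just swap 'github' to 'deepwiki'") is literally a one-string transform; a minimal sketch, with the function name being my own invention:

```python
def deepwiki_url(github_url: str) -> str:
    # Swap only the host; the owner/repo path carries over unchanged.
    return github_url.replace("github.com", "deepwiki.com", 1)

# e.g. deepwiki_url("https://github.com/karpathy/nanochat")
#      -> "https://deepwiki.com/karpathy/nanochat"
```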
kmckiern @kmcki3rn
@ccccjjjjeeee Reminds me of a VAE objective, except (ofc) there's no model update
Christopher Ehrlich @ccccjjjjeeee
By the way, the secret to this is property-based testing. Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true.
Christopher Ehrlich @ccccjjjjeeee

It actually worked! For the past couple of days I’ve been throwing 5.3-codex at the C codebase for SimCity (1989) to port it to TypeScript. Not reading any code, very little steering. Today I have SimCity running in the browser. I can’t believe this new world we live in.
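The bridge-and-assert loop Ehrlich describes can be sketched as a differential test. A real setup would use a property-based framework like Hypothesis and a real foreign-function bridge; this sketch uses plain seeded random sampling, and both `legacy_impl` and `ported_impl` are hypothetical stand-ins for the original C code and the TypeScript port.

```python
import random

def legacy_impl(xs):
    # Stand-in for a bridge that calls the original implementation.
    return sorted(xs)

def ported_impl(xs):
    # Stand-in for the newly ported implementation under test.
    return sorted(xs, reverse=True)[::-1]

def differential_test(trials=1000, seed=0):
    # For arbitrary input, assert both versions do the same thing;
    # the agent keeps iterating until this holds across the sample.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        assert ported_impl(xs) == legacy_impl(xs), f"divergence on {xs}"
    return trials
```

Any assertion failure pins down a concrete diverging input to hand back to the agent.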

kmckiern @kmcki3rn
@jeffclune yo dawg, we heard you like learning, so we made a meta-agent that learns how an agent should learn
Jeff Clune @jeffclune
Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/
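The three operations the thread names (what to store, how to retrieve, how to update) form a minimal memory interface. The sketch below is illustrative only, not the paper's design: the class name, substring retrieval, and in-place replacement are all my own toy choices.

```python
class Memory:
    """Toy memory mechanism exposing the three meta-designed operations."""

    def __init__(self):
        self.entries = []

    def store(self, item: str) -> None:
        # What info to store.
        self.entries.append(item)

    def retrieve(self, query: str, k: int = 3) -> list:
        # How to retrieve it: naive substring match, top-k.
        return [e for e in self.entries if query in e][:k]

    def update(self, old: str, new: str) -> None:
        # How to update it: replace a stale entry in place.
        self.entries = [new if e == old else e for e in self.entries]
```

A meta agent in this framing searches over implementations of exactly these three methods.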
[GIF attached]
lcamtuf @lcamtuf
Moltbook debate in a nutshell
[image attached]
davidad 🎇 @davidad
@BlancheMinerva @sebkrier @SharmakeFarah14 @nabla_theta The whitepill is that if the primary source of egregiously misaligned behavior involves contexts that give the LLM very strong evidence that it’s being evaluated by evaluators who *really* want to find misaligned behavior, that’s actually a form of alignment with its actual user.
Séb Krier @sebkrier
Quite funny that to the extent that they're a thing, 'misalignment' failures come from the very fears/writings of those who thought they would be necessarily a thing. Not surprised that the evals created elicited these very behaviours. If I'm a model and I see "Scratchpad", I know which part of the latent space to simulate...
Geodesic Research @GeodesResearch

If pretraining data is full of examples of AI behaving badly (sci-fi villains, safety papers on scheming, news about AI crises), models might learn these as priors for how "an AI" should act. @turntrout called this "self-fulfilling misalignment", and we found evidence it exists.

kmckiern @kmcki3rn
A new(ish) engineering pattern: easy state sharing. I increasingly find myself refactoring so agents can more easily see what I see.
[image attached]
kmckiern @kmcki3rn
Wish it were standard practice to have a root configs/ dir. Like, this is kinda distracting, no?
[image attached]
kmckiern @kmcki3rn
@eshear What about objective reality / physics that doesn't depend on an observer?
Emmett Shear @eshear
Starting to think that existence is entirely subjective awareness, but that the upshot is not "so it's all vibes man" but more like "physics is what it looks like if you have a bunch of Markov blankets exchanging information under optimal inference from observation".
kmckiern @kmcki3rn
@tenobrus we need to dismantle the bourgeoisie TC 450k
Bojan Tunguz @tunguz
My prediction: in 50 years we’ll look at the attention economy and all the products it spewed the way we look at the tobacco industry today.
Marc Andreessen 🇺🇸
Next podcast from @bhorowitz and me will be the future of the American Dream -- please send us questions as replies to this xeet! 🇺🇸🚀🫡
Pata van Goon @basedalexandoor
he does not look ok
[image attached]