Pinned Tweet
JBaba
@JBabaTalks · 1.9K posts
Let's build, https://t.co/vy1e38LSoY about me. https://t.co/ZvN4aVUywF Building. https://t.co/H51kcr3F0P Failed.
Joined September 2018
163 Following · 62 Followers

Stupidity is baked into intelligence
Duca@big_duca
We have AGI (for coding). And yet so much software is still so damn buggy. (including my own startup) Why?

@adamdotdev We have invisible hands from the top pushing for more work as well

Man, I felt this video so hard. It feels like all of us (devs) that are using AI are trying to tiptoe around these feelings, or hang onto "but I use it correctly! I review the code!". idk, I think the drug analogy is apt. Once you have the button, it feels impossible not to use the button.
On the one hand, there is something so tempting about throwing it all away and going back to life as it was before all of this. There's even a little part of me that hopes it's all unsustainable and crumbles down around me so I don't have to make the impossible decision. On the other hand, I've spent most of my career unable to sleep at night because I couldn't wait to wake up and implement the solution that came to me before bed. I don't have that problem anymore; I sleep like a baby, and I'm only now realizing it's because there's no mystery anymore, no burning desire to wake up and shape the world. I just put the prompts in, and that doesn't get me out of bed. Rock and a hard place for sure.
Anyway, use OpenCode; our new subscription (Go) is the best way to buy your drugs, I mean tokens! 🥲
Mo@atmoio
I was a 10x engineer. Now I'm useless.

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, and so on. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approximately 700 changes autonomously is wild. It really looked at the sequence of experimental results and used them to plan the next experiments. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually before, and they stack up and actually improved nanochat. Among the bigger findings:
- It noticed an oversight: my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
github.com/karpathy/nanoc…
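The loop the thread describes (propose a change, run a short training job, keep it only if validation loss improves, use the history to plan the next attempt) can be sketched in miniature. Everything below is a toy illustration: the function names and the fake quadratic "loss" are invented for this sketch and are not nanochat code.

```python
import random

def run_training(config):
    """Toy stand-in for a short training run: returns a fake validation loss.
    Lower is better; the hidden 'optimum' here is lr=0.02, wd=0.1 (illustrative)."""
    return (config["lr"] - 0.02) ** 2 + (config["wd"] - 0.1) ** 2

def propose_change(config, history, rng):
    """Toy stand-in for the agent: perturb one hyperparameter at random.
    (A real agent would reason over `history`; this toy one ignores it.)"""
    key = rng.choice(list(config))
    candidate = dict(config)
    candidate[key] *= rng.uniform(0.5, 2.0)
    return candidate

def autoresearch(config, steps=200, seed=0):
    """Greedy propose/evaluate loop: keep only changes that improve val loss."""
    rng = random.Random(seed)
    best_loss = run_training(config)
    history = [(config, best_loss)]
    for _ in range(steps):
        candidate = propose_change(config, history, rng)
        loss = run_training(candidate)
        history.append((candidate, loss))
        if loss < best_loss:  # accept only improvements, as in the thread
            config, best_loss = candidate, loss
    return config, best_loss

best_config, best_loss = autoresearch({"lr": 0.1, "wd": 0.5})
print(best_config, best_loss)
```

The real workflow replaces the fake objective with an actual training run and the random perturbation with an LLM agent that reads the experiment log, but the accept-if-validation-loss-improves skeleton is the same.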
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
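The promotion idea above (score many candidates at a cheap proxy scale, re-score only the survivors at larger, more expensive scales) can be sketched the same way. All names, depths, and the toy scoring function here are invented for illustration, not from any real lab pipeline:

```python
def evaluate(candidate, depth):
    """Toy stand-in for 'train this candidate at this depth, return val loss'.
    A candidate's true quality mostly transfers; noise shrinks at larger scale."""
    quality, noise = candidate
    return quality + noise / depth

def promote(candidates, depths=(12, 24), keep=4):
    """Score all candidates at the cheapest depth, then re-score only the
    top `keep` survivors at each successively larger (pricier) depth."""
    survivors = list(candidates)
    for depth in depths:
        survivors = sorted(survivors, key=lambda c: evaluate(c, depth))[:keep]
        keep = max(1, keep // 2)  # fewer candidates afford the bigger runs
    return survivors[0]

# 20 toy candidates: (true quality, proxy-scale noise); lower quality is better.
candidates = [(q / 10, n) for q in range(10) for n in (-1.0, 1.0)]
best = promote(candidates)
print(best)
```

The design point is the funnel: the cheap scale filters aggressively so that only a handful of ideas ever consume a full-size training run, which is what makes "any metric you can evaluate cheaply (or via a proxy)" autoresearchable.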


It needs to collapse asap.
Rand@rand_longevity
every professor I talk to that uses AI says the college system is about to collapse

@JBabaTalks No, the difference is the agents are doing the planning and not you anymore

@JBabaTalks > Me having an idea → my agents planning and then orchestrating multiple parallel agents
