david
@davidbshan
co-founder & cto @freesolo
San Francisco · Joined September 2023
501 Following · 232 Followers
74 posts
david @davidbshan
Is there a better way/tool to get a TL;DR on a research paper than just pasting the arXiv link into ChatGPT and asking it to summarize?
0 replies · 0 reposts · 0 likes · 26 views
david reposted
Jerry Zhang @zjearbear
Introducing Lemma. Your AI agents are failing in ways you can’t see. Lemma is the world’s first reliability platform that finds and fixes these issues fast.
59 replies · 37 reposts · 216 likes · 36.4K views
david @davidbshan
Using Cursor to help me set up Claude Code is the new version of using Safari to help me install Chrome.
0 replies · 0 reposts · 4 likes · 99 views
Rohin @0xrohin
I had my O-1 approved 👽🇺🇸 Excited to keep working with my best friends @freesolo in San Francisco! Thanks @0xSigil and everyone else who helped.
[image]
36 replies · 12 reposts · 262 likes · 21K views
david @davidbshan
RL is like cooking: if you set the learning rate too high, the model gets burnt.
0 replies · 0 reposts · 5 likes · 129 views
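The quip is a real stability condition in disguise: gradient descent converges only while the step size stays below a threshold set by the curvature of the loss, and past it the iterates diverge. A toy sketch on a quadratic (the `descend` helper and the objective are made up for illustration, not anyone's actual training code):

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x.
# Each update is x <- x - lr * 2x = x * (1 - 2*lr), so the iterates
# shrink when |1 - 2*lr| < 1 (i.e. lr < 1) and blow up beyond that.

def descend(lr, x=1.0, steps=50):
    for _ in range(steps):
        x = x - lr * 2 * x
    return abs(x)

print(descend(0.1))   # small step size: |x| shrinks toward 0
print(descend(1.1))   # step size too large: |x| grows without bound, "burnt"
```

The same threshold effect happens in real training, just with the curvature of the loss surface standing in for the constant 2 here.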
hamza mostafa @hamostaf04
for those of you that don't have a GPU handy to play around with, i built a small fork of the repo that lets your coding agent tinker and experiment on a cloud GPU via @modal sandboxes, w/ updated instructions in README and program.md. link in comments. enjoy :)
Quoting Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I do daily, for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
13 replies · 3 reposts · 81 likes · 7.8K views
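The loop the quoted tweet describes is propose a change, train, compare validation loss, keep the change only if it helps, and plan the next experiment from the history. A minimal sketch of that cycle, with a stand-in `train_and_eval` objective and a random `propose` step (both hypothetical; a real agent would reason over the history rather than jitter knobs at random, and evaluation would be an actual training run):

```python
import random

def train_and_eval(config):
    # Stand-in for a real training run: pretend validation loss is
    # minimized at lr=0.02, wd=0.1.
    return (config["lr"] - 0.02) ** 2 + (config["wd"] - 0.1) ** 2

def propose(config, history):
    # A real agent would read `history` to plan the next experiment;
    # here we just multiplicatively jitter one knob.
    key = random.choice(list(config))
    new = dict(config)
    new[key] = max(1e-4, new[key] * random.uniform(0.5, 2.0))
    return new

def autoresearch(config, budget=700):
    best_loss = train_and_eval(config)
    history = [(config, best_loss)]
    for _ in range(budget):
        candidate = propose(config, history)
        loss = train_and_eval(candidate)
        history.append((candidate, loss))
        if loss < best_loss:  # keep only changes that improve val loss
            config, best_loss = candidate, loss
    return config, best_loss

random.seed(0)
cfg, loss = autoresearch({"lr": 0.1, "wd": 0.5})
print(cfg, loss)
```

Because only improving changes are kept, the accepted changes stack, which is the "additive" property the tweet measures; promoting winners from small to large models is the extra step this sketch omits.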
david reposted
Y Combinator @ycombinator
Traverse (@traverseso) is a data research lab solving the hardest problem in AI training: subjective, taste-dependent work. They generate long-horizon training data for frontier models by observing and capturing what specialists actually do on the job. Congrats on the launch, @lanceyyan and @thezacharyyu! ycombinator.com/launches/PdH-t…
54 replies · 66 reposts · 374 likes · 70.5K views
Alex Hamidi @ahamidi_
I built a site that lets you doomscroll personal websites:
75 replies · 222 reposts · 3.8K likes · 95.5K views
david reposted
Rohin @0xrohin
team offsite @freesolo ⛷️
[images]
0 replies · 2 reposts · 9 likes · 1.1K views
david @davidbshan
What’s happening w/ the SF weather these days 😭
0 replies · 0 reposts · 0 likes · 191 views
david @davidbshan
ChatGPT should have nicer search for conversations
0 replies · 0 reposts · 0 likes · 141 views
david reposted
Regina Lin @reggitales
You have 100+ tabs open and your brain is fried. Introducing Dex, your second brain in Chrome that organizes, remembers, and takes action for you. Turn tabs into to-dos, multitask with agents, find and save anything for later. All without leaving your tab. As a founder, it's already saved me hundreds of hours. Comment for 1M free tokens - joindex [dot] com
668 replies · 207 reposts · 3K likes · 268.7K views
david @davidbshan
Is anyone else having trouble attaching the LoRA adapter trained on Tinker to gpt-oss-20b 😢 @thinkymachines
0 replies · 1 repost · 7 likes · 655 views