david

74 posts

david

@davidbshan

co-founder & cto @freesolo

San Francisco · Joined September 2023

501 Following · 232 Followers

david @davidbshan · 20h

Is there a better way/tool to get a tldr on a research paper than just pasting the arxiv link to ChatGPT and asking it to summarize?

david reposted

Jerry Zhang @zjearbear · Apr 14

Introducing Lemma. Your AI agents are failing in ways you can’t see. Lemma is the world’s first reliability platform that finds and fixes these issues fast.

david @davidbshan · Apr 10

Using cursor to help me set up my claude code is the new version of using safari to help me install chrome

david @davidbshan · Mar 17

@LeonLiur Cool tech!!

Leo Liu @LeonLiur · Mar 17

i built a bot that watches the corbet's couloir livestream cam and clips the cleanest runs / best wipeouts FOLLOW TO SEE SOME JERRY SHIT

Corbet's Couloir Cam Clips @corbetscou19963

YOU MIGHT WANNA BRACE YOUR ACL WHILE WATCHING THIS 🤢🤢🤢 (detected via corbet's live cam on 11:25 AM MT - Mar 14, 2026) #ski #jacksonhole #corbets

david @davidbshan · Mar 14

One of the most cracked engineers I know! Absolute pleasure to be working alongside you

Rohin @0xrohin

I had my O-1 approved 👽🇺🇸 Excited to keep working with my best friends @freesolo in San Francisco! Thanks @0xSigil and everyone else who helped.

david @davidbshan · Mar 14

@0xrohin @freesolo @0xSigil @extraordinary Hugeee! Congrats 🐐

david @davidbshan · Mar 11

RL is like cooking: if you set the learning rate too high, the model gets burnt.

david reposted

TechCrunch @TechCrunch · Mar 10

Thinking Machines Lab inks massive compute deal with Nvidia techcrunch.com/2026/03/10/thi…

david @davidbshan · Mar 11

@hamostaf04 @modal So sick!

hamza mostafa @hamostaf04 · Mar 10

for those of you that don't have a GPU handy to play around with, i built a small fork of the repo that lets your coding agent tinker and experiment using a GPU on the cloud using @modal sandboxes w/ updated instructions in README and program.md. link in comments. enjoy :)

Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

david @davidbshan · Mar 8

@ycombinator @traverseso congrats on the launch!!! @lanceyyan @thezacharyyu

david reposted

Y Combinator @ycombinator · Mar 7

Traverse (@traverseso) is a data research lab solving the hardest problem in AI training: subjective, taste-dependent work. They generate long-horizon training data for frontier models by observing and capturing what specialists actually do on the job. Congrats on the launch, @lanceyyan and @thezacharyyu! ycombinator.com/launches/PdH-t…

david @davidbshan · Mar 4

@ahamidi_ This is so sick!

Alex Hamidi @ahamidi_ · Mar 3

I built a site that lets you doomscroll personal websites:

david @davidbshan · Feb 25

This just makes so much more sense

Stefano Ermon @StefanoErmon

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.

david @davidbshan · Feb 19

If your company is interested in training a task-specific small language model or just has high token usage, hit me up at david@clado.ai or message me on LinkedIn!

Eric Mao @ericmao

2 weeks ago, Brett Adcock posted a public browser agent challenge; the website contained 30 steps and had to be solved in under 5 minutes. The first thing that stood out to me was that I only got to step 8 in 5 minutes, a far cry from 30 steps. Any agent that can complete this would surely have to be ‘superhuman’. In addition, no frontier model was able to solve this, so, in some sense, any solution also needs to 'push the frontier' on this particular challenge. As it turns out, traditional post-training (RL) was not the best solution to this problem; instead, a recursive DSPy policy pushed GPT OSS 120B from being unable to complete step 1 to finishing all 30 in 4:10 minutes (10x faster than Sonnet 4.6). Wrote a blog detailing this below.

david reposted

Rohin @0xrohin · Feb 4

team offsite @freesolo ⛷️

david @davidbshan · Dec 24

What’s happening w the sf weather these days 😭

david @davidbshan · Dec 24

chatgpt should have nicer search for their conversations

david reposted

Regina Lin @reggitales · Dec 11

You have 100+ tabs open and your brain is fried. Introducing Dex, your second brain in Chrome that organizes, remembers, and takes action for you. Turn tabs into to-dos, multitask with agents, find and save anything for later. All without leaving your tab. As a founder, it's already saved me hundreds of hours. Comment for 1M free tokens - joindex [dot] com

david @davidbshan · Dec 11

Is anyone else having trouble attaching the LoRA adapter trained on Tinker to gpt-oss-20b 😢 @thinkymachines
