Calde

11.9K posts

@calde_ux

Product Manager @ArionKoder. I tweet about digital products, strategy, UX & technology. Working remotely before it was cool.

Corrientes · Joined August 2008
1.9K Following · 2K Followers
Calde@calde_ux·
Please, @googledrive, start treating Markdown files as first-class citizens. How come the only options we have are to open them with Google Docs or third-party apps?
0 replies · 0 reposts · 0 likes · 75 views
Calde@calde_ux·
😮
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for two decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

0 replies · 0 reposts · 0 likes · 103 views
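(For readers curious what that QKnorm fix looks like concretely, here is a minimal PyTorch sketch. It is not nanochat's actual code; the module shape, names, and initial scale value are invented for illustration. The idea: parameterless QK-norm pins every attention logit to [-1, 1], keeping the softmax diffuse, and a learnable per-head multiplier restores the ability to sharpen it.)

```python
# Sketch only: not nanochat's implementation; names, the initial scale,
# and the missing causal mask are simplifications for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # Parameterless QK-norm bounds every logit by 1, so attention stays
        # diffuse. A learnable per-head scale lets training sharpen it again.
        self.qk_scale = nn.Parameter(torch.full((n_heads, 1, 1), 8.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q = F.normalize(q, dim=-1)  # QK-norm: unit-norm queries...
        k = F.normalize(k, dim=-1)  # ...and keys, with no learnable gain.
        logits = (q @ k.transpose(-2, -1)) * self.qk_scale  # the added scaler
        out = logits.softmax(dim=-1) @ v  # (causal mask omitted for brevity)
        return self.proj(out.transpose(1, 2).reshape(B, T, C))
```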
Calde retweeted
Jason Fried@jasonfried·
1999: Small, lean, quick, fit, profitable.
2026: Small, lean, quick, fit, profitable.
The fundamentals are the fundamentals.
80 replies · 165 reposts · 2K likes · 57.6K views
Calde retweeted
Richie - oss/acc@richiemcilroy·
the funny part is that this post is both true and a contradiction.

I recently replaced our ~$200/mo Intercom bill with an internal messaging embed I built myself in a few hours. We own all our data now and it costs almost nothing to run. So Intercom just lost a few thousand $ a year from us. Multiply that by a few hundred other people doing the same thing and that's maybe $1m? $2m? $5m? gone.

Software is changing. No doubt. The question is… is it changing for everyone? Or just the people already building it
15 replies · 9 reposts · 238 likes · 76.2K views
Richie - oss/acc@richiemcilroy·
apparently software as we know it is dead because Susan from accounting is going to vibe code a Calendly alternative while on her lunch break. I think some of you need to go outside and speak to real people
223 replies · 387 reposts · 8K likes · 861.6K views
Calde retweeted
Guido Marucci Blas@guidomb·
I am seriously considering implementing all my CI from scratch with an agent: a thin web server on a bare EC2 instance that listens to GitHub notifications and hands them to an agent, like the good old days. Tired of all this bullshit of GitHub not running, Blacksmith outages, 1Password outages
0 replies · 1 repost · 1 like · 198 views
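(A minimal sketch of what that thin server could look like, assuming a GitHub repo webhook configured to POST push events to it. WEBHOOK_SECRET, the port, and the `claude -p` invocation are placeholders; any agent CLI could stand in. The one non-negotiable design choice is verifying GitHub's HMAC signature before acting on the payload.)

```python
# Sketch under stated assumptions: a GitHub webhook POSTs push events here;
# `claude -p` is a placeholder for whatever agent CLI you actually run.
import hashlib
import hmac
import json
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET = os.environ["WEBHOOK_SECRET"].encode()  # shared secret set on GitHub

class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # Verify GitHub's HMAC-SHA256 signature before trusting the payload.
        expected = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, self.headers.get("X-Hub-Signature-256", "")):
            self.send_response(403)
            self.end_headers()
            return
        event = json.loads(body)
        ref, sha = event.get("ref", ""), event.get("after", "")
        # Fire and forget: the agent checks out the commit and runs CI.
        subprocess.Popen(
            ["claude", "-p", f"Check out {sha} on {ref}, run the test suite, report failures"]
        )
        self.send_response(202)  # accepted; work continues in the background
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Hook).serve_forever()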
Calde retweeted
Anthropic@AnthropicAI·
AI speeds up complex tasks more than simpler ones: the higher the education level needed to understand a prompt, the more AI reduces how long the task takes. That holds true even after accounting for the fact that more complex tasks have lower success rates.
10 replies · 18 reposts · 416 likes · 43.7K views
Calde@calde_ux·
Now imagine something like Claude Cowork but mobile, embedded in your mobile OS and your cloud spaces. But also: not this initial version, the more mature version after 3 to 5 iterations of this. The tech is already here; we are only missing the product iterations.
2 replies · 0 reposts · 0 likes · 60 views
Calde@calde_ux·
2y ago I commented on a talk about how part of the AI shift was GenAI changing one core concept of modern software: the "Operating System". Claude Cowork covers a good chunk of what I was imagining at the time. And this is just starting.
0 replies · 0 reposts · 0 likes · 52 views
Calde@calde_ux·
I'm starting to suspect LinkedIn's AI strategy is to collect tons and tons of AI slop pieces to be sold as negative training samples later.
0 replies · 0 reposts · 0 likes · 22 views
Calde retweeted
Calde@calde_ux·
100% recommend vibe coding instead of doom scrolling if you're bored.
Louis Amira@louisamira

@OfficialLoganK Vibe coding is doomscrolling except you look up at 1am and realize you built something

0 replies · 0 reposts · 1 like · 62 views
Calde retweeted
GREG ISENBERG@gregisenberg·
I find this extremely lame and I'll call it out. All of these X accounts are fake, based in India or "West Asia", yet pretty well-known people interact with them and follow them.

Someone creates an account, claims a role at a frontier AI lab based in SF (it's a lie), and then mostly curates smart-sounding charts, threads, and takes from other people, usually without credit. They often use the format "this guy literally xyz...."

Over time, a network of these accounts boosts each other, making the signals look even stronger, and my guess is that the endgame is selling influence, distribution, "growth", or AI automation services once the audience is large enough. I have seen tons of these accounts recently and maybe you have too.
161 replies · 31 reposts · 634 likes · 100.6K views
Calde retweeted
Stanford HAI@StanfordHAI·
Can we trust therapy chatbots? Is automation eliminating the wrong parts of our jobs? Are users’ private conversations training AI models? Stanford HAI scholars explored these questions and more. See what resonated most with our readers this year: hai.stanford.edu/news/most-read…
7 replies · 22 reposts · 105 likes · 3.9K views
Calde retweeted
∩@zachpogrob·
culture can't be bought
4 replies · 7 reposts · 95 likes · 4.6K views
Calde@calde_ux·
The 'For You' algorithm is designed to keep you angry, anxious, and scrolling. It is literally bad for your health. I opted out of the dopamine loop. Strictly 'Following' + Chronological order. The silence is golden.
0 replies · 0 reposts · 0 likes · 68 views
Calde@calde_ux·
Hey @NotebookLM, are you aware we cannot listen to the podcasts we create in Android Auto? Or is it just me?
0 replies · 0 reposts · 0 likes · 44 views
kat kampf@kat_kampf·
We started internal testing of some big updates to the @GoogleAIStudio experience today! Coming to you early next year, but reply below if you'd like early access in the coming weeks 👀
3.1K replies · 127 reposts · 3.7K likes · 308K views