Calde

11.9K posts

@calde_ux

Product Manager @ArionKoder. I tweet about digital products, strategy, UX & technology. Working remotely before it was cool.

Corrientes · Joined August 2008

1.9K Following · 2K Followers

Calde@calde_ux·12 Mar

Please, @googledrive, start treating Markdown files as first-class citizens. How come the only options we have are to open them with Google Docs or third-party apps?

Calde@calde_ux·10 Mar

😮

Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

Calde reposted

Jason Fried@jasonfried·3 Mar

1999: Small, lean, quick, fit, profitable. 2026: Small, lean, quick, fit, profitable. The fundamentals are the fundamentals.

Calde@calde_ux·24 Feb

@richiemcilroy I loved the joke-to-reflection switch

Calde reposted

Richie - oss/acc@richiemcilroy·23 Feb

the funny part is that this post is both true and a contradiction. I recently replaced our ~$200/mo Intercom bill with an internal messaging embed I built myself in a few hours. We own all our data now and it costs almost nothing to run. so Intercom just lost a few thousand $ a year from us. Multiply that by a few hundred other people doing the same thing and that’s maybe $1m? $2m? $5m? gone. software is changing. No doubt. the question is… is it changing for everyone? or just the people already building it

Richie - oss/acc@richiemcilroy·22 Feb

apparently software as we know it is dead because Susan from accounting is going to vibe code a calendly alternative while on her lunch break

I think some of you need to go outside and speak to real people

Calde reposted

Guido Marucci Blas@guidomb·11 Feb

I am seriously considering implementing all my CI from scratch with an agent: a thin web server that listens to GitHub notifications, using an agent, on a bare EC2 instance like the good old days. Tired of all this bullshit of GitHub not running, Blacksmith outages, 1Password outages

Calde reposted

Anthropic@AnthropicAI·16 Jan

AI speeds up complex tasks more than simpler ones: the higher the education level to understand a prompt, the more AI reduces how long it takes. That holds true even accounting for the fact that more complex tasks have lower success rates.

Calde@calde_ux·17 Jan

Now imagine something like Claude Cowork but mobile, embedded in your Mobile OS, and your cloud spaces. But also: not this initial version. The more mature version after 3, 5 iterations of this. The tech is already here, we are only missing the product iterations.

Calde@calde_ux·17 Jan

2y ago I commented on a talk about how part of the AI Shift was GenAI changing one core concept of modern software: the "Operating System". Claude Cowork covers a good chunk of what I was imagining at the time. And this is just starting.

Calde@calde_ux·14 Jan

I'm starting to suspect LinkedIn's AI strategy is to collect tons and tons of AI Slop pieces to be sold as negative sampling later.

Calde reposted

Martín Aberastegue Catena@Xyborg·9 Jan

Have you checked your @supabase implementation today? Do it, it's free, and fix your RLS 😀 supaexplorer.com/supabase-leak-…

Martín Aberastegue Catena@Xyborg

I've been polishing this feature for the last few days, but it's actually been part of SupaExplorer for a while. Now it's public, way more visible, and easier for everyone to use! ⚡️ 1) Run your first Supabase leak scan (it's free) 2) If you spot anything weird, connect your project and run a full audit, also free! 3) Need to fix something? No stress, ask our AI helper to generate your security report + the database fixes. All in one place, built for people who don't want to jump between 5 different tools 😁

Calde@calde_ux·31 Dec

100% recommend vibe coding instead of doom scrolling if you're bored.

Louis Amira@louisamira

@OfficialLoganK Vibe coding is doomscrolling except you look up at 1am and realize you built something

Calde reposted

GREG ISENBERG@gregisenberg·28 Dec

I find this extremely lame and i'll call it out. All of these X accounts are fake based in India or "West Asia" yet pretty well-known people interact with them and follow them. Someone creates an account claims a role at a frontier AI lab based in SF (it's a lie), and then mostly curates smart-sounding charts, threads, and takes from other people usually without credit. They often use the format "this guy literally xyz...." Over time, a network of these accounts boosts each other, making the signals look even stronger, and my guess is that the endgame is selling influence, distribution, or “growth” or AI automation services once the audience is large enough. I have seen tons of these accounts recently and maybe you have too.

Calde@calde_ux·27 Dec

A point we should all be aware of:

Stanford HAI@StanfordHAI

ICYMI: A new @StanfordCRFM study finds AI transparency has declined sharply from 58 to 40 out of 100 points. Most companies reveal zero data on environmental impact or societal harm despite massive influence on billions of users. Read more: hai.stanford.edu/news/transpare…

Calde@calde_ux·23 Dec

"Developing the ability to think clearly without writing is a meta-skill that dramatically expands thinking capacity." — Doshi's Claude chat

Shreyas Doshi@shreyas

On the confusion between writing and thinking: shreyasdoshi.substack.com/p/on-the-confu… (just posted on Substack)

Calde reposted

Stanford HAI@StanfordHAI·23 Dec

Can we trust therapy chatbots? Is automation eliminating the wrong parts of our jobs? Are users’ private conversations training AI models? Stanford HAI scholars explored these questions and more. See what resonated most with our readers this year: hai.stanford.edu/news/most-read…

Calde reposted

∩@zachpogrob·21 Dec

culture can't be bought

Calde@calde_ux·22 Dec

The 'For You' algorithm is designed to keep you angry, anxious, and scrolling. It is literally bad for your health. I opted out of the dopamine loop. Strictly 'Following' + Chronological order. The silence is golden.

Calde@calde_ux·20 Dec

Hey @NotebookLM are you aware we cannot listen to the podcast we create in Android Auto? Or is it just me?

Calde@calde_ux·11 Dec

@kat_kampf @GoogleAIStudio I would like early access

kat kampf@kat_kampf·10 Dec

We started internal testing some big updates to the @GoogleAIStudio experience today! Coming to you early next year but reply below if you’d like early access in the coming weeks 👀
