Devlin Dunsmore

25.4K posts

@devlind

Founding engineer of the AI @ Work group at AWS, changing how people work. Opinions here are my own. he/him

Seattle, WA · Joined April 2008
871 Following · 1.1K Followers
Devlin Dunsmore@devlind·
@swyx On device is real. Many advantages besides disaggregated compute. Identity being one of them.
Devlin Dunsmore@devlind·
@LakersLead @MindOfBron He is just a different player in the G. He's legit impressive but you can tell he's so hesitant when he plays in the league. The only thing stopping him is him at this point because we see the skill level.
Devlin Dunsmore@devlind·
Training agents for browser use is a great example of a harness that will go away, hopefully very soon. Not because agents will get very good at it, but because we'll have a substrate of the web designed for agents that doesn't rely on HTML.
Devlin Dunsmore@devlind·
@bcherny @johndeanl I've heard of some approaches where agents have a system for reserving file access to prevent conflicting edits. This seems like a more repeatable and less error-prone process.
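The reservation idea above can be sketched with ordinary advisory file locks. Everything here (the `reserve` helper, the `.lock` sidecar naming) is a hypothetical illustration, not a description of any specific agent system.

```python
# Hypothetical sketch of per-file reservation between agents: before editing a
# file, an agent takes an exclusive advisory lock on a sidecar ".lock" file,
# so a second agent trying to reserve the same file blocks until it's released.
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def reserve(path):
    """Hold an exclusive advisory lock for `path` while editing it (POSIX only)."""
    fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another agent holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

# Two agents calling reserve("notes.txt") serialize their edits to that file.
with reserve("notes.txt"):
    with open("notes.txt", "a") as f:
        f.write("agent edit\n")
```

Because the lock is advisory, this only works if every agent goes through the same reservation step, which is exactly the "system for reserving file access" framing.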
Boris Cherny@bcherny·
@johndeanl I run each Claude in a separate git checkout, so they don’t conflict. To roll back, just press esc twice
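The separate-checkout setup doesn't require full clones; `git worktree` is one standard way to give each session its own working directory backed by a single repository. The repo, directory, and branch names below are illustrative.

```shell
# Create a throwaway repo, then one worktree (independent checkout) per session.
cd "$(mktemp -d)"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

git worktree add ../claude-1 -b agent-1   # checkout for session 1
git worktree add ../claude-2 -b agent-2   # checkout for session 2
git worktree list                         # main checkout plus both worktrees
```

Each directory has its own branch and working tree, so concurrent edits never collide, and rolling back one session is just a reset inside its own checkout.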
John Dean@johndeanl·
Can someone please elaborate on how running multiple claudes at once works? Like how do you manage rolling back changes? What do you do if you realize the first prompt was bad and you want to retry it?
Boris Cherny@bcherny

1/ I run 5 Claudes in parallel in my terminal. I number my tabs 1-5, and use system notifications to know when a Claude needs input code.claude.com/docs/en/termin…

Devlin Dunsmore@devlind·
I know @Nas and Hit-Boy had a great run, but this Preemo collab just hits different!
Devlin Dunsmore@devlind·
@swyx @Steve_Yegge I'm actually glad to hear someone with Steve's experience voice opinions that align with my own. I have 17 years of experience, but I find I can direct agents to build whatever I want with better test coverage than I would write myself. For prod I do still need to review the code, though.
swyx@swyx·
btw getting an abnormal amount of lovely youtube comments for the @Steve_Yegge pod on Vibe Coding. he is of course an S-tier ranter, and to some extent i knew i was just there to give a prompt and let him loose, but i think there's a certain gravity to the fact that it was HIM saying these hypey things. You can get excited and yap on about the potential of vibe coding as a 20-something anon build-in-public hustler or a midlife-crisis nontechnical has-been marveling at a pretty purple brochure website, but when it's Steve goddamn Yegge, who has done all the hard things at early Amazon, Google, and Grab, from assembly to databases to OSes to games, people do sit up and take notice. pod link since you read all the way down here: youtu.be/zuJyJP517Uw?si… gratifying to be able to create 2 good platforms for ai engineering debates to shine through.
Nick Taylor@nickytonline

Another banger from the @latentspacepod. Really great convo @swyx and @Steve_Yegge . Go queue it up peeps! open.spotify.com/episode/20iTCh…

Devlin Dunsmore@devlind·
@JCrossover Crawsover, Iso Joe, and Delle Donne are some real real hoopers on this list!
Lakers Lead@LakersLead·
This LeBron photo is TOO COLD 🥶
Devlin Dunsmore@devlind·
@_joemag_ I literally kick off a task then make dinner for the family. It's glorious!
Jaana Dogan ヤナ ドガン
Before coming back to this company, I couldn't think about the possibility of getting up at 6 am and being excited about work again. Seriously, this place is different.
swyx@swyx·
RIP Vibe Coding Feb 2025 - Oct 2025
Devlin Dunsmore@devlind·
@karpathy Another approach could be to distill these lessons into tools and rely on search to find the right tool for the task. Any task that requires deterministic/discrete outcomes (such as your example) is a good candidate for this
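The distill-lessons-into-tools idea could look something like this minimal sketch: each lesson becomes a small deterministic function, and the agent retrieves one by matching the task against tool descriptions. The registry and keyword-overlap matching are illustrative stand-ins for a real tool index and search.

```python
# Hypothetical sketch: lessons distilled into deterministic tools, retrieved by
# naive keyword overlap between the task and each tool's description.
def count_letter(text, letter):
    # Deterministic replacement for "count the r's in strawberry"-style tasks.
    return text.count(letter)

TOOLS = {
    "count_letter": ("count occurrences of a letter in a word", count_letter),
}

def find_tool(task):
    """Return the tool whose description shares the most words with the task."""
    words = set(task.lower().split())
    name, (desc, fn) = max(
        TOOLS.items(), key=lambda kv: len(words & set(kv[1][0].split()))
    )
    return fn

tool = find_tool("count the letter r in strawberry")
```

The point of the sketch: once the lesson is a tool, the discrete outcome is computed exactly rather than approximated inside the model, and only the retrieval step stays fuzzy.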
Andrej Karpathy@karpathy·
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great.

But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient?

Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for the majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc., and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later, a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization, not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains.

Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain a string "lesson", to be added to the system prompt (or more generally, modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.

Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. The Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task. The question is how this might fall out from agentic practice instead of being hard-coded by an engineer, how it can be generalized, and how lessons can be distilled over time so as not to bloat context windows indefinitely.

TLDR: RL will lead to more gains because, when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
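The "example algorithm" in the tweet can be sketched directly. `llm`, `run_task`, and `score` below are hypothetical callables standing in for a model API, a task runner, and a verifier; none of them name a real library.

```python
# Sketch of the rollout -> review/reflect -> lesson loop described above.
def reflect_and_learn(task, llm, run_task, score, lessons, n_rollouts=3):
    # 1. Do a few rollouts of the task, with current lessons in the prompt.
    rollouts = []
    for _ in range(n_rollouts):
        system = "Lessons so far:\n" + "\n".join(lessons)
        out = run_task(task, system)              # one full attempt
        rollouts.append((out, score(task, out)))  # transcript + scalar reward

    # 2. Stuff all rollouts (with rewards) into one context window and ask a
    #    meta-prompt what went well / poorly, yielding one explicit "lesson".
    review = "\n\n".join(f"Attempt (reward={r}):\n{o}" for o, r in rollouts)
    lesson = llm("Review these attempts at one task and state one concrete "
                 "lesson for next time:\n" + review)

    # 3. Add the lesson to the lessons database (i.e. future system prompts).
    lessons.append(lesson)
    return lessons
```

As the tweet notes, the open blanks are real: how to dedupe and prune the lessons database, and when to distill accumulated lessons into weights instead of context.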
Devlin Dunsmore@devlind·
@elonmusk @grok Where does this actually rank on the overall leaderboard? It's well known that the large frontier models generally suck at this benchmark
Elon Musk@elonmusk·
One of these is not like the others @Grok
Devlin Dunsmore@devlind·
@ASpittel Also much easier for developers to create their own workflows, and over time best practices will emerge. Then you can roll those best practices into an opinionated UI to make it easier for the rest of the user base that doesn't want/need the customizability of the CLI.
wini@winigoat7·
Genuinely, what's stopping this Mavericks team from a championship?
Devlin Dunsmore@devlind·
@hmfaigen Not sure I agree, clearly not the winning formula in the NBA now
Harrison Faigen@hmfaigen·
Mark Walter it's time to make a splash, open up the checkbook