Devlin Dunsmore

25.4K posts

@devlind

Founding engineer of the AI @ Work group at AWS, changing how people work. Opinions here are my own. he/him

Seattle, WA · Joined April 2008
871 Following · 1.1K Followers
Devlin Dunsmore@devlind·
@swyx On device is real. Many advantages besides disaggregated compute. Identity being one of them.
Devlin Dunsmore@devlind·
@LakersLead @MindOfBron He is just a different player in the G. He's legit impressive but you can tell he's so hesitant when he plays in the league. The only thing stopping him is him at this point because we see the skill level.
Devlin Dunsmore@devlind·
Training agents for browser use is a great example of a harness that will go away, hopefully very soon. Not because agents will get very good at it, but because we'll have a substrate of the web designed for agents that doesn't rely on HTML.
Devlin Dunsmore@devlind·
@bcherny @johndeanl I've heard of some approaches where agents have a system for reserving file access to prevent conflicting edits. This seems like a more repeatable and less error-prone process.
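The reservation idea above can be sketched with ordinary advisory file locks. Everything here (the `reserve` helper, the `.lock` sidecar naming) is a hypothetical illustration, not a description of any specific agent system.

```python
# Hypothetical sketch of per-file reservation between agents: before editing a
# file, an agent takes an exclusive advisory lock on a sidecar ".lock" file,
# so a second agent trying to reserve the same file blocks until it's released.
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def reserve(path):
    """Hold an exclusive advisory lock for `path` while editing it (POSIX only)."""
    fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another agent holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

# Two agents calling reserve("notes.txt") serialize their edits to that file.
with reserve("notes.txt"):
    with open("notes.txt", "a") as f:
        f.write("agent edit\n")
```

Because the lock is advisory, this only works if every agent goes through the same reservation step, which is exactly the "system for reserving file access" framing.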
Boris Cherny@bcherny·
@johndeanl I run each Claude in a separate git checkout, so they don’t conflict. To roll back, just press esc twice
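The separate-checkout setup doesn't require full clones; `git worktree` is one standard way to give each session its own working directory backed by a single repository. The repo, directory, and branch names below are illustrative.

```shell
# Create a throwaway repo, then one worktree (independent checkout) per session.
cd "$(mktemp -d)"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

git worktree add ../claude-1 -b agent-1   # checkout for session 1
git worktree add ../claude-2 -b agent-2   # checkout for session 2
git worktree list                         # main checkout plus both worktrees
```

Each directory has its own branch and working tree, so concurrent edits never collide, and rolling back one session is just a reset inside its own checkout.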
John Dean@johndeanl·
Can someone please elaborate on how running multiple claudes at once works? Like how do you manage rolling back changes? What do you do if you realize the first prompt was bad and you want to retry it?
Boris Cherny@bcherny

1/ I run 5 Claudes in parallel in my terminal. I number my tabs 1-5, and use system notifications to know when a Claude needs input code.claude.com/docs/en/termin…

Devlin Dunsmore@devlind·
I know @Nas and Hit-Boy had a great run, but this Preemo collab just hits different!
Devlin Dunsmore@devlind·
@swyx @Steve_Yegge I'm actually glad to hear someone with Steve's experience voice opinions that align with my own. I have 17 years of experience, but I find I can direct agents to build whatever I want with better test coverage than I would write myself. For prod I do still need to review the code, though.
swyx@swyx·
btw getting an abnormal amount of lovely youtube comments for the @Steve_Yegge pod on Vibe Coding. he is of course an S-tier ranter, and to some extent i knew i was just there to give a prompt and let him loose, but i think there's a certain gravity to the fact that it was HIM saying these hypey things. You can get excited and yap on about the potential of vibe coding as a 20-something anon build-in-public hustler or a midlife-crisis nontechnical has-been marveling at a pretty purple brochure website, but when it's Steve goddamn Yegge, who has done all the hard things at early Amazon, Google, and Grab, from assembly to databases to OSes to games, people do sit up and take notice. pod link since you read all the way down here: youtu.be/zuJyJP517Uw?si… gratifying to be able to create 2 good platforms for ai engineering debates to shine through.
Nick Taylor@nickytonline

Another banger from the @latentspacepod. Really great convo @swyx and @Steve_Yegge . Go queue it up peeps! open.spotify.com/episode/20iTCh…

Devlin Dunsmore@devlind·
@JCrossover Crawsover, Iso Joe, and Delle Donne are some real real hoopers on this list!
Lakers Lead@LakersLead·
This LeBron photo is TOO COLD 🥶
Devlin Dunsmore@devlind·
@_joemag_ I literally kick off a task then make dinner for the family. It's glorious!
Jaana Dogan ヤナ ドガン
Before coming back to this company, I couldn't think about the possibility of getting up at 6 am and being excited about work again. Seriously, this place is different.
swyx@swyx·
RIP Vibe Coding Feb 2025 - Oct 2025
Devlin Dunsmore@devlind·
@karpathy Another approach could be to distill these lessons into tools and rely on search to find the right tool for the task. Any task that requires deterministic/discrete outcomes (such as your example) is a good candidate for this
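The distill-lessons-into-tools idea could look something like this minimal sketch: each lesson becomes a small deterministic function, and the agent retrieves one by matching the task against tool descriptions. The registry and keyword-overlap matching are illustrative stand-ins for a real tool index and search.

```python
# Hypothetical sketch: lessons distilled into deterministic tools, retrieved by
# naive keyword overlap between the task and each tool's description.
def count_letter(text, letter):
    # Deterministic replacement for "count the r's in strawberry"-style tasks.
    return text.count(letter)

TOOLS = {
    "count_letter": ("count occurrences of a letter in a word", count_letter),
}

def find_tool(task):
    """Return the tool whose description shares the most words with the task."""
    words = set(task.lower().split())
    name, (desc, fn) = max(
        TOOLS.items(), key=lambda kv: len(words & set(kv[1][0].split()))
    )
    return fn

tool = find_tool("count the letter r in strawberry")
```

The point of the sketch: once the lesson is a tool, the discrete outcome is computed exactly rather than approximated inside the model, and only the retrieval step stays fuzzy.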
Andrej Karpathy@karpathy·
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great.

But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient?

Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for the majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc., and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later, a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization, not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains.

Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain a string "lesson", to be added to the system prompt (or more generally, modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.

Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. The Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task. The question is how this might fall out from agentic practice instead of being hard-coded by an engineer, how it can be generalized, and how lessons can be distilled over time so as not to bloat context windows indefinitely.

TLDR: RL will lead to more gains because, when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
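The "example algorithm" in the tweet can be sketched directly. `llm`, `run_task`, and `score` below are hypothetical callables standing in for a model API, a task runner, and a verifier; none of them name a real library.

```python
# Sketch of the rollout -> review/reflect -> lesson loop described above.
def reflect_and_learn(task, llm, run_task, score, lessons, n_rollouts=3):
    # 1. Do a few rollouts of the task, with current lessons in the prompt.
    rollouts = []
    for _ in range(n_rollouts):
        system = "Lessons so far:\n" + "\n".join(lessons)
        out = run_task(task, system)              # one full attempt
        rollouts.append((out, score(task, out)))  # transcript + scalar reward

    # 2. Stuff all rollouts (with rewards) into one context window and ask a
    #    meta-prompt what went well / poorly, yielding one explicit "lesson".
    review = "\n\n".join(f"Attempt (reward={r}):\n{o}" for o, r in rollouts)
    lesson = llm("Review these attempts at one task and state one concrete "
                 "lesson for next time:\n" + review)

    # 3. Add the lesson to the lessons database (i.e. future system prompts).
    lessons.append(lesson)
    return lessons
```

As the tweet notes, the open blanks are real: how to dedupe and prune the lessons database, and when to distill accumulated lessons into weights instead of context.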
Devlin Dunsmore@devlind·
@elonmusk @grok Where does this actually rank on the overall leaderboard? It's well known that the large frontier models generally suck at this benchmark
Elon Musk@elonmusk·
One of these is not like the others @Grok
Devlin Dunsmore@devlind·
@ASpittel Also much easier for developers to create their own workflows, and over time best practices will emerge. Then you can roll those best practices into an opinionated UI to make it easier for the rest of the user base that doesn't want/need the customizability of the CLI.
wini@winigoat7·
Genuinely, what's stopping this Mavericks team from a championship?
Devlin Dunsmore@devlind·
@hmfaigen Not sure I agree, clearly not the winning formula in the NBA now
Harrison Faigen@hmfaigen·
Mark Walter it's time to make a splash, open up the checkbook