Ben @BuildItWithBen · 208 posts
Lead Android Engineer at Phyn. Love learning about the future of: Software / IoT / Design. Native Android + iOS.

NY · Joined November 2013
386 Following · 140 Followers
Ben @BuildItWithBen·
True to my experience. Not switching in the sense of no longer using the others, but changing which is the main driver. Being fluent and comfortable with each as they evolve and not locking yourself into one is a big advantage imo.
Soham Naik @sohamnaikdev

@cursor_ai Life cycle of a dev rn

Ryan Carson @ryancarson·
Grinding on frontend design. NOT using any multi-agent orchestration here. Just a *ton* of back and forth with Codex + comments copied from agentation.com
Ben @BuildItWithBen·
@noahzweben Ooh yes, been waiting for this, much obliged...
Noah Zweben @noahzweben·
claude --remote-control
claude --remote-control <name> to spawn an interactive session with remote-control enabled!
Ben @BuildItWithBen·
Those who are tribal about what is "best" in terms of things like model selection, workflows, IDE vs no IDE... are at a huge disadvantage. The way things stand today, it's not either/or but both/and. Experiment, adapt, and understand all the nuance at play.
Ben @BuildItWithBen·
It's really hard not to feel like we're in the hard takeoff stage of AI + its impact when the best teams are putting out releases like this seemingly every few days. Even in my own work, it now feels like a slow day if I push <100 commits. The game has fundamentally changed.
Lee Robinson @leerob
Cursor now has automations! You can run agents on schedules, triggered by events from Slack, GitHub, or any MCP server. I get a daily review every morning with my GitHub/Slack activity. Our team now has dozens of agents running 24/7 improving or monitoring things for us.

Ben @BuildItWithBen·
Apple marketing team put on an absolute masterclass with this one. Truly a piece of art: youtube.com/watch?v=u3SIKA…
juan @juanbuis·
the macbook neo website is *stunning* that font, that color... we're SO back
Ben @BuildItWithBen·
@49agents It's structured prompts driving a file-based protocol:
- same skill doc injected into both models
- both read/write one shared .md file with a strict template that forces claim-by-claim responses and a convergence score
- a bash script alternates CLI calls until they converge
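A minimal sketch of what that alternating loop could look like, assuming a shared cross_analysis.md scratch file with a "Convergence score" line that each model updates. The file name, prompt wording, score format, and threshold here are illustrative assumptions, not the exact setup described above; `codex exec` and `claude -p` are the non-interactive modes of the two CLIs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: alternate non-interactive Codex and Claude Code runs
# over one shared markdown file until the convergence score crosses a threshold.
set -euo pipefail

DOC="cross_analysis.md"
PROMPT="Read $DOC, respond claim-by-claim per its template, update the 'Convergence score' line (0-10), and write the file back."

for round in $(seq 1 10); do
  echo "--- round $round: codex ---"
  codex exec "$PROMPT"

  echo "--- round $round: claude ---"
  claude -p "$PROMPT"

  # Stop once the score line in the shared file reaches the threshold.
  score=$(grep -oE 'Convergence score: [0-9]+' "$DOC" | grep -oE '[0-9]+' | tail -1 || true)
  if [ "${score:-0}" -ge 9 ]; then
    echo "Converged (score $score) after $round rounds."
    break
  fi
done
```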
49 Agents - Agentic Coding IDE @49agents
@BuildItWithBen that's a smart setup. the back-and-forth between the codex app and claude code is real friction. are you using any specific prompting strategy to get them to hand off cleanly?
Ben @BuildItWithBen·
Ended up creating a bash script for this to reduce the back-and-forth friction of going between the Codex mac app and Claude Code in the terminal for several rounds. The script manages the back and forth now, calling the codex and claude CLIs in a loop until convergence. Truly hands-off.
Ben @BuildItWithBen
HIGHLY recommend: a '/cross-analyze' skill set up in both claude and codex
1. generate plan
2. 'review' prompt/skill in both
3. run /cross-analyze in each, which is mostly just "analyze this analysis in light of this other analysis"
4. repeat until convergence
5. go

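The strict template on the shared file is what makes the claim-by-claim hand-off work. A hypothetical scaffold for that file might look like the following; the section headings and score format are illustrative assumptions, not the actual template:

```bash
# Hypothetical: seed the shared scratch file both CLIs read and write.
cat > cross_analysis.md <<'EOF'
# Cross-Analysis

## Plan under review
(paste the generated plan here)

## Claims
For each claim, both models respond in place:
- Claim:
- Codex position:
- Claude position:
- Resolution (agree / disagree / needs evidence):

## Open disagreements
(anything unresolved; should be empty at convergence)

Convergence score: 0/10
EOF
```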
Ben @BuildItWithBen·
@mitchellh Tablet strapped to leg, pencil + paper, or neither?
Mitchell Hashimoto @mitchellh·
Beautiful views of the California central coast while in a climbing turn, plus a view of the airport I just departed from at the end.
Ben @BuildItWithBen·
@leerob Similar experience; I agree hedonic adaptation is a large part of it. Two things I've found helpful:
- staying close to the models
- building an agent workflow that acts as a 'protector': feed in content and analyze things like hype, bias, first principles, and actionability
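The tweet doesn't specify how the 'protector' workflow is implemented; a minimal one-shot sketch under that description could be a single print-mode Claude Code call over a saved post. The script name, prompt wording, and rubric below are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical 'protector' pass: score a saved post on hype, bias,
# first-principles soundness, and actionability in one non-interactive call.
set -euo pipefail

POST_FILE="${1:?usage: protector.sh <post.txt>}"

claude -p "Analyze the following post. Score each axis 1-5 and justify briefly:
- hype: sensationalized or grounded?
- bias: what incentives might the author have?
- first principles: does the reasoning hold up from scratch?
- actionability: what, concretely, should I do differently?

$(cat "$POST_FILE")"
```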
Lee Robinson @leerob·
Why does it feel like new models get incredibly hyped at their launch, but then a few weeks later they're "trash"? I've been thinking about this a lot, and many things can be true at the same time. I'll give you my most optimistic and pessimistic takes.

Let's start positive. When new models come out, especially those that are state of the art (SOTA), they are genuinely incredible to try and use. Things that the prior generation of models sucked at... new models sometimes completely solve/fix! This can feel like a massive unlock for building software, knowledge work, research, data analysis, etc. One new model we've been testing is incredibly good at making accurate SQL queries, seemingly better than any other model we've tried. This is exciting! And it makes sense that people then share those opinions here.

Okay, more pessimistic. For better or for worse, people are incentivized to share colorful takes on pretty much anything. Some of these folks are relying on those X creator payouts for side cash. The views literally convert to dollars! This can create... tension. It's hard to tell when a take is honest and genuine versus sensationalized for Elon bucks. (Side note: the "paid partnership" labels on tweets are a step in the right direction, although I don't think they really solve this inherent issue with creator payouts, as it's not a direct payment from company → creator.)

So there's lots of hype when new models drop. We see benchmarks where the numbers usually go up and to the right, but it's hard to tell if that actually translates to better performance on the things we care about. The only way to really know is to try it, tinker, build things... but that takes time to do correctly, which is why the best takes on models are often a little delayed while people really "taste test" them.

There's another angle here that makes it hard to understand hype/hate, which is that these models all have their own personalities/styles/quirks. One person might love the verbosity and warmth of a model, while someone else completely hates it. At least for coding, it does seem like Codex/Opus/etc. are converging to a similar style, but they are definitely still different (and people feel strongly about those differences!).

So people use the latest frontier models for weeks to a month, but then you notice that the tides may turn online. Opus was the best model in the world, and now people think it's dumb/slow/bad. Rinse and repeat for Codex or other models. It's helpful to remember that most people are busy happily building/shipping at this point!

Sometimes this feeling of model degradation is due to an actual issue! Maybe there was an inference bug, or provider downtime, or small updates/tweaks. The model checkpoints can change. However, I would argue this is not the majority case. The best explanation for this, to me, is "hedonic adaptation". You can quickly get used to an improvement, so that what previously felt amazing and innovative now feels like your new baseline. Then it's no longer new/sexy. This is just how our brains are wired and not really specific to AI models. The best way to combat it is to be aware of your own biases.

So... what should you do to make sense of all the takes on this site?
1. Try to read lots of opinions, not just official posts, but those from a variety of people using the models for things *you're interested in*
2. Listen and take note of opinions, but make sure you're forming your own opinions based on your usage/tinkering/experimentation
3. Remember to be skeptical of sensationalized posts about new models (the "it's so over / we're so back" cycle)
Ben @BuildItWithBen·
@heyeaslo Super clean. Love the brand consistency across your socials + websites + app.
Easlo @heyeaslo·
I'm building an expense tracker that syncs with Notion.
Ben @BuildItWithBen·
@TyRobben This is also currently my favorite AI pod - as a software engineer I can vouch for their deep experience & understanding of all the nuances involved with the current state of models + all the dev tooling around it. Great teachers and genuine people. I learn a lot from them.
Ty Robben @TyRobben·
This is by far my favorite AI pod. These 3 guys don't work at any labs or AI companies (if they do, they don't show it). Just power users with immense experience. The debate about shipping AI slop vs carefully monitored and reviewed code in startup vs corporate envs is top tier around the 28:00 mark. Thank you so much @pvncher @GosuCoder @RayFernando1337 open.spotify.com/episode/5BGGX2…
Ben @BuildItWithBen·
@thenanyu fwiw I just tried the new /simplify using opus 4.6 and it regressed my code rather than improving it. I had to use 5.3-codex to identify that what it did was wrong and fix it.
Nan Yu @thenanyu·
I see things like /simplify and the existence of code review and bug finding AIs. I have to ask, why do these things exist? Why doesn't the coding agent just naturally do these things? I'm sure there's a good answer. Can someone help me understand?
Boris Cherny @bcherny
In the next version of Claude Code... We're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.

Ben retweeted
Andrej Karpathy @karpathy·
Cool chart showing the ratio of Tab complete requests to Agent requests in Cursor. With improving capability, every point in time has an optimal setup that keeps changing and evolving, and the community average tracks the point.

None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ???

If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work. The art of the process is spending 80% of the time getting work done in the setup you're comfortable with and that actually works, and 20% exploring what might be the next step up, even if it doesn't work yet.
Michael Truell @mntruell
x.com/i/article/2026…

Ben @BuildItWithBen·
@ryancarson Codex app is awesome, I've had the same experience. 5.3-codex extra high is way faster now; I've switched from mostly high to mostly xhigh.
Ryan Carson @ryancarson·
Here's a quick video showing you how I've set up my local dev environment to do a lot of parallel work at the same time. I'm really enjoying the Codex app. Over the past year I've been 100% in the TUI/CLI but I think the GUI form factor is really starting to work and I’m enjoying Codex a lot. Also, I've been using gpt-5.3-codex on extra high the whole day.
GREG ISENBERG @gregisenberg·
they want you to think the block/square layoffs of 4,000 employees isn't because of AI: "they just overhired". it doesn't take a rocket scientist to know that all of a sudden you can spin up robots with human-level intelligence for $200/mo. of course it's AI, and this will be more common