Ben @BuildItWithBen · 208 posts
Lead Android Engineer at Phyn. Love learning about the future of: Software / IoT / Design. Native Android + iOS.

NY · Joined November 2013
386 Following · 140 Followers
Ben @BuildItWithBen·
True to my experience. Not switching in the sense of no longer using the others, but changing which is the main driver. Being fluent and comfortable with each as they evolve and not locking yourself into one is a big advantage imo.
Soham Naik @sohamnaikdev

@cursor_ai Life cycle of a dev rn

Ryan Carson @ryancarson·
Grinding on frontend design. NOT using any multi-agent orchestration here. Just a *ton* of back and forth with Codex + comments copied from agentation.com
Ben @BuildItWithBen·
@noahzweben Ooh yes, been waiting for this, much obliged...
Noah Zweben @noahzweben·
claude --remote-control
claude --remote-control <name> to spawn an interactive session with remote-control enabled!
Ben @BuildItWithBen·
Those who are tribal about what is "best" in terms of things like model selection, workflows, IDE vs no IDE... are at a huge disadvantage. The way things stand today, it's not either/or but both/and. Experiment, adapt, and understand all the nuance at play.
Ben @BuildItWithBen·
It's really hard not to feel like we're in the hard takeoff stage of AI + its impact when the best teams are putting out releases like this seemingly every few days. Even in my own work, it now feels like a slow day if I push <100 commits. The game has fundamentally changed.
Lee Robinson @leerob
Cursor now has automations! You can run agents on schedules, triggered by events from Slack, GitHub, or any MCP server. I get a daily review every morning with my GitHub/Slack activity. Our team now has dozens of agents running 24/7 improving or monitoring things for us.

Ben @BuildItWithBen·
Apple marketing team put on an absolute masterclass with this one. Truly a piece of art: youtube.com/watch?v=u3SIKA…
juan @juanbuis·
the macbook neo website is *stunning* that font, that color... we're SO back
Ben @BuildItWithBen·
@49agents It's structured prompts driving a file-based protocol:
- same skill doc injected into both models
- both read/write one shared .md file with a strict template that forces claim-by-claim responses and a convergence score
- a bash script alternates CLI calls until they converge
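A minimal sketch of what that alternating loop could look like, assuming a shared cross_analysis.md scratch file with a "Convergence score" line that each model updates. The file name, prompt wording, score format, and threshold here are illustrative assumptions, not the exact setup described above; `codex exec` and `claude -p` are the non-interactive modes of the two CLIs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: alternate non-interactive Codex and Claude Code runs
# over one shared markdown file until the convergence score crosses a threshold.
set -euo pipefail

DOC="cross_analysis.md"
PROMPT="Read $DOC, respond claim-by-claim per its template, update the 'Convergence score' line (0-10), and write the file back."

for round in $(seq 1 10); do
  echo "--- round $round: codex ---"
  codex exec "$PROMPT"

  echo "--- round $round: claude ---"
  claude -p "$PROMPT"

  # Stop once the score line in the shared file reaches the threshold.
  score=$(grep -oE 'Convergence score: [0-9]+' "$DOC" | grep -oE '[0-9]+' | tail -1 || true)
  if [ "${score:-0}" -ge 9 ]; then
    echo "Converged (score $score) after $round rounds."
    break
  fi
done
```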
49 Agents - Agentic Coding IDE @49agents
@BuildItWithBen that's a smart setup. the back-and-forth between the codex app and claude code is real friction. are you using any specific prompting strategy to get them to hand off cleanly?
Ben @BuildItWithBen·
Ended up creating a bash script for this to reduce the back-and-forth friction of going between the Codex mac app and Claude Code in the terminal for several rounds. The script manages the back and forth now, calling the codex and claude CLIs in a loop until convergence. Truly hands-off.
Ben @BuildItWithBen
HIGHLY recommend: a '/cross-analyze' skill set up in both claude and codex
1. generate plan
2. 'review' prompt/skill in both
3. run /cross-analyze in each, which is mostly just "analyze this analysis in light of this other analysis"
4. repeat until convergence
5. go

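The strict template on the shared file is what makes the claim-by-claim hand-off work. A hypothetical scaffold for that file might look like the following; the section headings and score format are illustrative assumptions, not the actual template:

```bash
# Hypothetical: seed the shared scratch file both CLIs read and write.
cat > cross_analysis.md <<'EOF'
# Cross-Analysis

## Plan under review
(paste the generated plan here)

## Claims
For each claim, both models respond in place:
- Claim:
- Codex position:
- Claude position:
- Resolution (agree / disagree / needs evidence):

## Open disagreements
(anything unresolved; should be empty at convergence)

Convergence score: 0/10
EOF
```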
Ben @BuildItWithBen·
@mitchellh Tablet strapped to leg, pencil + paper, or neither?
Mitchell Hashimoto @mitchellh·
Beautiful views of the California central coast while in a climbing turn, plus a view of the airport I just departed from at the end.
Ben @BuildItWithBen·
@leerob Similar experience; I agree hedonic adaptation is a large part of it. Two things I've found helpful:
- staying close to the models
- building an agent workflow that acts as a 'protector': feed in content and analyze things like hype, bias, first principles, and actionability
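The tweet doesn't specify how the 'protector' workflow is implemented; a minimal one-shot sketch under that description could be a single print-mode Claude Code call over a saved post. The script name, prompt wording, and rubric below are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical 'protector' pass: score a saved post on hype, bias,
# first-principles soundness, and actionability in one non-interactive call.
set -euo pipefail

POST_FILE="${1:?usage: protector.sh <post.txt>}"

claude -p "Analyze the following post. Score each axis 1-5 and justify briefly:
- hype: sensationalized or grounded?
- bias: what incentives might the author have?
- first principles: does the reasoning hold up from scratch?
- actionability: what, concretely, should I do differently?

$(cat "$POST_FILE")"
```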
Lee Robinson @leerob·
Why does it feel like new models get incredibly hyped at their launch, but then a few weeks later they're "trash"? I've been thinking about this a lot, and many things can be true at the same time. I'll give you my most optimistic and pessimistic takes.

Let's start positive. When new models come out, especially those that are state of the art (SOTA), they are genuinely incredible to try and use. Things that the prior generation of models sucked at... new models sometimes completely solve/fix! This can feel like a massive unlock for building software, knowledge work, research, data analysis, etc. One new model we've been testing is incredibly good at making accurate SQL queries, seemingly better than any other model we've tried. This is exciting! And it makes sense that people then share those opinions here.

Okay, more pessimistic. For better or for worse, people are incentivized to share colorful takes on pretty much anything. Some of these folks are relying on those X creator payouts for side cash. The views literally convert to dollars! This can create... tension. It's hard to tell when a take is honest and genuine versus sensationalized for Elon bucks. (Side note: the "paid partnership" labels on tweets are a step in the right direction, although I don't think they really solve this inherent issue with creator payouts, as it's not a direct payment from company → creator.)

So there's lots of hype when new models drop. We see benchmarks where the numbers usually go up and to the right, but it's hard to tell if that actually translates to better performance on the things we care about. The only way to really know is to try it, tinker, build things... but that takes time to do correctly, which is why the best takes on models are often a little delayed while people really "taste test" them.

There's another angle here that makes it hard to understand hype/hate, which is that these models all have their own personalities/styles/quirks. One person might love the verbosity and warmth of a model, while someone else completely hates it. At least for coding, it does seem like Codex/Opus/etc. are converging to a similar style, but they are definitely still different (and people feel strongly about those differences!).

So people use the latest frontier models for weeks to a month, but then you notice that the tides may turn online. Opus was the best model in the world, and now people think it's dumb/slow/bad. Rinse and repeat for Codex or other models. It's helpful to remember that most people are busy happily building/shipping at this point!

Sometimes this feeling of model degradation is due to an actual issue! Maybe there was an inference bug, or provider downtime, or small updates/tweaks. The model checkpoints can change. However, I would argue this is not the majority case. The best explanation for this, to me, is "hedonic adaptation". You can quickly get used to an improvement, so that what previously felt amazing and innovative now feels like your new baseline. Then it's no longer new/sexy. This is just how our brains are wired and not really specific to AI models. The best way to combat it is to be aware of your own biases.

So... what should you do to make sense of all the takes on this site?
1. Try to read lots of opinions, not just official posts, but those from a variety of people using the models for things *you're interested in*
2. Listen and take note of opinions, but make sure you're forming your own opinions based on your usage/tinkering/experimentation
3. Remember to be skeptical of sensationalized posts about new models (the "it's so over / we're so back" cycle)
Ben @BuildItWithBen·
@heyeaslo Super clean. Love the brand consistency across your socials + websites + app.
Easlo @heyeaslo·
I'm building an expense tracker that syncs with Notion.
Ben @BuildItWithBen·
@TyRobben This is also currently my favorite AI pod - as a software engineer I can vouch for their deep experience & understanding of all the nuances involved with the current state of models + all the dev tooling around it. Great teachers and genuine people. I learn a lot from them.
Ty Robben @TyRobben·
This is by far my favorite AI pod. These 3 guys don't work at any labs or AI companies (if they do, they don't show it). Just power users with immense experience. The debate about shipping AI slop vs carefully monitored and reviewed code in startup vs corporate envs is top tier around the 28:00 mark. Thank you so much @pvncher @GosuCoder @RayFernando1337 open.spotify.com/episode/5BGGX2…
Ben @BuildItWithBen·
@thenanyu fwiw I just tried the new /simplify using opus 4.6 and it regressed my code rather than improving it. I had to use 5.3-codex to identify that what it did was wrong and fix it.
Nan Yu @thenanyu·
I see things like /simplify and the existence of code review and bug finding AIs. I have to ask, why do these things exist? Why doesn't the coding agent just naturally do these things? I'm sure there's a good answer. Can someone help me understand?
Boris Cherny @bcherny
In the next version of Claude Code... We're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.

Ben retweeted
Andrej Karpathy @karpathy·
Cool chart showing the ratio of Tab complete requests to Agent requests in Cursor. With improving capability, every point in time has an optimal setup that keeps changing and evolving, and the community average tracks the point.

None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ???

If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work. The art of the process is spending 80% of the time getting work done in the setup you're comfortable with and that actually works, and 20% exploring what might be the next step up, even if it doesn't work yet.
Michael Truell @mntruell
x.com/i/article/2026…

Ben @BuildItWithBen·
@ryancarson Codex app is awesome, I've had the same experience. 5.3-codex extra high is way faster now; I've switched from mostly high to mostly xhigh.
Ryan Carson @ryancarson·
Here's a quick video showing you how I've set up my local dev environment to do a lot of parallel work at the same time. I'm really enjoying the Codex app. Over the past year I've been 100% in the TUI/CLI but I think the GUI form factor is really starting to work and I’m enjoying Codex a lot. Also, I've been using gpt-5.3-codex on extra high the whole day.
GREG ISENBERG @gregisenberg·
they want you to think the block/square layoffs of 4,000 employees isn't because of AI: "they just overhired". it doesn't take a rocket scientist to know that all of a sudden you can spin up robots with human-level intelligence for $200/mo. of course it's AI, and this will be more common