Hanif Carroll

1.1K posts

@HanifCarroll

🇺🇸🇦🇷 | AI Product Engineer 🤓 Obsessed with building digital products 🥏 Ultimate frisbee / 🏋🏿‍♂️ Barbell enthusiast

Buenos Aires, Argentina · Joined January 2024

63 Following · 93 Followers

Pinned Tweet

Hanif Carroll@HanifCarroll·Apr 16

I read Sully’s piece on why LLM pipelines get slow when agents try to do too much, and it connected with one of the most important lessons I’ve learned as a software engineer: task decomposition.

The useful version of that lesson is not just “break big software projects into smaller pieces.” It’s broader than that. If I’m avoiding a task because it feels overwhelming, the answer is usually not to wait until I feel more motivated. It’s to break the task down until I find a piece small enough to act on.

That same lesson applies to the AI systems we build. We’re all guilty of wanting AI to do the whole job in one pass. We stuff a long transcript, ten objectives, edge cases, formatting rules, and a complex schema into one prompt, then blame the model when the first pass is weak.

Sometimes the answer is not “wait six months for better models.” Sometimes the answer is: understand the task well enough to decompose it. Give each model a narrower job. Give it only the context it needs. Let specialized pieces run in parallel. Use QA as a guardrail, not as a crutch for a bad first pass.

That’s what I liked about the Sully piece. It reframes speed as a consequence of better task design, not just faster inference.

It also makes me more excited about open source and smaller models. If task complexity is the real bottleneck, then the future is not just bigger models doing everything. It’s better systems that make each model’s job smaller.

Sully.ai@sullyai

x.com/i/article/2044…
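
One way to picture the "narrower job, only the context it needs, run in parallel" idea from the pinned post is the Python sketch below. It is only an illustration under stated assumptions: `call_model` is a hypothetical stand-in for a real LLM call, and the two subtasks, the `TODO` line convention, and every function name are invented for the example rather than taken from the post or from the Sully article.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def call_model(instruction: str, context: str) -> str:
    """Hypothetical stand-in for a real LLM call; it just echoes what it was given."""
    return f"{instruction} <- {context!r}"


def todo_lines(transcript: str) -> str:
    """Keep only the lines an action-item extractor needs (assumed 'TODO' prefix)."""
    return "\n".join(line for line in transcript.splitlines() if line.startswith("TODO"))


def whole(transcript: str) -> str:
    """The summarizer is the one subtask that needs the full transcript."""
    return transcript


# name -> (narrow instruction, function that selects that subtask's context)
SUBTASKS: dict[str, tuple[str, Callable[[str], str]]] = {
    "action_items": ("Extract action items", todo_lines),
    "summary": ("Summarize in two sentences", whole),
}


def run_decomposed(transcript: str) -> dict[str, str]:
    """Run every narrow subtask in parallel and collect the results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(call_model, instruction, select(transcript))
            for name, (instruction, select) in SUBTASKS.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    notes = "Intro chat\nTODO: send invoice\nMore chat\nTODO: book room"
    for name, output in run_decomposed(notes).items():
        print(name, "->", output)
```

Each subtask sees a trimmed slice of the input instead of the whole transcript plus ten objectives, and a QA step could be added as one more entry in `SUBTASKS` rather than as a rescue pass over a single giant prompt.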

Hanif Carroll@HanifCarroll·13h

@signulll Generalizes to most (all?) sports. I've been thinking about this recently. It's definitely true in ultimate frisbee. Sports are about creating, manipulating, and taking advantage of space.

signüll@signulll·1d

in basketball the best players love spacing cuz it gives room to move, & lanes to attack. tech rn is maximum spacing. the floor is effectively wide open. the world is reshuffling fast enough that an entrepreneur gets to define what the next version looks like. chaos is a ladder type stuff. these types of windows don’t stay open long & don’t occur as often as you’d like them to.

Hanif Carroll@HanifCarroll·13h

@tmophoto @sudoingX Same thing happens to me with Hermes and OpenClaw. That's why I've given up on coding with them and only use Codex app now.

tmo@tmophoto·13h

do you know why Hermes agent won't do this? It almost feels like it's hard-coded to save tokens like they aren't being generated at home for free. No matter what model i use it always requires an immense amount of babysitting... sometimes i can just answer yes 40 times in a row while it works through the same kind of list.

Sudo su@sudoingX·20h

few days into codex plus and i think i found the hack. nobody is talking about it and the value sitting in this subscription is wild.

the hack: do not prompt the agent. write a single detailed task doc with every requirement laid out plus the final vision of what you are building, then fire codex cli with one line: accomplish this and test until done. it goes. hours of uninterrupted agentic coding on gpt 5.5 xhigh, no throttling, no rate cap, no 'can you clarify' loop. the agent has everything it needs in one place so it works the problem instead of working you.

i have been grinding it since this morning, screenshot below shows the session past 24 mins and still running. anthropic burns through your daily allowance in three opus 4.7 prompts then your entire tier id is gone for the day. codex plus on the same money goes on and on while you go take a walk.

this is the most underrated subscription in the agentic stack right now. the value is there if you front-load the prompt instead of conversation-mode it. give codex the brief, walk away, come back to a finished task. try this. loot the value while the math still favors you.

Hanif Carroll retweeted

Alexander Embiricos@embirico·22h

codex can work in the future: "tomorrow, check in on this discussion and ping me if it isn't resolved" "let me know if this bug isn't fixed by the day before launch" "bug me if this flaky test doesn't go green after retry" i do this all the time. powerful but not obvious—yet

Hanif Carroll@HanifCarroll·1d

Easy way to 10x the UX of your app: Tell Codex to use Browser Use to go through the important flows, then ask it to identify usability problems.

Hanif Carroll@HanifCarroll·1d

We launched a Spanish learning app on iOS and Android in 4 weeks. The hardest part wasn't the app. It was figuring out who should pay for it first.

The obvious answer was Spanish learners. But learners are hard to monetize on day one. They want to try before they commit, and you can't blame them.

The clearer first buyer: teachers. A tutor who can generate level-appropriate readings, share them with a class, and track what students are working on, that's someone with a real budget and a real problem. The students come with them.

That one insight reshaped the entire MVP. Here's what shipped:

→ iOS + Android reader for Spanish learners
→ Web workspace for teachers: class management, student groups, shared readings
→ RevenueCat for mobile subscriptions, Stripe for teacher plans
→ App Store + Google Play listings, store assets, legal pages, public site, handoff docs

I've worked on a lot of MVPs. This one reminded me that "who is this for" and "who is paying for this" are two different questions, and you need to answer the second one before you can build the right first version.

Full case study in the comments.

Hanif Carroll@HanifCarroll·1d

@brettmiller128 @petergyang Out of curiosity, do you have OpenClaw try to update itself? I was doing that, and it broke every time as well. Now I have both Hermes Agent and OpenClaw, and I just ask them to update each other. Seems to be working so far.

Brett Miller@brettmiller128·1d

@petergyang I am super frustrated with openclaw. It breaks every time I update. Hermes is much more stable.

Peter Yang@petergyang·2d

I caved and downloaded Hermes to try. For those of you who have tried both Hermes and OpenClaw what difference do you notice? No shilling please, just want some honest opinions

Hanif Carroll@HanifCarroll·2d

Currently experimenting with custom CLI tools + codex exec + launchd for scheduled tasks. There's a powerful combination in there, just gotta figure out its shape and scope.

Hanif Carroll@HanifCarroll·3d

@kr0der Used to be xhigh all the time, then high, now I mostly use low with fast mode, occasionally medium.

Anthony Kroeger@kr0der·3d

it's been a week of GPT 5.5. what reasoning level are you using, and do you have fast mode on? last time, the most popular response was medium -> xhigh -> high -> low. i'm personally using mostly high

Hanif Carroll@HanifCarroll·4d

Reminds me of two phrases that I often think about. "Everything counts" - Brian Tracy. Every action, thought, and decision either adds to or subtracts from your success. Nothing is neutral; small, daily habits accumulate over time to determine your ultimate results, wealth, and character. "Don't practice what you don't want to become." - Jordan Peterson. An idea that follows from "everything counts".

Kpaxs@Kpaxs·4d

Every moment of attention is a double-entry in your life’s accounting system. The ledger is always balanced. You never “just” do something with your attention. You’re always making a trade. The horror is that it’s ruthlessly fair. It doesn’t care about your intentions. It doesn’t grade on a curve. If you spend three hours practicing outrage detection, you get three hours better at outrage detection, and three hours worse at everything you didn’t practice. You can’t cheat it. You can’t game it. You can’t “just this once” your way out of it.

Hanif Carroll@HanifCarroll·4d

@gregpr07 @steipete Codex for anything that needs to be done. Claude for research/learning.

Gregor Zunic@gregpr07·4d

I really like how @steipete put this: Claude Code is an extrovert, really pleasant to talk to at a party but sometimes not as good at coding. Codex is the autistic kid in the corner, but if you manage to talk to it it's gonna get anything done. Is this outdated at this point?

Gregor Zunic@gregpr07

Who actually uses Codex over Claude Code? Claude Code is just 100x better imo, like the DX is WAY better.

Hanif Carroll@HanifCarroll·4d

@StewartalsopIII @josemv Al azar = at random

Stewart Alsop - Host of Crazy Wisdom Radio Show@StewartalsopIII·5d

@josemv amazing, thanks!

Stewart Alsop - Host of Crazy Wisdom Radio Show@StewartalsopIII·5d

I just love that I get to live in a culture where foreigners are so accepted and given so much leeway (much like the US from 1800s to 1990s) that I get to play the ignorant gringo jester, and so much performance art value arises in my daily interactions. Learned the word “asar” which I think means chance in one such interaction just now

Hanif Carroll@HanifCarroll·4d

Most of my problems with OpenClaw and Hermes Agent seemed to come from me asking them to update themselves. So, looks like I'll have them update each other instead.

Hanif Carroll@HanifCarroll·5d

Codex (app) thinking level select is still broken? Looks like they fixed it after you start a conversation, but when you're on that first screen I still can't change the thinking level. Anyone else still seeing this problem?

Hanif Carroll@HanifCarroll·6d

@bidah It's unfortunate, but I've given up on Next as well as Vercel. Vercel is nice, but I recently learned about everything that Cloudflare offers and their generous limits, so I'll be trying them out for future projects.

ROFI@bidah·6d

@HanifCarroll Next.js is chained to Vercel. You can't switch infra provider. I do think it's great tech but Vercel OSS story is broken.

ROFI@bidah·Apr 29

Pnpm works better for me. What are you using for monorepos and react native? Join conversation 👇

Peter Piekarczyk (🥧🚗🐥)@peterpme

There is so much friction w yarn + worktrees + react native monorepo. The `yarn install` step takes over a minute and really messes with my flow 😭 I've been experimenting with hoisting my global cache up a directory so my worktrees can all re-use it but alas, still having problems. What am I missing?

Hanif Carroll@HanifCarroll·6d

Been apartment hunting in Buenos Aires and ran into a frustrating problem: when you filter for "washer" on rental sites, you get a mix of units with an in-unit washer and units with a shared laundry room. No way to tell them apart without clicking through every single listing and scrolling through all the photos yourself.

So I built something to fix it. You give it a URL, it analyzes all the listing photos, and tells you whether the washer is actually in the unit. First version is working. Next step is getting it to run across an entire search results page so I don't have to feed it URLs one by one.

The architecture ended up being pretty interesting. I used a smaller, cheaper model (gpt-5.4-mini) to do the initial pass on all the photos, and anything it's not confident about gets escalated to a stronger model (gpt-5.4).

To figure out which models were actually reliable, I had to build a testing harness. I collected a dataset of listings I already knew the answer to, so I could run each model against it and measure accuracy. Turns out the cheapest model (gpt-5.4-nano) wasn't cutting it. This kind of testing is called evals, and it's one of the most important parts of building anything serious with AI. Without it, you're just guessing.

Am I the only one who's had this apartment search problem?
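
The cheap-first, escalate-when-unsure setup and the labeled-dataset check described in the post could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `Verdict` type, the self-reported `confidence` field, the 0.8 threshold, the file names, and the two stub classifiers (which stand in for real vision-model calls and only inspect the file name) are invented here, not details of the actual tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    in_unit_washer: bool
    confidence: float  # assumed 0.0 to 1.0 score reported by the model wrapper


Classifier = Callable[[str], Verdict]


def cascade(photo: str, cheap: Classifier, strong: Classifier,
            threshold: float = 0.8) -> Verdict:
    """Ask the cheap model first; escalate only when it is not confident enough."""
    first = cheap(photo)
    if first.confidence >= threshold:
        return first
    return strong(photo)


def accuracy(classify: Classifier, labeled: dict[str, bool]) -> float:
    """Fraction of labeled photos whose predicted answer matches the known answer."""
    correct = sum(classify(photo).in_unit_washer == truth
                  for photo, truth in labeled.items())
    return correct / len(labeled)


# Hypothetical stubs: the cheap one gives up on anything named "blurry".
def cheap_stub(photo: str) -> Verdict:
    unsure = "blurry" in photo
    return Verdict("washer" in photo and not unsure, 0.5 if unsure else 0.95)


def strong_stub(photo: str) -> Verdict:
    return Verdict("washer" in photo, 0.99)


# Tiny made-up eval set: photo name -> whether the washer is really in the unit.
LABELED = {
    "kitchen_washer.jpg": True,
    "blurry_washer.jpg": True,
    "bedroom.jpg": False,
    "blurry_hallway.jpg": False,
}

if __name__ == "__main__":
    print("cheap only:", accuracy(cheap_stub, LABELED))
    print("cascade:   ", accuracy(lambda p: cascade(p, cheap_stub, strong_stub), LABELED))
```

Running the same `accuracy` function over each candidate configuration is the whole eval loop in miniature: the numbers on a known-answer dataset, not a hunch, decide which model handles the first pass and where the escalation threshold sits.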

Hanif Carroll@HanifCarroll·6d

This is why we put up guardrails for the system: lints, tests, scripts, and browser use, so that the agent can see the work that it just completed. It is stochastic, but much less so with the proper systems in place. I do agree that you end up feeling drained, though I think this has more to do with context switching than with AI itself. As you wait for one task to finish, you switch to another. That adds up.

Tero Parviainen@teropa·6d

in which @jeremyphoward nails the phenomenology of agentic coding

Machine Learning Street Talk@MLStreetTalk

A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane" i.e. disconnecting humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models.
Out on MLST now!

Hanif Carroll@HanifCarroll·6d

@zeke Thanks for this! I'm a huge fan of this style.

Zeke Sikelianos@zeke·6d

I made a Swiss International Style design system as an agent skill.
npx skills add zeke/swiss-design-skill
swiss.ziki.boo

Hanif Carroll@HanifCarroll·6d

@shinjipons @zeke It's a skill, not a model.

Shinji Pons@shinjipons·6d

@zeke But is it good? Was it made without training on copyrighted material?

Hanif Carroll@HanifCarroll·6d

The most underrated thing AI did: unblock high-agency non-technical people who always had ideas but no way to build them. If you haven't started experimenting with AI yet, what are you waiting for?
