Boaz Hwang
@BoazWith

338 posts

Shipped 4 apps to App Store in a month. Self-taught, no CS degree. Built AI App Factory — native mobile apps with AI agents. Building in public.

Seoul, South Korea · Joined March 2022
197 Following · 46 Followers
Boaz Hwang
Boaz Hwang@BoazWith·
@hamen The half-and-half phase is expensive. You keep translating every decision instead of learning the new constraints. Finishing forces the paradigm switch faster.
0 · 0 · 0 · 13
Ivan Morgillo
Ivan Morgillo@hamen·
Greatest lesson from switching stacks: you can't keep one foot in Android patterns and one foot in Flutter and expect real progress. You have to fully commit to the new paradigm. Same goes for shipping - I stopped half-building and started finishing. AI Bedtime Stories exists because I stopped hedging and just shipped.
1 · 1 · 1 · 95
Boaz Hwang
Boaz Hwang@BoazWith·
@vincent_spruyt This is the part teams miss. Agent-friendly UI is often just accessible UI with stable names. The model does not need magic if the product gives it handles.
0 · 0 · 1 · 5
Vincent Spruyt
Vincent Spruyt@vincent_spruyt·
Long-deprioritized stuff like data-testid attributes on your components, ARIA attributes, and clear, unique labels and titles makes products 10x easier for browser-use agents to drive. The most token-efficient, easiest-to-use products win with computer-use agents. MCP is not everything
1 · 0 · 1 · 25
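The "stable names as handles" idea from the exchange above can be sketched in a few lines. This is a minimal, hypothetical helper (not from any real library): the point is only that a machine-readable `data-testid` and an explicit `aria-label` give an agent selectors that survive copy and styling changes.

```python
def render_button(label: str, testid: str) -> str:
    """Render a button with a stable data-testid and an explicit aria-label.

    A browser-use agent can target [data-testid="save-invoice"] indefinitely,
    even if the visible copy, layout, or styling changes underneath it.
    """
    return (
        f'<button data-testid="{testid}" aria-label="{label}">'
        f"{label}</button>"
    )

html = render_button("Save invoice", testid="save-invoice")
assert 'data-testid="save-invoice"' in html
assert 'aria-label="Save invoice"' in html
```

The same attributes that help screen readers are the ones an agent can key on, which is why "agent-friendly UI" and "accessible UI" largely coincide.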
Boaz Hwang
Boaz Hwang@BoazWith·
@scrappyfounder Good empty states. A product that tells me why nothing happened feels safer than one pretending the click worked.
0 · 0 · 1 · 9
Scrappy Founder
Scrappy Founder@scrappyfounder·
What’s one tiny detail in a product that immediately makes you trust it more? For me, it’s when the tool clearly explains what just happened after I click something. Not flashy. Just honest feedback that removes doubt.
1 · 0 · 0 · 5
Boaz Hwang
Boaz Hwang@BoazWith·
@Cyb3rDav3 @astnkennedy Sharper at spotting bad specs, maybe worse at waiting. The weird skill now is knowing when to stop the agent and rewrite the task instead of pressing enter again.
1 · 0 · 1 · 12
Dave
Dave@Cyb3rDav3·
@astnkennedy Do you think you’ve become sharper in certain areas tho?
1 · 0 · 0 · 32
Austin Kennedy
Austin Kennedy@astnkennedy·
I'm 22 years old and Claude Code is deteriorating my brain. Every single day for the last 6 months I've had 6 to 8 Claude Code terminals open, waiting for a response just so I can hit 'enter' 75% of the time. And it's doing something to me.

In convos with a couple of friends, it's been a point that's been brought up pretty frequently. None of us feel as sharp as we used to. I don't know if it's just us, or others in their 20s are feeling the same thing, but it's something I've been thinking about a lot.

P.S. I know this is a problem with my reliance on it and how I use it, not Claude Code itself, but the effects are real nonetheless
523 · 97 · 2.8K · 344.7K
Boaz Hwang
Boaz Hwang@BoazWith·
@EliteDevElijah Visibility makes the first sales call warmer before it exists. That is the part people underrate. The content is doing trust work while you are not in the room.
0 · 0 · 0 · 2
Boaz Hwang
Boaz Hwang@BoazWith·
@accidentalcto That first paid user changes the work. The number is small, but the question stops being 'will anyone care?' and becomes 'how do I find the next one without breaking the product?'
0 · 0 · 0 · 4
AK Singh
AK Singh@accidentalcto·
floow.design MRR just hit $17.49. 🎉 I know that's not a lot. But 30 days ago it was $0. I was so deep in figuring out distribution that I didn't even check my Stripe dashboard. Opened it just now and there it was. $17.49. A real human paid real money for something I built alone. That hits different. $17.49 → $10,000. The journey starts now. 🚀 #buildinpublic #indiehacker #mobiledesign
AK Singh tweet media
2 · 0 · 0 · 16
Boaz Hwang
Boaz Hwang@BoazWith·
@saen_dev Yep. Containers mostly fail in observable ways. Agents can finish the job and still preserve the wrong assumption. That makes rollback and review part of the runtime, not just deployment.
0 · 0 · 0 · 2
Saeed Anwar
Saeed Anwar@saen_dev·
Every agent framework calls itself "Kubernetes for AI agents" until you ask about pod scheduling for non-deterministic workloads. Orchestrating containers is hard, orchestrating things that hallucinate is a different sport entirely.
Praveen Kumar Verma@Alacritic_Super

Everyone is building AI agents. Almost no one is building the infrastructure to run them in production. That's where this comes in ↓
→ AgentField (Agent-Field/agentfield)
This is not another agent framework. It's Kubernetes for AI agents. A full control plane that turns your agents into real backend services:
• Every agent = API endpoint
• Callable from frontend, backend, cron, or other agents
• Works with Python, Go, TypeScript
• Supports 100+ models (OpenAI, Claude, Llama, etc.)
But the real unlock is infra 👇
Instead of duct-taping tools, AgentField gives you:
• Routing + coordination across agents
• Async execution for long-running workflows
• Built-in memory + workflows
• Cryptographic identity for every agent
• Full audit trail of every decision
This solves the biggest problem in AI today:
👉 Agents work in demos
👉 They break in production
AgentField treats agents like microservices:
• Autoscaling
• Observability (logs, metrics, tracing)
• Secure inter-agent communication
• Rolling deployments & versioning
Translation: you stop writing glue code and start building autonomous systems.
If you are serious about AI agents in production, this is the layer you have been missing.
Try it: github.com/Agent-Field/ag…
Follow for more breakdowns on AI infra, agents, and real-world systems.
#AI #AIAgents #OpenSource #Kubernetes #Backend #Automation #LLM #DevTools #BuildInPublic

2 · 0 · 0 · 13
Boaz Hwang
Boaz Hwang@BoazWith·
@at56_ @hankisinvesting I look for replies that add a constraint, not a reaction.

Someone saying 'this failed in my case because...' is usually a real reader. A like or a generic agree tells you almost nothing.
0 · 0 · 0 · 7
Henry
Henry@hankisinvesting·
everyone told me to grow my following
nobody told me 10 real readers beat 10k followers
Deebs built a list of exactly those people
10 read guys > 100 reply guys
who are yours?
Henry tweet media
Deebs DeFi 🛰@Deebs_DeFi

That was a test and ~70% failed

Many thought I was engagement farming. Wrong. I put a hidden CTA in my post to check who read it.

Why? Let me tell you a secret:

Ever seen a post with 50k views but only 25 replies? And compare it to another with 2.5k views and 150 replies? And wonder: why did the one with more engagement get less reach?

I'll tell you why: it's because READ GUYS are 10x more valuable than REPLY GUYS.

It's no secret that X favors screen time. It's written in the code. The more that users:
> click into a post: "show more"
> actually read it
> bookmark it
The further the post will go.

So what's the problem? Current reply culture.

Look, I love replies, and I am grateful for EVERY interaction I get on my posts, whether people read it or not. However, yesterday's experiment proved a problem I had long suspected: (most) reply guys are not reading. They skim the post, fire off a reply, punch their card, and move on to the next creator.

And it will happen again: if you read this far, prove it by putting "golden" in a comment below. Try to make it subtle.

Unfortunately the current skim-and-reply behavior is guaranteed to limit a post's reach. As creators we need READERS, not just reply guys.

How I'm fixing it:
1) Just like I did in my post, I will run small social experiments to test for people paying attention to my content. Small details in the text, maybe even in pictures, that only a dedicated READER could catch.
2) The people who pass the test get added to a list (already created, some people made it yesterday)
3) I will reciprocate by reading their posts, bookmarking, and providing responses when I have something to say

Somber reality: I can't change the culture of everyone on X. Some don't have time to read and only want to fire off quick replies (it's okay, I still love and appreciate you). However, I'm betting that there are actual readers out there, and I just need to find them.
I'm betting that a small community of readers will be more powerful than an army of reply guys. So who wants to be a reader?

16 · 0 · 33 · 447
Boaz Hwang
Boaz Hwang@BoazWith·
@_andypeacock This is the underrated use case.

Not writing code from scratch, but staying patient through boring environment failure. That is where I trust agents more than myself.
0 · 0 · 0 · 4
Andrew Peacock
Andrew Peacock@_andypeacock·
One thing I love about LLMs isn't just the coding, it's dealing with the pain.
- NativePHP couldn't install PHP binaries
- Claude worked out it was an SSL issue
- It worked out where PHP was installed and which php.ini was used
- It added SSL certs and installed.
I watched it.
1 · 0 · 0 · 18
Boaz Hwang
Boaz Hwang@BoazWith·
@saen_dev 20 users with conversations is more useful than 200 silent signups.

At that stage I would optimize for reply speed, not onboarding polish.
0 · 0 · 1 · 3
Saeed Anwar
Saeed Anwar@saen_dev·
20 users and zero revenue is actually a great position if those 20 are giving you feedback. The worst place to be is 200 users and zero conversations because then you have traffic but no signal on what to build next.
alimkhan@alimmka_

Day 8 of building in public. The goal is to get first paying users in May. Current progress: Users: 20(+0) Revenue: $0 MRR: $0 Getting new users and customers gets harder when you’re facing deadlines in life. Still waiting for update approval on CWS.

1 · 0 · 0 · 7
Boaz Hwang
Boaz Hwang@BoazWith·
@0xDragoonLab Making setup executable is the right direction.

The hard part is preserving intent: does /forge merge with existing CLAUDE.md decisions, or regenerate the whole harness each run?
0 · 0 · 0 · 2
Dragoon's Lab | @0xDragoon
#83 i built useforgekit. every Claude Code setup guide is something you read. this is something Claude executes. run /forge and it generates CLAUDE.md, settings.json, skills, hooks, agents, and commands from your actual codebase. 48 reference modules. zero deps. link ↘︎ github.com/Dragoon0x/usef…
1 · 0 · 0 · 47
Boaz Hwang
Boaz Hwang@BoazWith·
@accidentalcto Yes, if it changes the generated UI instead of becoming a docs graveyard.

I would make it opinionated: spacing rules, component taste, and what the generator is not allowed to invent.
0 · 0 · 0 · 5
AK Singh
AK Singh@accidentalcto·
Should I introduce design.md file support in floow.design?
AK Singh tweet media
1 · 0 · 0 · 10
Boaz Hwang
Boaz Hwang@BoazWith·
@commandlinex86 @odd_joel Yep, that is exactly the kind of tool that earns a slot.

One less notifier means one less thing to debug when the agent loop is already the messy part.
0 · 0 · 1 · 11
commandlinex86
commandlinex86@commandlinex86·
@BoazWith @odd_joel Yeah exactly. I had ntfy push set up for the same reason but Moshi's notifications basically replace it. One less moving piece to babysit 🍻
1 · 0 · 1 · 13
commandlinex86
commandlinex86@commandlinex86·
Didn't know I needed this until I had it: @odd_joel's Moshi. 📱 SSH TMUX terminal built for mobile, mosh protocol so the session doesn't die when I switch WiFi to LTE, push notifs when Claude Code is done cooking. ⚡ Running it next to Termius for now to see if it earns the daily-driver spot. early read: it might. Big shoutout to Joel for reaching out. nerds helping nerds, #ThisIsTheWay 🤝
commandlinex86 tweet media
1 · 0 · 1 · 34
Boaz Hwang
Boaz Hwang@BoazWith·
@sidsinghal_ That is the honest stage. I would keep the replica until behavior forces divergence. Otherwise you end up maintaining two guesses instead of one product.
0 · 0 · 0 · 4
Siddharth | Building MyGlowLab
@BoazWith Yet to discover; I do not have enough users to get those signals. Right now it's a replica, but down the line both might diverge.
2 · 0 · 1 · 7
Siddharth | Building MyGlowLab
If you're building an app in 2026 and not shipping on both iOS AND Android from day 1, you're leaving half your users behind. Yes it's twice the work. Ship for everyone or accept you're only serving half the market.
1 · 0 · 0 · 53
Boaz Hwang
Boaz Hwang@BoazWith·
@raiderfreed That is exactly where mobile gets expensive. Flow changes look like product design, but they leak into navigation state and QA fast. Which screen order changed the most?
0 · 0 · 0 · 4
Sid Mishra🇮🇳
Sid Mishra🇮🇳@raiderfreed·
@BoazWith Yes. Most of my time was spent on changing the flow and screen order.🥲
1 · 0 · 1 · 7
Sid Mishra🇮🇳
Sid Mishra🇮🇳@raiderfreed·
Today finished building one more cross platform app with react native in the company. Handed over to client for testing. It is the most intense and heavy app that I have worked with. One major learning in this whole process is that building for web and building for mobile require very different mindsets. Also, following the system matters a lot in these projects. Now it's time to take a 3 day break😮‍💨
1 · 0 · 0 · 14
Boaz Hwang
Boaz Hwang@BoazWith·
@wonderwhy_er The useful part is not the exact number, it is the repeatable baseline. If limits can move quietly, builders need their own usage tests the same way they need perf tests.
0 · 0 · 1 · 4
Eduard Ruzga
Eduard Ruzga@wonderwhy_er·
Yesterday I was a guest on Budapest Claude Code Meetup's fireside chat. The host wanted to talk about AI value-per-dollar — Codex vs Claude, Plus vs Pro vs Max, what actually buys you more tokens.

An hour before going live, he texted: "hey, did the limits change this week? People online are complaining." Same day, my cofounder Dmitry Sergeev pinged me: "Codex on Business feels worse this week."

Well, that is exactly what I am building desktopcommander.app/best-value-ai/ for: a way to track data for answering such questions over time.

I'd run baseline numbers on April 24. Five days later, on the 29th, I scrambled to re-run due to the Budapest fireside chat host's question. And the results are a bit shocking. Every plan I could compare across both days dropped between 35% and 61% in tokens-per-week:

▸ ChatGPT Plus / GPT-5.5: 95M → 37M weekly (−61%)
▸ Claude Max 20× / Sonnet 4.6: 388M → 214M (−45%)
▸ Claude Max 20× / Opus 4.7: 248M → 162M (−35%)
▸ Claude Pro / Sonnet 4.6: 19.6M → 11.4M (−42%)
▸ Claude Pro / Opus 4.7: 15.6M → 10.2M (−35%)

5 of 5 retested plans went down. None went up. I re-ran the headline ChatGPT Plus measurement today. It came back at 32M weekly — confirming the drop, not bouncing back.

Five days. Same prices. ~Half the tokens.

Take it with a grain of salt though. I am tweaking and improving the measurement methodology. It is estimated. But tokens did go down. Maybe you want to contribute to these tracking efforts?

Subscriptions are unstable in ways the marketing pages won't tell you. The math is the only way to see it.
Eduard Ruzga tweet media
0 · 0 · 3 · 121
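The "usage tests alongside perf tests" idea from the reply above can be made concrete with a tiny drift check. This is a hypothetical sketch (the function name and the 25% tolerance are my own choices, not from the thread): record a tokens-per-week baseline once, then flag any later run that drops beyond tolerance.

```python
def check_usage(baseline_tokens: int, current_tokens: int, max_drop: float = 0.25) -> bool:
    """Return True if current throughput is within tolerance of the baseline.

    Works like a latency budget in a perf test: the exact number matters less
    than having a repeatable measurement to compare against.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline must be positive")
    drop = (baseline_tokens - current_tokens) / baseline_tokens
    return drop <= max_drop

# Using the ChatGPT Plus figures quoted above (95M baseline, 37M re-run):
assert check_usage(95_000_000, 37_000_000) is False  # -61% drop: flag it
assert check_usage(95_000_000, 80_000_000) is True   # -16% drop: within tolerance
```

Run on a schedule, a check like this turns "the limits feel worse this week" into a dated, reproducible data point.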
Boaz Hwang
Boaz Hwang@BoazWith·
@BeauJohnson89 The e2b-compatible swap is the interesting part. Do you know if it snapshots cleanly between runs? Isolation matters, but reset speed is what keeps agent output reviewable.
0 · 0 · 0 · 3
Beau Johnson
Beau Johnson@BeauJohnson89·
this repo is solving the scariest part of coding agents

TencentCloud/CubeSandbox
> 4,814 stars on github
> rust + kvm sandbox built for ai agents
> creates hardware isolated sandboxes in under 60ms
> under 5mb memory overhead per instance
> e2b sdk compatible, so you can swap one url and keep your app logic

why this matters: coding agents are getting good enough to run real code nonstop but docker shared-kernel isolation was never built for untrusted llm-generated chaos

CubeSandbox is basically saying: keep the speed of containers, get closer to vm-level isolation, make it cheap enough to run thousands of agents on one box

this is the kind of boring infra that quietly decides which agent startups can actually scale
Beau Johnson tweet media
1 · 0 · 2 · 37
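The reset-speed question raised above is answerable empirically with a generic timing harness. This sketch does not assume any particular SDK: `reset_sandbox` is a stand-in for whatever call (e2b-style or CubeSandbox's own) restores a clean state between agent runs.

```python
import time

def measure_reset(reset_sandbox, runs: int = 20) -> float:
    """Return the worst-case reset latency in milliseconds over `runs` attempts.

    Worst case matters more than the mean here: one slow reset stalls the
    whole review loop when agents produce output continuously.
    """
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        reset_sandbox()
        worst = max(worst, (time.perf_counter() - start) * 1000)
    return worst

# Usage with a no-op stand-in; swap in the real SDK's reset call to benchmark it:
assert measure_reset(lambda: None) < 100
```

If the sub-60ms creation claim holds for resets too, the worst-case number is what decides whether per-run isolation is cheap enough to make every agent run independently reviewable.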
Boaz Hwang
Boaz Hwang@BoazWith·
@j_schwartzz That call-stack inversion is the useful part. My bias: the harness matters more as the model gets better. Tool contracts, logs, and failure boundaries decide whether the agent is a product or just a demo.
1 · 0 · 0 · 11
Jon Schwartz
Jon Schwartz@j_schwartzz·
From my perspective, we're approaching an inflection point in AI-enabled software. We're going from "agents that build software" to "software that contains an agent". It's a complete inversion of the AI software stack. It's indicative of a shift from workflow-driven orchestration ("ChatGPT wrapper era") to agent-driven orchestration ("harness era"). Projects like @cursor_ai and @openclaw were very early to start building agent-driven orchestration software in this pattern.

In the old world, most apps wrote deterministic workflows and sprinkled in LLM calls. For example, if I'm building a workout agent, I might code something like this:

```
if phase == "plan":
    workout = llm.call("Create a workout plan")
elif phase == "execute":
    results = tracker.run(workout)
    analysis = llm.call(f"Analyze results: {results}")
```

Notice how the engineer who built this determines the steps, order, and control flow. The LLM is just a function that gets called at the top of the call stack.

The new approach, like @garrytan preaches: define deterministic tools + constraints (and associated skills), and let the model drive everything else. The same workout agent code might be rewritten as:

```
llm.register_tool(build_workout)
llm.register_tool(evaluate_workout)
llm.register_tool(save_workout)

agent.run("""
Create and validate a workout plan.
Use build_workout to generate candidates.
Use evaluate_workout to test effectiveness.
Only save_workout if it meets the criteria.
""")
```

The engineer defines only the tools and their boundaries, but lets the agent itself do most of the conditional logic and control flow. At first it feels janky, but after enough trial and error, you can get a real-feeling agent.

The primary benefits here are (a) this style of software lends itself much better to self-sustaining agentic companies. The agent sits in the software stack, and can actually create tools and skills on its own to fill the gaps it needs. And (b) apps capture model upside directly: when GPT / Opus improves, your entire system gets smarter, rather than just a few isolated calls.
1 · 0 · 0 · 83
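The tool-registration pattern sketched in the thread above can be made runnable as a toy. Everything here is illustrative: the registry, tool names, and guardrail threshold are my own stand-ins, and `run_agent` is a scripted placeholder for the model-driven loop, included only to show where the control flow moves.

```python
TOOLS = {}

def tool(fn):
    """Register a deterministic tool the agent is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def build_workout():
    return {"sets": 5, "reps": 10}

@tool
def evaluate_workout(plan):
    # The guardrail stays engineer-owned and deterministic.
    return plan["sets"] * plan["reps"] >= 30

@tool
def save_workout(plan):
    return f"saved:{plan['sets']}x{plan['reps']}"

def run_agent():
    """Scripted stand-in for the model: propose, evaluate, save only if valid."""
    plan = TOOLS["build_workout"]()
    if TOOLS["evaluate_workout"](plan):
        return TOOLS["save_workout"](plan)
    return "rejected"

assert run_agent() == "saved:5x10"
```

In the real pattern the `run_agent` body is the model's decision loop rather than hand-written branches; the inversion is that the engineer ships only the tool table and the guardrails, which is also why logs and failure boundaries (the "harness") carry so much of the product's weight.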