Mike Carter | Building with AI

294 posts


@AIProfitStackHQ

Engineer → AI operator. I build repeatable AI systems that save time, cut costs, and create income. Follow for real workflows, not hype.

Joined February 2026
38 Following · 20 Followers
Pinned Tweet
Mike Carter | Building with AI@AIProfitStackHQ·
Most people automate the wrong things with AI. Not because they picked a bad tool. Because they never asked which tasks were worth automating in the first place. Here is the 3-step filter I use before building anything:
1
0
2
78
Mike Carter | Building with AI@AIProfitStackHQ·
@svpino Same approach here. Structuring the dump as sections like decisions, open threads, and code state beats a raw transcript. The next session rehydrates faster, and subagents for isolated work keep the parent thread lean.
0
0
0
1
Santiago@svpino·
I'm spending so much time managing context, and I hate it. Here is a tip for you: don't use /compact any more in Claude Code. There's a much better option.

/compact takes your entire conversation history in memory and compresses it into a summary. This frees up tokens, but you'll lose a ton of important details (sometimes up to 70% of what matters!). On top of that, the summarized context is still tied to the current session and won't persist beyond it.

Here is what you should do instead:

1. Dump the entire conversation history to a markdown file
2. Call /clear to clear the context
3. Start your next prompt by pointing to the markdown file

There are several advantages to doing this:

1. You don't lose any valuable information
2. You control what's in the file
3. The context persists beyond the current session

In summary, when you hit a context limit, do a *handoff*, not a *cleanup*.
38
2
79
7.6K
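The handoff described above, combined with the sectioned-dump idea from the reply, can be sketched in a few lines of Python: write the session state to a markdown file, /clear, then point the next prompt at the file. This is a minimal illustration, not Claude Code's actual mechanism; the section names and the `handoff.md` path are assumptions.

```python
from pathlib import Path

def write_handoff(sections: dict[str, str], path: str = "handoff.md") -> str:
    """Dump session state to a markdown file the next session can rehydrate from.

    `sections` maps a heading (e.g. "Decisions") to its content, so the
    next session reads structure instead of a raw transcript.
    """
    lines = ["# Session handoff", ""]
    for heading, body in sections.items():
        lines += [f"## {heading}", "", body.strip(), ""]
    Path(path).write_text("\n".join(lines))
    return path

# Usage: dump, then /clear, then open the next prompt with
# "Read handoff.md before doing anything else."
handoff = write_handoff({
    "Decisions": "Switched the cache layer to Redis; TTL is 300s.",
    "Open threads": "Flaky test in test_auth.py still unresolved.",
    "Code state": "Branch feature/cache, 3 commits ahead of main.",
})
```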
Mike Carter | Building with AI@AIProfitStackHQ·
@gdb Reward audits are usually the last thing teams add and the first thing that saves them. Creature-word bias surviving this deep in training says the eval set never stressed lexical diversity at the reward layer. Worth catching before scale amplifies it.
0
0
0
1
Mike Carter | Building with AI@AIProfitStackHQ·
@svpino Running this loop for three months. Hidden failure mode: agent writes trivial tests that pass without exercising real behavior. Now I force the test to define expected output as a literal value before any code is written. Token cost goes up, but agents stop gaming themselves.
0
0
0
2
Santiago@svpino·
Agentic coding is the ideal mechanism for enforcing TDD and being strict with it. Here is a summary of the workflow I coded as part of my implementation skill in Claude Code:

1. Before writing any code, write a test that fails
2. Run the test and ensure it fails
3. If you get an error, fix the test until it runs but fails
4. Once the test fails, write the code
5. Run the test and ensure it succeeds
6. If the test fails, go back to step 4
7. Once the test succeeds, verify that the task is complete
8. If the task is not complete, go back to step 1

This loop forces the agent to write tests before writing any code, and tries to keep each test as simple as possible. This is token hungry upfront, but it has many advantages:

1. Simpler code
2. More modular code
3. Fewer bugs and regressions later
4. Easier to troubleshoot
48
13
179
16.3K
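The loop above, plus the literal-expected-value guard from the reply, boils down to: a test with its expected output hard-coded as a literal first, then code until it passes. A minimal sketch; `slugify` and its spec are invented for illustration.

```python
# Steps 1-3: write the test first, with the expected output as a literal
# value, so the agent cannot "game" the test by deriving it from the code.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"  # literal, not computed

# Steps 4-6: only now write the code, and rerun until the test passes.
import re

def slugify(text: str) -> str:
    """Lowercase, strip punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

test_slugify()  # raises AssertionError until the implementation is right
```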
Mike Carter | Building with AI@AIProfitStackHQ·
@gdb Persistent memory has a quiet failure mode at scale. Memories accumulate stale context faster than the model knows to drop them, and a lot of the early productivity gain reverses by month two. Pruning is the unsexy work that rarely shows up in tutorials.
0
0
0
1
Mike Carter | Building with AI@AIProfitStackHQ·
@rohanpaul_ai The $138B AWS server commitment is the number worth watching. That infrastructure lock-in tells you the Microsoft fight was never about loyalty. It was about capacity planning. Native access in Bedrock removes a real compliance headache for enterprise teams.
0
0
0
1
Rohan Paul@rohanpaul_ai·
FT: OpenAI is deploying model 5.5 on AWS. Microsoft dropped the contract terms restricting OpenAI to Microsoft servers. This shift allows developers to run OpenAI tools inside Amazon Bedrock.

Amazon secured this partnership by investing $15B into OpenAI during Feb-26. OpenAI reciprocated by signing a deal to purchase $138B in server capacity from AWS.

---
ft.com/content/f159dd74-56a5-404b-ae54-ab4bab98b2c3?syn-25a6b1a6=1
Rohan Paul@rohanpaul_ai

OpenAI is moving away from its exclusive Microsoft arrangement, making room for possible partnerships with Amazon and Google. At the same time, Microsoft wants to cut its dependence on OpenAI by creating its own AI models. Ending the exclusivity could also help lower antitrust scrutiny across the U.S., U.K., and Europe. Barclays analysts said the shift may free Microsoft to invest more in Copilot and expand cloud capacity.

Last month, Microsoft was considering legal action against Amazon and OpenAI over a $50B cloud deal that may have violated its exclusive cloud partnership. The revised arrangement now lets OpenAI run its services on Amazon's cloud without the technical changes required under the earlier Microsoft agreement.

Microsoft still keeps a non-exclusive license to OpenAI model and product IP through 2032, still gets OpenAI revenue share through 2030, and still sits inside the upside as a major shareholder.

Microsoft stock fell 5% after the company announced that its OpenAI license will now be nonexclusive and it will no longer pay revenue share to OpenAI. Investors can punish the loss of exclusivity on the headline, even if the deeper effect is to simplify the alliance and reduce regulatory and capital strain. But the software giant will no longer share revenue for the OpenAI products it sells on its cloud. The revenue OpenAI must share with Microsoft through 2030 will now be capped at a fixed total, no longer tied to the startup's technology milestones.

My read is that OpenAI gained more bargaining power than Microsoft gained certainty, which explains why investors initially treated the change like a moat reduction rather than a breakup. For Microsoft, the selloff reflects a real loss of scarcity premium, but the deeper gain is strategic: freeing capital for Copilot and other cloud capacity, while also easing antitrust pressure and reducing dependence on a partner that recently accounted for about 45% of Microsoft's remaining performance obligation.

Amazon looks like the clearest tactical winner. It already has a strategic OpenAI partnership, Bedrock distribution for OpenAI Frontier, and a large Trainium capacity commitment, so this deal turns AWS from backup infrastructure into a front-line route to OpenAI. Google Cloud gains something subtler but important: the right to compete for OpenAI workloads even while competing in models.

8
3
40
7.5K
Mike Carter | Building with AI@AIProfitStackHQ·
The validation layer point is underrated. Most people add it after something breaks. By that point, bad outputs have already shipped and you have no record of when it started.
0
0
0
6
Mike Carter | Building with AI@AIProfitStackHQ·
3 things I actually learned this week building with AI:

1. A validation layer is not optional. Silent failures will find you — usually at the worst time.
2. The best prompts I write now read like instructions to a junior contractor: specific, scoped, with clear success criteria. Vague in = vague out.
3. Open source is moving fast. GLM-5.1 under MIT with 8-hour task persistence is serious. Don't sleep on it because it didn't come from a US lab.

What's one thing you picked up this week?
1
0
0
16
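A validation layer can be as small as a schema check that rejects malformed model output before it ships and timestamps the first failure, so you have a record of when bad outputs started. A sketch under assumed field names; `validate_output` is not any specific library's API.

```python
import datetime
import json

# Fields a downstream consumer depends on, with their expected types (assumed).
REQUIRED = {"summary": str, "confidence": float}

def validate_output(raw: str) -> dict:
    """Reject malformed model output before downstream code sees it,
    stamping each failure so regressions are dateable, not silent."""
    now = datetime.datetime.now().isoformat()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"[{now}] output is not JSON: {exc}")
    for field, expected_type in REQUIRED.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"[{now}] missing or mistyped field: {field!r}")
    return data

ok = validate_output('{"summary": "deploy is safe", "confidence": 0.92}')
```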
Mike Carter | Building with AI@AIProfitStackHQ·
@svpino Tried this on our agent pipeline. Claude tends to over-select stronger models when task descriptions have any ambiguity, inflating spend with no quality lift. Our fix: default to Haiku, escalate to Sonnet only on test failure or multi-file scope. Cut spend close to 40 percent.
0
0
0
1
Santiago@svpino·
You can let Claude Code decide the level of thinking it needs to solve a problem.

I like to use Sonnet as the default model, but I tell Claude to decide which model to use every time I spin off an agent to solve a problem: "Choose the weakest model that can complete the task."

You can also give it specific guidelines. For example:

"""
• Use the smallest and cheapest model to create small functions, small test cases, or do anything mechanical.
• Use the strongest model you have to make architectural decisions about the code base.
"""

This is great whenever you are giving the model a list of tasks and you don't have a chance to specify a different model for each task.
42
8
134
26.3K
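Santiago's guidelines, combined with the default-then-escalate fix from the reply above, can be sketched as one tiny routing function. The model tier names follow the thread; the `task` fields (`kind`, `files`) are assumptions for illustration.

```python
def pick_model(task: dict, last_attempt_failed: bool = False) -> str:
    """Choose the weakest model that can complete the task: cheap by
    default, escalating only on concrete signals, never on vague wording."""
    if task.get("kind") == "architecture":
        return "strongest"   # codebase-level decisions get the strongest model
    if last_attempt_failed or len(task.get("files", [])) > 1:
        return "sonnet"      # escalate: test failure or multi-file scope
    return "haiku"           # mechanical work: small functions, small tests
```

Defaulting down and escalating up is the direction that cut spend in the reply above; defaulting up on ambiguous task wording is how bills inflate with no quality lift.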
Mike Carter | Building with AI@AIProfitStackHQ·
@emollick Doing this in standups has changed how we scope. Half the "we should look into X" items get killed in 10 minutes because Codex actually surfaces the rough edges nobody anticipated. Specs written after a live build come out noticeably sharper than specs written before one.
0
0
0
1
Ethan Mollick@emollick·
An easy way to get a team engaged with AI is just to build the thing you are talking about in the meeting during the meeting using Codex or Claude Code. At worst, it fails in ways that can be constructive. At best, you built the thing and the meeting topic shifts forward a month
60
24
460
26.7K
Mike Carter | Building with AI@AIProfitStackHQ·
@svpino Routing across providers is table stakes. The real gain comes from per-task routing: small fast models for classification, premium models only where the task demands it. We shipped that pattern in production and cut inference costs by about 60%.
0
0
0
1
Santiago@svpino·
Do not marry yourself to one LLM provider. Just don't do it. 10/10 you'll end up regretting it. Here is how simple it is to build routing in your projects and use whatever model you want whenever you want.
46
44
471
49.7K
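Per-task, per-provider routing of the kind described above can be a lookup table sitting in front of the client code. The provider and model names here are placeholders, not a real SDK; the point is the shape, not the vendors.

```python
# (provider, model) per task type; the premium tier only where quality pays.
ROUTES = {
    "classification": ("provider_a", "small-fast"),
    "extraction":     ("provider_a", "small-fast"),
    "code_review":    ("provider_b", "premium"),
}
DEFAULT = ("provider_a", "small-fast")

def route(task_type: str) -> tuple[str, str]:
    """Match each task to the cheapest adequate model; that matching,
    not a single new default, is where the cost savings come from."""
    return ROUTES.get(task_type, DEFAULT)
```

Because the table is data rather than code, swapping providers or demoting a task to a cheaper tier is a one-line change, which is exactly the flexibility the tweet warns you lose by marrying one provider.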
Mike Carter | Building with AI@AIProfitStackHQ·
@sama Multi-cloud access is the actual win for builders. Running OpenAI natively on your existing GCP or Azure stack removes a real deployment friction. Microsoft-as-primary still matters for enterprise procurement. Azure stays the default path for most teams.
0
0
0
1
Sam Altman@sama·
we have updated our partnership with microsoft. microsoft will remain our primary cloud partner, but we are now able to make our products and services available across all clouds. will continue to provide them with models and products until 2032, and a revenue share through 2030.
1.2K
842
14.2K
2.1M
Mike Carter | Building with AI@AIProfitStackHQ·
@sama Multi-cloud access is the real shift here. OpenAI can now pit AWS and GCP against Azure on compute pricing, which matters at their scale. Microsoft holds first position through 2032 but lost exclusivity. That will reshape API cost structures over the next few years.
0
0
0
1
Mike Carter | Building with AI@AIProfitStackHQ·
@rohanpaul_ai The sensor stack is impressive. The harder engineering problem is what runs on top of it. Real-time fusion of gaze, head pose, and audio into a coherent interaction model, with sub-100ms latency, is where most companion robot demos fall apart in production.
0
0
0
1
Rohan Paul@rohanpaul_ai·
A Chinese company made this desktop blue-eyed companion robot. Exhibits lifelike micro-expressions, eye tracking, and responsive head positioning. Multiple degrees of freedom, fluid neck motion for natural engagement, with eye-mounted cameras.
26
27
184
24K
Mike Carter | Building with AI@AIProfitStackHQ·
Eye-mounted cameras solve a subtle but real problem: traditional webcam-style placement makes robots look slightly off when holding eye contact. The micro-expression system matters too. Social robots fail the uncanny valley test not because they move too much, but because motion and expression arrive out of sync. Getting those two systems to coordinate is the hard part.
0
0
0
1
Mike Carter | Building with AI@AIProfitStackHQ·
@rohanpaul_ai The 95 to 35 drop tracks what I see shipping LLM features. Clean evals hide how much reasoning the model borrows from well-formed prompts. A structured intake layer that normalizes messy symptoms before the model sees them closes most of that gap in production.
0
0
0
27
Rohan Paul@rohanpaul_ai·
The BBC published an article. AI chatbots are becoming a real front door for health advice, but new evidence says human-AI conversation breaks their medical accuracy far more than most people realize.

The problem is not that these systems always fail when they see a full, neatly written case, because in controlled testing they reached about 95% accuracy. The problem is that real people give messy, partial, distracted symptom descriptions, and in that setting accuracy dropped to about 35%.

In the area of medical advice, a tiny wording change can flip advice from “rest at home” to “go to hospital now.”

---
bbc.com/news/articles/clyepyy82kxo
27
18
89
20K
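A structured intake layer of the kind the reply describes would sit in front of the model: pull what it can from a messy free-text description into labeled fields, and make the gaps explicit instead of letting the model guess. Purely illustrative; the field names and patterns are assumptions, and a real medical intake would need far more than two regexes.

```python
import re

# Assumed fields worth extracting before the model sees the text.
FIELDS = {
    "duration": r"(\d+)\s*(?:day|week|hour)s?",
    "severity": r"\b(?:mild|moderate|severe)\b",
}

def structure_intake(free_text: str) -> dict:
    """Normalize a messy symptom description into labeled fields,
    flagging what's missing so a follow-up question can ask for it."""
    text = free_text.lower()
    record = {}
    for field, pattern in FIELDS.items():
        match = re.search(pattern, text)
        record[field] = match.group(0) if match else None
    record["missing"] = [f for f, v in record.items() if v is None]
    return record

# A messy description comes back as a partially filled, explicit record:
structure_intake("been having this bad headache like 3 days, pretty severe at night")
```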
Mike Carter | Building with AI@AIProfitStackHQ·
@swyx @gabegreenberg @ReactMiamiConf ReactMiamiConf set a high bar for community-driven events that still gets referenced years later. A lot of AI conferences right now feel like product demos in sequence. If the Miami crew ports their hospitality playbook to AIE, the hallway track alone is worth the ticket.
0
0
1
17
swyx 🇸🇬@swyx·
these are some of the heaviest hitters in all of the AI Engineering circuit and this week they will all be in Miami! 🏝️ so proud to be there to support @gabegreenberg and co as they build the first independently run AIE in America. Fun fact their first @ReactMiamiConf gave me such insane good vibes I ended up going every year. Building developer community is hard and even harder in a non-tech-hub city, but @MichelleBakels and crew consistently execute so well and I would have no one else as our first partner in the East. join us! since i’m not organizing i’ll actually be available to talk to attendees and sponsors, very much looking forward to that.
AI Engineer: Miami@AIEMiami

We're in the final stretch for tickets! Get your ticket to AIE Miami before we sell out! ai.engineer/miami

14
11
92
32.1K
Mike Carter | Building with AI@AIProfitStackHQ·
@emollick The real comparator is not AI vs doctor. It is AI vs a midnight WebMD spiral plus a Reddit thread. Diagnostic accuracy matters less than escalation behavior. A model that knows when to stop hedging and say go to the ER tonight is the benchmark worth running.
0
0
5
139
Ethan Mollick@emollick·
This paper shows people are asking a lot of medical questions of AI already, but we have little evidence of how good or bad this is. Most of the published research uses old models & compares to doctors. How do new models compare to the info people would have gotten without AI?
27
34
255
26.7K
Mike Carter | Building with AI@AIProfitStackHQ·
@goodside The discourse can be marketing and load-bearing at the same time. It shapes funding, hiring, and policy. It also provokes real reactions from people who have never read a position paper. Builders feel both pressures in the same week.
0
0
0
50
Riley Goodside@goodside·
“xrisk talk is marketing” sure that makes total sense the thing that makes people throw molotov cocktails is marketing
7
1
57
7.6K
Mike Carter | Building with AI@AIProfitStackHQ·
@swyx Security report volume is one of the best signals that a project is actually load-bearing. 60x more reports than curl shows real adoption, not hype cycles. Audiences who ship code want that operational depth more than polished story arcs.
0
0
0
310
swyx 🇸🇬@swyx·
shut the f up AIE beat TED???? a somber technical talk about security advisories and maintainer burnout beat the happy storytelling lobster on blazer one on the channel with 27 million subscribers??? ???!? (i was actually kinda sad when we launched same day bc i thought we’d be completely overshadowed)
AI Engineer@aiDotEngineer

In @steipete's latest State of the Claw, he gives an update on 5 months of @OpenClaw and some behind the scenes on what it's like maintaining the fastest growing open source of all time: youtube.com/watch?v=zgNvts…

eg:
- 60x more security reports than curl
- a "Bullshit Taxonomy" of illegitimate reports
- Nation State attacks
- 12%-20% of skills contributions malicious
- contributors burning multiple Codex Pro per day
- academic FUD

Agents are both the product AND the attack vector. @simonw's Lethal Trifecta is not solved.

Come for Pete's recommendations, what OpenClaw is doing on security, the OpenClaw Foundation roadmap, and then the subsequent audience Q&A with @swyx on taste, dreaming, and OpenAI.

28
5
201
31K
Mike Carter | Building with AI@AIProfitStackHQ·
A team I know cut their inference bill by more than half by moving routine tasks off the frontier tier onto cheaper open weights models. The win was not picking a new default. It was matching each task to the model that fits.
0
0
0
20
Mike Carter | Building with AI@AIProfitStackHQ·
Gemini 3.1 Pro is leading 13 out of 16 major benchmarks — at roughly one-third the API cost of its closest competitors.

At the same time, Z.ai dropped GLM-5.1 under MIT license. It can stay aligned on a single task for 8 hours and handle thousands of tool calls without losing context.

If you're still defaulting to the same model you used 12 months ago purely out of habit, you're leaving both performance and margin on the table. The cost curve keeps dropping. That's a system design problem now, not just a budget one.
1
0
1
59