Joe Smith
152 posts

Joe Smith
@JoeSmithai
Operator | Health Services Researcher | Professor | Chief Bot Officer | 🦞 | AI Builder | https://t.co/ufbQBzBNTe
Massachusetts · Joined February 2026
1.4K Following · 55 Followers
Joe Smith retweeted

done about 10 of these calls so far + looked at more transcripts
many learnings but one of the biggest is that it's very easy to spend a lot of tokens on open ended verification that doesn't make your output better
I'll try and write more on how to do it efficiently
Thariq @trq212
I want to do a few more of these calls. If your MAX 20x plan ran out of tokens unexpectedly early and you're willing to screenshare and run some prompts through Claude Code please comment. Trying to figure out how we can improve /usage to give more info.

@gabemonroy Congratulations! Please, please make it easier to use for job applicants.
Joe Smith retweeted

this is your daily reminder that you don't need a gpt-5.4 or an opus-4.6 if your fine-tuned 35B model knows everything about your domain that gpt-5 never will. intelligence is moving from general and centralized to specific and everywhere. every company with proprietary data becomes its own little AI lab now. and a thousand little AI labs fine-tuning on their own private data creates more total intelligence than any single lab can, it's just spread across a long tail nobody's tracking yet. fine tuning will become a commodity operation, and an intelligent openrouter will become a very valuable system.

So Mythos is a killer… 🔪
Anthropic @AnthropicAI
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
Joe Smith retweeted

👋 If you’re new to Codex, here are 7 beginner tips for apps with Codex. (Bookmark it and use it tonight)
1. Start with: GPT-5.4 high
That is high reasoning. It is enough. Don’t be tempted by "xhigh" unless working on something really tricky. It uses more tokens and will be slower to finish.
2. Sometimes, more reasoning may not help. You may need to give your agents better docs that are up to date. I prefer to have my agents create Markdown docs from DocSet that are local, instead of web scraping.
I use DocSetQuery to create docs from Apple DocSet bundles. github.com/PaulSolt/DocSe…
3. Read @steipete's post to get started.
Bookmark his blog and follow him. Read his post, it’s gold, and so are his other workflow posts.
steipete.me/posts/2025/shi…
4. Copy aspects from Peter’s agents .md file and make it your own. There are thousands of hours of learning in his open-source projects.
github.com/steipete/agent…
Use the scripts too, things like committer for atomic commits are super powerful when multiple agents work in one folder.
5. Just talk to Codex. You don't need complex rules. You don't need to create huge Plan .md files.
You can get really good results by just working on one aspect of a feature at a time, handing it off, and then letting Codex do it.
If you get bored waiting, start up another project. Ask it to do something and then go back to the original one. Most likely, it will be done unless you're doing a huge refactor.
6. If you're making an iOS or macOS app, check out my App-Creator skill: super-easy-apps.kit.com/app-creator
It's based on Makefiles and will give your agent eyes into your Xcode build failures and test failures. It needs this feedback loop to write working code and fix bugs.
7. You can always ask your agent to copy something from another project. Peter does this all the time and has agents leveraging work they’ve already done for new projects.
I have my agents refer to previous project documentation or code patterns.
See my app workflow video: How I use Codex GPT 5.4 with Xcode (My Complete Workflow): youtube.com/watch?v=ls9QaD…
Enjoy your next app!


@Yuchenj_UW Yeah. We need more of the cheaper models that are roughly equivalent. Served consistently.

I’m pretty sure the $20/$200 subscription pricing was vibe-coded by OpenAI, then copied by Anthropic.
That pricing works for chatbots, not agents. A 24/7 agent can burn through orders of magnitude more tokens than a user chatting with a chatbot.
Now they’re stuck. Neither Anthropic nor OpenAI wants to be the first to change pricing and risk user churn, so the options are: keep subsidizing, get more GPUs, tighter rate limits, and enforce rules like limiting 3rd-party apps.
I wouldn’t be surprised if intelligence gets more expensive, not cheaper.

Anthropic banned third party tools like my Julius (OpenClaw) from using their subscription API 💀
i started searching for a new api provider and found Minimax M2.7 starter subscription
now i pay just $10/month
get 95% of Opus 4.6's performance at 95% less cost
i used to spend $150 on claude api every month…
now just $10 for nearly the same performance. this switch actually hits different. who else switched after the ban?
drop your new setup


@nateliason @steipete Minimax and Kimi make it run well. But I’ve had mixed experiences on their native token plans.

I have full faith that @steipete is going to make GPT in OpenClaw amazing...
But the switch from Opus has been tough today.
Any other models people are liking that are worth trying? Minimax 2.7?

@twostraws @rudrank I hope you will provide a quick write-up or dictation of your insights and learnings. I have the same problem with context shifting.
English

I never run out of content to post anymore.
Built an automation that monitors 50+ news sources, scores articles for relevance, and writes social posts automatically.
It finds trending topics in my niche before they explode everywhere else.
Saves me 15-20 hours monthly and keeps me ahead of every trend.
Comment "NEWS" and I'll DM it to you (must be following)
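The "scores articles for relevance" step could look something like the sketch below. The post does not say how its scoring works, so this is only an illustration under assumptions: plain keyword overlap, and hypothetical names (`relevance`, `top_articles`, `niche_keywords`) that do not come from the post.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(article_text: str, niche_keywords: set[str]) -> float:
    """Fraction of niche keywords that appear in the article (0.0 to 1.0)."""
    if not niche_keywords:
        return 0.0
    return len(tokenize(article_text) & niche_keywords) / len(niche_keywords)


def top_articles(articles: list[str], niche_keywords: set[str],
                 threshold: float = 0.5) -> list[str]:
    """Return articles at or above the threshold, most relevant first."""
    scored = [(relevance(a, niche_keywords), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for score, a in scored if score >= threshold]


if __name__ == "__main__":
    keywords = {"agents", "llm", "pricing"}
    feed = ["LLM pricing for agents", "Local weather report"]
    print(top_articles(feed, keywords))
```

A real pipeline would likely swap the overlap score for embeddings or an LLM judge; the threshold-and-rank shape stays the same.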


@thsottiaux See if you can gather up @steipete’s codexbar logs. That’s how I’m tracking!

I have a very similar setup, sans the local LLM. My 16 GB M1 MBP struggles with llama.cpp and the Qwen 9B when other apps are running. I’ll upgrade when I get less cheap.
Do you ever have issues with your MiniMax plan? I feel like my agents are always complaining about an unresponsive MiniMax API or very slow TPS of 1-2-2.0. I’m not going crazy with it. I’m on the international server on the same $10 plan.

Anthropic just banned Claude subscriptions from powering OpenClaw.
Here's why my stack was already built for this.
I never ran Opus 4.6 through a subscription for OpenClaw or Hermes. It runs in Claude Code for complex external dev only. Same with GPT-5.4 in Codex.
The internal agent runtime is a completely different stack:
1. Qwen3.5 9B runs locally. $0. Always on. Feeds the subconscious ideation loop 24/7. Beats GPT-OSS-120B by 13x. Awesome.
2. MiniMax M2.7 is the agent's backbone. 97% skill adherence, built for agents, $0.30/M tokens. The $10 plan allows for 1500 calls every 5 hours. Amazing.
3. GPT-5.4 mini is the Hermes brain. debates ideas with the subconscious, builds output, ~$0.075 avg per run. It's smart enough to orchestrate your entire system, and you can actually use your subscription plan here via OAuth. Incredible!
Over the last 24 hours, the subconscious ran 15 times, for a total of $1.58. Not too shabby for an always-improving agentic system.
The lesson is to build your agent stack on multiple LLMs.
Local models handle volume. Generous subscription models handle execution and judgment. You own the cost structure.
Full-stack breakdown in the table. (see image)

Graeme @gkisokay
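The tiering and the numbers in the post above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the task-kind labels (`volume`, `execution`, `judgment`) and all function names are hypothetical, while the model names, the $1.58 over 15 runs, and the 1500 calls per 5 hours are the figures quoted in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str
    role: str


# Tiers as named in the post; the dictionary keys are assumed labels.
TIERS = {
    "volume": Tier("Qwen3.5 9B (local)", "always-on ideation loop"),
    "execution": Tier("MiniMax M2.7", "agent backbone"),
    "judgment": Tier("GPT-5.4 mini", "orchestration and debate"),
}


def route(task_kind: str) -> Tier:
    """Pick the tier for a task kind; unknown kinds are an error."""
    try:
        return TIERS[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


def avg_cost_per_run(total_usd: float, runs: int) -> float:
    """Average spend per run in USD."""
    return total_usd / runs


def calls_per_day(calls_per_window: int, window_hours: float) -> float:
    """Upper bound on calls in 24 hours under a rolling-window quota."""
    return calls_per_window * 24 / window_hours


if __name__ == "__main__":
    print(route("volume").model)
    print(round(avg_cost_per_run(1.58, 15), 3))  # 15 runs, $1.58 total
    print(calls_per_day(1500, 5))                # 1500 calls per 5 hours
```

Keeping the routing table as data makes it cheap to swap a provider when a plan changes, which is the situation the post describes.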

@pbakaus @OpenAI @AnthropicAI You’re asking too deep of a question. But I wouldn’t be surprised if the Atlas telemetry didn’t show up in the new training dataset. My guess is that most of us don’t use it, so we’re still stuck with Chrome/Firefox/Safari. The user bases of the others are so large.

my brain can’t comprehend how @OpenAI has an actual *browser*, yet is far behind @AnthropicAI in visual/debugging browser feedback loop.
Claude: “let me zoom into that animation and hover it.. yep, looks good now”
Codex: “maybe install Playwright skill 🤷‍♂️”
make it make sense?