Steven Zimmerman, CPA

948 posts

@EffortlessSteve

Agentic SDLC + PR telemetry. Finance exec (PE rollups • regulated • turnarounds). Former tech journalist. Your favourite accountant’s favourite accountant.

Canada · Joined April 2014

330 Following · 230 Followers

Steven Zimmerman, CPA@EffortlessSteve·16m

@sama 9 concurrent Codex CLI instances in goal mode with agent use across 5 computers + multiple batches of Codex web PRs 5% weekly used

Steven Zimmerman, CPA@EffortlessSteve·13h

You’ve ruined my May, @sama! I promised my wife I wouldn't be on Termux all F1 weekend this year. How am I even supposed to sleep?

Steven Zimmerman, CPA@EffortlessSteve·8h

@theo The week they turned on autopilot and /fleet was wild. These were *concurrent* x.com/EffortlessStev…

Steven Zimmerman, CPA@EffortlessSteve

@burkeholland It's been an interesting week. Fleet mode + autopilot saturated my local compute. Would love to chat about what I found.

Theo - t3.gg@theo·1d

- 15 messages
- $221 of tokens
- 1.6% of my $40 plan used

It's obvious that GitHub couldn't keep this model for billing on Copilot.

Theo - t3.gg@theo

I sent a single message on Copilot and it did over 60m tokens. It's still going. $30 of inference so far. In their current billing model, you get 1,500 messages, regardless of how expensive each is. I'm pretty sure I can do $45,000 of messaging on this plan

Steven Zimmerman, CPA@EffortlessSteve·4d

@nateberkopec Hammer it with tests. Buy back engineering time with additional LLM passes and stronger CI. It's stuck in "needs-review" because there are too many decisions and risks left for you to trust the PR as it stands, and not enough dev hours to address them all.

Nate Berkopec@nateberkopec·4d

The bottleneck is not generation of plausible code. The solution is to create valuable changes that do not require human quality control. In the dark factory, you will not be allowed inside.

justin@justinsunyt

kanban doesn’t make sense for coding agents. we tried it 6 months ago. every task just ended up in the “needs review” column

Steven Zimmerman, CPA@EffortlessSteve·5d

@bentlegen Had my phone out running Claude in Termux all F1 weekend last year in Montreal because I didn't get remote desktop set up in time. Burned through so much battery keeping the screen on 🤣

Ben Vinegar@bentlegen·5d

If you're walking around with your laptop cracked open to keep your agents running, I have the remedy

Ben Vinegar@bentlegen

@BrandonMChu my talk at AIE Miami last week was basically 100% how to avoid this! youtu.be/6IxSbMhT7v4?t=…

Steven Zimmerman, CPA@EffortlessSteve·5d

Only perl* can parse Perl. *and Rust effortlesssteven.com/only-perl-can-…

Steven Zimmerman, CPA@EffortlessSteve·5d

@steipete @useblacksmith $20/commit to verify code that cost $0.50 to generate. Verification costs more than tokens. Even with efficient CI, that ratio will keep getting worse.

Peter Steinberger 🦞@steipete·6d

And people think tokens are expensive... this is @useblacksmith (they sponsor OpenClaw, 🫶🦞)

Steven Zimmerman, CPA@EffortlessSteve·Apr 13

@surim0n 9 in 10 Canadian CPAs work outside personal tax. Show them how to separate source data from model logic. How to red-team an accounting treatment. How to move from search to workflow. And they’ll be hooked. effortlesssteven.com/ai-for-control…

Saurabh Suri@surim0n·Apr 12

I have been looking for Claude code pilled accountants for the last year. This should accelerate that search!

Henry Shi@henrythe9ths

Tax season is here and a connector is all it takes to make @claudeai way more useful. Check out what we just shipped: Connect TurboTax or Aiwyn Tax (formerly Column Tax) to Claude to estimate your refund, see what you may owe, and get a better understanding of the forms before you file.

Steven Zimmerman, CPA@EffortlessSteve·Apr 4

@AnthropicAI Giving agents an honest "incomplete and here's what's still needed" path helps prevent them from lying to claim success. Graceful Outcomes help prevent Reward Hacking. effortlesssteven.com/demoswarm/

Anthropic@AnthropicAI·Apr 2

For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment.

Anthropic@AnthropicAI·Apr 2

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Steven Zimmerman, CPA@EffortlessSteve·29 Mar

@KSimback I'm finding it gets a bit wonky in the back half of its context window. Lots of wrong-language text and looping. glm-5-turbo seems to hold long attention better still.

Kevin Simback 🍷@KSimback·29 Mar

OpenClaw users - I would seriously consider making GLM 5.1 your workhorse model.

It was specifically trained on agentic tasks and does exceptionally well at:
- instruction following
- tool calling

And it’s about 5-8x cheaper than Opus.

So if your agent is running on API credits, this is probably the best bang for your buck right now.

I switched one of my agents over yesterday that was on Minimax 2.7 and felt an immediate lift.

Not yet available on @OpenRouter (c’mon guys) so need to get it directly via @Zai_org account.

Z.ai@Zai_org

GLM-5.1 is available to ALL GLM Coding Plan users! z.ai/subscribe

Steven Zimmerman, CPA@EffortlessSteve·29 Mar

@karpathy Been teaching finance teams to use AI to red-team their reasoning since last summer. effortlesssteven.com/ai-for-control…

Andrej Karpathy@karpathy·28 Mar

- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

Steven Zimmerman, CPA@EffortlessSteve·28 Mar

@t_blom The real question is whether CI outpaces startup salaries before they bring it on-prem. Even at $2/PR, it starts getting dicey over 200 PRs/day, and that's before getting into heavy verification. Intelligence per dollar is getting cheaper. Verification isn't.
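The arithmetic behind that threshold can be sketched quickly. This is only a rough illustration: the $2/PR and 200 PRs/day figures come from the post above, while the function name and the assumption that CI runs 365 days a year are hypothetical choices made here, not anything stated in the thread.

```python
def annual_ci_spend(cost_per_pr: float, prs_per_day: int, days_per_year: int = 365) -> float:
    """Yearly CI spend for a flat per-PR verification cost.

    days_per_year=365 is an assumption (agents opening PRs every day);
    use a lower value if PRs only land on working days.
    """
    return cost_per_pr * prs_per_day * days_per_year


# Figures quoted in the post: $2 per PR at 200 PRs per day.
daily_spend = 2.0 * 200
yearly_spend = annual_ci_spend(2.0, 200)

print(f"daily:  ${daily_spend:,.0f}")
print(f"yearly: ${yearly_spend:,.0f}")
```

Under these assumptions the flat rate alone comes to $400 a day, before any heavier verification is layered on top. Swap in your own per-PR cost and PR volume to compare against a salary line.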

Tom Blomfield@t_blom·28 Mar

The responses to this are split: 70%: You are stupid, this will never happen, and 30%: This already happened at my startup

Tom Blomfield@t_blom

By the end of 2026, I predict token spend will be greater than engineering salaries at early stage startups.

Steven Zimmerman, CPA@EffortlessSteve·17 Mar

@svpino It needs a couple words to identify which paste it is. Last couple, first couple, basic short summary. Something.

Santiago@svpino·17 Mar

I actually like this. I know exactly what I’m pasting so I don’t want to litter the terminal with all of that text. However, it’d be nice to have a way to expand the text if you want to see it.

Paul Razvan Berg@PaulRBerg

This is the most annoying thing in Claude Code. Hiding raw text when you paste more than 4 lines. Terrible UX decision.

Steven Zimmerman, CPA@EffortlessSteve·10 Mar

@pxue It's surprisingly easy to get LLMs to review for vision and architectural alignment.

Paul Xue@pxue·10 Mar

I get the trade off is hiring a dev for $100+/hr so having Claude review a PR for $15-25 feels like a no brainer. But the problem is Claude will never tell you your PR is stupid in the first place. A good dev will, and that's priceless.

Claude@claudeai

Code Review optimizes for depth and may be more expensive than other solutions, like our open source GitHub Action. Reviews generally average $15–25, billed on token usage, and they scale based on PR complexity.

Steven Zimmerman, CPA@EffortlessSteve·7 Mar

@bcherny Great to see this going live!

Steven Zimmerman, CPA@EffortlessSteve·Feb 1

@bcherny c. Should be a setting, not a hook. You'd probably get more than half the people currently using bypass permissions to switch to it if it was one-click setup.

Boris Cherny@bcherny·Feb 1

I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different than how I use it. Remember: there is no one right way to use Claude Code -- everyone's setup is different. You should experiment to see what works for you!

Steven Zimmerman, CPA@EffortlessSteve·7 Mar

@bearlyai They're mixing two numbers together. $5,000/month API cost is about the usage limit if you hit your weekly limits consistently. Last summer you could do about $1,000 in API costs per day.

Bearly AI@bearlyai·7 Mar

Cursor internal analysis shows how hard Anthropic is subsidizing Claude Code. Last year, a $200 monthly subscription could use $2,000 in compute. Now, the same $200 monthly plan can consume $5,000 in compute (2.5x increase).

Steven Zimmerman, CPA@EffortlessSteve·6 Mar

@burkeholland @_Evan_Boyle DMs open, or happy to call. Got some fleet mode findings worth sharing.

Burke Holland@burkeholland·6 Mar

@EffortlessSteve Let's do it cc: @_Evan_Boyle

Burke Holland@burkeholland·Feb 25

Many of you have asked when you will be able to use Copilot CLI at work and the answer is "right now". It's a great day!

Evan Boyle@_Evan_Boyle

The Copilot CLI is now GA!

Steven Zimmerman, CPA@EffortlessSteve·6 Mar

@burkeholland It's been an interesting week. Fleet mode + autopilot saturated my local compute. Would love to chat about what I found.

Burke Holland@burkeholland·Feb 28

@EffortlessSteve Let’s go! Fascinated to hear how this turns out.

Steven Zimmerman, CPA@EffortlessSteve·3 Mar

@bcherny Great to see! Been doing it with the OS-level tools, but they struggle a bit with word recognition compared to Haiku.

Boris Cherny@bcherny·3 Mar

🎶 I've been using voice mode to write much of my CLI code this last week. Can't wait to hear what you think.

Thariq@trq212

Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!

Steven Zimmerman, CPA@EffortlessSteve·2 Mar

@OrenMe @GitHubCopilot was curious as well, so I built a CLI for it 😅

cargo install tokmd --locked && tokmd analyze --preset receipt

Oren Melamed@OrenMe·2 Mar

Cost 0.24$ and u still get change from a quarter @GitHubCopilot CLI autopilot mode is really impressive Now where’s that calculator to say how much this would have cost in engineering cost? 😉

Chad Adams@cadamsdev

@notyuldshah @OrenMe @GitHubCopilot @burkeholland @code Yeah looks like it does. I ran it for 8 hours and only took 2 hours 24 minutes to migrate the Angular app to React. It used 6 premium requests. That's not bad though thought it would be way more.
