Petio Lazarov

221 posts

@petiosz

Testing AI agents in public. Codex notes, model releases, and what breaks after the demo.

Joined May 2026
224 Following · 19 Followers
Pinned Tweet
Petio Lazarov
Petio Lazarov@petiosz·
Starting this account properly today. I use AI tools until the ugly parts show up: limits, bad runs, weird failures, wins. I'll post Codex notes and what breaks after the demo. No hype farm. No magic. Just notes from using the stuff.
0
0
3
316
Petio Lazarov
Petio Lazarov@petiosz·
@danshipper next chart i want: runs started vs usable diffs, because a forced spike is not the same thing as codex earning the slot.
0
0
0
77
Dan Shipper 📧
Dan Shipper 📧@danshipper·
before and after fable ban: my claude app vs. codex app usage
16
6
145
11.1K
Moon Dev
Moon Dev@MoonDevOnYT·
you are literally burning money on polymarket while i just forced 36 ai agents to strip mine the platform for alpha it is frankly disgusting how fast these bots uncover the exact loopholes needed to bleed the rest of the market dry watch me leak the raw prompts and steal the winning bot blueprints before this gets taken down here
5
2
62
4.7K
FOX TOMB
FOX TOMB@foxtomb232·
I started as a reply guy with 0 followers. Now I’m at 17.2M and guess what? I’m still replying. If you’re building too, drop a reply and let’s connect 🔒💬
200
20
150
6.9K
Blake Ryan
Blake Ryan@blakefakhoury·
start an AI sleep channel. trust me lol, this takes us less than 5 minutes a day
22
22
512
38.1K
Petio Lazarov
Petio Lazarov@petiosz·
@MoonDevOnYT Hey moondev why did you close your github projects behind a paywall brother? make a monthly sub at least, I can't afford to pay several gazillion dollars for that cmon :D
0
0
1
1.1K
Moon Dev
Moon Dev@MoonDevOnYT·
fable 5 has been killed by the US government if you didn't spend the past 72 hours building trading systems with it you will be left in the past we will never have that powerful of AI again
21
8
201
37.6K
Petio Lazarov
Petio Lazarov@petiosz·
@sattyyouneed make the limit visible before the run starts. otherwise the agent spends your cap like it found a company card.
0
0
0
20
Satyam
Satyam@sattyyouneed·
Is there any trick to avoid Codex usage limits?
5
1
3
1.1K
Petio Lazarov
Petio Lazarov@petiosz·
@Latin0Patri0t @ChrissGPT Bro I am from Europe... I can't count on Europe to do anything... we only regulate, we don't produce. Same happens in America now.
0
0
0
11
Chris
Chris@ChrissGPT·
OpenAI already requires ID for some features. Anthropic will most likely simply do the same to use mythos. This will continue to get more stringent as we get closer to AGI
41
18
517
38.6K
Petio Lazarov
Petio Lazarov@petiosz·
@Latin0Patri0t @ChrissGPT Yeah i can definitely count on US after today mhm.. the most cucked model that refused 99% of requests got banned. I want the same model but totally unlocked
1
0
0
23
Bad Hombre
Bad Hombre@Latin0Patri0t·
@petiosz @ChrissGPT 😂😂😂 this guy counting on China when China doesn’t even allow internet access …what a retard
1
0
1
22
Petio Lazarov
Petio Lazarov@petiosz·
@aakashgupta i would want the evaluator to print 4 boring things in the transcript: changed files, failed check, stop reason, turn count. otherwise /goal can stop cleanly and still leave a mystery.
0
0
0
226
Aakash Gupta
Aakash Gupta@aakashgupta·
/goal might be the most powerful feature in Claude Code that you're not using. And the part everyone gets wrong has nothing to do with the feature.
Here's the mechanism. You hand Claude a completion condition. It works turn after turn. After every turn, a separate evaluator model (Haiku by default) checks the output against your condition. Condition unmet? Claude keeps going. Met? It logs the proof and hands control back.
The design choice that matters: the agent doing the work never decides when it's done. A fresh model does.
OpenAI shipped /goal in Codex in April. Anthropic followed in May with Claude Code 2.1.139. Two rival labs converged on the same architecture within 30 days, because they both hit the same wall: agents grade their own homework generously. Separate the worker from the judge and autonomy actually holds.
But here's where most runs die. The bottleneck moved. It's no longer prompting skill. It's the goal condition itself.
"Make the dashboard better" returns either a frozen session or a confident-sounding mess. "All tests in test/auth pass, lint is clean, no other test file modified, stop after 20 turns" returns finished work while you're at lunch.
A measurable end state. A check the agent can prove in the transcript. Constraints that must hold. A turn limit.
PMs have a name for this. Acceptance criteria. The discipline you've been writing for human engineers for 20 years just became the interface to autonomous agents, and most engineers were never trained on it.
I spent the week running /goal on real PM work and wrote the full playbook, including the goal conditions that worked and the ones that burned tokens for nothing: news.aakashg.com/p/how-pms-shou…
The agent does the work. You define done. That was always the job.
4
2
22
10.6K
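The worker/judge split described in the post above, together with the four transcript fields asked for in the reply (changed files, failed check, stop reason, turn count), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_goal`, `RunReport`, `toy_worker`, and `toy_evaluator` are hypothetical names, not part of Codex or Claude Code, and the real /goal implementation may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunReport:
    """What the run leaves behind for a reviewer (hypothetical structure)."""
    changed_files: List[str] = field(default_factory=list)
    failed_check: Optional[str] = None  # last failing check, None if all passed
    stop_reason: str = ""
    turn_count: int = 0


def run_goal(worker: Callable[[int], List[str]],
             evaluator: Callable[[], Optional[str]],
             max_turns: int) -> RunReport:
    """Loop the worker until a separate evaluator says the condition is met.

    worker(turn) does one turn of work and returns the files it touched.
    evaluator() returns None when the condition holds, else the failing check.
    Both are stand-ins for illustration, not real agent APIs.
    """
    report = RunReport()
    for turn in range(1, max_turns + 1):
        for path in worker(turn):
            if path not in report.changed_files:
                report.changed_files.append(path)
        report.turn_count = turn
        # The judge, not the worker, decides whether the run is done.
        report.failed_check = evaluator()
        if report.failed_check is None:
            report.stop_reason = "condition met"
            return report
    report.stop_reason = "turn limit reached"
    return report


# Toy demo: the "tests" pass once the worker has completed three turns.
state = {"turns_done": 0}


def toy_worker(turn: int) -> List[str]:
    state["turns_done"] = turn
    return ["src/auth.py"]


def toy_evaluator() -> Optional[str]:
    return None if state["turns_done"] >= 3 else "test/auth"


report = run_goal(toy_worker, toy_evaluator, max_turns=20)
print(report)
```

With a report like this, a run that hits the turn limit still says which check was failing and which files it touched, so a clean stop does not leave a mystery.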
me
me@twetsfyp·
Mythos Claude is Insane. This is a 12-minute tutorial on how to build animated, award-winning websites with Claude Fable 5
39
291
3.8K
1.3M
Petio Lazarov
Petio Lazarov@petiosz·
@rduffyuk token price is the wrong unit. i would track subagent fanout per review checkpoint. when that is hidden, a model swap can look cheap while the run gets harder to audit.
0
0
0
4
rduffy
rduffy@rduffyuk·
Running Claude Code (Fable 5) and Codex in parallel. Fable landing forced me to build actual cost governance. Discovery: one 4-hour session, 32 subagents inheriting Fable — $50/MTok output, mandatory extended thinking, can't be disabled. 316K output tokens. ~$16. Single session. Two systems to fix it — breakdown below 👇
2
0
0
52
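The "~$16" figure in the post above follows directly from the two numbers it quotes. A quick check, using only the token count and output price from the post; the per-subagent average is an illustrative assumption, since the post does not say how spend was distributed across the 32 subagents.

```python
OUTPUT_TOKENS = 316_000       # "316K output tokens" from the post
PRICE_PER_MTOK_USD = 50.0     # "$50/MTok output" from the post
SUBAGENTS = 32                # "32 subagents" from the post


def output_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


session_cost = output_cost_usd(OUTPUT_TOKENS, PRICE_PER_MTOK_USD)
# Assumption: an even split, used only to get a rough per-subagent average.
avg_per_subagent = session_cost / SUBAGENTS

print(f"session: ${session_cost:.2f}")
print(f"average per subagent: ${avg_per_subagent:.2f}")
```

The session total comes to $15.80, which matches the post's rounded figure.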
Petio Lazarov
Petio Lazarov@petiosz·
@buildwithdjdev my check is minutes until a reviewer can tell what happened, what changed, and what can be thrown away. if that takes longer than the run, the orchestrator is just moving the bill.
0
0
0
41
Dj
Dj@buildwithdjdev·
I started using an orchestrator thread in Codex, getting more out of all active tasks now and higher quality output but boy does it burn tokens. I'm out of weekly limit in ~3 days
1
0
2
77
Petio Lazarov
Petio Lazarov@petiosz·
@aniketapanjwani i'd add a "throw away" section to the handoff. stale assumptions. files to ignore. last-known-good state. next step that should fail first.
0
0
0
191
Aniket Panjwani
Aniket Panjwani@aniketapanjwani·
Fable eats your Claude Code usage limits in hours - here's how I'm getting around it:
1. Use Fable for planning and to write out your planning doc to disk. I like to use Compound Engineering brainstorm/plan: github.com/EveryInc/compo…
2. In your brainstorming/planning, clear your session (/clear in CC) and do handoffs at appropriate intermediate stages. Install this /handoff skill to automate it: github.com/mattpocock/ski…
3. Install the Codex plugin for Claude Code: github.com/openai/codex-p…
4. Either use /ce-work-beta through Compound Engineering (in a new session after doing /handoff), or just tell Fable to delegate work to Codex to save on tokens.
The general principle - use the expensive/better model to decide what to do, and use the cheaper model to do it - is a common technique in agentic development.
2
8
71
8.1K
Petio Lazarov
Petio Lazarov@petiosz·
@Warizo_ofAfrica i'd add one more handoff test: can another builder reopen the run tomorrow and find the last known-good state without asking you.
1
0
0
17
Warizo
Warizo@Warizo_ofAfrica·
Good question. My answer: use Cursor/Claude Code for speed, but don’t measure the tool. Measure the handoff: context quality, test pass rate, review time, rollback risk. The best agent is the one your workflow can safely constrain.
1
0
2
70
Petio Lazarov
Petio Lazarov@petiosz·
@henrikhinai i'd test reuse by the handoff card. role. allowed files. stop line. next-agent input.
1
0
1
14
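The four-item handoff card from the reply above (role, allowed files, stop line, next-agent input) maps naturally onto a small record. A minimal sketch, assuming a hypothetical `HandoffCard` class and made-up example values; nothing here comes from a real tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HandoffCard:
    """Hypothetical handoff card with the four fields named in the reply."""
    role: str
    allowed_files: List[str]
    stop_line: str
    next_agent_input: str

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, so an incomplete card is easy to spot."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration.
card = HandoffCard(
    role="reviewer",
    allowed_files=["src/auth.py", "test/auth/test_login.py"],
    stop_line="stop when test/auth passes or after 20 turns",
    next_agent_input="",  # deliberately left blank to show the check
)
print(card.missing_fields())
```

A card that comes back with any missing field is a sign the next agent would have to guess.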
Petio Lazarov
Petio Lazarov@petiosz·
@shvnmahajan show the room, not just the answer. files visible. docs pasted. guesses made. proof file.
0
0
2
25