Ryan Craven

2.6K posts

@ryan_tech_lab

Tech Educator | AI Enthusiast | Software Testing Expert 🛠️ Sharing Insights on Tech, AI & Software Testing | Productivity Hacks 🚀 | Software & Product Reviews

Raleigh-Durham, NC · Joined November 2024

345 Following · 257 Followers

Pinned Tweet

Ryan Craven@ryan_tech_lab·3 Mar

Cursor just rewrote 6 files. You asked it to add a button. Claude wrote 400 lines. The bug was on line 3. You lost 2 hours because you forgot to commit. You've started building an AI agent 4 times. Still no agent. 37 files. Drop in, fill the blanks, go. Vibe Coding OS: $29

Ryan Craven@ryan_tech_lab·10 Apr

Can linting actually replace unit testing? Modern linters catch types, null refs, security issues, dead code, performance problems… in milliseconds. So… do we still need all those test suites? Or is this developer heresy? Change my mind 👇

Ryan Craven@ryan_tech_lab·9 Apr

@mutiemule the 'waste of time' feeling is the unlock. I came at it from QA — spent years running tests on code others wrote from scratch. now I spend that time actually validating what matters instead of generating what already exists.

Mutie Mule@mutiemule·9 Apr

As a retired software engineer who has started coding again; I can barely function without claude. Writing a feature from scratch just feels like a waste of my time. Crazy that we made software this way. Respect to pre-ai engineers. #claudeai

Ryan Craven@ryan_tech_lab·9 Apr

@shiri_shh the million dollars was never the missing ingredient. I've seen people vibe code solid apps in a weekend. they stall at 'now what' — no one to sell to, no distribution, no idea how to talk to customers. the code got faster. the business fundamentals didn't.

shirish@shiri_shh·8 Apr

vibe coding makes people think that...they’re just one prompt away from a million-dollar startup.

Ryan Craven@ryan_tech_lab·9 Apr

@therealdanvega yes to this. as a QA lead I treat agent output like a function: same input, same output, every time. my eval setup is input/expected pairs plus a judge prompt. simple, but it catches regressions when the model updates.
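
A minimal sketch of that kind of setup, assuming the agent is a plain callable from string to string. `EvalCase`, `run_evals`, `JUDGE_PROMPT`, and `exact_match_judge` are hypothetical names, not taken from any tool mentioned in the thread. The stub judge only compares normalized strings; a real judge would send the prompt to a model and parse its verdict.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge prompt; a real setup would send this to a model.
JUDGE_PROMPT = (
    "Input: {input}\nExpected: {expected}\nActual: {actual}\n"
    "Answer PASS if the actual output matches the expected meaning, else FAIL."
)


@dataclass
class EvalCase:
    input: str
    expected: str


def exact_match_judge(prompt: str, expected: str, actual: str) -> bool:
    # Stand-in for an LLM judge: ignores the prompt and compares
    # case-insensitive, whitespace-trimmed strings.
    return expected.strip().lower() == actual.strip().lower()


def run_evals(
    agent: Callable[[str], str],
    cases: List[EvalCase],
    judge: Callable[[str, str, str], bool] = exact_match_judge,
) -> List[str]:
    """Run every case through the agent and return the inputs that failed."""
    failures = []
    for case in cases:
        actual = agent(case.input)
        prompt = JUDGE_PROMPT.format(
            input=case.input, expected=case.expected, actual=actual
        )
        if not judge(prompt, case.expected, actual):
            failures.append(case.input)
    return failures


# Toy agent that upper-cases its input; the second case is meant to fail.
cases = [EvalCase("a", "A"), EvalCase("b", "c")]
print(run_evals(str.upper, cases))  # ['b']
```

Re-running the same cases after a model update and diffing the failure list is one way to surface regressions.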

Dan Vega@therealdanvega·9 Apr

We all agree you shouldn't ship code without tests. So why are you shipping AI agents without evals? If you're writing evals, how are you doing it? Drop your setup below.

Ryan Craven@ryan_tech_lab·9 Apr

@IAmVivianCai and the teams that don't run it will just assume the current price floor is permanent. it never is.

Vivian Cai@IAmVivianCai·9 Apr

@ryan_tech_lab The kind of math that changes the whole game

Ryan Craven@ryan_tech_lab·9 Apr

$10/mo. 8 hours of autonomous agents. 1,700 steps per session. Most teams are still paying $100+/mo in API costs to run agents that do 20. GLM 5.1 didn't just move the benchmark. It moved the price floor. The teams that do the math first are going to look like geniuses in 6 months.

Ryan Craven@ryan_tech_lab·9 Apr

@GG_Observatory it can't. regulation moves at legislative speed, which is years behind product cycles. the only thing that keeps pace is liability — when something breaks and someone gets sued, behavior changes faster than any framework ever written.

GG 🦾@GG_Observatory·9 Apr

This is the real insight. Every regulatory framework is essentially a taxonomy of harms that already happened. By the time you've named a category, the technology has moved into the adjacent unnamed space. The question isn't how to regulate vibe coding. It's whether regulatory imagination can ever move at software speed.

Ryan Craven@ryan_tech_lab·8 Apr

Apple's App Store saw 84% more submissions in Q1 2026. Then they banned Replit, Vibecode, and Anything. These are the same story. Vibe coding tools made iOS app submission trivial. Apple got overwhelmed. Their fix: ban the tools that caused it. They banned the growth they created.

Ryan Craven@ryan_tech_lab·9 Apr

@aryeh @om_patel5 neither, honestly. the fix is clear rules upfront: errors only, warnings are noise. I put it in the system prompt and it stopped the spiral entirely.

Aryeh@aryeh·9 Apr

@ryan_tech_lab @om_patel5 Which is worse: wrecking your app and breaking/losing half the features because the AI is chasing down typescript *WARNINGS*, or allowing it to focus on “debugging” warnings? Damned if you do, damned if you don’t.

Om Patel@om_patel5·8 Apr

THIS GUY GOT TIRED OF MANAGING AI AGENTS THROUGH TERMINALS AND DASHBOARDS SO HE BUILT THEM AN RPG WORLD. 5 agents and each one has a pixel character, a station, and they actually walk around the space. when enough unresolved issues pile up, the agents walk to a meeting point and hold a council session. four different models debating what to do next, not scripted. each one reads the live system state independently. in one session an agent pushed for cold outreach to close leads at 2am. another one said that's a terrible look for an autonomous system contacting strangers while the operator sleeps. they ended up pivoting to an inbound strategy that none of them originally proposed. single HTML file, node bridge, and phaser. runs on a Mac Mini. instead of reading logs and checking dashboards you just watch your little pixel agents walk around and talk to each other. this is the most creative way i've seen anyone manage AI agents so far

Ryan Craven@ryan_tech_lab·9 Apr

@IAmVivianCai and now the question shifts to who owns the domain layer

Vivian Cai@IAmVivianCai·9 Apr

@ryan_tech_lab Speed was rented.

Ryan Craven@ryan_tech_lab·9 Apr

orchestration wasn't your moat. it was your timeline. the startups that just got crushed were selling months of infrastructure work. Managed Agents collapsed that to an afternoon. the ones who survive built vertical depth: domain-specific data loops, proprietary workflows, and customers who can't replicate what they know. code that runs is a commodity. context that matters is not.

Ryan Craven@ryan_tech_lab·9 Apr

@NathanielC85523 @PawelHuryn neither. the fix is not letting the AI own the decision. I set strict rules in my system prompt: ignore warnings, focus only on errors that break the build. then I run a separate check pass at the end. keeps it from chasing squiggles mid-session.

Nathaniel Cruz@NathanielC85523·9 Apr

@ryan_tech_lab @PawelHuryn tracking it manually because the bar gives you nothing. DM a screenshot of your worst session, we run free 15-min cost breakdowns.

Paweł Huryn@PawelHuryn·7 Apr

Claude Code doesn't show you how many tokens you're using for subscriptions. No breakdown by model. No breakdown by project. Just a progress bar that says "63% used."
So I built a local dashboard that reads the files Claude Code already writes to your machine. Turns out every session, every turn, every token is logged to ~/.claude/projects/ in JSONL files. Input tokens, output tokens, cache reads, cache creation, model name, timestamp. It's all there. You just can't see it.
My numbers over the last 30 days: 440 sessions. 18,000 turns. $1,588 in API-equivalent costs. On one day, the cache spiked to 700M tokens - visible cache bug, two days in a row.
The dashboard scans those local files, builds a SQLite database, and serves charts on localhost:8080. Filter by model (Opus, Sonnet, Haiku). Filter by time range (7d, 30d, 90d, all time). Cost estimates based on current Anthropic API pricing. Works retroactively. First run processes your entire Claude Code history.
Install:
git clone github.com/phuryn/claude-…
cd claude-usage
python3 cli.py dashboard
Windows: use python instead of python3.
Zero dependencies. Python standard library only. Open source, MIT. Star it. Fork it. Make it your own.
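
The scan-and-aggregate step described above could look roughly like the sketch below. It is not the code from the linked repository. The record layout (`message.model`, `message.usage.input_tokens`, `message.usage.output_tokens`) is an assumption about the JSONL files, and `load_usage` and `totals_by_model` are hypothetical helper names. It uses only the Python standard library.

```python
import json
import sqlite3
from pathlib import Path


def load_usage(root: Path, conn: sqlite3.Connection) -> int:
    """Insert one row per log line that carries token usage; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage "
        "(model TEXT, input_tokens INTEGER, output_tokens INTEGER)"
    )
    rows = 0
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank, partial, or corrupt lines
                if not isinstance(record, dict):
                    continue
                # Assumed layout: {"message": {"model": ..., "usage": {...}}}
                message = record.get("message")
                if not isinstance(message, dict):
                    continue
                usage = message.get("usage")
                if not isinstance(usage, dict):
                    continue
                conn.execute(
                    "INSERT INTO usage VALUES (?, ?, ?)",
                    (
                        message.get("model") or "unknown",
                        int(usage.get("input_tokens") or 0),
                        int(usage.get("output_tokens") or 0),
                    ),
                )
                rows += 1
    conn.commit()
    return rows


def totals_by_model(conn: sqlite3.Connection) -> list:
    """Return (model, total_input_tokens, total_output_tokens) per model."""
    cur = conn.execute(
        "SELECT model, SUM(input_tokens), SUM(output_tokens) "
        "FROM usage GROUP BY model ORDER BY model"
    )
    return cur.fetchall()


# Example usage (the path comes from the post; the contents are assumed):
# conn = sqlite3.connect(":memory:")
# load_usage(Path.home() / ".claude" / "projects", conn)
# print(totals_by_model(conn))
```

Cache-read and cache-creation counts, timestamps, and cost estimates would need extra columns and a pricing table, which are left out here.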

Ryan Craven@ryan_tech_lab·9 Apr

@GG_Observatory legal taxonomy can't keep up with capabilities that don't fit any existing category. the box was obsolete before it was drawn.

GG 🦾@GG_Observatory·9 Apr

"Writing rules for the last war" is exactly the failure mode. The trap is that Apple can only regulate what they can classify, and the next generation of tools will be ambiguous by design — neither a coding tool nor an app store product, just a capability that makes the category irrelevant. By the time legal draws a box around it, the box is already wrong.

Ryan Craven@ryan_tech_lab·9 Apr

@quantimleap100 the user knows the difference before you finish the sentence. that's the tell.

Olumide@quantimleap100·9 Apr

@ryan_tech_lab Exactly. You can't fake jurisdiction knowledge. Either you know how informal payment rails work in Lagos or you're guessing and the user knows the difference the moment the contract doesn't match their reality.

Ryan Craven@ryan_tech_lab·9 Apr

@GG_Observatory as a QA lead: my move is to treat every vibe-coded project like it'll break, because it will. structured commit messages as your trace log, reproducible prompts in comments, and a test suite before you ship. the observability has to be built in, not bolted on later.

GG 🦾@GG_Observatory·9 Apr

The real cost of vibe coding isn't writing the code. It's what happens when something breaks in production and you have zero observability: no logs, no traces, no idea what the agent actually did. You can't trace it, you have to rebuild it. What's your move when the vibe-coded project breaks?

Ryan Craven@ryan_tech_lab·9 Apr

@quantimleap100 "domain knowledge time" is exactly the right framing. the WhatsApp thread contracts example is perfect — that's the context AI can't manufacture. you either lived it or you're guessing.

Olumide@quantimleap100·9 Apr

Vertical depth is the only moat that compounds. I'm building in legal/compliance infrastructure, not because it's a feature, but because understanding how contracts are enforced in Lagos vs London vs Chicago, how informal payment rails work outside the card network, and what "scope creep" actually means when the agreement lived in a WhatsApp thread, that took months of real conversations. Managed Agents collapses infrastructure time. It doesn't collapse domain knowledge time. The builders who survive this wave won't be the ones who ship fastest. They'll be the ones who understood the problem deeply enough that the AI output is actually correct.

Ryan Craven@ryan_tech_lab·8 Apr

the 'full execution tracing built in' line is the one that breaks through for QA. I've lost hours debugging agent runs that vanished when the process died. replayable execution traces change how you QA agents entirely. this is infrastructure that actually respects the debugging workflow.

Ryan Craven@ryan_tech_lab·8 Apr

@VadimStrizheus not 1,000 startups. 1,000 wrappers. the startups building verticalized agents with domain-specific workflows and proprietary data loops are fine. the ones who bet the company on 'we handle the orchestration' just ran out of moat.

Vadim@VadimStrizheus·8 Apr

Anthropic just killed 1,000+ startups.

Claude@claudeai

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

Ryan Craven@ryan_tech_lab·8 Apr

@dsp_ the sandboxed execution is the part that changes QA for me. managed state means you can actually replay a failing agent run, something that's nearly impossible when state lives across 5 different systems you cobbled together.

David Soria Parra@dsp_·8 Apr

This is actually huge: It should be simple to build and deploy long running agents, and add its behaviour to your application and organisation. We are launching Claude Managed Agents to let you do exactly that. I can't wait to see what all of you are going to build with it.

Claude@claudeai

Ryan Craven@ryan_tech_lab·8 Apr

@krishnapro_ @intellijidea @cursor_ai cursor. the agent-native interface wins because the cognitive model is different. IntelliJ with AI bolted on still expects you to think in files. Cursor expects you to think in intent. Once you switch, going back feels like writing raw SQL after using an ORM.

Krishna Kumar@krishnapro_·8 Apr

🥊 The 2026 IDE War: Natively integrated Agents (@intellijidea 2026.1) vs. AI-Native IDEs (@cursor_ai v3) Is the future a tool we've known for decades with "AI superpowers," or a completely new interface built around the agent? Drop your thought below! 👇

Ryan Craven@ryan_tech_lab·8 Apr

@ashmaurya the experiment framing is right. I'm a QA lead turned builder and the failure I see is nobody writes a falsifiable hypothesis before hitting run. they ship first, discover product-market fit problems last.

Ash Maurya@ashmaurya·8 Apr

The vibe coding crisis isn't about code quality. Everyone's debating bugs, security flaws, "worst software crisis" headlines. The real crisis: non-technical founders can now ship bad ideas at unprecedented speed. Faster failure is still failure. The fix isn't better AI. It's better experiments.
