Ralf Kronen

369 posts

@RKronen

In love with entrepreneurship, AI, software development and cooking

Joined July 2010

68 Following · 13 Followers

Ralf Kronen@RKronen·24m

@PawelHuryn The durable artifact point is right. But "safe to ship" splits in two: a floor the tool can enforce (tests green, no TODO) and a judgment only you can make. Gate the floor so your read goes to the real risk. Otherwise the commit just durably stores a broken state.
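A minimal sketch of the enforceable floor named above (tests green, no TODO), assuming the test run's exit code and the changed files' contents are already at hand. The function name `floor_violations` and the TODO pattern are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical marker pattern; the post only names "no TODO".
TODO_PATTERN = re.compile(r"\bTODO\b")


def floor_violations(test_exit_code: int, sources: Dict[str, str]) -> List[str]:
    """Return the reasons the mechanical floor is not met (empty list = floor holds)."""
    reasons = []
    if test_exit_code != 0:
        reasons.append(f"tests red (exit code {test_exit_code})")
    for path, text in sorted(sources.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TODO_PATTERN.search(line):
                reasons.append(f"{path}:{lineno}: TODO left in code")
    return reasons
```

An empty list only says the floor holds; whether the change is safe to ship remains the human judgment.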

Paweł Huryn@PawelHuryn·3h

I already stopped reviewing code. Instead, I ask what moved, why, whether it's safe to ship. That's the part the tool can't do for you yet. Wrote the full version recently. Seven lessons on what "review the artifact, not the code" looks like in practice, and how hard you review scales with how much can break: productcompass.pm/p/agentic-engi…

Paweł Huryn@PawelHuryn·3h

The creator of Claude Code says coding is the easy part. The replies are arguing whether he's right. Look at what he put on the unsolved side. Past the infra and debugging: deciding what to optimize, talking to users, product planning. He just described the PM job without calling it that. There have never been more opportunities for us.

Boris Cherny@bcherny

Coding is just one part of engineering. There’s also debugging, operating services, scaling up infrastructure, deciding what to optimize, setting up hardware and capacity, talking to users, product planning, etc. Coding is the easy part, everything else is not yet solved (but is also becoming increasingly automated).

Ralf Kronen@RKronen·25m

@AanshulSadaria A second AI reviewing the first is still an opinion, not an oracle, so it can't close the loop. What closes it is a check the agent can't author: a golden output, a property, an invariant you own. Then something that refuses the merge instead of guessing.
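One way to picture "a property, an invariant you own" is a check that never looks at the implementation and only asserts what must be true of the output. The sketch below uses sorting purely as a stand-in example. The name `holds_sort_invariants` and the fixed edge cases are assumptions for illustration, not something from the thread.

```python
import random
from collections import Counter
from typing import Callable, List


def holds_sort_invariants(sort_fn: Callable[[List[int]], List[int]],
                          trials: int = 200, seed: int = 0) -> bool:
    """Check two invariants on many inputs: output is ordered, and it is a
    permutation of the input. The check is owned by a human, not by the agent."""
    rng = random.Random(seed)
    # Fixed edge cases first, then seeded random cases.
    cases = [[], [3, 1, 2], [2, 2, 1]]
    cases += [[rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
              for _ in range(trials)]
    for case in cases:
        result = sort_fn(list(case))
        ordered = all(a <= b for a, b in zip(result, result[1:]))
        same_items = Counter(result) == Counter(case)
        if not (ordered and same_items):
            return False
    return True
```

Wired into a gate, a `False` here refuses the merge instead of producing another opinion about the diff.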

Aanshul Sadaria@AanshulSadaria·9h

Talk to almost any senior engineer privately and they’ll admit it: Nobody is really reviewing PRs anymore. With AI agents writing code, PR volume has spiked 5x to 10x. Human review capacity hasn't scaled to match. Reviewers look at a massive diff, see passing unit tests, and hit "Approve". It works fine… until a critical bug hits production. 💥 Most testing tools try to solve this by adding “yet another” AI model to read the code and guess. But adding an AI to review your AI doesn't close the loop. 🤦

Ralf Kronen@RKronen·26m

@NagdyWP Here's the catch. When it offers to fix and you say go for it, the same model that called it done is grading its own retry. The risk is the time it never offers, because it already thinks it's fine. That silent pass is what a check outside the model is for.

Ahmed Nagdy - أحمد نجدي@NagdyWP·21h

@RKronen I agree, but from what I have seen, the agent always offers to fix, and all I do is tell it to go for it!

Ahmed Nagdy - أحمد نجدي@NagdyWP·1d

1/Instruction files don't fix AI-generated code. I tried for months, Claude Code, Codex, all of them. Strong start, then the same mess: mock-everything tests, try/catch returning "ok", duplicate tests that catch nothing. What worked: reviewing AFTER the agent finishes. So I built guards.

Ralf Kronen@RKronen·35m

@hiper2d @iam_mian7 Fair, the threshold is real. But it cuts you too. Past a certain size you can't hold the whole thing in your head either, so rereading stops scaling right when the project needs it most. The gate is the one check that doesn't get worse as the code grows.

Aliaksei Zelianouski@hiper2d·22h

@RKronen @iam_mian7 Oh, yeah, slowest - for sure. But from my experience, a serious, long-run project still needs this. There is a certain threshold of complexity which coding agents cannot cross and keep things going well. This threshold is moving rapidly, though.

Ai Arainz@iam_mian7·5d

Your AI agent just wrote a few hundred lines in minutes. Quick, which line has the bug? Sonar's 2026 State of Code found that 96% of developers don't fully trust AI-generated code, yet only 48% always verify it before committing. AWS CTO Werner Vogels calls the result "verification debt." That's not a management problem. It's an engineering problem. Other verification tools read your code and guess. @Test_Sprite opens your app and uses it. With parallel exploration agents, it maps real user flows, generates a test plan, and validates behavior against the actual product, not just the diff. Adding another AI to review your AI doesn't close the loop. Using your app does.

Ralf Kronen@RKronen·3h

Best way to understand an unfamiliar codebase fast isn't "explain this code". Ask the agent to rewrite the folder idiomatically with the same tests. Don't commit it. Read the diff. Ten minutes, free mental model.

Ralf Kronen@RKronen·19h

@boyuan_chen Agreed, and the hard stop is the part everyone skips. The trouble is it can't live inside the loop that's hallucinating; that loop won't flag itself. Make it deterministic and outside the model: red exit code, failing test, no merge. Not a judgment it can talk its way past.

Boyuan (Nemo) Chen@boyuan_chen·21h

A casino is the right metaphor for bad vibe coding. The better version looks more like risk management: tests, diffs, logs, review gates, and a hard stop when the loop starts hallucinating.

Dmitrii Kovanikov@ChShersh

Vibe-coding is just a gambling addiction for SWEs

Ralf Kronen@RKronen·19h

@heyrapto The fix isn't asking him to be more disciplined. It's a gate that refuses the push when tests are red, so skipping them stops being a choice. Same fix whether a human or an agent wrote the code. Discipline you have to remember isn't discipline.
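A gate like this can be sketched as a Git `pre-push` hook, since Git aborts the push when that hook exits non-zero. The test command below is an assumption (a project using pytest), and `gate` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import List

# Hypothetical test command; swap in whatever the project actually runs.
TEST_COMMAND: List[str] = [sys.executable, "-m", "pytest", "-q"]


def gate(command: List[str]) -> int:
    """Run the check and return the exit code the hook should use.

    0 lets the push through; anything else makes Git abort it.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        print("push refused: checks are red", file=sys.stderr)
        return 1
    return 0
```

To use it as a hook, the file would be saved as an executable `.git/hooks/pre-push` and end with `sys.exit(gate(TEST_COMMAND))`. The same script works unchanged whether a human or an agent runs `git push`.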

Rapto@heyrapto·1d

Vibe coding is ruining software engineering. I spoke to a vibe coder who said he doesn't test locally. He pushes code and tests it in production. Imagine your users being your QA team.

Ralf Kronen@RKronen·22h

@Antje_Kapek There is no better way to express a totalitarian attitude toward people who think differently. Welcome to left-wing neo-fascism

Antje Kapek@Antje_Kapek·3d

Finally! The #Nius campaign in the #BVG has been cancelled! This is a clear victory for civil society! Now the advertising rights still need to be adjusted so that content or organizations hostile to people and democracy can be excluded from the outset.

Ralf Kronen@RKronen·22h

@filicroval That’s why we’ve developed this; it doesn’t have many GitHub stars yet, but it offers greater security and is ready for use in business environments: pilot.nubos.cloud

filipe@filicroval·22h

@RKronen that's where multi-agent structures come in handy

filipe@filicroval·1d

guys, vibe coding might be over. a free tool with 200k+ stars on GitHub does the exact opposite of vibe coding: it forces your agent to brainstorm a spec, get your approval, write a plan, then build with real TDD and code review. Claude can run autonomously for hours without going off the rails. works with Claude Code, Codex, Cursor, Gemini CLI

Ralf Kronen@RKronen·22h

@shub0414 The slop is real. But rehiring people to firefight is the expensive fix. The model writing bad code was never the failure; nothing stopping it from reaching prod is. A gate that refuses the commit when verify is red costs less than a cleanup team.

Shub@shub0414·2d

AI is pushing so much garbage code into production now that very soon they'll have to rehire more humans than they laid off just to fix bugs created by AI and vibe coding.

Ralf Kronen@RKronen·22h

@webdevcody Same here, and the tax is that you are the verify step. The fix isn't a smarter model, it's making verify mechanical: tests plus a critic that won't let the work advance until it's green. The loop should close itself instead of waiting on your eyes.
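A loop that closes itself can be sketched as follows, assuming the agent step and the mechanical check are both plain callables. `propose`, `verify`, and `run_until_green` are hypothetical names. In practice `verify` would run the test suite and any independent critic.

```python
from typing import Callable, Optional, Tuple


def run_until_green(propose: Callable[[Optional[str]], str],
                    verify: Callable[[str], Optional[str]],
                    max_attempts: int = 5) -> Tuple[bool, int]:
    """Alternate propose -> verify until verify reports green or the budget runs out.

    verify returns None when green, otherwise a failure message that is fed
    back into the next propose call. Returns (green, attempts_used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return True, attempt
    return False, max_attempts
```

The human only gets pulled in when the result comes back `False`, rather than standing in as the verify step on every iteration.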

WebDevCody@webdevcody·2d

my workflow is prompt -> verify -> repeat until it works. even with these latest models, I constantly have to verify the work as it never seems to get it right. is this the same experience others are having?

Ralf Kronen@RKronen·22h

@thephatcoder @callmidavid It's a bottleneck because the review happens by hand on every commit. And a passing type check isn't done: it proves shape, not behavior. Move the gate into the pipeline: tests run, a critic checks, red blocks the merge. Then you're reviewing decisions, not diffs.
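To make "shape, not behavior" concrete, here is an invented example: two functions with identical, valid type signatures, only one of which does the right thing. The discount functions and `behaves_correctly` are hypothetical illustrations.

```python
from typing import Callable


def discount_buggy(price: float, percent: float) -> float:
    # Type-correct, behaviorally wrong: adds the discount instead of subtracting it.
    return price + price * percent / 100


def discount_fixed(price: float, percent: float) -> float:
    # Same signature, correct behavior.
    return price - price * percent / 100


def behaves_correctly(fn: Callable[[float, float], float]) -> bool:
    """A behavioral check: known inputs must yield known outputs."""
    return fn(200.0, 25.0) == 150.0 and fn(80.0, 0.0) == 80.0
```

A type checker accepts both functions. Only the behavioral check tells them apart, which is why it belongs in the gate.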

Deep 🔫@thephatcoder·1d

@callmidavid Reviewing AI-generated code is becoming a bottleneck for me, especially when working with a team

David Uchenna@callmidavid·1d

So you review every line of code written by AI? 🤌

Ralf Kronen@RKronen·1d

@kunchenguid Your 68% only holds because the checker is independent of the author. An agent grading its own diff can't catch what it couldn't see writing it, so green just means it agreed with itself. The verifier has to sit outside the code under test. Good work shipping this.

Kun Chen@kunchenguid·1d

AI generated code, even from the best models we have today, is not at a place where we can just trust and merge it without heavy scrutiny. these are my real personal stats: 68% of my changes had problems that would have gotten merged if i didn't have no-mistakes to catch them

Kun Chen@kunchenguid·1d

/no-mistakes is here! by popular demand i've made the most impactful tool in my agentic engineering setup, "no-mistakes", invocable as a skill in Claude Code, Codex et al. just type "/no-mistakes" once your agent has made changes, and watch the magic unfold. details below 👇

Ralf Kronen@RKronen·1d

Stop prompting "build the whole thing". Ask for the skeleton first: all signatures, all TODOs, no bodies. Then fill one function per iteration. You catch architecture problems while they're still cheap.
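As an illustration, a skeleton after the first fill-in iteration might look like this. The order-report domain and every name here are invented for the example. Unfilled functions raise `NotImplementedError` so nothing half-built can pass silently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    customer: str
    amount_cents: int


def parse_orders(raw: str) -> List[Order]:
    """Iteration 1 (filled in): parse 'customer,amount_cents' lines."""
    orders = []
    for line in raw.splitlines():
        if line.strip():
            customer, amount = line.split(",")
            orders.append(Order(customer.strip(), int(amount)))
    return orders


def total_by_customer(orders: List[Order]) -> Dict[str, int]:
    """TODO (iteration 2): sum amount_cents per customer."""
    raise NotImplementedError


def render_report(totals: Dict[str, int]) -> str:
    """TODO (iteration 3): format totals as one line per customer."""
    raise NotImplementedError
```

The signatures already show how data flows from `parse_orders` to `render_report`, so a wrong type or a missing step is visible before any real logic exists.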

Ralf Kronen@RKronen·1d

@hiper2d @iam_mian7 Fair, rereading every line does build the understanding back. But that is also the slowest hour of the day, and most of what you check is mechanical: did it run, is there a test, any TODO left. Let a gate eat those so your reading goes to the parts that need a brain.

Aliaksei Zelianouski@hiper2d·3d

@RKronen @iam_mian7 Well.... you actually do get the understanding back by rereading every line.

Ralf Kronen@RKronen·1d

@rohit_ah The gap lives in what a commit means. Today it means generated, not done. Shrink it by making done mechanical: tests actually run, an independent check passes, no commit while verify is red. Then a commit equals shippable, and 180 vs 30 starts to converge.

Rohit Ahuja@rohit_ah·1d

New AI coding study: impressive, but inconvenient. AI agents increased commits by up to 180%. But actual releases rose only 30%. So yes, AI is helping us write much more code. But apparently “more commits” is not the same as “more product.” Code generation is becoming cheap. But review, testing, integration, product judgment, packaging, security, and actual shipping still need humans and strong systems. That said, this is likely a phase in the learning curve, not the final verdict. LLMs will improve. Coding agents will get more reliable. Toolchains will integrate better. More importantly, users will get better at prompting, decomposing tasks, defining constraints, reviewing output, and converting AI-generated work into shipped product. The real AI dividend will come from teams that redesign the full software production system around AI. More code is not the same as more product. But better AI workflows may soon make that gap much smaller.

Ralf Kronen@RKronen·1d

@0x_rody Layers 3 and 4 carry the weight. The CLAUDE.md rules ask the model to police itself, and it talks past them by session two. The Stop hook and fact checker work because they sit outside the model and refuse to pass. Rules are the part it can argue with.

rody@0x_rody·2d

x.com/i/article/2063…

Ralf Kronen@RKronen·1d

@bibryam Good framing, and it holds until the agent authors the oracle. "Do not change the test" is backpressure the model can quietly rewrite. The last sensor has to be one it cannot author, wired to block the commit. Otherwise the loop just teaches it which check to soften.
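One possible way to get a sensor the agent cannot author is to pin fingerprints of the protected test files somewhere the agent has no write access, then block the commit if they drift. The sketch below assumes that setup. `fingerprint` and `tampered_files` are hypothetical names, and where the pinned hashes are stored is left open.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def tampered_files(pinned: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Compare pinned fingerprints against current contents.

    Returns one message per protected file that was modified or deleted;
    an empty list means the oracle is untouched.
    """
    problems = []
    for path, expected in sorted(pinned.items()):
        if path not in current:
            problems.append(f"{path}: deleted")
        elif fingerprint(current[path]) != expected:
            problems.append(f"{path}: modified")
    return problems
```

A non-empty result would fail the gate, so softening a check shows up as a blocked commit rather than a green run.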

Bilgin Ibryam@bibryam·2d

x.com/i/article/2060…
