Ralf Kronen

369 posts

@RKronen

In love with entrepreneurship, AI, software development and cooking

Joined July 2010
68 Following, 13 Followers
Ralf Kronen@RKronen·
@PawelHuryn The durable artifact point is right. But "safe to ship" splits in two: a floor the tool can enforce (tests green, no TODO) and a judgment only you can make. Gate the floor so your read goes to the real risk. Otherwise the commit just durably stores a broken state.
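The enforceable floor described above (tests green, no TODO) can be sketched as a small check. This is a minimal illustration, not any particular tool's implementation: `check_floor`, `FloorResult`, and the shape of `changed_files` (path mapped to file text) are hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass

# Matches the bare marker "TODO" as a whole word.
TODO_PATTERN = re.compile(r"\bTODO\b")


@dataclass
class FloorResult:
    ok: bool
    reasons: list[str]


def check_floor(tests_green: bool, changed_files: dict[str, str]) -> FloorResult:
    """Mechanical floor: tests must be green and no TODO may remain."""
    reasons = []
    if not tests_green:
        reasons.append("tests are red")
    for path, text in sorted(changed_files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TODO_PATTERN.search(line):
                reasons.append(f"{path}:{lineno}: TODO left in code")
    # The floor passes only when nothing was flagged.
    return FloorResult(ok=not reasons, reasons=reasons)
```

Everything this check flags is mechanical; whatever it lets through is still only "not obviously broken", and the judgment about real risk stays with the reviewer.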
Paweł Huryn@PawelHuryn·
I already stopped reviewing code. Instead, I ask what moved, why, whether it's safe to ship. That's the part the tool can't do for you yet. Wrote the full version recently. Seven lessons on what "review the artifact, not the code" looks like in practice, and how hard you review scales with how much can break: productcompass.pm/p/agentic-engi…
Paweł Huryn@PawelHuryn·
The creator of Claude Code says coding is the easy part. The replies are arguing whether he's right. Look at what he put on the unsolved side. Past the infra and debugging: deciding what to optimize, talking to users, product planning. He just described the PM job without calling it that. There have never been more opportunities for us.
Boris Cherny@bcherny

Coding is just one part of engineering. There’s also debugging, operating services, scaling up infrastructure, deciding what to optimize, setting up hardware and capacity, talking to users, product planning, etc. Coding is the easy part, everything else is not yet solved (but is also becoming increasingly automated).

Ralf Kronen@RKronen·
@AanshulSadaria A second AI reviewing the first is still an opinion, not an oracle, so it can't close the loop. What closes it is a check the agent can't author: a golden output, a property, an invariant you own. Then something that refuses the merge instead of guessing.
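A check the agent can't author might look like the sketch below: golden cases and an invariant that a human owns, plus a function that refuses the merge rather than guessing. The names `GOLDEN_CASES`, `total`, and `merge_allowed` are hypothetical, and `total` stands in for whatever agent-written code is under test.

```python
from typing import Callable

# Hypothetical golden cases, written and owned by a human, not by the agent.
GOLDEN_CASES = [
    ({"price": 100, "qty": 2}, 200),
    ({"price": 5, "qty": 0}, 0),
]


def total(order: dict) -> int:
    """Stand-in for agent-written code under test."""
    return order["price"] * order["qty"]


def merge_allowed(fn: Callable[[dict], int]) -> bool:
    """Return False (refuse the merge) on any golden mismatch or invariant break."""
    for order, expected in GOLDEN_CASES:
        result = fn(order)
        if result != expected:  # golden output must match exactly
            return False
        if result < 0:  # invariant: a total is never negative
            return False
    return True
```

The check has no opinion about the code; it only compares behavior against values the agent did not write.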
Aanshul Sadaria@AanshulSadaria·
Talk to almost any senior engineer privately and they’ll admit it: Nobody is really reviewing PRs anymore. With AI agents writing code, PR volume has spiked 5x to 10x. Human review capacity hasn't scaled to match. Reviewers look at a massive diff, see passing unit tests, and hit "Approve". It works fine… until a critical bug hits production. 💥 Most testing tools try to solve this by adding “yet another” AI model to read the code and guess. But adding an AI to review your AI doesn't close the loop. 🤦
Ralf Kronen@RKronen·
@NagdyWP Here's the catch. When it offers to fix and you say go for it, the same model that called it done is grading its own retry. The risk is the time it never offers, because it already thinks it's fine. That silent pass is what a check outside the model is for.
Ahmed Nagdy - أحمد نجدي
1/Instruction files don't fix AI-generated code. I tried for months, Claude Code, Codex, all of them. Strong start, then the same mess: mock-everything tests, try/catch returning "ok", duplicate tests that catch nothing. What worked: reviewing AFTER the agent finishes. So I built guards.
Ralf Kronen@RKronen·
@hiper2d @iam_mian7 Fair, the threshold is real. But it cuts you too. Past a certain size you can't hold the whole thing in your head either, so rereading stops scaling right when the project needs it most. The gate is the one check that doesn't get worse as the code grows.
Aliaksei Zelianouski
@RKronen @iam_mian7 Oh, yeah, slowest, for sure. But in my experience, a serious, long-running project still needs this. There is a certain threshold of complexity that coding agents cannot cross while keeping things going well. This threshold is moving rapidly, though.
Ai Arainz@iam_mian7·
Your AI agent just wrote a few hundred lines in minutes. Quick, which line has the bug? Sonar's 2026 State of Code found that 96% of developers don't fully trust AI-generated code, yet only 48% always verify it before committing. AWS CTO Werner Vogels calls the result "verification debt." That's not a management problem. It's an engineering problem. Other verification tools read your code and guess. @Test_Sprite opens your app and uses it. With parallel exploration agents, it maps real user flows, generates a test plan, and validates behavior against the actual product, not just the diff. Adding another AI to review your AI doesn't close the loop. Using your app does.
Ralf Kronen@RKronen·
Best way to understand an unfamiliar codebase fast isn't "explain this code". Ask the agent to rewrite the folder idiomatically with the same tests. Don't commit it. Read the diff. Ten minutes, free mental model.
Ralf Kronen@RKronen·
@boyuan_chen Agreed, and the hard stop is the part everyone skips. The trouble is it can't live inside the loop that's hallucinating; that loop won't flag itself. Make it deterministic and outside the model: red exit code, failing test, no merge. Not a judgment it can talk its way past.
Ralf Kronen@RKronen·
@heyrapto The fix isn't asking him to be more disciplined. It's a gate that refuses the push when tests are red, so skipping them stops being a choice. Same fix whether a human or an agent wrote the code. Discipline you have to remember isn't discipline.
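A gate of this kind can be very small. The sketch below assumes the project's tests can be launched as a single command and that the script runs from a Git `pre-push` hook, where a non-zero exit status aborts the push; `gate` is a hypothetical name, and the test command is whatever your project uses.

```python
import subprocess
import sys


def gate(test_command: list[str]) -> int:
    """Run the test command; return 0 only if it exits cleanly."""
    completed = subprocess.run(test_command)
    if completed.returncode != 0:
        # Any failure maps to a red exit code, so the caller cannot ignore it.
        print("gate: tests are red, refusing the push", file=sys.stderr)
        return 1
    return 0
```

A hook script would end with something like `sys.exit(gate([sys.executable, "-m", "pytest", "-q"]))`, assuming pytest is the test runner. The gate does not care whether a human or an agent wrote the change.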
Rapto@heyrapto·
Vibe coding is ruining software engineering. I spoke to a vibe coder who said he doesn't test locally. He pushes code and tests it in production. Imagine your users being your QA team.
Ralf Kronen@RKronen·
@Antje_Kapek There is no better way to express one's totalitarian attitude toward people who think differently. Welcome to left-wing neo-fascism
Antje Kapek@Antje_Kapek·
Finally! The #Nius campaign on the #BVG has been cancelled! This is a clear victory for civil society! Now the advertising rights still need to be amended so that content or organizations hostile to people and to democracy can be excluded from the outset.
Ralf Kronen@RKronen·
@filicroval That’s why we’ve developed this; it doesn’t have many GitHub stars yet, but it offers greater security and is ready for use in business environments: pilot.nubos.cloud
filipe@filicroval·
@RKronen that's where multi-agent structures come in handy
filipe@filicroval·
Guys, vibe coding might be over. A free tool with 200k+ stars on GitHub does the exact opposite of vibe coding: it forces your agent to brainstorm a spec, get your approval, write a plan, then build with real TDD and code review. Claude can run autonomously for hours without going off the rails. Works with Claude Code, Codex, Cursor, Gemini CLI.
Ralf Kronen@RKronen·
@shub0414 The slop is real. But rehiring people to firefight is the expensive fix. The model writing bad code was never the failure; nothing stopping it from reaching prod is. A gate that refuses the commit when verify is red costs less than a cleanup team.
Shub@shub0414·
AI is pushing so much garbage code into production now that very soon they'll have to rehire more humans than they laid off just to fix bugs created by AI and vibe coding.
Ralf Kronen@RKronen·
@webdevcody Same here, and the tax is that you are the verify step. The fix isn't a smarter model, it's making verify mechanical: tests plus a critic that won't let the work advance until it's green. The loop should close itself instead of waiting on your eyes.
WebDevCody@webdevcody·
my workflow is prompt -> verify -> repeat until it works. even with these latest models, I constantly have to verify the work as it never seems to get it right. is this the same experience others are having?
Ralf Kronen@RKronen·
@thephatcoder @callmidavid The reason it's a bottleneck is that it sits by hand on every commit. And a passing type check isn't done, it proves shape not behavior. Move the gate into the pipeline: tests run, a critic checks, red blocks the merge. Then you're reviewing decisions, not diffs.
Deep 🔫@thephatcoder·
@callmidavid Reviewing AI-generated code is becoming a bottleneck for me, especially when working with a team
David Uchenna@callmidavid·
So you review every line of code written by AI?🤌
Ralf Kronen@RKronen·
@kunchenguid Your 68% only holds because the checker is independent of the author. An agent grading its own diff can't catch what it couldn't see writing it, so green just means it agreed with itself. The verifier has to sit outside the code under test. Good work shipping this.
Kun Chen@kunchenguid·
AI-generated code, even from the best models we have today, is not at a place where we can just trust and merge it without heavy scrutiny. These are my real personal stats: 68% of my changes had problems that would have gotten merged if I didn't have no-mistakes to catch them.
Kun Chen@kunchenguid·
/no-mistakes is here! By popular demand, I've made the most impactful tool in my agentic engineering setup, "no-mistakes", invocable as a skill in Claude Code, Codex et al. Just type "/no-mistakes" once your agent has made changes, and watch the magic unfold. Details below 👇
Ralf Kronen@RKronen·
Stop prompting "build the whole thing". Ask for the skeleton first: all signatures, all TODOs, no bodies. Then fill one function per iteration. You catch architecture problems while they're still cheap.
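As an illustration of the skeleton-first approach, here is what such a skeleton could look like, with a small helper that lists which functions are still unfilled. The module contents (`load_orders`, `total`) and the helper `unfilled_functions` are made up for the example; the convention that an empty body is a lone `raise NotImplementedError` is an assumption.

```python
import ast

# A skeleton as an agent might return it: signatures and TODOs, no bodies.
SKELETON = '''
def load_orders(path: str) -> list[dict]:
    # TODO: read and parse the file
    raise NotImplementedError

def total(orders: list[dict]) -> int:
    # TODO: sum price * qty
    raise NotImplementedError
'''


def unfilled_functions(source: str) -> list[str]:
    """Top-level functions whose body is still only `raise NotImplementedError`."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and len(node.body) == 1:
            stmt = node.body[0]
            if (
                isinstance(stmt, ast.Raise)
                and isinstance(stmt.exc, ast.Name)
                and stmt.exc.id == "NotImplementedError"
            ):
                names.append(node.name)
    return names
```

Reviewing the signatures alone is enough to spot a wrong data shape or a missing function, and the list of unfilled names gives the next iteration a single target.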
Ralf Kronen@RKronen·
@hiper2d @iam_mian7 Fair, rereading every line does build the understanding back. But that is also the slowest hour of the day, and most of what you check is mechanical: did it run, is there a test, any TODO left. Let a gate eat those so your reading goes to the parts that need a brain.
Ralf Kronen@RKronen·
@rohit_ah The gap lives in what a commit means. Today it means generated, not done. Shrink it by making done mechanical: tests actually run, an independent check passes, no commit while verify is red. Then a commit equals shippable, and 180 vs 30 starts to converge.
Rohit Ahuja@rohit_ah·
New AI coding study: impressive, but inconvenient. AI agents increased commits by up to 180%. But actual releases rose only 30%. So yes, AI is helping us write much more code. But apparently “more commits” is not the same as “more product.” Code generation is becoming cheap. But review, testing, integration, product judgment, packaging, security, and actual shipping still need humans and strong systems. That said, this is likely a phase in the learning curve, not the final verdict. LLMs will improve. Coding agents will get more reliable. Toolchains will integrate better. More importantly, users will get better at prompting, decomposing tasks, defining constraints, reviewing output, and converting AI-generated work into shipped product. The real AI dividend will come from teams that redesign the full software production system around AI. More code is not the same as more product. But better AI workflows may soon make that gap much smaller.
Ralf Kronen@RKronen·
@0x_rody Layers 3 and 4 carry the weight. The CLAUDE.md rules ask the model to police itself, and it talks past them by session two. The Stop hook and fact checker work because they sit outside the model and refuse to pass. Rules are the part it can argue with.
Ralf Kronen@RKronen·
@bibryam Good framing, and it holds until the agent authors the oracle. "Do not change the test" is backpressure the model can quietly rewrite. The last sensor has to be one it cannot author, wired to block the commit. Otherwise the loop just teaches it which check to soften.
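One way to make the last sensor something the agent cannot author is to pin a digest of the test files and block the commit when it no longer matches. The sketch below is only an illustration: `digest` and `commit_allowed` are hypothetical names, and it assumes the pinned digest is stored somewhere the agent has no write access to.

```python
import hashlib


def digest(test_sources: dict[str, str]) -> str:
    """Stable SHA-256 over test file paths and contents."""
    h = hashlib.sha256()
    for path in sorted(test_sources):
        h.update(path.encode())
        h.update(b"\0")  # separator so path and content cannot blur together
        h.update(test_sources[path].encode())
        h.update(b"\0")
    return h.hexdigest()


def commit_allowed(test_sources: dict[str, str], pinned_digest: str) -> bool:
    """Block the commit if the tests differ from the human-pinned version."""
    return digest(test_sources) == pinned_digest
```

With this in place, softening a test changes the digest, so a quiet rewrite becomes a visible, blocking event instead of a passing run.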