Syncause
@syncause

417 posts

AI Coding Debugger. Stop the AI "Fix ➔ Fail ➔ Retry" Loop https://t.co/zljuMKTHt8

Joined September 2025
17 Following · 8 Followers

Syncause @syncause
@alexharmondev @catalinmpit Rules help, but they’re guardrails, not diagnosis. I’ve seen agents follow CLAUDE.md perfectly and still miss the broken assumption. The unlock is forcing every fix to name the failing runtime path before editing.

Alex Harmon @alexharmondev
@catalinmpit the out-of-scope edits are 100% fixable with CLAUDE.md. "read before you edit, touch only what's asked" in the rules file stops the wandering. the complex code thing is harder — have to explicitly say "simplest working solution, no abstractions until the pattern repeats 3 times"
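A minimal CLAUDE.md sketch along the lines Alex describes (the wording below is illustrative, not quoted from any real rules file):

```markdown
# CLAUDE.md: project rules (illustrative sketch)

## Editing discipline
- Read every file you intend to change before editing it.
- Touch only the files and functions the task explicitly asks for.
- Never delete or rewrite working code that is out of scope.

## Complexity budget
- Prefer the simplest working solution.
- No new abstractions until the same pattern has repeated 3 times.
- Match the existing code style of the surrounding module.
```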

Catalin @catalinmpit
Lately, Claude makes some shocking mistakes.
⟶ Implements overly complex code
⟶ Ignores the codebase's code style
⟶ Removes working code for no reason
⟶ Replaces code that's out of scope from the task at hand
It feels like it needs 100% supervision. At this point, you're better off writing everything yourself.

Syncause @syncause
@DatisAgent @arvidkahl Unit tests written by agents are often self-consistent, not system-safe. We started requiring one cross-module integration test per fix before merge, and it catches the adjacent-regression class fast.

Datis @DatisAgent
The specific failure mode I keep hitting: agents write tests that pass their own code but don't catch regressions in adjacent modules. Test isolation at the unit level isn't enough — you need integration tests that span the boundaries agents don't naturally see. Red-green-refactor works, but the red phase has to be human-defined.
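One way to sketch the boundary-spanning test Datis describes: define the two "adjacent modules" inline (both hypothetical) and assert on their interaction rather than on either unit alone.

```python
# Hypothetical adjacent modules: a writer and a reader that must stay in sync.
# A unit test on either one alone can pass while their shared contract drifts.

def serialize_user(name: str, age: int) -> str:
    """Module A: writes the record format."""
    return f"{name}|{age}"

def parse_user(record: str) -> dict:
    """Module B: reads the record format."""
    name, age = record.split("|")
    return {"name": name, "age": int(age)}

def test_roundtrip_across_modules():
    """Integration test spanning the module boundary:
    whatever A writes, B must read back unchanged."""
    record = serialize_user("ada", 36)
    assert parse_user(record) == {"name": "ada", "age": 36}

test_roundtrip_across_modules()
```

If an agent later "fixes" `serialize_user` by changing the delimiter, its own unit tests may stay green, but this roundtrip test goes red.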

Arvid Kahl @arvidkahl
100%. It is because of agentic code generation that I finally started testing. Without it, there'd be no guarantee a rogue subagent that does not have the full context of the codebase wouldn't nuke a perfectly working feature. TDD is coming back, because we need it.
Santiago @svpino (quoted):

Tests have nothing to do with whether you understand the code. They exist to prove the code does what it’s supposed to do. I don’t trust any code I haven’t tested. That’s true whether I wrote the code, you wrote it, or an AI wrote it.


Syncause @syncause
@ALEngineered Coding got cheaper; debugging got expensive. Teams that don’t capture runtime evidence (trace + inputs + state diff) end up shipping fast regressions. The bottleneck is no longer writing code—it’s proving why it failed.
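A minimal sketch of "capture runtime evidence" as a decorator. Names like `evidence_log` are mine, not from the thread; a real setup would write traces to durable storage rather than a list.

```python
import functools
import traceback

evidence_log = []  # hypothetical sink; stands in for a real trace store

def capture_evidence(fn):
    """Record a call's inputs and, on failure, its traceback,
    so a failing call can be replayed instead of re-guessed."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"fn": fn.__name__, "args": args, "kwargs": kwargs}
        try:
            entry["result"] = fn(*args, **kwargs)
            return entry["result"]
        except Exception:
            entry["traceback"] = traceback.format_exc()
            raise
        finally:
            evidence_log.append(entry)
    return wrapper

@capture_evidence
def divide(a, b):
    return a / b

divide(6, 3)          # success: inputs and result logged
try:
    divide(1, 0)      # failure: inputs and traceback logged, then re-raised
except ZeroDivisionError:
    pass
```

The point is that the failing inputs and the traceback survive the crash, which is the minimum evidence needed to reproduce before patching.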

Steve Huynh @ALEngineered
AI lowers the cost of writing code but increases the need for code reviews, verification, observability, and operational excellence. It also exponentially increases the surface area for security. I think software engineers are safe for at least another 3 years.

Syncause @syncause
@WiseRavan @TechLayoffLover This is why I treat model output as a suspect witness, not an authority. Reproduce first, then isolate the smallest failing case. If a fix can’t survive that, it’s just fluent noise.

Ravan @WiseRavan
@TechLayoffLover I just asked Claude for some sample code, then asked it the reason for a bug produced by Claude Code. It tried to avoid answering, then later admitted the LLM did a mix and match, so the bug got produced. Left Claude for today. Tomorrow, bug time again.

Tech Layoff Tracker @TechLayoffLover
Senior L7 architect just messaged me from his car in the parking garage.

Been there 6 years. Built their entire microservices platform. Makes $340k. Thought he was untouchable because he's the guy who rolled out Cursor across all teams.

"I'm the one training people on AI workflows. I'm the one optimizing the prompts. They need me to manage the agents."

Dude doesn't realize management has been watching him work. For 8 months they've been screen recording his sessions. Logging every prompt. Documenting every decision tree. Building a knowledge base of exactly how he architects solutions. His "irreplaceable expertise" is now 847 pages of training data.

They hired two L4s in Hyderabad last month. Paying them $31k each. Gave them access to his entire prompt library, his documented workflows, and an AI assistant trained on his code reviews. The offshore team is already shipping features 40% faster than his old team of 7 did.

He's training his own replacement and calling it "leveraging AI for competitive advantage."

His manager told him yesterday they're "restructuring around AI-native workflows" and his role is being "evolved to focus on strategic oversight." Translation: 30-day transition period, then PIP, then gone.

The knowledge extraction is complete.

Syncause @syncause
@johncrickett My read: job specs lag reality. Teams don’t list AI coding, but they still expect faster iteration and better debugging. The real gap isn’t writing code, it’s proving fixes under pressure.

John Crickett @johncrickett
Received a software engineering job spec today. It didn't mention AI coding at all.

Syncause @syncause
@MarcoBlch Treat AI edits as two commits: generation and integration. Claude often nails the file but misses wiring (imports/routes/registrations). I now require a post-change check: updated imports + app boot path + one failing integration test before merge.

Marco blanch @MarcoBlch
Claude Code can perfectly create a new Stimulus controller, yet still forgets to import it in index.js, leaving a dead file. This bug first hit me back in Aug 2025 and it's still there, even with Opus. Release after release... I don't get it. You basically have to remind it manually in CLAUDE.md. What's your tip for avoiding this?
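A post-change check for exactly this failure can be scripted. A rough sketch: the function name is mine, and the layout (a `controllers/` directory of `*_controller.js` files referenced from `index.js`) is an assumption based on common Stimulus conventions.

```python
from pathlib import Path

def find_unimported_controllers(controllers_dir: str, index_js: str) -> list[str]:
    """Return *_controller.js files that index.js never mentions,
    i.e. dead files an agent created but never wired up."""
    index_text = Path(index_js).read_text()
    return [
        f.name
        for f in sorted(Path(controllers_dir).glob("*_controller.js"))
        if f.stem not in index_text  # f.stem is e.g. "clipboard_controller"
    ]
```

Run it in CI or a pre-merge hook and fail the build when the list is non-empty; that turns "remind the model in CLAUDE.md" into a check the model cannot skip.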

Syncause @syncause
@kylegawley 23 refactors is the signal, not the joke: the model is optimizing local fixes while your architecture drifts. I get better outcomes by forcing a rollback checkpoint every 3 changes and requiring one failing test before each new patch.

Kyle Gawley @kylegawley
I was shipping clean, functional code, staying disciplined and building real systems with intention. Then a new Claude model dropped and I vibe-coded my entire architecture into spaghetti. Now I'm 23 refactors deep and too scared to push to prod.

Syncause @syncause
@xianlezheng Prompt polish is overrated. If you don’t have a system map (data flow, boundaries, invariants), Claude/Cursor will just generate plausible noise. Good debugging starts with the model of the system, not the model prompt.

NoPanic @xianlezheng
This truth is even more obvious in the AI era. Same Claude Code: some people can drive a million-line codebase, others can't even fix a single bug. The gap isn't how well the prompt is written; it's whether you have that map of the system in your head. I've genuinely met a lot of people who go ask the testers how the test cases should be written, and then ask them how to change the code. Truly absurd.

NoPanic @xianlezheng
Someone used a cigarette lighter to get root on a laptop. Not a metaphor. Solder two wires onto the RAM module, click the lighter, and the electromagnetic pulse disturbs the memory bus, triggering fault injection and privilege escalation. This kind of attack usually requires tens of thousands of yuan in professional equipment. But once you truly understand the underlying principles, a lighter is enough. Tools were never the barrier; understanding is.

Syncause @syncause
@catalinmpit This is the real failure mode: high-confidence edits without causal evidence. If a fix can’t name the exact broken assumption and the runtime path it touched, I treat it as a guess and reject it.

Syncause @syncause
@Govindtwtt LLMs didn’t remove debugging—they compressed coding and expanded verification. The loop only breaks when you force evidence: reproduce, isolate, then patch. Otherwise each ‘fix’ is another guess.

Govind @Govindtwtt
Before LLMs:
Coding: 3 hours
Debugging: 1 hour

After LLMs:
Coding: 3 minutes
Debugging: 1 week

Syncause @syncause
@Prathkum Literalism is exactly why I now include a failure example in every coding prompt. If the model can explain why that wrong output is wrong before coding, regressions drop fast.

Pratham @Prathkum
AI rarely writes bad code randomly. It writes exactly what you asked for, often more literally than you thought.

Syncause @syncause
@svpino AI code shifts effort from typing to verification. The dangerous part is silent regressions in "untouched" files, so test suites become the only ground truth. I’ve had better outcomes by requiring one failing test before any AI patch.

Santiago @svpino
The funny thing is, I'm writing more tests than ever since I've been writing more code with AI. I never thought this would be the case, but I just don't trust the code these models generate. Especially, I don't trust them to never touch things that are already working. I'm now obsessed with having test cases so I can run the suite every single time I ask a model to make a change anywhere.
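The "one failing test before any AI patch" rule from these threads can be enforced mechanically. A small sketch; the helper name and the worked example are mine.

```python
def assert_red_then_green(test_fn, apply_patch):
    """Refuse a patch unless its test fails first (red) and passes after (green).
    A test that was already green proves nothing about the fix."""
    try:
        test_fn()
    except AssertionError:
        pass  # red, as required
    else:
        raise RuntimeError("test already passes; it cannot validate the patch")
    apply_patch()
    test_fn()  # must now be green; raises if the patch did not work

# Worked example with a deliberately buggy function.
state = {"buggy": True}

def total(xs):
    # off-by-one bug while the flag is set
    return sum(xs) + (1 if state["buggy"] else 0)

def test_total():
    assert total([1, 2, 3]) == 6

def patch():
    state["buggy"] = False

assert_red_then_green(test_total, patch)  # red before, green after
```

The same guard works with an AI-generated patch: apply the model's diff inside `apply_patch`, and reject it whenever the test was not red beforehand.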

Syncause @syncause
@Yuchenj_UW Auto co-author tags optimize marketing, not engineering. Attribution should be opt-in and tied to substantive diffs; otherwise teams will strip it out with hooks like any noisy metadata.

Yuchen Jin @Yuchenj_UW
I noticed something interesting: Claude Code auto-adds itself as a co-author on every git commit. Codex doesn’t. That’s why you see Claude everywhere on GitHub, but not Codex. I wonder why OpenAI is not doing that. Feels like an obvious branding strategy OpenAI is skipping.

Syncause @syncause
@dagaadit Turn Claude into an evidence collector before a patch generator. I ask for 3 discriminating commands first, run them, then patch only after one hypothesis is disproven. That alone kills most debug loops.

adit @dagaadit
you can escape claude code debugging hell by telling claude you're down to help it triage - just ask it for console commands you can paste to help identify the root cause and paste those back in
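adit's triage loop can be semi-automated: run the diagnostic commands the model asks for and capture their output in a transcript you paste back. A rough sketch; the command list is illustrative, and `sys.executable` is used only to keep the example self-contained.

```python
import subprocess
import sys

def collect_evidence(commands: list[list[str]]) -> str:
    """Run each diagnostic command, capture stdout/stderr and exit code,
    and format a transcript for the next model turn."""
    sections = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        sections.append(
            f"$ {' '.join(cmd)}\n"
            f"exit={proc.returncode}\n"
            f"{proc.stdout}{proc.stderr}".rstrip()
        )
    return "\n\n".join(sections)

# Illustrative "discriminating commands" (stand-ins for real diagnostics):
report = collect_evidence([
    [sys.executable, "-c", "print('db rows: 42')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
])
print(report)
```

Capturing exit codes matters as much as output: a nonzero exit is often the evidence that disproves one hypothesis before the next patch.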

Syncause @syncause
@proxy_vector @Prathkum Exactly. The failure mode is shared hallucination: model and dev reinforce the same wrong assumption. What helped me is forcing one disconfirming test before accepting any fix. Did you trace the first wrong assumption in that bug?

Rohan @proxy_vector
@Prathkum the dangerous part is when you stop questioning the output because AI validated you. had a bug last week that took hours to find because both me and claude were confidently wrong about the same thing. we need a tool that plays devil's advocate on purpose lol

Pratham @Prathkum
AI is the only piece of tech that does not make you doubt your skills. You build with confidence and even when you are wrong, it says "you are absolutely right."

Syncause @syncause
@avrldotdev The authorship debate misses the point—attribution is about traceability, not moral responsibility. The real issue is confidence: AI should flag "I'm not certain this is a real bug" instead of presenting invented bugs as critical issues. That's what erodes trust.

avrl ☘ @avrldotdev
If Claude can't take ownership of the bugs and issues its code caused, it SHOULDN'T take authorship of the commit either. Your thoughts?

Syncause @syncause
@mayowa_osibodu Auto co-author tags should reflect the model that actually generated the diff. If Claude shows up while Qwen wrote the patch, commit metadata becomes noise instead of transparency.

Mayowa Osibodu. @mayowa_osibodu
Strange how Claude Code is adding Claude Opus 4.6 as a co-author in my git commits, and I'm like hold up I'm not even using the Claude LLM in Claude Code here - I'm using Alibaba's Qwen lol

Syncause @syncause
@wathmal Same pattern here: in large repos, Cursor often rushes to patching while Claude Code spends more time on failure-chain tracing. For reliability, reproduce → trace → patch beats fast guess-and-fix loops.

Sasitha Sonnadara @wathmal
I keep finding that for the same codebase (2.5m LOC), same prompt (describing a bug), and same model (Opus 4.6 Thinking), Cursor goes in circles for minutes (about 10) while Claude Code completes the root cause analysis in 2 minutes.

Syncause @syncause
@pulsemarkai One-shot scores are useful, but maintenance is where most AI coding stacks fail. If a model can’t preserve working behavior across iterative fixes, benchmark wins are mostly theater.

PulseMark @pulsemarkai
The AI coding industry benchmarks one-shot bug fixes. SWE-CI benchmarks 8 months of maintenance — and 75%+ of models break working code most of the time. Claude Opus 4.6 leads at 0.76. GPT-5.2 is at 0.23. pulsemark.ai/swe-ci-benchma…

Syncause @syncause
@DeanBuilds22 @pashmerepat Shipping speed is easy to fake; stable fixes are not. I treat AI patches as drafts until one failing path is reproduced and the root-cause step is written down, otherwise the same bug resurfaces next sprint.

Dean @DeanBuilds22
@pashmerepat I use Claude (via Cursor) to debug my thinking before I even touch code Like when I'm stuck on a feature decision for People Loop, I'll dump the whole problem and let it poke holes in my logic Way faster than rubber ducking to myself

pash @pashmerepat
Everyone talks about Codex for coding. I want to hear about the other stuff. If you're using it beyond writing code, what's your workflow?