
The biggest problem is stability. It feels like Claude has several different personalities. I always use the latest Opus with max, but its behavior varies a lot from session to session: by default, it behaves differently in each one.
On similar research tasks, after launching an experiment script, some sessions poll the result every few seconds, some eagerly run sleep 30s right after launching and then check (which I specifically forbid in memory, and it just ignores that), and some properly wait until the task has finished (event-based).
When I ask it to look up a biology fact, some sessions go and search for papers without prompting (good), while others just make things up without trying, even when specifically asked to look for papers (bad).
When I give it a research direction and goal, some sessions are very lazy: they turn a single knob, fail, and claim the goal is impossible. Others reason a lot, find the bottleneck, iterate through trial and error, and actually make progress by understanding the previous failure modes.
Tool use varies a lot too. In my Julia project, some sessions are very eager to use Python and always default to the python command rather than uv run, which is written in memory. Others, because of that same memory, are so eager to use uv run that they use it even to run a Julia program (pointless, but it runs anyway).
When it finishes answering, it often asks what I want to do next: some sessions ask directly, some use AskUserTool, some simply answer without asking anything more, and some immediately start working without permission.
When iterating on experiments, the naming convention changes across sessions. Some respect the existing naming, while others make up a very different scheme even when there is an existing convention to reference. It randomly produces names like v1/v2, feat1_pram2, a/b/c/d, and iter01/02.
It's very frustrating: when it's in a bad/lazy personality, no matter how I prompt it, it just doesn't work, and when it's in a good mode, it does all the good things without prompting.