Where agents work reliably:
- Reproducible, testable bugs
- Features with clear specs (tickets, design docs)
- Codebase exploration
- Prototypes / spikes
- PR cleanup
Agents struggle when requirements are underspecified.
Prompts must be explicit.
❌ “Enable JSON parser for backend”
✅ “Enable JSON parser in LLMOutputParsing (services/). Follow test style in TextProcessor.”
Specify path + class + file + reference example. Each one you leave out forces the agent to guess and reduces accuracy.
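One way to enforce this is to build prompts from a small template that refuses to render when a part is missing. The sketch below is illustrative only: `TaskPrompt` and its field names are hypothetical, and the rendered wording simply mirrors the ✅ example above.

```python
from dataclasses import dataclass


@dataclass
class TaskPrompt:
    """Hypothetical container for the parts of an explicit agent prompt."""
    action: str        # what to do, e.g. "Enable JSON parser"
    target_class: str  # class or module to change
    path: str          # directory or file where it lives
    reference: str     # existing code whose style the agent should copy

    def missing_parts(self) -> list[str]:
        """Return the names of any fields left empty."""
        fields = {
            "action": self.action,
            "target_class": self.target_class,
            "path": self.path,
            "reference": self.reference,
        }
        return [name for name, value in fields.items() if not value.strip()]

    def render(self) -> str:
        """Build the prompt text, refusing to render an incomplete spec."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"Prompt is underspecified, missing: {', '.join(missing)}")
        return (
            f"{self.action} in {self.target_class} ({self.path}). "
            f"Follow test style in {self.reference}."
        )


prompt = TaskPrompt(
    action="Enable JSON parser",
    target_class="LLMOutputParsing",
    path="services/",
    reference="TextProcessor",
)
print(prompt.render())
```

A vague request such as the ❌ example leaves `target_class`, `path`, and `reference` empty, so `render()` raises instead of sending an ambiguous prompt.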
For multi-step tasks:
- Split into atomic subtasks
- Ask for a plan before implementation
- Let the agent run the tests and interpret the output itself
- Parallelize your own work while it runs
Think workflows, not one-shot prompts.
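The multi-step bullets above can be sketched as a small driver loop. This is a minimal sketch under stated assumptions, not a real agent API: `Agent` is a hypothetical callable that takes an instruction and returns a reply, and here the harness runs the test command and hands the raw output back, whereas many agents run the tests directly.

```python
import subprocess
import sys
from typing import Callable

# Hypothetical agent interface: takes an instruction string, returns the agent's reply.
Agent = Callable[[str], str]


def run_tests(command: list[str]) -> tuple[bool, str]:
    """Run the test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def run_workflow(
    agent: Agent,
    subtasks: list[str],
    test_command: list[str],
    max_attempts: int = 3,
) -> list[str]:
    """Ask for a plan, then implement each subtask, feeding test output back."""
    log = []
    plan = agent("Propose a plan before writing code:\n" + "\n".join(subtasks))
    log.append(f"plan: {plan}")
    for subtask in subtasks:
        instruction = f"Implement: {subtask}"
        for attempt in range(1, max_attempts + 1):
            agent(instruction)
            passed, output = run_tests(test_command)
            if passed:
                log.append(f"done: {subtask} (attempt {attempt})")
                break
            # Hand the raw test output back so the agent interprets it itself.
            instruction = f"Tests failed for '{subtask}'. Output:\n{output}\nFix it."
        else:
            log.append(f"stuck: {subtask}")
            break  # stop here so a human can step in
    return log


def stub_agent(instruction: str) -> str:
    """Stand-in for a real agent so the sketch runs on its own."""
    return "ok"


# The test command here is a placeholder that always passes.
log = run_workflow(stub_agent, ["add parser flag", "add tests"], [sys.executable, "-c", "pass"])
print(log)
```

Capping attempts and stopping at the first stuck subtask keeps a drifting run from compounding its own errors while you work on something else.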
If the agent drifts:
- Restart the session if the result is far off
- Redirect if partial progress is usable
- Roll back edits with checkpoints
- Inspect prompt/context gaps that caused failure
Debug agent behavior the way you debug systems: isolate root cause.
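The drift checklist above can be read as a small decision procedure. The sketch below is one possible reading, not a rule from the text: the `Action` names, the `triage` function, its inputs, and the ordering of steps are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ROLL_BACK = "roll back edits to the last checkpoint"
    RESTART = "restart the session with a revised prompt"
    REDIRECT = "redirect the agent within the current session"
    INSPECT = "inspect the prompt/context gap that caused the drift"


def triage(far_off: bool, usable_progress: bool, has_checkpoint: bool) -> list[Action]:
    """Return recovery steps in order (the ordering is an assumption)."""
    steps = []
    if far_off or not usable_progress:
        # Nothing worth keeping: discard the edits, then start clean.
        if has_checkpoint:
            steps.append(Action.ROLL_BACK)
        steps.append(Action.RESTART)
    else:
        # Keep the usable work and steer from here.
        steps.append(Action.REDIRECT)
    # In every case, look for the root cause so the next prompt avoids it.
    steps.append(Action.INSPECT)
    return steps


for step in triage(far_off=False, usable_progress=True, has_checkpoint=True):
    print(step.value)
```

Ending every path with the inspection step reflects the closing point: recovery fixes the symptom, and the prompt or context gap is the root cause.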