Where agents work reliably:
- Reproducible, testable bugs
- Features with clear specs (tickets, design docs)
- Codebase exploration
- Prototypes / spikes
- PR cleanup
They struggle when requirements are underspecified.
Prompts must be explicit.
❌ “Enable JSON parser for backend”
✅ “Enable JSON parser in LLMOutputParsing (services/). Follow test style in TextProcessor.”
Path + class + file + reference example. Missing any of these reduces accuracy.