Where agents work reliably:
- Reproducible, testable bugs
- Features with clear specs (tickets, design docs)
- Codebase exploration
- Prototypes / spikes
- PR cleanup
They struggle when requirements are underspecified.
Prompts must be explicit.
❌ “Enable JSON parser for backend”
✅ “Enable JSON parser in LLMOutputParsing (services/). Follow test style in TextProcessor.”
Path + class + file + reference example. Missing any of these reduces accuracy.
For multi-step tasks:
- Split into atomic subtasks
- Ask for a plan before implementation
- Let the agent run and interpret its own test output
- Parallelize your own work while it runs
Think workflows, not one-shot prompts.