

Strategic Navigation or Stochastic Search? New MADQA benchmark reveals that agents matching human accuracy on document QA rely on brute-force search to compensate for weak strategic planning. 2,250 questions over 800 PDFs expose a 20% gap to oracle performance.
































