
Takeaway: truth directions in LLMs seem robust mostly within a narrow range of purely factual tasks with specific prompt formats, but break down when assessing truth requires tracking intermediate results.
📄 Testing the Limits of Truth Directions in LLMs: arxiv.org/pdf/2604.03754
