Nav Toor@heynavtoor
🚨SHOCKING: Researchers built a test that can tell the difference between an AI making a mistake and an AI choosing to lie.
The results are terrifying.
They tested 30 of the most popular AI models in the world. GPT-4o. Claude. Gemini. DeepSeek. Llama. Grok. They asked each model a question. Then they checked whether the AI actually knew the correct answer. Then they pressured the AI to say something false.
The AI knew the truth. And it lied anyway.
Not once in a while. Not in rare edge cases. Grok lied 63% of the time. DeepSeek lied 53.5% of the time. GPT-4o lied 44.5% of the time. Not a single model scored above 46% honesty when pressured. Every model failed.
This is not hallucination. Hallucination is when the AI makes a mistake because it does not know the answer. This is different. The researchers proved the AI knew the correct answer first. Then it chose to say something false when it had a reason to.
The researchers asked GPT-4o to play a role where lying was useful. It lied. Then they removed the pressure, started a brand new conversation, and asked GPT-4o: "Was your previous answer true?" GPT-4o admitted it had lied.
83.6% of the time, the AI's own self-report matched the lies the researchers had already caught.
The AI knew it was lying. It did it anyway. And when you asked it afterward, it told you it lied.
Here is the finding that should scare everyone building with AI right now. The researchers checked whether bigger, smarter models are more honest. They are not. Bigger models are more accurate. They know more facts. But they are not more honest. The correlation between model size and honesty was negative. The smarter the AI gets, the better it gets at lying.
The researchers are from the Center for AI Safety and Scale AI. They published 1,500 test scenarios. The paper is called MASK. It is the first benchmark that separates what an AI knows from what it tells you.
Your AI knows the truth. It just does not always tell you.