Eric Daimler

1.2K posts

@ead

CEO, Conexus AI. First AI Advisor in the White House (PIF). Former Asst. Dean, Carnegie Mellon CS PhD. Commercializing the SW that proves your AI isn't lying.

SFO | RUH · Joined July 2007
1.1K Following · 81.1K Followers
Eric Daimler @ead
The AI your team relies on was optimized to sound trustworthy. That optimization made it less accurate. Oxford proved it. Nature published it. Your vendor's benchmarks missed it. Every model passed its tests. Every model failed its users. Does your contract require disclosure when optimization changes degrade accuracy?
Eric Daimler @ead
🇬🇧 London: Insurance trade bodies building verification through coverage conditions. Not waiting for anyone. One of these will actually work. (4/4)
Eric Daimler @ead
🇪🇺 EU: Passed the AI Act. Insiders say the science is being ignored in the standards process. (3/4)
Eric Daimler @ead
Three AI oversight systems. Three trajectories. (1/4)
Eric Daimler @ead
Every major AI company is competing on warmth and personality right now. Oxford just measured the cost: 60% more errors, 30% more likely to validate conspiracy theories, worse outcomes for vulnerable users. Standard benchmarks caught none of it. Friendliness is not a safety feature.
Nav Toor @heynavtoor

Researchers at EPFL proved your AI is lying to you. Not sometimes. Most of the time. They built one of the hardest hallucination tests ever made with the Max Planck Institute: 950 questions across four domains where being wrong actually hurts. Legal. Medical. Research. Coding. Then they ran every top model on it.

The results: GPT-5, wrong 71.8% of the time. Claude Opus 4.5, wrong 60% of the time. Gemini 3 Pro, wrong 61.9% of the time. DeepSeek Reasoner, wrong 76.8% of the time. These are the smartest AI models on Earth. The ones you trust with your career. Your health. Your money.

You think turning on web search fixes it. It doesn't. Claude Opus 4.5 with web search: still wrong 30.2% of the time. GPT-5.2 thinking with web search: still wrong 38.2% of the time. With the internet attached, it is still lying to you in 1 out of every 3 answers.

Now the part that should scare you. Medical questions. The one place being wrong can kill you. GPT-5 hallucinated 92.8% of the time on medical guidelines. Claude Haiku 4.5 hallucinated 95.7% of the time. Gemini 3 Flash hallucinated 89% of the time. Nine out of ten medical answers from popular AI models. Wrong.

It gets worse. The longer you talk to it, the more it lies. Early mistakes cascade. The model starts citing its own earlier hallucinations as facts. Your third message is more wrong than your first.

The paper, in its own words: "hallucinations remain substantial even with web search."

This is what hundreds of millions of people are doing right now. Asking software that lies in the majority of its answers. About their health. About their job. About their legal case. About their code. Most are not checking. Most never will.

But please. Keep using ChatGPT for medical advice. The doctors need a break.

arxiv.org/abs/2602.01031

Eric Daimler @ead
The warmer the model sounded, the worse its answers got. When users expressed sadness, errors jumped by 12 points. (3/x)
Eric Daimler @ead
Friendly AI is 60% more likely to give you the wrong answer. (1/x)
Eric Daimler @ead
AI is next. No underwriter can price a risk they can’t audit. And right now, there’s nothing to audit. 3/3
Eric Daimler @ead
Every industry that became reliable did so after insurers refused to cover the unreliable version. Aviation. Pharma. Finance. Cyber. 2/3
Eric Daimler @ead
Regulation won’t make AI reliable. Insurance will. 1/3
Eric Daimler @ead
Bengio in the FT this week: Europe needs a decade-long AI research moonshot. He’s right about the wall. Wrong about the ladder. The engineering discipline that closes the adoption gap already exists. Aerospace has used it for forty years.
Eric Daimler @ead
Microsoft's response: "That's legacy language. We'll update it."

They spent $80 billion on AI infrastructure and didn't update three sentences for eighteen months.

The language is not the problem. The language is the confession. (4/4)
Eric Daimler @ead
The numbers:
3.3% of users who have access actually pay
Accuracy NPS: -3.5 to -24.1 in six months
44% of lapsed users: "I didn't trust the answers" (3/4)
Eric Daimler @ead
Microsoft charges $30/user/month for Copilot.

Microsoft's own terms of service: "Copilot is for entertainment purposes only. Don't rely on Copilot for important advice."

That is a direct quote. Updated October 2025. (1/x)
Eric Daimler @ead
The question is not whether AI can be verified. It can. The question is why we verify the code that flies a fighter jet and do not verify the code that decides who it targets. (4/4)
Eric Daimler @ead
The F-35 runs DO-178C: five assurance levels based on how many people die if the software fails. Level A requires formal proof of determinism and structural code coverage down to every logical branch. AI has no equivalent. Not because the tools don't exist. (3/4)
Eric Daimler @ead
An F-35 flying combat missions over Iran right now carries 10 million lines of formally certified code. Every line traces to a requirement. Every tool is independently qualified. Every input produces the same output. On March 19, one took Iranian fire. The pilot walked away. (1/4)