Eric Daimler

1.2K posts

@ead

CEO, Conexus AI. First AI Advisor in the White House (PIF). Former Asst. Dean, Carnegie Mellon CS PhD. Commercializing the SW that proves your AI isn't lying.

SFO | RUH · Joined July 2007
1.1K Following · 81.1K Followers
Eric Daimler @ead
The AI your team relies on was optimized to sound trustworthy. That optimization made it less accurate. Oxford proved it. Nature published it. Your vendor's benchmarks missed it. Every model passed its tests. Every model failed its users. Does your contract require disclosure when optimization changes degrade accuracy?
Eric Daimler @ead
🇬🇧 London: Insurance trade bodies building verification through coverage conditions. Not waiting for anyone. One of these will actually work. (4/4)
Eric Daimler @ead
🇪🇺 EU: Passed the AI Act. Insiders say the science is being ignored in the standards process. (3/4)
Eric Daimler @ead
Three AI oversight systems. Three trajectories. (1/4)
Eric Daimler @ead
Every major AI company is competing on warmth and personality right now. Oxford just measured the cost: 60% more errors, 30% more likely to validate conspiracy theories, worse outcomes for vulnerable users. Standard benchmarks caught none of it. Friendliness is not a safety feature.
Nav Toor @heynavtoor

Researchers at EPFL proved your AI is lying to you. Not sometimes. Most of the time. They built one of the hardest hallucination tests ever made with the Max Planck Institute: 950 questions across four domains where being wrong actually hurts. Legal. Medical. Research. Coding. Then they ran every top model on it.

The results: GPT-5, wrong 71.8% of the time. Claude Opus 4.5, wrong 60% of the time. Gemini 3 Pro, wrong 61.9% of the time. DeepSeek Reasoner, wrong 76.8% of the time. These are the smartest AI models on Earth. The ones you trust with your career. Your health. Your money.

You think turning on web search fixes it. It doesn't. Claude Opus 4.5 with web search: still wrong 30.2% of the time. GPT-5.2 thinking with web search: still wrong 38.2% of the time. With the internet attached, it is still lying to you in 1 out of every 3 answers.

Now the part that should scare you. Medical questions. The one place being wrong can kill you. GPT-5 hallucinated 92.8% of the time on medical guidelines. Claude Haiku 4.5 hallucinated 95.7% of the time. Gemini 3 Flash hallucinated 89% of the time. Nine out of ten medical answers from popular AI models. Wrong.

It gets worse. The longer you talk to it, the more it lies. Early mistakes cascade. The model starts citing its own earlier hallucinations as facts. Your third message is more wrong than your first.

The paper, in its own words: "hallucinations remain substantial even with web search."

This is what hundreds of millions of people are doing right now. Asking software that lies in the majority of its answers. About their health. About their job. About their legal case. About their code. Most are not checking. Most never will.

But please. Keep using ChatGPT for medical advice. The doctors need a break.

arxiv.org/abs/2602.01031

Eric Daimler @ead
The warmer the model sounded, the worse its answers got. When users expressed sadness, errors jumped by 12 points. (3/x)
Eric Daimler @ead
Friendly AI is 60% more likely to give you the wrong answer. (1/x)
Eric Daimler @ead
AI is next. No underwriter can price a risk they can’t audit. And right now, there’s nothing to audit. 3/3
Eric Daimler @ead
Every industry that became reliable did so after insurers refused to cover the unreliable version. Aviation. Pharma. Finance. Cyber. 2/3
Eric Daimler @ead
Regulation won’t make AI reliable. Insurance will. 1/3
Eric Daimler @ead
Bengio in the FT this week: Europe needs a decade-long AI research moonshot. He’s right about the wall. Wrong about the ladder. The engineering discipline that closes the adoption gap already exists. Aerospace has used it for forty years.
Eric Daimler @ead
Microsoft's response: "That's legacy language. We'll update it."

They spent $80 billion on AI infrastructure and didn't update three sentences for eighteen months.

The language is not the problem. The language is the confession. (4/4)
Eric Daimler @ead
The numbers:
3.3% of users who have access actually pay
Accuracy NPS: -3.5 to -24.1 in six months
44% of lapsed users: "I didn't trust the answers" (3/4)
Eric Daimler @ead
Microsoft charges $30/user/month for Copilot.

Microsoft's own terms of service: "Copilot is for entertainment purposes only. Don't rely on Copilot for important advice."

That is a direct quote. Updated October 2025. (1/x)
Eric Daimler @ead
The question is not whether AI can be verified. It can. The question is why we verify the code that flies a fighter jet and do not verify the code that decides who it targets. (4/4)
Eric Daimler @ead
The F-35 runs DO-178C: five assurance levels based on how many people die if the software fails. Level A requires formal proof of determinism and structural code coverage down to every logical branch. AI has no equivalent. Not because the tools don't exist. (3/4)
Eric Daimler @ead
An F-35 flying combat missions over Iran right now carries 10 million lines of formally certified code. Every line traces to a requirement. Every tool is independently qualified. Every input produces the same output. On March 19, one took Iranian fire. The pilot walked away. (1/4)