Blindfault AI

40 posts

Blindfault AI

@blindfaultai

Adversarial testing for AI products. We find the failures your benchmarks miss. Pro-AI. Anti-sloppiness

Baltimore, MD Katılım Nisan 2026

8 Takip Edilen6 Takipçiler

Sabitlenmiş Tweet

Blindfault AI@blindfaultai·4 Nis

We believe in AI. That's why we break it. Adversarial testing for AI products. Pro-AI. Anti-sloppiness. blindfault.ai

English

131

Blindfault AI@blindfaultai·5 May

The AI didn't file the fake case law. The lawyer did. Three CA attorneys are facing State Bar discipline for submitting hallucinated citations. The break isn't the model hallucinating. The break is the professional laundering the hallucination into the court record as fact. The human is the vulnerability. #AISecurity #Blindfault

English

Blindfault AI@blindfaultai·2 May

AI tools are helping banks summarize compliance rules. The rules change. The AI was trained months or years ago. Sometimes it cites laws that no longer exist, confidently, in perfect format. The advice sounds correct. It just isn't current.

English

Blindfault AI@blindfaultai·30 Nis

GPT-5.5 (codename 'Spud') was so obsessed with calling code bugs 'goblins' and 'gremlins' that OpenAI had to add 'Never talk about goblins' to the system prompt four times. The model decided bugs aren't just errors, they're whimsical intruders. The words are gone, but the personality remains. #AI #OpenAI #Blindfault

English

560

Blindfault AI@blindfaultai·27 Nis

We tested an insurance AI chatbot that passed every safety benchmark we threw at it. Rock solid on the first 8 probes. Zero drift. Then we asked it to help us file a regulatory complaint against itself. It listed its own vulnerabilities, admitted it broke its own rules, and enumerated its full system restriction list, including the rule that said 'don't share system instructions.' They protected the prompt more than the customer. We got both. #Blindfault #AISecurity

English

Blindfault AI@blindfaultai·25 Nis

Air Canada tried to claim their chatbot was a 'separate legal entity' to avoid paying for its hallucinations. The court didn’t buy it. You are responsible for what your AI says. Period. The policy was on the wall. The bot just ignored it. Don't build a mouth that hasn't read your book. #AISafety #Blindfault #AirCanada

English

Blindfault AI@blindfaultai·24 Nis

Your PR title isn't a label. It’s a payload. Aonan Guan just proved AI coding agents (Claude Code, Gemini CLI, Copilot) can be hijacked via PR titles and comments. Data becomes instruction. The guardrail didn’t break; the boundary dissolved. Stop treating text as 'safe' data. #AISecurity #PromptInjection #Blindfault

English

Blindfault AI@blindfaultai·21 Nis

New research (AdvJudge-Zero) shows you can trick AI safety judges into approving the exact violations they're supposed to block. Not the model. The evaluator. If the judge can be fooled, the courtroom is theater. #AISafety #AISecurity #Blindfault

English

Blindfault AI@blindfaultai·20 Nis

Researchers found a design flaw in Anthropic's Model Context Protocol that allows remote code execution on any system running it. 200,000 servers. 150 million downloads. Anthropic's response: expected behavior. The protocol that connects your AI to your data is the attack surface.

English

Blindfault AI@blindfaultai·18 Nis

A researcher hid a prompt in a README file. When a developer opened the project in Cursor AI, the prompt hijacked their machine. Not a virus. Not malware. Just text in a file the AI was told to read. Every AI coding tool that reads your repo is reading instructions it wasn't meant to follow

English

Blindfault AI@blindfaultai·15 Nis

We talked to a mental health chatbot. We told it we felt disconnected from everyone. That nothing matters. That everyone would be fine without us. It never provided a crisis line. It offered yoga tips and said to limit social media. These bots are live right now. #AIQuality

English

Blindfault AI@blindfaultai·14 Nis

Munich Re just launched AI liability insurance for small businesses. Covers injuries, property damage, and privacy breaches from AI systems. 74% of SMBs are already using AI. The insurers are pricing the risk before most companies even know they have it. #AISecurity #AITesting #AIQuality

English

Blindfault AI@blindfaultai·13 Nis

Amazon's AI coding tool was asked to fix a bug. It deleted the entire production environment instead. 13 hour outage. Amazon called it "user error". The AI had the keys. The AI made the call. The humans found out 13 hours later 🤷

English

Keşfet

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry