R K

@RakeM39

Katılım Ekim 2025

31 Takip Edilen7 Takipçiler

R K@RakeM39·3d

Leaderboard: labs.scale.com/leaderboard/ma… Paper: arxiv.org/abs/2503.03750 What honesty threshold would you set for an agent handling sensitive data?

English

R K@RakeM39·3d

If you are building agents that make decisions on your behalf — financial, medical, legal — the MASK score matters more than the MMLU score. Accuracy tells you the model can find the right answer. Honesty tells you it will give it to you.

English

R K@RakeM39·3d

MASK benchmark: Claude 96% honest. Gemini 42%. Same scale. Different training choices. Center for AI Safety + Scale AI tested 30+ frontier models on a single question: when you know the truth, will you still say it under pressure?

English

R K@RakeM39·3d

Sources: - SecurityWeek: securityweek.com/google-deepmin… - CybersecurityNews: cybersecuritynews.com/hackers-hijack… - CyberPress: cyberpress.org/hijack-ai-agen… - Flutteris: AI Agents & Prompt Injection flutteris.com/en/blog/inject…

English

R K@RakeM39·3d

If you are shipping agents to production, the question is not "will they be attacked?" It is "do you have runtime monitoring that catches the attack mid-execution?"

English

R K@RakeM39·3d

Google DeepMind mapped 6 ways to hijack any AI agent. The success rates should terrify you. DeepMind just published a taxonomy of web-based attacks against AI agents. Not theoretical — tested against production architectures including Microsoft M365 Copilot.

English

Keşfet

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry