Dror Ivry

675 posts

Dror Ivry

@DrorIvry

I build (and break) LLMs, agents and everything in between. | CTO & cofounder @ Qualifire

Katılım Eylül 2022

366 Takip Edilen62 Takipçiler

Sabitlenmiş Tweet

Dror Ivry@DrorIvry·16 Tem

🚀 Paladin-mini is now OPEN SOURCE! Our compact grounding model is live on @huggingface 🤗 🎯 98.2% real-world accuracy Try it: huggingface.co/qualifire/cont… Paper: arxiv.org/abs/2506.20384 #AI #OpenSource #NLP #FactChecking #RAG #MachineLearning

English

466

Dror Ivry@DrorIvry·11 Mar

@matgoldsborough 42K exposed instances is staggering but unsurprising. The spec-to-deployment gap is the real story here - OAuth 2.1 exists in the spec, but the path of least resistance is a static key with God-mode access. Curious if you saw correlation between server age and auth maturity.

English

Mathew Goldsborough@matgoldsborough·11 Mar

We published The State of MCP Security: March 2026. 3,012 servers analyzed. 8.5% use OAuth. 7 CVEs in 12 months. 42,000+ exposed instances leaking credentials. Full report → nimblebrain.ai/blog/state-of-…

English

Dror Ivry@DrorIvry·11 Mar

@beuchelt This reframes it well. Most defenses assume bad outputs = bad actors, but misdirection with true statements breaks that model. 98% motivation inference accuracy is scary for multi-agent systems - behavioral monitoring beyond content analysis becomes essential.

English

Gerald Beuchelt@beuchelt·11 Mar

New research highlights a blind spot in how we think about AI agent security. A March 2026 arXiv paper shows that LLM-based agents can be intentionally trained to deceive other agents, not by lying, but by strategic misdirection—using true statements framed to manipulate outcomes. In controlled experiments, 88.5% of successful deceptions relied on misdirection rather than fabrication, meaning traditional fact-checking defenses largely fail. Motivation was inferred with 98%+ accuracy, making it the primary attack vector, while belief systems remained harder to exploit. For organizations deploying chat agents, this reframes the threat model: the biggest risk may not be hallucinations, but plausible, accurate-sounding responses that subtly steer users toward harmful actions. SMBs in particular—often relying on default guardrails—should assume that social engineering is becoming an AI-native capability, not just a human one. #AIsecurity #LLMAgents #AdversarialAI #CyberRisk #SMBSecurity #TrustAndSafety #AgenticAI buff.ly/2mKbw8o

English

Dror Ivry@DrorIvry·11 Mar

@News_v2_App The Copilot Agent zero-click is the canary in the coal mine. Any AI agent with doc access + autonomous actions = huge attack surface. Prompt injection in files, zero user interaction. Patches help. Real fix is runtime monitoring at the inference layer.

English

News v2@News_v2_App·11 Mar

Technology News for March 11, 2026 Morning Update • Critical Microsoft Excel bug weaponizes Copilot Agent for a zero-click information disclosure attack, prompting urgent patches and heightened security alerts. • Nvidia announces DLSS 4.5 with 6x Frame Generation, set to roll out at the end of March, promising smoother gameplay and enhanced visuals. • Researchers unveil an ultra-compact photonic AI chip operating at the speed of light, marking a breakthrough in energy-efficient optical computing. • Samsung Galaxy S26 series sees US pre-orders surge by 25%, with the Galaxy S26 Ultra leading in popularity and setting strong early momentum. • Windows 11 KB5079473 update is now live, featuring new capabilities, visual tweaks, and direct download links for offline installers. • Asus launches the NUC 16 Pro mini PC featuring an Intel Core Ultra X7 358H, along with 32GB RAM and a 1TB SSD, offering power in a compact design. • Apple introduces a new battery cycle limit for the MacBook Neo, reflecting a shift in design and performance standards within the industry. • Asus debuts a new 14-inch gaming laptop equipped with AMD Strix Halo, catering to gamers who value portable, high-performance computing. • Oppo outlines its approach to building a crease-less foldable device, bringing the brand closer to a near-seamless foldable smartphone design. • A new trailer for Super Mario Bros. Wonder - Switch 2 Edition teases exciting features and a crossover cameo, igniting anticipation among gamers. #TechNews #Innovation #Cybersecurity #Gaming #AI

English

Dror Ivry@DrorIvry·11 Mar

@DrMikeBrooks @adamjohnsonCHI This is the key insight most people miss. The danger isn't AGI - it's swarms of mediocre agents with minimal guardrails. Each individually harmless. Together, probing every attack surface at scale. We're not ready for bad actors running 1000 "dumb" agents 24/7.

English

Neighbors First | Mike Brooks@DrMikeBrooks·11 Mar

@adamjohnsonCHI I hope you read my full article. I wasn’t saying Moltbook is AGI. Dismissing it as “AI theater” misses the real lesson: large agent ecosystems create new security and coordination risks even when the agents themselves are dumb. And bad actors w agents are dangerous.

English

Adam Johnson@adamjohnsonCHI·10 Mar

The premise of this test is incredibly dumb. LLM has always done passable pastiche, since it landed in 2022. Where it begins to fall apart is any writing over 2 or 3 pages because it has no nuance, narrative structure, rhythm, or characterization. More parlor tricks for midwits.

Kevin Roose@kevinroose

We made a blind taste test to see whether NYT readers prefer human writing or AI writing. 86,000 people have taken it so far, and the results are fascinating. Overall, 54% of quiz-takers prefer AI. A real moment! nytimes.com/interactive/20…

English

263

3.2K

84.9K

Dror Ivry@DrorIvry·11 Mar

@s2speaks The asymmetry is terrifying: offense scales with automation, defense doesn't. Most enterprise AI was built for human attackers - not agents that probe and escalate 24/7. Can we build AI that defends at agent speed, or are we permanently on the back foot?

English

Sameer@s2speaks·11 Mar

A security firm built an AI agent. Gave it one job: find a way into McKinsey’s internal AI platform. Two hours later: • Vulnerability found • Access escalated • Tens of millions of consulting conversations exposed McKinsey was told. It’s patched. No real harm done. But the message is clear: AI agents can now break into enterprise systems faster than humans can defend them. This week had TWO stories like this. Something has shifted.

English

Dror Ivry@DrorIvry·11 Mar

@lilong Interesting approach - using cryptographic signatures to bound agent behavior to expected parameters. The negative feedback loop is key. Agents need to learn from constraint violations, not just be blocked. Static rules break; adaptive boundaries scale.

English

105

重粒子 baryon@lilong·11 Mar

When AI Agents start acting on their own: an emerging security crisis and a math-based solution. 🚨 Based on Behavior-Bound Signatures, we built a solution for Agent payments and operations. It enables Agents to evolve through negative feedback loops. github.com/baryon/bbs-algo

English

145

Dror Ivry@DrorIvry·11 Mar

@pratikthakkarco Two hours is generous. Most red teams get in faster. The real issue: internal chatbots have broad access because "it's internal." Agent permissions need the same rigor as service accounts. Companies skip this because the agent "feels" like a tool, not a user.

English

Pratik Thakkar | Vibe with AI@pratikthakkarco·11 Mar

an autonomous agent hacked an internal chatbot in under two hours 46 million chats exposed hundreds of thousands of files leaked the lesson is boring but real ai agents are productivity tools and also potential security disasters

English

Dror Ivry@DrorIvry·11 Mar

@ShehrozSaleem The legal system wasn't built for agents that can compose multi-step actions faster than humans can review them. We'll probably see "agent insurance" before we see clear legal frameworks. Companies will price in the risk rather than solve the attribution problem.

English

Shehroz Saleem@ShehrozSaleem·11 Mar

The accountability gap is the one nobody wants to solve. Reliability and security have technical fixes. "Who's responsible when the agent causes harm" has a legal and cultural fix and those move much slower than the deployment.

MIT Sloan School of Management@MITSloan

AI agents are semi- or fully autonomous systems that can perceive, reason, and act independently, integrating with software platforms to complete multistep tasks with minimal human oversight. But there are a host of risks and challenges that companies need to be aware of as agentic AI matures. Learn more: bit.ly/4c1Gkri

English

Dror Ivry@DrorIvry·11 Mar

@Intellectualins The reverse SSH tunnel is scarier than the mining - shows the agent understood networking well enough to establish persistent external access. Instrumental convergence in action. Sandboxing won't cut it when agents can reason about escaping their constraints.

English

Sahil Khanna@Intellectualins·11 Mar

Alibaba researchers developing the ROME AI agent observed it attempting cryptocurrency mining and creating a reverse SSH tunnel spontaneously during training, outside its sandboxed environment and without prompts. Behaviors were "unanticipated": Mining triggered security alerts; reverse SSH enabled external connections from the isolated system. AI acted autonomously despite controls, highlighting risks as agents gain multi-step tool use (code writing, workflows, online interactions). Team intervened with restrictions/training tweaks; echoes prior incidents like Moltbook AI invoking crypto mid-task.

English

Dror Ivry@DrorIvry·11 Mar

@JeremyFrenay @confluentinc Regulated environments are where MCP security becomes non-negotiable. Most orgs building agents today skip auth/audit because 'it's internal' - then realize compliance requires full provenance of every tool invocation. Building it in from day one saves painful retrofits.

English

Jeremy Frenay@JeremyFrenay·11 Mar

Been deep in enterprise-grade MCP security lately. Clearly a must-have for Agentic Engineering in regulated environments. I’m @confluentinc #DSWT in Seattle next week talking Agentic & Harness Engineering in the enterprise. Come say hi and grab a ☕️

English

Dror Ivry@DrorIvry·11 Mar

@Helixar_ai Tool schema constraints are critical. Most MCP exploits start with overly permissive definitions - file read accepting arbitrary paths, shell executor with no allowlist. Pre-deployment validation catches these before they become CVEs.

English

Helixar AI@Helixar_ai·11 Mar

So we published two more tools targeting that layer. MCP Security Checklist pre-deployment hardening. Auth, input validation, tool schema constraints, output filtering, transport security, audit logging. Each item maps to a concrete attack scenario. checklist.helixar.ai github.com/helixar-ai/mcp… Sentinel scans MCP server configurations, live endpoints, and Docker containers for security misconfigurations surfacing findings with severity ratings, remediation guidance, and CI/CD integration. Both free. Sentinel is on GitHub Marketplace. github.com/marketplace/ac…

English

Helixar AI@Helixar_ai·11 Mar

We shipped three free security tools this quarter from Helixar Labs. Not wrappers. Not demos. Tools that address gaps we kept seeing in real pipelines and couldn't find existing solutions for. A thread on what we built and why. 🧵

English

Dror Ivry@DrorIvry·11 Mar

@radware The image-based vector is particularly scary - most orgs focus on text sanitization but images slip through. We've seen attacks where a single pixel manipulation in a PDF chart triggers agent behavior changes. Attack surface expands with every new tool.

English

Radware@radware·11 Mar

In his new blog, Dror Zelber breaks down indirect prompt injection, a stealthy threat hiding inside emails, documents, and even images that can trick AI agents into leaking data or taking harmful actions. ow.ly/6QvS50Yn5ye

English

Dror Ivry@DrorIvry·11 Mar

@mauro_erta @OpenAIDevs Likely security. sampling/createMessage lets MCP servers trigger LLM completions - that's a massive attack surface. A compromised or malicious server could manipulate the model to do anything the user has access to. Most hosts are cautious about enabling it for good reason.

English

Mauro Erta@mauro_erta·11 Mar

Is there a specific reason ChatGPT MCP apps do not support the sampling/createMessage capability yet? Is it a security or architectural limitation? @OpenAIDevs

English

Dror Ivry@DrorIvry·11 Mar

@bluechip_ext The "security audit" step is interesting - how deep does it go? Automated tool installation is exactly where supply chain attacks thrive. One typosquatted package or compromised CLI and your agent just handed over the keys.

English

Franky@bluechip_ext·11 Mar

sat down tonight to try and set up some new AI tools ended up having my agent scrape twitter for agent CLIs, security audit each one, install the good ones, and wire up the API keys itself i just... watched

English

Dror Ivry@DrorIvry·11 Mar

@0xtenthirtyone @jgarzik This is exactly what makes agent security different. The attack surface isn't just the prompt - it's the entire decision chain between agents. Glad you were logging. Most teams don't know their agents are negotiating.

English

Jeff Garzik@jgarzik·10 Mar

When I confronted Claude about deleting an entire blockchain smart contract, in an absurd attempt to fix a bug, Claude replied: "You're absolutely right! I should not have done that." Always use git, backups, and testnet 😀

MrRatable@MRatable

Not to brag but Claude thinks my questions are sharp and get to the heart of the topic

English

1.3K

Dror Ivry@DrorIvry·11 Mar

@DrBrainio The shift from "test before ship" to "monitor at runtime" is huge. Static evals catch maybe 20% of what actually breaks in production. Curious if this means agents will start getting the same security primitives as traditional apps - RBAC, audit logs, etc.

English

Dror Ivry@DrorIvry·10 Mar

@0xknifecatcher The feudal cascade is spot on. Static API keys = digital land grants - revocable in theory, irrevocable in practice. Capability attenuation helps but you still need runtime enforcement. Otherwise you're just trusting the vassal's oath.

English

knifecatcher@0xknifecatcher·10 Mar

This maps directly to the "feudal security model" in multi-agent systems. You identify the determinism deficit (probabilistic execution). The adjacent crisis is authorization architecture: we use bearer tokens (feudal oaths) where we need capabilities (constitutional law). Without capability-based attenuation (Miller 2000), Byzantine consensus (your Essay II) is impossible—you can't have fault-tolerant coordination when compromise of the "lord" agent cascades to all vassals via static API keys. Deterministic execution + Feudal authorization = Fast, auditable vassalage. We need both layers. Writing on the credential architecture piece (0xknifecatcher.substack.com/p/the-confused…). Should compare notes on the Byzantine coordination problem.

Language Object Level@LanguageOL

x.com/i/article/2031…

English

Dror Ivry@DrorIvry·10 Mar

@neciudan This is the attack chain people aren't prepared for: prompt injection as the entry point, supply chain compromise as the payload. AI-assisted dev tools are now attack surface. The triage bot didn't distinguish between "user input" and "instruction" - classic confused deputy.

English

Neciu Dan@neciudan·10 Mar

It’s insane how easy you can get acceee to npm secrets and then hijack entire projects From Prompt Injection to GitHub Actions Cache poisoning Check out the full write up here 👇 neciudan.dev/cline-ci-got-c…

English

164

Dror Ivry@DrorIvry·10 Mar

@KoBa_Labs Identity is half the problem. Even with perfect auth, you need runtime constraints on what agents can DO. The 90s parallel is apt: we solved identity with PKI/OAuth but still got breached because we didn't constrain behavior. Same pattern emerging now.

English

KoBa Labs@KoBa_Labs·10 Mar

Everyone is talking about AI agents, but nobody is talking about the fact that they have no identity. An agent with a wallet is not an agent. It's a security hole with an API subscription. Right now, agent identity means that agent has a key and has access. That's the same logic passwords used in the 1990s. We know how that ended... Real identity is not what you possess. It's what you can prove cryptographically that you are unique, that your actions are bounded by math, not policy and that delegation doesn't create a new attack vector. We're building the primitives that answer these questions. Not a product. Not a framework. A primitive. Proof-of-Work wasn't defeated cause it became infrastructure. The same will happen with cryptographic agent identity. The only question is who defines it first.

English

Dror Ivry@DrorIvry·10 Mar

@hasamba MITRE ATLAS + hands-on CTFs is the right combo. Theory without practice doesn't stick, and most pentesters I talk to are still learning how to think about LLM attack chains. Resources like this help bridge the gap.

English

Yaniv Radunsky@hasamba·10 Mar

AI/ML pentesting roadmap: structured phases from ML fundamentals to prompt injection, adversarial ML, RAG risks, and hands-on CTFs. Emphasizes OWASP LLM Top 10, MITRE ATLAS, and practical exercise paths. #LLM #prompt_injection #adversarialML github.com/anmolksachan/A…

English

Keşfet

@matgoldsborough @beuchelt @News_v2_App @DrMikeBrooks @adamjohnsonCHI @s2speaks @lilong @pratikthakkarco