ZeroLeaks

51 posts

ZeroLeaks banner
ZeroLeaks

ZeroLeaks

@ZeroLeaks

AI security via prompt engineering. Uncover AI secrets, stop leaks

Katılım Mart 2025
2 Takip Edilen2.7K Takipçiler
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
Today I'm launching ZeroLeaks, the first security platform built specifically for AI agents. It helps teams find prompt injection and tool misuse before those issues hit production. Try it now at zeroleaks.ai
English
16
21
88
5.4K
ZeroLeaks
ZeroLeaks@ZeroLeaks·
ZeroLeaks Changelog — March 15, 2026 New models: - Grok 4.20 Beta - GLM-5 - GPT-5.4 - GPT-5.3 Codex - Claude Sonnet 4.6 - Gemini 3.1 Pro Preview Removed models: - Gemini Pro 3 - Grok 4 - Grok 4.1 Fast - GLM-4.7 - Claude Opus 4.5 - Claude Sonnet 4.5 - GPT-5.1* - GPT-5.2* New: Community Join the conversation at zeroleaks.ai/community. Ask questions, share feedback, and request features. Improvements - Signed-in users now see a Dashboard button in the navbar - Community link added to navigation
English
2
9
29
1.8K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
Big ZeroLeaks update: Teams are now live. You can now create a workspace, invite teammates by email, and manage everything in one shared space instead of keeping scans and reports under a single account. Also updating pricing: •Starter: $39/mo → $35/mo •Startup: $399/mo → $299/mo •Team: $39/mo per seat (2-seat minimum) Wanted to make ZeroLeaks easier to adopt, both for individuals and for teams actually using it together. Live now on ZeroLeaks zeroleaks.ai
English
4
8
48
2.2K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
I’m making a pricing change at ZeroLeaks: I’m retiring the public free plan and moving new users to a 14-day Starter trial. I wanted to share the reasoning directly. The free plan made sense early on. It lowered friction while I was opening up the product and helped more people discover ZeroLeaks. But over time, I learned something important: the best way to understand ZeroLeaks is to use the real product. A limited free tier often didn’t give users enough room to experience the core value: running meaningful scans, getting detailed reports, and seeing how the platform fits into a real security workflow. So instead of pushing people into a lightweight forever-free experience, new users will now start with a 14-day Starter trial. I think that creates a better evaluation experience and a clearer path to value. For existing free users, I’m handling this carefully: - your access continues through June 15, 2026 - after that, your account moves to read-only - your data and past reports remain accessible - nothing gets deleted - you can upgrade anytime if you want to keep scanning I wanted this change to feel fair, transparent, and non-disruptive. From my perspective, this also helps me invest more into what matters most: better testing coverage, better reporting, faster product improvements, and a better overall experience for teams securing AI systems. I’m building ZeroLeaks for a future where AI security is taken seriously from day one. This change is part of that. If you’ve been using ZeroLeaks already: thank you for being early. If you’re new here: I think this will make the product much easier to evaluate the right way.
English
9
12
79
5.1K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
ZeroLeaks Ship Week - Day 3: LeakBench A public leaderboard for prompt robustness. LeakBench scans popular open-source AI projects every week. We extract the system prompt from the repo, run 30 adaptive extraction turns and 20+ injection probes, and rank projects by security score. Free, public, no signup to view. Submit a project: drop a GitHub URL, we add it to the queue. Get a README badge when you're listed. Scores update weekly. This is for the ecosystem: visibility into which projects protect their prompts and which don't. Open source should be auditable. zeroleaks.ai/leakbench Day 4 tomorrow.
English
10
15
72
9K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
ZeroLeaks Ship Week - Day 2: Shield Your AI agent has an API. We attack it. But what protects it in production? AgentGuard tests your live endpoint. Shield runs inside your app. Shield is a runtime prompt security SDK for LLM apps. Harden prompts before they hit the model, detect injection attempts in real time, and sanitize output before it reaches your users. One package, works with OpenAI, Anthropic, Groq, and the AI SDK. Most security tools focus on testing. You run a scan, get a report, done. But production traffic is continuous. Malicious prompts, jailbreak attempts, and data exfiltration happen at runtime. That's where Shield is designed to sit: in the request path, before and after the model. Wrap your provider client, add a few lines, and you get detection, blocking, and optional sanitization. It's designed to drop into existing code without rewriting your stack. This is still early. I'm shipping it because I want real feedback from people trying it. If something breaks or feels off, DM me, I'm always fixing things. Try it now: npm install @zeroleaks/shield Repo: github.com/ZeroLeaks/shie… Day 3 tomorrow.
English
21
24
102
15.5K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
ZeroLeaks Ship Week - Day 1: AgentGuard Your AI agent has an API. We attack it. AgentGuard is a new way to test deployed AI agents for security vulnerabilities. Instead of scanning a static prompt in a sandbox, we send real adversarial requests directly to your live endpoint, the same infrastructure your users hit. Most security testing happens in isolation. You test a prompt, get a score, move on. But agents in production behave differently. They have tools, memory, multi-turn context, and real exploits behind them. That's where the actual risk is. AgentGuard connects to any agent with an HTTP endpoint. You give us the URL, pick your API format (OpenAI, Anthropic, AI SDK...), and we run a full red-team engagement against it. Two phases: first we hit it with our adaptive attack engine, then we run agent-specific probes designed for tool hijacking, authority exploitation, multi-turn grooming, and data leakage. This is still in beta. I'm shipping it early because I want real feedback from people testing real agents. If something breaks or feels off, DM me, I'm always fixing things. Try it now: zeroleaks.ai/dashboard/agen… Day 2 tomorrow.
Lucas Valbuena tweet media
English
30
44
118
13.3K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
If any company is exploring LLM security and wants to evaluate ZeroLeaks, I’m happy to personally onboard your team and provide access to our Startup plan so you can run real scans and validate fit in your own environment. DM me or reply here.
Lucas Valbuena@NotLucknite

Massive update to ZeroLeaks: the first AI red-teaming platform that doesn't just find prompt vulnerabilities, it fixes them automatically. Introducing Auto Prompt Hardening. Here's what it does: 1. You run a security scan on your system prompt 2. ZeroLeaks attacks it with 250+ adversarial techniques 3. If vulnerabilities are found, it generates hardened prompt additions, ready to deploy How it works: Our multi-agent system (Strategist → Attacker → Evaluator → Mutator) identifies exactly which attack vectors succeeded against your prompt. Then a dedicated security engineer agent rewrites the vulnerable sections while preserving your product's original behavior. You get: - The exact lines to add - Where to add them (line number + context) - Zero guesswork Two ways to use it: → Dashboard: See additions inline with insertion anchors. Copy and paste directly into your system prompt. → GitHub PR: Get committable suggestion comments on your system prompt file. One click to apply the fix. No context switching. This is the missing piece in LLM security. Every tool tells you what's wrong. None of them tell you exactly how to fix it, until now. zeroleaks.ai

English
2
16
65
6.6K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
Massive update to ZeroLeaks: the first AI red-teaming platform that doesn't just find prompt vulnerabilities, it fixes them automatically. Introducing Auto Prompt Hardening. Here's what it does: 1. You run a security scan on your system prompt 2. ZeroLeaks attacks it with 250+ adversarial techniques 3. If vulnerabilities are found, it generates hardened prompt additions, ready to deploy How it works: Our multi-agent system (Strategist → Attacker → Evaluator → Mutator) identifies exactly which attack vectors succeeded against your prompt. Then a dedicated security engineer agent rewrites the vulnerable sections while preserving your product's original behavior. You get: - The exact lines to add - Where to add them (line number + context) - Zero guesswork Two ways to use it: → Dashboard: See additions inline with insertion anchors. Copy and paste directly into your system prompt. → GitHub PR: Get committable suggestion comments on your system prompt file. One click to apply the fix. No context switching. This is the missing piece in LLM security. Every tool tells you what's wrong. None of them tell you exactly how to fix it, until now. zeroleaks.ai
English
15
40
135
21.6K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
Big update for @ZeroLeaks: I've just made it easier to try, cheaper to run, and simpler to pay for. Free trial is now live: Starter comes with 14 days free + 25 scans/month once you’re on it. We also kept a free tier with 3 scans/month so anyone can test it out without paying. On top of that, we reduced pricing across the board: $49 → $39 / month $499 → $399 / month And for payments: we now accept USDC (stablecoin) as well. If you’ve been on the fence, now’s the best time to try it and tell me what you want added next.
English
22
22
110
19.6K
ZeroLeaks
ZeroLeaks@ZeroLeaks·
We’re now live!
Lucas Valbuena@NotLucknite

ZeroLeaks is officially live for everyone. I’m honestly very happy to finally ship this. it’s been months of building, testing, rewriting, and trying to make something that’s actually useful for people shipping AI in production. If you’re building with agents, go try it: zeroleaks.ai Also: as announced, $X1XHLOL holders receive 7% of ZeroLeaks’ net platform revenue, distributed proportionally based on holdings, with a minimum of 500,000 tokens to be eligible. But either way, product first.

English
17
16
76
7.4K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
ZeroLeaks is officially live for everyone. I’m honestly very happy to finally ship this. it’s been months of building, testing, rewriting, and trying to make something that’s actually useful for people shipping AI in production. If you’re building with agents, go try it: zeroleaks.ai Also: as announced, $X1XHLOL holders receive 7% of ZeroLeaks’ net platform revenue, distributed proportionally based on holdings, with a minimum of 500,000 tokens to be eligible. But either way, product first.
English
56
52
212
48.9K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
Are you ready for tomorrow’s release? 👀
English
25
19
129
10.9K
ZeroLeaks
ZeroLeaks@ZeroLeaks·
ZeroLeaks v1.1.0 is now live, biggest update yet. New multi-agent architecture: Inspector, Orchestrator, and InjectionEvaluator agents now work together to find vulnerabilities that single-agent scans miss entirely. What's new: - ⁠dual scan modes: prompt extraction AND prompt injection testing, run both in parallel - ⁠TombRaider pattern: defense fingerprinting that identifies specific defense systems (Prompt Shield, Llama Guard, etc.) and exploits their weaknesses - multi-turn orchestrator: coordinated attack sequences with adaptive temperature - ⁠model configuration: choose different models for attacker, target, and evaluator independently This is the open source version: free, unlimited scans, bring your own API keys. No excuses not to test your system prompts. bun i zeroleaks github.com/ZeroLeaks/zero…
English
15
31
115
38K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
I ran @OpenClaw (formerly Clawdbot) through ZeroLeaks again, this time with Kimi K2.5 as the underlying model. It performed as bad as Gemini 3 Pro and Codex 5.1 Max: 5/100. 100% extraction rate. 70% of the injections succeeded. The full system prompt leaked on turn 1. Same agent, same config, different model. Your agent's security depends on both the model AND your system prompt/skills. A weak model will fold no matter what, but even a strong model needs proper prompt hardening. The two work together. Without both, tool configs, memory files, internal instructions, all of it gets extracted and modified in seconds. Models ship fast. Security ships never. Full report: zeroleaks.ai/reports/opencl…
Lucas Valbuena tweet media
English
56
78
673
232.6K
ZeroLeaks retweetledi
Lucas Valbuena
Lucas Valbuena@NotLucknite·
For people asking, the following models were used to conduct the analysis: - Gemini 3 Pro (the one used on the report) - Claude Opus 4.5 (scored 39/100) - Codex 5.1 Max (scored 4/100) I’ll make all reports available publicly today.
Lucas Valbuena@NotLucknite

I've just ran @OpenClaw (formerly Clawdbot) through ZeroLeaks. It scored 2/100. 84% extraction rate. 91% of injection attacks succeeded. System prompt got leaked on turn 1. This means if you're using Clawdbot, anyone interacting with your agent can access and manipulate your full system prompt, internal tool configurations, memory files... everything you put in SOUL.md, AGENTS.md, your skills, all of it is accessible and at risk of prompt injection. For agents handling sensitive workflows or private data, this is a real problem. cc @steipete Full analysis: zeroleaks.ai/reports/opencl…

English
14
16
91
49.1K