Pluto Security
36 posts

@pluto_security
Everyone is a builder now. Security needs visibility and control - without slowing innovation.
Joined December 2025
1.4K Following · 1.7K Followers

@sama @VampireGurlAI Big move. Most enterprises are still figuring out how to govern the AI they already have. Now they need to secure what's securing them too.

@AnthropicAI Opening up bug bounties is the right call. Finding the vulnerability is step one. The harder question is what happens in production before anyone finds it.

Our security bug bounty program is now public on HackerOne.
We've run the program privately within the security research community, and their findings have strengthened our products. Now anyone can report vulnerabilities and get rewarded.
Read more: hackerone.com/anthropic

@0xcorte That's the goal. Getting there in real time is where most teams get stuck.

@satyanadella Agent Mode default across Word, Excel, and PowerPoint. Every enterprise just got a much bigger AI footprint to secure.

We're making a big change to the Copilot experience.
Agent Mode is generally available and now the default across Copilot in Word, Excel, and PowerPoint.
As models become more capable, we’re bringing that power to where real work happens, right in the canvas.
The power of a spreadsheet as an example is its spatial representation of information. What sits next to what, what feeds what. Give an agent that canvas to reason over, and a single prompt can reshape the model, the bridge, and the narrative at once.
Read more: microsoft.com/en-us/microsof…

@AnthropicAI Models that can self-report misalignment are a big step. The harder question is whether enterprises can act on that signal in real time.

In new Anthropic Fellows research, we discuss “introspection adapters”: a tool that allows language models to self-report behaviors they've learned during training, including potential misalignment.
keshav@kshenoy_
Can LLMs simply tell us about unwanted behaviors they’ve picked up in training? We train a single Introspection Adapter (IA) that makes fine-tuned models describe their behaviors. It generalizes to detecting hidden misalignment, backdoors and safeguard removal.

ClaudeSec is officially LIVE!
Meet the new security-first hub for the Claude ecosystem, powered by @pluto_security.
❓Always yearned for a unified search of all existing extensions?
❓Ever wondered which ones are flagged as high-risk?
❓Dreaming of knowing how to deploy safely with Claude?
All of this (and more) is now waiting for you on our new planet.
Give it a go and let us know in the comments what you thought!
Link in the first comment.

@claudeai Agents that learn from every session are powerful. They also accumulate context that most security teams have zero visibility into.

@omarsar0 Self-evolving agents are impressive. They're also the hardest thing to put guardrails on.

// Self-Evolving Agent Protocol //
One of the more interesting papers I read this week.
(bookmark it if you are an AI dev)
The paper introduces Autogenesis, a self-evolving agent protocol where agents identify their own capability gaps, generate candidate improvements, validate them through testing, and integrate what works back into their own operational framework.
No retraining, no human patching, just an ongoing loop of assessment, proposal, validation, and integration.
Why it's worth reading this paper:
Static agents age quickly.
As deployment environments change and new tools arrive, the agents that survive will be the ones that can safely rewrite themselves. Autogenesis is part of a growing wave of self-improving agent systems, alongside work like Meta-Harness and the Darwin Gödel Machine line, and it's one of the cleaner protocol-level takes on continual self-improvement so far.
Paper: arxiv.org/abs/2604.15034
Learn to build effective AI agents in our academy: academy.dair.ai


@Baron_VonSnatch That's what makes this pattern so dangerous. The software is easy to run. The blast radius is not easy to see.

@pluto_security These kinds of vulnerabilities must be everywhere right now. Never before have there been this many people on earth willing to download and run unverified software.

Our research team disclosed CVE-2026-33032, a critical CVSS 9.8 vulnerability in nginx-ui that exposed over 500K users to full server takeover through a single unauthenticated request. No credentials. No exploit chain. Actively exploited in the wild.
The root cause: MCP endpoints that inherit an application's full capabilities but skip its security controls entirely.
The pattern is clear - and it's only getting more common as agentic workflows connect deeper into enterprise workspace infrastructure.
Most security teams have no visibility into what MCP servers are running in their environment, no inventory of the endpoints they're exposing, and no way to enforce controls on them.
Full breakdown → lnkd.in/dmbkkQAp
As covered by The Hacker News → lnkd.in/gWTZt4e4
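
To illustrate the pattern the post describes (an MCP endpoint that reaches the same capability as the main API but without its security controls), here is a minimal sketch. It is not nginx-ui's code and does not reproduce CVE-2026-33032; the route paths, the `require_auth` wrapper, the token, and the `restart_server` handler are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]

# Hypothetical shared secret, used only for this illustration.
VALID_TOKEN = "secret-token"


def require_auth(handler: Handler) -> Handler:
    """Wrap a handler so it rejects requests without the expected token."""
    def wrapped(request: dict) -> str:
        if request.get("token") != VALID_TOKEN:
            return "401 Unauthorized"
        return handler(request)
    return wrapped


def restart_server(request: dict) -> str:
    """Stand-in for a privileged capability of the application."""
    return "200 restarted"


# The risky pattern: the MCP route exposes the same capability as the
# regular API route, but was registered without the auth wrapper.
vulnerable_routes: Dict[str, Handler] = {
    "/api/restart": require_auth(restart_server),
    "/mcp/restart": restart_server,  # same capability, no auth check
}

# One possible fix: every route that reaches the capability goes
# through the same control.
fixed_routes: Dict[str, Handler] = {
    "/api/restart": require_auth(restart_server),
    "/mcp/restart": require_auth(restart_server),
}


def dispatch(routes: Dict[str, Handler], path: str, request: dict) -> str:
    """Look up the handler for a path and call it."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler(request)
```

In this sketch, an unauthenticated request to `/mcp/restart` succeeds against `vulnerable_routes` and is rejected against `fixed_routes`, which is the difference an inventory of exposed MCP endpoints would be meant to surface.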


@wkeything Good point. Action monitoring matters. Visibility into what the agent does after it's in is exactly the gap we're focused on.

@pluto_security MCP endpoints inheriting full capabilities but skipping security — exactly the agent blast radius pattern. The gap: nothing watches what the agent does AFTER it authenticates. Shroud monitors the action layer for Hermes, OpenClaw, Codex & Claude Code.

@OpenAI More AI capability for defenders is great. The question is who's governing what that AI can access inside your environment while it's defending.

We’re expanding Trusted Access for Cyber with additional tiers for authenticated cybersecurity defenders.
Customers in the highest tiers can request access to GPT-5.4-Cyber, a version of GPT-5.4 fine-tuned for cybersecurity use cases, enabling more advanced defensive workflows.
openai.com/index/scaling-…

@vercel ZDR solves the data storage problem. The next question is what the agent can access while it's running.

AI Gateway now supports team-wide Zero Data Retention (ZDR).
Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-compliant providers.
AI Gateway changes this with team-wide ZDR.
Gateway ensures your data requirements are automatically met by only routing to providers where we have negotiated ZDR agreements.
Instead of managing policies provider by provider, you get one unified data policy across Claude, GPT, Gemini, and many more providers.
Toggle it on in your dashboard, and all requests will route safely without touching any code:
• Team-wide ZDR
• Per-request controls
• Disallow prompt training
Move compliance to the gateway so your team can keep shipping ↓
vercel.com/blog/zdr-on-ai…
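
The routing idea in the post above (only send requests to providers covered by a ZDR agreement) can be sketched generically as a filter applied before provider selection. This is not Vercel's AI Gateway API; the `Provider` type, the `zdr_agreement` flag, and the `route` function are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record; zdr_agreement marks a negotiated ZDR deal."""
    name: str
    zdr_agreement: bool


def eligible_providers(providers: Sequence[Provider], require_zdr: bool) -> List[Provider]:
    """Return providers allowed under the policy, preserving their order."""
    if not require_zdr:
        return list(providers)
    return [p for p in providers if p.zdr_agreement]


def route(providers: Sequence[Provider], require_zdr: bool) -> Provider:
    """Pick the first eligible provider, or fail rather than fall back."""
    candidates = eligible_providers(providers, require_zdr)
    if not candidates:
        raise LookupError("no provider satisfies the data retention policy")
    return candidates[0]
```

Failing closed when no provider qualifies, instead of silently falling back to a non-ZDR provider, is the design choice that makes the policy enforceable at the gateway rather than in each caller.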

@Augmented_Think That's exactly the problem worth solving. Most tools pick a side. The goal is guardrails that move with the work.

@pluto_security Boundaries matter, but so does friction. If the boundaries are too tight, you get a sterile mirror. If they're too loose, you get diffusion. The hard part is tuning the friction.

@kshitizh The compliance theater problem in one post. Paying for the illusion of security instead of actual security.

@blindfaultai @AnthropicAI Exactly! And step three is making sure it can't do that again without someone knowing.

@pluto_security @AnthropicAI This. Finding the vulnerability is step one. Testing whether the AI exploits it on its own is step two. That's the gap most teams skip.

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
anthropic.com/glasswing