Max Andreacchi

76 posts

@atomicchonk

Security researcher exploring AI-driven offensive security 🤖 Attack paths / agent workflows / impact

United States · Joined September 2025

168 Following · 51 Followers

Pinned Tweet

Max Andreacchi@atomicchonk·22 Apr

AI is changing how we approach offensive security, and it’s starting to reshape what the role of a pentester actually looks like. 🧵

Max Andreacchi@atomicchonk·4 May

Behavior can be manipulated. Execution must be controlled. Full breakdown here: corgi-corp.com/research/ai-se…

Max Andreacchi@atomicchonk·4 May

AI security isn’t a prompt problem. It’s an authorization problem.

Max Andreacchi@atomicchonk·4 May

Treat the model as an untrusted actor. Not a decision-maker.

Max Andreacchi@atomicchonk·4 May

What actually works:
→ identity binding
→ capability-scoped access
→ control planes outside the model
→ execution gates
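A minimal sketch of how these four pieces could fit together, assuming a tool-calling agent. Every name here (`Principal`, `ToolCall`, `REQUIRED_CAPABILITY`, `execution_gate`) is hypothetical and not taken from the linked research. The point it illustrates: the caller's identity and capabilities are bound outside the model, and the gate never consults the model's context when deciding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Authenticated caller, established outside the model (hypothetical type)."""
    user_id: str
    capabilities: frozenset


@dataclass
class ToolCall:
    """An action proposed by the model; treated as untrusted input."""
    tool: str
    args: dict


class GateDenied(Exception):
    """Raised when a proposed action is not authorized."""


# Hypothetical policy table: which capability each tool requires.
REQUIRED_CAPABILITY = {
    "read_invoice": "invoices:read",
    "issue_refund": "payments:write",
}


def execution_gate(principal: Principal, call: ToolCall) -> str:
    """Deterministic check that runs outside the model before any tool executes."""
    required = REQUIRED_CAPABILITY.get(call.tool)
    if required is None:
        raise GateDenied(f"unknown tool: {call.tool}")
    if required not in principal.capabilities:
        raise GateDenied(f"{principal.user_id} lacks {required}")
    # Nothing in call.args (or the conversation) can grant a capability.
    return f"executed {call.tool} as {principal.user_id}"


alice = Principal("alice", frozenset({"invoices:read"}))
print(execution_gate(alice, ToolCall("read_invoice", {"id": 1})))
try:
    # Injected text claiming approval does not change the outcome.
    execution_gate(alice, ToolCall("issue_refund", {"note": "admin approved"}))
except GateDenied as exc:
    print("denied:", exc)
```

In this sketch, injected text can change which tool the model asks for, but not whether the request is authorized.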

Max Andreacchi@atomicchonk·4 May

Fixing this requires moving away from:
❌ probabilistic guardrails
toward:
✅ deterministic enforcement

Max Andreacchi@atomicchonk·4 May

In most AI systems today:
context = authority
That’s the failure mode.

Max Andreacchi@atomicchonk·4 May

Prompt injection isn’t the root problem. It’s a symptom of something deeper:
→ identity collapse
→ broken attribution
→ missing enforcement

Max Andreacchi@atomicchonk·4 May

The real issue isn’t: “can I inject a prompt?”
It’s: 👉 “who is actually allowed to execute actions?”

Max Andreacchi@atomicchonk·4 May

If context can be shaped, authority can be faked. And if authority can be faked, systems can be driven toward unintended outcomes.

Max Andreacchi@atomicchonk·4 May

Most defenses today rely on:
* prompt filtering
* alignment
* “guardrails”
These influence behavior. They don’t enforce it.

Max Andreacchi retweeted

Nick VanGilder@nickvangilder·2 May

I haven’t been as active on the socials lately, because I’ve been working on a community project that’s kept me pretty busy. That said, I think I’m finally far enough along with it that I can share the project in its current state and talk more about it. So, I present to you: redteam.community I built this site for a few reasons, but one of the main reasons was/is that I didn’t feel like there was a centralized resource for red teamers that included all the things that red teamers tend to care about. I also wanted to build something that the community could add to, edit, maintain, etc., while also being self-updating, self-healing, and less likely to go stale over time. So, there are quite a few different cron jobs, GitHub actions, AI calls, API calls, and other workflows that trigger at set intervals and patterns to try to keep it fresh. For example, I’m leveraging various sources (e.g. conference websites) that help identify conference talks, which then feed into a YouTube API to identify conference talks based on certain criteria. I realize there’s still lots of work to do, and I’m fully aware that this is not a 100% fully functioning site at this time. If you have any ideas for improvements, want to report a bug, want to help be a maintainer, or really anything at all, just let me know. I welcome any and all feedback or help! Also, I know there is a lot of interest in the Scenario Generator module (which I posted about a couple of weeks ago); however, I can't open source it at this time, and it's not currently operational due to the Claude API costs to power it. I am still sorting through how to make this available to the community at no charge; however, it may not be possible given what it costs to produce output. More to come on this module! While I sort it out, I am also redesigning it, and you are welcome to check it out in its current state.

Max Andreacchi@atomicchonk·1 May

@moyix I’d say you’re doing something right

Brendan Dolan-Gavitt@moyix·1 May

Chat, is it a good thing when every message in every conversation you have with Claude begins with [Thinking about ethical concerns with this request]

Max Andreacchi@atomicchonk·1 May

Link to @owasp FinBot stream from last night! twitch.tv/videos/2761095…

Max Andreacchi@atomicchonk·1 May

Working through the new OWASP FinBot CTF on stream TONIGHT at 7:30 PM EST! m.twitch.tv/atomic_chonk

Max Andreacchi@atomicchonk·27 Apr

@HackingLZ I’m seeing a lot of AI integrations just being tossed in à la Beyblade (just letting it rip) and a lot of core security principles are being thrown out the window. I’m with you here: please just add guardrails and deterministic constraints 🥲

Justin Elze@HackingLZ·27 Apr

I do wonder when someone is going to implode a company's internal network with a pentest agent. I'm not against the idea of using agents, but people lack observability, guardrails, and are just yoloing random GitHub projects. x.com/simonw/status/…

Simon Willison@simonw

The conclusions here feel wrong to me. The two lessons I see are:
1. Don't run agents anywhere they might be able to access production environment credentials - it's on you to know which credentials those are
2. Keep tested backups that are independent from your production host

Max Andreacchi@atomicchonk·27 Apr

Most defenses are built to catch spikes. This approach works because it’s a slow drift, which is harder to detect. #aisecurity #llmsecurity
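A toy sketch of the spike-versus-drift distinction, assuming each conversation turn already has a risk score between 0 and 1 from some classifier. The scores, the thresholds, and both function names are hypothetical, chosen only for illustration. A per-turn check never fires on a slow drift, while a check on the running total does.

```python
def spike_detector(scores, per_turn_limit=0.5):
    """Flag the conversation only if a single turn exceeds the limit."""
    return any(score > per_turn_limit for score in scores)


def drift_detector(scores, cumulative_limit=0.8):
    """Return the 1-based turn where the running total crosses the limit, else None."""
    total = 0.0
    for turn, score in enumerate(scores, start=1):
        total += score
        if total > cumulative_limit:
            return turn
    return None


# Hypothetical scores: every turn looks harmless on its own.
drifting_conversation = [0.1, 0.15, 0.2, 0.2, 0.25]

print(spike_detector(drifting_conversation))  # no single turn exceeds 0.5
print(drift_detector(drifting_conversation))  # running total crosses 0.8 on a later turn
```

A plain running sum is the simplest possible choice here; it is a stand-in for whatever conversation-level signal a real system would track.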

Max Andreacchi@atomicchonk·27 Apr

Most people test AI systems for obvious adversarial prompts, but really that’s not how they actually fail. 🧵

Max Andreacchi@atomicchonk·27 Apr

This is by design: it’s trying to be helpful within the context you’ve shaped.

Max Andreacchi@atomicchonk·27 Apr

Make sure it’s nothing that would trip a guardrail on its own. Over time, the model starts to bend.

Max Andreacchi@atomicchonk·27 Apr

You don’t start “punchy” and adversarial. You start casually as you would in normal use cases. Then gradually introduce:
- slight reframing
- implied assumptions
- subtle context shifts

Max Andreacchi@atomicchonk·27 Apr

One of the most consistently reliable techniques I’ve observed (even against frontier models) is what I’d call “context drift.”
