Ben Edelman

111 posts

Ben Edelman

@EdelmanBen

Agent Security lead @ U.S. Center for AI Standards and Innovation. Prev: science of deep learning PhD @ Harvard

가입일 Mart 2014

55 팔로잉599 팔로워

Ben Edelman@EdelmanBen·3d

@zicokolter At CAISI we started using the phrase "agent hijacking" for prompt injections of agents because it avoids the inevitable confusion about the prompt injection vs jailbreak distinction (not to even mention direct vs indirect), and conveys impact more directly for a lay audience.

English

Ben Edelman@EdelmanBen·3d

@zicokolter Yep agreed it's all the same underlying vulnerability; instruction hierarchy-style distinctions (app developer / user / external content) are "just" an abstraction. (I was also involved with the new paper, btw)

English

Zico Kolter@zicokolter·4d

As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵

Gray Swan AI@GraySwanAI

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

English

11.3K

Ben Edelman@EdelmanBen·3d

@zicokolter Post where Simon Willison coined prompt injection: simonwillison.net/2022/Sep/12/pr…. Paper where Greshake et al. coined indirect prompt injection: arxiv.org/abs/2302.12173

English

Ben Edelman@EdelmanBen·3d

@zicokolter Fwiw, my understanding is that the original coinage of prompt injection was focused on contexts where the untrusted data comes from an untrusted user. Then Greshake et al. coined IPI to highlight the case where the attacker leverages data likely to be retrieved at inference time.

English

Ben Edelman@EdelmanBen·18 Şub

Excited to be part of this initiative. Join our team to advance the frontier of agent security research and standards! usajobs.gov/job/856267900

Director Michael Kratsios@mkratsios47

The future of AI is agentic, and America is leading the way to make it secure and interoperable. A new AI Agent Standards Initiative is launching this week @NIST to drive industry-led standards and open protocols that build trust and advance innovation. nist.gov/news-events/ne…

English

593

Ben Edelman 리트윗함

Tony Wang@TonyWangIV·10 Şub

Excited to share @NIST+CAISI’s initial public draft on how to run and report results of automated evals. If you have opinions on evals, we’d love your feedback — help us improve the AI evals ecosystem! Public comments accepted through March 31st via ai800-2@nist.gov. more in🧵

English

3.3K

Ben Edelman 리트윗함

Boaz Barak@boazbaraktcs·5 Şub

One of the best places if you have technical background and care about AI going well!

Ben Edelman@EdelmanBen

People sometimes ask me how to leverage a technical background to jump into U.S. AI policy. As of this week my answer is straightforward: apply to join us at CAISI! We're a startup within government, and we're doing a hiring surge.

English

4.7K

Ben Edelman 리트윗함

Dwarkesh Patel@dwarkesh_sp·4 Şub

Seems like a great opportunity for technical talent to come into government and help the USG make sound, technically informed decisions on AI

Samuel Hammond 🦉@hamandcheese

CAISI is hiring for a bunch of exciting new roles, from partnerships to technical experts in AI x bio / chem and more. They're serious about bringing in strong researchers & engineers and letting them do good work. Based in DC or SF: nist.gov/caisi/careers-…

English

143

49.3K

Ben Edelman 리트윗함

Samuel Hammond 🦉@hamandcheese·4 Şub

English

171

84.1K

Ben Edelman@EdelmanBen·4 Şub

My Agent Security team is hiring Research Engineers & Scientists. Other teams are hiring people with strong technical backgrounds too: Frontier Assessment, Cyber, Chem/Bio, Applied Systems, and Partnerships. Job postings are listed here: nist.gov/caisi/careers-…

English

626

Ben Edelman@EdelmanBen·4 Şub

The United States is the center of the AI revolution. We need dedicated public servants to ensure our government is smart on AI issues.

English

5.7K

Ben Edelman@EdelmanBen·4 Şub

English

24K

Ben Edelman@EdelmanBen·30 Oca

At CAISI, we're the U.S. government's leading experts on agent security. We published this RFI so deployers, developers, and experts can provide insights that inform our research and NIST guidelines development. Responses due March 9th!

Peter Cihon@pcihon

CAISI has published an RFI about securing AI agents. It seeks insights from AI agent deployers, developers, and computer security researchers. Questions address the current threat landscape, mitigations, measurements, and other security considerations unique to AI agents.

English

668

Ben Edelman 리트윗함

Peter Cihon@pcihon·26 Ara

CAISI is recruiting an intern to support an agent security standards project. Position closes Jan. 15 for a February start. Please help spread the word. Details in thread:

English

14.4K

Ben Edelman@EdelmanBen·25 Ara

@boazbaraktcs Since I organized this by model family branding (GPT) rather than developer (OpenAI), I think the move would be to add a separate o-series line. And don't get me started about Sonnet vs Opus

English

174

Boaz Barak@boazbaraktcs·25 Ara

@EdelmanBen Where is o1 preview, o1 and o3? Our plot should be much less monotone than this 😂

English

1.7K

Ben Edelman@EdelmanBen·25 Ara

the AI race in one terrible graph

English

Ben Edelman@EdelmanBen·3 Ara

@EhudReiter Note that we discuss in the post how cheating behaviors in the model can be a natural emergent outcome of RL training, without being intentionally trained in! The mechanism is reward hacking. See the background section: nist.gov/caisi/cheating…

English

Ehud Reiter@EhudReiter·3 Ara

Wow. I guess its not surprising that LLM vendors cheat on benchmarks, but very disappointing. And of course makes evaluation much harder if people are actively trying to subvert it and cheat!

Ben Edelman@EdelmanBen

What should AI evaluators do about models cheating on agent evals? In a new write-up from the U.S. Center for AI Standards and Innovation, we characterize cheating, share examples from our logs, and suggest evaluation practices aimed at reducing cheating's incidence and impact.🧵

English

365

탐색

@zicokolter @GraySwanAI @NIST @boazbaraktcs @EhudReiter @elonmusk @BarackObama @taylorswift13