Ben Edelman

111 posts

Ben Edelman

Ben Edelman

@EdelmanBen

Agent Security lead @ U.S. Center for AI Standards and Innovation. Prev: science of deep learning PhD @ Harvard

가입일 Mart 2014
55 팔로잉599 팔로워
Ben Edelman
Ben Edelman@EdelmanBen·
@zicokolter At CAISI we started using the phrase "agent hijacking" for prompt injections of agents because it avoids the inevitable confusion about the prompt injection vs jailbreak distinction (not to even mention direct vs indirect), and conveys impact more directly for a lay audience.
English
0
0
1
53
Ben Edelman
Ben Edelman@EdelmanBen·
@zicokolter Yep agreed it's all the same underlying vulnerability; instruction hierarchy-style distinctions (app developer / user / external content) are "just" an abstraction. (I was also involved with the new paper, btw)
English
1
0
1
45
Zico Kolter
Zico Kolter@zicokolter·
As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵
Gray Swan AI@GraySwanAI

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants, 272K attacks, 13 frontier models. Every model proved vulnerable.

English
6
9
64
11.3K
Ben Edelman
Ben Edelman@EdelmanBen·
@zicokolter Fwiw, my understanding is that the original coinage of prompt injection was focused on contexts where the untrusted data comes from an untrusted user. Then Greshake et al. coined IPI to highlight the case where the attacker leverages data likely to be retrieved at inference time.
English
1
0
0
48
Ben Edelman
Ben Edelman@EdelmanBen·
Excited to be part of this initiative. Join our team to advance the frontier of agent security research and standards! usajobs.gov/job/856267900
Director Michael Kratsios@mkratsios47

The future of AI is agentic, and America is leading the way to make it secure and interoperable. A new AI Agent Standards Initiative is launching this week @NIST to drive industry-led standards and open protocols that build trust and advance innovation. nist.gov/news-events/ne…

English
0
0
8
593
Ben Edelman 리트윗함
Tony Wang
Tony Wang@TonyWangIV·
Excited to share @NIST+CAISI’s initial public draft on how to run and report results of automated evals. If you have opinions on evals, we’d love your feedback — help us improve the AI evals ecosystem! Public comments accepted through March 31st via ai800-2@nist.gov. more in🧵
Tony Wang tweet media
English
2
6
28
3.3K
Ben Edelman 리트윗함
Ben Edelman 리트윗함
Samuel Hammond 🦉
Samuel Hammond 🦉@hamandcheese·
CAISI is hiring for a bunch of exciting new roles, from partnerships to technical experts in AI x bio / chem and more. They're serious about bringing in strong researchers & engineers and letting them do good work. Based in DC or SF: nist.gov/caisi/careers-…
English
4
40
171
84.1K
Ben Edelman
Ben Edelman@EdelmanBen·
My Agent Security team is hiring Research Engineers & Scientists. Other teams are hiring people with strong technical backgrounds too: Frontier Assessment, Cyber, Chem/Bio, Applied Systems, and Partnerships. Job postings are listed here: nist.gov/caisi/careers-…
English
0
1
10
626
Ben Edelman
Ben Edelman@EdelmanBen·
The United States is the center of the AI revolution. We need dedicated public servants to ensure our government is smart on AI issues.
English
1
2
9
5.7K
Ben Edelman
Ben Edelman@EdelmanBen·
People sometimes ask me how to leverage a technical background to jump into U.S. AI policy. As of this week my answer is straightforward: apply to join us at CAISI! We're a startup within government, and we're doing a hiring surge.
Ben Edelman tweet media
English
4
23
89
24K
Ben Edelman
Ben Edelman@EdelmanBen·
At CAISI, we're the U.S. government's leading experts on agent security. We published this RFI so deployers, developers, and experts can provide insights that inform our research and NIST guidelines development. Responses due March 9th!
Peter Cihon@pcihon

CAISI has published an RFI about securing AI agents. It seeks insights from AI agent deployers, developers, and computer security researchers. Questions address the current threat landscape, mitigations, measurements, and other security considerations unique to AI agents.

English
1
1
8
668
Ben Edelman 리트윗함
Peter Cihon
Peter Cihon@pcihon·
CAISI is recruiting an intern to support an agent security standards project. Position closes Jan. 15 for a February start. Please help spread the word. Details in thread:
English
2
18
52
14.4K
Ben Edelman
Ben Edelman@EdelmanBen·
@boazbaraktcs Since I organized this by model family branding (GPT) rather than developer (OpenAI), I think the move would be to add a separate o-series line. And don't get me started about Sonnet vs Opus
English
0
0
0
174
Boaz Barak
Boaz Barak@boazbaraktcs·
@EdelmanBen Where is o1 preview, o1 and o3? Our plot should be much less monotone than this 😂
English
1
0
6
1.7K
Ben Edelman
Ben Edelman@EdelmanBen·
the AI race in one terrible graph
Ben Edelman tweet media
English
2
1
27
3K
Ben Edelman
Ben Edelman@EdelmanBen·
@EhudReiter Note that we discuss in the post how cheating behaviors in the model can be a natural emergent outcome of RL training, without being intentionally trained in! The mechanism is reward hacking. See the background section: nist.gov/caisi/cheating…
English
1
0
2
74
Ehud Reiter
Ehud Reiter@EhudReiter·
Wow. I guess its not surprising that LLM vendors cheat on benchmarks, but very disappointing. And of course makes evaluation much harder if people are actively trying to subvert it and cheat!
Ben Edelman@EdelmanBen

What should AI evaluators do about models cheating on agent evals? In a new write-up from the U.S. Center for AI Standards and Innovation, we characterize cheating, share examples from our logs, and suggest evaluation practices aimed at reducing cheating's incidence and impact.🧵

English
1
0
0
365