Benjamin Hilton

1.3K posts

Benjamin Hilton
@benjamin_hilton

Head of Alignment Team & Loss of Control Mitigations Team at the UK AI Security Institute (AISI). Views my own.

London · Joined October 2011
853 Following · 3.5K Followers
Benjamin Hilton @benjamin_hilton ·
Oops! We accidentally took down this ad 3 hours early. We've re-opened the ad until midnight anywhere on earth Tuesday, in case you were cut off. Please do apply! x.com/benjamin_hilto…
Benjamin Hilton@benjamin_hilton

Come work with me at @AISecurityInst! AI safety has a huge gap: we have great thoughts about how misalignment could occur, but IMO terrible models of what dangerously misaligned AIs would actually do. Many stories just jump from "misaligned" to "godlike ASI does magic." 1/5

1 reply · 3 reposts · 20 likes · 4.1K views
Benjamin Hilton retweeted
Nathan Labenz @labenz ·
The UK AISI might be the most situationally aware government entity in the world today. Today on @CogRev_Podcast, Chief Scientist @geoffreyirving surveys the AI landscape. While jailbreaking frontier models is getting harder, the AISI Red Team has never failed. 👀
7 replies · 24 reposts · 133 likes · 20.8K views
Benjamin Hilton retweeted
AI Security Institute @AISecurityInst ·
As AI systems grow more capable, making sure they behave as intended is a critical technical challenge. Today we announce the first 60 Alignment Project grants, and welcome @OpenAI and @Microsoft to a growing international coalition. Over £27m now supports this work.
12 replies · 13 reposts · 90 likes · 28.1K views
Benjamin Hilton retweeted
Geoffrey Irving @geoffreyirving ·
New complexity theory paper mapping the precise query complexity of debate, given unbounded provers. No new safety ideas: the goal is a self-contained presentation of debate + cross-examination, with the precise complexity class it achieves. 🧵
2 replies · 11 reposts · 36 likes · 2.3K views
Benjamin Hilton retweeted
Geoffrey Irving @geoffreyirving ·
Please apply to work with @benjamin_hilton if you like thinking through misalignment risk modelling in detail!
Benjamin Hilton@benjamin_hilton

Come work with me at @AISecurityInst! AI safety has a huge gap: we have great thoughts about how misalignment could occur, but IMO terrible models of what dangerously misaligned AIs would actually do. Many stories just jump from "misaligned" to "godlike ASI does magic." 1/5

1 reply · 8 reposts · 33 likes · 4.2K views
Benjamin Hilton @benjamin_hilton ·
This is a hard role to fill. I care less about your CV than whether people who read your work come away thinking more clearly. I'd pick a sharp generalist over a domain expert who can't think in new ways – but AI safety & cybersecurity expertise especially welcome. 4/5
3 replies · 0 reposts · 33 likes · 1.8K views
Benjamin Hilton @benjamin_hilton ·
Come work with me at @AISecurityInst! AI safety has a huge gap: we have great thoughts about how misalignment could occur, but IMO terrible models of what dangerously misaligned AIs would actually do. Many stories just jump from "misaligned" to "godlike ASI does magic." 1/5
12 replies · 36 reposts · 231 likes · 32.9K views
Benjamin Hilton retweeted
David @DavidDAfrica ·
Wireheading is sometimes an "optimal" policy. In this paper (at AAAI 2026 Foundations of Agentic Systems Theory), we show that when an agent controls its reward channel, "lying" can strictly dominate "learning," and confirmed this empirically in Llama-3 and Mistral. 🧵👇
1 reply · 2 reposts · 6 likes · 1.5K views
Benjamin Hilton retweeted
Geoffrey Irving @geoffreyirving ·
Lovely blog post version of a talk Scott Aaronson gave at the UK AISI Alignment Conference on theory and AI alignment. Thank you, Scott! scottaaronson.blog/?p=9333
1 reply · 8 reposts · 80 likes · 7.6K views
Benjamin Hilton @benjamin_hilton ·
We’re looking for someone with:
– Quantitative research experience
– Strong Python & experimental design skills
– Comfort with LLMs & transformer theory
Hands-on ML engineering (e.g. finetuning/RL) is not a requirement. 4/5
2 replies · 0 reposts · 0 likes · 832 views
Benjamin Hilton @benjamin_hilton ·
Come work with me!! We're hiring a research scientist for @AISecurityInst 's Propensity project – studying one of the most critical unknowns in AI security: Will future AI systems autonomously choose to cause harm? 1/5
1 reply · 5 reposts · 16 likes · 1.8K views
Benjamin Hilton retweeted
Geoffrey Irving @geoffreyirving ·
AISI ran an Alignment Conference from 29-31 November in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
AI Security Institute@AISecurityInst

Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵

2 replies · 5 reposts · 37 likes · 4.3K views
Benjamin Hilton retweeted
Geoffrey Irving @geoffreyirving ·
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
1 reply · 12 reposts · 53 likes · 8.2K views
Benjamin Hilton retweeted
AI Security Institute @AISecurityInst ·
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
1 reply · 11 reposts · 61 likes · 32K views