Ian McKenzie
@irobotmckenzie

9 posts
Joined January 2022
78 Following · 283 Followers
Ian McKenzie @irobotmckenzie ·
I'm excited to share the work we did redteaming GPT-5!
FAR.AI@farairesearch

We worked with @OpenAI to test GPT-5 and improve its safeguards. We applaud OpenAI's free sharing of 3rd-party testing and responsiveness to feedback. However, our testing uncovered key limitations with the safeguards and threat modeling, which we hope OpenAI will soon resolve.

Ian McKenzie @irobotmckenzie ·
I now know way more about making chemical weapons than I expected to this morning.
Adam Gleave@ARGleave

My colleague @irobotmckenzie spent six hours red-teaming Claude 4 Opus, and easily bypassed safeguards designed to block WMD development. Claude gave >15 pages of non-redundant instructions for sarin gas, describing all key steps in the manufacturing process.

Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
New paper on the Inverse Scaling Prize! We detail 11 winning tasks & identify 4 causes of inverse scaling. We discuss scaling trends with PaLM/GPT4, including when scaling trends reverse for better & worse, showing that scaling trends can be misleading: arxiv.org/abs/2306.09479 🧵
Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Great podcast from Ian McKenzie (@irobotmckenzie), lead on the Inverse Scaling Prize, explaining the contest + the results from the final round
George Anadiotis@linked_do

The last couple of years have been an #AI model arms race. The assumption is that the larger the model, the better it will perform. But that may not always be the case.
LLM training, scaling laws & the Inverse Scaling Challenge: FAR AI's Ian McKenzie with @ethanjperez youtube.com/watch?v=ppPUzn…

Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Inverse Scaling Prize Update: We got 43 submissions in Round 1 and will award prizes to 4 tasks! These tasks were insightful, diverse, & show approximate inverse scaling on models from @AnthropicAI @OpenAI @MetaAI @DeepMind. Full details at irmckenzie.co.uk/round1, 🧵 on winners:
Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Some ppl have asked why we’d expect larger language models to do worse on tasks (inverse scaling). We train LMs to imitate internet text, an objective that is often misaligned w human preferences; if the data has issues, LMs will mimic those issues (esp larger ones). Examples: 🧵