Ian McKenzie
@irobotmckenzie

9 posts
Joined January 2022
78 Following · 283 Followers
Ian McKenzie @irobotmckenzie ·
I'm excited to share the work we did redteaming GPT-5!
FAR.AI@farairesearch

We worked with @OpenAI to test GPT-5 and improve its safeguards. We applaud OpenAI's free sharing of 3rd-party testing and responsiveness to feedback. However, our testing uncovered key limitations with the safeguards and threat modeling, which we hope OpenAI will soon resolve.

Ian McKenzie @irobotmckenzie ·
I now know way more about making chemical weapons than I expected to this morning.
Adam Gleave@ARGleave

My colleague @irobotmckenzie spent six hours red-teaming Claude 4 Opus, and easily bypassed safeguards designed to block WMD development. Claude gave >15 pages of non-redundant instructions for sarin gas, describing all key steps in the manufacturing process.

Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
New paper on the Inverse Scaling Prize! We detail 11 winning tasks & identify 4 causes of inverse scaling. We discuss scaling trends with PaLM/GPT4, including when scaling trends reverse for better & worse, showing that scaling trends can be misleading: arxiv.org/abs/2306.09479 🧵
Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Great podcast from Ian McKenzie (@irobotmckenzie), lead on the Inverse Scaling Prize, explaining the contest + the results from the final round
George Anadiotis@linked_do

The last couple of years have been an #AI model arms race. The assumption is that the larger the model, the better it will perform. But that may not always be the case.
LLM training, scaling laws & the Inverse Scaling Challenge: FAR AI's Ian McKenzie with @ethanjperez youtube.com/watch?v=ppPUzn…

Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Inverse Scaling Prize Update: We got 43 submissions in Round 1 and will award prizes to 4 tasks! These tasks were insightful, diverse, & show approximate inverse scaling on models from @AnthropicAI @OpenAI @MetaAI @DeepMind. Full details at irmckenzie.co.uk/round1, 🧵 on winners:
Ian McKenzie reposted
Ethan Perez @EthanJPerez ·
Some ppl have asked why we’d expect larger language models to do worse on tasks (inverse scaling). We train LMs to imitate internet text, an objective that is often misaligned w human preferences; if the data has issues, LMs will mimic those issues (esp larger ones). Examples: 🧵