Bowen Baker

54 posts

@bobabowen

Research Scientist at @openai since 2017. Robotics, Multi-Agent Reinforcement Learning, LM Reasoning, and now Alignment.

Joined January 2017

114 Following · 3.7K Followers

Pinned Tweet

Bowen Baker @bobabowen · Dec 19

To preserve or improve chain-of-thought (CoT) monitorability, we have to be able to measure it. I'm excited to announce our new research on this at OpenAI

Bowen Baker @bobabowen · Apr 24

As always, I'm incredibly grateful to work with such a talented team that can push important work like this forward❤️

Bowen Baker @bobabowen · Apr 24

Monitoring an agent's CoT can be incredibly useful for detecting misbehavior. In order to preserve its usefulness, we must be able to quantify monitorability. This motivated our work in creating these evaluations and using them to conduct scaling studies openai.com/index/evaluati…

Bowen Baker @bobabowen · Apr 24

Today we open sourced many of OpenAI's monitorability evaluations. We hope that the research community and other model developers can build upon them and use them to evaluate the monitorability of their own models. alignment.openai.com/monitorability…

Bowen Baker @bobabowen · Mar 6

Because there are other possible failure modes, it will still be important to run monitorability evaluations as well (openai.com/index/evaluati…). The more pieces of evidence we can gather that our models are monitorable the better.

Bowen Baker @bobabowen · Mar 6

Models becoming aware they are being monitored and actively controlling their CoT to evade monitoring is one potential failure mode of CoT monitorability. Regularly measuring our models' ability to control their CoT is therefore important if we want to preserve CoT monitorability.

Tomek Korbak @tomekkorbak

We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. This tests whether LLMs can control their CoT, helping to evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.

Bowen Baker retweeted

OpenAI @OpenAI · Mar 5

We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. openai.com/index/reasonin…

Bowen Baker retweeted

Frontier Model Forum @fmf_org · Jan 27

New Publication: Chain of Thought Monitorability

Our latest issue brief explores how Chain of Thought monitoring can help prevent certain types of harm, and why it shows promise as a new layer of defense for frontier AI safety and security: frontiermodelforum.org/issue-briefs/c…

Bowen Baker retweeted

OpenAI @OpenAI · Dec 19

To preserve chain-of-thought (CoT) monitorability, we must be able to measure it. We built a framework + evaluation suite to measure CoT monitorability — 13 evaluations across 24 environments — so that we can actually tell when models verbalize targeted aspects of their internal reasoning. openai.com/index/evaluati…

Bowen Baker @bobabowen · Dec 19

I had a great time working with such an awesome team on this. Looking forward to doing more great research with them. @MelodyGuan @MilesKWang @MicahCarroll @zehao_dou Annie Wei @Marcus_J_W Benjamin Arnav @mia_glaese @merettm

Bowen Baker @bobabowen · Dec 19

Finally, we share some preliminary work on improving monitorability. We ask models follow-up questions to get them to think for more tokens, and oftentimes their extended thinking contains information about their reasoning that the model didn't originally verbalize!

Bowen Baker retweeted

Jasmine Wang @j_asminewang · Dec 1

Today, OpenAI is launching a new Alignment Research blog: a space for publishing more of our work on alignment and safety more frequently, and for a technical audience. alignment.openai.com

Bowen Baker retweeted

OpenAI @OpenAI · Nov 13

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach helps us begin to close that gap. openai.com/index/understa…
