

Toju Duke

@TojuDuke
Founder Diverse AI and Bedrock AI | Ex-Google | Author | Responsible AI Advisor | Speaker | Entrepreneur | AI Researcher



I appreciate @Anthropic's honesty in their latest system card, but the content of it does not give me confidence that the company will act responsibly with deployment of advanced AI models:

- They primarily relied on an internal survey to determine whether Opus 4.6 crossed their autonomous AI R&D-4 threshold (and would thus require stronger safeguards to release under their Responsible Scaling Policy). This wasn't even an external survey of an impartial 3rd party, but rather a survey of Anthropic employees.
- When 5/16 internal survey respondents initially gave an assessment that suggested stronger safeguards might be needed for model release, Anthropic followed up with those employees specifically and asked them to "clarify their views." They do not mention any similar follow-up for the other 11/16 respondents. There is no discussion in the system card of how this may create bias in the survey results.
- Their reason for relying on surveys is that their existing AI R&D evals are saturated. Some might argue that AI progress has been so fast that it's understandable they don't have more advanced quantitative evaluations yet, but we can and should hold AI labs to a high bar. Also, other labs do have advanced AI R&D evals that aren't saturated. For example, OpenAI has the OPQA benchmark which measures AI models' ability to solve real internal problems that OpenAI research teams encountered and that took the team more than a day to solve.

I don't think Opus 4.6 is actually at the level of a remote entry-level AI researcher, and I don't think it's dangerous to release. But the point of a Responsible Scaling Policy is to build institutional muscle and good habits before things do become serious. Internal surveys, especially as Anthropic has administered them, are not a responsible substitute for quantitative evaluations.








🔥Thanks to everyone who made it to our #AI + #Energy hackathon last Saturday. We developed equitable and accessible energy AI applications. Special shoutout to @EnergySysCat for kindly sponsoring this event and everyone else involved! More info at diverse-ai.org

Join us for a one-day Hackathon that intersects #AI and #energy, sponsored by @EnergySysCat! We’ll build inclusive and equitable AI solutions for the energy sector. Everyone is welcome regardless of background or skills. 👉🏻 Register here: diverse-ai.org/events/diverse… @TojuDuke
























🌞You're invited to join us for our London Summer Mixer taking place on Thursday, 26th of June from 6.30 - 8.30pm at 58VE. There will be lots of fun activities, networking and food! 👉🏼 Sign up here: lnkd.in/evcnuEef. #ai #aifunevent #london



