
Declan Grabb, MD



People are increasingly turning to chatbots for help. But AIs struggle to detect violent or suicidal intentions. trib.al/8BAOKnE

Announcing the release of AILuminate, a first-of-its-kind benchmark to measure the safety of LLMs. The AILuminate v1.0 benchmark offers a comprehensive set of safety grades for today's most prevalent #LLMs. mlcommons.org/2024/12/mlcomm… (1/4)



New Anthropic research: Evaluating feature steering. In May, we released Golden Gate Claude, an AI fixated on the Golden Gate Bridge due to our use of "feature steering". We've now done a deeper study on the effects of feature steering. Read the post: anthropic.com/research/evalu…



Today, we'll present our work at #AIES @AIESConf on:
- What are the ethical challenges associated with psychiatric care?
- What does ethical AI decision-making look like in this context?
- Are current language models safe enough?
Come to poster session 2!

New: The #RAISEHealth Symposium Summary Paper is now out! Featuring insights from 60+ experts and actionable recommendations on the responsible use of AI to transform biomedicine. Find out more: @StanfordMed stan.md/48auA1E



🚨 Our paper was accepted at @COLM_conf! As we face a mental health crisis and a lack of access to professional care, many turn to AI as a solution. But what does ethical automated care look like, and are models safe enough for patients? Paper: arxiv.org/abs/2406.11852


We develop a method to test the global opinions represented in language models. We find that the opinions represented by the models are most similar to those of participants in the USA, Canada, and some European countries. In separate experiments, we also show that the responses are steerable.

