
Dr Jack JP O'Sullivan
2K posts

@OSullivan
Doctor 🫀 Freelance consultant 💻 Humanities grad 📜 Pushing frontiers of ed- and health-tech & widening access 🌍 All views my own. Impressum below

📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5,000 realistic health conversations. 🧵

Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562 unique physician-written rubric criteria spanning several health contexts (e.g., emergencies, global health) and behavioral dimensions (e.g., accuracy, instruction following, communication).

Blog, paper, code: openai.com/index/healthbe…

Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. openai.com/index/healthbe…
