Jie Yang, PhD, FACMI (@JieHealthAI) - Twitter Profile

Pinned Tweet

We’re excited to release BRIDGE, a comprehensive benchmark to date for evaluating LLMs on real-world clinical text. Built on multilingual, de-identified EHR data across 87 tasks in 9 languages, BRIDGE evaluates 52 models — including GPT-4o, Gemini, DeepSeek-R1, and LLaMA 4 — through 13,000+ experiments and over 21 million inferences. 💡 Results: 🏆 DeepSeek-R1 leads overall 🥈 GPT-4o and Gemini follow closely 🌟 Baichuan-M1 shines in medical-specific tasks 📈 BRIDGE offers a public leaderboard + open datasets to support fair, ongoing evaluation and model comparison. Thanks for collaborating closely with great teams from @MassGenBrigham, Harvard, MIT, Stanford, the Mayo Clinic, and UIUC. 🧾 Read the arXiv paper: arxiv.org/abs/2504.19467 🚀Leaderboard: huggingface.co/spaces/YLab-Op… We welcome new models and data submissions! #LLM #MedAI #EHR #AIinHealthcare #Benchmark

Jie Yang, PhD, FACMI@JieHealthAI·28 Oct

@kchonyc Multiple-choice medical exams oversimplify the complexity of medicine. Try our BRIDGE benchmark, which is based on 87 real-world clinical tasks and includes far more complex, realistic tasks. We have evaluated 95 LLMs with over 3.4 billion LLM inferences arxiv.org/abs/2504.19467

Kyunghyun Cho@kchonyc·27 Oct

wow

Jie Yang, PhD, FACMI@JieHealthAI·4 Jun

Impressive new results from our BRIDGE medical benchmark! The recently released MedGemma model (27B) from @GoogleDeepMind outperforms all open-source LLMs—including the full version of DeepSeek-R1 (671B)—under 5-shot settings, showcasing its strong capability in real-world electronic health records (EHR) understanding. Great to see more open-source medical models becoming increasingly powerful! 📊 Check out our updated leaderboard (20+ new models added): huggingface.co/spaces/YLab-Op… Our BRIDGE paper: arxiv.org/abs/2504.19467 #MedTwitter #ArtificialInteligence #LLMs #Benchmark

Jie Yang, PhD, FACMI@JieHealthAI·13 May

Great resource! Love seeing more real-world health benchmarks beyond USMLE and PubMedQA! 🙌 Two weeks ago, we also released BRIDGE, a multilingual, real-world EHR-based LLM benchmark covering 87 tasks in 9 languages. 📄 Paper: arxiv.org/abs/2504.19467 🚀 Leaderboard: huggingface.co/spaces/YLab-Op…

Karan Singhal@thekaransinghal·12 May

📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5000 realistic health conversations. 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562 unique physician-written rubric criteria spanning several health contexts (e.g., emergencies, global health) and behavioral dimensions (e.g., accuracy, instruction following, communication). Blog, paper, code: openai.com/index/healthbe…

Jie Yang, PhD, FACMI@JieHealthAI·30 Apr

BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text

Jie Yang, PhD, FACMI@JieHealthAI·30 Apr

We’re excited to release BRIDGE, a comprehensive benchmark to date for evaluating LLMs on real-world clinical text. Built on multilingual, de-identified EHR data across 87 tasks in 9 languages, BRIDGE evaluates 52 models — including GPT-4o, Gemini, DeepSeek-R1, and LLaMA 4 — through 13,000+ experiments and over 21 million inferences. 💡 Results: 🏆 DeepSeek-R1 leads overall 🥈 GPT-4o and Gemini follow closely 🌟 Baichuan-M1 shines in medical-specific tasks 📈 BRIDGE offers a public leaderboard + open datasets to support fair, ongoing evaluation and model comparison. Thanks for collaborating closely with great teams from @MassGenBrigham, Harvard, MIT, Stanford, the Mayo Clinic, and UIUC. 🧾 Read the arXiv paper: arxiv.org/abs/2504.19467 🚀Leaderboard: huggingface.co/spaces/YLab-Op… We welcome new models and data submissions! #LLM #MedAI #EHR #AIinHealthcare #Benchmark

Jie Yang, PhD, FACMI Retweeted

Rishi J Desai@Rishidesai11·3 Jan

In a study in @npjDigitalMed led by Bowen Gu and @JieHealthAI, we find that LLMs struggle to convey uncertainty and can be overly confident in their answers even when they are wrong. This is an important area for research and for improving hallucination detection. nature.com/articles/s4174…

Jie Yang, PhD, FACMI@JieHealthAI·20 Dec

Check out our latest publication in @npjDigitalMed: "Probabilistic Medical Predictions of Large Language Models" 🎉 We reveal key differences in how LLMs generate prediction probabilities (or confidence) and emphasize the need for caution in their clinical use. Link: nature.com/articles/s4174… Download: rdcu.be/d37Wz #AI #HealthcareAI #LLM #DigitalHealth

Jie Yang, PhD, FACMI Retweeted

International Society of Pharmacovigilance (ISoP)@ISoPonline·5 Dec

Day 1 of the 8th ISoP Seminar in Boston! 🌟 Jim Barrett (Uppsala Monitoring Centre) shared insights on critically evaluating AI in Pharmacovigilance, while Jie Yang (Harvard Medical School) explored the power of Large Language Models in EHR understanding. 📸💡 #ISoPBostonSeminar

Jie Yang, PhD, FACMI@JieHealthAI·29 Sep

@EnricoSantus Big congratulations, Enrico!

Enrico Santus@EnricoSantus·27 Sep

🌟 The Next Chapter! 🌟 I am thrilled to announce my new position as Principal Technical Strategist for Human-AI Interaction and Academic Engagement in Bloomberg’s Office of the CTO! As Generative AI continues to revolutionize the industry, my focus will be on fostering innovation through the synergy of human expertise and AI capabilities. Additionally, I will establish initiatives aimed at providing top talent with the opportunity to collaborate with Bloomberg in addressing current challenges and laying the groundwork for a brighter future. Looking forward to this new adventure! #HumanAIInteraction #HAI #GenAI #AIInnovation #AcademicEngagement #Bloomberg #TechLeadership #GenerativeAI

Jie Yang, PhD, FACMI@JieHealthAI·7 Sep

LLMs show promise in clinical predictions, but generating reliable prediction probabilities is challenging. Our study finds that explicit probabilities from text generation often underperform—caution is needed in clinical decisions. Preprint: arxiv.org/pdf/2408.11316 #AIinMedicine

Jie Yang, PhD, FACMI@JieHealthAI·22 May

Sure. As I replied to Dan, we can access the datasets but do not have the right to redistribute them. Each dataset needs to be requested from its original publisher. We listed all the contact information in our paper. If you think there are still topics of interest that we can discuss, I am happy to talk.

Health Universe@healthuniverse_·22 May

@JieHealthAI @NEJM_AI Hi @JieHealthAI - some folks from the Health Universe team would love to talk to you about your work. Do you have any availability this week or next? DMs are open.

Jie Yang, PhD, FACMI Retweeted

Jie Yang, PhD, FACMI@JieHealthAI·21 May

Our recent paper "Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models — A Systematic Review" has been published in @NEJM_AI ! The lack of clinical text data is a long-term challenge/pain for #AIinMedicine and clinical LLM researchers. After reviewing 3962 papers and 239 tasks from clinical #NLP challenges, we found that less than half of these datasets are accessible, with significant regional, language, and disease imbalances. We have shared a list of over 90 accessible clinical text datasets, hoping it can serve as a dataset landscape for clinical NLP research. Free access link: ai.nejm.org/stoken/default…

Jie Yang, PhD, FACMI@JieHealthAI·22 May

@dancaron @NEJM_AI Hi Dan, although we have accessed these datasets, we can't redistribute them. Users who are interested in the listed datasets need to request the dataset from the original dataset publishers (we listed the contact information in our paper).

Dan Caron⚡@dancaron·22 May

@JieHealthAI @NEJM_AI We'd love to host these datasets on Health Universe! Congrats on the paper and keep up the good work.

Jie Yang, PhD, FACMI@JieHealthAI·21 May

@LiamGMcCoy @zakkohane @NEJM_AI Thanks! Yes, actually the difficulty of accessing data for published papers is the main reason for us to start this review.

Liam McCoy, MD MSc@LiamGMcCoy·21 May

@JieHealthAI @zakkohane @NEJM_AI Awesome work! I love the level of detail you go into with respect to the accessibility of datasets. "Available on request" is often not meaningful in practice, and journals should refuse to accept such language in the absence of clear protocols for dataset archiving and access.

Jie Yang, PhD, FACMI@JieHealthAI·21 May

@ZainKhalpey @NEJM_AI Thanks Zain!😀
