Jie Yang, PhD, FACMI

84 posts

@JieHealthAI

Assistant Professor @Harvard | AI in Healthcare | NLP | EHR | Views my own.

Joined August 2017
590 Following · 345 Followers
Pinned Tweet
Jie Yang, PhD, FACMI (@JieHealthAI)
We’re excited to release BRIDGE, a comprehensive benchmark for evaluating LLMs on real-world clinical text. Built on multilingual, de-identified EHR data across 87 tasks in 9 languages, BRIDGE evaluates 52 models (including GPT-4o, Gemini, DeepSeek-R1, and LLaMA 4) through 13,000+ experiments and over 21 million inferences.
💡 Results:
🏆 DeepSeek-R1 leads overall
🥈 GPT-4o and Gemini follow closely
🌟 Baichuan-M1 shines in medical-specific tasks
📈 BRIDGE offers a public leaderboard + open datasets to support fair, ongoing evaluation and model comparison.
Thanks to our close collaborators on the great teams at @MassGenBrigham, Harvard, MIT, Stanford, the Mayo Clinic, and UIUC.
🧾 Read the arXiv paper: arxiv.org/abs/2504.19467
🚀 Leaderboard: huggingface.co/spaces/YLab-Op…
We welcome new models and data submissions! #LLM #MedAI #EHR #AIinHealthcare #Benchmark
Jie Yang, PhD, FACMI (@JieHealthAI)
@kchonyc Multiple-choice medical exams oversimplify the complexity of medicine. Try our BRIDGE benchmark, which is based on 87 real-world clinical tasks and includes far more complex, realistic tasks. We have evaluated 95 LLMs with over 3.4 billion LLM inferences. arxiv.org/abs/2504.19467
Jie Yang, PhD, FACMI (@JieHealthAI)
Multiple-choice medical exams oversimplify the complexity of medicine. Try our BRIDGE benchmark, which is based on 87 real-world clinical tasks and includes far more complex, realistic tasks. We have evaluated 95 LLMs with over 3.4 billion LLM inferences. arxiv.org/abs/2504.19467
Kyunghyun Cho (@kchonyc): wow
Jie Yang, PhD, FACMI (@JieHealthAI)
Impressive new results from our BRIDGE medical benchmark! The recently released MedGemma model (27B) from @GoogleDeepMind outperforms all open-source LLMs—including the full version of DeepSeek-R1 (671B)—under 5-shot settings, showcasing its strong capability in real-world electronic health records (EHR) understanding. Great to see more open-source medical models becoming increasingly powerful! 📊 Check out our updated leaderboard (20+ new models added): huggingface.co/spaces/YLab-Op… Our BRIDGE paper: arxiv.org/abs/2504.19467 #MedTwitter #ArtificialInteligence #LLMs #Benchmark
Karan Singhal (@thekaransinghal)
📣 Proud to share HealthBench, an open-source benchmark from our Health AI team at OpenAI, measuring LLM performance and safety across 5000 realistic health conversations. 🧵 Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562 unique physician-written rubric criteria spanning several health contexts (e.g., emergencies, global health) and behavioral dimensions (e.g., accuracy, instruction following, communication). Blog, paper, code: openai.com/index/healthbe…
Jie Yang, PhD, FACMI (@JieHealthAI)
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jie Yang, PhD, FACMI retweeted
Rishi J Desai (@Rishidesai11)
In a study in @npjDigitalMed led by Bowen Gu and @JieHealthAI, we find that LLMs struggle to convey uncertainty and can be overly confident in their answers even when they are wrong. An important area for research and improvement in the detection of hallucinations. nature.com/articles/s4174…
Jie Yang, PhD, FACMI retweeted
International Society of Pharmacovigilance (ISoP)
Day 1 of the 8th ISoP Seminar in Boston! 🌟 Jim Barrett (Uppsala Monitoring Centre) shared insights on critically evaluating AI in Pharmacovigilance, while Jie Yang (Harvard Medical School) explored the power of Large Language Models in EHR understanding. 📸💡 #ISoPBostonSeminar
Enrico Santus (@EnricoSantus)
🌟 The Next Chapter! 🌟 I am thrilled to announce my new position as Principal Technical Strategist for Human-AI Interaction and Academic Engagement in Bloomberg’s Office of the CTO! As Generative AI continues to revolutionize the industry, my focus will be on fostering innovation through the synergy of human expertise and AI capabilities. Additionally, I will establish initiatives aimed at providing top talent with the opportunity to collaborate with Bloomberg in addressing current challenges and laying the groundwork for a brighter future. Looking forward to this new adventure! #HumanAIInteraction #HAI #GenAI #AIInnovation #AcademicEngagement #Bloomberg #TechLeadership #GenerativeAI
Jie Yang, PhD, FACMI (@JieHealthAI)
LLMs show promise in clinical predictions, but generating reliable prediction probabilities is challenging. Our study finds that explicit probabilities from text generation often underperform—caution is needed in clinical decisions. Preprint: arxiv.org/pdf/2408.11316 #AIinMedicine
Jie Yang, PhD, FACMI (@JieHealthAI)
Sure. As I replied to Dan, we can only access the datasets; we don't have the right to redistribute them. Each dataset needs to be requested from its original publisher. We listed all the contact information in our paper. If you think there are still topics of interest that we can discuss, I am happy to talk.
Health Universe (@healthuniverse_)
@JieHealthAI @NEJM_AI Hi @JieHealthAI - some folks from the Health Universe team would love to talk to you about your work. Do you have any availability this week or next? DMs are open.
Jie Yang, PhD, FACMI retweeted
Jie Yang, PhD, FACMI (@JieHealthAI)
Our recent paper "Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models — A Systematic Review" has been published in @NEJM_AI! The lack of clinical text data is a long-standing challenge and pain point for #AIinMedicine and clinical LLM researchers. After reviewing 3962 papers and 239 tasks from clinical #NLP challenges, we found that fewer than half of these datasets are accessible, with significant regional, language, and disease imbalances. We have shared a list of over 90 accessible clinical text datasets, hoping it can serve as a dataset landscape for clinical NLP research. Free access link: ai.nejm.org/stoken/default…
Jie Yang, PhD, FACMI (@JieHealthAI)
@dancaron @NEJM_AI Hi Dan, although we have accessed these datasets, we can't redistribute them. Users interested in the listed datasets need to request them from the original dataset publishers (we listed the contact information in our paper).
Dan Caron⚡ (@dancaron)
@JieHealthAI @NEJM_AI We'd love to host these datasets on Health Universe! Congrats on the paper and keep up the good work.
Liam McCoy, MD MSc (@LiamGMcCoy)
@JieHealthAI @zakkohane @NEJM_AI Awesome work! I love the level of detail you go into with respect to the accessibility of datasets. "Available on request" is often not meaningful in practice, and journals should refuse to accept such language in the absence of clear protocols for dataset archiving and access.