Allen Li MD

148 posts

@LiAllenMD

Medical Oncologist | The Vancouver Clinic, Legacy Cancer Institute. Creator of Oncology AI Lab: stress-testing AI to separate the signal from the noise

https://allenlimd.substack.com · Joined June 2020

548 Following · 133 Followers

Pinned Tweet

Allen Li MD@LiAllenMD·5 Apr

Community oncologist. I use AI to appraise clinical trials, then verify every finding against the primary data. The AI gets graded too. One trial, two readers, one bottom line. The Source Report on YouTube (in bio) and Substack. allenlimd.substack.com #onced #oncology #meded

Allen Li MD@LiAllenMD·14h

Hey, this is a great video! Especially the background information and how the self-play works! We did a breakdown of the same paper here: Video short: youtube.com/shorts/cLB8Ckt… Full write-up: substack.com/@allenlimd/not… Would love to compare notes and explore opportunities to collaborate!

Roupen Odabashian@RoupenMD·20h

I am starting a new video series breaking down AI papers in healthcare 🩺 First up: AMIE, Google DeepMind's conversational diagnostic AI that outperformed primary care physicians on 30/32 axes in a randomized OSCE study (Nature, 2025). Huge credit to the authors: @taotu831 @HardyShakerman @apalepu13 @vivnat @alan_karthi @GeminiApp Watch 👇 youtube.com/watch?v=823D3Y…

Allen Li MD@LiAllenMD·4d

@EricTopol @GoogleDeepMind @alan_karthi @RyutaroTanno It is a well-designed study. The title and main text tell part of the story. The supplement gave more insight into the hallucination concern 👇 youtube.com/shorts/cLB8Ckt…

Eric Topol@EricTopol·14 May

A multimodal AI (AMIE) had superior performance compared with 18 physicians for almost every metric (29 of 32 axes). Randomized, but simulated, not real-world clinical practice. @GoogleDeepMind @alan_karthi @RyutaroTanno nature.com/articles/s4159…

Allen Li MD@LiAllenMD·4d

@SciencNews @NatureMedicine The coverage of the Science paper claims AI beats doctors in clinical reasoning. We need to be more critical of what “clinical reasoning” means in this publication. Take a look at how the AI “beats” physicians in this Science paper. youtube.com/shorts/Io9aFmZ…

Science News@SciencNews·14 May

AI had superior performance compared with physicians for almost every metric (29 of 32 axes) @NatureMedicine

Science News@SciencNews

AI is starting to beat doctors at making correct diagnoses @ScienceMagazine

Allen Li MD@LiAllenMD·4d

@SciencNews @NatureMedicine The Nature Medicine paper is a well-designed study. The title and main text tell part of the story. However, the supplement gave more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…

Allen Li MD@LiAllenMD·4d

@NeilFlochMD Agree that AI is already transforming the field, and it is up to us how that transformation goes. This is a well-designed study. The title and main text tell part of the story. However, the supplement gave more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…

Neil Floch MD@NeilFlochMD·15 May

It’s NOT AI or doctors. It’s AI and doctors. We already use technology to improve medicine and surgery. Technology is not a threat to the physician; it makes our jobs better and easier. Bring it on!

Science News@SciencNews

AI had superior performance compared with physicians for almost every metric (29 of 32 axes) @NatureMedicine

Allen Li MD@LiAllenMD·4d

@MSaintjour Great summary! It is a well-designed study. The title and main text tell part of the story. However, the supplement gave more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…

Marc Saint-Jour@MSaintjour·19 May

1/ This week's Vital Signal: Multimodal AMIE beat PCPs in simulated diagnostic visits. Nature Medicine reports multimodal AMIE outperformed primary care physicians in a randomized, blinded exploratory study of 105 simulated telehealth consultations.

Allen Li MD@LiAllenMD·4d

Credit to the authors for including this in the supplement. It is this kind of academic integrity that will move the field of AI in medicine forward.

Allen Li MD@LiAllenMD·4d

In Nature Medicine: Google DeepMind’s multimodal AI read complete heart block as normal sinus rhythm. Internal log: no evidence the image was processed. Main paper Fig 2c: hallucination not significant. The supplement says otherwise.👇 youtube.com/shorts/cLB8Ckt… #AIinMedicine #PatientSafety #ClinicalAI #Cardiology #MedTwitter #AILiteracy @VincentRK @HemOncFellows @OncBrothers @DrArturoAI @montypal @operationdanish @Papa_Heme @EricTopol @DrRishabhOnco @OncoAlert @OncoReporte @Larvol @OncologyBGLab @JavierDavidBen2 @csoncol @Timothee_MD @JCOOP_ASCO @TwoOncDocs @FCademartiri @doctorbhargav

Allen Li MD@LiAllenMD·4d

With today’s FDA approval of Dato-DXd for mTNBC based on TROPION-Breast02, the trial is worth revisiting. It’s a good option for mTNBC. One important point is that the OS benefit is region-dependent: in the US/Canada/Europe subgroup, the HR is actually reversed!👇 youtube.com/shorts/fqJUehE… #OncTwitter #bcsm #datodxd #tropionbreast02

Allen Li MD@LiAllenMD·4d

@KolPulseAI @hoperugo @Dr_RShatsky A good option for mTNBC. One important point is that the OS benefit is region-dependent: in the US/Canada/Europe subgroup, the HR is actually reversed!👇 youtube.com/shorts/fqJUehE…

KOL Pulse AI@KolPulseAI·4d

🎉 FDA APPROVED — Datroway (Dato-DXd) is now a first-line option for metastatic TNBC in patients not eligible for PD-1/PD-L1 therapy, on the strength of TROPION-Breast02 (OS 23.7 vs 18.7 mo · PFS HR 0.57). Hear what the KOLs are saying 👇 @hoperugo · @Dr_RShatsky · @ElisaAgostinett · @OncBrothers kolpulse.com/kol-pulse-tria…

Allen Li MD@LiAllenMD·4d

@FCademartiri That’s a very nuanced read on the paper! Agree that the headline runs ahead of the data. On the concern of image processing and hallucination, the main text and the supplement tell two different stories.👇 youtube.com/shorts/cLB8Ckt…

Dr. Filippo Cademartiri@FCademartiri·4d

🤖🫀 “AI is outperforming doctors.” That’s the headline. This paper is the fine print, and it matters much more.
This Nature Medicine study shows that a multimodal AI system (AMIE):
👉 outperforms primary care physicians
👉 in diagnostic accuracy
👉 in history-taking
👉 even in “empathy”
Impressive. But only if you don’t look too closely.
Let’s unpack what actually happened. This is:
👉 OSCE-style simulations
👉 with patient actors
👉 in text-based chat
👉 with curated multimodal inputs (images, ECGs, documents)
Translation: this is not clinical practice. This is:
👉 a controlled exam environment
And in that environment, AI shines, because the system is designed to:
👉 follow structured phases (history → diagnosis → management)
👉 ask systematic questions
👉 never get tired
👉 never rush
👉 always explain everything
Meanwhile, physicians are… human. In a chat interface:
👉 no physical exam
👉 no visual cues
👉 no workflow support
👉 no time to “optimize” each answer
So what are we really comparing? Not:
❌ AI vs doctors
But:
👉 optimized system vs constrained humans
The key methodological tension
1. Simulation ≠ reality. The paper explicitly states:
👉 not a randomized clinical trial
👉 exploratory study
2. Multimodality is limited. Yes, images + ECG + documents. But:
👉 no real imaging workflows
👉 no acquisition decisions
👉 no uncertainty in data quality beyond controlled scenarios
3. The “empathy” paradox. AI scores higher in empathy. But:
👉 empathy here = text-based ratings
👉 structured, polite, exhaustive responses
Not:
👉 real human interaction under pressure
The most interesting (and under-discussed) point
The system performs better because:
👉 it enforces a structure that clinicians often don’t follow consistently
Which leads to a slightly uncomfortable thought:
👉 Is AI better… or just more systematically organized?
My take
This is a very strong engineering paper. But it is not yet a clinical superiority paper.
What it really proves
👉 multimodal reasoning improves diagnostic dialogue
👉 structured workflows improve performance
👉 AI can operationalize both at scale
What it does NOT prove
❌ that AI performs better in real patients
❌ that it improves outcomes
❌ that it can handle true clinical complexity
Bottom line
AI is not replacing physicians. But it is exposing something very clearly:
👉 consistency beats intuition in controlled environments
⚡ The real opportunity is not “AI vs doctors.” It is:
👉 using AI to make doctors more systematic
#AIinMedicine #DigitalHealth #ClinicalAI #MultimodalAI #Healthcare #CriticalAppraisal

Allen Li MD@LiAllenMD·4d

@EPWaveDoc @drjohnm It is a well-designed study. The title and main text tell part of the story. However, the supplement gave more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…

Andreas Müssigbrodt@EPWaveDoc·1 May

AI beats doctors in diagnosis. AMIE, an LLM-based system, outperformed primary care physicians in a study across 159 typical clinical cases. Future implications for medicine? One for the TWIC podcast, @drjohnm? nature.com/articles/s4158… #Medicine #TWIC #Epeeps #EmergencyRoom

Allen Li MD@LiAllenMD·4d

@ZainKhalpey It is a well-designed study. The title and main text tell part of the story. The supplement gave more insight into the hallucination concern 👇 youtube.com/shorts/cLB8Ckt…

Zain Khalpey, MD, PhD, FACS@ZainKhalpey·15 May

Multimodal AI is reshaping digital healthcare. A recent Nature Medicine study showed AMIE outperforming physicians across most simulated telehealth metrics using ECGs, imaging, and clinical reasoning together. nature.com/articles/s4159… #MedTwitter #AI #Healthcare #DigitalHealth

Allen Li MD@LiAllenMD·4d

@DrMarzouka It is a well-designed study. The title and main text tell part of the story. The supplement gave more insight into the hallucination concern 👇 youtube.com/shorts/cLB8Ckt…

George R. Marzouka@DrMarzouka·15 May

The diagnostic conversation just evolved. New research in Nature Medicine shows multimodal AMIE can now process patient history alongside medical images, ECGs, and clinical documents during telehealth visits — mimicking how experienced clinicians synthesize visual and verbal cues. In 105 simulated consultations, specialist evaluators found the AI excelled across 29 of 32 measures including diagnostic accuracy, conversation quality, and empathy. The breakthrough: a state-aware dialogue framework that dynamically adjusts questioning based on diagnostic uncertainty. What makes this compelling isn't just the breadth of data it handles, but how it structures reasoning like seasoned physicians — knowing when to probe deeper, when to pivot, and how evolving patient responses should guide the next question. This could fundamentally change telehealth access, especially in underserved areas where multimodal diagnostic expertise is scarce. doi.org/10.1038/s41591…

Allen Li MD@LiAllenMD·4d

@Gabe__MD It is a well-designed study. The title and main text tell part of the story. The supplement gave more insight into the hallucination concern 👇 youtube.com/shorts/cLB8Ckt…

Gabe Wilson MD@Gabe__MD·6d

Nature Medicine just published the formal version of something I demonstrated in a crude pilot last week: AI outperforming physicians across multimodal clinical reasoning.
Google's team ran 105 simulated multimodal consultations involving dermatology photographs, ECGs, and clinical documents. 210 total conversations. 18 specialist physicians evaluated blinded. Multimodal AMIE versus board-certified primary care physicians. AMIE outperformed PCPs on 29 of 32 evaluation axes. Diagnostic accuracy higher across top-1 through top-10 differentials, all modalities, P<0.001.
Three findings deserve attention.
First, this was not single-modality performance. AMIE beat PCPs on skin images, ECGs, and clinical documents simultaneously. History-taking, management reasoning, clinical communication. 29 axes is not a narrow win. It is comprehensive.
Second, the hallucination data needs careful reading. The frequency of misreporting between AMIE and PCPs was *not* significantly different. What was significantly different was the *consequence*. When misreporting occurred, it damaged PCP diagnostic accuracy more than AMIE's. AMIE was also rated higher on artifact-grounded reasoning and more robust to low-quality images. Hallucinating at similar rates does not mean equivalent reliability when the downstream diagnostic effect is asymmetric.
Third, the empathy finding. Patient-actors rated AMIE higher on empathy, history-taking, and willingness to return. PCPs lost on the dimension they have historically claimed as uniquely human.
The honest limitations. Exploratory study, not an RCT. Patient-actors, not real patients. Chat interface, no physical examination. Built on Gemini 2.0 Flash, no longer a frontier model. Some underlying datasets were publicly available and may have been encountered during pretraining. Built on a model that is already two generations behind the current frontier.
On a non-frontier model, AI outperformed board-certified physicians across 29 of 32 axes. The current frontier models are materially more capable. This is not a fringe result. It is in Nature Medicine, evaluated by independent specialists, with comprehensive multimodal scenarios and rigorous statistical analysis.

Ryu Tanno | 丹野龍太郎@RyutaroTanno

Very happy (and relieved) to see our work on multimodal conversational medical AI accepted in @NatureMedicine nature.com/articles/s4159… In the published version, we have substantially expanded on the analysis and evaluation. Kudos to @_cjpark @timstro @JanFreyberg @_khaledsaab This work also formed an important precursor for our more recent work, where we explored a similar problem but in real-time interaction: x.com/RyutaroTanno/s… Both modes of UX (synchronous and asynchronous) are useful, but in different ways. Also a nice reminder that prospective evaluation remains important future work.

Allen Li MD@LiAllenMD·5d

@OncBrothers @jamecancerdoc @US_FDA @OncUpdates @JaniceTNBCmets @AbiSivaMD @AMJohnston1315 @ErikaHamilton9 @SPremji7866 @PTarantinoMD @BijoyTelivala @RenoHemonc @dr_yakupergun @Dr_RShatsky A good option for mTNBC. One important point is that the OS benefit is region-dependent: in the US/Canada/Europe subgroup, the HR is actually reversed!👇 youtube.com/shorts/fqJUehE…

Oncology Brothers@OncBrothers·5d

DatoDXd now @US_FDA ✅ in 1L mTNBC in PD-L1 negative/IO ineligible based on TROPION-Breast02
- mPFS 10.8 vs 5.6 mos (HR: 0.57)
- mOS 23.7 vs 18.7 mos (HR: 0.79)
- ORR 62.5% vs 29.3%
- Common AEs: mucositis & ocular AEs
#OncTwitter #bcsm @OncUpdates

Allen Li MD@LiAllenMD·5d

@OncoAlert A few nuances to be aware of for SERENA-6, especially in the interpretation of PFS2 👇 youtube.com/shorts/G0V21HB… youtube.com/shorts/J76RBi2…

OncoAlert@OncoAlert·5d

News from industry: SERENA6 Update in #BreastCancer Source: AstraZeneca buff.ly/AG5XMPO The EU CHMP has recommended approval of AstraZeneca’s camizestrant combined with a CDK4/6 inhibitor for ER-positive, HER2-negative advanced breast cancer with emergent ESR1 mutations after first-line endocrine therapy. Based on SERENA-6, the regimen reduced the risk of disease progression or death by 56%, improving median PFS to 16.0 vs 9.2 months. PFS2 also improved, with overall survival data still maturing. #BreastCancer Ping @matteolambe @aftimosp @E_de_Azambuja @DrSGraff @ErikaHamilton9 @double_whammied @maryam_lustberg @raalbany @hoperugo @stolaney1 @LoiSher @SirohiBhawna @jamecancerdoc @JavierCortesMD @JaniceTNBCmets @Prof_Nadia_H @nataliagandur @acampsmalea @FernandoOnco @ElisaAgostinett @to_be_elizabeth @realbowtiedoc

Allen Li MD@LiAllenMD·6d

@Dr_RShatsky @weoncologists Agree that an abstract of the abstract is not what we need. It’s the nuance beyond the headline that will inform day-to-day practice.

Rebecca Shatsky, MD@Dr_RShatsky·20 May

I am not loving this trend of AI generated infographic explanation of pivotal clinical trial data. Many of them are straight up wrong as they aren’t peer reviewed or even created by an expert in that cancer type. Be CAREFUL when using these to learn about data and look at the source. #bcsm #breastcancer

Allen Li MD@LiAllenMD·19 May

Very thought-provoking! Besides how we measure it, the right question is what standards we should hold AI to versus ourselves. While AI benchmarks have come a long way, there is still a lot of room for refinement. Take a look at the recent Science paper. The benchmark heavily favors verbosity and a shotgun approach, which is not what we do in medicine👇 youtube.com/shorts/Io9aFmZ…

NEJM@NEJM·19 May

Benchmarks that assess when to recognize uncertainty are being expanded, not just for AI agents but for medical trainees. Raja-Elie Abdulnour, MD (@BageLeMage), explains how embedding falsehoods in clinical vignettes can test if either can admit they don't know, a vital skill in patient care. This shared evaluation may raise standards for us all. What strategies help you teach or assess uncertainty in clinical decision making? Watch the full video interview: nej.md/4dMOzru
