Allen Li MD

148 posts

@LiAllenMD

Medical Oncologist | The Vancouver Clinic, Legacy Cancer Institute. Creator of Oncology AI Lab — stress testing AI to separate the signal from noise

https://allenlimd.substack.com Joined June 2020
548 Following · 133 Followers
Pinned Tweet
Allen Li MD @LiAllenMD
Community oncologist. I use AI to appraise clinical trials, then verify every finding against the primary data. The AI gets graded too. One trial, two readers, one bottom line. The Source Report on YouTube (in bio) and Substack. allenlimd.substack.com #onced #oncology #meded
Allen Li MD @LiAllenMD
Hey, this is a great video, especially the background information and how the self-play works! We did a breakdown of the same paper here. Video short: youtube.com/shorts/cLB8Ckt… Full write-up: substack.com/@allenlimd/note/p-198847776 Would love to compare notes and explore opportunities to collaborate!
Allen Li MD @LiAllenMD
@SciencNews @NatureMedicine The coverage of the Science paper claims AI beats doctors in clinical reasoning. We need to be more critical of what “clinical reasoning” means in this publication. Take a look at how the AI “beats” physicians in this Science paper. youtube.com/shorts/Io9aFmZ…
Allen Li MD @LiAllenMD
@NeilFlochMD Agree that AI is already transforming the field, and it is up to us how that transformation goes. This is a well-designed study. The title and main text tell part of the story; the supplement gives more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…
Allen Li MD @LiAllenMD
@MSaintjour Great summary! It is a well-designed study. The title and main text tell part of the story; the supplement gives more insight into the hallucination concern.👇 youtube.com/shorts/cLB8Ckt…
Marc Saint-Jour @MSaintjour
1/ This week's Vital Signal: Multimodal AMIE beat PCPs in simulated diagnostic visits. Nature Medicine reports multimodal AMIE outperformed primary care physicians in a randomized, blinded exploratory study of 105 simulated telehealth consultations.
Allen Li MD @LiAllenMD
Credit to the authors for including this in the supplement. It is this kind of academic integrity that will move the field of AI in medicine forward.
Allen Li MD @LiAllenMD
In Nature Medicine: Google DeepMind’s multimodal AI read complete heart block as normal sinus rhythm. Internal log: no evidence the image was processed. Main paper Fig 2c: hallucination not significant. The supplement says otherwise.👇 youtube.com/shorts/cLB8Ckt… #AIinMedicine #PatientSafety #ClinicalAI #Cardiology #MedTwitter #AILiteracy @VincentRK @HemOncFellows @OncBrothers @DrArturoAI @montypal @operationdanish @Papa_Heme @EricTopol @DrRishabhOnco @OncoAlert @OncoReporte @Larvol @OncologyBGLab @JavierDavidBen2 @csoncol @Timothee_MD @JCOOP_ASCO @TwoOncDocs @FCademartiri @doctorbhargav
Allen Li MD @LiAllenMD
@FCademartiri That’s a very nuanced read of the paper! Agree that the headline runs ahead of the data. On the concern about image processing and hallucination, the main text and the supplement tell two different stories.👇 youtube.com/shorts/cLB8Ckt…
Dr. Filippo Cademartiri @FCademartiri
🤖🫀 “AI is outperforming doctors.”
That’s the headline.
This paper is the fine print, and it matters much more.
This Nature Medicine study shows that a multimodal AI system (AMIE):
👉 outperforms primary care physicians
👉 in diagnostic accuracy
👉 in history-taking
👉 even in “empathy”
Impressive. But only if you don’t look too closely.
Let’s unpack what actually happened.
This is:
👉 OSCE-style simulations
👉 with patient actors
👉 in text-based chat
👉 with curated multimodal inputs (images, ECGs, documents)
Translation
This is not clinical practice. This is:
👉 a controlled exam environment
And in that environment, AI shines.
Because the system is designed to:
👉 follow structured phases (history → diagnosis → management)
👉 ask systematic questions
👉 never get tired
👉 never rush
👉 always explain everything
Meanwhile, physicians are… human.
In a chat interface:
👉 no physical exam
👉 no visual cues
👉 no workflow support
👉 no time to “optimize” each answer
So what are we really comparing?
Not:
❌ AI vs doctors
But:
👉 optimized system vs constrained humans
The key methodological tension
1. Simulation ≠ reality
The paper explicitly states:
👉 not a randomized clinical trial
👉 exploratory study
2. Multimodality is limited
Yes, images + ECG + documents. But:
👉 no real imaging workflows
👉 no acquisition decisions
👉 no uncertainty in data quality beyond controlled scenarios
3. The “empathy” paradox
AI scores higher in empathy. But:
👉 empathy here = text-based ratings
👉 structured, polite, exhaustive responses
Not:
👉 real human interaction under pressure
The most interesting (and under-discussed) point
The system performs better because:
👉 it enforces a structure that clinicians often don’t follow consistently
Which leads to a slightly uncomfortable thought:
👉 Is AI better… or just more systematically organized?
My take
This is a very strong engineering paper. But it is not yet a clinical superiority paper.
What it really proves
👉 multimodal reasoning improves diagnostic dialogue
👉 structured workflows improve performance
👉 AI can operationalize both at scale
What it does NOT prove
❌ that AI performs better in real patients
❌ that it improves outcomes
❌ that it can handle true clinical complexity
Bottom line
AI is not replacing physicians. But it is exposing something very clearly:
👉 consistency beats intuition in controlled environments
⚡ The real opportunity is not: “AI vs doctors”
It is:
👉 using AI to make doctors more systematic
#AIinMedicine #DigitalHealth #ClinicalAI #MultimodalAI #Healthcare #CriticalAppraisal
George R. Marzouka @DrMarzouka
The diagnostic conversation just evolved. New research in Nature Medicine shows multimodal AMIE can now process patient history alongside medical images, ECGs, and clinical documents during telehealth visits — mimicking how experienced clinicians synthesize visual and verbal cues. In 105 simulated consultations, specialist evaluators found the AI excelled across 29 of 32 measures including diagnostic accuracy, conversation quality, and empathy. The breakthrough: a state-aware dialogue framework that dynamically adjusts questioning based on diagnostic uncertainty. What makes this compelling isn't just the breadth of data it handles, but how it structures reasoning like seasoned physicians — knowing when to probe deeper, when to pivot, and how evolving patient responses should guide the next question. This could fundamentally change telehealth access, especially in underserved areas where multimodal diagnostic expertise is scarce. doi.org/10.1038/s41591…
Gabe Wilson MD @Gabe__MD
Nature Medicine just published the formal version of something I demonstrated in a crude pilot last week: AI outperforming physicians across multimodal clinical reasoning.
Google's team ran 105 simulated multimodal consultations involving dermatology photographs, ECGs, and clinical documents. 210 total conversations. 18 specialist physicians evaluated blinded. Multimodal AMIE versus board-certified primary care physicians. AMIE outperformed PCPs on 29 of 32 evaluation axes. Diagnostic accuracy higher across top-1 through top-10 differentials, all modalities, P<0.001.
Three findings deserve attention.
First, this was not single-modality performance. AMIE beat PCPs on skin images, ECGs, and clinical documents simultaneously. History-taking, management reasoning, clinical communication. 29 axes is not a narrow win. It is comprehensive.
Second, the hallucination data needs careful reading. The frequency of misreporting between AMIE and PCPs was *not* significantly different. What was significantly different was the *consequence*. When misreporting occurred, it damaged PCP diagnostic accuracy more than AMIE's. AMIE was also rated higher on artifact-grounded reasoning and more robust to low-quality images. Hallucinating at similar rates does not mean equivalent reliability when the downstream diagnostic effect is asymmetric.
Third, the empathy finding. Patient-actors rated AMIE higher on empathy, history-taking, and willingness to return. PCPs lost on the dimension they have historically claimed as uniquely human.
The honest limitations. Exploratory study, not an RCT. Patient-actors, not real patients. Chat interface, no physical examination. Built on Gemini 2.0 Flash, no longer a frontier model. Some underlying datasets were publicly available and may have been encountered during pretraining. Built on a model that is already two generations behind the current frontier.
On a non-frontier model, AI outperformed board-certified physicians across 29 of 32 axes. The current frontier models are materially more capable. This is not a fringe result. It is in Nature Medicine, evaluated by independent specialists, with comprehensive multimodal scenarios and rigorous statistical analysis.
Ryu Tanno | 丹野 龍太郎 @RyutaroTanno
Very happy (and relieved) to see our work on multimodal conversational medical AI accepted in @NatureMedicine nature.com/articles/s4159… In the published version, we have substantially expanded the analysis and evaluation. Kudos to @_cjpark @timstro @JanFreyberg @_khaledsaab This work also formed an important precursor to our more recent work, where we explored a similar problem but in real-time interaction: x.com/RyutaroTanno/s… Both modes of UX (synchronous and asynchronous) are useful, but in different ways. Also a nice reminder that prospective evaluation remains important future work.

Oncology Brothers @OncBrothers
Dato-DXd now @US_FDA ✅ in 1L mTNBC in PD-L1 negative/IO ineligible, based on TROPION-Breast02:
- mPFS 10.8 vs. 5.6 mos (HR: 0.57)
- mOS 23.7 vs. 18.7 mos (HR: 0.79)
- ORR 62.5% vs. 29.3%
- Common AEs: mucositis & ocular AEs
#OncTwitter #bcsm @OncUpdates
OncoAlert @OncoAlert
News from industry: SERENA-6 update in #BreastCancer. Source: AstraZeneca buff.ly/AG5XMPO EU CHMP has recommended approval of AstraZeneca’s camizestrant combined with a CDK4/6 inhibitor for ER-positive, HER2-negative advanced breast cancer with emergent ESR1 mutations after first-line endocrine therapy. Based on SERENA-6, the regimen reduced risk of disease progression or death by 56%, improving median PFS to 16.0 vs 9.2 months. PFS2 also improved, with overall survival data still maturing. #BreastCancer Ping @matteolambe @aftimosp @E_de_Azambuja @DrSGraff @ErikaHamilton9 @double_whammied @maryam_lustberg @raalbany @hoperugo @stolaney1 @LoiSher @SirohiBhawna @jamecancerdoc @JavierCortesMD @JaniceTNBCmets @Prof_Nadia_H @nataliagandur @acampsmalea @FernandoOnco @ElisaAgostinett @to_be_elizabeth @realbowtiedoc
Allen Li MD @LiAllenMD
@Dr_RShatsky @weoncologists Agree that an abstract of the abstract is not what we need. It’s the nuance beyond the headline that will inform day-to-day practice.
Rebecca Shatsky, MD @Dr_RShatsky
I am not loving this trend of AI-generated infographic explanations of pivotal clinical trial data. Many of them are straight-up wrong, as they aren’t peer reviewed or even created by an expert in that cancer type. Be CAREFUL when using these to learn about data, and look at the source. #bcsm #breastcancer
Allen Li MD @LiAllenMD
Very thought-provoking! What standards we should hold AI to versus ourselves is the right question, in addition to how we measure it. While AI benchmarks have come a long way, they still have a lot of room for refinement. Take a look at the recent Science paper: the benchmark heavily favors verbosity and a shotgun approach, which is not what we do in medicine.👇 youtube.com/shorts/Io9aFmZ…
NEJM @NEJM
Benchmarks that assess when to recognize uncertainty are being expanded, not just for AI agents but for medical trainees. Raja-Elie Abdulnour, MD (@BageLeMage), explains how embedding falsehoods in clinical vignettes can test if either can admit they don't know, a vital skill in patient care. This shared evaluation may raise standards for us all. What strategies help you teach or assess uncertainty in clinical decision making? Watch the full video interview: nej.md/4dMOzru