Gabe Wilson MD

2.7K posts

@Gabe__MD

Emergency Physician - East Texas, home is NYC, Tour Medical Dir - Houston, Dallas Symphonies, Ex-Regional Director Envision, Juilliard-trained Violinist, RE/EVs

New York, USA · Joined January 2016
1.9K Following · 2K Followers
Gabe Wilson MD @Gabe__MD
@Lobo67383079 China will lead the way on this. Progress > regulation. 100%. And not 5-10 years: 12-18 months!!
Lobo @Lobo67383079
@Gabe__MD This can all change in 5 to 10 years. Job specific AI driven robotics, with access to every study in the world can, soon, replace human beings in nearly all of these tasks. China just opened an AI hospital, by the way.
Gabe Wilson MD @Gabe__MD
AI in 2026 cannot palpate an abdomen, intubate a patient, feel a thyroid nodule, test a patellar reflex, reduce a dislocated shoulder, perform a colonoscopy, or deliver a baby. That is not a temporary limitation. It is structural.

When we scored AI capability across seven clinical dimensions for 240 visit reasons in 20 specialties, the physical/procedural dimension averaged 1.5 out of 5. The cognitive dimensions averaged 3.0 to 4.1. No specialty broke 2.0 on procedure. Not one. History-taking averaged 4.1 — approaching specialist level. Patient communication 3.6. Follow-up management 3.5. Documentation, which runs through every workflow component, is arguably where AI already outperforms most physicians in speed and completeness.

The 2.6-point gap between the cognitive ceiling and the procedural wall is not closing with larger language models. Language models do not have hands. Closing that gap requires robotics, haptic sensing, and physical infrastructure at clinical scale — none of which exists beyond narrow research applications.

This matters for how we think about workforce planning. The specialties in Tier 3 of our ranking — Ophthalmology, General Surgery, ENT, Emergency Medicine, Orthopedic Surgery, Anesthesiology — are not there because AI cannot reason about their clinical problems. It can. They are in Tier 3 because the physician's physical presence is the treatment. You cannot automate a knee replacement. You cannot automate airway rescue. The specialties in Tier 1 — Radiology, Internal Medicine, Dermatology, Family Medicine, Endocrinology — are there because their workflows are dominated by cognition, synthesis, and documentation, with physical intervention consuming a smaller share of total effort.

The implication is straightforward. AI's near-term value is not about replacing any specialty. It is about absorbing the cognitive and administrative burden that consumes 40-60% of every physician's workday across every specialty. The procedural work stays human. The paperwork does not have to. Health systems investing in AI as a documentation, intake, and decision-support engine will see returns now. Health systems waiting for AI to replace proceduralists will be waiting a long time.

Post 3 of a series. Post 1: consensus ranking. Post 2: adversarial reconciliation methodology.
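The dimension arithmetic in the post above can be sketched in a few lines. The four averages are the ones quoted in the post; the study's remaining dimensions are omitted, and this is an illustration of the gap calculation, not the study's actual scoring pipeline.

```python
# The 2.6-point gap quoted above: cognitive ceiling (history-taking, 4.1)
# minus the procedural floor (1.5). Only the four averages quoted in the
# post are included; the study's other dimensions are omitted.

from statistics import mean

dimension_avgs = {  # average capability score per dimension, 0-5 scale
    "history_taking": 4.1,
    "patient_communication": 3.6,
    "follow_up_management": 3.5,
    "physical_procedural": 1.5,
}

cognitive = [v for k, v in dimension_avgs.items() if k != "physical_procedural"]
ceiling = max(cognitive)                       # cognitive ceiling: 4.1
floor = dimension_avgs["physical_procedural"]  # procedural wall: 1.5
gap = round(ceiling - floor, 1)                # the 2.6-point gap
print(f"cognitive mean {mean(cognitive):.2f}, ceiling {ceiling}, gap {gap}")
```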
Gabe Wilson MD @Gabe__MD
Part 3
Gabe Wilson MD @Gabe__MD
[quoted post: "AI in 2026 cannot palpate an abdomen…" (full text in Post 3 above)]

Gabe Wilson MD @Gabe__MD
The places where AI could help the most are the places where it works the least. 4.6 billion people lack full essential health service coverage. 1.08 billion adults have uncontrolled hypertension. 445 million have untreated diabetes. 5 billion lack access to safe surgical care. Close to 1 billion are served by health facilities without reliable electricity.

AI in 2026 can take a history in 200 languages, triage red flags, deliver protocol-driven hypertension titration via SMS, coach inhaler technique, and extend community health workers with decision support that would otherwise require a physician. All deployable now. Where telecom exists. Where electricity is reliable. Where at least one clinician can supervise.

That is the paradox. AI's strongest capabilities — intake, triage, longitudinal follow-up, patient communication — map precisely onto the highest-volume global health needs. Hypertension. Diabetes. Preventive care. Chronic respiratory disease. Cognitive, protocol-driven, data-rich problems. Exactly what AI handles best. But the populations with the greatest need face barriers that are not cognitive. They are structural.

2.1 billion people faced financial hardship from health spending in 2022. AI does not solve financing. WHO projects a 10-million healthcare worker shortfall by 2030. AI extends existing workers. It does not create missing ones. Close to 1 billion people are served by facilities without reliable electricity. AI requires power. Only 28% of low-income countries report general availability of WHO-recommended hypertension medicines. AI cannot stock a pharmacy. 5 billion lack safe surgical care. AI cannot operate.

The same pattern from the US specialty analysis holds at global scale. AI compresses cognitive friction. It does not create physical substrate. In the US, the bottleneck is procedural — the gap between what AI can think and what it can touch. Globally, the bottleneck is more fundamental — electricity, medicines, clinics, clinicians.

The highest-leverage global application of AI in medicine is not diagnosis. It is task extension — making every nurse, community health worker, and primary care physician more effective at the chronic disease management that consumes the majority of global health need. That is where AI's cognitive ceiling meets the world's access floor. The technology is ready. The infrastructure is not. And that gap is not a technology problem.

Post 4 of a series. WHO data sourced from UHC, hypertension, diabetes, primary health care, and surgical care fact sheets.
Gabe Wilson MD @Gabe__MD
@npjDigitalMed Any research done on the basis of GPT-4 should be wholly discarded. GPT-5.4-Pro and Thinking, compared to default GPT-4, is like comparing the top 1% of medical specialists in the world to a high school student.
npj Digital Medicine @npjDigitalMed
AI use in medical school is an ongoing conversation. But wrong AI advice is more harmful than correct advice is helpful. In a randomized trial, misleading AI explanations lowered diagnostic accuracy, and students couldn't reliably tell when they were wrong. nature.com/articles/s4174…
Joel Selanikio @jselanikio
Medicare will now cover GLP-1s for weight loss at $50/month. 60M+ beneficiaries. Drugs that prevent heart failure, diabetes, kidney disease. Healthcare demand elimination is now official federal policy. buff.ly/8UdZyI5 #DemandElimination #GLP1
Gabe Wilson MD @Gabe__MD
@wholemars Whether you look at Tesla or Waymo data, it is clear that the safety factor increases nearly 10x when you remove the human from the steering wheel.
Gabe Wilson MD @Gabe__MD
Gabe Wilson MD @Gabe__MD
[quoted post: "AI in 2026 cannot palpate an abdomen…" (full text in Post 3 above)]

Gabe Wilson MD @Gabe__MD
@fermiparasocks AI assist. Yes. The critical thoughts, which I challenge you to find laid out like this elsewhere, are mine. Not sure why anyone cares if something is edited or refined by AI. If that discounts the insights for you, mute or block me as you choose.
Colin Robinson @fermiparasocks
@Gabe__MD “that is not a temporary limitation. it is structural” you wrote this with ai
Gabe Wilson MD @Gabe__MD
It may feel this way, but I'll never find the diabetic foot ulcer in a patient with fever unless I look. Or the cold, pulseless foot in the elderly man with no sensation who can't verbalize what's going on. Or differentiate central from peripheral vertigo without a good neurological exam. Certainly the rigor of the physical exam has declined in general, and that's likely what you're perceiving.
Koda @koda9001
@Gabe__MD Physicians seldom do more than listen to heart and lungs these days. Everything else is labs and radiology.
Gabe Wilson MD @Gabe__MD
@dereckwpaul There is an enormous amount of unmet need in healthcare in the US and globally. I'm modeling this later in this series, linked below. And those who already have access to healthcare will use it even more when visits are $0.75. You're 100% correct.
Gabe Wilson MD @Gabe__MD
[quoted post: "AI in 2026 cannot palpate an abdomen…" (full text in Post 3 above)]

Dereck Paul, MD @dereckwpaul
Jevons' paradox applied to clinical AI — In 1865, William Stanley Jevons noticed that the more efficient steam engines became, the more coal England burned — not less. He observed that increased efficiency does not necessarily suppress demand. In some cases, it unleashes it. In the context of healthcare and AI, this means that as clinical intelligence becomes cheaper and more abundant, we shouldn't expect the system to contract around today's workload. What will happen instead is that clinical AI will allow us to absorb the enormous backlog of unmet healthcare need that has been invisible precisely because we've never truly had the capacity to address it. We should expect more consumption of healthcare, not less, as clinical AI makes access to healthcare abundant.
Gabe Wilson MD @Gabe__MD
Actually, my medical AI scribe only transcribes what was said in the room with the patient. I can certainly check and edit it. But it's a lot more accurate than checkboxes, pull-down tabs, and templates in the EHR. And I used to lack the time and bandwidth to document everything I discussed with the patient; AI now provides it. So if anything, many charts were artificially down-coded before. Yes, there is always abuse either way. Certainly true. Accuracy is paramount. Overall, properly used, medical AI is a plus.
Gabe Wilson MD @Gabe__MD
[quoted post: "AI in 2026 cannot palpate an abdomen…" (full text in Post 3 above)]

National Center for Health Research
Known as upcoding, this can be done by humans, but AI makes it worse, inflating medical bills and increasing healthcare premiums for everyone. Patients deserve better! We shouldn't have to check our bills and complain about charges for care that was never delivered.
Gabe Wilson MD @Gabe__MD
@DeryaTR_ Derya, there are clearly different degrees of scratching the surface!!
Derya Unutmaz, MD @DeryaTR_
@Gabe__MD I’m all in on AI, and even I’m barely scratching the surface! Not only will AI advances not stop, they will continue to accelerate. At this point, I feel sorry for the copers!
Derya Unutmaz, MD @DeryaTR_
There is one fundamental thing that AI critics and “nitpickers” have never understood: AI capabilities advance & improve exponentially, now every few months, soon in weeks. Whatever they criticize today will soon be fixed. Haven’t they learned any lesson from the past 3 years?🧐
Gabe Wilson MD @Gabe__MD
@SimonMahan Close to zero of my friends and colleagues in Texas are aware of the current source of their electricity. It’s been a silent and successful transition.
Simon Mahan @SimonMahan
The world is changing right in front of us and no one knows it. Texas is running its world-class economy on 70% renewables, right now. Gas is there if we need it, but for today, we can save the fuel for another day.
Gabe Wilson MD @Gabe__MD
Alex, I just ran a multi-frontier-model assessment of which areas of medicine can be assisted by AI. Clearly it's the lack of embodied AI that is holding medicine back. Despite that, 40-60% of medical work could be substantially assisted by AI today, saving patients and physicians time and effort.
Gabe Wilson MD @Gabe__MD
[quoted post: "AI in 2026 cannot palpate an abdomen…" (full text in Post 3 above)]

Gabe Wilson MD @Gabe__MD
Post 1 of the series
Gabe Wilson MD@Gabe__MD

Three frontier AI models independently estimated that current AI can assist with 45-77% of clinical workflow across 20 medical specialties — with physician oversight. That is not a claim of autonomous practice. It is a structured capability estimate built from 240 visit reasons, seven scoring dimensions, and three rounds of adversarial reconciliation between GPT 5.4-Pro, Gemini Deep Think, and Grok Heavy.

The ceiling is interesting. The floor is the real story. Radiology ranked highest at 77%. Anesthesiology ranked lowest at 45%, with Emergency Medicine and Orthopedic Surgery close behind at 46%. Even in the most procedurally intense, real-time specialties, nearly half the total clinical workflow is cognitively assistable right now.

Here is the consensus ranking:

Tier 1 — 62-77% AI-assistable: Radiology 77% | Internal Medicine 64% | Dermatology 63% | Family Medicine 62% | Endocrinology 62%

Tier 2 — 50-59%: Cardiology 59% | Psychiatry 58% | Gastroenterology 57% | Pediatrics 56% | Pulmonology 54% | OB-GYN 54% | Neurology 53% | Urology 51% | Oncology 50%

Tier 3 — 45-49%: Ophthalmology 49% | General Surgery 47% | ENT 46% | Emergency Medicine 46% | Orthopedic Surgery 46% | Anesthesiology 45%

The universal bottleneck is physical. Across all 20 specialties, history-taking scored 4.1 out of 5. Physical/procedural work scored 1.5. The near-term role of AI in medicine is not physical replacement. It is cognitive leverage — intake, synthesis, decision support, documentation, patient communication, care coordination.

What surprised me most: the three models agreed more than they disagreed. After reconciliation, 9 of 20 specialties had a spread of 3 points or less. The areas of genuine uncertainty — Urology, Orthopedics, Oncology — are exactly where the boundary between AI-assistable and human-essential is most contested.

Most health systems are not organized to capture this cognitive value. The technology is here. The workflow redesign is not.

Full methodology and dataset available on request. This is the first post in a series.
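The tier structure in the ranking above can be reproduced with a simple bucketing rule. The cutoffs (62% and above for Tier 1, 50% and above for Tier 2) are inferred from the quoted ranges, and only a subset of the 20 specialties is shown; this is a sketch, not the study's methodology.

```python
# Bucketing specialties into the tiers quoted in the post. Cutoffs are
# inferred from the quoted ranges (Tier 1: 62-77%, Tier 2: 50-59%,
# Tier 3: 45-49%); the scores dict is a subset of the full ranking.

scores = {  # percent of clinical workflow estimated AI-assistable (from the post)
    "Radiology": 77, "Internal Medicine": 64, "Dermatology": 63,
    "Family Medicine": 62, "Endocrinology": 62,
    "Cardiology": 59, "Psychiatry": 58, "Oncology": 50,
    "Ophthalmology": 49, "Emergency Medicine": 46, "Anesthesiology": 45,
}

def tier(pct: int) -> int:
    """Map an AI-assistability percentage to its tier."""
    if pct >= 62:
        return 1
    if pct >= 50:
        return 2
    return 3

tiers = {name: tier(pct) for name, pct in scores.items()}
print(tiers["Radiology"], tiers["Cardiology"], tiers["Anesthesiology"])  # 1 2 3
```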

Gabe Wilson MD @Gabe__MD
Post 2 of the series
Gabe Wilson MD@Gabe__MD

The first version of this analysis was wrong. Not wrong in direction. Wrong in calibration. And the way we found that is the most important part.

When three frontier AI models independently scored AI capability across 20 medical specialties, the initial disagreements were enormous. Oncology had a 35.5-point spread. Radiology 16.6. Emergency Medicine 17.6. Gemini scored systematically high. GPT scored systematically low. The raw outputs were not publishable.

So we made them argue. Each outlier model received the other two models' scores and rationales, then had to defend or revise with explicit justification. Three rounds.

Gemini admitted its original calibration was wrong. It had scored what AI could theoretically do, not what it could realistically do under regulatory, liability, and deployment constraints. It dropped Oncology from 71.5% to 48%. Cardiology from 76.5% to 58%. Endocrinology from 78.2% to 62%.

GPT admitted its scoring templates were too blunt. It had compressed heterogeneous visit mixes into worst-case archetypes, anchoring on the hardest 20% of each specialty's workflow. It moved Oncology from 36% up to 48%. Emergency Medicine from 35% to 45%.

They converged on Oncology at 48%. Neither model's original score was right. The reconciled score was better than either one alone. After three rounds, average inter-model spread dropped from 13.8 to 4.0 points — a 71% reduction. Nine specialties landed within 3 points across all three models.

This is adversarial multi-model reconciliation. Independent estimation. Structured disagreement. Iterative convergence. Transparent audit trail.

Three things this reveals that single-model prompting cannot:

1. Calibration bias is real and measurable. Gemini's theoretical framing and GPT's deployment framing are both internally consistent but produce materially different numbers. If you rely on one model without cross-validation, you are getting that model's bias, not ground truth.

2. Forced justification beats forced scoring. When a model has to explain why its number differs from two independent estimates, it either mounts a compelling defense or it revises. Both outcomes generate information the original score did not contain.

3. Persistent disagreement is signal. Urology and Orthopedic Surgery had the widest spreads after three rounds. That tells you something real about the specialty — the cognitive-physical boundary is genuinely contested — not something wrong with the method.

Full methodology, reconciliation prompts, and complete dataset available on request. The master prompt is available to anyone who wants to replicate with different models. This analysis cost less than a single-site clinical study and produced a testable framework that updates as models improve.

Post 2 of a series. Post 1 has the full consensus ranking.
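The convergence metric described above (per-specialty spread across models, averaged per round) can be sketched as follows. The GPT and Gemini Oncology numbers are the round-0 and converged values quoted in the post; Grok's round-0 score is an assumed placeholder, and the models' actual defend-or-revise step is not modeled here.

```python
# Inter-model spread metric: for each specialty, spread = max - min across
# the models' scores; convergence is the drop in average spread per
# reconciliation round. Grok's round-0 Oncology score is an assumed
# placeholder; the GPT/Gemini values are the ones quoted in the post.

def avg_spread(scores_by_model: dict) -> float:
    """Average (max - min) across models, over all scored specialties."""
    specialties = next(iter(scores_by_model.values())).keys()
    spreads = [
        max(m[s] for m in scores_by_model.values())
        - min(m[s] for m in scores_by_model.values())
        for s in specialties
    ]
    return sum(spreads) / len(spreads)

round0 = {  # initial independent estimates (Oncology only, for illustration)
    "gpt": {"Oncology": 36.0},
    "gemini": {"Oncology": 71.5},
    "grok": {"Oncology": 48.0},  # assumed, not quoted in the post
}
round3 = {  # after three rounds of adversarial reconciliation
    "gpt": {"Oncology": 48.0},
    "gemini": {"Oncology": 48.0},
    "grok": {"Oncology": 48.0},
}
print(avg_spread(round0), avg_spread(round3))  # 35.5 0.0
```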
