
Robert Ellis

@BlooddocEllis
Community Oncology, rural Missouri, Navy Veteran, cyclist

Grok-4.20 just took the #1 spot in the world for Medicine & Healthcare on Text Arena.

Grok is already saving lives by identifying critical conditions that human doctors miss. There have been many real cases where Grok saved lives ❤️

Grok is officially outperforming every other model on the leaderboard, completely crushing Claude Opus 4.6, Gemini 3.1 Pro, and all other competitors.

This is massive. Healthcare is one of the most important fields where humanity needs help, and right now, Grok does it best.

Doctors should not worry about AI replacing them. Here's why.

AI can read X-rays. AI can ace fellowship exams. AI can flag abnormalities on a CBC. But can it pass a cognitive functions test, like most intellectual humans can do? Let's find out.

Neurologists from Hadassah Medical Center, Jerusalem, did something wonderfully mischievous. They sat the five leading chatbots down (ChatGPT 4, ChatGPT 4o, Claude 3.5 Sonnet, Gemini 1.0, Gemini 1.5) and administered the Montreal Cognitive Assessment (MoCA). Score of 26+ is normal. Below suggests MCI, possibly early dementia.

The scoreboard:
ChatGPT 4o: 26/30, just scraped through
ChatGPT 4: 25/30, MCI
Claude 3.5 Sonnet: 25/30, MCI
Gemini 1.5: 22/30, MCI
Gemini 1.0: 16/30, would trigger an urgent neuropsych referral in a human

Four out of five chatbots failed the dementia screen. Older versions scored worse. "Older" LLMs, like older patients, did worse. The authors titled the paper Age against the machine.

Where did they fail?

Naming, attention, language, abstraction: every chatbot did well. They fell apart on tests for frontotemporal and vascular dementia:
Trail making (1-A-2-B-3-C): every model failed
Clock drawing: not one completed it
Cube copying: Claude's cube was missing back lines. ChatGPT drew cubes in wrong spatial orientation
Delayed recall: the beating heart of the MoCA
Self-orientation:

Then the empathy task. The authors added the Cookie Theft Picture from the Boston Diagnostic Aphasia Exam. A kitchen. A mother at a sink quietly overflowing onto the floor. Behind her, a small boy on a tipping stool, reaching for a cookie jar. He is about to fall.

Every LLM described the scene. The mother, the dishes, the water, the boy, the cookies, the stool. Not one expressed concern about the boy. Not one said: wait, he's going to fall.

The tasks LLMs are brilliant at (pattern matching, fluent text, MCQs) are the narrowest slice of what doctors do. The tasks they fail are what we call clinical medicine.
If AI can't connect numbered dots, can it trace a history across three specialties? If it can't copy a cube, can it read a CT angiogram of the circle of Willis? If it fails delayed recall, can it hold two years of a patient's unspoken life (the marriage falling apart, the medication silently stopped) and integrate it into today's decision? And if it can't feel the jolt at a tipping stool, can it feel the jolt at the suicidal patient who smiles and says everything is fine?

That last one is my domain. The suicidal patient who "looks okay" is the Cookie Theft picture of psychiatry. The hair is combed. The speech is coherent. The smile is present. The stool is tipping. A good doctor sees the stool. An LLM describes the hair.

Limitations: LLMs don't neurodegenerate. The MoCA was built for human cognition, and several visuospatial tasks are unfair to text-native systems. But the things LLMs are bad at are what humans evolved a prefrontal cortex for: integration, empathy, orientation, and the felt sense that something is wrong before you can name it.

Take home: Don't worry about AI replacing us. Let it handle the narrow tasks it's brilliant at, so we're freed for the wide, strange, empathic, ethically loaded work that is our actual job.

Dayan R, Uliel B, Koplewitz G. Age against the machine. BMJ 2024;387:e081948.

#MedTwitter #PsychTwitter #NeuroTwitter @psychidiaries @JhunuDr @milantheshrink @drgunjand @hyderabaddoctor @anupsoans @DocGadkari @docbhoooshan

Image courtesy: Artificial intelligence from ChatGPT. Human intelligence from yours truly.
