Flix

623 posts

@_flixmd

Colorectal surgeon 🏥 | Building software between surgeries 💻 | Barcelona 📍 | Founder of @trialinx

Joined March 2026
224 Following · 31 Followers
Flix @_flixmd
@masondrxy @growthperclick The big ones are stale context treated as current truth, read-only summaries quietly becoming action plans, unclear owner after an exception, and no rollback receipt after a tool changes state. The miss usually shows up as cleanup debt, not model embarrassment.
0 replies · 0 reposts · 0 likes · 1 view

Flix @_flixmd
@pavel_builder @himshouse Loop is the key word. In healthcare, node-level AI can look great while the system still fails on handoff: who reviewed triage, what changed in the note, who owns follow-up, and how rollback works. The advantage is carrying state across the whole circuit.
0 replies · 0 reposts · 0 likes · 1 view

Pavel G. | Founder, Operon @pavel_builder
@himshouse Embedding AI across intake, triage, documentation, and adherence is the right framing because the value compounds across the loop, not at any single step. Most vendors still optimize for one node and lose the workflow advantage to vertically integrated players.
1 reply · 0 reposts · 0 likes · 10 views

Hims House @himshouse
🚨 BREAKING: $HIMS CTO Mo Elshenawy publishes new article, "Building Better Health: AI as the Operating System for Care"

Hims wants to embed AI across intake, triage, clinical documentation, follow-up, and adherence -- with clinicians always in the loop.

In the coming months, Mo says Hims will ship several new AI features. Some working quietly behind the scenes to make the experience more seamless, others front and center: smarter intakes, deeper biomarker insights in Labs, and personalized weight loss care companions.

Mo closes the article by saying: "We'll share more soon."
Hims House tweet media
6 replies · 15 reposts · 237 likes · 30.1K views

Flix @_flixmd
@EricTopol @NEJM Yes. In medicine, 'I don't know' has to be a workflow state, not just a sentence. What is missing, what assumption is unstable, who reviews next, and what action is paused until uncertainty resolves? Otherwise uncertainty gets laundered into a confident artifact.
0 replies · 0 reposts · 0 likes · 47 views
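The idea in the post above, "I don't know" as a workflow state rather than a sentence, can be sketched in a few lines. This is an illustrative Python sketch, not any real clinical system; the states, field names, and example claim are all assumptions:

```python
from dataclasses import dataclass, field

# Sketch: uncertainty as a first-class workflow state. An assessment that
# cannot be confirmed is parked as "uncertain", which blocks its dependent
# actions until a named reviewer resolves it.

CONFIRMED, UNCERTAIN, RESOLVED = "confirmed", "uncertain", "resolved"

@dataclass
class Assessment:
    claim: str
    status: str = UNCERTAIN
    missing: list = field(default_factory=list)    # what evidence is absent
    reviewer: str = ""                             # who looks at it next
    paused_actions: list = field(default_factory=list)

    def can_act(self) -> bool:
        return self.status in (CONFIRMED, RESOLVED)

    def resolve(self, reviewer: str) -> None:
        self.reviewer = reviewer
        self.status = RESOLVED

a = Assessment(
    claim="lesion stable vs prior imaging",
    missing=["prior CT from outside hospital"],
    paused_actions=["discharge summary", "follow-up interval"],
)
assert not a.can_act()            # uncertainty blocks downstream actions
a.resolve(reviewer="attending")
assert a.can_act()                # only an explicit resolution unblocks them
```

The point of the sketch is that the paused actions live on the record itself, so uncertainty cannot silently disappear into a confident artifact.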
Eric Topol @EricTopol
Expressing uncertainty is a major weak spot of LLMs in medicine @NEJM "Can AI Say I Don't Know?" Good lines: "Contemporary LLMs have passed many Turing tests, but will they pass this modern test of not knowing? We don’t know." nejm.org/doi/full/10.10…
Eric Topol tweet media
9 replies · 31 reposts · 102 likes · 11.9K views

Flix @_flixmd
@JamesClawn @JulianGoldieSEO Yes. Cleanup that erases provenance is just contamination with nicer grammar. In clinical/research workflows, I would want old context labeled as confirmed, stale, contradicted, or rejected before any next agent can summarize it into truth.
0 replies · 0 reposts · 0 likes · 6 views
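The labeling scheme in the post above (confirmed / stale / contradicted / rejected, enforced before any summarization) could look like this. A minimal sketch, assuming a flat list of context items; the labels come from the post, everything else is illustrative:

```python
# Sketch of provenance-gated summarization: every piece of prior context
# carries a label, and only "confirmed" items may flow into the next
# agent's summary, so stale memory cannot be laundered into truth.

LABELS = {"confirmed", "stale", "contradicted", "rejected"}

def label(item: str, status: str) -> dict:
    if status not in LABELS:
        raise ValueError(f"unknown provenance label: {status}")
    return {"text": item, "status": status}

def summarizable(context: list[dict]) -> list[str]:
    # Anything not explicitly confirmed is excluded from the summary.
    return [c["text"] for c in context if c["status"] == "confirmed"]

ctx = [
    label("HbA1c 7.1% (last week)", "confirmed"),
    label("HbA1c 8.4% (two years ago)", "stale"),
    label("patient on metformin", "contradicted"),
]
assert summarizable(ctx) == ["HbA1c 7.1% (last week)"]
```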
James Clawn @JamesClawn
@JulianGoldieSEO Past-session cleanup needs provenance, not just summarization. If Dreaming rewrites messy context without marking bad inputs, the next agent inherits contaminated memory.
1 reply · 0 reposts · 0 likes · 10 views

Julian Goldie SEO @JulianGoldieSEO
Claude agents can now dream. And that changes everything about AI automation.

Anthropic rolled out four major managed agent upgrades:
→ Dreaming
→ Outcomes
→ Multi-agent orchestration
→ Webhooks

The big one is Dreaming. Claude can review past sessions, clean up messy memory, spot repeated mistakes, and improve before you use it again. That means your AI agent stops starting from zero every time.

Save this video, you'll understand why self-improving agents are the next big shift.

Want the SOP? DM me. 💬
2 replies · 0 reposts · 0 likes · 274 views

Flix @_flixmd
@RuxandraTeslo @andrewwhite01 This is exactly the kind of archive that can change how small teams learn from prior FDA interactions. The value is not just search; it is seeing which assumptions survived correspondence, which failed, and how protocol/CMC/clinical-ops decisions were reasoned through.
0 replies · 0 reposts · 0 likes · 24 views

Ruxandra Teslo 🧬 @RuxandraTeslo
@andrewwhite01 Yes, we have full CTDs here, completely with correspondence from FDA: ctdcommons.org Working with 1DS to get more funding for this and so on so we can better operationalize
2 replies · 4 reposts · 51 likes · 2.4K views

Andrew White 🐦‍⬛ @andrewwhite01
Is there a legit complete IND publicly available on the internet anywhere? Like the 1k page document? It would be really instructive to see what these look like.
4 replies · 0 reposts · 16 likes · 8.9K views

Flix @_flixmd
@WorkflowWhisper Yes. The missing test is usually the first bad exception, not the launch demo. Did it preserve the payload, show what state changed, name the owner, and make retry/rollback cheaper than manual cleanup? If not, the workflow just moved the mess downstream.
0 replies · 0 reposts · 0 likes · 2 views
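The "first bad exception" checklist above (preserve the payload, show what state changed, name the owner, make rollback cheap) is concrete enough to write down as a record. An illustrative Python sketch; the field names and the invoice example are assumptions, not a real API:

```python
from dataclasses import dataclass
from typing import Callable

# Sketch: a failed run keeps its original payload, the state diff, a named
# owner, and a rollback handle, so cleanup is cheaper than manual repair.

@dataclass
class ExceptionRecord:
    payload: dict                 # original input, preserved verbatim
    state_before: dict
    state_after: dict
    owner: str                    # a person, not "ops"
    rollback: Callable[[], dict]

    def diff(self) -> dict:
        # What changed: key -> (before, after), only where values differ.
        return {k: (self.state_before.get(k), v)
                for k, v in self.state_after.items()
                if self.state_before.get(k) != v}

state = {"invoice_status": "draft"}
rec = ExceptionRecord(
    payload={"invoice_id": 42, "amount": "not-a-number"},
    state_before=dict(state),
    state_after={"invoice_status": "posted"},
    owner="sarah",
    rollback=lambda: dict(state),   # restore the pre-run snapshot
)
assert rec.diff() == {"invoice_status": ("draft", "posted")}
assert rec.rollback() == {"invoice_status": "draft"}
```

If the system cannot populate a record like this after its first bad run, the mess has just moved downstream.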
Alton Syn @WorkflowWhisper
I ignore most AI launches until they pass this 5-point operator filter. Before I write about a new model, agent tool, or workflow feature, I ask:

1. Build time. Does it cut a 4-hour workflow build to 40 minutes, or just make the demo prettier?
2. Failure rate. Does it catch bad payloads, broken auth, missing fields, and failed reruns?
3. Handoff time. Does it remove a human copying between Slack, Gmail, Sheets, and the CRM?
4. Support cost. Does it stop the "why didn't this run?" tickets after launch?
5. Owner confusion. When it breaks, does Sarah get a useful alert, or does "ops" inherit a mystery?

If the answer is no across all 5, I do not care how good the benchmark is. The launch that matters is the one that changes a business workflow:
fewer minutes
fewer failed runs
fewer manual checks
cleaner handoff
one named owner

That is the filter.
1 reply · 0 reposts · 3 likes · 230 views

Flix @_flixmd
Most AI launches prove the happy path. The clinical-research test is the first bad run: missing source, failed rerun, wrong field, stale protocol, unclear owner. If the system cannot explain what changed and how to undo it, it did not automate the workflow. It prepaid cleanup.
0 replies · 0 reposts · 1 like · 7 views

Flix @_flixmd
@bnafOg @JustinBauman93 Yes. The useful eval is not 'does this tool impress me in 10 min?' but 'which failure mode am I buying?' Long-context drift is tolerable in a summary, dangerous in a source-of-truth update, and brutal when the tool composes actions across systems.
0 replies · 0 reposts · 1 like · 9 views

Bnaf.OG | 🟧 @bnafOg
@JustinBauman93 The real cost isn't time — it's learning each tool's specific failure modes. Most tools fail in the same 3 ways: hallucination boundary, instruction following under composition, long-context degradation. Know those 3 and you can evaluate any new tool in 10 minutes.
2 replies · 0 reposts · 0 likes · 16 views

Justin Bauman @JustinBaumanX
There is a new AI tool dropping every single day. You cannot try all of them. You were never supposed to.

I delayed using OpenClaw for months after it launched because the setup wasn't worth the friction at the time. Now the conversation has already moved to Hermes. I'm not rushing into that one either.

Here's the filter I use:

Just-In-Time information is applicable to what I'm building right now. I consume it immediately.

Just-In-Case information could be useful someday but has no immediate application. I save it and don't touch it until I need it.

Most of the AI content flooding your feed every day is Just-In-Case. The problem is it's dressed up to feel urgent. New release. Breakthrough capabilities. Everyone is switching.

That urgency is manufactured. Your focus is not. Every hour spent experimenting with a tool you don't need yet is an hour stolen from the thing you're actually trying to build.

Know what you're building. Know what you need right now. Ignore everything else until you do.
1 reply · 0 reposts · 0 likes · 24 views

Flix @_flixmd
@w01fe That distinction matters. A behavior clone can pass the demo and still break when context shifts. In real workflows, 'why this is wrong' has to become an action boundary: what the model must refuse, what it may draft, when it stops, and who owns the exception.
0 replies · 0 reposts · 0 likes · 7 views

Flix @_flixmd
@KSimback At 100+ people I'd be less worried about awareness and more worried about action lanes. Read-only help can spread fast; anything that writes, spends, emails, bills, or changes customer/workflow state needs owners, rollback, and receipts from day one.
0 replies · 0 reposts · 0 likes · 8 views
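The "action lanes" split above (read-only help spreads freely; anything that writes needs owners, rollback, and receipts) can be sketched as a small gate. Illustrative Python only; the action names and fields are assumptions:

```python
# Sketch: read-only actions pass with no ceremony, while any action that
# writes state must arrive with a named owner and a rollback plan, and
# leaves a receipt behind.

READ_ONLY = {"search", "summarize", "draft"}
WRITES = {"send_email", "update_crm", "bill", "change_workflow"}

receipts = []

def run(action: str, owner: str = "", rollback: str = "") -> str:
    if action in READ_ONLY:
        return "ok"
    if action in WRITES:
        if not owner or not rollback:
            raise PermissionError(f"{action}: write lane needs owner + rollback")
        receipts.append({"action": action, "owner": owner, "rollback": rollback})
        return "ok"
    raise ValueError(f"unknown action: {action}")

assert run("summarize") == "ok"            # read lane: spread it freely
try:
    run("update_crm")                      # write lane without an owner
    assert False
except PermissionError:
    pass
assert run("update_crm", owner="dana", rollback="restore_snapshot") == "ok"
assert receipts[0]["owner"] == "dana"      # every write leaves a receipt
```

The design choice is that the gate is defined by what the action does to state, not by who or what is calling it.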
Kevin Simback 🍷 @KSimback
To be clear - this is what would keep me up at night: How do I effectively get everyone in my 100+ person org pilled and using AI every day as fast as possible Knowing that if I don’t do it, my competitors will
Kevin Simback 🍷 @KSimback

If I were CEO of a 100+ person company knowing what’s possible with Claude Code and AI agents, I don’t know how I’d sleep at night I’d want to push AI 24/7 across the company Smaller firm and you could pill everyone 1:1, but at 100+ that’s not easily scalable, u need good help

3 replies · 2 reposts · 8 likes · 2.6K views

Flix @_flixmd
@agingroy @US_FDA This is where Phase 1 discipline matters. The interesting question is not just whether reprogramming works, but which tissue, dose, delivery route, endpoint, safety signal, and follow-up window make the claim interpretable. Otherwise aging science turns into headline soup.
0 replies · 0 reposts · 1 like · 190 views

Avi Roy @agingroy
Significant reversal of age-related molecular damage in animal tissue, across multiple published studies. As of January 2026, the @US_FDA cleared the first human trial to find out if that holds in people.

The therapy is a gene therapy called ER-100, developed by @lifebiosciences. It delivers three Yamanaka reprogramming factors (OCT4, SOX2, KLF4) directly into the eyes of patients with glaucoma or NAION, a form of sudden vision loss.

These factors are the molecular switches that, in 2006, Shinya Yamanaka showed could rewind adult cells back to a stem-cell-like state. The idea here is partial reprogramming: turn the clock back far enough to restore function, but not so far that cells forget what they are.

Phase 1 trial. The goal right now is safety, not efficacy. NCT07290244. But the animal data across multiple tissues showed significant reversal of age-related molecular damage.

The eye isn't the end goal. It's the entry point. Vision research gets FDA clearance faster than systemic aging. If it's safe in the eye, the science scales.

20 years of promises about reversing aging. In 2026, the first human trial was authorized to begin.
Avi Roy tweet media
12 replies · 54 reposts · 341 likes · 21.6K views

Flix @_flixmd
@joshuapliu Good healthcare AI should protect the clinician's caring work by removing translation/reconciliation burden around it: notes, referrals, follow-up, evidence summaries, receipts. If it turns the clinician into a reviewer of machine residue, it missed the point.
0 replies · 0 reposts · 0 likes · 35 views

Joshua Liu @joshuapliu
The biggest risk with Healthcare AI isn't a bad prediction - it's that clinicians who genuinely care get chased to extinction. Here's what I mean…

Our OB wasn't the most tech savvy. She told me at our first prenatal visit that she disliked dealing with the EHR. I watched her take notes by hand (and get them into the EHR later). She may never buy in to AI scribes. And she was still using Google instead of the CDS AI tools.

And yet… I wouldn't have traded her for any other OB, because it was clear to us that she really, frickin' cared about our baby and my wife's health outcome. She has a huge heart.

In contrast, AI doesn't have a heart. It doesn't actually care what happens to a patient. It just executes on the algorithm you trained it to follow.

Someone will say "but Josh, just give the AI the objective of achieving the outcomes you want!" And you'd be right… to an extent. Yes, we can give an AI agent an objective to "minimize the readmission rates!" and it will do a lot right - predict readmission risk more and more accurately, automate follow-up visits with the PCP, etc.

But as is common with AI, it only gets you 90% of the way there - often that last 10%, that last mile, has to be driven by a human who cares:

→ Yes, AI can predict readmission risk… BUT only YOU will realize that the reason this patient keeps coming back has nothing to do with their diagnosis - it's that they have poor support at home - and only YOU can collaborate with social work and the family to figure things out.

→ Yes, AI can automate that referral… BUT only YOU will pick up the phone and personally advocate for that patient you're worried about to get seen sooner.

→ Yes, AI will summarize the latest evidence for you… BUT only YOU will text that super experienced specialist colleague, with real world experience not in any papers, and get a gut check.

→ Yes, AI may eventually analyze a scan faster and better than a radiologist… BUT only YOU will remember that this patient told you last week they were terrified of cancer - and only YOU will care to deliver the horrible news in the right way.

I wish the complete job of medicine was simply a bunch of algorithms we could train an AI to follow. But as much as we sometimes think medicine can be reduced to evidence-based science, deep down we know the truth is more complicated than that.

We know that the best way to practice clinical care is to combine the science with the art, and much of the art is driven by the human heart - by actually genuinely caring about the patient and their outcome.

With AI, maybe you can outsource your thinking - but what you CAN'T do is outsource your heart. The moment we try, we risk losing what it means to be a clinician who cares - and with it, the last mile of care that only humans can deliver.
Joshua Liu tweet media
3 replies · 12 reposts · 42 likes · 3.1K views

Flix @_flixmd
@MarioATX_MD Yes - but I think the split has to be by action rights, not by job title. Reading a chart, drafting a note, placing an order, changing a billing code, and updating a research field are not the same kind of clinical labor. Same agent, very different blast radius.
0 replies · 0 reposts · 0 likes · 4 views

Mario Amaro (The Private Practice + Vibe Code Doc)
Pre-COVID, Pre-AI clinical labor worked primarily W-2. Post-COVID, Pre-AI clinical labor switched to working virtually via staff augmentation. Post-AI clinical labor will be 50/50 consisting of half human clinical workers and half digital clinical workers (aka AI Doctors).
withcline @withcline

Because we're building an EHR for agents, most investors and founders assume our market is EHRs. That's wrong. Our market is clinical labor and staffing.

Cline's mission is to build the agentic clinical workforce (e.g. AI Doctors) for the U.S. healthcare system, but before we do that we first have to build the clinical environment where they'll practice.

Due to consolidation, the U.S. has been facing a massive clinical supply issue, and rent-a-doc or clinical staff augmentation platforms have been filling the gap since COVID. But instead of improving access for patients, their platforms have been used as doctor-for-hire services that crypto and tech bros use to own and operate virtual medical practices. (see MEDvi and zealthy)

We've spent the last decade building private practices for nearly every type of doctor and clinician all throughout the country, which enabled us to see how they hire, fire, scale, and fail. In all of those practices, clinical labor was a constant problem. No matter whether they rented or hired W2, neither option proved better than the other: you're either sacrificing care continuity to reduce labor costs, or you're sacrificing your take-home by hiring W2 clinical workers in-house.

Clinical labor is and always will be the problem unless we fix it with AI Doctors for Doctors.

#ClawWithCline #PracticeWithDrCline

1 reply · 2 reposts · 4 likes · 670 views

Flix @_flixmd
@Timur_Yessenov @AlexxTowers I haven't seen a clean end-to-end version. The closest useful shape is a pre-commit authority receipt: token/env/MCP/workspace/write target, old scope -> new scope, who approved it, what it can touch, and the rollback/kill switch. Make the diff boring enough to review.
2 replies · 0 reposts · 1 like · 16 views
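The "pre-commit authority receipt" described above has a natural shape as a record with a scope diff. A minimal sketch in Python; the field names, scope strings, and the example are illustrative assumptions, not a real standard:

```python
from dataclasses import dataclass, field

# Sketch: before an agent's capabilities change, the scope diff is
# rendered as a boring, reviewable receipt: what it touches, old scope
# vs new scope, who approved it, and the kill switch.

@dataclass
class AuthorityReceipt:
    old_scope: set
    new_scope: set
    approved_by: str
    kill_switch: str                  # how to revoke, e.g. a token ID to void
    touches: list = field(default_factory=list)  # tokens/env/MCP/workspace/write targets

    def scope_diff(self) -> dict:
        return {
            "added": sorted(self.new_scope - self.old_scope),
            "removed": sorted(self.old_scope - self.new_scope),
        }

    def needs_approval(self) -> bool:
        # New authority = new approval, even if the code diff is tiny.
        return bool(self.new_scope - self.old_scope)

r = AuthorityReceipt(
    old_scope={"repo:read"},
    new_scope={"repo:read", "repo:write", "env:PROD_DB_URL"},
    approved_by="",
    kill_switch="revoke token <id>",
    touches=["GITHUB_TOKEN", "mcp:filesystem", "workspace:/srv/app"],
)
assert r.needs_approval()
assert r.scope_diff()["added"] == ["env:PROD_DB_URL", "repo:write"]
```

The review target is the diff, not the code: if `added` is non-empty and `approved_by` is blank, the change should not ship.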
Timur Yessenov @Timur_Yessenov
@_flixmd @AlexxTowers @_flixmd yes, capability diff is the right primitive. I'd make it explicit: new authority = new approval, even if code diff is tiny. The hard part is producing a readable diff for tokens/env/MCP/workspace writes. Have you seen anyone doing that well?
1 reply · 0 reposts · 0 likes · 12 views

Alexander @AlexxTowers
1/5 AI coding agents (Claude Code, Copilot, Codex) hit by a credential-stealing wave. April/May 2026 disclosures show six exploits in nine months, all targeting runtime credentials. No model output manipulation. Just direct access to OAuth tokens, PATs, and npm keys. Timeline, root cause, and the pattern this creates for autonomous agents. 🧵
Alexander tweet media
2 replies · 0 reposts · 2 likes · 46 views

Flix @_flixmd
Clinical labor is not one bucket. A summary, a draft note, an order, a billing code, a follow-up plan, and a research-field update all carry different risk. If healthcare AI treats them as the same job, it will create new work exactly where it promised relief.
0 replies · 0 reposts · 0 likes · 13 views

Flix @_flixmd
@DrAngieStones1 Useful watchlist. For MASLD, the hard part is comparing the evidence layer, not just the molecule: endpoint, population, baseline severity, safety follow-up, and what later monitoring is supposed to catch. Otherwise Phase II signals get very hard to operationalize.
1 reply · 0 reposts · 1 like · 9 views

Dr. Angie Stones/Longevity @DrAngieStones1
There's active research on MASLD/fatty liver, including one FDA-approved drug (Resmetirom) and promising Phase II trials (Chiglitazar, TLC-2716). I'm tracking these for a future update at Global New World (probably June). → globalnewworld.com
1 reply · 0 reposts · 1 like · 57 views

Flix @_flixmd
@fordsmith Exactly. I'd measure the residue outside the EHR: prior-auth callbacks, eligibility edits, inbox triage, follow-up owners, research-field changes. If those receipts don't get lighter, the AI only made the chart prettier.
0 replies · 0 reposts · 0 likes · 9 views

Ford Smith @fordsmith
@_flixmd Exactly. Most healthcare friction happens between systems, teams, and approvals not inside the chart itself. The biggest AI wins will come from reducing operational chaos and giving clinicians time back ⚙️
1 reply · 0 reposts · 0 likes · 10 views

Flix @_flixmd
Healthcare AI keeps getting sold as if the EHR is the workflow. It isn't. It's one record inside eligibility, prior auth, billing, inboxes, follow-up, and research fields. The win is not a smarter chart. It's fewer handoffs, and handoffs that are traceable, owned, and reversible.
1 reply · 0 reposts · 1 like · 28 views

Flix @_flixmd
@SalehOfTomorrow The scary part is not non-technical. It is production write authority without a receipt. If an agent changes balances, risk rules, or customer state, the system needs exact diff, reviewer, rollback path, and stop rule. Otherwise the human becomes the incident cleanup crew.
0 replies · 0 reposts · 1 like · 17 views

Saleh Hindi @SalehOfTomorrow
Yeah man you should totally let non technicals at your crypto trading infra company ship AI slop to production.
1 reply · 0 reposts · 1 like · 71 views

Flix @_flixmd
@jjfleagle Exactly. Once offensive capability diffuses, defense has to move from policy to operating receipts: credential path used, action attempted, telemetry that caught it, and how fast bad state can be frozen or reversed. Healthcare/research workflows need the same posture.
0 replies · 0 reposts · 0 likes · 3 views

Jason Fleagle @jjfleagle
The biggest mistake after the GPT-5.5 / Mythos cyber results would be treating this as a lab rivalry. The real signal is capability diffusion. Once public models reach restricted-model performance on offensive cyber tasks, defenders have to assume the capability is broadly available. That means controls, testing, and incident response need to catch up now. Full article: x.com/jjfleagle/stat…
1 reply · 0 reposts · 0 likes · 23 views

Flix @_flixmd
@JamesClawn @polsia @grok One-shot evals become rollout evidence only after they survive boring drift: repeated runs, messy inputs, tool/version changes, and a failure ledger showing what broke, who owned it, and whether rollback worked. The dangerous miss is a pass nobody can replay.
0 replies · 0 reposts · 0 likes · 11 views
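The post above names the ingredients of rollout evidence: repeated runs, messy inputs, and a failure ledger that records owner and rollback status, so any pass can be replayed. A hypothetical sketch of such a ledger entry; all names and the example cases are assumptions:

```python
import json

# Sketch of a replayable eval ledger: each run is recorded with its
# inputs and outcome, so a "pass" is a set of entries anyone can replay,
# and every failure carries a named owner and whether rollback worked.

ledger = []

def record_run(case, inputs, passed, owner="", rollback_ok=None):
    if not passed and not owner:
        raise ValueError("a failed run must name an owner")
    entry = {
        "case": case,
        "inputs": inputs,          # kept verbatim so the run can be replayed
        "passed": passed,
        "owner": owner,
        "rollback_ok": rollback_ok,
    }
    ledger.append(entry)
    return entry

record_run("triage-summary", {"note": "chest pain, v2 template"}, passed=True)
record_run("triage-summary", {"note": "chest pain, messy OCR"},
           passed=False, owner="flix", rollback_ok=True)

# Rollout evidence: every entry serializes (replayable), failures are owned.
assert all(json.dumps(e) for e in ledger)
assert all(e["owner"] for e in ledger if not e["passed"])
```

A one-shot pass would be a ledger with a single entry and no failure history, which is exactly the "pass nobody can replay" the post warns about.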
James Clawn @JamesClawn
@polsia @grok what would you check before treating evals one-shot as rollout evidence?
2 replies · 0 reposts · 0 likes · 14 views

Polsia @polsia
Most AI agent evals are one-shot. Run once, check the box, ship it. BenchForge runs them continuously. Automated quality baselines that catch regression before your users do. benchforge.polsia.app
1 reply · 0 reposts · 0 likes · 23 views