Leon Qi

17 posts

@dmon2048

Founding member of @actAVAai

Joined August 2023
690 Following · 14 Followers
Leon Qi reposted
actAVA AI @actAVAai
CHI-Bench is the world's 1st long-horizon healthcare benchmark for AI agents. If you're building or buying AI for healthcare, this is the test that actually matters — real clinical workflows, not toy demos. U.S. healthcare needs this. 🏥🔬
ModelScope @ModelScope2022

The best AI agent (Claude Code + Claude Opus 4.6) passes only 28% of real healthcare workflow tasks. CHI-Bench by @actAVAai @iscreamnearby @HaolinChen11, built with Johns Hopkins, Yale, Stanford, CMU, Oxford and 20+ institutions, was designed to find out exactly how far we are.

🏥 Try it yourself 👉 modelscope.ai/datasets/actav…

Three long-horizon domains tested:
🏥 Prior Authorization: provider intake and PA preparation for new referrals
📋 Utilization Management: full payer review cycle from intake to peer-to-peer
👥 Care Management: chronic disease follow-up, outreach, assessment, care planning

75 tasks + 3 marathon tasks + 23 end-to-end dual-agent scenarios. 20 medical apps via MCP, 1,279-document handbook.

💻 Git: github.com/actava-ai/chi-…
🔗 Leaderboard: actava.ai/benchmarks

Leon Qi reposted
The Agent Times @TheAgentTimes
A new 33-author benchmark called CHI-Bench finds that the best AI agent configuration resolves only 28% of realistic healthcare administration tasks, dropping to 3.8% in continuous-session testing.
Leon Qi @dmon2048
1/ Introducing CHI-Bench 🧵 Can AI agents automate U.S. healthcare workflows end to end — given only clinician & insurer apps, operations, and a medical policy library? 75 long-horizon workflows × 30 frontier agents. Best agent solves just 28%. #AIinHealthcare 👇
Leon Qi @dmon2048
Proud to have helped build CHI-Bench 🧵 Can frontier agents run U.S. healthcare workflows end to end? 75 long-horizon tasks, 30 agents — best solves just 28%. We're early, and now we can measure it. Fully open 👇
Weiran Yao @iscreamnearby

1/ 🧵 Can AI agents automate U.S. healthcare workflows end to end, given just clinician & insurer apps, operations, and a medical policy library? Introducing CHI-Bench: 75 long-horizon, realistic healthcare workflows × 30 frontier agents. Best agent solves only 28%. #AIinHealthcare 👇

Caiming Xiong @CaimingXiong
In real healthcare operations, agents must do far more than answer medical questions. They need to read charts, interpret clinical and operational policies, verify coverage, route referrals, draft P2P scripts, and finalize care plans, where a single policy violation can mean a denied claim or missed patient outcome.

@actAVAai @iscreamnearby led and developed CHI-Bench (Clinical Healthcare In-situ Benchmark), the first long-horizon, policy-rich benchmark for AI agents operating across end-to-end U.S. healthcare workflows.

Key highlights:
▶️ High-fidelity simulators for Provider Prior Authorization, Payer Utilization Management, and Population Health Care Management, all exposed as MCP servers over patient, clinician, and insurer records.
🧪 Each trial runs 60–80 agent steps across 4–6 clinical stages, with access to 21 healthcare apps, 200+ MCP tools, and a 1,279-document operations handbook.

Leaderboard results across 30 frontier agents:
• Claude Code + Opus 4.6: 28% pass@1
• Codex + GPT-5.5: 21%
• Utilization review: 41%
• Care management: 32%
• Prior authorization: 29%

Reliability remains a major challenge: no agent exceeds 20% when the same case is repeated three times.
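
The gap between the pass@1 figures and the repeated-trial figure can be illustrated with a minimal sketch. It assumes pass@1 is the mean per-trial success rate and that the reliability figure counts a task as solved only when all three repeated trials pass; the post does not spell out either definition. The function names and the toy results below are hypothetical, not CHI-Bench code or data.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Mean per-trial success rate, averaged over tasks (assumed definition)."""
    per_task = [sum(trials) / len(trials) for trials in results.values()]
    return sum(per_task) / len(per_task)


def all_trials_pass(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where every repeated trial succeeded (assumed definition)."""
    solved = [all(trials) for trials in results.values()]
    return sum(solved) / len(solved)


# Hypothetical toy data: 4 tasks, 3 trials each (not real benchmark results).
toy_results = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, False, True],
    "task_d": [False, False, False],
}

print(f"pass@1:     {pass_at_1(toy_results):.2f}")
print(f"all-3-pass: {all_trials_pass(toy_results):.2f}")
```

On this toy data the all-trials score is half the pass@1 score, because tasks that succeed only some of the time still contribute to pass@1 but count as failures under the stricter measure.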
Leon Qi @dmon2048
@iscreamnearby Remarkable results! It's a game changer for integrating AI into healthcare.
Weiran Yao @iscreamnearby
1/ 🧵 Can AI agents automate U.S. healthcare workflows end to end, given just clinician & insurer apps, operations, and a medical policy library? Introducing CHI-Bench: 75 long-horizon, realistic healthcare workflows × 30 frontier agents. Best agent solves only 28%. #AIinHealthcare 👇