Jim Liu (@JimZhiwei) - Twitter Profile

Jim Liu@JimZhiwei·20 May

This is the next era of health care! Every agent should test on this benchmark.

In real healthcare operations, agents must do far more than answer medical questions. They need to read charts, interpret clinical and operational policies, verify coverage, route referrals, draft P2P scripts, and finalize care plans, where a single policy violation can mean a denied claim or a missed patient outcome. @actAVAai @iscreamnearby led and developed CHI-Bench (Clinical Healthcare In-situ Benchmark), the first long-horizon, policy-rich benchmark for AI agents operating across end-to-end U.S. healthcare workflows.

Key highlights:
▶️ High-fidelity simulators for Provider Prior Authorization, Payer Utilization Management, and Population Health Care Management, all exposed as MCP servers over patient, clinician, and insurer records.
🧪 Each trial runs 60–80 agent steps across 4–6 clinical stages, with access to 21 healthcare apps, 200+ MCP tools, and a 1,279-document operations handbook.

Leaderboard results across 30 frontier agents:
• Claude Code + Opus 4.6: 28% pass@1
• Codex + GPT-5.5: 21%
• Utilization review: 41%
• Care management: 32%
• Prior authorization: 29%

Reliability remains a major challenge: no agent exceeds 20% when the same case is repeated three times.


Jim Liu@JimZhiwei·20 May

Amazing

Weiran Yao@iscreamnearby

1/🧵 Can AI agents automate U.S. healthcare workflows end to end, given only clinician and insurer apps plus an operations and medical policy library? Introducing CHI-Bench: 75 long-horizon, realistic healthcare workflows × 30 frontier agents. The best agent solves only 28%. #AIinHealthcare 👇


Jim Liu retweeted

Caiming Xiong@CaimingXiong·13 May

Today, we’re excited to launch Recursive (@recursive_si): an exceptional team across London and San Francisco, building AI systems that can safely improve their own capabilities over time.

Recursive@Recursive_SI

x.com/i/article/2054…

