Sully.ai

87 posts

@sullyai

Autonomous OS for healthcare

San Francisco Bay Area · Joined December 2017
10 Following · 943 Followers
Pinned Tweet
Sully.ai @sullyai
Excited to partner deeply with @nvidia & @baseten
NVIDIA AI @NVIDIAAI

🩺 @sullyai has returned over 30 million minutes to physicians: more time with patients, less on paperwork. @baseten powers this with its optimized inference stack, built on NVIDIA Blackwell, NVFP4, TensorRT-LLM, and NVIDIA Dynamo, to run frontier open models like gpt-oss-120b. The result: a 10x cost reduction and 65% faster responses for workflows like clinical note generation. 🔗 Read the blog: nvda.ws/468smA3

Sully.ai @sullyai
To folks claiming we use @getdelve for compliance and our SOC 2: we moved to @DrataHQ almost 6 months ago. The link to our Drata trust center is in the comments.
Sully.ai @sullyai
Excited to partner deeply with @Speechmatics to launch the world's first Arabic-English speech-to-text model.
Speechmatics @Speechmatics

@sullyai tested the major STT providers on real MENA clinical audio. 👀 Code-switching, dialect-heavy consultations, the conditions generic models fail on. 🏥 Patrick Nguyen, Head of Engineering MENA: ours was the only one that hit the performance thresholds needed for clinical documentation at regional scale.

Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
Healthcare AI cos are solving the wrong problem. Clinicians don't need better diagnoses. They need time back. Here's the $100B insight:
Sully.ai retweeted
Muratcan Koylan @koylanai
We benchmarked NVIDIA’s new Nemotron 3 Super in two modes, Thinking Off and High Thinking, across three medical evaluation sets: MedMCQA, MedCaseReasoning, and MedXpertQA. Thinking Off outperformed High Thinking: 26.4% vs. 25.2% accuracy.

The cost gap was much larger than the accuracy gap. High Thinking increased mean latency from 1.13s to 4.43s and mean completion length from 109 tokens to 1,089 tokens. In our setup, the higher-reasoning mode was much slower and more verbose, without improving aggregate results.

The benchmark-level split was more revealing than the overall average. On MedMCQA, accuracy dropped from 56.6% to 49.1% with High Thinking. On MedCaseReasoning, it also declined, from 24.4% to 20.2%. The only clear gain was on MedXpertQA, where High Thinking improved accuracy from 9.2% to 15.0%. That pattern fits the benchmark design: MedMCQA rewards concise answer selection on constrained multiple-choice questions, while MedXpertQA is harder and more reasoning-intensive, so extra inference budget appears to help more there than on exam-style MCQs.

Across the overlap set, High Thinking improved 166 questions but flipped 182 previously correct answers into incorrect ones, explaining the net regression. Many of these looked like classic overthinking on structured medical multiple-choice items: the non-thinking run selected the correct answer directly, while High Thinking often chose a plausible distractor after longer deliberation.

Our main takeaway: Nemotron Super’s High Thinking mode should not be treated as a universal default. In this experiment, it looked more like a specialized mode for harder expert synthesis than a general-purpose accuracy booster. For structured medical multiple-choice tasks, Thinking Off was both faster and more accurate. For harder expert-level reasoning tasks, especially those closer to MedXpertQA, additional reasoning showed some benefit.
The practical implication is that reasoning depth should likely be routed by task type rather than enabled globally. We used the @baseten Model API for these runs, and we’re grateful for their support from day one. We’re also thankful to @NVIDIAAI for its commitment to open source. As a research team that transitioned fully to open-source models this year, we deeply appreciate this level of openness: open weights, data, and recipes. We also expect this model to be especially strong for orchestration and agent-style tasks, an area we’re excited to explore further.
Bryan Catanzaro @ctnzr

Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/… And yes, Ultra is coming!
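The flip analysis in Koylan's post (answers High Thinking fixed vs. answers it broke on the overlap set) can be sketched in a few lines. This is a minimal illustration, assuming each run is stored as a list of per-question booleans (True = correct) in the same question order; the function name `flip_analysis` and the toy data are hypothetical, not the team's evaluation code or results.

```python
from collections.abc import Sequence


def flip_analysis(correct_off: Sequence[bool], correct_high: Sequence[bool]) -> dict[str, int]:
    """Compare per-question correctness of two runs over the same overlap set.

    improved:  wrong with Thinking Off, right with High Thinking
    regressed: right with Thinking Off, wrong with High Thinking
    net:       improved - regressed (negative means a net regression)
    """
    if len(correct_off) != len(correct_high):
        raise ValueError("both runs must cover the same questions in the same order")
    pairs = list(zip(correct_off, correct_high))
    improved = sum(1 for off, high in pairs if not off and high)
    regressed = sum(1 for off, high in pairs if off and not high)
    return {"improved": improved, "regressed": regressed, "net": improved - regressed}


# Hypothetical toy data for six questions (True = answered correctly).
off = [True, True, False, False, True, True]
high = [True, False, True, False, False, True]
print(flip_analysis(off, high))  # {'improved': 1, 'regressed': 2, 'net': -1}
```

With the counts reported in the post, 166 improved minus 182 regressed gives a net change of -16 questions, consistent with the net regression it describes.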

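Koylan's post suggests routing reasoning depth by task type instead of enabling it globally. A minimal sketch of that idea, assuming the task type is already known (here it is simply the benchmark name) and using the per-benchmark accuracies reported in the post to choose a mode. The mode labels `thinking_off` / `high_thinking` and both helper functions are hypothetical; they are not Baseten or NVIDIA API parameters.

```python
# Per-benchmark accuracy (%) as reported in the post: (Thinking Off, High Thinking).
REPORTED_ACCURACY: dict[str, tuple[float, float]] = {
    "MedMCQA": (56.6, 49.1),
    "MedCaseReasoning": (24.4, 20.2),
    "MedXpertQA": (9.2, 15.0),
}

# Hypothetical mode labels, not real API parameters.
THINKING_OFF = "thinking_off"
HIGH_THINKING = "high_thinking"


def build_routing_table(accuracy: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Pick, per task type, the mode with the higher measured accuracy.

    Ties go to Thinking Off, the faster and cheaper mode in the reported runs.
    """
    return {
        task: HIGH_THINKING if high > off else THINKING_OFF
        for task, (off, high) in accuracy.items()
    }


def route(task_type: str, table: dict[str, str], default: str = THINKING_OFF) -> str:
    """Return the reasoning mode for a task type; unknown types fall back to the default."""
    return table.get(task_type, default)


table = build_routing_table(REPORTED_ACCURACY)
print(route("MedMCQA", table))      # thinking_off
print(route("MedXpertQA", table))   # high_thinking
print(route("triage_note", table))  # thinking_off (hypothetical unknown task type)
```

Defaulting unknown task types to Thinking Off follows the reported latency gap (1.13s vs. 4.43s mean); a production router would still need a real task classifier and its own validation data.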
Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
Healthcare AI will be bigger than legal AI, bigger than coding AI. But most investors still don't see it. Here's the math that changes everything 🧵
Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
Proud of the Sully team for getting here! We were late to the party, but now we’re #1! cc: @amitskumthekar, @koylanai, and the research team
Ahmed Omar. @omar_or_ahmed

We made an early bet on a full team of autonomous agents. We're #1 on speed!

If you walk into any hospital today, you'll find an average of 50-100 software tools in use, with some hospitals holding over 800 SaaS subscriptions! Integrating one solution at a time into a health system is a nightmare. On top of that, getting all those AI tools to talk to each other is nearly impossible.

Before we show our demo, the biggest objections we hear are: how good can each agent really be, how do you run a whole suite of agents, and how can you claim you're better? But we'd rather show than tell: we show people what we built, and their jaws drop. Why does that happen? Because we're driven by UX (user experience). If something makes for a better UX, our research and engineering team figures out how to do it. If physicians don't want to wait for something, they shouldn't have to; if the technology isn't there yet, we'll figure it out.

Thanks to the @sullyai team and our partners for making this happen!

PS: A lot of people ask about accuracy at this speed. For us, this is clinical information, so compromising on quality is out of the question! Read our paper on our clinical accuracy here: arxiv.org/abs/2505.23075

Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
@ns123abc wild. the talent wars in AI are just getting started. the best researchers are leaving big companies to either start their own thing or join vertical AI companies where they can actually ship to production. research for research's sake won't cut it anymore
Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
we built a SNOMED coding judge that actually works. the old one scored bad agents 90% "optimal." good ones? also 90%. completely useless. new semantic judge: 70-point discrimination gap. r ≈ 0.99 with ground truth. here's how we did it 👇
Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
People think this is a Reddit shitpost. I actually think it's a great presentation of our intense culture without advertising it.
Sully.ai retweeted
Ahmed Omar. @omar_or_ahmed
Why single AI models will NEVER be enough for healthcare. A thread on why the future of clinical AI is multi-agent 🧵: