Khaled Saab

221 posts

Khaled Saab

@_khaledsaab

research @OpenAI, prev: @GoogleDeepMind @StanfordAILab @HazyResearch

California, USA · Joined March 2019
428 Following · 2.9K Followers
Albert Gu (@_albertgu)
Extremely proud of the team @cartesia for launching Sonic 3.5, which sets a new state of the art for TTS. I personally led the technical direction of this model; we built it ground up from first principles, and it contains multiple non-trivial ideas that differ substantially from anything we’ve seen in the literature. It’s been very gratifying to see research bets play out and the strong research team at Cartesia continue to grow!
Artificial Analysis (@ArtificialAnlys)

Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.

Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.

Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209
➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters
➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS

See more details and listen to samples below 🧵

7
19
184
19.8K
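
To give a feel for what the Elo gaps quoted in the Artificial Analysis post might mean, here is a minimal sketch that converts rating differences into expected head-to-head preference rates. It assumes the classic Elo logistic formula with a scale of 400; the post does not say which rating model or scale the Speech Arena actually uses, so the function name, the scale, and the resulting percentages are illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the classic Elo logistic model.

    Assumption: scale=400 is the chess convention, not a documented
    property of the Speech Arena leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings as quoted in the post above.
ratings = {
    "Sonic-3.5": 1218,
    "Gemini 3.1 Flash TTS": 1209,
    "Inworld Realtime TTS 1.5 Max": 1194,
}

leader = "Sonic-3.5"
for name, rating in ratings.items():
    if name == leader:
        continue
    p = elo_expected_score(ratings[leader], rating)
    print(f"{leader} vs {name}: expected preference rate {p:.1%}")
```

Under these assumptions the gaps correspond to expected preference rates only slightly above 50%, and the quoted +16/-16 interval is of the same order as the 9-point gap to Gemini 3.1 Flash TTS.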
Vivek Natarajan (@vivnat)
Out of all the announcements at @Google I/O today, this is the one closest to my heart - our foundational research on Co-Scientist was published in @Nature and we announced its broad availability via @GeminiApp for Science. When you are suffering from a disease, time is everything. As our collaborator and @StanfordMed Professor Dr. Gary Peltz reminds us, there are thousands of diseases out there with zero treatments. There is simply so much left to solve. Our goal with Co-Scientist has been to give scientists superpowers and help them get to these answers faster - compressing the scientific process from months and years down to hours and days. Much like Galileo's telescope helped us look into the stars, Co-Scientist is designed to help us make sense of the vast complexity of biological and scientific data. It is among the first examples of a truly general-purpose multi-agent system for scientific discovery. The core research question behind it was: How can an AI system engage in the rigorous, structured thinking that’s the hallmark of science and scientists? To tackle this, Co-Scientist builds on the principles of self-play and self-improvement underpinning @GoogleDeepMind breakthroughs like AlphaGo, generalizing them to scientific reasoning through self-debates. Since our preprint last year, we have further improved its capabilities and have been validating it in collaborations with scientists across over 100 institutions globally, spanning both academia and industry. And we are thrilled to see the emergence of a new form of AI-human scientist collaboration that's already leading to important new insights, discoveries and peer reviewed publications - from understanding antimicrobial resistance (published in @CellCellPress) to decoding plant immunity, to identifying new treatments for liver fibrosis (Advanced Science), cancer, neurodegenerative diseases like ALS and the grand challenge of aging. 
I have always believed AI's greatest promise is accelerating scientific discovery and advancing human health. My genuine hope for the future is that AI tools like Co-Scientist help democratize science, giving anyone, anywhere the means to pursue their child-like curiosity and change the world. This work was done with stellar team mates spanning @GoogleDeepMind @GoogleResearch, @googlecloud and @GoogleLabs especially Juro Gottweis (@Mysiak ), who is the heart and soul of this effort. Special thanks also to all our wonderful collaborators: Gary Peltz, @CostaT_Lab, @jrpenades, @_e_d_v_ , @iambyronic, @OpsBug, @jgooten, @omarabudayyeh Ritu Raman, Ryan Flynn, Filippo Menolascina, Velia Siciliano, Clare Bryant, Matt Onsum, Katherine Labbé and more. Nature paper link - lnkd.in/e8qBEJFv Google DeepMind blog - lnkd.in/etYeahMy Gemini for Science - labs.google/science.
19
112
522
79.9K
Khaled Saab (@_khaledsaab)
Multimodal AMIE now published @NatureMedicine! This is from past work @GoogleDeepMind where we studied patients uploading images during diagnostic dialogue. We found that a multimodal reasoning harness that tracks a patient’s state greatly improves history taking and clinical accuracy. We also surpassed doctors across many evaluation axes in diverse primary care settings. nature.com/articles/s4159…
5
16
71
1.4M
Khaled Saab reposted
Shiv Rao, MD (@ShivdevRao)
GPT-5.5 early access results have been impressive. 25% lift in clinical quality. 30% less verbose. We orchestrate hundreds of AI tasks, some powered by our proprietary data flywheels, others by frontier models like GPT-5.5. We test rigorously for clinical accuracy, completeness, reasoning, real-world because benchmarking is a big deal in healthcare. Grateful to the OpenAI team for the early access and partnership.
4
11
59
4.4K
Khaled Saab (@_khaledsaab)
Doctors are increasingly using ChatGPT to supercharge their workflows in delivering care. To support this high impact domain, we take steps to make ChatGPT more accessible, accurate, and measurable for clinical work. openai.com/index/making-c…
Karan Singhal (@thekaransinghal)

Today we’re introducing two big steps for health at OpenAI:
- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks

We’re excited about what this can unlock for care. ❤️

0
1
19
1.6K
Khaled Saab reposted
Tibo (@thsottiaux)
Codex at 2M+ active users up 25% week over week... and that was before we launched the app on Windows and GPT-5.4!
91
41
1.6K
80.5K
Khaled Saab reposted
OpenAI (@OpenAI)
Yesterday we reached an agreement with the Department of War for deploying advanced AI systems in classified environments, which we requested they make available to all AI companies. We think our deployment has more guardrails than any previous agreement for classified AI deployments, including Anthropic's. Here's why: openai.com/index/our-agre…
1.8K
586
3.9K
2.6M
Khaled Saab reposted
Boaz Barak (@boazbaraktcs)
There is this narrative that up until this week, Anthropic had this wonderful contract that prevented the U.S. government from doing mass domestic surveillance or autonomous lethal weapons, and now all hell will break loose. As I wrote, I am not a fan of accelerating AI specifically in the national security space. If I had been an Anthropic employee at the time they signed their original deal with the DoW, I would have probably opposed it, especially given the reduced control since they worked through Palantir. And I don't think having some terms of use in the contract is what we can rely on to protect us. I believe the drama of the last week about these terms of use is more about politics than substance. The substance is about the details, more of which I hope will come out soon. But it is wrong to present the OAI contract as if it is the same deal that Anthropic rejected, or even as if it is less protective of the red lines than the deal Anthropic already had in place before. Obviously I don't know all details of what Anthropic had before, but based on what I know, it is quite likely that the contract OAI signed gives *more* guarantees of no usage of models for mass domestic surveillance or autonomous lethal weapons than Anthropic ever had.
Boaz Barak (@boazbaraktcs)

Some thoughts (long tweet, sorry). I would prefer if we focused first on using AI in science, healthcare, education and even just making money, than the military or law enforcement. I am no pacifist, but too many times national security has been used as an excuse to take people's freedoms (see the Patriot Act). I am very worried about governments using AI to spy on their own people and consolidate power. I also think our current AI systems are nowhere near reliable enough to be used in autonomous lethal weapons. I would have preferred to take it slower with classified deployment, but if we are going to do it, it is crucial that we maintain the red lines of no domestic surveillance or autonomous lethal weapons. These are widely held positions, and codified in laws and regulations. They should be stipulated in any agreement, and (more importantly) verified via technical means. I think the terms of this agreement, as I understand them, are in line with these principles, which are also held by other AI companies. I hope the DoW will offer them the same conditions. Regardless, a healthy AI industry is crucial for U.S. leadership. Whether or not relations have soured, there is zero justification to treat Anthropic - a leading American AI company whose founders are deeply patriotic and care very much about U.S. success - worse than the companies of our adversaries. It appears to me that much of this week's drama has been more about style and emotions than about substance. I hope that people can put this behind them, and come together for the benefit of our country.

47
25
239
121.6K
Khaled Saab reposted
Nathan Labenz (@labenz)
230M people use ChatGPT for health & wellness questions every week. 📈 A recent RCT showed that it improves patient outcomes. ⚕️ And soon... it will be FREE for ALL 👏 (with no ads!) - @thekaransinghal, @OpenAI Health Lead, on "Universal Medical Intelligence"
7
5
44
9.4K
Flowers ☾ (@flowersslop)
My friend who studies medicine said ChatGPT isn't too reliable for medical questions yet, so I told her to come up with the most complex case she could think of to make it fail. She spent like 10 minutes writing it... and ChatGPT one-shotted it hahahahahhah
96
59
3.3K
252K
Khaled Saab (@_khaledsaab)
@SRSchmidgall @taotu831 It’s near instant but also a prerequisite that has near term impact in health AI. What are your thoughts on how medical AGI evals should evolve? Cc @taotu831
0
0
0
213
Khaled Saab (@_khaledsaab)
We’re at an inflection point of AGI evaluation where verification gets much harder. We went from multiple choice (instant verification) to research-grade problems (multi-day verification by a few experts). Soon it will be multi-month and multi-year (e.g., tape-out, clinical trials).
Sam Altman (@sama)

We went from AI systems that struggled to do grade school math to AI systems that can solve research-level math problems in just a few years. I agree with Jakub this is perhaps the most important eval now. I am also pretty sure the main reaction will be "it's not that hard" :)

1
0
3
710
Khaled Saab reposted
Jakub Pachocki (@merettm)
Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The problems require expertise in their respective domains and are not easy to verify; based on feedback from experts, we believe at least six solutions (2, 4, 5, 6, 9, 10) have a high chance of being correct, and some further ones look promising. We will only publish the solution attempts after midnight (PT), per the authors' guidance - the sha256 hash of the PDF is d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a . This was a side-sprint executed in a week mostly by querying one of the models we're currently training; as such, the methodology we employed leaves a lot to be desired. We didn't provide proof ideas or mathematical suggestions to the model during this evaluation; for some solutions, we asked the model to expand upon some proofs, per expert feedback. We also manually facilitated a back-and-forth between this model and ChatGPT for verification, formatting and style. For some problems, we present the best of a few attempts according to human judgement. We are looking forward to more controlled evaluations in the next round! 1stproof.org #1stProof
244
348
2.8K
2.6M
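
The hash commitment in Jakub Pachocki's post can be checked once the PDF is released. The following is a minimal sketch of that check using Python's standard `hashlib`; the helper names and the file name in the usage comment are hypothetical, and only the digest string comes from the post.

```python
import hashlib
import hmac
import io

# SHA-256 digest of the solutions PDF, as published in the post above.
PUBLISHED_DIGEST = "d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a"


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def digest_matches(stream, expected_hex: str) -> bool:
    """Compare the stream's digest with a published hex digest."""
    actual = sha256_of_stream(stream)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Sanity check of the helper against the well-known SHA-256 test vector for b"abc".
assert sha256_of_stream(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# Usage (hypothetical file name; substitute the PDF actually released):
# with open("first_proof_solutions.pdf", "rb") as f:
#     print(digest_matches(f, PUBLISHED_DIGEST))
```

A match shows that the released file is byte-for-byte the one committed to before the deadline; it says nothing about whether the solutions themselves are correct.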
Khaled Saab reposted
Vivek Natarajan (@vivnat)
Scientific discovery and clinical medicine are often treated as distinct phases. But for patients with rare, complex, and undiagnosed diseases, this separation is a luxury they cannot afford. The timeline from understanding a genetic mechanism to accessing subspecialist care is often too long and too fragmented. Two new @GoogleDeepMind @GoogleResearch collaborations with @StanfordMed, published in Advanced Science and @NatureMedicine respectively last week, demonstrate how AI can bridge this gap.

1. Accelerating discovery (the science)
In Advanced Science, we present one of the first wet-lab validated examples of AI-assisted genetic discovery. Our AI identified a novel genetic factor for hearing loss (Crym) in mice, which Dr Gary Peltz and team validated using CRISPR knock-in experiments to restore the wild-type gene and rescue the phenotype. We applied this agentic AI scaffold to human patients with complex, undiagnosed conditions in a retrospective manner. The system analyzed genomic data for rare diseases, such as IRAK4 deficiency and ODC1 mutations, successfully identifying causative variants that matched expert clinical assessments.

2. Scaling expertise (the medicine)
Discovery is only the first step; patients then need access to specialized care. As we note in our Nature Medicine paper, hypertrophic cardiomyopathy (HCM) is a leading cause of sudden cardiac death, yet ~60% of patients remain undiagnosed due to a lack of specialist centers. In our RCT (one of the first of its kind) using our research AI system AMIE, we showed AI could help bridge this gap. General cardiologists using AMIE reported the system helped their assessments in 57.0% of cases, missed no clinically significant findings in 93.5% of cases and reduced assessment time in 50.5% of cases. This suggests the AI can act as a helpful co-pilot and help generalists bridge the gap to specialists.

Worth noting that these studies used models like Med-PaLM 2, Gemini 2.0 Flash, and Gemini 2.5 Pro with simple agentic scaffolds. The potential for Gemini 3 and AI co-scientist to accelerate both the biology of discovery and the delivery of care is profound and we will share more soon.

It's a true privilege to collaborate with @euanashley, Jack W O'Sullivan, Dr Gary Peltz and their teams at Stanford Medicine. With incredible team mates at Google including @taotu831, @apalepu13, @alan_karthi, @Mysiak and many more.

Advanced Science paper - lnkd.in/dggduzka
Nature Medicine paper - lnkd.in/dPEZQ4bz
AI co-scientist blog - lnkd.in/gEDeaRfu
AMIE blog - lnkd.in/gzkn2ywe
2
20
94
7.9K
Khaled Saab reposted
Jerry Tworek (@MillionInt)
Run fewer experiments and think about them more
19
30
561
50.7K
Khaled Saab reposted
Tao Tu (@taotu831)
Excited to share our latest research published today in @NatureMedicine, demonstrating how Large Language Models (LLMs) can help bridge the critical shortage of subspecialist medical expertise. nature.com/articles/s4159…
1
13
54
6.5K