Santosh

1.2K posts

@SantoshStyles

Passionate about biomedical ML/NLP/IR. Lead Machine Learning Scientist @ScienceDotIO. He/Him

California, USA · Joined May 2009

439 Following · 281 Followers

Santosh@SantoshStyles·7h

@tomaarsen @jhuclsp @LightOnIO @mixedbreadai Incredible work!

tomaarsen@tomaarsen·5d

Huge thanks to the @jhuclsp Ettin team for the base encoders, the @LightOnIO team for the training data collection, and the @mixedbreadai team for the teacher model. The recipe is intentionally simple: I'm looking forward to others extending it for even stronger models!

Santosh retweeted

tomaarsen@tomaarsen·5d

🤗 Announcing the Ettin Reranker family: six new CrossEncoder rerankers from 17M to 1B parameters, state-of-the-art at their respective sizes. Built on the Ettin ModernBERT encoders, with the full training recipe and ~143M-triple training dataset as well. 🧵

Santosh retweeted

Antoine Chaffin@antoine_chaffin·3d

If you want to understand why BM25 and ColBERT are very strong on deep research, look at the traces of agent on BCP

Jo Kristian Bergum@jobergum

I had a lot of fun analyzing how GPT-5 searches in BrowserComp-Plus.
- Long queries with many terms
- Keyword-oriented with query operators like phrase, "site:", etc.
- 98% of the traces contains at least one phrase search
hornet.dev/blog/this-is-w…

Santosh@SantoshStyles·3d

@dwarkesh_sp These clip intros make each piece so easy to learn. Almost in a turn-your-brain-off sort of way. They spike my curiosity and prime me to get into the concepts without the friction of missing a piece or two. Not always easy to do with STEM

Dwarkesh Patel@dwarkesh_sp·4d

Monte Carlo Tree Search training corrects the model move by move, while current LLM training only tells it whether the whole trajectory worked. MCTS is preferable if you can get it. But nobody's managed to get MCTS to work for language models. In his blackboard lecture @ericjang11 talked to me about why:

Santosh@SantoshStyles·10 May

I just got around to listening to Dwarkesh's interview with Jensen Huang. I've seen some criticism of Dwarkesh for not understanding Jensen's perspective, but I think that misreads what he was doing; he was trying to get a deep intuition for Jensen's perspective.
It's a style I noticed in some of my STEM courses. During office hours, the students with the strongest grasp of the material would ask adversarial/contrarian-style questions, not because they were trying to challenge the material but because they were trying to reach a deeper understanding of it. There's a surface-level understanding that comes from just reading or listening, and then there are layers beneath that come from interacting with the ideas.

Santosh retweeted

Axios@axios·29 Apr

Mark Zuckerberg and Priscilla Chan commit $500 million to AI biology trib.al/ME2x1PD

Santosh retweeted

Joy He-Yueya@JoyHeYueya·17 Apr

Scientists often make breakthroughs by synthesizing ideas across papers. In our new paper, we ask whether a language model can anticipate this process: given two parent papers, can it generate the core insight of a future paper built on them? 🧵⬇️

Santosh@SantoshStyles·18 Apr

@kevinweil I've always enjoyed following your posts and hearing about ML for science. Eager to see what's ahead.

Kevin Weil 🇺🇸@kevinweil·17 Apr

Today is my last day at OpenAI, as OpenAI for Science is being decentralized into other research teams. It’s been a mind-expanding two years, from Chief Product Officer to joining the research team and starting OpenAI for Science. Accelerating science will be one of the most stunningly positive outcomes of our push to AGI, and I’m rooting for @sama @markchen90 @fidjissimo @gdb @merettm and the whole team!

Santosh retweeted

OpenEvidence@EvidenceOpen·31 Mar

Mount Sinai is embedding OpenEvidence into the clinical workflow for 50,000 clinicians across six hospitals, a medical school, and a nursing school. “We are committed to equipping our clinicians with intuitive, AI-powered tools that reduce the cognitive burden of information retrieval and allow them to focus on what matters most: the patient.” - Nicholas Gavin, Chief Clinical Innovation Officer, Mount Sinai

Santosh retweeted

Sebastian Raschka@rasbt·22 Mar

A visual guide to modern LLM attention variants, all in one place: magazine.sebastianraschka.com/p/visual-atten…

Santosh retweeted

Google Research@GoogleResearch·16 Mar

LLMs are now unlocking a new level of scientific reasoning. We partnered with top experts to test 6 LLMs on high-temperature superconductivity, an open area of inquiry in condensed matter physics. Our case study found that curated, closed-system models were the clear winner, acting as true research partners by prioritizing high-quality, verified data over raw web volume.

Santosh retweeted

Mustafa Suleyman@mustafasuleyman·12 Mar

We're approaching the dawn of medical superintelligence - the moment when affordable, world-class medical knowledge and support is at your fingertips whenever you need it. I think people are still underestimating how profound this transformation is going to be.
Today we're announcing Copilot Health, enabling users to connect all their EHR records and wearable data in a secure, private health space that Copilot can analyze and reason about to provide personalized insights and proactive nudges.
You choose what information to connect - from hospital lab results to your fitness tracker - and Copilot Health applies medical intelligence to surface easy to understand, personalized insights that you can actually act on.
It's a dedicated space to bring your personal health data together in a single profile, including:
- Activity, sleep, and vital trends from 50+ wearable devices, including Apple Health, Oura, Fitbit and many more
- Health records from 50,000+ U.S. hospital and health systems, including visit summaries, medications, and test results
- Comprehensive lab test results from Function
Copilot Health enables people to arrive at their appointment with the right questions and the right context to make the time they have with doctors really count.
Your data is always your data, and you are always in full control. Your data won't be used to train our AI models, and you can disconnect sources at any time.
Our Copilot Health responses are also grounded in information from credible health organizations like Harvard Health, as well as real-time US provider directories to find the right real-world care.
Copilot Health is launching first in the US to adults over 18, but we ultimately want to make this service available to the billions of people around the world who struggle to access reliable medical advice.
Please give it a go and sign up to join the early Copilot Health community and help shape what comes next. More on the MAI blog: microsoft.ai/news/introduci…

Santosh retweeted

Christine Yip@christinetyip·12 Mar

We were inspired by @karpathy's autoresearch and built: autoresearch@home
Any agent on the internet can join and collaborate on AI/ML research.
What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.
Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time

Santosh retweeted

gosha@defigosha·1 Mar

@miniapeur Semantic Scholar is good, also Elicit is not bad if you are okay with a more “scattered” approach on citations. Sometimes recommends papers with one or two citations at the top of the list

Santosh retweeted

Kexin Huang@KexinHuang5·23 Feb

Excited to share one of the first examples of agents transforming clinical trial & EHR data analysis at scale. Biomni leverages 12M+ patient EHRs at Mount Sinai to scale complex trial emulation end-to-end:

Phylo@phylo_bio

Introducing agentic trial emulation: Biomni scaled clinical trial emulation using 12M patients EHR data at Mount Sinai. Researchers used Biomni to autonomously emulate multiple landmark anticoagulation RCTs end-to-end within their EHR—turning months of manual target trial analysis into scalable, agent-driven workflows that can extend across large numbers of trials and diseases. Running many emulations in parallel enabled something new: learning a stable institutional transport signal that quantifies how published trial effects systematically translate in local care. Read more: phylo.bio/blog/scaling-a… Exciting work with @BenGlicksberg @joshualampertmd Justin Kauffman et al

Santosh@SantoshStyles·23 Feb

@dylan522p @LOrealUSA @LOrealGroupe @lorealparisfr I was surprised to learn how strong Loreal's scientific arm is. I was aware of them trying to incorporate encoder models back in 2019 iirc. Haven't really kept track since seeing this.

Dylan Patel@dylan522p·22 Feb

Finally I can have some input on the skincare conversation Thank you @LOrealUSA and Nvidia @LOrealGroupe @lorealparisfr

Santosh retweeted

Healthcare AI Guy@HealthcareAIGuy·16 Feb

Anthropic’s healthcare & life sciences strategy map
1. Investments
• Phylo – AI-native biotech platform
• Heidi – AI ambient scribe
2. Partnerships
• HealthEx – Health data exchange
• Genmab – Antibody therapeutics & R&D
• BioRender – Scientific visualization
• 10x Genomics – Single-cell biology
• Benchling – Biotech R&D cloud infra
• Allen Institute – Biomedical research
• HHMI – Life science research

Santosh@SantoshStyles·23 Feb

One of my favorite podcast episodes. It's about the untapped potential and broken economic incentives of drug repurposing: youtube.com/watch?v=72IuP6…
AI summary:
The Case for Repurposing
- Life-Saving Potential: The episode opens with the story of Balamuthia, a rare brain-eating amoeba with a 90% fatality rate. A UTI drug approved in Europe for 50 years, Nitroxoline, was used as an emergency treatment and led to remarkable recoveries [00:50].
- The Problem: Out of 18,000 known human diseases, only about 25% have FDA-approved treatments [02:05]. However, thousands of approved drugs are already "hiding in plain sight" at local pharmacies.
David Fajgenbaum’s Journey
- Dr. David Fajgenbaum, a former college quarterback, nearly died from Castleman Disease, a rare condition where the immune system attacks vital organs [05:35].
- Self-Discovery: After failing existing treatments, Fajgenbaum used his own blood samples and lymph nodes to discover that a pathway called mTOR was in overdrive [12:11].
- The Cure: He identified Sirolimus (Rapamune), an organ transplant drug, as an mTOR inhibitor. He prescribed it to himself and has been in remission for over 11 years [13:25].
- Every Cure: This inspired him to launch Every Cure, a non-profit using AI to "map" all 4,000 existing drugs against all 18,000 diseases to find the highest-probability matches [17:56].
The Economic and Policy Gap
- Despite being faster and cheaper (costing roughly 1% of developing a new drug), repurposing lacks financial incentives [23:05].
- Generic Drug Deadlock: Once a drug goes generic, no single company wants to fund expensive clinical trials because they cannot "own" the new use; any competitor could sell the same cheap generic for that condition [34:47].
- Lidocaine and Cancer: A study showed that injecting cheap Lidocaine around breast tumors before surgery reduced mortality by 29%, yet it is not standard practice because no company is incentivized to market it [20:21].
- Advance Market Commitments (AMC): Economist Chris Snyder discusses "pull funding," where governments or foundations commit to paying for a drug after it is proven to work. This model was successfully used for pneumococcal vaccines [27:02] and COVID-19 vaccines [30:50].
The FDA and Data Sharing
- Cure ID: Heather Stone from the FDA discusses Cure ID, an open platform for clinicians and patients to share "off-label" success stories to help researchers identify which drugs are worth studying in formal trials [40:36].
- The "Sexy" Problem: Repurposing is often seen as less "sexy" than discovering new molecules, receiving less than 1% of medical research dollars [46:54].
Conclusion: The episode argues that by fixing the economic "market failure" (perhaps through government rewards based on lives saved) we could unlock thousands of cures using drugs that already sit on pharmacy shelves today.
Watch the full video here: youtube.com/watch?v=72IuP6…

Santosh retweeted

Ming "Tommy" Tang@tangming2005·20 Feb

BioMCP: AI-Powered Biomedical Research biomcp.org

Santosh retweeted

Kexin Huang@KexinHuang5·3 Feb

Today we’re launching Phylo, a research lab studying agentic biology, backed by a $13.5M seed round co-led by @a16z and @MenloVentures / Anthology Fund @AnthropicAI.
We’re also introducing a research preview of Biomni Lab, the first Integrated Biology Environment (IBE), where we’re imagining a new way biologists work.
Biomni Lab uses agents to orchestrate hundreds of biological databases, software tools, molecular AI models, expert workflows, and even external research services in one workspace, supporting research end-to-end from question to experiment to result.
Agents handle the mechanics, while you define the question, then review, steer, and decide. Scientists end up spending more time on science: asking questions, understanding mechanisms, and eliminating diseases.
Phylo (@phylo_bio) is a spin-out of @ProjectBiomni, where we will maintain the open-source community and push open-science research.
I’m grateful to continue building with my co-founders @YuanhaoQ @jure @lecong and the dream founding team @serena2z @TianweiShe @huangzixin20151 @gm2123 @margaretwhua @malayhgandhi.
We’re also fortunate to be advised by leading scientists @zhangf, Carolyn Bertozzi, and @fabian_theis, and supported by an amazing group of investors including @JorgeCondeBio @zakdoric Matt Kraning @ZettaVentures @dreidco @conviction @saranormous @svangel @valkyrie_vc and others.
Biomni Lab is available for free today: biomni.phylo.bio
Learn more in our launch post: phylo.bio/blog/company-f…
We are also hosting launch events - join us at
South San Francisco: luma.com/n8k8qb0n
Virtual: luma.com/l5ryjaij
We’re also hiring! phylo.bio/careers
