Salman Abdullah

64 posts

@salmanabdullah_

BS/MS @Stanford | RL, Reasoning, Agents @StanfordAILab

Joined January 2021
421 Following · 191 Followers
Pinned Tweet
Salman Abdullah @salmanabdullah_
Excited to introduce RAPTOR 🦖 at #ICLR2024: RAPTOR is a tree-based retrieval approach that navigates between granular details and a holistic understanding of documents. It sets a new SoTA on 3 benchmarks. Read the paper here: arxiv.org/abs/2401.18059
Parth Sarthi @parthsarthi03

Looking for a RAG system that navigates between granular details and the big picture? We’re excited to introduce RAPTOR, a tree-based retrieval approach that sets a new SoTA on 3 benchmarks #ICLR2024 w/@salmanabdullah_ , @aditituli_, @shubhkhanna__, @annadgoldie & @chrmanning at @stanfordNLP arxiv.org/abs/2401.18059 🧵

3
2
14
3.7K
Salman Abdullah retweeted
Jessica Chudnovsky @jchudnov
Your deduplication pipeline was built for small models. At scale, it's broken. New preprint: "Scale Dependent Data Duplication" 1/10
6
28
114
25.6K
Salman Abdullah retweeted
Jack Bai @jackbot_cs
We're proud to share that WebGym is now accepted to CVPR 2026. I would be excited to talk to people working in the vision domain about web agents and reinforcement learning. See you in Denver soon. 😈 Code and data are now publicly available at github.com/microsoft/webg….
Jack Bai @jackbot_cs

😈 Today, we introduce WebGym, the largest-to-date open-source RL environment for web agent training that contains 300k tasks and a rollout framework optimized specifically for web environments' rollout speed. We reveal the effects of essential scaling directions we observe with WebGym. 1/n

0
4
21
3.3K
Salman Abdullah retweeted
Jack Bai @jackbot_cs
😈 Today, Microsoft open-sources WebGym: the task set, code, a bunch of visualization tools, and guiding documentation. WebGym is an RL environment with the *first* open-source implementation of a fully asynchronous rollout system designed for multi-step vision-supported web agentic trajectory collection, which is *4x-5x* faster than existing synchronous implementations. This release comes with *300k* realistic web agentic tasks with comprehensive evaluation rubrics and pipeline, together with annotations on difficulty and domains. 🧵 1/6
2
10
50
3.9K
Salman Abdullah retweeted
Jack Bai @jackbot_cs
😈 Today, we introduce WebGym, the largest-to-date open-source RL environment for web agent training that contains 300k tasks and a rollout framework optimized specifically for web environments' rollout speed. We reveal the effects of essential scaling directions we observe with WebGym. 1/n
13
38
378
43.3K
Salman Abdullah retweeted
Parth Sarthi @parthsarthi03
I am incredibly excited to introduce Chariot (@Chariot_in). Suvrat (@TheBhooshan) and I are working on a research lab based in India to research systems that can truly understand, reason, and interact with the world, starting with speech. We are one of the four teams backed by the @OfficialINDIAai Mission and the Government of India. Today, we had the incredible honor of meeting and interacting with the Honorable Prime Minister of India @NarendraModi ji at his residence to explain what we are working on. While I was explaining our model to him, he zeroed in on the core problem and asked me if the model could discern the intent behind the words to determine the correct tone. With his example of "Ram Naam Satya Hai" vs "Ram Ram", he asked whether the model knows when something is solemn vs casual. Does it understand the weight of what's being said, not just the words themselves? It's exactly the kind of problem we're obsessing over at Chariot: speech isn't transcription, it's intent, emotion, cultural context, all encoded in how something is said. Grateful to the India AI Mission for this opportunity. Lots to build. 🇮🇳
Narendra Modi @narendramodi

Talked AI with youngsters from the Indian StartUp world. It was a memorable and insightful interaction, in which they shared their vision and work on how India is transforming the world of AI. It is commendable how these StartUps are working on diverse fields such as e-commerce, marketing, engineering simulations, material research, healthcare, medical research and more. pib.gov.in/PressReleseDet…

5
7
25
3.4K
Salman Abdullah retweeted
Ahmed Awadallah @AhmedHAwadallah
Fara-7B is our first agentic small language model for computer use. We learned a lot and are looking forward to next steps:
* Agentic models can be small, yet remain capable.
* Unlike solutions that rely on chat model wrappers, even small agentic models can process screenshots and perform direct GUI actions such as scrolling, typing, and clicking.
* Simulation-driven multi-agent synthetic data that automates task generation, trajectory generation, and validation is a way to address the agentic data scarcity gap, and in our case costs < $1 per task.
* Evaluating CUA is hard; we release WebTailBench, a new eval set with diverse tasks not found in other benchmarks, and work with an external party, Browserbase, to independently assess Fara-7B using human annotators.
Model available on Foundry and HuggingFace and can run on device on Copilot+ PC
10
32
129
23.1K
Salman Abdullah retweeted
Nikunj Kothari @nikunj
Computer use agents are SO wildly underhyped.. 2026 is going to be fun 🕺
34
9
221
24.3K
Salman Abdullah retweeted
Anikait Singh @Anikait_Singh_
🚨🚨New Paper: Training LLMs to Discover Abstractions for Solving Reasoning Problems Introducing RLAD, a two-player RL framework for LLMs to discover 'reasoning abstractions'—natural language hints that encode procedural knowledge for structured exploration in reasoning.🧵⬇️
14
116
595
56.2K
Salman Abdullah retweeted
Qdrant @qdrant_engine
Researchers at @ETH_en and @Stanford released an open dataset of 5.8M+ long-form medical QA pairs, each grounded in peer-reviewed literature and designed for RAG. 🚀
The pipeline:
▪️ Source: 900K+ full-text medical papers (S2ORC)
▪️ QA generation via GPT-3.5 with a three-stage filtering process (regex, Mistral-7B classifier, human-in-the-loop)
▪️ Embeddings generated and indexed in Qdrant for scalable dense retrieval
The dataset is available on @huggingface🤗 with full code for embedding, indexing, and RAG setup.
👉 Full story: qdrant.tech/blog/miriad-qd…
0
4
12
1.1K
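The three-stage filtering mentioned in the Qdrant post above (regex, classifier, human-in-the-loop) can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: the regex pattern, the `score_quality` stub standing in for the Mistral-7B classifier, the threshold values, and the idea of routing borderline pairs to a review queue are all hypothetical and are not taken from the MIRIAD code.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: reject QA pairs that point back at the source paper
# ("this study", "the authors", "Table 2"), since they are not self-contained.
SELF_REFERENCE = re.compile(
    r"\b(this (study|paper|article)|the authors|table \d+|figure \d+)\b",
    re.IGNORECASE,
)


@dataclass
class QAPair:
    question: str
    answer: str


def passes_regex(pair: QAPair) -> bool:
    """Stage 1: cheap rule-based filter."""
    return SELF_REFERENCE.search(pair.question + " " + pair.answer) is None


def score_quality(pair: QAPair) -> float:
    """Stage 2 stub: stands in for a learned classifier (assumption).

    Toy heuristic only: longer answers score higher, capped at 1.0.
    """
    return min(len(pair.answer.split()) / 20.0, 1.0)


def filter_pairs(pairs, threshold=0.5, review_band=0.1):
    """Return (kept, needs_review).

    Stage 3 (human-in-the-loop) is modeled as a queue of borderline pairs
    whose score falls within `review_band` of the threshold (assumption).
    """
    kept, needs_review = [], []
    for pair in pairs:
        if not passes_regex(pair):
            continue  # dropped by stage 1
        score = score_quality(pair)
        if score >= threshold + review_band:
            kept.append(pair)
        elif score >= threshold - review_band:
            needs_review.append(pair)
        # anything lower is dropped by stage 2
    return kept, needs_review
```

Ordering the stages from cheapest to most expensive means the rule-based check discards obvious failures before any model or human time is spent on them.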
Salman Abdullah retweeted
Parth Sarthi @parthsarthi03
With the move to Compound AI systems, built from components like finetunable/closed-source models, LLM selectors, and more, one big challenge is end-to-end optimization. Optimizing each component individually doesn't necessarily guarantee optimization of the full system. Our latest work introduces Optimas, a framework that solves this by learning reward functions for each part that are aligned with final system performance. Each component gets its own Globally Aligned Local Reward Function (LRF), and we use the right optimization method for each (prompt optimization for API models, PPO for open-source models, hyperparameter selection, etc.). Across 5 real-world tasks, Optimas gets an average 11.92% boost over top baselines (LLMSelector, TextGrad, DSPy). Check it out!
Shirley Wu @ShirleyYXWu

Introducing 🔥Optimas🔥: The first unified framework to optimize compound AI systems composed of multiple components like trainable/API-based LLMs, tools, model routers, and traditional ML models! 🌐
👉🏻 optimas.stanford.edu
🌟 Why Optimas? AI systems today combine diverse elements: prompts, model parameters, hyperparameters, and model router. Optimizing the entire system effectively is tough! Optimas tackles this with an intuitive strategy: Globally Aligned Local Rewards (LRFs), ensuring each component's optimization directly boosts overall system performance!
📈 Impressive Results: Tested rigorously on 5 real-world compound AI tasks: Product Recommendation, Medical QA, Complex Retrieval, Multi-hop QA, and Code Generation.
🤩 Delivers an impressive average boost of 11.92% over top baselines (e.g., LLMSelector, TextGrad, DSPy).
🔧 Here's the magic behind Optimas:
① Assigns each component a Local Reward Function (LRF).
② Aligns these LRFs with global objectives, enabling independent yet coordinated optimizations.
③ Adaptively updates LRFs for efficient, coherent improvements across diverse configurations.
💡 Compatible with popular agentic frameworks
Easily optimize your own systems! Integrates with popular agentic frameworks like @DSPyOSS, @crewAIInc, @pyautogen, TextGrad, and OpenAI Agent SDK @OpenAIDevs!
Proudly developed by an outstanding collaboration between @StanfordAILab, @AmazonScience, and more! Grateful to work with team @parthsarthi03, Shiyu, Aaron, @krypticmouse, @Diyi_Yang, @james_y_zou, @jure etc.! Check out more!
📄 Paper: arxiv.org/abs/2507.03041
💻 Code: github.com/snap-stanford/… (to be open-sourced soon!)
#CompoundAISystem #LLM #Optimization #MachineLearning

1
7
18
3.2K
Salman Abdullah retweeted
Marktechpost AI Dev News ⚡ @Marktechpost
Researchers from ETH Zurich, Stanford, Mayo Clinic, and others have developed MIRIAD, a large-scale dataset containing 5.8 million medical instruction-response pairs, each grounded in peer-reviewed literature. Designed to address the factual inconsistencies of large language models (LLMs) in clinical settings, MIRIAD enhances retrieval-augmented generation (RAG) pipelines by replacing noisy, unstructured content with clean, semantically aligned QA data. When integrated with LLMs, MIRIAD improves accuracy by up to 6.7% and significantly enhances hallucination detection by up to 37%.
The dataset is supported by MIRIAD-Atlas, an interactive visualization tool spanning 56 medical domains, allowing users to explore content by topic. Built through a semi-automated pipeline involving GPT-4 supervision and human expert validation, MIRIAD serves both as a high-quality retrieval corpus and a training set for specialized medical retrievers. This structured resource sets a new standard for safe and explainable medical AI, facilitating more trustworthy applications in clinical question-answering, digital health interfaces, and research.
Read full article: marktechpost.com/2025/06/25/eth…
Paper: arxiv.org/abs/2506.06091
Dataset: huggingface.co/miriad
Code: github.com/eth-medical-ai…
@Michael_D_Moor, @QueyJ, @salmanabdullah_, @samarthrawal, @cyrilzakka, @SophieOstmeier, @edreisMD, @EricTopol, @jure
0
9
23
802
Salman Abdullah retweeted
Quentin Lhoest 🤗 @lhoestq
YEEESSSssss dataset loading with Spark is 🔥 👉It loads ANY dataset on @huggingface in one line of code Using pyspark_huggingface 1.0 released last week e.g. here the latest Medical QA dataset (5M+ rows🤯) by @Michael_D_Moor @salmanabdullah_ and team
2
9
70
6.7K
Salman Abdullah retweeted
Charly Wargnier @DataChaz
🚨 Just released: MIRIAD, a million-scale medical QA dataset to ground LLMs in reliable medical knowledge. 5.8M question-answer pairs, each distilled from peer-reviewed literature! 🔥 That's structured, high-quality data built for medical AI. 🧵 ↓
10
102
430
48.1K
Salman Abdullah @salmanabdullah_
We're excited to release MIRIAD - a massive-scale 5.8M+ synthetic dataset for retrieval in medicine. It improves RAG performance, helps LLMs detect medical hallucinations, and enables training of domain-specific retrievers. 🤗 Dataset: huggingface.co/miriad 🖥️ Code: github.com/eth-medical-ai… 📄 Preprint: arxiv.org/abs/2506.06091
Michael Moor @Michael_D_Moor

Excited to announce MIRIAD — a large-scale dataset of 5,821,948 medical question-answer pairs, each rephrased from passages in the medical literature. Great collab with @QueyJ, @salmanabdullah_, @samarthrawal, @cyrilzakka, @SophieOstmeier, Maximilian Purk, @edreisMD, @EricTopol & @jure! Page: med-miriad.github.io Dataset: huggingface.co/miriad Preprint: arxiv.org/abs/2506.06091 Code: github.com/eth-medical-ai… Demo: med-miriad.github.io/demo [1/n]

2
10
51
5.1K
Salman Abdullah retweeted
Avanika Narayan @Avanika15
can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a secure gpu enclave, where inference stays fully confidential 🤯 links + code in comments👇
13
65
243
79.1K
John Yang @jyangballin
@ weekend warriors - DM me a GitHub repo that you like / maintain, and I'll train you a 7B coding agent that's an expert for that repo. Main constraints - it's predominantly Python, and has a testing suite w/ good coverage. (example of good repo = sympy, pandas, sqlfluff)
19
8
115
16.9K