Kevin Wu

40 posts

@kevinywu

Research @ GXL, PhD @ Stanford DBDS

Joined April 2021
66 Following · 200 Followers
Pinned Tweet
Kevin Wu
Kevin Wu@kevinywu·
Very fun to use, I'm very excited for this update!
James Zou@james_y_zou

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.

0
2
17
3.2K
Ben Weisburd
Ben Weisburd@benweisburd·
@james_y_zou @gxl_ai Thank you for building this! It's now my go-to for searching papers programmatically. I'm currently seeing bioRxiv and medRxiv papers through Feb 2026. Will paperclip be able to include more recent preprints?
1
0
0
50
James Zou
James Zou@james_y_zou·
Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.
41
238
1.7K
119.7K
Kevin Wu
Kevin Wu@kevinywu·
@FarAICoder @james_y_zou it depends on the agent, but we've found paperclip is several times more token-efficient than native claude code + web search (which tends to burn tokens when it can't find the right answer immediately)
0
0
0
113
Far
Far@FarAICoder·
@james_y_zou eight million papers and the demo is a single line of code, what's the actual token cost per query on this thing
1
0
4
2.3K
James Zou
James Zou@james_y_zou·
You can now give your agent deep knowledge of millions of papers in one line with #paperclip!📎 >8 million papers natively indexed for agents. Much more thorough + often 10x faster than standard deep research. Just add the paperclip mcp (instruction below).
31
210
2K
207.2K
Kevin Wu
Kevin Wu@kevinywu·
@justic_hot @Drejc98727095 @james_y_zou you're getting the actual vlm response to your agent's question! (in large part motivated by the same frustration w/ only getting caption text) (and yes, we also index every part of the paper, including supplements, etc.)
1
0
0
46
tang | AI Product Maker
tang | AI Product Maker@justic_hot·
@Drejc98727095 @james_y_zou ah missed that bit, fair. tbh the figure QA claim is what i'd poke at. every paper MCP i've tried just pulls caption text and calls that 'reading the figure'. want to see if this one actually looks at the pixels
1
0
0
150
Kevin Wu
Kevin Wu@kevinywu·
Fine-tuning APIs allow developers to update model weights for frontier models, but can they actually teach models new information? Our study published today in @nejmai shows that out-of-box SFT with commercial APIs has poor generalizability on medical knowledge. That is to say, we need more fine-grained control of models beyond what's currently available.
Isaac Kohane@zakkohane

Generalization does not go as expected and fine-tuning does not substitute for RAG. From @NEJM_AI a study by @ericwu93 @james_y_zou on fine-tuning frontier LLMs with medical data. More in the reply below

0
3
11
1.9K
Kevin Wu retweeted
James Zou
James Zou@james_y_zou·
📢New conference where AI is the primary author and reviewer! agents4science.stanford.edu Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors. 💡Initial reviews by LLM reviewers w/ final assessment + selection by human experts. 💡Submissions are asked to clearly document AI contribution. 💡All submissions/reviews will be public to enable transparent study of the strength and limitations of AI as researcher and reviewer. We expect AI will make mistakes and it will be instructive to study these in the open! Many thanks to the fantastic co-organizers and expert advisory board! Please see the website for more information.
20
129
503
113.6K
Kevin Wu retweeted
James Zou
James Zou@james_y_zou·
Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top 1%)🏅 New multi-turn training objective + user simulator👇
Shirley Wu@ShirleyYXWu

Even the smartest LLMs can fail at basic multiturn communication Ask for grocery help → without asking where you live 🤦‍♀️ Ask to write articles → assumes your preferences 🤷🏻‍♀️ ⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators. Website: aka.ms/CollabLLM Github: github.com/Wuyxin/collabl… Blog: wuyxin.github.io/collabllm/#blog Paper: arxiv.org/pdf/2502.00640 🎯 Key insight: Rewards responses not by immediate helpfulness, but by their long-term impact on the conversation trajectory. @MSFTResearch @StanfordAILab @stanfordnlp

6
9
52
6.9K
Kevin Wu retweeted
Stanford HAI
Stanford HAI@StanfordHAI·
Current paradigms for evaluating medical LLMs suffer from significant challenges that limit their real-world applications. To address this, scholars introduce a free platform for clinicians to test and compare top-performing LLMs on their medical queries. hai.stanford.edu/news/medarena-…
2
7
20
3.5K
Kevin Wu
Kevin Wu@kevinywu·
Our paper out in @NatureComms! Citing relevant medical sources continues to be a difficult task for LLMs, largely mediated by a "tug-of-war" between model prior and context (are LLMs basing their answer off the source or do they find sources to back up their answer post-hoc?)
James Zou@james_y_zou

Does RAG solve hallucination? Even w/ RAG, we found that >30% of LLMs' medical statements are not fully supported by (sometimes contradict) the cited refs @NatureComms nature.com/articles/s4146… We present #SourceCheckUp agent to verify faithfulness of LM to source info. Great job @kevinywu @ericwu93 w/ awesome collaborators👏

0
2
5
674
Kevin Wu retweeted
StanfordDBDS
StanfordDBDS@StanfordDBDS·
Zou Lab launches MedArena, a free platform for clinicians to use and compare frontier LLMs MedArena is a free platform for clinicians to use and compare how frontier LLMs work on medical queries. Check it out at: medarena.ai/login
0
4
5
694
Kevin Wu retweeted
James Zou
James Zou@james_y_zou·
Interesting that Gemini Flash Thinking has emerged as clinicians' preferred model on #MedArena! 🏅 Clinicians around the world can now use and compare frontier #LLMs for free at medarena.ai/login. #medtwitter
1
8
64
5.8K
Kevin Wu retweeted
David Ouyang, MD
David Ouyang, MD@David_Ouyang·
Really cool way to evaluate LLMs for medicine in blinded fashion. It really helps build intuition on what works and doesn't. For example, I really like that there is RAG in some models (able to cite literature and references claims), but I've found when I've clicked into some of the references, it doesn't actually reflect the conclusion stated. Not sure if that's better or worse, since there seems to be a bit of 'mode collapse' and for many prompts, the models all converge on similar (but incomplete) answers. Makes me wonder if they are mostly training on the same public text corpus (pubmed, google search, medscape, etc). @james_y_zou @kevinywu @ericwu93
Kevin Wu@kevinywu

Excited to share early updates from MedArena 🏥✨! 1⃣ Current frontrunner: @perplexity_ai 🤖, closely followed by Gemini 2.0 Flash Thinking 🌟—but the competition's just warming up! 🔥 2⃣ Great news: MedArena is global 🌍! Non-US clinicians are welcome—just provide your credentials at login. Feel free to share with clinicians to help us grow the leaderboard! medarena.ai 🚀 @james_y_zou @ericwu93 @EricTopol @NEJM_AI @David_Ouyang #MedTwitter #DigitalHealth #MedicalAI #MedTech

2
7
29
6.3K
Kevin Wu
Kevin Wu@kevinywu·
Excited to share early updates from MedArena 🏥✨! 1⃣ Current frontrunner: @perplexity_ai 🤖, closely followed by Gemini 2.0 Flash Thinking 🌟—but the competition's just warming up! 🔥 2⃣ Great news: MedArena is global 🌍! Non-US clinicians are welcome—just provide your credentials at login. Feel free to share with clinicians to help us grow the leaderboard! medarena.ai 🚀 @james_y_zou @ericwu93 @EricTopol @NEJM_AI @David_Ouyang #MedTwitter #DigitalHealth #MedicalAI #MedTech
4
23
69
36.2K