Kevin Wu (@kevinywu) - Twitter Profile

Pinned Tweet

Kevin Wu@kevinywu·4d

Very fun to use, I'm very excited for this update!

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.

Kevin Wu@kevinywu·10h

@benweisburd @james_y_zou @gxl_ai Yes, it should be in very shortly! Thanks for flagging.

Ben Weisburd@benweisburd·14h

@james_y_zou @gxl_ai Thank you for building this! It's now my go-to for searching papers programmatically. I'm currently seeing bioRxiv and medRxiv papers through Feb 2026. Will paperclip be able to include more recent preprints?

James Zou@james_y_zou·4d

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.

Kevin Wu@kevinywu·3d

@gravity7 @james_y_zou @gxl_ai The update includes all of arXiv (including CS).

Adrian Chan@gravity7·3d

@james_y_zou @gxl_ai Biomed arXiv only, or also CS, etc.?

Kevin Wu@kevinywu·10 Apr

@FarAICoder @james_y_zou it depends on the agent, but we've found paperclip is several times more token-efficient than native Claude Code + web search (which tends to burn tokens when it can't find the right answer immediately)

Far@FarAICoder·10 Apr

@james_y_zou eight million papers and the demo is a single line of code; what's the actual token cost per query on this thing?

James Zou@james_y_zou·9 Apr

You can now give your agent deep knowledge of millions of papers in one line with #paperclip!📎 >8 million papers natively indexed for agents. Much more thorough + often 10x faster than standard deep research. Just add the paperclip mcp (instruction below).

Kevin Wu@kevinywu·10 Apr

@justic_hot @Drejc98727095 @james_y_zou you're getting the actual VLM response to your agent's question! (in large part motivated by the same frustration w/ only getting caption text) (and yes, we also index every part of the paper, including supplements, etc.)

tang | AI Product Maker@justic_hot·10 Apr

@Drejc98727095 @james_y_zou ah missed that bit, fair. tbh the figure QA claim is what i'd poke at. every paper MCP i've tried just pulls caption text and calls that 'reading the figure'. want to see if this one actually looks at the pixels

Kevin Wu@kevinywu·20 Jul

Had a fun time with @PaulYiMD and @ericwu93 talking about medical LLM evaluations!

Radiology: Artificial Intelligence@Radiology_AI

🚨 New Episode of the @Radiology_AI podcast is now available! 🚨 Host @PaulYiMD chats with @ericwu93 & @kevinywu about their path from Stanford to startup, building MedArena to evaluate medical LLMs, and making AI work in real clinical workflows bit.ly/46j5NcP

Kevin Wu@kevinywu·16 Jul

Fine-tuning APIs allow developers to update model weights for frontier models, but can they actually teach models new information? Our study published today in @nejmai shows that out-of-box SFT with commercial APIs has poor generalizability on medical knowledge. That is to say, we need more fine-grained control of models beyond what's currently available.

Isaac Kohane@zakkohane

Generalization does not go as expected and fine-tuning does not substitute for RAG. From @NEJM_AI a study by @ericwu93 @james_y_zou on fine-tuning frontier LLMs with medical data. More in the reply below

Kevin Wu retweeted

James Zou@james_y_zou·11 Jul

📢New conference where AI is the primary author and reviewer! agents4science.stanford.edu Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors. 💡Initial reviews by LLM reviewers w/ final assessment + selection by human experts. 💡Submissions are asked to clearly document AI contribution. 💡All submissions/reviews will be public to enable transparent study of the strength and limitations of AI as researcher and reviewer. We expect AI will make mistakes and it will be instructive to study these in the open! Many thanks to the fantastic co-organizers and expert advisory board! Please see the website for more information.

Kevin Wu retweeted

James Zou@james_y_zou·16 Jun

Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top 1%)🏅 New multi-turn training objective + user simulator👇

Shirley Wu@ShirleyYXWu

Even the smartest LLMs can fail at basic multiturn communication Ask for grocery help → without asking where you live 🤦‍♀️ Ask to write articles → assumes your preferences 🤷🏻‍♀️ ⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators. Website: aka.ms/CollabLLM Github: github.com/Wuyxin/collabl… Blog: wuyxin.github.io/collabllm/#blog Paper: arxiv.org/pdf/2502.00640 🎯 Key insight: Rewards responses not by immediate helpfulness, but by their long-term impact on the conversation trajectory. @MSFTResearch @StanfordAILab @stanfordnlp

Kevin Wu retweeted

Stanford HAI@StanfordHAI·28 Apr

Current paradigms for evaluating medical LLMs suffer from significant challenges that limit their real-world applications. To address this, scholars introduce a free platform for clinicians to test and compare top-performing LLMs on their medical queries. hai.stanford.edu/news/medarena-…

Kevin Wu retweeted

James Zou@james_y_zou·24 Apr

We discuss medarena.ai and some interesting initial findings in new @StanfordHAI blog. hai.stanford.edu/news/medarena-…

Kevin Wu@kevinywu·21 Apr

Our paper out in @NatureComms! Citing relevant medical sources continues to be a difficult task for LLMs, largely mediated by a "tug-of-war" between model prior and context (are LLMs basing their answer off the source or do they find sources to back up their answer post-hoc?)

James Zou@james_y_zou

Does RAG solve hallucination? Even w/ RAG, we found that >30% of LLMs' medical statements are not fully supported by (sometimes contradict) the cited refs @NatureComms nature.com/articles/s4146… We present #SourceCheckUp agent to verify faithfulness of LM to source info. Great job @kevinywu @ericwu93 w/ awesome collaborators👏

Kevin Wu retweeted

StanfordDBDS@StanfordDBDS·27 Mar

Zou Lab launches MedArena, a free platform for clinicians to use and compare frontier LLMs. MedArena is a free platform for clinicians to use and compare how frontier LLMs work on medical queries. Check it out at: medarena.ai/login

Kevin Wu retweeted

James Zou@james_y_zou·12 Mar

Interesting that Gemini Flash Thinking has emerged as clinicians' preferred model on #MedArena! 🏅 Clinicians around the world can now use and compare frontier #LLMs for free at medarena.ai/login. #medtwitter

Kevin Wu retweeted

David Ouyang, MD@David_Ouyang·1 Mar

Really cool way to evaluate LLMs for medicine in blinded fashion. It really helps build intuition on what works and doesn't. For example, I really like that there is RAG in some models (able to cite literature and reference claims), but I've found when I've clicked into some of the references, it doesn't actually reflect the conclusion stated. Not sure if that's better or worse, since there seems to be a bit of 'mode collapse' and for many prompts, the models all converge on similar (but incomplete) answers. Makes me wonder if they are mostly training on the same public text corpus (pubmed, google search, medscape, etc). @james_y_zou @kevinywu @ericwu93

Kevin Wu@kevinywu

Excited to share early updates from MedArena 🏥✨! 1⃣ Current frontrunner: @perplexity_ai 🤖, closely followed by Gemini 2.0 Flash Thinking 🌟—but the competition's just warming up! 🔥 2⃣ Great news: MedArena is global 🌍! Non-US clinicians are welcome—just provide your credentials at login. Feel free to share with clinicians to help us grow the leaderboard! medarena.ai 🚀 @james_y_zou @ericwu93 @EricTopol @NEJM_AI @David_Ouyang #MedTwitter #DigitalHealth #MedicalAI #MedTech

Kevin Wu@kevinywu·1 Mar

@CyrusMaher @perplexity_ai The link should be accessible from the login page, and you can also visit medarena.ai/leaderboard

Cyrus Maher@CyrusMaher·1 Mar

@kevinywu @perplexity_ai How can I view the leaderboard as a non-MD?

Kevin Wu@kevinywu·28 Feb

Excited to share early updates from MedArena 🏥✨! 1⃣ Current frontrunner: @perplexity_ai 🤖, closely followed by Gemini 2.0 Flash Thinking 🌟—but the competition's just warming up! 🔥 2⃣ Great news: MedArena is global 🌍! Non-US clinicians are welcome—just provide your credentials at login. Feel free to share with clinicians to help us grow the leaderboard! medarena.ai 🚀 @james_y_zou @ericwu93 @EricTopol @NEJM_AI @David_Ouyang #MedTwitter #DigitalHealth #MedicalAI #MedTech