Yue Yu

58 posts

@yue___yu

FAIR CodeGen @AIatMeta | Ex-Meta Llama | Alum @Tsinghua_Uni @GTCSE | NLP | Large Language Models

Mountain View, CA · Joined August 2020
522 Following · 841 Followers
Yue Yu retweeted
AI at Meta @AIatMeta ·
Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and in the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00
Yue Yu retweeted
Yixin Liu @YixinLiu17 ·
Introducing "Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training"

LLM alignment in non-verifiable domains is hard because there is often no clear ground-truth reward. A natural idea is to use reasoning LLM judges inside the RL training loop — but do they actually work better than standard judges?

We study this question in a controlled setup with a gold-standard judge, and find that reasoning judges train much stronger policies under gold evaluation, while non-reasoning judges are much more prone to reward hacking.

But there is also a catch: these reasoning-judge-trained policies can learn highly effective adversarial strategies. In our study, a Llama-3.1-8B policy trained with a Qwen3-4B reasoning judge reaches 89.6% on the creative writing subset of Arena-Hard-V2, close to o3 (92.4%).

📚 Paper: arxiv.org/abs/2603.12246
See details below 👇 🧵1/N
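The setup the tweet describes — a judge model supplying the reward inside an RL post-training loop for a non-verifiable task — can be sketched generically. Everything below (function names, the toy scoring rule, best-of-n standing in for a policy-gradient update) is an illustrative assumption, not the paper's actual method or models:

```python
# Sketch of a reasoning LLM-as-judge providing the reward signal for RL
# post-training on a non-verifiable task (e.g. creative writing).
# The judge and the "RL step" below are toy stand-ins for illustration.

def reasoning_judge(prompt: str, response: str) -> tuple[str, float]:
    """A reasoning judge first writes an analysis, then emits a score.
    Here the analysis is canned and the score is a toy length heuristic."""
    analysis = f"Checking the response to {prompt!r} for coherence and detail."
    score = min(len(response.split()) / 50.0, 1.0)
    return analysis, score

def judge_reward(prompt: str, response: str) -> float:
    """Collapse the judge's (analysis, score) verdict into a scalar reward."""
    _, score = reasoning_judge(prompt, response)
    return score

def rl_step(prompt: str, candidates: list[str]) -> str:
    """One 'RL' step: score sampled candidates with the judge and keep the
    best one (best-of-n as a stand-in for a policy-gradient update)."""
    rewards = [judge_reward(prompt, c) for c in candidates]
    return candidates[rewards.index(max(rewards))]

best = rl_step("Write a short story about rain.",
               ["It rained.",
                "Rain fell softly over the quiet city, " * 5])
```

The paper's finding, in these terms, is about what happens when `reasoning_judge` is a genuine reasoning model versus a direct scorer: the policy trained against it is stronger, but can also learn to exploit the judge's scoring habits.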
Yuchen Zhuang @yuchen_zhuang ·
Excited to share that I joined @GoogleDeepMind as a research scientist recently 🥳 Looking forward to future collaborations on exciting projects :)
Yue Yu retweeted
Wenqi Shi @WenqiShi0106 ·
🤔 How can we systematically enhance LLMs for complex medical coding tasks?
🚀 Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! 🧬💻

🎯 Comprehensive Code-based Medical Reasoning Benchmark:
- 📌 72,000+ coding tasks across 129 diverse categories
- 🌐 12 authentic, real-world biomedical scenarios
- 🛠️ Supports structured and open-ended coding challenges

⚙️ Robust & Scalable Infrastructure:
- 🚧 Task-specific Docker environments for isolated, reproducible execution
- 🚦 Multi-threaded, parallel execution and sequential sampling
- 📊 Efficient trajectory collection tailored for diverse agent training

🔗 Explore more:
📖 Paper: arxiv.org/abs/2506.04405
📚 Doc: wshi83.github.io/MedAgentGym-Pa…
💻 Code: github.com/wshi83/MedAgen…
🤗 Model & Data: huggingface.co/MedAgentGym

#AI #MachineLearning #Bioinformatics #MedicalAI #LLM #DeepLearning #HealthTech #OpenSource #Healthcare #DataScience #MedTech #NLP #GenerativeAI #Coding #Benchmark
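The "multi-threaded, parallel execution" and "trajectory collection" a gym-style agent platform needs can be sketched with the standard library. Function and field names here are illustrative assumptions, not MedAgentGym's actual API:

```python
# Sketch of parallel task execution with trajectory collection, the kind of
# infrastructure an agent-training gym provides. `run_task` is a stand-in
# for executing one coding task in an isolated environment (MedAgentGym
# uses per-task Docker containers for that isolation).
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id: int) -> dict:
    # Placeholder: a real runner would launch the agent, execute its code,
    # and record the full interaction trajectory.
    return {"task": task_id, "status": "passed"}

def collect_trajectories(task_ids, workers: int = 4) -> list[dict]:
    # pool.map preserves input order, so trajectories line up with task_ids.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, task_ids))

trajectories = collect_trajectories(range(8))
```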
Yue Yu retweeted
Wei Ping @_weiping ·
We’re at #NeurIPS 2024 in Vancouver, presenting two papers from NVIDIA on advancing state-of-the-art LLM RAG models!

ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Thu 12 Dec, 11 a.m. — 2 p.m. PST, West Ballroom A-D #7201
Paper: arxiv.org/abs/2401.10225

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Thu 12 Dec, 4:30 p.m. — 7:30 p.m. PST, East Exhibit Hall A-C #4604
Paper: arxiv.org/abs/2407.02485

We are hiring full-time researchers and interns to work on LLMs, reasoning, and multimodal LLMs. Feel free to reach out via chat or DM if you're interested!
Yue Yu retweeted
Yuchen Zhuang @yuchen_zhuang ·
Excited to present HYDRA 🐉 at #NeurIPS2024! 🚀 Our novel model-factorization framework combines personal behavior patterns 👤 with global knowledge 🌐 for truly personalized LLM generation. Achieves 9%+ gains over SOTA across 5 tasks 🏆 using personalized RAG. Learn more: arxiv.org/pdf/2406.02888
Yue Yu @yue___yu ·
8/8 Thanks to all coauthors from the Meta Llama Team for the great support! Zhengxing Chen, Aston Zhang @astonzhangAZ, Liang Tan, Chenguang Zhu @ChenguangZhu2, Richard Pang @yzpang_, Yundi Qian, Xuewei Wang, Suchin Gururangan @ssgrn, Chao Zhang @chaozhangcs, Melanie Kambadur, Dhruv Mahajan, as well as my mentor Rui Hou @magpie_rayhou
Yue Yu @yue___yu ·
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! arxiv.org/abs/2411.16646 #rlhf #LLM
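The idea in the tweet — treating reward modeling as a reasoning task by generating a CoT-style critique first, then scoring conditioned on it — can be sketched as a two-stage pipeline. The stand-in functions below are illustrative assumptions, not the paper's models or prompts:

```python
# Sketch of critique-conditioned reward modeling: instead of mapping
# (prompt, response) directly to a scalar, the model first self-generates
# a chain-of-thought critique and then scores conditioned on it.
# Both stages here are trivial stand-ins for real LLM calls.

def generate_critique(prompt: str, response: str) -> str:
    # Stage 1: self-generated critique (here, a canned template).
    return (f"The response to {prompt!r} is {len(response.split())} word(s) "
            "long and addresses the question directly.")

def score_with_critique(prompt: str, response: str, critique: str) -> float:
    # Stage 2: scalar reward conditioned on prompt, response, and critique.
    base = 0.5
    bonus = 0.1 if "directly" in critique else 0.0
    return base + bonus

def reward(prompt: str, response: str) -> float:
    critique = generate_critique(prompt, response)
    return score_with_critique(prompt, response, critique)

r = reward("What is 2+2?", "4")
```

The design point is that the final scalar can attend to an explicit analysis rather than scoring the raw pair, which is where the reported generalization and data-efficiency gains would come from.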
Yue Yu retweeted
Ran Xu @ritaranx ·
Excited to be at #EMNLP2024 and share our 3 papers on LLM for Health! Let’s chat if you are interested!

📅 Nov 13, 10:30-12:00, Session 6
- BMRetriever: LLMs for text retrieval
- MedAdapter: LLMs for medical reasoning

📅 Nov 14, 14:00-15:30, Session 12
- EHRAgent: LLM Agents for EHR QA
Yue Yu retweeted
Duen Horng "Polo" Chau @PoloChau ·
🎉The coolest #CSE school in the world is hiring multiple faculty members! Application link below👇
Yue Yu @yue___yu ·
@katieelink @philipcortes Hi Katie, for biomedical retrievers, feel free to check out our recent work arxiv.org/abs/2404.18443 (available at 410M/1B/2B/7B parameter sizes). We are currently working on some biomed RAG tasks, so stay tuned!
Ran Xu @ritaranx

🧬 Still using BM25 for biomedical retrieval? Try out BMRetriever! 🔍 Our new series of retrievers enhance biomedical search with various scales (410M-7B). 🔓 Model/Data: huggingface.co/BMRetriever 🌠 Github: github.com/ritaranx/BMRet… #BiomedicalResearch #LLM #Retrieval #OpenScience

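What a BMRetriever-style dense retriever does at query time — embed the query and documents, then rank by similarity — follows the standard dense-retrieval recipe. Real use would load one of the checkpoints from huggingface.co/BMRetriever; the toy bag-of-words embedding below is an assumption made so the example stays self-contained and runnable:

```python
# Sketch of dense-retrieval scoring: embed query and documents, rank by
# cosine similarity. A bag-of-words Counter stands in for a learned
# embedding model so the example needs no model download.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in embedder: term-frequency vector over whitespace tokens.
    return Counter(text.lower().split())

def cosine(q: Counter, d: Counter) -> float:
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

docs = ["aspirin reduces fever and inflammation",
        "the mitochondria is the powerhouse of the cell"]
query_vec = embed("drug for fever")
ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
```

Swapping `embed` for a real encoder (and the cosine loop for a batched matrix product over precomputed document embeddings) recovers the usual production setup.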
Katie Link @katieelink ·
who's training/trained a good "small" biomedical LLM (<7B params)? i.e. a phi or gemma sized model
Yue Yu @yue___yu ·
@BowenJin13 Great work! I feel it is more like a graph API/tool-usage benchmark than pure RAG on graphs?
Bowen Jin @BowenJin13 ·
🚀Graph RAG is hot! 🚀Our "Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs" has been accepted to ACL 2024. ⭐️GRBench: A new benchmark for graph RAG research. ⭐️Graph CoT: An iterative framework to let LLM explore on graph environments. #graph #LLM