Yue Yu

58 posts

@yue___yu

FAIR CodeGen @AIatMeta | Ex-Meta Llama | Alum @Tsinghua_Uni @GTCSE | NLP | Large Language Models

Mountain View, CA · Joined August 2020
522 Following · 841 Followers
Yue Yu retweeted
AI at Meta @AIatMeta ·
Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and in the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00
Yue Yu retweeted
Yixin Liu @YixinLiu17 ·
Introducing "Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training"

LLM alignment in non-verifiable domains is hard because there is often no clear ground-truth reward. A natural idea is to use reasoning LLM judges inside the RL training loop — but do they actually work better than standard judges?

We study this question in a controlled setup with a gold-standard judge, and find that reasoning judges train much stronger policies under gold evaluation, while non-reasoning judges are much more prone to reward hacking.

But there is also a catch: these reasoning-judge-trained policies can learn highly effective adversarial strategies. In our study, a Llama-3.1-8B policy trained with a Qwen3-4B reasoning judge reaches 89.6% on the creative writing subset of Arena-Hard-V2, close to o3 (92.4%).

📚 Paper: arxiv.org/abs/2603.12246
See details below 👇 🧵1/N
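The setup the tweet describes — a judge model supplying the reward inside an RL post-training loop for a non-verifiable task — can be sketched generically. Everything below (function names, the toy scoring rule, best-of-n standing in for a policy-gradient update) is an illustrative assumption, not the paper's actual method or models:

```python
# Sketch of a reasoning LLM-as-judge providing the reward signal for RL
# post-training on a non-verifiable task (e.g. creative writing).
# The judge and the "RL step" below are toy stand-ins for illustration.

def reasoning_judge(prompt: str, response: str) -> tuple[str, float]:
    """A reasoning judge first writes an analysis, then emits a score.
    Here the analysis is canned and the score is a toy length heuristic."""
    analysis = f"Checking the response to {prompt!r} for coherence and detail."
    score = min(len(response.split()) / 50.0, 1.0)
    return analysis, score

def judge_reward(prompt: str, response: str) -> float:
    """Collapse the judge's (analysis, score) verdict into a scalar reward."""
    _, score = reasoning_judge(prompt, response)
    return score

def rl_step(prompt: str, candidates: list[str]) -> str:
    """One 'RL' step: score sampled candidates with the judge and keep the
    best one (best-of-n as a stand-in for a policy-gradient update)."""
    rewards = [judge_reward(prompt, c) for c in candidates]
    return candidates[rewards.index(max(rewards))]

best = rl_step("Write a short story about rain.",
               ["It rained.",
                "Rain fell softly over the quiet city, " * 5])
```

The paper's finding, in these terms, is about what happens when `reasoning_judge` is a genuine reasoning model versus a direct scorer: the policy trained against it is stronger, but can also learn to exploit the judge's scoring habits.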
Yuchen Zhuang @yuchen_zhuang ·
Excited to share that I joined @GoogleDeepMind as a research scientist recently 🥳 Looking forward to future collaborations on exciting projects :)
Yue Yu retweeted
Wenqi Shi @WenqiShi0106 ·
🤔 How can we systematically enhance LLMs for complex medical coding tasks?
🚀 Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! 🧬💻

🎯 Comprehensive Code-based Medical Reasoning Benchmark:
- 📌 72,000+ coding tasks across 129 diverse categories
- 🌐 12 authentic, real-world biomedical scenarios
- 🛠️ Supports structured and open-ended coding challenges

⚙️ Robust & Scalable Infrastructure:
- 🚧 Task-specific Docker environments for isolated, reproducible execution
- 🚦 Multi-threaded, parallel execution and sequential sampling
- 📊 Efficient trajectory collection tailored for diverse agent training

🔗 Explore more:
📖 Paper: arxiv.org/abs/2506.04405
📚 Doc: wshi83.github.io/MedAgentGym-Pa…
💻 Code: github.com/wshi83/MedAgen…
🤗 Model & Data: huggingface.co/MedAgentGym

#AI #MachineLearning #Bioinformatics #MedicalAI #LLM #DeepLearning #HealthTech #OpenSource #Healthcare #DataScience #MedTech #NLP #GenerativeAI #Coding #Benchmark
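The "multi-threaded, parallel execution" and "trajectory collection" a gym-style agent platform needs can be sketched with the standard library. Function and field names here are illustrative assumptions, not MedAgentGym's actual API:

```python
# Sketch of parallel task execution with trajectory collection, the kind of
# infrastructure an agent-training gym provides. `run_task` is a stand-in
# for executing one coding task in an isolated environment (MedAgentGym
# uses per-task Docker containers for that isolation).
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id: int) -> dict:
    # Placeholder: a real runner would launch the agent, execute its code,
    # and record the full interaction trajectory.
    return {"task": task_id, "status": "passed"}

def collect_trajectories(task_ids, workers: int = 4) -> list[dict]:
    # pool.map preserves input order, so trajectories line up with task_ids.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, task_ids))

trajectories = collect_trajectories(range(8))
```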
Yue Yu retweeted
Wei Ping @_weiping ·
We’re at #NeurIPS 2024 in Vancouver, presenting two papers from NVIDIA on advancing state-of-the-art LLM RAG models!

ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Thu 12 Dec, 11 a.m. — 2 p.m. PST, West Ballroom A-D #7201
Paper: arxiv.org/abs/2401.10225

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Thu 12 Dec, 4:30 p.m. — 7:30 p.m. PST, East Exhibit Hall A-C #4604
Paper: arxiv.org/abs/2407.02485

We are hiring full-time researchers and interns to work on LLMs, reasoning, and multimodal LLMs. Feel free to reach out via chat or DM if you're interested!
Yue Yu retweeted
Yuchen Zhuang @yuchen_zhuang ·
Excited to present HYDRA 🐉 at #NeurIPS2024! 🚀 Our novel model-factorization framework combines personal behavior patterns 👤 with global knowledge 🌐 for truly personalized LLM generation. Achieves 9%+ gains over SOTA across 5 tasks 🏆 using personalized RAG. Learn more: arxiv.org/pdf/2406.02888
Yue Yu @yue___yu ·
8/8 Thanks to all coauthors from the Meta Llama Team for the great support! Zhengxing Chen, Aston Zhang @astonzhangAZ, Liang Tan, Chenguang Zhu @ChenguangZhu2, Richard Pang @yzpang_, Yundi Qian, Xuewei Wang, Suchin Gururangan @ssgrn, Chao Zhang @chaozhangcs, Melanie Kambadur, Dhruv Mahajan, as well as my mentor Rui Hou @magpie_rayhou
Yue Yu @yue___yu ·
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! arxiv.org/abs/2411.16646 #rlhf #LLM
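The idea in the tweet — treating reward modeling as a reasoning task by generating a CoT-style critique first, then scoring conditioned on it — can be sketched as a two-stage pipeline. The stand-in functions below are illustrative assumptions, not the paper's models or prompts:

```python
# Sketch of critique-conditioned reward modeling: instead of mapping
# (prompt, response) directly to a scalar, the model first self-generates
# a chain-of-thought critique and then scores conditioned on it.
# Both stages here are trivial stand-ins for real LLM calls.

def generate_critique(prompt: str, response: str) -> str:
    # Stage 1: self-generated critique (here, a canned template).
    return (f"The response to {prompt!r} is {len(response.split())} word(s) "
            "long and addresses the question directly.")

def score_with_critique(prompt: str, response: str, critique: str) -> float:
    # Stage 2: scalar reward conditioned on prompt, response, and critique.
    base = 0.5
    bonus = 0.1 if "directly" in critique else 0.0
    return base + bonus

def reward(prompt: str, response: str) -> float:
    critique = generate_critique(prompt, response)
    return score_with_critique(prompt, response, critique)

r = reward("What is 2+2?", "4")
```

The design point is that the final scalar can attend to an explicit analysis rather than scoring the raw pair, which is where the reported generalization and data-efficiency gains would come from.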
Yue Yu retweeted
Ran Xu @ritaranx ·
Excited to be at #EMNLP2024 and share our 3 papers on LLM for Health! Let’s chat if you are interested!

📅 Nov 13, 10:30-12:00, Session 6
- BMRetriever: LLMs for text retrieval
- MedAdapter: LLMs for medical reasoning

📅 Nov 14, 14:00-15:30, Session 12
- EHRAgent: LLM Agents for EHR QA
Yue Yu retweeted
Duen Horng "Polo" Chau @PoloChau ·
🎉The coolest #CSE school in the world is hiring multiple faculty members! Application link below👇
Yue Yu @yue___yu ·
@katieelink @philipcortes Hi Katie, for biomedical retrievers, feel free to check out our recent work arxiv.org/abs/2404.18443 (available at 410M/1B/2B/7B parameter sizes). We are currently working on some biomed RAG tasks, so stay tuned!
Ran Xu @ritaranx

🧬 Still using BM25 for biomedical retrieval? Try out BMRetriever! 🔍 Our new series of retrievers enhance biomedical search with various scales (410M-7B). 🔓 Model/Data: huggingface.co/BMRetriever 🌠 Github: github.com/ritaranx/BMRet… #BiomedicalResearch #LLM #Retrieval #OpenScience

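What a BMRetriever-style dense retriever does at query time — embed the query and documents, then rank by similarity — follows the standard dense-retrieval recipe. Real use would load one of the checkpoints from huggingface.co/BMRetriever; the toy bag-of-words embedding below is an assumption made so the example stays self-contained and runnable:

```python
# Sketch of dense-retrieval scoring: embed query and documents, rank by
# cosine similarity. A bag-of-words Counter stands in for a learned
# embedding model so the example needs no model download.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in embedder: term-frequency vector over whitespace tokens.
    return Counter(text.lower().split())

def cosine(q: Counter, d: Counter) -> float:
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

docs = ["aspirin reduces fever and inflammation",
        "the mitochondria is the powerhouse of the cell"]
query_vec = embed("drug for fever")
ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
```

Swapping `embed` for a real encoder (and the cosine loop for a batched matrix product over precomputed document embeddings) recovers the usual production setup.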
Katie Link @katieelink ·
who's training/trained a good "small" biomedical LLM (<7B params)? i.e. a phi or gemma sized model
Yue Yu @yue___yu ·
@BowenJin13 Great work! I feel it is more like a graph API/tool-usage benchmark than pure RAG on graphs?
Bowen Jin @BowenJin13 ·
🚀Graph RAG is hot! 🚀Our "Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs" has been accepted to ACL 2024. ⭐️GRBench: A new benchmark for graph RAG research. ⭐️Graph CoT: An iterative framework to let LLM explore on graph environments. #graph #LLM