Keming (Luke) Lu

126 posts

Keming (Luke) Lu
@KemingLu612

Reasoning, Post-training, RLHF, Pre-Qwen #THU, #USC

Los Angeles, CA · Joined December 2023
143 Following · 262 Followers
Keming (Luke) Lu retweeted
Tianle Cai @tianle_cai
Life update: Following my recent graduation, I've joined the ByteDance Seed Edge team to pursue this research direction further. Although this post was written last year, my conviction in this approach has only strengthened (many ideas here echo compelling recent writings from the legendary Rich Sutton and Shunyu, such as the need for rewards to help models evolve beyond the classic finite-context learning paradigm).

In the near term, my focus will be on making individual agents evolvable, with the next phase involving connecting and scaling these evolvable agents. I'm incredibly excited about the potential achievements in this direction and welcome connections, discussions, or collaboration. I'll also be attending ICLR next week; please send a DM if you'd like to chat. Welcome to the second half of AGI 😉
Tianle Cai@tianle_cai

x.com/i/article/1848…

Keming (Luke) Lu @KemingLu612
I have to retweet this one… verl is evolving so fast. Just can’t wait to see what’s next
Haibin@eric_haibin_lin

Recent updates on @verl_project (RL lib for LLMs):

Engine:
- Megatron qwen & GRPO support, v0.11 upgrade
- vllm v0.7 integration with v1 mode
- experimental sglang integration

Algorithm & recipes:
- vision language reasoning with qwen2.5-vl
- PRIME, RLOO, remax, math-verify rewards, etc.

Docs:
- tutorial for distributed training setup and debugging
- programming model tutorial

Hardware:
- experimental AMD support

And many awesome community projects such as code-R1, Easy-R1, Search-R1, RAGEN, etc. Big thank you to the community! Working on multi-turn & environment/tool supports. Stay tuned...
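The GRPO support mentioned above is built on group-relative advantages: sample a group of responses per prompt and normalize each reward against the group's own mean and standard deviation. A minimal sketch of that idea in plain Python (the function name is illustrative, not verl's actual API):

```python
# Sketch of GRPO-style group-relative advantages. This is NOT the verl
# implementation; it only illustrates the normalization step.
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each sampled response relative to its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division stable when all rewards in a group are equal.
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses to one prompt: two correct (reward 1), two wrong (reward 0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # roughly [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean, no learned value model is needed; the correct responses get positive advantage and the incorrect ones negative, summing to zero within the group.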

Keming (Luke) Lu @KemingLu612
Context distillation + rejection sampling almost solved all of this week's challenges… Highly recommend!
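The rejection-sampling half of that recipe amounts to best-of-n filtering: draw several candidate responses per prompt and keep only the highest-reward one for fine-tuning. A hedged sketch, where `generate` and `reward` are stand-in stubs rather than a real model API:

```python
# Sketch of rejection sampling for building fine-tuning data.
# `generate` and `reward` are placeholder stubs: a real pipeline would
# sample from an LLM and score with a reward model or verifier.
import random

def generate(prompt, n=8):
    # Stub: pretend we sampled n completions from a model.
    return [f"{prompt} -> answer {i}" for i in range(n)]

def reward(completion):
    # Stub: pretend this is a reward-model or verifier score.
    return random.random()

def rejection_sample(prompts, n=8):
    """Keep the best-of-n completion per prompt for the SFT dataset."""
    dataset = []
    for p in prompts:
        candidates = generate(p, n)
        best = max(candidates, key=reward)
        dataset.append((p, best))
    return dataset

data = rejection_sample(["2 + 2 = ?"], n=4)
print(len(data))  # 1
```

The filtered (prompt, best response) pairs are then used for ordinary supervised fine-tuning, which is what makes the method so simple to deploy.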
Junyang Lin @JustinLin610
Can anyone share your best guess at how reinforcement fine-tuning is implemented?
Keming (Luke) Lu retweeted
Jason Wei @_jasonwei
Andrej's tweet is the right way to think about it right now, but I totally believe that in one or two years we will start relying on AI for very challenging decisions like diagnosing disease under limited information.

Key thing to note here is that big decisions can be viewed as a tree of individual reasoning steps, and RL on chain of thought seems like a feasible way for AI to do any single step pretty well, and probably recover if there is a mistake.

In addition, with better scaffolding like improvements in retrieval, browsing, and long context management, AI will be able to leverage its inherent advantages over humans like not getting tired or distracted, having nearly infinite memory, and not being clouded by emotions. So I think we will reach the "magical AI feeling" soon :)
Andrej Karpathy@karpathy

People have too inflated a sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet.

A few caveats apply, because e.g. in many domains (e.g. code, math, creative writing) the companies hire skilled data labelers (so think of it as asking them instead), and this is not 100% true when reinforcement learning is involved, though I have an earlier rant on how RLHF is just barely RL, and "actual RL" is still too early and/or constrained to domains that offer easy reward functions (math etc.).

But roughly speaking (and today), you're not asking some magical AI. You're asking a human data labeler, whose average essence was lossily distilled into statistical token tumblers that are LLMs. This can still be super useful of course.

Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR: you're not asking an AI, you're asking some mashup spirit of its average data labeler.

Keming (Luke) Lu retweeted
Junyang Lin @JustinLin610
What does it mean to think, to question, to understand?

Note: This is the pronunciation of QwQ: /kwju:/, similar to the word "quill".

Blog: qwenlm.github.io/blog/qwq-32b-p…
Model: huggingface.co/Qwen/QwQ-32B-P…
Demo: huggingface.co/spaces/Qwen/Qw…

Something that can reason, making math problem solving and coding much better. A lot of limitations, by the way. Forgive us for a lot of things undone. We will make them happen soon.
[image attached]
Xidong Feng @Xidong_Feng
Thrilled to share that I’ll be joining @GoogleDeepMind as a Research Scientist with the Discovery Team! It’s a dream come true—8 years ago, I watched AlphaGo live in high school, and now I am lucky enough to be part of this incredible journey. Can’t wait to discover what’s next!
Junyang Lin @JustinLin610
sleep no more🥝
Keming (Luke) Lu retweeted
Tianyu Liu @rogerliuty
New survey: Towards a unified view of preference learning for LLMs! 🧠

LLMs are powerful, but aligning them with human preferences is key. This survey breaks down existing alignment strategies into four components: Model, Data, Feedback, and Algorithm. 🔗 This unified view reveals connections between different methods and opens doors for synergistic solutions. 🚀 Explore the challenges and future directions of aligning LLMs with human preferences.

Paper: arxiv.org/abs/2409.02795
GitHub: github.com/kbsdjames/awes…
Notion Blog: aeolian-agenda-626.notion.site/Towards-a-unif…

#LLMs #AI #PreferenceLearning #Survey #MachineLearning
[four images attached]
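Much of the Feedback and Algorithm components the survey organizes rest on the Bradley-Terry preference model, in which the probability that one response beats another grows with its reward margin. A minimal sketch (the function name is illustrative, not from the survey's code):

```python
# Sketch of the Bradley-Terry preference probability used in many
# preference-learning algorithms (reward modeling, DPO-style losses).
import math

def preference_prob(reward_chosen, reward_rejected):
    """P(chosen preferred over rejected) = sigmoid of the reward margin."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

# Equal rewards -> a coin flip; a +2 margin -> chosen wins ~88% of the time.
print(round(preference_prob(2.0, 0.0), 3))  # 0.881
```

Training a reward model then amounts to maximizing the log of this probability over human-labeled (chosen, rejected) pairs.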
Keming (Luke) Lu retweeted
Qwen @Alibaba_Qwen
Today we are thrilled to announce the release of Qwen2-VL! Specifically, we open-source Qwen2-VL-2B and Qwen2-VL-7B under the Apache 2.0 license, and we provide the API of our strongest Qwen2-VL-72B! To learn more about the models, feel free to visit:

Blog: qwenlm.github.io/blog/qwen2-vl/
GitHub: github.com/QwenLM/Qwen2-VL
HF: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/organization/q…

Qwen2-VL is the latest version of our vision language models, built upon Qwen2. It consists of the following features:

- SoTA understanding of images of various resolutions & ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
[image attached]
Keming (Luke) Lu retweeted
Cameron R. Wolfe, Ph.D. @cwolferesearch
Model merging is a popular research topic with applications to LLM alignment and specialization. But did you know this technique has been studied since the 90s? Here's a brief timeline…

(Stage 0) Original work on model merging dates back to the 90s [1], where authors showed that taking an average of neural network parameters yields a model that performs similarly to averaging the output of multiple neural networks (i.e., an ensemble).

(Stage 1) Averaging along the training trajectory. Several works in the mid-to-late 2010s explore the idea of taking an average of model checkpoints throughout the training process. This can be done via an exponential moving average [2] or by just averaging specific checkpoints during training [3], and improves training stability / performance / generalization in certain cases.

(Stage 2) Linear mode connectivity [4, 5] is a research topic, coming from research on neural network pruning / sparsity, that is highly related to model merging. Linear mode connectivity shows us that multiple neural networks (finetuned from the same base model) have a linear path of non-increasing loss between them in the parameter space. Put simply, interpolating between two finetuned models yields another model that also performs well. [6] studies this topic in the context of LLMs.

(Stage 3) Weight averaging. Based on findings in linear mode connectivity research, we began to see several papers that directly average the parameters of several neural networks obtained in separate training runs (i.e., "model soups") [7, 8, 9]. This strategy, although simple, was found to have several benefits in terms of model performance and generalization capability, somewhat similarly to creating an ensemble of models from several separate finetuning runs.

(Stage 4) Better merging approaches. After initial explorations of model merging, we began to see several research papers that propose more specialized / effective merging techniques, such as:
- Using task vectors to merge models [10].
- The TIES [11] or DARE [12] strategies for reducing interference during merging.
These works show that model merging is highly effective for improving performance and combining the capabilities of multiple models. However, performance depends on the strategy we use for merging, which can be optimized to reduce conflicts / interference.

(Stage 5) LLM-specific research. More recently, we have seen widespread adoption of model merging within LLM research to combine LLM capabilities [13], improve the alignment process [14], or train better reward models [15]. One especially notable application of model merging in recent research is the use of WARP [14], a state-of-the-art model merging technique for improving LLM alignment, for aligning Gemma-2 [16].
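The weight averaging of Stage 3 and the task-vector merging of Stage 4 can be sketched in a few lines. This uses plain dicts of floats in place of real state dicts, and the names `soup`, `task_vector`, and `apply_task_vectors` are illustrative, not any library's API:

```python
# Toy sketch of model soups (uniform weight averaging) and task-vector
# merging, with one-parameter "models" standing in for real networks.

def soup(state_dicts):
    """Stage 3: uniform average of several models' parameters."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

def task_vector(finetuned, base):
    """Stage 4: a task vector is finetuned weights minus base weights."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, scale=1.0):
    """Add scaled task vectors to the base model to combine capabilities."""
    merged = dict(base)
    for v in vectors:
        for k in merged:
            merged[k] += scale * v[k]
    return merged

base = {"w": 1.0}
ft_math = {"w": 3.0}  # pretend this was finetuned on math
ft_code = {"w": 2.0}  # pretend this was finetuned on code

print(soup([ft_math, ft_code]))  # {'w': 2.5}
merged = apply_task_vectors(
    base,
    [task_vector(ft_math, base), task_vector(ft_code, base)],
    scale=0.5,
)
print(merged)  # {'w': 2.5}
```

The scale factor (and per-parameter interference handling, as in TIES/DARE) is what the later, more specialized methods tune; the uniform average is just the simplest point in that design space.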
[image attached]