Keming (Luke) Lu

126 posts

Keming (Luke) Lu
@KemingLu612

Reasoning, Post-training, RLHF, Pre-Qwen #THU, #USC

Los Angeles, CA · Joined December 2023
143 Following · 262 Followers
Keming (Luke) Lu retweeted
Tianle Cai @tianle_cai
Life update: Following my recent graduation, I've joined the ByteDance Seed Edge team to pursue this research direction further. Although this post was written last year, my conviction in this approach has only strengthened (many ideas here echo compelling recent writings from the legendary Rich Sutton and Shunyu, such as the need for rewards to help models evolve beyond the classic finite-context learning paradigm).

In the near term, my focus will be on making individual agents evolvable, with the next phase involving connecting and scaling these evolvable agents. I'm incredibly excited about the potential achievements in this direction and welcome connections, discussions, or collaboration. I'll also be attending ICLR next week; please send a DM if you'd like to chat. Welcome to the second half of AGI 😉
Tianle Cai@tianle_cai

x.com/i/article/1848…

Keming (Luke) Lu @KemingLu612
I have to retweet this one… verl is evolving so fast. Just can’t wait to see what’s next
Haibin@eric_haibin_lin

Recent updates on @verl_project (RL lib for LLMs):

Engine:
- Megatron qwen & GRPO support, v0.11 upgrade
- vllm v0.7 integration with v1 mode
- experimental sglang integration

Algorithm & recipes:
- vision language reasoning with qwen2.5-vl
- PRIME, RLOO, remax, math-verify rewards, etc.

Docs:
- tutorial for distributed training setup and debugging
- programming model tutorial

Hardware:
- experimental AMD support

And many awesome community projects such as code-R1, Easy-R1, Search-R1, RAGEN, etc. Big thank you to the community! Working on multi-turn & environment/tool supports. Stay tuned...
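The GRPO support mentioned above is built on group-relative advantages: sample a group of responses per prompt and normalize each reward against the group's own mean and standard deviation. A minimal sketch of that idea in plain Python (the function name is illustrative, not verl's actual API):

```python
# Sketch of GRPO-style group-relative advantages. This is NOT the verl
# implementation; it only illustrates the normalization step.
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each sampled response relative to its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division stable when all rewards in a group are equal.
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses to one prompt: two correct (reward 1), two wrong (reward 0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # roughly [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean, no learned value model is needed; the correct responses get positive advantage and the incorrect ones negative, summing to zero within the group.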

Keming (Luke) Lu @KemingLu612
Context distillation + rejection sampling almost solved all of this week's challenges… Highly recommend!
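The rejection-sampling half of that recipe amounts to best-of-n filtering: draw several candidate responses per prompt and keep only the highest-reward one for fine-tuning. A hedged sketch, where `generate` and `reward` are stand-in stubs rather than a real model API:

```python
# Sketch of rejection sampling for building fine-tuning data.
# `generate` and `reward` are placeholder stubs: a real pipeline would
# sample from an LLM and score with a reward model or verifier.
import random

def generate(prompt, n=8):
    # Stub: pretend we sampled n completions from a model.
    return [f"{prompt} -> answer {i}" for i in range(n)]

def reward(completion):
    # Stub: pretend this is a reward-model or verifier score.
    return random.random()

def rejection_sample(prompts, n=8):
    """Keep the best-of-n completion per prompt for the SFT dataset."""
    dataset = []
    for p in prompts:
        candidates = generate(p, n)
        best = max(candidates, key=reward)
        dataset.append((p, best))
    return dataset

data = rejection_sample(["2 + 2 = ?"], n=4)
print(len(data))  # 1
```

The filtered (prompt, best response) pairs are then used for ordinary supervised fine-tuning, which is what makes the method so simple to deploy.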
Junyang Lin @JustinLin610
Can anyone share your best guess at how reinforcement fine-tuning is implemented?
Keming (Luke) Lu retweeted
Jason Wei @_jasonwei
Andrej's tweet is the right way to think about it right now, but I totally believe that in one or two years we will start relying on AI for very challenging decisions like diagnosing disease under limited information.

Key thing to note here is that big decisions can be viewed as a tree of individual reasoning steps, and RL on chain of thought seems like a feasible way for AI to do any single step pretty well, and probably recover if there is a mistake.

In addition, with better scaffolding like improvements in retrieval, browsing, and long context management, AI will be able to leverage its inherent advantages over humans like not getting tired or distracted, having nearly infinite memory, and not being clouded by emotions. So I think we will reach the "magical AI feeling" soon :)
Andrej Karpathy@karpathy

People have too inflated a sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet.

A few caveats apply, because e.g. in many domains (e.g. code, math, creative writing) the companies hire skilled data labelers (so think of it as asking them instead), and this is not 100% true when reinforcement learning is involved, though I have an earlier rant on how RLHF is just barely RL, and "actual RL" is still too early and/or constrained to domains that offer easy reward functions (math etc.).

But roughly speaking (and today), you're not asking some magical AI. You're asking a human data labeler, whose average essence was lossily distilled into statistical token tumblers that are LLMs. This can still be super useful of course.

Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR: you're not asking an AI, you're asking some mashup spirit of its average data labeler.

Keming (Luke) Lu retweeted
Junyang Lin @JustinLin610
What does it mean to think, to question, to understand?

Note: This is the pronunciation of QwQ: /kwju:/, similar to the word "quill".

Blog: qwenlm.github.io/blog/qwq-32b-p…
Model: huggingface.co/Qwen/QwQ-32B-P…
Demo: huggingface.co/spaces/Qwen/Qw…

Something that can reason, making math problem solving and coding much better. A lot of limitations, by the way. Forgive us for a lot of things undone. We will make them happen soon.
[image attached]
Xidong Feng @Xidong_Feng
Thrilled to share that I’ll be joining @GoogleDeepMind as a Research Scientist with the Discovery Team! It’s a dream come true—8 years ago, I watched AlphaGo live in high school, and now I am lucky enough to be part of this incredible journey. Can’t wait to discover what’s next!
Junyang Lin @JustinLin610
sleep no more🥝
Keming (Luke) Lu retweeted
Tianyu Liu @rogerliuty
New survey: Towards a unified view of preference learning for LLMs! 🧠

LLMs are powerful, but aligning them with human preferences is key. This survey breaks down existing alignment strategies into four components: Model, Data, Feedback, and Algorithm. 🔗 This unified view reveals connections between different methods and opens doors for synergistic solutions. 🚀 Explore the challenges and future directions of aligning LLMs with human preferences.

Paper: arxiv.org/abs/2409.02795
GitHub: github.com/kbsdjames/awes…
Notion Blog: aeolian-agenda-626.notion.site/Towards-a-unif…

#LLMs #AI #PreferenceLearning #Survey #MachineLearning
[four images attached]
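Much of the Feedback and Algorithm components the survey organizes rest on the Bradley-Terry preference model, in which the probability that one response beats another grows with its reward margin. A minimal sketch (the function name is illustrative, not from the survey's code):

```python
# Sketch of the Bradley-Terry preference probability used in many
# preference-learning algorithms (reward modeling, DPO-style losses).
import math

def preference_prob(reward_chosen, reward_rejected):
    """P(chosen preferred over rejected) = sigmoid of the reward margin."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

# Equal rewards -> a coin flip; a +2 margin -> chosen wins ~88% of the time.
print(round(preference_prob(2.0, 0.0), 3))  # 0.881
```

Training a reward model then amounts to maximizing the log of this probability over human-labeled (chosen, rejected) pairs.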
Keming (Luke) Lu retweeted
Qwen @Alibaba_Qwen
Today we are thrilled to announce the release of Qwen2-VL! Specifically, we open-source Qwen2-VL-2B and Qwen2-VL-7B under the Apache 2.0 license, and we provide the API of our strongest Qwen2-VL-72B! To learn more about the models, feel free to visit:

Blog: qwenlm.github.io/blog/qwen2-vl/
GitHub: github.com/QwenLM/Qwen2-VL
HF: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/organization/q…

Qwen2-VL is the latest version of our vision language models, built upon Qwen2. It consists of the following features:

- SoTA understanding of images of various resolutions & ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
[image attached]
Keming (Luke) Lu retweeted
Cameron R. Wolfe, Ph.D. @cwolferesearch
Model merging is a popular research topic with applications to LLM alignment and specialization. But did you know this technique has been studied since the 90s? Here's a brief timeline…

(Stage 0) Original work on model merging dates back to the 90s [1], where authors showed that taking an average of neural network parameters yields a model that performs similarly to averaging the output of multiple neural networks (i.e., an ensemble).

(Stage 1) Averaging along the training trajectory. Several works in the mid-to-late 2010s explore the idea of taking an average of model checkpoints throughout the training process. This can be done via an exponential moving average [2] or by just averaging specific checkpoints during training [3], and improves training stability / performance / generalization in certain cases.

(Stage 2) Linear mode connectivity [4, 5] is a research topic, coming from research on neural network pruning / sparsity, that is highly related to model merging. Linear mode connectivity shows us that multiple neural networks (finetuned from the same base model) have a linear path of non-increasing loss between them in the parameter space. Put simply, interpolating between two finetuned models yields another model that also performs well. [6] studies this topic in the context of LLMs.

(Stage 3) Weight averaging. Based on findings in linear mode connectivity research, we began to see several papers that directly average the parameters of several neural networks obtained in separate training runs (i.e., "model soups") [7, 8, 9]. This strategy, although simple, was found to have several benefits in terms of model performance and generalization capability, somewhat similarly to creating an ensemble of models from several separate finetuning runs.

(Stage 4) Better merging approaches. After initial explorations of model merging, we began to see several research papers that propose more specialized / effective merging techniques, such as:
- Using task vectors to merge models [10].
- The TIES [11] or DARE [12] strategies for reducing interference during merging.
These works show that model merging is highly effective for improving performance and combining the capabilities of multiple models. However, performance depends on the strategy we use for merging, which can be optimized to reduce conflicts / interference.

(Stage 5) LLM-specific research. More recently, we have seen widespread adoption of model merging within LLM research to combine LLM capabilities [13], improve the alignment process [14], or train better reward models [15]. One especially notable application of model merging in recent research is the use of WARP [14], a state-of-the-art model merging technique for improving LLM alignment, for aligning Gemma-2 [16].
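The weight averaging of Stage 3 and the task-vector merging of Stage 4 can be sketched in a few lines. This uses plain dicts of floats in place of real state dicts, and the names `soup`, `task_vector`, and `apply_task_vectors` are illustrative, not any library's API:

```python
# Toy sketch of model soups (uniform weight averaging) and task-vector
# merging, with one-parameter "models" standing in for real networks.

def soup(state_dicts):
    """Stage 3: uniform average of several models' parameters."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

def task_vector(finetuned, base):
    """Stage 4: a task vector is finetuned weights minus base weights."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, scale=1.0):
    """Add scaled task vectors to the base model to combine capabilities."""
    merged = dict(base)
    for v in vectors:
        for k in merged:
            merged[k] += scale * v[k]
    return merged

base = {"w": 1.0}
ft_math = {"w": 3.0}  # pretend this was finetuned on math
ft_code = {"w": 2.0}  # pretend this was finetuned on code

print(soup([ft_math, ft_code]))  # {'w': 2.5}
merged = apply_task_vectors(
    base,
    [task_vector(ft_math, base), task_vector(ft_code, base)],
    scale=0.5,
)
print(merged)  # {'w': 2.5}
```

The scale factor (and per-parameter interference handling, as in TIES/DARE) is what the later, more specialized methods tune; the uniform average is just the simplest point in that design space.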
[image attached]