Noah Lee

41 posts

@nlee288

MS Student @kaist_ai · Interested in LLMs and Human Alignment

Seoul · Joined March 2017
416 Following · 160 Followers
Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
⁉️Why do reward models suffer from over-optimization in RLHF? We revisit how representations are learned during reward modeling, revealing “hidden state dispersion” as the key, with a simple fix! 🧵 Meet us at @icmlconf! 📅July 16th (Wed) 11AM–1:30PM 📍East Hall A-B E-2608
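(The thread does not define the term, but as a purely speculative illustration, "hidden state dispersion" might be measured as how spread out a reward model's final hidden states are across a batch; the sketch below is not the paper's definition.)

```python
# Speculative sketch: one way to quantify "hidden state dispersion"
# (NOT the paper's definition) as the mean squared distance of a reward
# model's final-token hidden states from their batch centroid.
import torch

def hidden_state_dispersion(hidden: torch.Tensor) -> torch.Tensor:
    """hidden: (batch, dim) final-token representations from the RM backbone."""
    centered = hidden - hidden.mean(dim=0, keepdim=True)
    return centered.pow(2).sum(dim=-1).mean()
```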
Noah Lee retweeted
Seungone Kim@seungonekim·
🏆Glad to share that our BiGGen Bench paper has received the best paper award at @naaclmeeting! x.com/naaclmeeting/s…
📅 Ballroom A, Session I: Thursday May 1st, 16:00-17:30 (MDT)
📅 Session M (Plenary Session): Friday May 2nd, 15:30-16:30 (MDT)
📅 Virtual Conference: Tuesday May 6th, 20:30-21:00 (MDT)
I'd like to thank our coauthors @scott_sjy Ji Yong Cho @ShayneRedford @chaechaek1214 @dongkeun_yoon @gson_AI Yejin Cho @shafayat_sheikh @jinheonbaek @suehpark @ronalhwang @Jinkyung_Jo Hyowon Cho @haebinshin_ @sylee_ai @hanseok_oh @nlee288 @itsnamgyu @joocjun @miyoung_ko @yoonjoo_le2 @hyungjoochae @jay_shin @jang_yoel @SeonghyeonYe @billyuchenlin @wellecks @gneubig Moontae Lee @Kyungjae__Lee @seo_minjoon! It wouldn't have been possible without everyone's feedback and hard work 😀
Seungone Kim@seungonekim

🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA? Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.

Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
🌏"When" and "Why" can we use English reward models (RMs) in other languages? Late news, but I'm happy to share that a paper co-led with @nlee288, studying the conditions for cross-lingual transfer in RMs, has been accepted to #NAACL2025!🎉 Main insights in the thread🧵
Noah Lee@nlee288·
Please visit Riverfront Hall at 10:30 AM tomorrow for our poster presentation of ORPO!
Jiwoo Hong@jiwoohong98

At the upcoming @emnlpmeeting, I will be presenting the ORPO paper with @nlee288 in Miami!
🗓️ Nov 14th, 10:30 - 12:00
📍 Session F, Riverfront Hall
🔥 Excited to meet and chat about RLHF & alignment in LLMs, reward modeling, and diverse applications of alignment methods! 🧵

Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
@emnlpmeeting @nlee288 Lastly, my recent work with @nlee288 is on generalizing RLHF to different languages and tasks! 📎Paper: arxiv.org/abs/2410.18027 I would also love to chat about multilingual alignment and reward modeling mechanisms for generalizing RLHF🙌
Noah Lee retweeted
Nathan Lambert@natolambert·
As more and more attention shifts back to on-policy RL for LLM post-training, thx o1, (away from just using DPO-like methods for alignment) it's been clear we need a better reward model ecosystem. The good news is, we're starting to get a lot of evals. True progress only comes with good evals. The most recent two papers that caught my eye are multilingual RM work (more on this topic soon from me 🤫).
1. MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models; Son et al.
2. Cross-lingual Transfer of Reward Models in Multilingual Alignment; Hong et al.
I've been arguing this is needed since April 2023 (blog links to back this up below), and was why I built RewardBench, so in some ways "mission accomplished," but our models really are so far behind closed labs. We need to be exploring how to use models like Llama 405B as a reward model. Will open up a lot of doors.
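(A hedged sketch of the "big LLM as a reward model" idea, not RewardBench code: use a strong instruction-tuned model as a pairwise judge. The model name and prompt template below are placeholders.)

```python
# Hedged sketch of LLM-as-a-judge as a reward signal; model name and prompt
# template are placeholders, not any benchmark's official setup.
from transformers import pipeline

judge = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

def pairwise_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Return 'A' or 'B' for the answer the judge model prefers."""
    query = (
        "Given the user prompt and two answers, reply with exactly 'A' or 'B' "
        "for the better answer.\n\n"
        f"Prompt: {prompt}\nAnswer A: {answer_a}\nAnswer B: {answer_b}\nBetter:"
    )
    out = judge(query, max_new_tokens=2, do_sample=False)[0]["generated_text"]
    completion = out[len(query):].strip()  # pipeline returns prompt + completion
    return "A" if completion.startswith("A") else "B"
```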
Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
🎉Thrilled to share that two papers, including ORPO, have been accepted to the #EMNLP2024 Main Track! Can't wait to attend my first NLP conference😀
1⃣ ORPO: arxiv.org/abs/2403.07691
2⃣ Stable Language Model Pre-training by Reducing Embedding Variability: arxiv.org/abs/2409.07787
Jiwoo Hong@jiwoohong98

Align LLMs with the preference dataset ONLY with 💡ORPO💡 We introduce ORPO, alignment without reference model & SFT! With awesome dataset from @argilla_io + Mistral(7B) + ORPO, we present 🌟Mistral-ORPO-β🌟 🧵 👉 AlpacaEval 2.0: 12.20% 👉 IFEval: 66.19% 👉 MT-Bench: 7.32
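(A minimal PyTorch sketch of the ORPO objective as described in the paper, arxiv.org/abs/2403.07691, not the official code, assuming the mean token log-probs are already computed: an SFT term on the chosen response plus an odds-ratio penalty, with no reference model.)

```python
# Minimal sketch of the ORPO loss (per the paper; not the official code).
# logp_*: length-normalized mean token log-probs of each response, shape (B,).
# nll_chosen: standard cross-entropy (SFT) loss on the chosen response.
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # log odds(y|x) = log p - log(1 - p); log1p(-exp(logp)) is the stable form
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    return nll_chosen + lam * l_or  # single loss; no reference model, no SFT stage
```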

Noah Lee retweeted
Seungone Kim@seungonekim·
🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA? Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.
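(A hedged sketch of loading such a benchmark with the `datasets` library; the Hub id, split, and field names below are assumptions, not taken from the tweet.)

```python
# Hedged sketch: the Hub id and split are assumptions, not from the tweet.
from datasets import load_dataset

bench = load_dataset("prometheus-eval/BiGGen-Bench", split="test")
print(bench[0])  # inspect one instance (capability, input, rubric, ...)
```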
Noah Lee retweeted
AK@_akhaliq·
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model.
In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy.
Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences.
For evaluation, we introduce two new pairwise preference datasets, which comprise self-generated image pairs from SDXL, Pick-Style and Pick-Safety, simulating diverse scenarios of reference mismatch. Our experiments validate that MaPO can significantly improve alignment on Pick-Style and Pick-Safety and general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods. Our code, models, and datasets are publicly available via
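(A hedged reading of the objective described in this abstract, not the authors' code: for diffusion models, a per-sample "log-likelihood" can be proxied by the negative denoising loss; `beta` and the term weighting are assumptions.)

```python
# Hedged reading of the abstract (not the authors' code): maximize the margin
# between preferred and dispreferred likelihoods plus the preferred likelihood,
# with no reference model. Per-sample diffusion "log-likelihood" is proxied by
# the negative denoising loss; `beta` and the weighting are assumptions.
import torch
import torch.nn.functional as F

def mapo_loss(denoise_loss_w, denoise_loss_l, beta=0.1):
    """denoise_loss_*: per-sample noise-prediction MSE, shape (batch,)."""
    margin = (-denoise_loss_w) - (-denoise_loss_l)   # preferred minus dispreferred
    l_margin = -F.logsigmoid(beta * margin).mean()   # widen the likelihood margin
    return l_margin + denoise_loss_w.mean()          # also fit the preferred set
```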
Noah Lee retweeted
Sayak Paul@RisingSayak·
Introducing MaPO, a memory-efficient technique for aligning T2I diffusion models on preference data 🔥 We eliminate the need to have a reference model when performing alignment fine-tuning. Code, models, datasets, and paper are up at: mapo-t2i.github.io 1/7
Noah Lee retweeted
Argilla@argilla_io·
In Feb, our CEO @dvilasuero built a 7K version
💡 Idea
✅ Big labs use multiturn preference data
❌ The OSS AI community had single-turn prefs
Misunderstood by many, @jiwoohong98 @nlee288 showed it's great for ORPO, thanks! Now it's used by 100s of models twitter.com/jiwoohong98/st…
Jiwoo Hong@jiwoohong98

📢New model, Mistral-ORPO-Capybara-7k in the ORPO collection!🧵 With 💡ORPO💡 + 7k Capybara preference pairs by @argilla_io🔥 + Mistral (7B), you can get a human-aligned chat model within 2.5 hours of fine-tuning👀 👉AlpacaEval 2.0 (LC): 15.9% 👉MT-Bench: 7.44 👉IFEval: 61.27%

Noah Lee retweeted
Omar Sanseviero@osanseviero·
Welcome Zephyr 141B to Hugging Chat🔥 🎉A Mixtral-8x22B fine-tune ⚡️Super fast generation with TGI 🤗Fully open source (from the data to the UI) huggingface.co/chat/models/Hu…
Noah Lee@nlee288·
Great collab w/ @huggingface and @argilla_io !!! Also note ORPO has been officially integrated into the TRL library with the v0.8.2 release. github.com/huggingface/trl
Lewis Tunstall@_lewtun

The new Mixtral-8x22B base model is a total beast for fine-tuning and has produced some of the highest scores I've ever seen on IFEval and BBH 🤯 We teamed up with @argilla_io and @kaist_ai to cook up a brand new recipe for Zephyr models 🪁 huggingface.co/HuggingFaceH4/…
🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO
🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at @argilla_io: huggingface.co/datasets/argil…
As usual, we are open-sourcing the training code in the Alignment Handbook for the community to build on: github.com/huggingface/al…
This has been an epic speed run with @jiwoohong98 @nlee288 @alvarobartt - now I can finally sleep 😂
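(A minimal usage sketch of the TRL integration mentioned above, available since TRL v0.8.2. The model and dataset ids are illustrative, and published preference sets may need mapping into prompt/chosen/rejected text columns first.)

```python
# Usage sketch of the official TRL integration (ORPOTrainer, since v0.8.2).
# Model/dataset ids are illustrative; the trainer expects prompt/chosen/
# rejected columns, so some datasets need a mapping step first.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

config = ORPOConfig(output_dir="mistral-orpo", beta=0.1, max_length=1024)
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # `processing_class` in newer TRL releases
)
trainer.train()
```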

Noah Lee retweeted
Sayak Paul@RisingSayak·
Aligning a diffusion model on preference data WITHOUT a reference model could be nice, no? So, @krasul and I are ideating the use of ORPO to align SDXL 1.0 on PickAPic. Diffusion ORPO with LoRA 💫 Code and model ⬇️ huggingface.co/sayakpaul/diff…