Noah Lee

41 posts

@nlee288

MS Student @kaist_ai · Interested in LLMs and Human Alignment

Seoul · Joined March 2017
416 Following · 160 Followers
Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
⁉️Why do reward models suffer from over-optimization in RLHF? We revisit how representations are learned during reward modeling, revealing “hidden state dispersion” as the key, with a simple fix! 🧵 Meet us at @icmlconf! 📅July 16th (Wed) 11AM–1:30PM 📍East Hall A-B E-2608
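(The thread does not define the term, but as a purely speculative illustration, "hidden state dispersion" might be measured as how spread out a reward model's final hidden states are across a batch; the sketch below is not the paper's definition.)

```python
# Speculative sketch: one way to quantify "hidden state dispersion"
# (NOT the paper's definition) as the mean squared distance of a reward
# model's final-token hidden states from their batch centroid.
import torch

def hidden_state_dispersion(hidden: torch.Tensor) -> torch.Tensor:
    """hidden: (batch, dim) final-token representations from the RM backbone."""
    centered = hidden - hidden.mean(dim=0, keepdim=True)
    return centered.pow(2).sum(dim=-1).mean()
```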
Noah Lee retweeted
Seungone Kim@seungonekim·
🏆Glad to share that our BiGGen Bench paper has received the best paper award at @naaclmeeting! x.com/naaclmeeting/s…
📅 Ballroom A, Session I: Thursday May 1st, 16:00-17:30 (MDT)
📅 Session M (Plenary Session): Friday May 2nd, 15:30-16:30 (MDT)
📅 Virtual Conference: Tuesday May 6th, 20:30-21:00 (MDT)
I'd like to thank our coauthors @scott_sjy Ji Yong Cho @ShayneRedford @chaechaek1214 @dongkeun_yoon @gson_AI Yejin Cho @shafayat_sheikh @jinheonbaek @suehpark @ronalhwang @Jinkyung_Jo Hyowon Cho @haebinshin_ @sylee_ai @hanseok_oh @nlee288 @itsnamgyu @joocjun @miyoung_ko @yoonjoo_le2 @hyungjoochae @jay_shin @jang_yoel @SeonghyeonYe @billyuchenlin @wellecks @gneubig Moontae Lee @Kyungjae__Lee @seo_minjoon! It wouldn't have been possible without everyone's feedback and hard work 😀
Seungone Kim@seungonekim

🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA? Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.

Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
🌏"When" and "Why" can we use English reward models (RMs) in other languages? Late news, but I'm happy to share that a paper co-led with @nlee288, studying the conditions for cross-lingual transfer in RMs, has been accepted to #NAACL2025!🎉 Main insights in the thread🧵
Noah Lee@nlee288·
Please visit Riverfront Hall at 10:30 AM tomorrow for our poster presentation of ORPO!
Jiwoo Hong@jiwoohong98

At the upcoming @emnlpmeeting, I will be presenting the ORPO paper with @nlee288 in Miami!
🗓️ Nov 14th, 10:30 - 12:00
📍 Session F, Riverfront Hall
🔥 Excited to meet and chat about RLHF & alignment in LLMs, reward modeling, and diverse applications of alignment methods! 🧵

Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
@emnlpmeeting @nlee288 Lastly, my recent work with @nlee288 is on generalizing RLHF to different languages and tasks! 📎Paper: arxiv.org/abs/2410.18027 I would also love to chat about multilingual alignment and reward modeling mechanisms for generalizing RLHF🙌
Noah Lee retweeted
Nathan Lambert@natolambert·
As more and more attention shifts back to on-policy RL for LLM post-training, thx o1, (away from just using DPO-like methods for alignment) it's been clear we need a better reward model ecosystem. The good news is, we're starting to get a lot of evals. True progress only comes with good evals. The most recent two papers that caught my eye are multilingual RM work (more on this topic soon from me 🤫).
1. MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models; Son et al.
2. Cross-lingual Transfer of Reward Models in Multilingual Alignment; Hong et al.
I've been arguing this is needed since April 2023 (blog links to back this up below), and was why I built RewardBench, so in some ways "mission accomplished," but our models really are so far behind closed labs. We need to be exploring how to use models like Llama 405B as a reward model. Will open up a lot of doors.
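(A hedged sketch of the "big LLM as a reward model" idea, not RewardBench code: use a strong instruction-tuned model as a pairwise judge. The model name and prompt template below are placeholders.)

```python
# Hedged sketch of LLM-as-a-judge as a reward signal; model name and prompt
# template are placeholders, not any benchmark's official setup.
from transformers import pipeline

judge = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

def pairwise_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Return 'A' or 'B' for the answer the judge model prefers."""
    query = (
        "Given the user prompt and two answers, reply with exactly 'A' or 'B' "
        "for the better answer.\n\n"
        f"Prompt: {prompt}\nAnswer A: {answer_a}\nAnswer B: {answer_b}\nBetter:"
    )
    out = judge(query, max_new_tokens=2, do_sample=False)[0]["generated_text"]
    completion = out[len(query):].strip()  # pipeline returns prompt + completion
    return "A" if completion.startswith("A") else "B"
```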
Noah Lee retweeted
Jiwoo Hong@jiwoohong98·
🎉Thrilled to share that two papers, including ORPO, have been accepted to the #EMNLP2024 Main Track! Can't wait to attend my first NLP conference😀
1⃣ ORPO: arxiv.org/abs/2403.07691
2⃣ Stable Language Model Pre-training by Reducing Embedding Variability: arxiv.org/abs/2409.07787
Jiwoo Hong@jiwoohong98

Align LLMs with the preference dataset ONLY with 💡ORPO💡 We introduce ORPO, alignment without reference model & SFT! With awesome dataset from @argilla_io + Mistral(7B) + ORPO, we present 🌟Mistral-ORPO-β🌟 🧵 👉 AlpacaEval 2.0: 12.20% 👉 IFEval: 66.19% 👉 MT-Bench: 7.32
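(A minimal PyTorch sketch of the ORPO objective as described in the paper, arxiv.org/abs/2403.07691, not the official code, assuming the mean token log-probs are already computed: an SFT term on the chosen response plus an odds-ratio penalty, with no reference model.)

```python
# Minimal sketch of the ORPO loss (per the paper; not the official code).
# logp_*: length-normalized mean token log-probs of each response, shape (B,).
# nll_chosen: standard cross-entropy (SFT) loss on the chosen response.
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # log odds(y|x) = log p - log(1 - p); log1p(-exp(logp)) is the stable form
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    return nll_chosen + lam * l_or  # single loss; no reference model, no SFT stage
```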

Noah Lee retweeted
Seungone Kim@seungonekim·
🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA? Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.
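(A hedged sketch of loading such a benchmark with the `datasets` library; the Hub id, split, and field names below are assumptions, not taken from the tweet.)

```python
# Hedged sketch: the Hub id and split are assumptions, not from the tweet.
from datasets import load_dataset

bench = load_dataset("prometheus-eval/BiGGen-Bench", split="test")
print(bench[0])  # inspect one instance (capability, input, rubric, ...)
```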
Noah Lee retweeted
AK@_akhaliq·
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model.
In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy.
Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences.
For evaluation, we introduce two new pairwise preference datasets, which comprise self-generated image pairs from SDXL, Pick-Style and Pick-Safety, simulating diverse scenarios of reference mismatch. Our experiments validate that MaPO can significantly improve alignment on Pick-Style and Pick-Safety and general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods. Our code, models, and datasets are publicly available via
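(A hedged reading of the objective described in this abstract, not the authors' code: for diffusion models, a per-sample "log-likelihood" can be proxied by the negative denoising loss; `beta` and the term weighting are assumptions.)

```python
# Hedged reading of the abstract (not the authors' code): maximize the margin
# between preferred and dispreferred likelihoods plus the preferred likelihood,
# with no reference model. Per-sample diffusion "log-likelihood" is proxied by
# the negative denoising loss; `beta` and the weighting are assumptions.
import torch
import torch.nn.functional as F

def mapo_loss(denoise_loss_w, denoise_loss_l, beta=0.1):
    """denoise_loss_*: per-sample noise-prediction MSE, shape (batch,)."""
    margin = (-denoise_loss_w) - (-denoise_loss_l)   # preferred minus dispreferred
    l_margin = -F.logsigmoid(beta * margin).mean()   # widen the likelihood margin
    return l_margin + denoise_loss_w.mean()          # also fit the preferred set
```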
Noah Lee retweeted
Sayak Paul@RisingSayak·
Introducing MaPO, a memory-efficient technique for aligning T2I diffusion models on preference data 🔥 We eliminate the need to have a reference model when performing alignment fine-tuning. Code, models, datasets, and paper are up at: mapo-t2i.github.io 1/7
Noah Lee retweeted
Argilla@argilla_io·
In Feb, our CEO @dvilasuero built a 7K version
💡 Idea
✅ Big labs use multiturn preference data
❌ The OSS AI community had single-turn prefs
Misunderstood by many, @jiwoohong98 @nlee288 showed it's great for ORPO, thanks! Now it's used by 100s of models twitter.com/jiwoohong98/st…
Jiwoo Hong@jiwoohong98

📢New model, Mistral-ORPO-Capybara-7k in the ORPO collection!🧵 With 💡ORPO💡 + 7k Capybara preference pairs by @argilla_io🔥 + Mistral (7B), you can get a human-aligned chat model within 2.5 hours of fine-tuning👀 👉AlpacaEval 2.0 (LC): 15.9% 👉MT-Bench: 7.44 👉IFEval: 61.27%

Noah Lee retweeted
Omar Sanseviero@osanseviero·
Welcome Zephyr 141B to Hugging Chat🔥 🎉A Mixtral-8x22B fine-tune ⚡️Super fast generation with TGI 🤗Fully open source (from the data to the UI) huggingface.co/chat/models/Hu…
Noah Lee@nlee288·
Great collab w/ @huggingface and @argilla_io !!! Also note ORPO has been officially integrated into the TRL library with the v0.8.2 release. github.com/huggingface/trl
Lewis Tunstall@_lewtun

The new Mixtral-8x22B base model is a total beast for fine-tuning and has produced some of the highest scores I've ever seen on IFEval and BBH 🤯 We teamed up with @argilla_io and @kaist_ai to cook up a brand new recipe for Zephyr models 🪁 huggingface.co/HuggingFaceH4/…
🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO
🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at @argilla_io: huggingface.co/datasets/argil…
As usual, we are open-sourcing the training code in the Alignment Handbook for the community to build on: github.com/huggingface/al…
This has been an epic speed run with @jiwoohong98 @nlee288 @alvarobartt - now I can finally sleep 😂
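(A minimal usage sketch of the TRL integration mentioned above, available since TRL v0.8.2. The model and dataset ids are illustrative, and published preference sets may need mapping into prompt/chosen/rejected text columns first.)

```python
# Usage sketch of the official TRL integration (ORPOTrainer, since v0.8.2).
# Model/dataset ids are illustrative; the trainer expects prompt/chosen/
# rejected columns, so some datasets need a mapping step first.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

config = ORPOConfig(output_dir="mistral-orpo", beta=0.1, max_length=1024)
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # `processing_class` in newer TRL releases
)
trainer.train()
```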

Noah Lee retweeted
Sayak Paul@RisingSayak·
Aligning a diffusion model on preference data WITHOUT a reference model could be nice, no? So, @krasul and I are ideating the use of ORPO to align SDXL 1.0 on PickAPic. Diffusion ORPO with LoRA 💫 Code and model ⬇️ huggingface.co/sayakpaul/diff…