Pinned Tweet

Here's a lecture I gave on RLHF and the new DPO paper, which proposes a wonderful alternative to OpenAI's PPO algorithm. If you are interested in improving the quality of your LLM completions using RLHF, please watch this video.
youtu.be/Ju-pFJNfOfY
