
I am super excited to see our work addressing the core bottlenecks of RLHF finally out in the world! 🚀
Check out all the details in the thread below:
Barna Pásztor (@pasztorb)
🚀 Two new papers from our team are now available on arXiv, both tackling core bottlenecks in RL post-training:
1. Annotating human preference datasets without spending a fortune
2. Quantifying uncertainty for reward models
🔗 lasgroup.github.io/rlhf