Lena Libon retweeted

🚀 Two new papers from our team are now available on arXiv, both tackling core bottlenecks in RL post-training:
1. Annotating human preference datasets without spending a fortune
2. Quantifying uncertainty for reward models
🔗lasgroup.github.io/rlhf

