Andreas Plesner

@andreas_plesner

Research Intern at @joinhandshake and PhD student @ETH_en. Interested in how to build and design intelligent systems

San Francisco · Joined January 2014
49 Following · 14 Followers
Andreas Plesner @andreas_plesner
In 2025, RLVR was the big thing following the DeepSeek moment. Now, RL for LLMs is increasingly focused on semi-verifiable domains. After joining HART, I asked just how good a verifier has to be. The answer? Imperfection is not a problem! With @anishathalye and @guzmanhe
Anish Athalye @anishathalye

Does an imperfect verifier break reinforcement learning with verifiable rewards (RLVR)? Turns out it doesn’t! Why does this matter? As reinforcement learning moves into semi-verifiable domains, where perfect verifiers don’t exist, verifier errors become unavoidable. We added controlled and LLM-based noise to RLVR reward signals and found that up to 30% noise barely hurts training: performance stays within 4 percentage points of the clean baseline. This research has already shaped how we build reinforcement learning environments at @joinHandshake. For a major benchmark we are launching tomorrow, we hill-climbed the verifier to 88% accuracy (above the 85% human inter-rater agreement), knowing from this research that this is good enough. With @andreas_plesner @guzmanhe
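The post does not say how the controlled noise was injected. One simple model, assumed here purely for illustration, flips a binary 0/1 verifier reward with probability p. Under that assumption the expected reward is p + (1 - 2p)·r, so for p < 0.5 a correct answer still earns more reward on average than an incorrect one; the gap only shrinks by a factor of (1 - 2p). The function names below are hypothetical and not from the original work.

```python
import random


def noisy_reward(true_reward: int, flip_prob: float, rng: random.Random) -> int:
    """Flip a binary (0/1) verifier reward with probability flip_prob.

    Hypothetical symmetric-noise model for illustration only; the post
    does not specify the noise scheme actually used.
    """
    if rng.random() < flip_prob:
        return 1 - true_reward
    return true_reward


def expected_noisy_reward(true_reward: int, flip_prob: float) -> float:
    # E[noisy] = (1 - p) * r + p * (1 - r) = p + (1 - 2p) * r
    return flip_prob + (1 - 2 * flip_prob) * true_reward


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.3
    samples = [noisy_reward(1, p, rng) for _ in range(100_000)]
    # Empirical mean for a correct answer, roughly 1 - p
    print(sum(samples) / len(samples))
    # Gap between correct and incorrect answers: scaled by (1 - 2p), sign preserved
    print(expected_noisy_reward(1, p) - expected_noisy_reward(0, p))
```

Under the same symmetric-noise assumption, a verifier with 88% accuracy corresponds to p = 0.12, which keeps 76% of the clean reward gap (1 - 2 × 0.12 = 0.76). Real verifier errors are often asymmetric (for example, more false positives than false negatives), which this sketch does not model.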
