
@josephdviviano Yes, sure, for the next release we'll compare against this implementation. Thanks a lot for the ref!
Daniil Tiapkin

@dtiapkin
Research Scientist @ Google DeepMind | PhD in RL 🇫🇷

1/ If you're familiar with RLHF, you've likely heard of reward hacking, where over-optimizing an imperfect reward model leads to unintended behaviors. But what about teacher hacking in knowledge distillation: can the teacher be hacked, like rewards in RLHF?
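(A minimal toy sketch, not from the thread, of the distillation setup the tweet alludes to: the student minimizes a divergence to an imperfect teacher, so driving the distillation loss to zero only matches the teacher, not the underlying task. All distributions below are hypothetical.)

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical next-token distributions over a 3-token vocabulary.
true_dist    = [0.70, 0.20, 0.10]  # ground truth (unknown in practice)
teacher_dist = [0.60, 0.30, 0.10]  # imperfect proxy for the truth
student_dist = [0.60, 0.30, 0.10]  # student that perfectly imitates the teacher

# Distillation loss is zero: the teacher is fully "optimized against"...
print(kl(student_dist, teacher_dist))  # 0.0
# ...yet the student still deviates from the truth, the analogue of
# over-optimizing a reward model in RLHF.
print(kl(true_dist, student_dist) > 0.0)  # True
```

The gap between "zero loss against the teacher" and "nonzero error against the truth" is the room in which teacher hacking can occur.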
