Junwei Lu retweeted

We know optimism is provably efficient for online RL. What about offline RL? It turns out simply flipping the sign of the bonus is minimax optimal! Given a dataset, pessimism is the best effort we can make.
arxiv.org/abs/2012.15085
Just leave pessimism to 2020. Happy new year~!







