Junwei Lu retweeted

We know optimism is provably efficient for online RL. What about offline RL? It turns out simply flipping the sign of the bonus is minimax optimal! Given a dataset, pessimism is the best effort we can make.
arxiv.org/abs/2012.15085
Just leave pessimism to 2020. Happy new year~!
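
To make "flipping the sign of the bonus" concrete, here is a toy sketch in a multi-armed bandit setting, not the algorithm from the linked paper. The function name `choose_action`, the bonus form `scale / sqrt(n)`, and the dataset are all illustrative assumptions. The same uncertainty bonus is added for optimism and subtracted for pessimism.

```python
import math

def choose_action(dataset, num_actions, sign, scale=1.0):
    """Pick the action maximizing mean + sign * bonus (hypothetical helper).

    sign=+1 gives an optimistic (upper-confidence) rule,
    sign=-1 gives a pessimistic (lower-confidence) rule.
    """
    counts = [0] * num_actions
    totals = [0.0] * num_actions
    for action, reward in dataset:
        counts[action] += 1
        totals[action] += reward

    scores = []
    for a in range(num_actions):
        n = counts[a]
        mean = totals[a] / n if n > 0 else 0.0
        # Assumed bonus shape: shrinks as the action is observed more often.
        bonus = scale / math.sqrt(max(n, 1))
        scores.append(mean + sign * bonus)
    return max(range(num_actions), key=lambda a: scores[a])

# Action 0 is well covered by the data; action 1 was seen once with a lucky reward.
dataset = [(0, 0.6)] * 100 + [(1, 1.0)]
optimistic = choose_action(dataset, 2, sign=+1)
pessimistic = choose_action(dataset, 2, sign=-1)
```

With a fixed dataset there is no chance to collect more samples of the rarely seen action, so the pessimistic rule prefers the well-covered action, while the optimistic rule chases the single lucky observation.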







