Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) whose transition function is unknown. When an arbitrary behavior policy pi is already being executed and its experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy pi'. However, the policy computed by traditional RL algorithms may perform worse than pi. Our goal is to develop safe RL algorithms, where the agent has high confidence, given D, that the performance of pi' is better than the performance of pi. To develop sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
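The abstract does not specify the exact improvement test, but the "high confidence that pi' is better than pi given D" criterion can be illustrated with a common safe-policy-improvement pattern: estimate the new policy's return from the batch via importance sampling and accept it only if a confidence lower bound on that estimate exceeds the behavior policy's baseline. The sketch below is a hedged illustration of that pattern, not the paper's algorithm; the function names, the Hoeffding-style bound, and the `g_max` return clip are all assumptions for this example.

```python
import math

def is_estimates(trajectories, pi_new, pi_old, gamma=1.0):
    """Per-trajectory importance-sampled return estimates of pi_new,
    computed from trajectories collected under the behavior policy pi_old.
    Each trajectory is a list of (state, action, reward) tuples."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight by the probability ratio of the two policies.
            weight *= pi_new(s, a) / pi_old(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return estimates

def safe_to_deploy(trajectories, pi_new, pi_old, baseline,
                   delta=0.05, g_max=1.0):
    """Accept pi_new only if a (1 - delta)-confidence lower bound on its
    estimated return exceeds the behavior policy's baseline return.
    Uses a Hoeffding-style bound, assuming weighted returns lie in [0, g_max]."""
    est = is_estimates(trajectories, pi_new, pi_old)
    n = len(est)
    mean = sum(est) / n
    lower = mean - g_max * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return lower > baseline
```

With this check, a candidate policy is deployed only when the batch provides statistically strong evidence of improvement; otherwise the agent keeps executing pi, which is the safety guarantee such methods aim for.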

Citation

  Simão, T. D. (2019). Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments. Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI-19, 6460–6461.

@inproceedings{Simao2019dc,
  author = {Sim{\~a}o, Thiago D.},
  title = {{Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments}},
  booktitle = {Proceedings of the 28th International Joint Conference on Artificial Intelligence, {IJCAI-19}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages = {6460--6461},
  year = {2019}
}