We address the problem of safe reinforcement learning from pixel observations. Inherent challenges in such settings are (1) a trade-off between reward optimization and adherence to safety constraints, (2) partial observability, and (3) high-dimensional observations. We formalize the problem in a constrained, partially observable Markov decision process framework, where an agent receives distinct reward and safety signals. To address the curse of dimensionality, we build a novel safety critic on the stochastic latent actor-critic (SLAC) approach: the latent variable model predicts rewards and safety violations, and the safety critic is used to train safe policies. On well-known benchmark environments, we demonstrate performance competitive with existing approaches in terms of computational requirements, final reward return, and satisfaction of the safety constraints.
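
As a concrete illustration, the sketch below shows one way such a safety critic could be trained: a Lagrangian-style actor-critic with separate reward and cost critics operating on low-dimensional latent states rather than raw pixels. This is a minimal, hypothetical sketch, not the authors' implementation; the latent encoder is assumed to exist elsewhere, and all names (LATENT_DIM, COST_LIMIT, reward_critic, cost_critic) and hyperparameter values are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 32, 4   # illustrative sizes, not from the paper
COST_LIMIT = 25.0                # assumed safety budget d: E[sum of costs] <= d
GAMMA = 0.99

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

# Separate critics for the two distinct signals: reward and safety cost.
# Both consume latent states z (from the SLAC model), not raw pixels.
reward_critic = mlp(LATENT_DIM + ACTION_DIM, 1)
cost_critic = mlp(LATENT_DIM + ACTION_DIM, 1)
log_lam = torch.zeros(1, requires_grad=True)  # Lagrange multiplier, log-space

critic_opt = torch.optim.Adam(
    list(reward_critic.parameters()) + list(cost_critic.parameters()), lr=3e-4)
lam_opt = torch.optim.Adam([log_lam], lr=3e-4)

def critic_update(z, a, r, c, z_next, a_next, done):
    """TD(0) step for both critics on a batch of latent transitions."""
    with torch.no_grad():
        q_r_next = reward_critic(torch.cat([z_next, a_next], -1)).squeeze(-1)
        q_c_next = cost_critic(torch.cat([z_next, a_next], -1)).squeeze(-1)
        target_r = r + GAMMA * (1 - done) * q_r_next
        target_c = c + GAMMA * (1 - done) * q_c_next
    q_r = reward_critic(torch.cat([z, a], -1)).squeeze(-1)
    q_c = cost_critic(torch.cat([z, a], -1)).squeeze(-1)
    loss = ((q_r - target_r) ** 2 + (q_c - target_c) ** 2).mean()
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()

def actor_loss(z, a_pi, log_pi, alpha=0.1):
    """SAC-style policy loss: maximize reward Q minus lambda-weighted cost Q.
    The actor's optimizer should update only policy parameters."""
    lam = log_lam.exp().detach()
    q_r = reward_critic(torch.cat([z, a_pi], -1)).squeeze(-1)
    q_c = cost_critic(torch.cat([z, a_pi], -1)).squeeze(-1)
    return (alpha * log_pi - q_r + lam * q_c).mean()

def multiplier_update(episode_cost):
    """Raise lambda when episode cost exceeds the budget, lower it otherwise."""
    lam_loss = -(log_lam.exp() * (episode_cost - COST_LIMIT)).mean()
    lam_opt.zero_grad(); lam_loss.backward(); lam_opt.step()

Keeping the critics on latent states z instead of raw frames is what makes the constrained update tractable in this sketch: the critics see a compact state summary, which mirrors the abstract's motivation for using the SLAC latent variable model against the curse of dimensionality.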

Citation

  Hogewind, Y., Simão, T. D., Kachman, T., & Jansen, N. (2023). Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation. ICLR.

@inproceedings{Hogewind2023safe,
  title = {{Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation}},
  author = {Hogewind, Yannick and Sim{\~a}o, Thiago D. and Kachman, Tal and Jansen, Nils},
  year = {2023},
  booktitle = {ICLR}
}