Safety is a critical concern when applying reinforcement learning (RL) to real-world control tasks. However, existing safe RL methods either only constrain expected safety violations and thus fail to maintain safety guarantees, or rely on safety certificate tools borrowed from safe control theory, which require analytic system models. This paper proposes a model-free safe RL algorithm with a neural barrier certificate under the stepwise state-constraint setting. The barrier certificate is learned in a model-free manner by minimizing violations of appropriate barrier properties on transition data collected by the policy. We extend the single-step invariant property of the barrier certificate to a multi-step version and construct the corresponding multi-step invariant loss. This loss balances the bias and variance of the barrier certificate and enhances both the safety and the performance of the policy. We optimize the policy in a model-free manner by introducing an importance sampling weight into the constraint derived from the multi-step invariant property. We test our algorithm on multiple problems, including classic control tasks, robot collision avoidance, and autonomous driving. Results show that our algorithm achieves near-zero constraint violations and high performance compared with the baselines. Moreover, the learned barrier certificates successfully identify the feasible regions on multiple tasks.
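As a rough illustration of how such a certificate might be trained, the sketch below shows a hinge-style loss over sampled transitions. It is not the paper's implementation: the sign convention (B > 0 on unsafe states, B ≤ 0 on known-safe states), the k-step invariance form B(s_{t+k}) ≤ (1 − λ)^k B(s_t), and all names (`BarrierNet`, `barrier_loss`, `lam`, `margin`, `k`, `is_weight`) are assumptions made for this example.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a multi-step invariant loss for a neural barrier
# certificate B_phi(s). Assumed convention: B(s) > 0 marks unsafe states,
# B(s) <= 0 marks certified-safe states, and k-step invariance requires
# B(s_{t+k}) <= (1 - lam)^k * B(s_t). All names below are illustrative.

class BarrierNet(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)


def barrier_loss(barrier_net, s_unsafe, s_safe, s_t, s_tk, is_weight,
                 k=5, lam=0.1, margin=0.05):
    """Hinge penalties for the three assumed barrier properties.

    s_t, s_tk : states k steps apart along trajectories collected by the policy
    is_weight : importance-sampling weights (detached, clipped upstream) that
                correct for the behavior policy generating the k-step segments
    """
    b_unsafe = barrier_net(s_unsafe)
    b_safe = barrier_net(s_safe)
    b_t, b_tk = barrier_net(s_t), barrier_net(s_tk)

    loss_unsafe = torch.relu(margin - b_unsafe).mean()   # enforce B > 0 on unsafe states
    loss_safe = torch.relu(margin + b_safe).mean()       # enforce B <= 0 on known-safe states
    decay = (1.0 - lam) ** k
    # weighted k-step invariance violation on policy-collected transitions
    loss_inv = (is_weight * torch.relu(b_tk - decay * b_t + margin)).mean()
    return loss_unsafe + loss_safe + loss_inv
```

In this sketch, the policy update would then treat the same weighted k-step invariance term as a constraint rather than a loss on the certificate; the details of that step follow the paper, not this example.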
Learned barrier certificates on classic control tasks:
Training curves on Safety Gym:
Learned barrier certificates on Safety Gym:
Examples of MetaDrive scenarios:
Training curves on MetaDrive: