An important goal in reinforcement learning is to create agents that can quickly adapt to new goals while avoiding situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise into the action space. While this setup performs well in many domains, it carries the inherent risk that the noisy actions lead agents to unsafe environment states. In this paper, we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to perform lifetime learning while avoiding hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with hazard zones, in which the agent has to learn to approach new targets during deployment. While a standard meta-trained network performs poorly in these tasks, MLIN allows agents to learn to navigate to new targets while minimizing collisions with hazard zones. These results suggest that meta-learning augmented with an instinctual network is a promising approach for safe AI.
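To make the interaction between the two networks concrete, the following is a minimal sketch of how a fixed instinct network might modulate the noisy output of a plastic policy. It is an illustration under stated assumptions, not the paper's implementation: the single-layer networks, the element-wise sigmoid gate, and all names (`plastic_action`, `instinct_gate`, `modulated_action`) are hypothetical, since the abstract says only that the instinct network "can modulate" the plastic network's output.

```python
import math
import random


def linear(weights, bias, x):
    """Single linear layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def plastic_action(params, obs, noise_std, rng):
    """Plastic policy (adapted during the agent's lifetime).

    Returns the mean action plus Gaussian exploration noise.
    """
    mean = [math.tanh(v) for v in linear(params["W"], params["b"], obs)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


def instinct_gate(params, obs):
    """Instinct network (fixed during the agent's lifetime).

    Assumption: it outputs one gate value in (0, 1) per action dimension.
    """
    return [sigmoid(v) for v in linear(params["W"], params["b"], obs)]


def modulated_action(plastic, instinct, obs, noise_std, rng):
    """Scale the noisy plastic action element-wise by the instinct gate.

    Assumption: modulation is a simple multiplicative gate, so a gate
    near 0 suppresses exploration noise and a gate near 1 passes it through.
    """
    noisy = plastic_action(plastic, obs, noise_std, rng)
    gate = instinct_gate(instinct, obs)
    return [g * a for g, a in zip(gate, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2D observation and 2D action, with hand-picked weights.
    plastic = {"W": [[0.5, -0.2], [0.1, 0.3]], "b": [0.0, 0.0]}
    instinct = {"W": [[-1.0, 0.0], [0.0, -1.0]], "b": [2.0, 2.0]}
    print(modulated_action(plastic, instinct, [1.0, 2.0], 0.5, rng))
```

In this sketch only the parameters in `plastic` would be updated by reinforcement learning during deployment, while `instinct` stays frozen, mirroring the division of roles described above.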
|Title||GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion|
|Publisher||Association for Computing Machinery|
|Status||Published - Jul 2020|
|Event||GECCO 2020: The Genetic and Evolutionary Computation Conference - online, Cancun, Mexico|
Duration: 8 Jul 2020 → 12 Jul 2020
|Period||08/07/2020 → 12/07/2020|