Safer reinforcement learning through evolved instincts

Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › peer-reviewed

Abstract

An important goal in reinforcement learning is to create agents that can quickly adapt to new goals but at the same time avoid situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions lead agents to unsafe environment states. In this paper, we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to perform lifetime learning while avoiding hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with hazard zones, in which the agent has to learn to approach new targets during deployment. While a standard meta-trained network performs poorly in these tasks, MLIN allows agents to learn to navigate to new targets while minimizing collisions with hazard zones. These results suggest that meta-learning augmented with an instinctual network is a promising approach for safe AI.
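The interaction between the two networks can be illustrated with a minimal sketch. The abstract does not specify how the instinctual network modulates the noisy output, so the gating rule below (a scalar gate that blends the noisy policy action with a reflex action) is an assumption made for illustration only. The hand-written functions `plastic_policy` and `instinct`, the circular hazard zone, and all constants are hypothetical stand-ins for the trained and evolved networks, not the authors' implementation.

```python
import math
import random

# Hypothetical hazard zone: a circle around HAZARD_CENTER (illustrative values).
HAZARD_CENTER = (0.0, 0.0)
HAZARD_RADIUS = 1.0


def plastic_policy(position, target):
    """Stand-in for the plastic network: a step of at most unit length toward the target."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    scale = 1.0 / max(math.hypot(dx, dy), 1.0)
    return (dx * scale, dy * scale)


def add_exploration_noise(action, rng, sigma=0.3):
    """Gaussian noise injected in the action space, as in standard deep RL exploration."""
    return (action[0] + rng.gauss(0.0, sigma), action[1] + rng.gauss(0.0, sigma))


def instinct(position):
    """Stand-in for the evolved instinct network, fixed during the agent's lifetime.

    Returns (gate, reflex). ASSUMPTION: gate = 1 leaves the policy action
    untouched, gate = 0 replaces it entirely with the reflex action.
    """
    dx = position[0] - HAZARD_CENTER[0]
    dy = position[1] - HAZARD_CENTER[1]
    dist = math.hypot(dx, dy)
    # Gate falls linearly from 1 (two radii away or more) to 0 (at the hazard boundary).
    gate = min(1.0, max(0.0, (dist - HAZARD_RADIUS) / HAZARD_RADIUS))
    # Reflex action: unit vector pointing away from the hazard centre.
    reflex = (1.0, 0.0) if dist == 0.0 else (dx / dist, dy / dist)
    return gate, reflex


def modulated_action(position, target, rng):
    """Blend the noisy policy action with the reflex action (assumed modulation rule)."""
    noisy = add_exploration_noise(plastic_policy(position, target), rng)
    gate, reflex = instinct(position)
    return (
        gate * noisy[0] + (1.0 - gate) * reflex[0],
        gate * noisy[1] + (1.0 - gate) * reflex[1],
    )
```

Under these assumptions, exploration noise passes through unchanged when the agent is far from the hazard, and is fully suppressed in favour of the reflex action at the hazard boundary. In the approach described above, the instinct would be an evolved network rather than a hand-coded rule.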
Original language: English
Title: GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
Number of pages: 2
Publisher: Association for Computing Machinery
Publication date: Jul. 2020
Pages: 77-78
ISBN (Print): 978-1-4503-7127-8
DOI
Status: Published - Jul. 2020
Event: GECCO 2020: The Genetic and Evolutionary Computation Conference - online, Cancun, Mexico
Duration: 8 Jul. 2020 to 12 Jul. 2020
https://gecco-2020.sigevo.org/index.html/HomePage

Conference

Conference: GECCO 2020
Location: online
Country/Territory: Mexico
City: Cancun
Period: 08/07/2020 to 12/07/2020
Internet address

Keywords

  • Life-long learning
  • Reinforcement learning
  • Safe reinforcement learning
  • Evolutionary algorithms
