Safer reinforcement learning through evolved instincts

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

An important goal in reinforcement learning is to create agents that can quickly adapt to new goals but at the same time avoid situations that might cause damage to themselves or their environments. One way agents learn is through exploration mechanisms, which are needed to discover new policies. However, in deep reinforcement learning, exploration is normally done by injecting noise in the action space. While performing well in many domains, this setup has the inherent risk that the noisy actions lead agents to unsafe environment states. In this paper, we introduce a novel approach called Meta-Learned Instinctual Networks (MLIN) that allows agents to perform lifetime learning while avoiding hazardous states. At the core of the approach is a plastic network trained through reinforcement learning and an evolved "instinctual" network, which does not change during the agent's lifetime but can modulate the noisy output of the plastic network. We test our idea on a simple 2D navigation task with hazard zones, in which the agent has to learn to approach new targets during deployment. While a standard meta-trained network performs poorly in these tasks, MLIN allows agents to learn to navigate to new targets while minimizing collisions with hazard zones. These results suggest that meta-learning augmented with an instinctual network is a promising approach for safe AI.
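The abstract says only that a fixed, evolved instinctual network "can modulate the noisy output of the plastic network"; it does not give the modulation rule. The following is a minimal sketch of one way that interaction could look, not the authors' implementation. It assumes the instinct network emits a per-dimension gate in [0, 1] and a corrective action, and that the final action is a convex blend of the two. All names (`mlp`, `init_params`, `modulated_action`), the layer sizes, the 4-dimensional observation, and the noise scale are hypothetical.

```python
import numpy as np

def init_params(rng, sizes):
    """Random (W, b) pairs for a small MLP; sizes are hypothetical."""
    return [(rng.normal(0.0, 0.1, (n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b

def modulated_action(plastic, instinct, obs, rng, noise_std=0.1):
    """Blend a noisy plastic-policy action with an instinct correction.

    Assumption (not from the paper): the instinct network outputs
    2 * act_dim values, a gate logit and a corrective action per dimension.
    """
    act_dim = plastic[-1][1].shape[0]
    # Exploration: Gaussian noise injected in the action space.
    noisy = mlp(plastic, obs) + rng.normal(0.0, noise_std, act_dim)
    out = mlp(instinct, obs)
    gate = 1.0 / (1.0 + np.exp(-out[:act_dim]))   # 1 = trust the policy
    correction = np.tanh(out[act_dim:])           # bounded instinct action
    return gate * noisy + (1.0 - gate) * correction

rng = np.random.default_rng(0)
plastic = init_params(rng, [4, 8, 2])    # updated by RL during the lifetime
instinct = init_params(rng, [4, 8, 4])   # fixed during the lifetime
obs = np.array([0.5, -0.2, 0.1, 0.9])    # hypothetical observation
action = modulated_action(plastic, instinct, obs, rng)
```

In this sketch only `plastic` would be updated by reinforcement learning during deployment, while `instinct` would stay frozen and be optimized in an outer evolutionary loop, mirroring the split described in the abstract. When the gate approaches 0, for example near a hazard zone, the exploration noise has almost no effect on the executed action.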
Original language: English
Title of host publication: GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
Number of pages: 2
Publisher: Association for Computing Machinery
Publication date: Jul 2020
Pages: 77-78
ISBN (Print): 978-1-4503-7127-8
Publication status: Published - Jul 2020
Event: GECCO 2020: The Genetic and Evolutionary Computation Conference - online, Cancun, Mexico
Duration: 8 Jul 2020 to 12 Jul 2020
https://gecco-2020.sigevo.org/index.html/HomePage

Conference

Conference: GECCO 2020
Location: online
Country/Territory: Mexico
City: Cancun
Period: 08/07/2020 to 12/07/2020

Keywords

  • Life-long learning
  • Reinforcement learning
  • Safe reinforcement learning
  • Evolutionary algorithms
