Towards Continual Reinforcement Learning through Evolutionary Meta-Learning

Research output: Article in proceedings · Research · peer-review

Abstract

In continual learning, an agent is exposed to a changing environment, requiring it to adapt during execution time. While traditional reinforcement learning (RL) methods have shown impressive results in various domains, there has been less progress in addressing the challenge of continual learning. Current RL approaches do not allow the agent to adapt during execution but only during a dedicated training phase. Here we study the problem of continual learning in a 2D bipedal walker domain, in which the legs of the walker grow over its lifetime, requiring the agent to adapt. The introduced approach combines neuroevolution, to determine the starting weights of a deep neural network, and a version of deep reinforcement learning that is continually running during execution time. The proof-of-concept results show that the combined approach gives a better generalization performance when compared to evolution or reinforcement learning alone. The hybridization of reinforcement learning and evolution opens up exciting new research directions for continually learning agents that can benefit from suitable priors determined by an evolutionary process.
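The combination described in the abstract can be read as an outer evolutionary loop that selects the initial weights of a policy and an inner loop that keeps learning while the body changes. The sketch below is only a hypothetical illustration of that structure, assuming a toy adaptation task: the task itself, the finite-difference inner update (standing in for the deep RL component), and the simple evolution strategy are assumptions for illustration, not the authors' implementation or the actual bipedal walker setup.

import numpy as np

rng = np.random.default_rng(0)

def rollout(weights, leg_scale, env_rng):
    # Toy stand-in for one episode: reward is higher when the linear "policy"
    # compensates for the current leg_scale of the walker.
    obs = env_rng.normal(size=(16, weights.shape[0]))
    actions = obs @ weights
    target = obs.sum(axis=1) / leg_scale           # what the action should be
    return -np.mean((actions - target) ** 2)       # negative squared error as reward

def lifetime_return(init_weights, env_rng, life_steps=20, inner_lr=0.05):
    # Inner loop: the agent keeps adapting while its morphology changes.
    # A crude finite-difference update stands in for the deep RL component.
    w = init_weights.copy()
    total = 0.0
    for t in range(life_steps):
        leg_scale = 1.0 + t / life_steps           # the legs "grow" over the lifetime
        total += rollout(w, leg_scale, env_rng)
        eps = env_rng.normal(size=w.shape) * 0.1   # random probe direction
        grad = (rollout(w + eps, leg_scale, env_rng)
                - rollout(w - eps, leg_scale, env_rng)) / 0.2 * eps
        w += inner_lr * grad
    return total / life_steps

# Outer loop: a simple evolution strategy over the *initial* weights, so that
# evolution supplies the prior the lifetime learner starts from.
dim, pop_size, sigma, outer_lr = 8, 32, 0.1, 0.02
theta = rng.normal(size=dim) * 0.1
for gen in range(50):
    noise = rng.normal(size=(pop_size, dim))
    fitness = np.array([lifetime_return(theta + sigma * n, rng) for n in noise])
    ranks = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    theta += outer_lr / (pop_size * sigma) * noise.T @ ranks
    if gen % 10 == 0:
        print(f"gen {gen:3d}  mean lifetime return {fitness.mean():.3f}")

The fitness-normalised evolution strategy in the outer loop mirrors the common practice of evolving priors with population-based search; any gradient-based RL algorithm could replace the inner update without changing the overall structure.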
Original language: English
Title of host publication: Proceedings of the Genetic and Evolutionary Computation Conference Companion
Number of pages: 2
Place of publication: New York, NY, USA
Publisher: Association for Computing Machinery
Publication date: 17 Jul 2019
Edition: 2019
Pages: 119-120
ISBN (Electronic): 978-1-4503-6748-6
DOIs
Publication status: Published - 17 Jul 2019

Keywords

  • Reinforcement learning
  • Continual learning
  • Meta-learning
