Abstract
Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulty of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous-space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these problems can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD can reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Because it can reuse retained subtask training data for offline learning, it requires no new environment interaction during adaptation. We demonstrate the efficacy of our approach with successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints, and it satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems and offers insights into the inner workings of subtask composition.
| Original language | English |
|---|---|
| Title | The Twelfth International Conference on Learning Representations (ICLR 2024) |
| Place of publication | Vienna, AT |
| Publication date | 8 May 2024 |
| Status | Published - 8 May 2024 |