Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Finn Rietz, Stefan Heinrich, Erik Schaffernicht, Johannes A. Stork

Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › Peer-reviewed

Abstract

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.
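
As a rough illustration of what a strict priority ordering between subtasks means, the sketch below selects an action lexicographically from a small table of Q-values. It is a toy with discrete actions and made-up numbers, not the PSQD algorithm, which targets continuous state-action spaces. The function name `lexicographic_action`, the per-subtask tolerance parameter, and all values are assumptions introduced for this example.

```python
import numpy as np


def lexicographic_action(q_values, tolerances):
    """Pick an action under strict subtask priorities (illustrative only).

    q_values: array of shape (n_subtasks, n_actions); rows are ordered from
        highest to lowest priority. The values here are hypothetical.
    tolerances: one slack value per subtask. An action stays admissible for
        subtask i if its value is within tolerances[i] of the best value
        among the actions that are still admissible.
    """
    q_values = np.asarray(q_values, dtype=float)
    allowed = np.ones(q_values.shape[1], dtype=bool)
    for q_i, eps_i in zip(q_values, tolerances):
        best = q_i[allowed].max()          # best value among admissible actions
        allowed &= q_i >= best - eps_i     # lower priorities cannot undo this
    return int(np.flatnonzero(allowed)[0])


# Two subtasks, three actions; subtask 0 has priority over subtask 1.
q = np.array([
    [1.0, 0.95, 0.2],   # high-priority subtask
    [0.0, 1.0, 5.0],    # low-priority subtask
])

chosen = lexicographic_action(q, tolerances=[0.1, 0.0])
summed = int(np.argmax(q.sum(axis=0)))  # naive scalarization by summing
print("lexicographic choice:", chosen)
print("summed-reward choice:", summed)
```

In this toy, summing the two value rows favors the action that is poor for the high-priority subtask, whereas the lexicographic rule only lets the low-priority subtask choose among actions that are near-optimal for the high-priority one.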
Original language: English
Title: The Twelfth International Conference on Learning Representations (ICLR 2024)
Place of publication: Vienna, AT
Publication date: 8 May 2024
Status: Published - 8 May 2024
