The Catalog Problem: Deep Learning Methods for Transforming Sets into Sequences of Clusters

Mateusz Jurewicz

Publication: Book / Anthology / Report / Ph.D. thesis


The titular Catalog Problem refers to predicting a varying number of ordered clusters from sets of any cardinality. This task arises in many diverse areas, ranging from medical triage, through multi-channel signal analysis for petroleum exploration, to product catalog structure prediction. This thesis focuses on the last of these, which exemplifies a number of challenges inherent to ordered clustering. These include learning variable cluster constraints, exhibiting relational reasoning and managing combinatorial complexity, all of which present unique challenges for neural networks, combining elements of set representation, neural clustering and permutation learning.
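As an illustration of the task's input and output shape only, the following minimal sketch represents a predicted catalog as a list of sets and checks that it is a valid ordered partition of the input set. The function name `is_ordered_partition` and the example offers are hypothetical and not taken from the thesis; the sketch assumes that every item belongs to exactly one cluster.

```python
from typing import Hashable, List, Set


def is_ordered_partition(items: Set[Hashable], clusters: List[Set[Hashable]]) -> bool:
    """Return True if `clusters` is a sequence of non-empty, pairwise
    disjoint sets whose union is exactly `items`.

    The list order stands for the cluster order (e.g. catalog sections);
    the number of clusters is not fixed in advance.
    """
    seen: Set[Hashable] = set()
    for cluster in clusters:
        # Reject empty clusters and items assigned to more than one cluster.
        if not cluster or (cluster & seen):
            return False
        seen |= cluster
    # Every input item must have been placed somewhere, and nothing extra.
    return seen == items


# Hypothetical input: an unordered set of product offers.
offers = {"tent", "sleeping bag", "grill", "charcoal", "sunscreen"}

# Hypothetical output: an ordered sequence of clusters (catalog sections).
catalog = [{"tent", "sleeping bag"}, {"grill", "charcoal"}, {"sunscreen"}]

valid = is_ordered_partition(offers, catalog)
```

A different input set may call for a different number of clusters, which is why the output is modeled here as a variable-length list and not a fixed-size structure.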

To approach the Catalog Problem, a curated dataset of over ten thousand real-world product catalogs consisting of more than one million product offers is provided. Additionally, a library for generating simpler, synthetic catalog structures is presented. These and other datasets form the foundation of the included work, allowing for a quantitative comparison of the proposed methods’ ability to address the underlying challenge. In particular, synthetic datasets enable the assessment of the models’ capacity to learn higher-order compositional and structural rules.

Two novel neural methods are proposed to tackle the Catalog Problem: a set encoding module designed to enhance the network’s ability to condition the prediction on the entirety of the input set, and a larger architecture for inferring an input-dependent number of diverse, ordered partitional clusters with an added cardinality prediction module. Both improve performance on the presented datasets, with the latter being the only neural method fulfilling all requirements inherent to addressing the Catalog Problem.
Publisher: IT-Universitetet i København
Number of pages: 240
ISBN (electronic): 978-87-7949-400-8
Status: Published - 2023

