The Catalog Problem: Deep Learning Methods for Transforming Sets into Sequences of Clusters

Mateusz Jurewicz

Research output: Book / Anthology / Report / Ph.D. thesis › Ph.D. thesis

Abstract

The titular Catalog Problem refers to predicting a varying number of ordered clusters from sets of any cardinality. This task arises in many diverse areas, ranging from medical triage, through multi-channel signal analysis for petroleum exploration, to product catalog structure prediction. This thesis focuses on the last of these, which exemplifies a number of challenges inherent to ordered clustering. These include learning variable cluster constraints, exhibiting relational reasoning and managing combinatorial complexity, all of which present unique challenges for neural networks, combining elements of set representation, neural clustering and permutation learning.
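
To make the shape of the task concrete, the following minimal Python sketch shows what a valid target looks like and how quickly the space of possible targets grows. It is an illustration only, not the thesis's method: the function names `is_ordered_partition` and `count_ordered_partitions` are hypothetical, and the sketch assumes that clusters are non-empty, mutually disjoint, cover the whole input set, and are unordered internally.

```python
from math import comb
from typing import Hashable, Sequence, Set


def is_ordered_partition(items: Set[Hashable], clusters: Sequence[Set[Hashable]]) -> bool:
    """Return True if `clusters` is a sequence of non-empty, disjoint sets covering `items`.

    Assumption for illustration: each item belongs to exactly one cluster.
    """
    seen: Set[Hashable] = set()
    for cluster in clusters:
        # Reject empty clusters and items that already appeared in an earlier cluster.
        if not cluster or seen & cluster:
            return False
        seen |= cluster
    return seen == set(items)


def count_ordered_partitions(n: int) -> int:
    """Count the ways to split n distinct items into a sequence of non-empty clusters.

    These are the ordered Bell (Fubini) numbers, computed with the recurrence
    a(m) = sum over k of C(m, k) * a(m - k): pick k items for the first cluster,
    then arrange the remaining m - k items.
    """
    counts = [1]  # a(0) = 1: the empty set has one (empty) ordered partition
    for m in range(1, n + 1):
        counts.append(sum(comb(m, k) * counts[m - k] for k in range(1, m + 1)))
    return counts[n]


if __name__ == "__main__":
    offers = {"milk", "bread", "drill", "hammer"}
    catalog = [{"drill", "hammer"}, {"milk", "bread"}]  # cluster order matters
    print(is_ordered_partition(offers, catalog))
    print([count_ordered_partitions(n) for n in range(1, 8)])
```

Under these assumptions, even a set of ten items admits more than a hundred million distinct ordered partitions, which gives a sense of why exhaustive search over targets is impractical.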

To approach the Catalog Problem, a curated dataset of over ten thousand real-world product catalogs consisting of more than one million product offers is provided. Additionally, a library for generating simpler, synthetic catalog structures is presented. These and other datasets form the foundation of the included work, allowing for a quantitative comparison of the proposed methods’ ability to address the underlying challenge. In particular, synthetic datasets enable the assessment of the models’ capacity to learn higher-order compositional and structural rules.

Two novel neural methods are proposed to tackle the Catalog Problem: a set encoding module designed to enhance the network’s ability to condition the prediction on the entirety of the input set, and a larger architecture for inferring an input-dependent number of diverse, ordered partitional clusters with an added cardinality prediction module. Both result in improved performance on the presented datasets, with the latter being the only neural method fulfilling all requirements inherent to addressing the Catalog Problem.

Original language: English
Publisher: IT-Universitetet i København
Number of pages: 240
ISBN (Electronic): 978-87-7949-400-8
Publication status: Published - 2023
Series: ITU-DS
Number: 204
ISSN: 1602-3536
