PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction

Mateusz Jurewicz, Leon Derczynski

Research output: Conference article in proceedings (Research, peer-reviewed)

Abstract

In this dataset paper, we introduce PROCAT, a novel e-commerce dataset of expertly designed product catalogues, each consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets for set-to-sequence machine learning tasks, which involve complex structure prediction. The difficulty of these tasks is further compounded by the need to order rare and previously unseen instances into sequences, and by variable sequence lengths and substructures, in the form of diversely structured catalogues. PROCAT provides catalogue data comprising over 1.5 million set items spanning a 4-year period, both in raw text form and as pre-processed features that include information about relative visual placement. In addition to this ready-to-use dataset, we report baseline experimental results on a proposed benchmark task for a number of model architectures that jointly perform set encoding and permutation learning.
Original language: English
Title of host publication: Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track
Volume: 1
Publication date: 1 Dec 2021
Edition: 2021
Publication status: Published, 1 Dec 2021
Event: Thirty-fifth Conference on Neural Information Processing Systems (Virtual)
Duration: 6 Dec 2021 to 14 Dec 2021
Conference number: 25
https://nips.cc/

Keywords

  • PROCAT
  • E-commerce Dataset
  • Set-to-Sequence Machine Learning
  • Complex Structure Prediction
  • Product Catalogues
