PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction
Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review
Standard
PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. / Jurewicz, Mateusz; Derczynski, Leon.
Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track. Vol. 1 2021. ed. 2021.
RIS
TY - GEN
T1 - PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction
AU - Jurewicz, Mateusz
AU - Derczynski, Leon
N1 - Conference code: 25
PY - 2021/12/1
Y1 - 2021/12/1
N2 - In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in the area of set-to-sequence machine learning tasks, which involve complex structure prediction. The task's difficulty is further compounded by the need to place into sequences rare and previously-unseen instances, as well as by variable sequence lengths and substructures, in the form of diversely-structured catalogues. PROCAT provides catalogue data consisting of over 1.5 million set items across a 4-year period, in both raw text form and with pre-processed features containing information about relative visual placement. In addition to this ready-to-use dataset, we include baseline experimental results on a proposed benchmark task from a number of joint set encoding and permutation learning model architectures.
UR - https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/32bb90e8976aab5298d5da10fe66f21d-Abstract-round1.html
U2 - 10.6084/m9.figshare.14709507
DO - 10.6084/m9.figshare.14709507
M3 - Article in proceedings
VL - 1
BT - Thirty-fifth Conference on Neural Information Processing Systems
T2 - Thirty-fifth Conference on Neural Information Processing Systems
Y2 - 6 December 2021 through 14 December 2021
ER -
ID: 86385214