Abstract
In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in the area of set-to-sequence machine learning tasks, which involve complex structure prediction. The task's difficulty is further compounded by the need to place into sequences rare and previously-unseen instances, as well as by variable sequence lengths and substructures, in the form of diversely-structured catalogues. PROCAT provides catalogue data consisting of over 1.5 million set items across a 4-year period, in both raw text form and with pre-processed features containing information about relative visual placement. In addition to this ready-to-use dataset, we include baseline experimental results on a proposed benchmark task from a number of joint set encoding and permutation learning model architectures.
| Original language | English |
|---|---|
| Title | Thirty-fifth Conference on Neural Information Processing Systems : Datasets and Benchmarks Track |
| Volume | 1 |
| Publication date | 1 Dec 2021 |
| Edition | 2021 |
| DOI | |
| Status | Published - 1 Dec 2021 |
| Event | The 35th Conference on Neural Information Processing Systems, Virtual. Duration: 6 Dec 2021 → 14 Dec 2021. Conference number: 35. https://nips.cc/ |
Conference
| Conference | The 35th Conference on Neural Information Processing Systems |
|---|---|
| Number | 35 |
| Location | Virtual |
| Period | 6 Dec 2021 → 14 Dec 2021 |
| Web address | |
Keywords
- PROCAT
- E-commerce Dataset
- Set-to-Sequence Machine Learning
- Complex Structure Prediction
- Product Catalogues
Fingerprint
Dive into the research topics of 'PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction'. Together they form a unique fingerprint.
Research datasets
- PROCAT Set-to-Sequence Model
Jurewicz, M. (Creator) & Derczynski, L. (Creator), ZENODO, 3 Jun 2021
DOI: 10.5281/zenodo.4896303, https://zenodo.org/record/4896303
Dataset