ITU

PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Standard

PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. / Jurewicz, Mateusz; Derczynski, Leon.

Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track. Vol. 1, 2021 ed., 2021.


Harvard

Jurewicz, M & Derczynski, L 2021, PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. in Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track. 2021 edn, vol. 1, Thirty-fifth Conference on Neural Information Processing Systems, 06/12/2021. https://doi.org/10.6084/m9.figshare.14709507

APA

Jurewicz, M., & Derczynski, L. (2021). PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. In Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track (2021 ed., Vol. 1). https://doi.org/10.6084/m9.figshare.14709507

Vancouver

Jurewicz M, Derczynski L. PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. In Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track. 2021 ed. Vol. 1. 2021. https://doi.org/10.6084/m9.figshare.14709507

Author

Jurewicz, Mateusz; Derczynski, Leon. / PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction. Thirty-fifth Conference on Neural Information Processing Systems: Datasets and Benchmarks Track. Vol. 1, 2021 ed., 2021.

Bibtex

@inproceedings{50a2fe24338340c982fc09225a500aa3,
title = "PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction",
abstract = "In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in the area of set-to-sequence machine learning tasks, which involve complex structure prediction. The task's difficulty is further compounded by the need to place into sequences rare and previously-unseen instances, as well as by variable sequence lengths and substructures, in the form of diversely-structured catalogues. PROCAT provides catalogue data consisting of over 1.5 million set items across a 4-year period, in both raw text form and with pre-processed features containing information about relative visual placement. In addition to this ready-to-use dataset, we include baseline experimental results on a proposed benchmark task from a number of joint set encoding and permutation learning model architectures.",
author = "Mateusz Jurewicz and Leon Derczynski",
year = "2021",
month = dec,
day = "1",
doi = "10.6084/m9.figshare.14709507",
language = "English",
volume = "1",
booktitle = "Thirty-fifth Conference on Neural Information Processing Systems",
edition = "2021",
note = "Thirty-fifth Conference on Neural Information Processing Systems, NeurIPS 2021; Conference date: 06-12-2021 through 14-12-2021",
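url = "https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/32bb90e8976aab5298d5da10fe66f21d-Abstract-round1.html"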
}

RIS

TY  - GEN
T1  - PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction
AU  - Jurewicz, Mateusz
AU  - Derczynski, Leon
N1  - Conference code: 25
PY  - 2021/12/1
Y1  - 2021/12/1
N2  - In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in the area of set-to-sequence machine learning tasks, which involve complex structure prediction. The task's difficulty is further compounded by the need to place into sequences rare and previously-unseen instances, as well as by variable sequence lengths and substructures, in the form of diversely-structured catalogues. PROCAT provides catalogue data consisting of over 1.5 million set items across a 4-year period, in both raw text form and with pre-processed features containing information about relative visual placement. In addition to this ready-to-use dataset, we include baseline experimental results on a proposed benchmark task from a number of joint set encoding and permutation learning model architectures.
AB  - In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in the area of set-to-sequence machine learning tasks, which involve complex structure prediction. The task's difficulty is further compounded by the need to place into sequences rare and previously-unseen instances, as well as by variable sequence lengths and substructures, in the form of diversely-structured catalogues. PROCAT provides catalogue data consisting of over 1.5 million set items across a 4-year period, in both raw text form and with pre-processed features containing information about relative visual placement. In addition to this ready-to-use dataset, we include baseline experimental results on a proposed benchmark task from a number of joint set encoding and permutation learning model architectures.
UR  - https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/32bb90e8976aab5298d5da10fe66f21d-Abstract-round1.html
U2  - 10.6084/m9.figshare.14709507
DO  - 10.6084/m9.figshare.14709507
M3  - Article in proceedings
VL  - 1
BT  - Thirty-fifth Conference on Neural Information Processing Systems
T2  - Thirty-fifth Conference on Neural Information Processing Systems
Y2  - 6 December 2021 through 14 December 2021
ER  - 
