Scientific dissemination is central to the scientific process. This paper presents CiteTracked, a dataset of peer reviews and citation statistics covering scientific papers from the machine learning community and spanning six years. We describe and analyze the collection of data for over 3,000 published papers, including their peer review texts and citation counts, and outline possible uses. The dataset aims to foster novel interdisciplinary work among fields such as scientometrics, information retrieval, computational linguistics, and natural language processing to study the scientific publishing process.
Title of host publication: 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2019)