TY - GEN
T1 - In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review
AU - Jiménez-Sánchez, Amelia
AU - Avlona, Natalia-Rozalia
AU - de Boer, Sarah
AU - Campello, Víctor M.
AU - Feragen, Aasa
AU - Ferrante, Enzo
AU - Ganz, Melanie
AU - Gichoya, Judy Wawira
AU - Gonzalez, Camila
AU - Groefsema, Steff
AU - Hering, Alessa
AU - Hulman, Adam
AU - Joskowicz, Leo
AU - Juodelyte, Dovile
AU - Kandemir, Melih
AU - Kooi, Thijs
AU - Lérida, Jorge del Pozo
AU - Li, Livie Yumeng
AU - Pacheco, Andre
AU - Rädsch, Tim
AU - Reyes, Mauricio
AU - Sourget, Théo
AU - van Ginneken, Bram
AU - Wen, David
AU - Weng, Nina
AU - Xu, Jack Junchi
AU - Zając, Hubert Dariusz
AU - Zuluaga, Maria A.
AU - Cheplygina, Veronika
N1 - Conference code: 8
PY - 2025/6/23
Y1 - 2025/6/23
AB - Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static – they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifact and dataset. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.
KW - open data
KW - data governance
KW - healthcare
KW - medical imaging
KW - shortcuts
KW - bias
KW - research artifacts
KW - living review
UR - https://doi.org/10.1145/3715275.3732035
UR - http://inthepicture.itu.dk/
U2 - 10.1145/3715275.3732035
DO - 10.1145/3715275.3732035
M3 - Article in proceedings
SN - 979-8-4007-1482-5
SP - 511
EP - 531
BT - FAccT '25: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
PB - Association for Computing Machinery
CY - New York
T2 - The 8th annual ACM Conference on Fairness, Accountability, and Transparency
Y2 - 23 June 2025 through 26 June 2025
ER -