In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review

  • Amelia Jiménez-Sánchez
  • Natalia-Rozalia Avlona
  • Sarah de Boer
  • Víctor M. Campello
  • Aasa Feragen
  • Enzo Ferrante
  • Melanie Ganz
  • Judy Wawira Gichoya
  • Camila Gonzalez
  • Steff Groefsema
  • Alessa Hering
  • Adam Hulman
  • Leo Joskowicz
  • Dovile Juodelyte
  • Melih Kandemir
  • Thijs Kooi
  • Jorge del Pozo Lérida
  • Livie Yumeng Li
  • Andre Pacheco
  • Tim Rädsch
  • Mauricio Reyes
  • Théo Sourget
  • Bram van Ginneken
  • David Wen
  • Nina Weng
  • Jack Junchi Xu
  • Hubert Dariusz Zając
  • Maria A. Zuluaga
  • Veronika Cheplygina
  • University of Copenhagen
  • Radboud Universiteit Nijmegen
  • University of Barcelona
  • Technical University of Denmark
  • University of Buenos Aires
  • National Scientific and Technical Research Council
  • Copenhagen University Hospital
  • Emory University
  • Stanford University
  • University of Groningen
  • Aarhus University Hospital
  • Aarhus University
  • Hebrew University of Jerusalem
  • University of Southern Denmark
  • Lunit Inc.
  • Cerebriu A/S
  • Federal University of Espírito Santo
  • Heidelberg University 
  • German Cancer Research Center
  • Ibero-American University
  • Oxford University Hospital
  • Radiological AI testcenter
  • EURECOM

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static – they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifact and dataset. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.
Original language: English
Title of host publication: FAccT '25: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
Number of pages: 20
Place of Publication: New York
Publisher: Association for Computing Machinery
Publication date: 23 Jun 2025
Pages: 511-531
ISBN (Print): 979-8-4007-1482-5
Publication status: Published - 23 Jun 2025
Event: Fairness, Accountability and Transparency - Athens, Greece
Duration: 23 Jun 2025 → 26 Jun 2025
Conference number: 8
https://facctconference.org/2025/

Conference

Conference: Fairness, Accountability and Transparency
Number: 8
Location: Greece
Country/Territory: Greece
City: Athens
Period: 23/06/2025 → 26/06/2025

Keywords

  • open data
  • data governance
  • healthcare
  • medical imaging
  • shortcuts
  • bias
  • research artifacts
  • living review
