Bias in Danish Medical Notes: Infection Classification of Long Texts Using Transformer and LSTM Architectures Coupled with BERT

Mehdi Parviz, Rudi Agius, Carsten Niemann, Rob Van Der Goot

Publication: Conference article in proceedings or book/report chapter; conference contribution in proceedings; research; peer-reviewed

Abstract

Medical notes contain a wealth of information related to diagnosis, prognosis, and overall patient care that can be used to help physicians make informed decisions. However, like any other data sets consisting of data from diverse demographics, they may be biased toward certain subgroups or subpopulations. Consequently, any bias in the data will be reflected in the output of the machine learning models trained on them. In this paper, we investigate the existence of such biases in Danish medical notes related to three types of blood cancer, with the goal of classifying whether the medical notes indicate severe infection. By employing a hierarchical architecture that combines a sequence model (Transformer and LSTM) with a BERT model to classify long notes, we uncover biases related to demographics and cancer types. Furthermore, we observe performance differences between hospitals. These findings underscore the importance of investigating bias in critical settings such as healthcare and the urgency of monitoring and mitigating it when developing AI-based systems.
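
The subgroup comparison described in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = severe infection), a classifier's predictions, and one subgroup tag per note (for example a hospital or a demographic group). The function name `f1_by_group`, the toy data, and the choice of positive-class F1 as the metric are assumptions made for illustration; the abstract does not state which metric or grouping variables the authors used.

```python
from collections import defaultdict


def f1_by_group(y_true, y_pred, groups):
    """Return {group: F1 score of the positive class (label 1)}."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]  # register the group even if it only has true negatives
        if truth == 1 and pred == 1:
            c["tp"] += 1
        elif truth == 0 and pred == 1:
            c["fp"] += 1
        elif truth == 1 and pred == 0:
            c["fn"] += 1
    scores = {}
    for group, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
        scores[group] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Hypothetical toy data: two subgroups "A" and "B" (not taken from the paper).
y_true = [1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

scores = f1_by_group(y_true, y_pred, groups)
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

A large gap between the best and worst subgroup score is one simple signal that a model performs unevenly across groups. In practice the per-group sample sizes should be reported alongside the scores, since small groups give noisy estimates.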
Original language: English
Title: Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Editors: Sophia Ananiadou, Dina Demner-Fushman, Deepak Gupta, Paul Thompson
Number of pages: 5
Place of publication: Albuquerque, New Mexico
Publisher: Association for Computational Linguistics
Publication date: 1 May 2025
Pages: 316-320
ISBN (Print): 979-8-89176-238-1
Status: Published - 1 May 2025
