Project details
Description
This project opens up modern research in clinical text analysis for Danish. It will develop automatic summaries of clinical records, based on extraction of the events described in those records.
We know that around 60% of data about patients is stored in the text part of their clinical record, with only a minority being covered by the heavily structured part of the data. Form fields don't suit the majority of data stored about a patient, and so everything else goes in the free text. This means that text processing of clinical notes is essential to machine understanding and processing of patient data, an ever more common task in the current era. However, modern (i.e. high-performance, neural network-based) tools for this do not exist for Danish, and are hard to build in our current regulatory environment because of restrictions on data access. On the other hand, Danish medical records are perhaps the best in the world from an IT perspective, with digitized data going back as far as the 1970s. While patients in Denmark are currently falling behind due to a lack of tools for working with Danish clinical text, there is also the potential to do immense good by establishing such a toolkit.
Clinical text is a special genre, with its own idiosyncrasies, that cannot be processed well by tools trained on standard text such as news or Wikipedia. It contains a variety of spelling mistakes, abbreviations, disfluencies, jargon, and so on, varying not only from country to country but also between hospitals and even between individual doctors. One could draw an analogy and say that the reputation of doctors' handwriting carries over to their digital text, though of course the problem there is semantic and orthographic rather than graphological.
Beyond opening the way for modern text processing in Danish clinical practice, this natural language processing toolkit for machine reading of clinical notes addresses two research problems. The first is event extraction. The project will build tools for extracting events described in clinical notes, which could be clinical or non-clinical (e.g. receiving an injection or the death of a sibling), and for linking these events to their actors. The second is summarization, useful when a quick overview of a potentially long record is required, e.g. at patient handover; this uses the extracted events and their computational representations to generate natural language summaries of a patient history.
Exploratory work has been done by some of my master's students at ITU. We already have a huge database of clinical text, thanks to Søren Lauesen's earlier work, which we can access and use, overcoming the current regulatory barrier.
This is a general problem and will not go away. We should not pay someone like IBM to build and own this technology for the Danish language. It has a broad impact on health technology in Denmark, and so it makes sense for it to be done at a Danish institution. Given that ITU is the only university holding competency in and focusing on NLP for Denmark, I'd say it makes most sense for it to be done here.
Status | Completed |
---|---|
Effective start/end date | 01/12/2020 → 30/11/2021 |
Funding
- Novo Nordisk Foundation: DKK 500,000.00