Preserving medical correctness, readability and consistency in de-identified health records

Kostas Pantazos, Søren Lauesen, Søren Lippert

    Research output: Journal Article or Conference Article in Journal; Journal article; Research; peer-review

    Abstract

    A health record database contains structured data fields that identify the patient, such as patient ID, patient
    name, e-mail and phone number. These data are fairly easy to de-identify, that is, replace with other
    identifiers. However, these data also occur in fields with doctors’ free-text notes written in an abbreviated
    style that cannot be analyzed grammatically. If we replace a word that looks like a name, but isn’t, we degrade
    readability and medical correctness. If we fail to replace it when we should, we degrade confidentiality. We de-identified an existing Danish electronic health record database, ending up with 323,122 patient health records. We had to invent many methods for de-identifying potential identifiers in the free-text notes. The de-identified health records should be used with caution for statistical purposes because we removed health records that were so special that they couldn’t be de-identified. Furthermore, we distorted geography by replacing zip codes with random zip codes.
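
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names, the replacement pool, and the `protected` set of medical terms are all hypothetical and serve only to show consistent name replacement in free text, the risk of replacing a word that merely looks like a name, and the random zip code substitution.

```python
import random
import re


def build_pseudonyms(names, replacement_pool, seed=0):
    """Map each real name to one fixed substitute, so the same name is
    replaced identically in structured fields and in free-text notes."""
    pool = list(replacement_pool)
    if len(pool) < len(names):
        raise ValueError("replacement pool is smaller than the name list")
    rng = random.Random(seed)
    rng.shuffle(pool)
    return dict(zip(sorted(names), pool))


def deidentify_note(note, pseudonyms, protected=frozenset()):
    """Replace whole-word occurrences of known names in a note.

    Words in `protected` (assumed here to be a list of medical terms)
    are left alone even if they also occur as names, which keeps the
    note medically correct at some cost to confidentiality.
    """
    targets = [name for name in pseudonyms if name not in protected]
    if not targets:
        return note
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, targets)) + r")\b")
    return pattern.sub(lambda m: pseudonyms[m.group(1)], note)


def random_zip(rng, valid_zips):
    """Pick a random zip code; this hides location but distorts geography."""
    return rng.choice(valid_zips)


# Hypothetical data: "Parkinson" is both a surname and a diagnosis.
pseudonyms = build_pseudonyms({"Hansen", "Parkinson"}, ["Holm", "Lund", "Dahl"])
note = "Pt Hansen, susp. Parkinson. Hansen informed."
clean = deidentify_note(note, pseudonyms, protected={"Parkinson"})
new_zip = random_zip(random.Random(1), ["2300", "8000"])
```

Without the `protected` set, the diagnosis in this note would be rewritten as an unrelated surname, which is the loss of medical correctness the abstract warns about. With it, a patient who really is named Parkinson would stay identifiable, which is the loss of confidentiality.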
    Original language: English
    Journal: Health Informatics Journal
    Pages (from-to): 1-13
    Number of pages: 13
    ISSN: 1460-4582
    Publication status: Published - 1 Jan 2016

    Keywords

    • anonymi
    • consistency
    • de-identification
    • electronic health record
    • readability
