Abstract
Formal Concept Analysis is a symbolic learning technique derived from algebra and order theory. The technique has been applied to a broad range of knowledge representation and exploration tasks in a number of domains. Most recorded applications of Formal Concept Analysis deal with a small number of objects and attributes, in which case the complexity of the algorithms used for indexing and retrieving data is not a significant issue. However, when Formal Concept Analysis is applied to the exploration of large numbers of objects and attributes, the size of the data makes issues of complexity and scalability crucial.
This paper presents the results of experiments carried out on a set of 4,000 medical discharge summaries in which 1,962 attributes from the Unified Medical Language System (UMLS) were recognized. In this domain, the objects are the 4,000 medical documents and the attributes are the 1,962 UMLS terms extracted from them. When Formal Concept Analysis is used to iteratively analyze and visualize these data, complexity and scalability become critically important.
Although the amount of data used in this experiment is small compared with the size of primary memory in modern computers, the results are still important because the probability distributions that determine the efficiencies are likely to remain stable as the size of the data is increased.
Our work has two outcomes. First, we present a methodology for exploring knowledge in text documents using Formal Concept Analysis by employing conceptual scales created through direct manipulation of a line diagram. The conceptual scales lead to small derived purified contexts that are represented using nested line diagrams. Second, we present an algorithm for the fast determination of purified contexts from a compressed representation of the large formal context. Our work draws on existing encoding and compression techniques to show how rudimentary data analysis can lead to substantial efficiency improvements in knowledge visualization.
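To make the idea of deriving a small purified context from a large one concrete, the following Python sketch may help. It is not the algorithm from the paper. It assumes that the large context is stored as an inverted index (attribute to set of document ids), which stands in for the compressed representation, and that "purified" means that objects with identical attribute sets under the chosen scale are merged into one row. The function name `derive_purified_context` and the attribute names are hypothetical.

```python
from collections import defaultdict


def derive_purified_context(postings, scale_attributes):
    """Restrict a formal context to the attributes of one conceptual scale
    and merge objects that share exactly the same scale attributes.

    postings: dict mapping attribute -> set of object ids (assumed layout).
    Returns a dict mapping frozenset(attributes) -> set of object ids.
    Objects carrying none of the scale attributes are not listed.
    """
    # Collect, per object, the scale attributes it carries.
    rows = defaultdict(set)
    for attr in scale_attributes:
        for obj in postings.get(attr, ()):
            rows[obj].add(attr)

    # Group objects whose restricted attribute sets are identical.
    classes = defaultdict(set)
    for obj, attrs in rows.items():
        classes[frozenset(attrs)].add(obj)
    return dict(classes)


# Hypothetical toy data: three terms over five documents.
postings = {
    "asthma": {1, 2, 5},
    "smoking": {2, 3, 5},
    "fever": {4},
}
purified = derive_purified_context(postings, ["asthma", "smoking"])
```

Only the posting lists of the scale attributes are read, so the cost depends on how many documents carry those attributes and not on the size of the whole context. Here documents 2 and 5 collapse into a single row because both carry exactly `asthma` and `smoking`.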
Original language | Undefined/Unknown |
---|---|
Journal | Computational Intelligence |
Volume | 15 |
Issue number | 1 |
Pages (from-to) | 11-27 |
Number of pages | 17 |
ISSN | 0824-7935 |
Status | Published - 1999 |
Published externally | Yes |
Keywords
- Formal Concept Analysis
- Knowledge Representation
- Scalability
- Medical Document Analysis
- Conceptual Scales
- Unified Medical Language System
- Data Complexity
- Nested Line Diagrams
- Purified Contexts
- Data Compression Techniques