Abstract
Knowledge graphs (KGs) are used in many real-world application domains, ranging from search engines to biomedical
data analysis. Although a large corpus of KGs is available,
they are inherently incomplete because the sources from which
they were constructed are themselves incomplete. Knowledge graph
embeddings (KGEs) are a very popular technique for completing KGs.
However, they can only judge whether a given fact is true or
false. Thus, users need to provide a concrete query or some
test data. Unfortunately, such queries or data are not always
available. There are cases where users want to discover all (or as
many as possible) missing facts from an input KG. Given a KGE
model, users would thus have to provide the model with candidate
facts consisting of the complement of the KG. This is infeasible
even for small graphs simply due to the size of the complement
graph. In this paper, we define the problem of discovering missing facts from a given KGE model and refer to it as fact discovery.
We study sampling methods that obtain candidate facts and then use
KGEs to retrieve the most plausible ones. We extensively evaluate
different existing sampling methods and provide guidelines on
when each of them is most suitable. We also discuss the challenges and limitations that we encountered when investigating
the different techniques. With these insights, we expect to shed
light on this unexplored direction and attract more researchers to it.
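
As a rough illustration of the sample-then-score idea described above, the following minimal sketch draws candidate triples uniformly from the complement of a toy KG and keeps the highest-scoring ones. It is not the method evaluated in the paper: the function names (`complement_size`, `sample_candidates`, `discover_facts`, `toy_score`), the uniform sampling strategy, and the one-dimensional translation-style scorer are all assumptions made for illustration, and `toy_score` merely stands in for a trained KGE model. The sketch also shows why enumerating the complement does not scale: a hypothetical graph with 10,000 entities and 100 relations already has 10,000 × 10,000 × 100 = 10^10 possible triples.

```python
import random


def complement_size(num_entities, num_relations, num_known):
    """Number of (head, relation, tail) triples that are absent from the KG."""
    return num_entities * num_entities * num_relations - num_known


def sample_candidates(entities, relations, known, n, rng):
    """Uniformly sample up to n distinct triples that are not in `known`.

    Assumes `known` is a set of triples built from `entities` and `relations`.
    Uniform rejection sampling is an illustrative choice, not the paper's.
    """
    limit = complement_size(len(entities), len(relations), len(known))
    target = min(n, limit)
    candidates = set()
    while len(candidates) < target:
        triple = (rng.choice(entities), rng.choice(relations), rng.choice(entities))
        if triple not in known:
            candidates.add(triple)
    return candidates


def discover_facts(candidates, score, k):
    """Return the k candidates that the scoring function finds most plausible."""
    return sorted(candidates, key=score, reverse=True)[:k]


# Hypothetical one-dimensional embeddings standing in for a trained KGE model.
ENTITY_EMB = {"a": 0.0, "b": 1.0, "c": 2.0}
RELATION_EMB = {"r": 1.0}


def toy_score(triple):
    """Translation-style score: higher (closer to 0) means more plausible."""
    head, relation, tail = triple
    return -abs(ENTITY_EMB[head] + RELATION_EMB[relation] - ENTITY_EMB[tail])


entities = ["a", "b", "c"]
relations = ["r"]
known = {("a", "r", "b")}

candidates = sample_candidates(entities, relations, known, n=8, rng=random.Random(0))
top = discover_facts(candidates, toy_score, k=1)
print(top)  # [('b', 'r', 'c')]
```

In this toy setting the complement holds only 8 triples, so the sample covers all of them; on realistic graphs only a small sample can be scored, which is why the choice of sampling method matters.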
Original language | English |
---|---|
Title | EDBT |
Publication date | 2024 |
Pages | 664-675 |
Status | Published - 2024 |