On the Design of Scalable Outlier Detection Methods Using Approximate Nearest Neighbor Graphs

Research output: Conference Article in Proceeding or Book/Report chapterBook chapterResearchpeer-review

Abstract

Efficient and reliable methods for distinguishing outliers in
data remain crucial for data analysis. Although supervised methods based
on neural networks have gained recent traction, unsupervised methods
such as the kNN outlier method and local outlier factor (LOF) remain
state-of-the-art solutions according to different standardized benchmarks.
Unfortunately, exact outlier detection through nearest neighbor search
queries provides a scalability bottleneck for the high-dimensional, big
datasets that are routinely analyzed in data science applications. This
paper explores benefits and limitations of using approximate nearest neighbor search via Hierarchical Navigable Small World graphs (HNSW) to
overcome this scalability barrier. We evaluate direct implementations that
compute the kNN and LOF score from approximate neighborhoods and
show the robustness of the outlier detection even in settings where the
approximation is far away from the exact neighborhoods. Furthermore, we
design white-box methods that compute the outlier scores directly from
the underlying graph. These methods show much more variability in the
quality of the outlier scores and open new ground for the development of
task-aware tools based on approximate nearest neighbor search techniques
Original languageEnglish
Title of host publicationSimilarity Search and Applications: SISAP 2024
Publication date2024
DOIs
Publication statusPublished - 2024

Fingerprint

Dive into the research topics of 'On the Design of Scalable Outlier Detection Methods Using Approximate Nearest Neighbor Graphs'. Together they form a unique fingerprint.

Cite this