On the Design of Scalable Outlier Detection Methods Using Approximate Nearest Neighbor Graphs

Publikation: Konference artikel i Proceeding eller bog/rapport kapitelBidrag til bog/antologiForskningpeer review

Abstract

Efficient and reliable methods for distinguishing outliers in
data remain crucial for data analysis. Although supervised methods based
on neural networks have gained recent traction, unsupervised methods
such as the kNN outlier method and local outlier factor (LOF) remain
state-of-the-art solutions according to different standardized benchmarks.
Unfortunately, exact outlier detection through nearest neighbor search
queries provides a scalability bottleneck for the high-dimensional, big
datasets that are routinely analyzed in data science applications. This
paper explores benefits and limitations of using approximate nearest neighbor search via Hierarchical Navigable Small World graphs (HNSW) to
overcome this scalability barrier. We evaluate direct implementations that
compute the kNN and LOF score from approximate neighborhoods and
show the robustness of the outlier detection even in settings where the
approximation is far away from the exact neighborhoods. Furthermore, we
design white-box methods that compute the outlier scores directly from
the underlying graph. These methods show much more variability in the
quality of the outlier scores and open new ground for the development of
task-aware tools based on approximate nearest neighbor search techniques
OriginalsprogEngelsk
TitelSimilarity Search and Applications: SISAP 2024
Publikationsdato2024
DOI
StatusUdgivet - 2024

Fingeraftryk

Dyk ned i forskningsemnerne om 'On the Design of Scalable Outlier Detection Methods Using Approximate Nearest Neighbor Graphs'. Sammen danner de et unikt fingeraftryk.

Citationsformater