A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

Ninh Dang Pham, Rasmus Pagh

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Abstract

Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest neighbor are deteriorated in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable
to be performed in parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estimation algorithm. The empirical experiments on synthetic and real world data sets demonstrate that our approach is efficient and scalable to very large high-dimensional data sets.
Original languageEnglish
Title of host publicationKDD '12 Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Number of pages9
PublisherAssociation for Computing Machinery
Publication date12 Aug 2012
Pages877-885
ISBN (Print)978-1-4503-1462-6
Publication statusPublished - 12 Aug 2012

Keywords

  • Outlier detection
  • high-dimensional
  • angle-based
  • random projection
  • AMS Sketch

Fingerprint

Dive into the research topics of 'A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data'. Together they form a unique fingerprint.

Cite this