Abstract
Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments of concepts of distance or nearest neighbor deteriorate in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in a parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estimation algorithm. The empirical experiments on synthetic and real-world data sets demonstrate that our approach is efficient and scalable to very large high-dimensional data sets.
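To make the cubic-time baseline mentioned in the abstract concrete, here is a minimal brute-force sketch. It is an illustration under assumptions, not the authors' method: it scores each point by the plain, unweighted variance of the angles it forms with every pair of other points, which is a simplification of an angle-based outlier factor, and it does not implement the random projection-based estimator. The function names `angle` and `variance_of_angles` are hypothetical, and the input is assumed to contain at least three distinct points.

```python
import math
from itertools import combinations


def angle(p, a, b):
    """Angle at p between the vectors a - p and b - p, in radians."""
    u = [ai - pi for ai, pi in zip(a, p)]
    v = [bi - pi for bi, pi in zip(b, p)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)  # assumes a != p and b != p
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def variance_of_angles(points):
    """Brute-force score per point (hypothetical helper, cubic time).

    For each point p, take the angle at p for every pair of other points
    and return the population variance of those angles.
    """
    scores = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        angles = [angle(p, a, b) for a, b in combinations(others, 2)]
        mean = sum(angles) / len(angles)
        scores.append(sum((t - mean) ** 2 for t in angles) / len(angles))
    return scores


# Toy data: a 3 x 3 grid of points plus one isolated point far away.
points = [(x, y) for x in range(3) for y in range(3)] + [(50, 50)]
scores = variance_of_angles(points)
# The isolated point sees all other points within a narrow cone,
# so it receives the smallest variance.
print(scores.index(min(scores)))
```

A low score flags a candidate outlier, since a point far from the rest of the data sees the other points in nearly the same direction. The triple loop over a point and all pairs of other points is what makes this baseline cubic in the number of points.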
Original language  English 

Title  KDD '12 Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining 
Number of pages  9 
Publisher  Association for Computing Machinery 
Publication date  12 Aug 2012 
Pages  877-885 
ISBN (Print)  978-1-4503-1462-6 
Status  Published - 12 Aug 2012 
Keywords
 Outlier detection
 high-dimensional
 angle-based
 random projection
 AMS Sketch
Fingerprint
Dive into the research topics of 'A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data'. Together they form a unique fingerprint.
Projects
 1 Finished

MaDaMS: Massive Data Mining by Sampling
Pagh, R. (PI), Stöckel, M. (CoI) & Pham, N. D. (CoI)
01/01/2011 → 31/12/2014
Projects: Project › Research