Measuring the Difficulty of Distance-Based Indexing

Matthew Skala

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Abstract

Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the performance of index data structures on non-vector spaces by relating them to equivalent vector spaces. We characterise its asymptotic behaviour, and give experimental results to calibrate these comparisons.
Original languageEnglish
Title of host publicationProceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE 2005), Buenos Aires, Argentina, November 2--4, 2005
Number of pages12
Volume3772
PublisherSpringer London
Publication date2005
Pages103-114
Publication statusPublished - 2005
Externally publishedYes

Keywords

  • Similarity search
  • Vector spaces
  • Distance-based data structures
  • Intrinsic dimensionality
  • Performance prediction

Fingerprint

Dive into the research topics of 'Measuring the Difficulty of Distance-Based Indexing'. Together they form a unique fingerprint.

Cite this