On the I/O Complexity of the k-Nearest Neighbors Problem

Mayank Goswami, Riko Jacob, Rasmus Pagh

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Abstract

We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.
Original languageEnglish
Title of host publicationPODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PublisherAssociation for Computing Machinery
Publication dateJun 2020
Pages205-212
ISBN (Electronic)978-1-4503-7108-7
DOIs
Publication statusPublished - Jun 2020
SeriesACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Keywords

  • k-nearest neighbor (k-NN)
  • external memory indexes
  • Hamming space
  • l∞ metric
  • approximate nearest neighbors
  • block reads
  • indexing schemes
  • dimension reduction
  • polynomial space data structures
  • indivisibility assumption

Fingerprint

Dive into the research topics of 'On the I/O Complexity of the k-Nearest Neighbors Problem'. Together they form a unique fingerprint.

Cite this