ITU

On the I/O Complexity of the k-Nearest Neighbors Problem

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Standard

On the I/O Complexity of the k-Nearest Neighbors Problem. / Goswami, Mayank; Jacob, Riko; Pagh, Rasmus.

PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Association for Computing Machinery, 2020. p. 205-212 (ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems).

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Harvard

Goswami, M, Jacob, R & Pagh, R 2020, On the I/O Complexity of the k-Nearest Neighbors Problem. in PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Association for Computing Machinery, ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 205-212. https://doi.org/10.1145/3375395.3387649

APA

Goswami, M., Jacob, R., & Pagh, R. (2020). On the I/O Complexity of the k-Nearest Neighbors Problem. In PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (pp. 205-212). Association for Computing Machinery. ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems https://doi.org/10.1145/3375395.3387649

Vancouver

Goswami M, Jacob R, Pagh R. On the I/O Complexity of the k-Nearest Neighbors Problem. In PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Association for Computing Machinery. 2020. p. 205-212. (ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems). https://doi.org/10.1145/3375395.3387649

Author

Goswami, Mayank ; Jacob, Riko ; Pagh, Rasmus. / On the I/O Complexity of the k-Nearest Neighbors Problem. PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Association for Computing Machinery, 2020. pp. 205-212 (ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems).

Bibtex

@inproceedings{261aa4d2051e49bfb9ab336e3b73ff14,
title = "On the I/O Complexity of the k-Nearest Neighbors Problem",
abstract = "We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: {\'i}(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.",
author = "Mayank Goswami and Riko Jacob and Rasmus Pagh",
year = "2020",
month = jun,
doi = "10.1145/3375395.3387649",
language = "English",
series = "ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems",
publisher = "Association for Computing Machinery",
pages = "205--212",
booktitle = "PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems",
address = "United States",

}

RIS

TY - GEN

T1 - On the I/O Complexity of the k-Nearest Neighbors Problem

AU - Goswami, Mayank

AU - Jacob, Riko

AU - Pagh, Rasmus

PY - 2020/6

Y1 - 2020/6

N2 - We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.

AB - We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.

U2 - 10.1145/3375395.3387649

DO - 10.1145/3375395.3387649

M3 - Article in proceedings

T3 - ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

SP - 205

EP - 212

BT - PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

PB - Association for Computing Machinery

ER -

ID: 85551194