TY - CHAP
T1 - An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search
AU - Johnsen, Malte Helin
AU - Aumüller, Martin
PY - 2024
Y1 - 2024
N2 - Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts
AB - Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts
KW - Approximate Nearest Neighbor Search
KW - Locality-Sensitive Hashing
KW - Similarity Search
KW - Search Strategies
UR - https://doi.org/10.1007/978-3-031-75823-2_13
U2 - 10.1007/978-3-031-75823-2_13
DO - 10.1007/978-3-031-75823-2_13
M3 - Book chapter
BT - Similarity Search and Applications, SISAP 2024
ER -