An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search

Malte Helin Johnsen, Martin Aumüller

Research output: Conference Article in Proceeding or Book/Report chapterBook chapterResearchpeer-review

Abstract

Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts
Original languageEnglish
Title of host publicationSimilarity Search and Applications, SISAP 2024
Publication date2024
DOIs
Publication statusPublished - 2024

Keywords

  • Approximate Nearest Neighbor Search
  • Locality-Sensitive Hashing
  • Similarity Search
  • Search Strategies

Fingerprint

Dive into the research topics of 'An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search'. Together they form a unique fingerprint.

Cite this