Abstract
Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts
Original language | English |
---|---|
Title of host publication | Similarity Search and Applications, SISAP 2024 |
Publication date | 2024 |
DOIs | |
Publication status | Published - 2024 |
Keywords
- Approximate Nearest Neighbor Search
- Locality-Sensitive Hashing
- Similarity Search
- Search Strategies