Abstract
The DBSCAN algorithm is a popular density-based clustering method to find clusters of arbitrary shapes without requiring an initial guess on the number of clusters. While there are methods to run DBSCAN efficiently in low-dimensional data in near-linear time, there remains a need for an efficient DBSCAN algorithm that scales to high-dimensional data. The bottleneck in highdimensional data is that the range queries necessary in carrying out the algorithm suffer from the curse of dimensionality. In this paper we present the SRRDBSCAN algorithm. This algorithm is an implementation of approximate DBSCAN using locality-sensitive hashing. We prove sub-quadratic running time bounds under reasonable assumptions about the data. An important ingredient in the design of the data structure is the use of a multi-level LSH data structure, which automatically adapts to the density of data points. An extensive empirical analysis shows that the approximation does not significantly impact the quality of the clustering found by the algorithm as compared to the exact DBSCAN clustering. Moreover, our algorithm is competitive with other approaches even in low-dimensional settings, and thus provides a general-purpose DBSCAN implementation for arbitrary data.
| Original language | English |
|---|---|
| Title of host publication | Advances in Database Technology - EDBT |
| Number of pages | 13 |
| Publication date | 2025 |
| Pages | 694-706 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | International Conference on Extending Database Technology - Barcelona, Spain Duration: 25 Mar 2025 → 28 Mar 2025 Conference number: 28 https://openproceedings.org/html/pages/2025_edbt.html |
Conference
| Conference | International Conference on Extending Database Technology |
|---|---|
| Number | 28 |
| Country/Territory | Spain |
| City | Barcelona |
| Period | 25/03/2025 → 28/03/2025 |
| Internet address |
Fingerprint
Dive into the research topics of 'High-dimensional density-based clustering using locality-sensitive hashing'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver