Skip to main navigation Skip to search Skip to main content

High-dimensional density-based clustering using locality-sensitive hashing

  • University of Southern Denmark

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Abstract

The DBSCAN algorithm is a popular density-based clustering method to find clusters of arbitrary shapes without requiring an initial guess on the number of clusters. While there are methods to run DBSCAN efficiently in low-dimensional data in near-linear time, there remains a need for an efficient DBSCAN algorithm that scales to high-dimensional data. The bottleneck in highdimensional data is that the range queries necessary in carrying out the algorithm suffer from the curse of dimensionality. In this paper we present the SRRDBSCAN algorithm. This algorithm is an implementation of approximate DBSCAN using locality-sensitive hashing. We prove sub-quadratic running time bounds under reasonable assumptions about the data. An important ingredient in the design of the data structure is the use of a multi-level LSH data structure, which automatically adapts to the density of data points. An extensive empirical analysis shows that the approximation does not significantly impact the quality of the clustering found by the algorithm as compared to the exact DBSCAN clustering. Moreover, our algorithm is competitive with other approaches even in low-dimensional settings, and thus provides a general-purpose DBSCAN implementation for arbitrary data.
Original languageEnglish
Title of host publicationAdvances in Database Technology - EDBT
Number of pages13
Publication date2025
Pages694-706
DOIs
Publication statusPublished - 2025
EventInternational Conference on Extending Database Technology - Barcelona, Spain
Duration: 25 Mar 202528 Mar 2025
Conference number: 28
https://openproceedings.org/html/pages/2025_edbt.html

Conference

ConferenceInternational Conference on Extending Database Technology
Number28
Country/TerritorySpain
CityBarcelona
Period25/03/202528/03/2025
Internet address

Fingerprint

Dive into the research topics of 'High-dimensional density-based clustering using locality-sensitive hashing'. Together they form a unique fingerprint.

Cite this