I/O-Efficient Similarity Join

Rasmus Pagh, Ninh Dang Pham, Francesco Silvestri, Morten Stöckel

Research output: Conference Article in Proceeding or Book/Report chapterArticle in proceedingsResearchpeer-review

Abstract

We present an I/O-efficient algorithm for computing similarity
joins based on locality-sensitive hashing (LSH). In contrast to the filtering
methods commonly suggested our method has provable sub-quadratic
dependency on the data size. Further, in contrast to straightforward
implementations of known LSH-based algorithms on external memory,
our approach is able to take significant advantage of the available internal
memory: Whereas the time complexity of classical algorithms includes a
factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity
of our algorithm merely includes a factor (N/M)ρ, where N is the data size
and M is the size of internal memory. Our algorithm is randomized and
outputs the correct result with high probability. It is a simple, recursive,
cache-oblivious procedure, and we believe that it will be useful also in
other computational settings such as parallel computation.
Original languageEnglish
Title of host publicationAlgorithms - ESA 2015 : 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings
Number of pages12
PublisherSpringer
Publication date14 Sept 2015
Pages941-952
ISBN (Print)978-3-662-48349-7
ISBN (Electronic)978-3-662-48350-3
DOIs
Publication statusPublished - 14 Sept 2015
SeriesLecture Notes in Computer Science
Volume9294
ISSN0302-9743

Keywords

  • I/O-efficient algorithms
  • Locality-sensitive hashing
  • Similarity joins
  • Cache-oblivious computing
  • External memory algorithms

Fingerprint

Dive into the research topics of 'I/O-Efficient Similarity Join'. Together they form a unique fingerprint.
  • SSS: Scalable Similarity Search

    Pagh, R. (PI), Christiani, T. L. (CoI), Pham, N. D. (CoI), Faithfull, A. (CoI), Silvestri, F. (CoI), Mikkelsen, J. W. (CoI), Sivertsen, J. V. T. (CoI), Aumüller, M. (CoI), Skala, M. (CoI), Ceccarello, M. (CoI), Themsen, R. (CoI), Jacob, R. (CoI), McCauley, S. (CoI) & Ahle, T. D. (CoI)

    European Commission

    01/05/201430/04/2019

    Project: Research

Cite this