I/O-efficient Similarity Join

Rasmus Pagh, Ninh Dang Pham, Francesco Silvestri, Morten Danmark Stöckel

Research output: Journal Article or Conference Article in JournalJournal articleResearchpeer-review

Abstract

We present an I/O-efficient algorithm for computing similarity joins based on locality-sensitive hashing (LSH). In contrast to the filtering methods commonly suggested our method has provable subquadratic dependency on the data size. Further, in contrast to straightforward implementations of known LSH-based algorithms on external memory, our approach is able to take significant advantage of the available internal memory:
Original languageEnglish
JournalAlgorithmica
Volume78
ISSN0178-4617
DOIs
Publication statusPublished - 2017

Keywords

  • cache oblivious
  • cache aware
  • locality sensitive hashing
  • Similarity join

Fingerprint

Dive into the research topics of 'I/O-efficient Similarity Join'. Together they form a unique fingerprint.
  • SSS: Scalable Similarity Search

    Pagh, R. (PI), Christiani, T. L. (CoI), Pham, N. D. (CoI), Faithfull, A. (CoI), Silvestri, F. (CoI), Mikkelsen, J. W. (CoI), Sivertsen, J. V. T. (CoI), Aumüller, M. (CoI), Skala, M. (CoI), Ceccarello, M. (CoI), Themsen, R. (CoI), Jacob, R. (CoI), McCauley, S. (CoI) & Ahle, T. D. (CoI)

    European Commission

    01/05/201430/04/2019

    Project: Research

Cite this