I/O-Efficient Similarity Join

Rasmus Pagh, Ninh Dang Pham, Francesco Silvestri, Morten Stöckel

Publikation: Konference artikel i Proceeding eller bog/rapport kapitelKonferencebidrag i proceedingsForskningpeer review

Abstract

We present an I/O-efficient algorithm for computing similarity
joins based on locality-sensitive hashing (LSH). In contrast to the filtering
methods commonly suggested our method has provable sub-quadratic
dependency on the data size. Further, in contrast to straightforward
implementations of known LSH-based algorithms on external memory,
our approach is able to take significant advantage of the available internal
memory: Whereas the time complexity of classical algorithms includes a
factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity
of our algorithm merely includes a factor (N/M)ρ, where N is the data size
and M is the size of internal memory. Our algorithm is randomized and
outputs the correct result with high probability. It is a simple, recursive,
cache-oblivious procedure, and we believe that it will be useful also in
other computational settings such as parallel computation.
OriginalsprogEngelsk
TitelAlgorithms - ESA 2015 : 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings
Antal sider12
ForlagSpringer
Publikationsdato14 sep. 2015
Sider941-952
ISBN (Trykt)978-3-662-48349-7
ISBN (Elektronisk)978-3-662-48350-3
DOI
StatusUdgivet - 14 sep. 2015
NavnLecture Notes in Computer Science
Vol/bind9294
ISSN0302-9743

Emneord

  • I/O-efficient algorithms
  • Locality-sensitive hashing
  • Similarity joins
  • Cache-oblivious computing
  • External memory algorithms

Fingeraftryk

Dyk ned i forskningsemnerne om 'I/O-Efficient Similarity Join'. Sammen danner de et unikt fingeraftryk.
  • SSS: Scalable Similarity Search

    Pagh, R. (PI), Christiani, T. L. (CoI), Pham, N. D. (CoI), Faithfull, A. (CoI), Silvestri, F. (CoI), Mikkelsen, J. W. (CoI), Sivertsen, J. V. T. (CoI), Aumüller, M. (CoI), Skala, M. (CoI), Ceccarello, M. (CoI), Themsen, R. (CoI), Jacob, R. (CoI), McCauley, S. (CoI) & Ahle, T. D. (CoI)

    European Commission

    01/05/201430/04/2019

    Projekter: ProjektForskning

Citationsformater