Projects per year
Abstract
We present an I/O-efficient algorithm for computing similarity
joins based on locality-sensitive hashing (LSH). In contrast to the filtering
methods commonly suggested our method has provable sub-quadratic
dependency on the data size. Further, in contrast to straightforward
implementations of known LSH-based algorithms on external memory,
our approach is able to take significant advantage of the available internal
memory: Whereas the time complexity of classical algorithms includes a
factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity
of our algorithm merely includes a factor (N/M)ρ, where N is the data size
and M is the size of internal memory. Our algorithm is randomized and
outputs the correct result with high probability. It is a simple, recursive,
cache-oblivious procedure, and we believe that it will be useful also in
other computational settings such as parallel computation.
joins based on locality-sensitive hashing (LSH). In contrast to the filtering
methods commonly suggested our method has provable sub-quadratic
dependency on the data size. Further, in contrast to straightforward
implementations of known LSH-based algorithms on external memory,
our approach is able to take significant advantage of the available internal
memory: Whereas the time complexity of classical algorithms includes a
factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity
of our algorithm merely includes a factor (N/M)ρ, where N is the data size
and M is the size of internal memory. Our algorithm is randomized and
outputs the correct result with high probability. It is a simple, recursive,
cache-oblivious procedure, and we believe that it will be useful also in
other computational settings such as parallel computation.
Original language | English |
---|---|
Title of host publication | Algorithms - ESA 2015 : 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings |
Number of pages | 12 |
Publisher | Springer |
Publication date | 14 Sept 2015 |
Pages | 941-952 |
ISBN (Print) | 978-3-662-48349-7 |
ISBN (Electronic) | 978-3-662-48350-3 |
DOIs | |
Publication status | Published - 14 Sept 2015 |
Series | Lecture Notes in Computer Science |
---|---|
Volume | 9294 |
ISSN | 0302-9743 |
Keywords
- I/O-efficient algorithms
- Locality-sensitive hashing
- Similarity joins
- Cache-oblivious computing
- External memory algorithms
Fingerprint
Dive into the research topics of 'I/O-Efficient Similarity Join'. Together they form a unique fingerprint.Projects
- 1 Finished
-
SSS: Scalable Similarity Search
Pagh, R. (PI), Christiani, T. L. (CoI), Pham, N. D. (CoI), Faithfull, A. (CoI), Silvestri, F. (CoI), Mikkelsen, J. W. (CoI), Sivertsen, J. V. T. (CoI), Aumüller, M. (CoI), Skala, M. (CoI), Ceccarello, M. (CoI), Themsen, R. (CoI), Jacob, R. (CoI), McCauley, S. (CoI) & Ahle, T. D. (CoI)
01/05/2014 → 30/04/2019
Project: Research