## Abstract

We present an I/O-efficient algorithm for computing similarity

joins based on locality-sensitive hashing (LSH). In contrast to the filtering

methods commonly suggested our method has provable sub-quadratic

dependency on the data size. Further, in contrast to straightforward

implementations of known LSH-based algorithms on external memory,

our approach is able to take significant advantage of the available internal

memory: Whereas the time complexity of classical algorithms includes a

factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity

of our algorithm merely includes a factor (N/M)ρ, where N is the data size

and M is the size of internal memory. Our algorithm is randomized and

outputs the correct result with high probability. It is a simple, recursive,

cache-oblivious procedure, and we believe that it will be useful also in

other computational settings such as parallel computation.

joins based on locality-sensitive hashing (LSH). In contrast to the filtering

methods commonly suggested our method has provable sub-quadratic

dependency on the data size. Further, in contrast to straightforward

implementations of known LSH-based algorithms on external memory,

our approach is able to take significant advantage of the available internal

memory: Whereas the time complexity of classical algorithms includes a

factor of N ρ, where ρ is a parameter of the LSH used, the I/O complexity

of our algorithm merely includes a factor (N/M)ρ, where N is the data size

and M is the size of internal memory. Our algorithm is randomized and

outputs the correct result with high probability. It is a simple, recursive,

cache-oblivious procedure, and we believe that it will be useful also in

other computational settings such as parallel computation.

Original language | English |
---|---|

Title of host publication | Algorithms - ESA 2015 : 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings |

Number of pages | 12 |

Publisher | Springer |

Publication date | 14 Sept 2015 |

Pages | 941-952 |

ISBN (Print) | 978-3-662-48349-7 |

ISBN (Electronic) | 978-3-662-48350-3 |

DOIs | |

Publication status | Published - 14 Sept 2015 |

Series | Lecture Notes in Computer Science |
---|---|

Volume | 9294 |

ISSN | 0302-9743 |

## Keywords

- I/O-efficient algorithms
- Locality-sensitive hashing
- Similarity joins
- Cache-oblivious computing
- External memory algorithms