## Abstract

Similarity computations on large amounts of high-dimensional data has become

the backbone of many of the tasks today at the frontier of computer science, from

machine learning to information retrieval. With this volume and complexity of

input we commonly accept that finding exact results for a given query will entail

prohibitively large storage or time requirements, so we pursue approximate results.

The main contribution of this dissertation is the introduction of new or improved

approximation algorithms and data structures for several similarity search

problems. We examine the furthest neighbor query, the annulus query, distance

sensitive membership, nearest neighbor preserving embeddings and set similarity

queries in the large-scale, high-dimensional setting. In particular: ’

© We present an algorithm for approximate furthest neighbor improving on

the query time of the previous state-of-the-art.

© We combine this algorithm with state-of-the-art approximate nearest

neighbor algorithms to address the approximate annulus query.

© We introduce the first non-trivial algorithm for approximate distance sensitive

membership without false negatives.

© We show that nearest neighbor preserving embeddings can be performed

faster by applying ideas from the framework of Fast Distance Preserving

Embeddings.

© We introduce and analyse a new randomized algorithm for set similarity

join, several times faster than previous algorithms.

the backbone of many of the tasks today at the frontier of computer science, from

machine learning to information retrieval. With this volume and complexity of

input we commonly accept that finding exact results for a given query will entail

prohibitively large storage or time requirements, so we pursue approximate results.

The main contribution of this dissertation is the introduction of new or improved

approximation algorithms and data structures for several similarity search

problems. We examine the furthest neighbor query, the annulus query, distance

sensitive membership, nearest neighbor preserving embeddings and set similarity

queries in the large-scale, high-dimensional setting. In particular: ’

© We present an algorithm for approximate furthest neighbor improving on

the query time of the previous state-of-the-art.

© We combine this algorithm with state-of-the-art approximate nearest

neighbor algorithms to address the approximate annulus query.

© We introduce the first non-trivial algorithm for approximate distance sensitive

membership without false negatives.

© We show that nearest neighbor preserving embeddings can be performed

faster by applying ideas from the framework of Fast Distance Preserving

Embeddings.

© We introduce and analyse a new randomized algorithm for set similarity

join, several times faster than previous algorithms.

Originalsprog | Engelsk |
---|

Forlag | IT-Universitetet i København |
---|---|

Antal sider | 148 |

ISBN (Trykt) | 978-87-7949-007-9 |

Status | Udgivet - 2017 |

Navn | ITU-DS |
---|---|

Nummer | 140 |

ISSN | 1602-3536 |