Similarity Problems in High Dimensions

Research output: Book / Anthology / Report / Ph.D. thesisPh.D. thesis

View graph of relations

Similarity computations on large amounts of high-dimensional data has become the backbone of many of the tasks today at the frontier of computer science, from machine learning to information retrieval. With this volume and complexity of input we commonly accept that finding exact results for a given query will entail prohibitively large storage or time requirements, so we pursue approximate results. The main contribution of this dissertation is the
introduction of new or improved approximation algorithms and data structures for several similarity search problems. We examine the furthest neighbor query, the annulus query, distance sensitive membership, nearest neighbor preserving embeddings and set similarity queries in the large-scale, high-dimensional setting. In particular:
• We present an algorithm for approximate furthest neighbor improving
on the query time of the previous state-of-the-art.
• We combine this algorithm with state-of-the-art approximate
nearest neighbor algorithms to address the approximate annulus
• We introduce the first non-trivial algorithm for approximate
distance sensitive membership without false negatives.
• We show that nearest neighbor preserving embeddings can be performed
faster by applying ideas from the framework of Fast
Distance Preserving Embeddings.
• We introduce and analyse a new randomized algorithm for set
similarity join, several times faster than previous algorithms.
Original languageEnglish
PublisherIT-Universitetet i København
Number of pages148
ISBN (Print)978-87-7949-007-9
Publication statusPublished - 2017


No data available

ID: 83177791