Similarity Problems in High Dimensions

Johan von Tangen Sivertsen

Research output: Book / Anthology / Report / Ph.D. thesisPh.D. thesis


Similarity computations on large amounts of high-dimensional data has become the backbone of many of the tasks today at the frontier of computer science, from machine learning to information retrieval. With this volume and complexity of input we commonly accept that finding exact results for a given query will entail prohibitively large storage or time requirements, so we pursue approximate results. The main contribution of this dissertation is the
introduction of new or improved approximation algorithms and data structures for several similarity search problems. We examine the furthest neighbor query, the annulus query, distance sensitive membership, nearest neighbor preserving embeddings and set similarity queries in the large-scale, high-dimensional setting. In particular:
• We present an algorithm for approximate furthest neighbor improving
on the query time of the previous state-of-the-art.
• We combine this algorithm with state-of-the-art approximate
nearest neighbor algorithms to address the approximate annulus
• We introduce the first non-trivial algorithm for approximate
distance sensitive membership without false negatives.
• We show that nearest neighbor preserving embeddings can be performed
faster by applying ideas from the framework of Fast
Distance Preserving Embeddings.
• We introduce and analyse a new randomized algorithm for set
similarity join, several times faster than previous algorithms.
Original languageEnglish
PublisherIT-Universitetet i København
Number of pages148
ISBN (Print)978-87-7949-007-9
Publication statusPublished - 2017


Dive into the research topics of 'Similarity Problems in High Dimensions'. Together they form a unique fingerprint.

Cite this