Similarity Problems in High Dimensions

Johan von Tangen Sivertsen

Publikation: Bog / Antologi / Rapport / Ph.D.-afhandlingPh.d.-afhandling


Similarity computations on large amounts of high-dimensional data has become
the backbone of many of the tasks today at the frontier of computer science, from
machine learning to information retrieval. With this volume and complexity of
input we commonly accept that finding exact results for a given query will entail
prohibitively large storage or time requirements, so we pursue approximate results.
The main contribution of this dissertation is the introduction of new or improved
approximation algorithms and data structures for several similarity search
problems. We examine the furthest neighbor query, the annulus query, distance
sensitive membership, nearest neighbor preserving embeddings and set similarity
queries in the large-scale, high-dimensional setting. In particular: ’
© We present an algorithm for approximate furthest neighbor improving on
the query time of the previous state-of-the-art.
© We combine this algorithm with state-of-the-art approximate nearest
neighbor algorithms to address the approximate annulus query.
© We introduce the first non-trivial algorithm for approximate distance sensitive
membership without false negatives.
© We show that nearest neighbor preserving embeddings can be performed
faster by applying ideas from the framework of Fast Distance Preserving
© We introduce and analyse a new randomized algorithm for set similarity
join, several times faster than previous algorithms.
ForlagIT-Universitetet i København
Antal sider148
ISBN (Trykt)978-87-7949-007-9
StatusUdgivet - 2017


Dyk ned i forskningsemnerne om 'Similarity Problems in High Dimensions'. Sammen danner de et unikt fingeraftryk.