Similarity Search: Algorithms for Sets and other High Dimensional Data

Thomas Dybdahl Ahle

Publikation: Bog / Antologi / Rapport / Ph.D.-afhandlingPh.d.-afhandling


We study five fundemental problems in the field of high dimensional algorithms,
similarity search and machine learning. We obtain many new results, including:
• Data structures with Las Vegas guarantees for l-Approximate Near Neighbours and Braun Blanquet Approximate Set Similarity Search with performance matching state of the art Monte Carlo algorithms up to factors nο(1).• “Supermajorities:” The first space/time tradeoff for Approximate Set Similarity
Search, along with a number of lower bounds suggesting that the algorithms are optimal among all Locality Sensitive Hashing (LSH) data structures.
• The first lower bounds on approximative data structures based on the Orthogonal Vectors Conjecture (OVC).
• Output Sensitive LSH data structures with query time near t + nρ for outputting
t distinct elements, where nρ is the state of the art time to output a single element and tnρ was the previous best for t items.
• A Johnson Lindenstrauss embedding with fast multiplication on tensors (a Tensor Sketch), with high probability guarantees for subspace- and nearneighbour-embeddings.
ForlagIT University of Copenhagen
Antal sider158
StatusUdgivet - 2019


Dyk ned i forskningsemnerne om 'Similarity Search: Algorithms for Sets and other High Dimensional Data'. Sammen danner de et unikt fingeraftryk.