On the Complexity of Inner Product Similarity Join

Thomas Dybdahl Ahle, Rasmus Pagh, Ilya Razenshteyn, Francesco Silvestri

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review


A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric locality-sensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:

  • Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.
  • New upper and lower bounds for (A)LSH-based algorithms. In particular, we show that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements.
  • A new indexing method for IPS based on linear sketches, implying that our hardness results are not far from being tight.
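To make the problem statement concrete, the following is a minimal sketch of the exact, quadratic-time baseline that the subquadratic hardness result is measured against. It is not taken from the paper. The function names, the use of two collections P and Q, the threshold parameter s, and the reading of the "unsigned" variant as thresholding the absolute value of the inner product are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def inner(p: Vector, q: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))


def ips_join(
    P: Sequence[Vector],
    Q: Sequence[Vector],
    s: float,
    unsigned: bool = False,
) -> List[Tuple[int, int]]:
    """Exact IPS join by brute force (hypothetical helper, O(|P| * |Q| * d)).

    Returns index pairs (i, j) with <P[i], Q[j]> >= s.
    With unsigned=True the test is |<P[i], Q[j]>| >= s (assumed definition).
    """
    pairs = []
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            value = inner(p, q)
            if unsigned:
                value = abs(value)
            if value >= s:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    P = [[1, 0], [0, 2], [-3, 0]]
    Q = [[2, 0], [0, 1]]
    print(ips_join(P, Q, s=2))                 # signed variant
    print(ips_join(P, Q, s=2, unsigned=True))  # unsigned variant
```

Note that the vectors here are deliberately not normalized: the pair with inner product -6 is reported only under the unsigned reading, which is one way the two variants can differ.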

Our technical contributions include new asymmetric embeddings that may be of independent interest. At the conceptual level we strive to provide greater clarity, for example by distinguishing between signed and unsigned variants of IPS join and shedding new light on the effect of asymmetry.
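As background on what an asymmetric embedding is, the sketch below shows a well-known construction from the maximum inner product search literature. It is not one of the new embeddings contributed by the paper, whose details are not given in this abstract. Data vectors and query vectors are mapped by different functions: each data vector gets one extra coordinate that pads its norm up to a common bound M, while the query gets a zero in that coordinate. Inner products are preserved, and all embedded data vectors have the same length, so for a fixed query the nearest embedded data vector in Euclidean distance is the one with the largest inner product. The names `embed_data`, `embed_query`, and the bound `M` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def embed_data(x: Vector, M: float) -> List[float]:
    """Map a data vector to norm exactly M by appending sqrt(M^2 - |x|^2).

    Assumes |x| <= M; max(..., 0.0) only guards against rounding error.
    """
    sq_norm = sum(a * a for a in x)
    return list(x) + [math.sqrt(max(M * M - sq_norm, 0.0))]


def embed_query(q: Vector) -> List[float]:
    """Map a query vector by appending a zero (a different map: asymmetry)."""
    return list(q) + [0.0]


def inner(p: Vector, q: Vector) -> float:
    return sum(a * b for a, b in zip(p, q))


def sq_dist(p: Vector, q: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


if __name__ == "__main__":
    data = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    query = [1.0, 1.0]
    M = max(math.sqrt(inner(x, x)) for x in data)  # common norm bound

    emb = [embed_data(x, M) for x in data]
    eq = embed_query(query)

    # Index of the largest inner product in the original space ...
    best_ip = max(range(len(data)), key=lambda i: inner(data[i], query))
    # ... matches the index of the nearest neighbor after embedding.
    best_nn = min(range(len(emb)), key=lambda i: sq_dist(emb[i], eq))
    print(best_ip, best_nn)
```

The identity behind this is |f(x) - g(q)|^2 = M^2 + |q|^2 - 2<x, q>, so for a fixed query the distance decreases exactly as the inner product increases.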
Original language: English
Title of host publication: PODS '16 Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Publisher: Association for Computing Machinery
Publication date: 2016
ISBN (Electronic): 978-1-4503-4191-2
Publication status: Published - 2016


  • Inner Product Similarity Join (IPS)
  • Approximation Hardness
  • Asymmetric Locality-Sensitive Hashing (ALSH)
  • Asymmetric Embeddings
  • Linear Sketches

