Skip to main navigation Skip to search Skip to main content

Scalable Computation of Acyclic Joins

Anna Östlin Pagh, Rasmus Pagh

Research output: Book / Anthology / ReportReportResearch

Abstract

The join operation of relational algebra is a cornerstone of relational database systems. Computing the join of several relations is NP-hard in general, whereas special (and typical) cases are tractable. This paper primarily considers joins having an acyclic join graph, for which current methods initially apply a full reducer to efficiently eliminate tuples that will not contribute to the result of the join. The previously best worst case time for computing an acyclic join of k fully reduced relations, occupying a total of n blocks on disk, is Omega(sort(n) log k + zk) I/Os, where sort(n) is the time for sorting the data of n disk blocks, and z is the size of the output in blocks. Even if the output is small, the log k factor gives a significant overhead when joining many relations.
In this paper we show how to compute the join in a time bound that is within a constant factor of the cost of running a full reducer plus sorting the output. For a broad class of acyclic join graphs this is O(sort(n+z)) I/Os, removing the dependence on k from previous bounds. Traditional methods decompose the join into a number of binary joins, which are then carried out one at a time (with some parallelism if pipelining is possible). Departing from this approach, our technique is based on computing the size of certain subsets of the result, and using these sizes to compute the location(s) of each data item in the result. We can then assemble the result using a single sorting step.

Finally, as an initial study of cyclic joins in the I/O model, we show how to compute a join whose join graph is a 3-cycle, in O(n2/m+sort(n+z)) I/Os, where m is the number of blocks in internal memory. Previous techniques also have a quadratic dependence on n, but do not utilize internal memory this well.
Original languageEnglish
Place of PublicationCopenhagen
PublisherIT-Universitetet i København
EditionTR-2005-75
Number of pages14
ISBN (Electronic)87-7949-113-8
Publication statusPublished - Dec 2005
Externally publishedYes
SeriesIT University Technical Report Series
NumberTR-2005-75
ISSN1600-6100

Keywords

  • Join algorithms
  • Acyclic join graphs
  • External memory I/O
  • Sorting-based query processing
  • Cyclic joins

Fingerprint

Dive into the research topics of 'Scalable Computation of Acyclic Joins'. Together they form a unique fingerprint.

Cite this