Abstract
We consider the problem of enumerating all instances of a given sample graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges in the data graph, $k=\BO{1}$ be the number of vertexes in the sample graph, $B$ be the block length, and $M$ be the main memory size. The main result of the paper is a randomized algorithm that enumerates all instances of the sample graph in $\BO{E^{k/2}/\left(BM^{k/2-1}\right)}$ expected I/Os if the maximum vertex degree of the data graph is $\sqrt{EM}$. Under some assumptions, the same bound also applies with high probability.
Our algorithm is I/O optimal, in the worst-case, when the sample graph belongs to the Alon class, which includes cliques, cycles and every graph with a perfect matching: indeed, we show that any algorithm enumerating $T$ instances must always use $\BOM{T/\left(BM^{k/2-1}\right)}$ I/Os and there are graphs for which $T=\BOM{E^{k/2}}$. Finally, we propose a parallelization of the randomized algorithm in the MapReduce framework.
Our algorithm is I/O optimal, in the worst-case, when the sample graph belongs to the Alon class, which includes cliques, cycles and every graph with a perfect matching: indeed, we show that any algorithm enumerating $T$ instances must always use $\BOM{T/\left(BM^{k/2-1}\right)}$ I/Os and there are graphs for which $T=\BOM{E^{k/2}}$. Finally, we propose a parallelization of the randomized algorithm in the MapReduce framework.
Originalsprog | Engelsk |
---|---|
Publikationsdato | 11 sep. 2014 |
Status | Udgivet - 11 sep. 2014 |
Begivenhed | Sixth Workshop on Massive Data Algorithmics - University of Wroclaw, Wroclow, Polen Varighed: 11 sep. 2014 → 11 sep. 2014 Konferencens nummer: 6 http://madalgo.au.dk/events/past-major-events/massive-2014/ |
Workshop
Workshop | Sixth Workshop on Massive Data Algorithmics |
---|---|
Nummer | 6 |
Lokation | University of Wroclaw |
Land/Område | Polen |
By | Wroclow |
Periode | 11/09/2014 → 11/09/2014 |
Andet | Part of ALGO 2014 |
Internetadresse |
Emneord
- Graph enumeration
- I/O complexity
- Randomized algorithms
- Alon class graphs
- MapReduce parallelization