TY - JOUR
T1 - Global communication schemes for the numerical solution of high-dimensional PDEs
AU - Hupp, Philipp
AU - Heene, Mario
AU - Jacob, Riko
AU - Pflüger, Dirk
PY - 2016/2
Y1 - 2016/2
AB - The numerical treatment of high-dimensional partial differential equations is among the most compute-hungry problems and in urgent need of current and future high-performance computing (HPC) systems. It thus also faces the grand challenges of exascale computing, such as the requirement to reduce global communication. To cope with high dimensionalities, we employ a hierarchical discretization scheme, the sparse grid combination technique. Based on an extrapolation scheme, the combination technique additionally mitigates the need for global communication: multiple, much smaller problems can be computed independently for each time step, and the global communication shrinks to a reduce/broadcast step in between. Here, we focus on this remaining synchronization step of the combination technique and present two communication schemes designed to minimize either the number of communication rounds or the total communication volume. Experiments on two different supercomputers show that each scheme outperforms the other depending on the size of the problem. Furthermore, we present a communication model based on the system’s latency and bandwidth and validate the model with the experiments. The model can be used to predict the runtime of the reduce/broadcast step for dimensionalities that are not yet within reach on current supercomputers.
KW - High-Performance Computing
KW - Experimental Evaluation
KW - Communication Performance Analysis
KW - Communication Model
KW - Sparse Grid Combination Technique
KW - Global Communication
DO - 10.1016/j.parco.2015.12.006
M3 - Journal article
SN - 0167-8191
VL - 52
SP - 78
EP - 105
JO - Parallel Computing
JF - Parallel Computing
ER -