TY - JOUR
T1 - Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies
AU - Chatziliadis, Xenofon
AU - Tzirita Zacharatou, Eleni
AU - Eracar, Alphan
AU - Zeuch, Steffen
AU - Markl, Volker
PY - 2024/2
Y1 - 2024/2
N2 - A recent trend in stream processing is offloading the computation of decomposable aggregation functions (DAF) from cloud nodes to geo-distributed fog/edge devices to decrease latency and improve energy efficiency. However, deploying DAFs on low-end devices is challenging due to their volatility and limited resources. Additionally, in geo-distributed fog/edge environments, creating new operator instances on demand and replicating operators ubiquitously is restricted, posing challenges for achieving load balancing without overloading devices. Existing work predominantly focuses on cloud environments, overlooking DAF operator placement in resource-constrained and unreliable geo-distributed settings. This paper presents NEMO, a resource-aware optimization approach that determines the replication factor and placement of DAF operators in resource-constrained geo-distributed topologies. Leveraging Euclidean embeddings of network topologies and a set of heuristics, NEMO scales to millions of nodes and handles topological changes through adaptive re-placement and re-replication decisions. Compared to existing solutions, NEMO achieves up to 6× lower latency and up to 15× reduction in communication cost, while preventing overloaded nodes. Moreover, NEMO re-optimizes placements in constant time, regardless of the topology size. As a result, it lays the foundation to efficiently process continuous data streams on large, heterogeneous, and geo-distributed topologies.
KW - Stream Processing
KW - Decomposable Aggregation Functions
KW - Fog/Edge Computing
KW - Geo-distributed Topologies
KW - Resource-aware Optimization
M3 - Journal article
SN - 2150-8097
VL - 17
SP - 1501
EP - 1514
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 6
ER -