TY - CONF
T1 - Good Intentions: Adaptive Parameter Management via Intent Signaling
AU - Renz-Wieland, Alexander
AU - Kieslinger, Andreas
AU - Gericke, Robert
AU - Gemulla, Rainer
AU - Kaoudi, Zoi
AU - Markl, Volker
PY - 2023
AB - Model parameter management is essential for distributed training of large machine learning (ML) tasks. Some ML tasks are hard to distribute because common approaches to parameter management can be highly inefficient. Advanced parameter management approaches---such as selective replication or dynamic parameter allocation---can improve efficiency, but they typically need to be integrated manually into each task's implementation and they require expensive upfront experimentation to tune correctly. In this work, we explore whether these two problems can be avoided. We first propose a novel intent signaling mechanism that integrates naturally into existing ML stacks and provides the parameter manager with crucial information about parameter accesses. We then describe AdaPM, a fully adaptive, zero-tuning parameter manager based on this mechanism. In contrast to prior parameter managers, our approach decouples how access information is provided (simple) from how and when it is exploited (hard). In our experimental evaluation, AdaPM matched or outperformed state-of-the-art parameter managers out of the box, suggesting that automatic parameter management is possible.
KW - Distributed Training
KW - Parameter Management
KW - Selective Replication
KW - Dynamic Parameter Allocation
KW - Adaptive Algorithms
DO - 10.48550/arXiv.2206.00470
M3 - Article in proceedings
BT - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23)
ER -