Abstract
Integrated data analysis (IDA) pipelines, which combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring, are becoming increasingly common in practice. Interestingly, systems from these areas share many compilation and runtime techniques, and the (increasingly heterogeneous) hardware infrastructure they run on is converging as well. Yet their programming paradigms, cluster resource management, data formats and representations, and execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines. It includes language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage, with the aim of increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines and describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, and local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.
| Original language | English |
|---|---|
| Title of host publication | Conference on Innovative Data Systems Research |
| Place of Publication | Santa Cruz, California, USA |
| Publication date | 9 Jan 2022 |
| Publication status | Published - 9 Jan 2022 |
| Event | Conference on Innovative Data Systems Research, Chaminade Resort & Spa, Chaminade, United States; duration: 9 Jan 2022 → 12 Jan 2022; https://www.cidrdb.org/cidr2022/index.html |
Conference
| Conference | Conference on Innovative Data Systems Research |
|---|---|
| Location | Chaminade Resort & Spa |
| Country/Territory | United States |
| City | Chaminade |
| Period | 9 Jan 2022 → 12 Jan 2022 |
| Internet address | https://www.cidrdb.org/cidr2022/index.html |
Keywords
- Integrated Data Analysis
- High-Performance Computing
- Machine Learning Pipelines
- DAPHNE System
- Vectorized Execution Engine
Fingerprint
Dive into the research topics of 'DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines'. Together they form a unique fingerprint.