Project details
Description
The exponential growth of deep learning models and their adoption is facilitated on the one hand by modern hardware (GPUs), and on the other by the availability of larger datasets. However, although the training and deployment of deep learning applications are widespread in large-scale data centers, these clusters exhibit low hardware utilization, barely reaching 50%, as shown by studies of Microsoft and Alibaba clusters. This waste of hardware resources is exacerbated by the high price of GPUs, while contributing to an unsustainable carbon footprint of AI systems.
Several factors prevent deep learning applications from fully utilizing existing hardware resources. The dataset may be too small to warrant a large model, as in transfer learning. For deployments on resource-constrained devices at the edge, a small fine-tuned model may be required, and the ideal batch size for training the model may be too small to occupy all GPU resources. In addition, data preparation and movement bottlenecks at various points in the deep learning pipeline can degrade resource utilization.
Consequently, achieving effective utilization of hardware resources requires novel resource managers and schedulers that are aware of the resource needs of deep learning workloads and of the characteristics of modern hardware. Building such a resource manager, together with guidelines for its use, is the goal of the DEEP project.
| Short title | DEEP |
|---|---|
| Acronym | DEEP |
| Status | Ongoing |
| Effective start/end date | 01/08/2025 → 31/07/2029 |
Collaborative partners
- IT-Universitetet i København
- University of Applied Sciences and Arts of Western Switzerland (lead)
Funding
- FONDO NAZIONALE SVIZZERO PER LA RICERCA SCIENTIFICA: DKK 5,573,370.00