TPCx-AI on NVIDIA Jetsons

Robert Bayer, Jon Voigt Tøttrup, Pinar Tözün

Publikation: Artikel i tidsskrift og konference artikel i tidsskriftKonferenceartikelForskningpeer review


Despite their resource- and power-constrained nature, edge devices also exhibit an increase in the available compute and memory resources and heterogeneity, similar to the evolution of server hardware in the past decade. For example, NVIDIA Jetson devices have a system-on-chip (SoC) composed of an ARM CPU and an NVIDIA GPU sharing RAM that could be up to 32 GB. Such an SoC setup offers opportunities to push down complex computations closer to the data source rather than performing them on remote servers.
In this paper, we characterize the performance of two types of NVIDIA Jetson devices for end-to-end machine learning pipelines using the TPCx-AI benchmark. Our results demonstrate that the available memory is the main limitation to performance and scaling up machine learning workloads on edge devices. Despite this limitation, some edge devices show promise when comparing against a desktop hardware in terms of power-efficiency and reduction in data movement. In addition, exploiting the available compute parallelism on these devices can benefit not just model training and inference but also data pre-processing. By parallelizing, we get close to an order of magnitude improvement in pre-processing time for one of the TPCx-AI use cases. Finally, while TPCx-AI is a valuable benchmark, it is designed for server settings; therefore, the community needs an end-to-end machine learning benchmark targeting IoT/edge.
BogserieLecture Notes in Computer Science
Sider (fra-til)49-66
Antal sider18
StatusUdgivet - 2022


Dyk ned i forskningsemnerne om 'TPCx-AI on NVIDIA Jetsons'. Sammen danner de et unikt fingeraftryk.