Analysis of Geospatial Data Loading

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

The rate at which applications gather geospatial data today has turned data loading into a critical component of data analysis pipelines. However, users are confronted with multiple file formats for storing geospatial data and an array of systems for processing it. To shed light on how the choice of file format and system affects performance, this paper explores the performance of loading geospatial data stored in diverse file formats using different libraries. It aims to study the impact of different file formats, compare loading throughput across spatial libraries, and examine the microarchitectural behavior of geospatial data loading. Our findings show that GeoParquet files provide the highest loading throughput across all benchmarked libraries. Furthermore, we note that the more spatial features per byte a file format can store, the higher the data loading throughput. Our microarchitectural analysis reveals high instructions per cycle (IPC) during spatial data loading for most libraries and formats. Additionally, our experiments show that instruction misses dominate L1 cache misses, except for GeoParquet files, where data misses take over.
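
The quantities named in the abstract (loading throughput, spatial features per byte, and IPC) can be illustrated with a minimal sketch. This is not the paper's benchmark harness. The function names, the `LoadMetrics` fields, and the choice to report throughput both in MB/s and in features/s are assumptions made here for illustration. The instruction and cycle counts passed to `ipc` would come from hardware performance counters collected separately.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadMetrics:
    mb_per_s: float           # file megabytes loaded per second (assumed unit)
    features_per_s: float     # spatial features loaded per second (assumed unit)
    features_per_byte: float  # storage density of the file format


def time_load(loader: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to a loader.

    `loader` is a hypothetical zero-argument callable that reads a file
    with whichever spatial library is being measured.
    """
    start = time.perf_counter()
    loader()
    return time.perf_counter() - start


def load_metrics(file_bytes: int, n_features: int, elapsed_s: float) -> LoadMetrics:
    """Derive throughput and density figures from one timed load."""
    if file_bytes <= 0 or elapsed_s <= 0:
        raise ValueError("file_bytes and elapsed_s must be positive")
    return LoadMetrics(
        mb_per_s=file_bytes / 1e6 / elapsed_s,
        features_per_s=n_features / elapsed_s,
        features_per_byte=n_features / file_bytes,
    )


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle from two hardware counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles
```

For a fixed number of bytes read per second, a format with more features per byte delivers more features per second, which is one way to read the density observation in the abstract.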
Original language: English
Title: DBTest '24: Proceedings of the Tenth International Workshop on Testing Database Systems
Number of pages: 7
Publisher: Association for Computing Machinery
Publication date: 9 Jun 2024
Pages: 36-42
ISBN (Electronic): 9798400706691
DOI
Status: Published - 9 Jun 2024