Analysis of Geospatial Data Loading

Research output: Conference Article in Proceeding or Book/Report chapter; Article in proceedings; Research; peer-review

Abstract

The rate at which applications gather geospatial data today has turned data loading into a critical component of data analysis pipelines. However, users are confronted with multiple file formats for storing geospatial data and an array of systems for processing it. To shed light on how the choice of file format and system affects performance, this paper explores the performance of loading geospatial data stored in diverse file formats using different libraries. It aims to study the impact of different file formats, compare loading throughput across spatial libraries, and examine the micro-architectural behavior of geospatial data loading. Our findings show that GeoParquet files provide the highest loading throughput across all benchmarked libraries. Furthermore, we note that the more spatial features per byte a file format can store, the higher the data loading throughput. Our micro-architectural analysis reveals high instructions per cycle (IPC) during spatial data loading for most libraries and formats. Additionally, our experiments show that instruction misses dominate L1 cache misses, except for GeoParquet files, where data misses take over.
Original language: English
Title of host publication: DBTest '24: Proceedings of the Tenth International Workshop on Testing Database Systems
Number of pages: 7
Publisher: Association for Computing Machinery
Publication date: 9 Jun 2024
Pages: 36-42
ISBN (Electronic): 9798400706691
Publication status: Published - 9 Jun 2024

Keywords

  • spatial libraries
  • benchmarking
  • micro-architectural analysis
  • database performance evaluation
  • geographic information systems
