Abstract
Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled
devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g.,
location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community
to build systems and applications for efficient spatial data processing.
In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing.
Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within
each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning
techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based
index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial
index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition
is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes
in the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, and
point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing
the indexed partitions.
Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatial
devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g.,
location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community
to build systems and applications for efficient spatial data processing.
In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing.
Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within
each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning
techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based
index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial
index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition
is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes
in the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, and
point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing
the indexed partitions.
Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatial
Originalsprog | Engelsk |
---|---|
Sider | 1-26 |
Antal sider | 26 |
DOI | |
Status | Udgivet - 2023 |
Emneord
- spatial data
- indexing
- machine-learning
- spatial queries
- geospatial