[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Raster Intervals: An Approximation Technique for Polygon Intersection Joins

Published: 30 May 2023 Publication History

Abstract

Many data science applications, most notably Geographic Information Systems, require the computation of spatial joins between large object collections. The objective is to find pairs of objects that intersect, i.e., share at least one common point. The intersection test is very expensive especially for polygonal objects. Therefore, the objects are typically approximated by their minimum bounding rectangles (MBRs) and the join is performed in two steps. In the filter step, all pairs of objects whose MBRs intersect are identified as candidates; in the refinement step, each of the candidate pairs is verified for intersection. The refinement step has been shown notoriously expensive, especially for polygon-polygon joins, constituting the bottleneck of the entire process. We propose a novel approximation technique for polygons, which (i) rasterizes them using a fine grid, (ii) models groups of nearby cells that intersect a polygon as an interval, and (iii) encodes each interval by a bitstring that captures the overlap of each cell in it with the polygon. We also propose an efficient intermediate filter, which is applied on the object approximations before the refinement step, to avoid it for numerous object pairs. Via experimentation with real data, we show that the end-to-end spatial join cost can be reduced by up to one order of magnitude with the help of our filter and by at least three times compared to using alternative intermediate filters.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

References

[1]
Danial Aghajarian and Sushil K. Prasad. 2017. A Spatial Join Algorithm Based on a Non-uniform Grid Technique over GPGPU. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2017, Redondo Beach, CA, USA, November 7--10, 2017, Erik G. Hoel, Shawn D. Newsam, Siva Ravada, Roberto Tamassia, and Goce Trajcevski (Eds.). ACM, 56:1--56:4.
[2]
Danial Aghajarian, Satish Puri, and Sushil K. Prasad. 2016. GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31 - November 3, 2016. ACM, 18:1--18:10.
[3]
Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, and Joel H. Saltz. 2013. Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce. Proc. VLDB Endow., Vol. 6, 11 (2013), 1009--1020.
[4]
Lars Arge, Octavian Procopiuc, Sridhar Ramaswamy, Torsten Suel, and Jeffrey Scott Vitter. 1998. Scalable Sweeping-Based Spatial Join. In VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27, 1998, New York City, New York, USA, Ashish Gupta, Oded Shmueli, and Jennifer Widom (Eds.). Morgan Kaufmann, 570--581.
[5]
Wael M. Badawy and Walid G. Aref. 1999. On Local Heuristics to Speed Up Polygon-Polygon Intersection Tests. In ACM-GIS '99, Proceedings of the 7th International Symposium on Advances in Geographic Information Systems, November 2--6, 1999, Kansas City, USA. ACM, 97--102.
[6]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, May 23--25, 1990, Hector Garcia-Molina and H. V. Jagadish (Eds.). ACM Press, 322--331.
[7]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1994. Multi-Step Processing of Spatial Joins. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, USA, May 24--27, 1994, Richard T. Snodgrass and Marianne Winslett (Eds.). ACM Press, 197--208.
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. 1993. Efficient Processing of Spatial Joins Using R-Trees. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 26--28, 1993. ACM Press, 237--246.
[9]
Ahmed Eldawy and Mohamed F. Mokbel. 2015. SpatialHadoop: A MapReduce framework for spatial data. In 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13--17, 2015, Johannes Gehrke, Wolfgang Lehner, Kyuseok Shim, Sang Kyun Cha, and Guy M. Lohman (Eds.). IEEE Computer Society, 1352--1363.
[10]
Yi Fang, Marc T. Friedman, Giri Nair, Michael Rys, and Ana-Elisa Schmid. 2008. Spatial indexing in microsoft SQL server 2008. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10--12, 2008. 1207--1216.
[11]
Stefan Gottschalk, Ming C. Lin, and Dinesh Manocha. 1996. OBBTree: A Hierarchical Structure for Rapid Interference Detection. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, New Orleans, LA, USA, August 4--9, 1996. 171--180.
[12]
David Hilbert. 1891. Über die stetige Abbildung einer Linie auf ein Fl"achenstück. Mathematische Annalen, Vol. 38, 1 ( 1891), 459--460.
[13]
Edwin H. Jacox and Hanan Samet. 2007. Spatial join techniques. ACM Trans. Database Syst., Vol. 32, 1 (2007), 7.
[14]
Andreas Kipf, Harald Lang, Varun Pandey, Raul Alexandru Persa, Christoph Anneser, Eleni Tzirita Zacharatou, Harish Doraiswamy, Peter A. Boncz, Thomas Neumann, and Alfons Kemper. 2020. Adaptive Main-Memory Indexing for High-Performance Point-Polygon Joins. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020. OpenProceedings.org, 347--358.
[15]
Yiming Liu and Satish Puri. 2020. Efficient Filters for Geometric Intersection Computations using GPU. In SIGSPATIAL '20: 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, November 3--6, 2020, Chang-Tien Lu, Fusheng Wang, Goce Trajcevski, Yan Huang, Shawn D. Newsam, and Li Xiong (Eds.). ACM, 487--496.
[16]
Yiming Liu, Jie Yang, and Satish Puri. 2019. Hierarchical Filter and Refinement System Over Large Polygonal Datasets on CPU-GPU. In 26th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2019, Hyderabad, India, December 17--20, 2019. IEEE, 141--151.
[17]
Nikos Mamoulis and Dimitris Papadias. 2003. Slot Index Spatial Join. IEEE Trans. Knowl. Data Eng., Vol. 15, 1 (2003), 211--231.
[18]
Guy Macdonald Morton. 1966. A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing, Technical Report, Ottawa, Canada: IBM Ltd.
[19]
Sadegh Nobari, Farhan Tauheed, Thomas Heinis, Panagiotis Karras, Sté phane Bressan, and Anastasia Ailamaki. 2013. TOUCH: in-memory spatial join by hierarchical data-oriented partitioning. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013. ACM, 701--712.
[20]
Jack A. Orenstein. 1989. Redundancy in Spatial Databases. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, Portland, Oregon, USA, May 31 - June 2, 1989. ACM Press, 295--305.
[21]
Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper. 2018. How Good Are Modern Spatial Analytics Systems? Proc. VLDB Endow., Vol. 11, 11 (2018), 1661--1673.
[22]
George Papadakis, Georgios M. Mandilaras, Nikos Mamoulis, and Manolis Koubarakis. 2021. Progressive, Holistic Geospatial Interlinking. In WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19--23, 2021. ACM / IW3C2, 833--844.
[23]
Jignesh M. Patel and David J. DeWitt. 1996. Partition Based Spatial-Merge Join. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4--6, 1996, H. V. Jagadish and Inderpal Singh Mumick (Eds.). ACM Press, 259--270.
[24]
Suprio Ray, Bogdan Simion, Angela Demke Brown, and Ryan Johnson. 2014. Skew-resistant parallel in-memory spatial join. In Conference on Scientific and Statistical Database Management, SSDBM '14, Aalborg, Denmark, June 30 - July 02, 2014. ACM, 6:1--6:12.
[25]
Michael Ian Shamos and Dan Hoey. 1976. Geometric Intersection Problems. In 17th Annual Symposium on Foundations of Computer Science, Houston, Texas, USA, 25--27 October 1976. IEEE Computer Society, 208--215.
[26]
Darius Sidlauskas, Sean Chester, Eleni Tzirita Zacharatou, and Anastasia Ailamaki. 2018. Improving Spatial Data Processing by Clipping Minimum Bounding Boxes. IEEE Computer Society, 425--436.
[27]
SpatialHadoop. 2015. TIGER datasets. http://spatialhadoop.cs.umn.edu/datasets.html
[28]
Ram Sriharsha. [n.,d.]. Magellan: Geospatial Analytics Using Spark. https://github.com/harsha2010/magellan.
[29]
Dejun Teng, Furqan Baig, Qiheng Sun, Jun Kong, and Fusheng Wang. 2021. IDEAL: a Vector-Raster Hybrid Model for Efficient Spatial Queries over Complex Polygons. In 22nd IEEE International Conference on Mobile Data Management, MDM 2021, Toronto, ON, Canada, June 15--18, 2021. 99--108.
[30]
Dimitrios Tsitsigkos, Panagiotis Bouros, Nikos Mamoulis, and Manolis Terrovitis. 2019. Parallel In-Memory Evaluation of Spatial Joins. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2019, Chicago, IL, USA, November 5--8, 2019. ACM, 516--519.
[31]
Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, and Minyi Guo. 2016. Simba: Efficient In-Memory Spatial Analytics. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, Fatma Ö zcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 1071--1085.
[32]
Simin You, Jianting Zhang, and Le Gruenwald. 2015. Large-scale spatial join query processing in Cloud. In CloudDB, ICDE Workshops. 34--41.
[33]
Jia Yu, Zongsi Zhang, and Mohamed Sarwat. 2019. Spatial data management in apache spark: the GeoSpark perspective and beyond. GeoInformatica, Vol. 23, 1 (2019), 37--78.
[34]
Eleni Tzirita Zacharatou, Harish Doraiswamy, Anastasia Ailamaki, Clá udio T. Silva, and Juliana Freire. 2017. GPU Rasterization for Real-Time Spatial Aggregation over Arbitrary Polygons. Proc. VLDB Endow., Vol. 11, 3 (2017), 352--365.
[35]
Eleni Tzirita Zacharatou, Andreas Kipf, Ibrahim Sabek, Varun Pandey, Harish Doraiswamy, and Volker Markl. 2021. The Case for Distance-Bounded Spatial Approximations. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org.
[36]
Shubin Zhang, Jizhong Han, Zhiyong Liu, Kai Wang, and Zhiyong Xu. 2009. SJMR: Parallelizing spatial join with MapReduce on clusters. In Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31 - September 4, 2009, New Orleans, Louisiana, USA. IEEE Computer Society, 1--8.
[37]
Geraldo Zimbrao and Jano Moreira de Souza. 1998. A Raster Approximation For Processing of Spatial Joins. In VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27, 1998, New York City, New York, USA. Morgan Kaufmann, 558--569.

Cited By

View all
  • (2024)Finding Logic Bugs in Spatial Database Engines via Affine Equivalent InputsProceedings of the ACM on Management of Data10.1145/36988102:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and QualityProceedings of the ACM on Management of Data10.1145/36771342:4(1-31)Online publication date: 30-Sep-2024
  • (2024)RayJoin: Fast and Precise Spatial JoinProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656610(124-136)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Author Tags

  1. intervals
  2. polygons
  3. rasterization
  4. spatial filters
  5. spatial joins

Qualifiers

  • Research-article

Funding Sources

  • HFRI

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)294
  • Downloads (Last 6 weeks)27
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Finding Logic Bugs in Spatial Database Engines via Affine Equivalent InputsProceedings of the ACM on Management of Data10.1145/36988102:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and QualityProceedings of the ACM on Management of Data10.1145/36771342:4(1-31)Online publication date: 30-Sep-2024
  • (2024)RayJoin: Fast and Precise Spatial JoinProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656610(124-136)Online publication date: 30-May-2024
  • (2024)GridMesaFuture Generation Computer Systems10.1016/j.future.2024.02.010155:C(324-339)Online publication date: 18-Jul-2024
  • (2024)Raster interval object approximations for spatial intersection joinsThe VLDB Journal10.1007/s00778-024-00887-434:1Online publication date: 12-Dec-2024
  • (2024)A learning-based framework for spatial join processing: estimation, optimization and tuningThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00836-133:4(1155-1177)Online publication date: 13-Feb-2024
  • (2024)Selectivity Estimation for Spatial Filters Using Optimizer Feedback: A Machine Learning PerspectiveWeb Information Systems Engineering – WISE 202410.1007/978-981-96-0573-6_8(101-115)Online publication date: 2-Dec-2024
  • (2024): A Spatial Polygon Learned IndexFoundations of Intelligent Systems10.1007/978-3-031-62700-2_24(271-281)Online publication date: 17-Jun-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media