DOI: 10.1145/3139958.3139963
research-article

Spatio-Temporal Join on Apache Spark

Published: 07 November 2017

Abstract

The need to process extremely large volumes of spatial data effectively has led many organizations to employ distributed processing frameworks. Apache Spark is one such open-source framework that is enjoying widespread adoption. Within this data space, it is important to note that most observational data (i.e., data collected by sensors, either moving or stationary) has a temporal component, or timestamp. To perform advanced analytics and gain insights, the temporal component becomes as important as the spatial and attribute components. In this paper, we detail several variants of a spatial join operation that address spatial, temporal, and attribute-based joins. Our spatial join technique differs from other approaches in that it combines spatial, temporal, and attribute predicates in the join operator.
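As an illustration only, the idea of evaluating spatial, temporal, and attribute predicates together in one join operator can be sketched in plain Python. This is not the paper's distributed Spark implementation: it is a single-machine nested-loop join, and the `Observation` type, the `st_join` function, and the `max_dist` and `max_dt` thresholds are hypothetical names chosen for the example. Coordinates are assumed to be planar and timestamps to be in seconds.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: a point observation with a timestamp and one attribute."""
    x: float        # planar x coordinate (assumed projected units)
    y: float        # planar y coordinate
    t: float        # timestamp in seconds
    category: str   # attribute compared by the attribute predicate


def st_join(left, right, max_dist, max_dt):
    """Naive nested-loop join: keep pairs that pass all three predicates."""
    pairs = []
    for a in left:
        for b in right:
            near_in_space = hypot(a.x - b.x, a.y - b.y) <= max_dist
            near_in_time = abs(a.t - b.t) <= max_dt
            same_attribute = a.category == b.category
            if near_in_space and near_in_time and same_attribute:
                pairs.append((a, b))
    return pairs


left = [Observation(0.0, 0.0, 100.0, "bus")]
right = [
    Observation(3.0, 4.0, 130.0, "bus"),    # passes all three predicates
    Observation(3.0, 4.0, 500.0, "bus"),    # fails the temporal predicate
    Observation(30.0, 40.0, 110.0, "bus"),  # fails the spatial predicate
    Observation(3.0, 4.0, 130.0, "tram"),   # fails the attribute predicate
]
matches = st_join(left, right, max_dist=5.0, max_dt=60.0)
```

Only the first record in `right` is paired with the record in `left`. A distributed version would additionally partition the data so that candidate pairs land in the same partition, which this sketch does not attempt.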
In addition, our spatio-temporal join algorithm and implementation differ from others in that they run in a commercial off-the-shelf (COTS) application. The users of this functionality are assumed to be GIS analysts with little, if any, knowledge of the implementation details of spatio-temporal joins or distributed processing. They are comfortable using simple tools that do not provide the ability to tweak the configuration of the



Published In

SIGSPATIAL '17: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2017
677 pages
ISBN: 9781450354905
DOI: 10.1145/3139958
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HDFS
  2. Hadoop
  3. Spark
  4. Spatial join
  5. distributed processing
  6. spatio-temporal join

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGSPATIAL'17
Acceptance Rates

SIGSPATIAL '17 Paper Acceptance Rate 39 of 193 submissions, 20%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 1

Reflects downloads up to 10 Dec 2024.

Cited By

  • (2024) SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor. IEEE Transactions on Parallel and Distributed Systems 35:1 (73-88). DOI: 10.1109/TPDS.2023.3330669. Online publication date: Jan-2024
  • (2022) Optimization of Population Mobility Model of Artificial Society Based on Big Data Analytics. 2022 8th International Conference on Big Data and Information Analytics (BigDIA) (213-219). DOI: 10.1109/BigDIA56350.2022.9874104. Online publication date: 24-Aug-2022
  • (2021) Spatio-Temporal Grid Partitioning for Large Spatio-Temporal Data Query Processing on Spark. Journal of Digital Contents Society 22:8 (1315-1322). DOI: 10.9728/dcs.2021.22.8.1315. Online publication date: 31-Aug-2021
  • (2021) A Survey on Big Data Processing Frameworks for Mobility Analytics. ACM SIGMOD Record 50:2 (18-29). DOI: 10.1145/3484622.3484626. Online publication date: 31-Aug-2021
  • (2021) Hierarchical Semantics Matching For Heterogeneous Spatio-temporal Sources. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (565-575). DOI: 10.1145/3459637.3482350. Online publication date: 26-Oct-2021
  • (2021) SPEAR: Dynamic Spatio-Temporal Query Processing over High Velocity Data Streams. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2279-2284). DOI: 10.1109/ICDE51399.2021.00237. Online publication date: Apr-2021
  • (2021) Big Spatial and Spatio-Temporal Data Analytics Systems. Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII (155-180). DOI: 10.1007/978-3-662-62919-2_7. Online publication date: 17-Jan-2021
  • (2020) MRSweep: Distributed In-Memory Sweep-line for Scalable Object Intersection Problems. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) (324-333). DOI: 10.1109/DSAA49011.2020.00046. Online publication date: Oct-2020
  • (2019) Distributed Spatial and Spatio-Temporal Join on Apache Spark. ACM Transactions on Spatial Algorithms and Systems 5:1 (1-28). DOI: 10.1145/3325135. Online publication date: 27-Jun-2019
  • (2019) Building a Large-Scale Microscopic Road Network Traffic Simulator in Apache Spark. 2019 20th IEEE International Conference on Mobile Data Management (MDM) (320-328). DOI: 10.1109/MDM.2019.00-42. Online publication date: Jun-2019
