research-article

A Survey on Big Data Processing Frameworks for Mobility Analytics

Published: 31 August 2021

Abstract

In the current era of big spatial data, the vast amount of mobility data produced by sensors, GPS-equipped devices, surveillance networks, radars, and similar sources poses new challenges for mobility analytics. A cornerstone for performing mobility analytics at scale is the availability of big data processing frameworks and techniques tailored to spatial and spatio-temporal data. Motivated by this pressing need, in this paper we survey big data processing frameworks for mobility analytics. Particular focus is placed on the underlying techniques, namely indexing, partitioning, and query processing, which are essential for efficient and scalable data management. In this way, this report serves as a useful guide to state-of-the-art methods and modern techniques for scalable mobility data management and analytics.
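To make the interplay of partitioning, indexing, and query processing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this survey or from any specific framework: spatio-temporal points are assigned to a hypothetical uniform x-y-t grid, each grid cell is keyed by a Z-order (Morton) code (a space-filling curve commonly used to linearize multidimensional keys), and a box query first scans only the cells that overlap the query box (filter step) before checking exact coordinates (refine step). The names `Point`, `GridIndex`, and `interleave3`, the bounds, and the cell count are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    obj_id: str
    x: float  # e.g., longitude
    y: float  # e.g., latitude
    t: float  # e.g., timestamp in seconds


def interleave3(ix: int, iy: int, it: int, bits: int) -> int:
    """Z-order (Morton) code: interleave the low `bits` bits of three cell indices."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((it >> b) & 1) << (3 * b + 2)
    return code


class GridIndex:
    """Hypothetical uniform x-y-t grid; each non-empty cell acts as one partition."""

    def __init__(self, bounds, cells_per_dim=16):
        (self.x0, self.x1), (self.y0, self.y1), (self.t0, self.t1) = bounds
        self.n = cells_per_dim
        # Enough bits per dimension to encode cell indices 0..n-1.
        self.bits = max(1, (cells_per_dim - 1).bit_length())
        self.partitions = defaultdict(list)  # Morton key -> points in that cell

    def _cell(self, v, lo, hi):
        # Map a coordinate to a cell index in [0, n-1]; out-of-range values are clamped.
        i = int((v - lo) / (hi - lo) * self.n)
        return min(max(i, 0), self.n - 1)

    def _cells(self, x, y, t):
        return (self._cell(x, self.x0, self.x1),
                self._cell(y, self.y0, self.y1),
                self._cell(t, self.t0, self.t1))

    def insert(self, p: Point) -> None:
        key = interleave3(*self._cells(p.x, p.y, p.t), self.bits)
        self.partitions[key].append(p)

    def range_query(self, xr, yr, tr):
        """Filter: visit only cells overlapping the box. Refine: exact coordinate check."""
        lo = self._cells(xr[0], yr[0], tr[0])
        hi = self._cells(xr[1], yr[1], tr[1])
        hits = []
        for ix in range(lo[0], hi[0] + 1):
            for iy in range(lo[1], hi[1] + 1):
                for it in range(lo[2], hi[2] + 1):
                    key = interleave3(ix, iy, it, self.bits)
                    for p in self.partitions.get(key, ()):
                        if (xr[0] <= p.x <= xr[1] and yr[0] <= p.y <= yr[1]
                                and tr[0] <= p.t <= tr[1]):
                            hits.append(p)
        return hits


# Usage: two hypothetical vehicles, one query box over the first 10 minutes.
idx = GridIndex(bounds=((0, 10), (0, 10), (0, 3600)), cells_per_dim=8)
idx.insert(Point("car-1", 2.5, 3.1, 120))
idx.insert(Point("car-2", 8.0, 9.5, 3000))
print([p.obj_id for p in idx.range_query((0, 5), (0, 5), (0, 600))])  # ['car-1']
```

A uniform grid is the simplest possible partitioning and is chosen here only for brevity; under skewed data some cells overflow while others stay empty, which is why scalable systems typically favor adaptive schemes such as quadtree- or R-tree-based partitioning.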

Published In

ACM SIGMOD Record, Volume 50, Issue 2
June 2021
42 pages
ISSN: 0163-5808
DOI: 10.1145/3484622
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in ACM SIGMOD Record, Volume 50, Issue 2