Abstract
The best region search (BRS) is one of the major research problems in geospatial data processing applications. The BRS problem objective is to discover the ideal location of a particular size specified rectangle, with a predetermined end goal of maximizing the user-defined scoring function. The existing solutions for finding the top-k best regions have focused on designing algorithms for centralized settings. These solutions are not suitable for processing massive datasets. In this paper, we enable a Hadoop MapReduce-based parallel and distributed computation to obtain significant improvement in the performance. In addition to the parallel and distributed setting, we also incorporate early pruning strategies to eliminate the need to process rectangles that are not part of the output to minimize the communication cost involved in computing k-BRS. We later introduced a redistribution strategy over the initially proposed methodology that handles skew inherited from the dataset. Our results are obtained from extensive experimentation, both synthetic and real-world datasets.
Similar content being viewed by others
References
Anselin, L.: Local indicators of spatial association—LISA. Geogr. Anal. 27(2), 93–115 (1995)
Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X.; et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd 96, 226–231 (1996)
Feng, K.; Cong, G.; Bhowmick, S.S.; Peng, W.C.; Miao, C.: Towards best region search for data exploration. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1055–1070 (2016)
Amagata, D.; Hara, T.: A general framework for MaxRS and MaxCRS monitoring in spatial data streams. ACM Trans. Spatial Algorithms Syst. (TSAS) 3(1), 1–34 (2017)
Cao, X.; Cong, G.; Jensen, C.S.; Yiu, M.L.: Retrieving regions of interest for user exploration (2014)
Choi, D.W.; Chung, C.W.; Tao, Y.: A scalable algorithm for maximizing range sum in spatial databases. Proc. VLDB Endow. 5(11), 1088–1099 (2012)
Feng, K.; Guo, T.; Cong, G.; Bhowmick, S.S.; Ma, S.: SURGE: continuous detection of bursty regions over a stream of spatial objects. IEEE Trans. Knowl. Data Eng. (2019)
Tao, Y.; Hu, X.; Choi, D.W.; Chung, C.W.: Approximate MaxRS in spatial databases. In: 39th International Conference on Very Large Data Bases 2013, VLDB 2013, International Conference on Very Large Data Bases, vol. 6, pp. 1546–1557 (2013)
Wong, R.C.W.; Özsu, M.T.; Yu, P.S.; Fu, A.W.C.; Liu, L.: Efficient method for maximizing bichromatic reverse nearest neighbor. Proc. VLDB Endow. 2(1), 1126–1137 (2009)
Cao, X.; Cong, G.; Guo, T.; Jensen, C.S.; Ooi, B.C.: Efficient processing of spatial group keyword queries. ACM Trans. Database Syst (TODS) 40(2), 1–48 (2015)
Choi, D.W.; Pei, J.; Lin, X.: Finding the minimum spatial keyword cover. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 685–696. IEEE (2016)
Deng, K.; Li, X.; Lu, J.; Zhou, X.: Best keyword cover search. IEEE Trans. Knowl. Data Eng. 27(1), 61–73 (2014)
Luo, S.; Luo, Y.; Zhou, S.; Cong, G.; Guan, J.; Yong, Z.: Distributed spatial keyword querying on road networks. In: EDBT, Citeseer, pp. 235–246 (2014)
Eldawy, A.; Mokbel, M.F.: The era of big spatial data: a survey. Found. Trends Databases 6(3–4), 163–273 (2016)
Sheng, C.; Tao, Y.: New results on two-dimensional orthogonal range aggregation in external memory. In: Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 129–139 (2011)
Imai, H.; Asano, T.: Finding the connected components and a maximum clique of an intersection graph of rectangles in the plane. J. Algorithms 4(4), 310–323 (1983)
Nandy, S.C.; Bhattacharya, B.B.: A unified algorithm for finding maximum and minimum object enclosing rectangles and cuboids. Comput. Math. Appl. 29(8), 45–61 (1995)
Skoutas, D.; Sacharidis, D.; Patroumpas, K.: Efficient progressive and diversified top-k best region search. In: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 299–308 (2018)
Goodrich, M.T.; Tsay, J.J.; Vengroff, D.E.; Vitter, J.S.: External-memory computational geometry. In: Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science, pp. 714–723. IEEE (1993)
Qi, J.; Kumar, V.; Zhang, R.; Tanin, E.; Trajcevski, G.; Scheuermann, P.: Continuous maintenance of range sum heat maps. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1625–1628. IEEE (2018)
Athanasiou, S.; Giannopoulos, G.; Graux, D.; Karagiannakis, N.; Lehmann, J.; Ngomo, A.C.N.; Patroumpas, K.; Sherif, M.A.; Skoutas, D.: Big POI data integration with linked data technologies. In: EDBT, pp. 477–488 (2019)
Choi, D.; Park, C.S.; Chung, Y.D.: Progressive top-k subarray query processing in array databases. Proc. VLDB Endow. 12(9), 989–1001 (2019)
Skoutas, D., Sacharidis, D., Patroumpas, K.: Discovering mixture-based best regions of arbitrary shapes. In: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, pp. 468–479 (2021)
Patroumpas, K.; Skoutas, D.; Mandilaras, G.; Giannopoulos, G.; Athanasiou, S.: Exposing points of interest as linked geospatial data. In: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, pp. 21–30 (2019)
Alexakis, M.; Athanasiou, S.; Kouvaras, Y.; Patroumpas, K.; Skoutas, D.: Slipo: Scalable data integration for points of interest. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Geospatial Data Access and Processing APIs, pp. 1–2 (2020)
Shahrivari, H.; Olma, M.; Papapetrou, O.; Skoutas, D.; Ailamaki, A.: A parallel and distributed approach for diversified top-k best region search. In: EDBT, pp. 265–276 (2020)
Xiao, X.; Yao, B.; Li, F.: Optimal location queries in road network databases. In: 2011 IEEE 27th International Conference on Data Engineering, pp. 804–815. IEEE (2011)
Chen, Z.; Yuan, Q.; Liu, W.: Monitoring best region in spatial data streams in road networks. Data Knowl. Eng. 120, 100–118 (2019)
Dean, J.; Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Qiao, B.; Hu, B.; Zhu, J.; Wu, G.; Giraud-Carrier, C.; Wang, G.: A top-k spatial join querying processing algorithm based on spark. Inf. Syst. 87(101), 419 (2020)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Potluri, A., Bhattu, S.N., Kumar, N.V.N. et al. Enhanced-Sweep: Communication Cost Efficient Top-K Best Region Search. Arab J Sci Eng 48, 2121–2132 (2023). https://doi.org/10.1007/s13369-022-07084-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13369-022-07084-x