
HBase storage schemas for massive spatial vector data

Published: 01 December 2017

Abstract

With the development of Geographic Information Systems (GIS), the storage requirements of spatial vector data are increasing dramatically, and designing an efficient storage schema for massive spatial vector data has become a key step for GIS. Cloud computing with NoSQL databases such as HBase can provide highly concurrent, scalable storage services for massive spatial vector data. However, NoSQL storage schemas for spatial vector data are rarely seen. In this paper, two HBase storage schemas for spatial vector data are proposed: the Z schema, whose rowkeys are based on the Z curve, and the ID schema, whose rowkeys are based on geometry object identifiers. In our experiments, the region query efficiency of the two storage schemas is tested on a cloud framework we built, using Z curves of different orders and different query ranges. Experimental results show that, for both schemas, response time grows as the query range increases. More importantly, the response time of the Z schema is about one-fifth that of the ID schema in all cases. This indicates that the Z schema is the better solution for storing spatial vector data in HBase.
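The abstract does not spell out how a Z-curve rowkey is built or how a region query uses it. The Python sketch below shows one common way to do both, under assumptions of our own rather than the paper's: the data extent is a fixed bounding box, a Z curve of order n splits it into a 2^n x 2^n grid, and each geometry is keyed by the cell containing a representative point (for example its MBR centre) followed by its geometry ID. All function names (`interleave_bits`, `cell_of`, `z_rowkey`, `query_ranges`, `scan_bounds`) are hypothetical.

```python
"""Hypothetical Z-schema helpers: Z-curve (Morton) rowkeys for HBase.

Assumptions, not taken from the paper: a fixed bounding box, a Z curve of
order n over a 2^n x 2^n grid, and rowkey = zero-padded Z value + geometry ID.
"""


def interleave_bits(col: int, row: int, order: int) -> int:
    """Z value (Morton code) of grid cell (col, row)."""
    z = 0
    for i in range(order):
        z |= ((col >> i) & 1) << (2 * i)      # column bit -> even position
        z |= ((row >> i) & 1) << (2 * i + 1)  # row bit -> odd position
    return z


def cell_of(x: float, y: float, bbox, order: int) -> tuple:
    """(col, row) of the cell containing (x, y); bbox = (minx, miny, maxx, maxy)."""
    min_x, min_y, max_x, max_y = bbox
    size = 1 << order
    col = int((x - min_x) / (max_x - min_x) * size)
    row = int((y - min_y) / (max_y - min_y) * size)
    # Clamp so points on the upper edges fall into the last cell.
    return min(max(col, 0), size - 1), min(max(row, 0), size - 1)


def key_width(order: int) -> int:
    """Digits needed for the largest Z value at this order."""
    return len(str((1 << (2 * order)) - 1))


def z_rowkey(x: float, y: float, geom_id: str, bbox, order: int) -> str:
    """Zero-padded Z value + geometry ID, so lexicographic order follows Z order."""
    z = interleave_bits(*cell_of(x, y, bbox, order), order)
    return f"{z:0{key_width(order)}d}_{geom_id}"


def query_ranges(qbox, bbox, order: int) -> list:
    """Merged (start, end) Z-value ranges of the cells a query window overlaps."""
    c0, r0 = cell_of(qbox[0], qbox[1], bbox, order)
    c1, r1 = cell_of(qbox[2], qbox[3], bbox, order)
    zs = sorted(interleave_bits(c, r, order)
                for c in range(c0, c1 + 1)
                for r in range(r0, r1 + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current contiguous run
        else:
            ranges.append([z, z])      # start a new run
    return [tuple(r) for r in ranges]


def scan_bounds(z_range, order: int) -> tuple:
    """Start and stop row for one range; HBase treats the stop row as exclusive."""
    start, end = z_range
    width = key_width(order)
    return f"{start:0{width}d}", f"{end + 1:0{width}d}"
```

For example, with `bbox = (0, 0, 100, 100)` and order 2, the point (60, 30) falls in cell (2, 1), whose Z value is 6, giving rowkey `06_g2`; the window (0, 0, 49, 49) covers Z values 0 to 3, which merge into a single scan from `00` to `04`. Under these assumptions, a region query becomes a few contiguous row-range scans followed by an exact geometry filter, whereas an ID-keyed table offers no such spatial locality; this is one plausible reason for the gap the experiments report. Geometries that span several cells would need extra handling (for example, enlarging the query window), which this sketch omits.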

Published In

Cluster Computing, Volume 20, Issue 4
Dec 2017
862 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Cloud computing
  2. HBase
  3. Spatial vector data
  4. Storage schema
