[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

M-Grid: a distributed framework for multidimensional indexing and querying of location based data

Published: 01 March 2017 Publication History

Abstract

The widespread use of mobile devices and the real time availability of user-location information is facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, scalability for supporting millions of users, real-time querying capability and analyzing large volumes of data. Cloud computing aided a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract values from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much needed infrastructure to support LBSs. However, complex queries over multidimensional data cannot be processed efficiently as they do not provide means to access multiple attributes. In this paper, we present M-Grid, a unifying indexing and a data distribution framework which enables key-value stores to support multidimensional queries. We organize a set of nodes in a modified P-Grid overlay network which provides efficient data distribution, fault-tolerance and query processing over multidimensional data. To index, we use Hilbert Space Filling Curve based linearization technique which preserves the data locality to efficiently manage multidimensional data in a key-value store. We propose algorithms to dynamically process range and k nearest neighbor (kNN) queries on linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent from the underlying storage layer and can be implemented on any cloud infrastructure. Our experiments on Amazon EC2 show that M-Grid achieves a performance improvement of three orders of magnitude in comparison to MapReduce and four times to that of MD-HBase scheme.

References

[1]
http://www.itu.int/
[2]
Union, I.T.: The world in 2015: Ict facts and figures. {Online}. Available: https://www.itu.int/en/ITU-D/Statistics/Documents/facts/ICTFactsFigures2015.pdf (2015)
[3]
McMahon, M., Steketee, C.: Investigation of proposed applications for lbs enabled mobile handsets. In:ICMB '06. International Conference on Mobile Business, 2006, pp. 26---26 (2006)
[4]
{Online}. Available: http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/index.htm
[5]
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '84, pp. 47---57 (1984)
[6]
Finkel, R., Bentley, J.: Quad trees a data structure for retrieval on composite keys. Acta Informatica 4, 1---9 (1974)
[7]
http://www.ibm.com/software/products/en/db2spaext
[8]
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, vol. 7, ser. OSDI '06, pp. 15---15 (2006)
[9]
http://hbase.apache.org/
[10]
http://cassandra.apache.org/
[11]
Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107---113 (2008)
[12]
Hilbert, D.: Ueber stetige abbildung einer linie auf ein flashenstuck. Mathematishe annalen 32, 459---460 (1893)
[13]
Aberer, K., Cudré-Mauroux, P., Datta, A., Despotovic, Z., Hauswirth, M., Punceva, M., Schmidt, R.: P-grid: a self-organizing structured p2p system. SIGMOD Rec. 32, 29---33 (2003)
[14]
Nishimura, S., Das, S., Agrawal, D., Abbadi, A.E.: Md-hbase: design and implementation of elastic infrastructure for cloud-scale location services. Distrib. Parallel Databases 31, 289---319 (2014)
[15]
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon's highly available key-value store. SIGOPS Oper. Syst. Rev. 41, 205---220 (2007)
[16]
Cooper, B.F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., Yerneni, R.: Pnuts: Yahoo!'s hosted data serving platform. Proc. VLDB Endow. 2(1), 1277---1288 (2008)
[17]
Ghemawat, S., Gobioff, H., Leung, S.-T.: The google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, ser. SOSP '03, pp. 29---43 (2003)
[18]
http://hadoop.apache.org/
[19]
Wu, S., Wu, K.-L.: An indexing framework for efficient retrieval on the cloud. IEEE Data Eng. Bull. 32(1), 75---82 (2009)
[20]
Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C.: Indexing multi-dimensional data in a cloud system. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '10, pp. 591---602 (2010)
[21]
Zhang, X., Ai, J., Wang, Z., Lu, J., Meng, X.: An efficient multi-dimensional index for cloud data management. In: Proceedings of the First International Workshop on Cloud Data Management, ser. CloudDB '09, pp. 17---24 (2009)
[22]
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509---517 (1975)
[23]
Ding, L., Qiao, B., Wang, G., Chen, C.: An efficient quad-tree based index structure for cloud data management. In: Web-Age Information Management. Lecture Notes in Computer Science, vol. 6897, pp. 238---250 (2011)
[24]
Suprio Ray, R.B., Goel, A.K.: Supporting location-based services in a main-memory database. In: Proceedings of the IEEE International Conference on Mobile Data Management (MDM), (2014)
[25]
Jagadish, H., Ooi, B.-C., Vu, Q.H., Zhang, R., Zhou, A.: Vbi-tree: A peer-to-peer framework for supporting multi-dimensional indexing schemes. In: Data Engineering, 2006. in: ICDE '06 Proceedings of the 22nd International Conference on, pp. 34---34 (2006)
[26]
Li, F., Chen, R., Zhou, C., Zhang, M.: A novel geo-spatial image storage method based on hilbert space filling curves. In: 2010 18th International Conference on Geoinformatics, pp. 1---4 (2010)
[27]
Pavanakumar, M., Kaushik, K.: Revisiting the space-filling curves for storage, reordering and partitioning mesh based data in scientific computing. In: 2013 20th International Conference on High Performance Computing (HiPC), pp. 362---367 (2013)
[28]
Hu, C., Zhao, Y., Wei, X., Du, B., Huang, Y., Ma, D., Li, X.: Actgis: A web-based collaborative tiled geospatial image map system. In: 2010 IEEE Symposium on Computers and Communications (ISCC), pp. 521---528 (2010)
[29]
Butz, A.R.: Alternative algorithm for hilbert's space-filling curve. IEEE Trans. Comput. 20, 424---426 (1971)
[30]
Bially, T.: Space-filling curves: their generation and their application to bandwidth reduction. IEEE Trans. Inf. Theory 15(6), 658---664 (1969)
[31]
Hamilton, C., Rau-Chaplin, A.: Compact hilbert indices for multi-dimensional data. In: First International Conference on Complex, Intelligent and Software Intensive Systems, 2007. CISIS 2007, pp. 139---146 (2007)
[32]
Morton, G.: A computer oriented geodetic data base and a new technique in file sequencing. International Business Machines Company, {Online}. Available: http://books.google.com/books?id=9FFdHAAACAAJ (1966)
[33]
Gray, F.: Pulse code communication. (1953)
[34]
Moon, B., Jagadish, H., Faloutsos, C., Saltz, J.: Analysis of the clustering properties of the hilbert space-filling curve. Knowl. Data Eng. IEEE Trans. 13, 124---141 (2001)
[35]
Abel, D.J., Mark, D.M.: A comparative analysis of some two-dimensional orderings. Int. J. Geogr. Inf. Syst. 4, 21---31 (1990)
[36]
Mokbel, M.F., Aref, W.G., Kamel, I.: Performance of multi-dimensional space-filling curves. In: Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, ser. GIS '02, pp. 149---154 (2002)
[37]
Clarke, I., Sandberg, O., Wiley, B., Hong, T.W.: Freenet: A distributed anonymous information storage and retrieval system. In: International Workshop on Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Unobservability, pp. 46---66 (2001)
[38]
Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM '01, pp. 149---160 (2001)
[39]
Jagadish, H.V., Ooi, B.C., Vu, Q.H.: Baton: A balanced tree structure for peer-to-peer networks. In: Proceedings of the 31st International Conference on Very Large Data Bases, ser. VLDB '05, pp. 661---672 (2005)
[40]
Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. SIGCOMM Comput. Commun. Rev. 31, 161---172 (2001)
[41]
Rowstron, A.I.T., Druschel, P.: Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, ser. Middleware '01, pp. 329---350 (2001)
[42]
Crainiceanu, A., Linga, P., Machanavajjhala, A., Gehrke, J., Shanmugasundaram, J.: P-ring: An efficient and robust p2p range index structure. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '07, pp. 223---234 (2007)
[43]
Aberer, K., Datta, A., Hauswirth, M., Schmidt, R.: Indexing data-oriented overlay networks. In: Proceedings of the 31st International Conference on Very Large Data Bases, ser. VLDB '05, pp. 685---696 (2005)
[44]
Datta, A., Hauswirth, M., John, R., Schmidt, R., Aberer, K.: Range queries in trie-structured overlays. In: Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing, ser. P2P '05, pp. 57---66 (2005)
[45]
Rosch, P., Sattler, K., von der Weth, C., Buchmann, E.: Best effort query processing in dht-based p2p systems. In: 21st International Conference on Data Engineering Workshops, 2005, pp. 1186---1186 (2005)
[46]
Lawder, J.K.: Querying multi-dimensional data indexed using the hilbert space-filling curve. SIGMOD Rec. 30, 2001 (2001)
[47]
https://code.google.com/p/uzaygezen/
[48]
Tang, Y., Xu, J., Zhou, S., Lee, W.-C., Deng, D., Wang, Y.: A lightweight multidimensional index for complex queries over dhts. IEEE Trans. Parallel Distrib. Syst. 22, 2046---2054 (2011)
[49]
Tanin, E., Nayar, D., Samet, H.: An efficient nearest neighbor algorithm for p2p settings. In: Proceedings of the 2005 National Conference on Digital Government Research, ser. dg.o '05, pp. 21---28 (2005)
[50]
Gao, J.: Efficient support for similarity searches in dht-based peer-to-peer systems. In: In IEEE International Conference on Communications (ICC07 (2007)
[51]
Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5, 1016---1027 (2012)
[52]
Stupar, A., Michel, S., Schenkel, R.: Rankreduce - processing k-nearest neighbor queries on top of mapreduce. In: In LSDS-IR, (2010)
[53]
Tao, Y., Zhang, J., Papadias, D., Mamoulis, N.: An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. IEEE Trans. Knowl. Data Eng. 16, 1169---1184 (2004)
[54]
Brinkhoff, T.: A framework for generating network-based moving objects. Geoinformatica 6, 153---180 (2002)
[55]
Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with ycsb. In: Proceedings of the 1st ACM Symposium on Cloud Computing, ser. SoCC '10, pp. 143---154 (2010)

Cited By

View all
  • (2022)Towards a Grid-Based Framework for Supporting Range Aggregate Queries Over Big Sensor Network ReadingsInternational Journal of Distributed Systems and Technologies10.4018/IJDST.29624813:1(1-21)Online publication date: 29-Apr-2022
  • (2022)Research on Optimization and Application of University Student Development and Management Strategy Driven by Multidimensional Big DataScientific Programming10.1155/2022/65380692022Online publication date: 1-Jan-2022
  • (2022)BBoxDB streams: scalable processing of multi-dimensional data streamsDistributed and Parallel Databases10.1007/s10619-022-07408-840:2-3(559-625)Online publication date: 1-Sep-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Distributed and Parallel Databases
Distributed and Parallel Databases  Volume 35, Issue 1
March 2017
79 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 March 2017

Author Tags

  1. Location based services
  2. Multidimensional indexing
  3. Peer-to-peer system

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Towards a Grid-Based Framework for Supporting Range Aggregate Queries Over Big Sensor Network ReadingsInternational Journal of Distributed Systems and Technologies10.4018/IJDST.29624813:1(1-21)Online publication date: 29-Apr-2022
  • (2022)Research on Optimization and Application of University Student Development and Management Strategy Driven by Multidimensional Big DataScientific Programming10.1155/2022/65380692022Online publication date: 1-Jan-2022
  • (2022)BBoxDB streams: scalable processing of multi-dimensional data streamsDistributed and Parallel Databases10.1007/s10619-022-07408-840:2-3(559-625)Online publication date: 1-Sep-2022
  • (2022)Hybrid fragmentation of medical images’ attributes for multidimensional indexingCluster Computing10.1007/s10586-021-03356-725:1(215-230)Online publication date: 1-Feb-2022
  • (2020)Social-aware spatial keyword top-k group queryDistributed and Parallel Databases10.1007/s10619-020-07292-038:3(601-623)Online publication date: 8-May-2020
  • (2020)A parameter-level parallel optimization algorithm for large-scale spatio-temporal data miningDistributed and Parallel Databases10.1007/s10619-020-07287-x38:3(739-765)Online publication date: 25-Feb-2020
  • (2018)Pedigree-ing your big dataProceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing10.1109/CCGRID.2018.00100(675-681)Online publication date: 1-May-2018

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media