High-performance XML modeling of parallel queries based on MapReduce framework

Kunfang Song¹ &
Hongwei Lu¹

Abstract

As data volumes grow at an extraordinary rate, the development of cloud computing technologies is critically important to advancing research. MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. Traditional parallel XML parsing and indexing approaches are inadequate for processing large-scale XML datasets on clusters; we therefore propose an approach that exploits data parallelism in XML processing using MapReduce on Hadoop. Our solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, we introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, we design an advanced two-phase MapReduce solution that efficiently addresses labeling, indexing, and query processing on big XML data. The first MapReduce phase applies filtering, labeling, and index-building techniques: each DataNode labels elements with a map function, and a reduce function merges the results and builds the indexes. In the second phase, local XML queries over multiple partitions are performed in parallel using index-table-enabled B-SLCA. Our experimental results show the efficiency and effectiveness of the proposed parallel XML data processing approach using the MapReduce framework.
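The two-phase flow described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the abstract does not specify the SDN labeling scheme, the DHT-based index, or B-SLCA, so the sketch substitutes plain Dewey-style labels, an in-memory dictionary, and a brute-force SLCA computation. All function names (`map_label`, `reduce_index`, `slca`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import product


def map_label(partition_id, xml_text):
    """Phase 1 map (sketch): label elements and emit (keyword, label) pairs.

    Assumption: Dewey-style tuple labels prefixed with the partition id
    stand in for the paper's SDN labels. Only element text is indexed;
    tail text is ignored.
    """
    root = ET.fromstring(xml_text)
    stack = [(root, (partition_id,))]
    while stack:
        node, label = stack.pop()
        for word in (node.text or "").split():
            yield word.lower(), label
        for i, child in enumerate(node):
            stack.append((child, label + (i,)))


def reduce_index(pairs):
    """Phase 1 reduce (sketch): merge pairs into keyword -> sorted labels.

    Assumption: a plain dict stands in for the distributed index.
    """
    index = defaultdict(set)
    for word, label in pairs:
        index[word].add(label)
    return {word: sorted(labels) for word, labels in index.items()}


def common_prefix(a, b):
    """Longest common prefix of two labels, i.e. the label of their LCA."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def slca(index, keywords):
    """Phase 2 (sketch): brute-force smallest lowest common ancestors.

    Computes the LCA of every combination of keyword occurrences, then
    keeps only LCAs that have no proper descendant in the LCA set.
    Combinations spanning different partitions share no prefix and are
    dropped, so each answer is local to one partition.
    """
    lists = [index.get(k.lower(), []) for k in keywords]
    if not all(lists):
        return []
    lcas = set()
    for combo in product(*lists):
        lca = combo[0]
        for label in combo[1:]:
            lca = common_prefix(lca, label)
        if lca:
            lcas.add(lca)
    return sorted(
        l for l in lcas
        if not any(o != l and o[:len(l)] == l for o in lcas)
    )


# Usage: two hypothetical partitions, indexed once and then queried.
partitions = [
    "<lib><book><title>xml</title><author>song</author></book>"
    "<book><title>hadoop</title><author>lu</author></book></lib>",
    "<lib><book><title>xml</title><author>lu</author></book></lib>",
]
pairs = [p for pid, text in enumerate(partitions) for p in map_label(pid, text)]
index = reduce_index(pairs)
print(slca(index, ["xml", "lu"]))
```

The brute-force step is exponential in the number of keywords and is used only to keep the example short; an index-assisted algorithm such as the B-SLCA variant named in the abstract would avoid enumerating every combination.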

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, Hubei, China
Kunfang Song & Hongwei Lu

Corresponding author

Correspondence to Kunfang Song.

About this article

Cite this article

Song, K., Lu, H. High-performance XML modeling of parallel queries based on MapReduce framework. Cluster Comput 19, 1975–1986 (2016). https://doi.org/10.1007/s10586-016-0628-z

Received: 27 May 2016
Revised: 07 August 2016
Accepted: 24 August 2016
Published: 14 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10586-016-0628-z
