A Cloud Computing Implementation of XML Indexing Method Using Hadoop

Wen-Chiao Hsu, I-En Liao & Hsiao-Chen Shih

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7198)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

As data volumes grow at an incredible rate, the development of cloud computing technologies is critically important to the advancement of research. Apache Hadoop has become a widely used open-source cloud computing framework that provides a distributed file system for large-scale data processing. In this paper, we present a cloud computing implementation of an XML indexing method called NCIM (Node Clustering Indexing Method), developed by our research team, for indexing and querying a large number of big XML documents using MapReduce. The experimental results show that NCIM is suitable for a cloud computing environment. A throughput of 1200 queries per second for a very large number of queries on a 15-node cluster indicates the potential of NCIM for fast query processing over enormous collections of Internet documents.
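
As a rough illustration of what building an XML index in the MapReduce style can look like, the sketch below simulates the map, shuffle, and reduce phases in memory. It is not NCIM: the abstract does not describe NCIM's node clustering scheme, so the sketch assumes a much simpler index that maps each root-to-element label path to the documents containing it. The function names (`map_paths`, `reduce_paths`, `build_index`) and the sample documents are hypothetical, and no Hadoop API is used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_paths(doc_id, xml_text):
    """Map step: emit (label_path, doc_id) for every element of one document."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        yield path, doc_id
        for child in node:
            stack.append((child, path + "/" + child.tag))


def reduce_paths(path, doc_ids):
    """Reduce step: collapse the ids seen for one path into a sorted posting list."""
    return path, sorted(set(doc_ids))


def build_index(docs):
    """Simulate the shuffle: group map output by key, then reduce each group."""
    groups = defaultdict(list)
    for doc_id, xml_text in docs.items():
        for path, emitted_id in map_paths(doc_id, xml_text):
            groups[path].append(emitted_id)
    return dict(reduce_paths(path, ids) for path, ids in groups.items())


# Hypothetical sample input: two tiny XML documents keyed by document id.
docs = {
    "d1": "<book><title>A</title><author>X</author></book>",
    "d2": "<book><title>B</title></book>",
}
index = build_index(docs)
```

A path query such as `/book/title` then becomes a dictionary lookup, `index.get("/book/title", [])`. On a real cluster the grouping step would be performed by the framework's shuffle rather than by an in-memory dictionary.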

This research was partially supported by the National Science Council, Taiwan, under contract no. NSC100-2221-E-005-070.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Chung-Hsing University, 250 Kuo Kuang Road, Taichung, 402, Taiwan
Wen-Chiao Hsu, I-En Liao & Hsiao-Chen Shih

Editor information

Editors and Affiliations

Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien-Kung Road, 80778, Kaohsiung, Taiwan
Jeng-Shyang Pan
Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, No. 140, Min-Shen Road, 40306, Taichung, Taiwan
Shyi-Ming Chen
Wrocław University of Technology, Wyb. Wyspiańskiego 27, 50-370, Wrocław, Poland
Ngoc Thanh Nguyen

About this paper

Cite this paper

Hsu, WC., Liao, IE., Shih, HC. (2012). A Cloud Computing Implementation of XML Indexing Method Using Hadoop. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science, vol 7198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28493-9_28

DOI: https://doi.org/10.1007/978-3-642-28493-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28492-2
Online ISBN: 978-3-642-28493-9
eBook Packages: Computer Science, Computer Science (R0)
