Abstract
As data volumes grow at an extraordinary rate, the development of cloud computing technologies has become critical to advancing research. Apache Hadoop is a widely used open source cloud computing framework that provides a distributed file system for large-scale data processing. In this paper, we present a cloud computing implementation of NCIM (Node Clustering Indexing Method), an XML indexing method developed by our research team, for indexing and querying a large number of big XML documents using MapReduce. The experimental results show that NCIM is well suited to a cloud computing environment. On a 15-node cluster, the implementation reaches a throughput of 1200 queries per second under a very large volume of queries, which indicates the potential of NCIM for fast query processing over enormous collections of Internet documents.
This research was partially supported by the National Science Council, Taiwan, under contract no. NSC100-2221-E-005-070.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hsu, WC., Liao, IE., Shih, HC. (2012). A Cloud Computing Implementation of XML Indexing Method Using Hadoop. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science, vol 7198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28493-9_28
DOI: https://doi.org/10.1007/978-3-642-28493-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28492-2
Online ISBN: 978-3-642-28493-9
eBook Packages: Computer Science, Computer Science (R0)