Abstract
XML data on the web is heterogeneous in both content and tagging. Integrating, querying, and presenting heterogeneous collections pose many challenges. The structure of XML documents is useful for these tasks; however, not every XML document on the web includes a schema. We propose and implement a framework for efficient schema extraction, integration, and relational schema mapping from heterogeneous XML documents collected from the web. Our approach uses the Schema Extended Context Free Grammar (SECFG) to model XML schemas and transform them into relational schemas. Unlike other implementations, our approach can also identify many XML constraints and transform them into relational schema constraints, and it accepts input in multiple XML schema languages, e.g., DTD or XSD, or with no XML schema at all. We compare our approach with other proposed approaches and conclude that ours offers more functionality, greater efficiency, and greater flexibility.
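As a rough illustration of the pipeline described above (extracting structure from schema-less documents, merging it across documents, and mapping it to relations), the following Python sketch may help. It is not the SECFG-based method of the paper: the function names `infer_schema` and `to_relational` are hypothetical, and the mapping rule (one table per element type that has attributes or children, with leaf children inlined as text columns) is a simplifying assumption that ignores cardinality, ordering, name collisions, and all XML constraints.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_schema(documents):
    """Merge the structure of several XML documents (hypothetical helper).

    For every element tag, record the attribute names and child tags seen
    in any document, so the result covers heterogeneous inputs.
    """
    schema = defaultdict(lambda: {"attrs": set(), "children": set()})
    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            entry = schema[elem.tag]
            entry["attrs"].update(elem.attrib)
            entry["children"].update(child.tag for child in elem)
    return schema


def to_relational(schema):
    """Map the merged structure to CREATE TABLE statements (assumed rule).

    Elements with attributes or children become tables; leaf children
    become TEXT columns of their parent; nested tables get a foreign key
    that points back to the parent table.
    """
    complex_tags = {t for t, e in schema.items() if e["attrs"] or e["children"]}
    columns = {t: ["id INTEGER PRIMARY KEY"] for t in sorted(complex_tags)}
    for tag in sorted(complex_tags):
        entry = schema[tag]
        columns[tag] += [f"{a} TEXT" for a in sorted(entry["attrs"])]
        for child in sorted(entry["children"]):
            if child in complex_tags:
                columns[child].append(f"{tag}_id INTEGER REFERENCES {tag}(id)")
            else:
                columns[tag].append(f"{child} TEXT")
    return {t: f"CREATE TABLE {t} ({', '.join(c)});" for t, c in columns.items()}


if __name__ == "__main__":
    docs = [
        '<book isbn="1"><title>A</title><author><name>X</name></author></book>',
        "<book><title>B</title><year>2014</year></book>",
    ]
    for statement in to_relational(infer_schema(docs)).values():
        print(statement)
```

The two sample documents disagree on which children a `book` has, so the merged `book` table carries the union of the observed columns. A constraint-preserving mapping of the kind the abstract claims would additionally need keys, occurrence bounds, and data types, none of which this sketch attempts.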
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Janga, P., Davis, K.C. (2014). Mapping Heterogeneous XML Document Collections to Relational Databases. In: Yu, E., Dobbie, G., Jarke, M., Purao, S. (eds) Conceptual Modeling. ER 2014. Lecture Notes in Computer Science, vol 8824. Springer, Cham. https://doi.org/10.1007/978-3-319-12206-9_7
DOI: https://doi.org/10.1007/978-3-319-12206-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12205-2
Online ISBN: 978-3-319-12206-9
eBook Packages: Computer Science, Computer Science (R0)