Mapping Heterogeneous XML Document Collections to Relational Databases

  • Conference paper
Conceptual Modeling (ER 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8824)

Abstract

XML web data is heterogeneous in terms of content and tagging of information. Integrating, querying, and presenting heterogeneous collections presents many challenges. The structure of XML documents is useful for achieving these tasks; however, not every XML document on the web includes a schema. We propose and implement a framework for efficient schema extraction, integration, and relational schema mapping from heterogeneous XML documents collected from the web. Our approach uses the Schema Extended Context Free Grammar (SECFG) to model XML schemas and transform them into relational schemas. Unlike other implementations, our approach is also able to identify and transform many XML constraints into relational schema constraints while supporting multiple XML schema languages, e.g., DTD or XSD, or no XML schema, as input. We compare our approach with other proposed approaches and conclude that we offer better functionality more efficiently and with greater flexibility.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Janga, P., Davis, K.C. (2014). Mapping Heterogeneous XML Document Collections to Relational Databases. In: Yu, E., Dobbie, G., Jarke, M., Purao, S. (eds) Conceptual Modeling. ER 2014. Lecture Notes in Computer Science, vol 8824. Springer, Cham. https://doi.org/10.1007/978-3-319-12206-9_7

  • DOI: https://doi.org/10.1007/978-3-319-12206-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12205-2

  • Online ISBN: 978-3-319-12206-9

  • eBook Packages: Computer Science, Computer Science (R0)
