[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Expressiveness and complexity of XML Schema

Published: 01 September 2006 Publication History

Abstract

The common abstraction of XML Schema by unranked regular tree languages is not entirely accurate. To shed some light on the actual expressive power of XML Schema, intuitive semantical characterizations of the Element Declarations Consistent (EDC) rule are provided. In particular, it is obtained that schemas satisfying EDC can only reason about regular properties of ancestors of nodes. Hence, with respect to expressive power, XML Schema is closer to DTDs than to tree automata. These theoretical results are complemented with an investigation of the XML Schema Definitions (XSDs) occurring in practice, revealing that the extra expressiveness of XSDs over DTDs is only used to a very limited extent. As this might be due to the complexity of the XML Schema specification and the difficulty of understanding the effect of constraints on typing and validation of schemas, a simpler formalism equivalent to XSDs is proposed. It is based on contextual patterns rather than on recursive types and it might serve as a light-weight front end for XML Schema. Next, the effect of EDC on the way XML documents can be typed is discussed. It is argued that a cleaner, more robust, larger but equally feasible class is obtained by replacing EDC with the notion of 1-pass preorder typing (1PPT): schemas that allow one to determine the type of an element of a streaming document when its opening tag is met. This notion can be defined in terms of grammars with restrained competition regular expressions and there is again an equivalent syntactical formalism based on contextual patterns. Finally, algorithms for recognition, simplification, and inclusion of schemas for the various classes are given.

References

[1]
Alur, R. and Madhusudan, P. 2004. Visibly pushdown languages. In Proceedings of the 36th Symposium on the Theory of Computing (STOC 2004). ACM Press, New York, 202--211.
[2]
Balmin, A., Papakonstantinou, Y., and Vianu, V. 2004. Incremental validation of XML documents. ACM Trans. Database Syst. 29, 4, 710--751.
[3]
Bex, G., Martens, W., Neven, F., and Schwentick, T. 2005. Expressiveness of XSDs: From practice to theory, there and back again. In Proceedings of the 14th International Conference on World Wide Web (WWW 2005). ACM Press, New York, 712--721.
[4]
Bex, G., Neven, F., and Van den Bussche, J. 2004. DTDs versus XML schema: A practical study. In Proceedings of the International Workshop on the Web and Databases (WebDB 2004). 79--84.
[5]
Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and Yergeau, F. 2004. Extensible Markup Language (XML). Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/REC-xml/.
[6]
Brüggemann-Klein, A., Murata, M., and Wood, D. 2001. Regular tree and regular hedge languages over unranked alphabets: Version 1, April 3, 2001. Tech. rep. HKUST-TCSC-2001-0. The Hongkong University of Science and Technology, Hong Kong.
[7]
Brüggemann-Klein, A. and Wood, D. 1998. 1-unambiguous regular languages. Inf. Comput. 142, 2, 182--206.
[8]
Buck, L., Goldfarb, C., and Prescod, P. 2000. Datatypes for DTDs (DT4DTD) 1.0. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/dt4dtd/.
[9]
Clark, J. 2002. Multi-format schema converter based on RELAX NG. Go online to http://www.thaiopensource.com/relaxng/trang.html.
[10]
Clark, J. and Murata, M. 2001. Relax NG specification. Go online to http://www.relaxng.org/spec-20011203.html.
[11]
Coen, C. S., Marinelli, P., and Vitali, F. 2004. Schemapath, a minimal extension to XML Schema for conditional constraints. In Proceedings of the 14th International Conference on the World Wide Web (WWW 2004). ACM Press, New York, 164--174.
[12]
Cover, R. 2005. The Cover Pages. Go online to http://xml.coverpages.org/.
[13]
Cristau, J., Löding, C., and Thomas, W. 2005. Deterministic automata on unranked trees. In Proceedings of the 15th International Symposium on Fundamentals of Computation Theory (FCT 2005). Springer, Berlin, Germany, 68--79.
[14]
DuCharme, B. 2002. Filling in the DTD gaps with schematron. O'Reilly xml.com, Sebastopol, CA.
[15]
Fernandez, M., Malhotra, A., Marsh, J., Nagy, M., and Walsh, N. 2005. XQuery 1.0 and XPath 2.0 data model. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/xpath-datamodel/.
[16]
Fiorello, D., Gessa, N., Marinelli, P., and Vitali, F. 2004. DTD++ 2.0: Adding support for co-constraints. In Proceedings of the Extreme Markup Languages 2004 Conference (Extreme Markup Languages 2004).
[17]
Fokoué, A. and Schloss, B. 2004. XML Schema quality checker. Go online to http://www.alphaworks.ibm.com/tech/xmlsqc.
[18]
Hopcroft, J. and Ullman, J. 1979. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Boston, MA.
[19]
Hosoya, H. and Pierce, B. C. 2003. XDuce: A statically typed XML processing language. ACM Trans. Internet Techn. 3, 2, 117--148.
[20]
Hromkovic, J., Seibert, S., and Wilke, T. 2001. Translating regular expressions into small ϵ-free nondeterministic finite automata. J. Comput. Syst. Sci. 62, 4, 565--588.
[21]
Jelliffe, R. 2001. The current state of the art of schema languages for XML. Presentation at XML Asia Pacific Conference, Sidney, Australia.
[22]
Jelliffe, R. 2005. Schematron. Go online to http://xml.ascc.net/schematron/.
[23]
Klarlund, N., Møller, A., and Schwartzbach, M. I. 2002. The DSD schema language. Automat. Softw. Eng. 9, 3, 285--319.
[24]
Koch, C. and Scherzinger, S. 2003. Attribute grammars for scalable query processing on XML streams. In Proceedings of the 9th International Workshop on Database Programming Languages (DBPL 2003). Springer, Berlin, Germany, 233--256.
[25]
Lee, D. and Chu, W. 2000. Comparative analysis of six XML schema languages. ACM SIGMOD Rec. 29, 3, 76--87.
[26]
Mani, M. 2001. Keeping chess alive---do we need 1-unambiguous content models? Presentation at Extreme Markup Languages Conference 2001.
[27]
Martens, W. and Neven, F. 2004. Frontiers of tractability for typechecking simple XML transformations. In Proceedings of the 23d Symposium on Principles of Database Systems (PODS 2004). ACM Press, New York, 23--34.
[28]
Martens, W. and Neven, F. 2005. On the complexity of typechecking top-down XML transformations. Theor. Comput. Sci. 336, 1, 153--180.
[29]
Martens, W., Neven, F., and Schwentick, T. 2004. Complexity of decision problems for simple regular expressions. In Proceedings of the 29th International Symposium on Mathematical Foundations of Computer Science (MFCS 2004). Springer, Berlin, Germany, 889--900.
[30]
Martens, W., Neven, F., and Schwentick, T. 2005. Which XML schemas admit 1-pass preorder typing? In Proceedings of the 10th International Conference on Database Theory (ICDT 2005). Springer, Berlin, Germany, 68--82.
[31]
Martens, W. and Niehren, J. 2005. Minimizing tree automata for unranked trees. In Prodeedings of the 10th International Symposium on Database Programming Languages (DBPL 2005). Springer, Berlin, Germany, 232--246.
[32]
Murata, M., Lee, D., and Mani, M. 2001. Taxonomy of XML schema languages using formal language theory. In Proceedings of the Extreme Markup Languages 2001 Conference (Extreme Markup Languages 2001), Montreal, P. Q., Canada.
[33]
Murata, M., Lee, D., Mani, M., and Kawaguchi, K. 2005. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn. 5, 4, 1--45.
[34]
Neven, F. 2002a. Automata, logic, and XML. In Conference for Computer Science Logic (CSL 2002). Springer, Berlin, Germany, 2--26.
[35]
Neven, F. 2002b. Automata theory for XML researchers. SIGMOD Rec. 31, 3, 39--46.
[36]
Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of XML data. In Proceedings of the 19th Symposium on Principles of Database Systems (PODS 2000). ACM Press, New York, 35--46.
[37]
Sahuguet, A. 2000. Everything you ever wanted to know about DTDs, but were afraid to ask. In Proceedings of the International Workshop on the Web and Databases (WebDB 2000).
[38]
Segoufin, L. and Vianu, V. 2002. Validating streaming XML documents. In Proceedings of the 21st Symposium on Principles of Database Systems (PODS 2002). ACM Press, New York, 53--64.
[39]
Seidl, H. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3, 424--437.
[40]
Siméon, J. and Wadler, P. 2003. The essence of XML. In Proceedings of the 30th Symposium on Principles of Programming Languages (POPL 2003). ACM Press, New York, 1--13.
[41]
Sperberg-McQueen, C. 2003. XML Schema 1.0: A language for document grammars. Presentation at the Joint International Conference of the Association for Computers and the Humanities Association for Literary and Linguistic Computing (ACH/ALLC 2003). Go online to http://www.w3.org/People/cmsmcq/2003/achallc/achallc2003.html.
[42]
Sperberg-McQueen, C. and Thompson, H. 2005. XML Schema. Go online to http://www.w3.org/XML/Schema.
[43]
Stockmeyer, L. and Meyer, A. 1973. Word problems requiring exponential time: Preliminary report. In Conference Record of the Fifth Annual ACM Symposium on Theory of Computing (STOC 1973). ACM Press, New York, 1--9.
[44]
Suciu, D. 2001. Typechecking for semistructured data. In Proceedings of the 8th Workshop on Data Bases and Programming Languages (DBPL 2001). Springer, Berlin, Germany, 1--20.
[45]
Thompson, H., Beech, D., Maloney, M., and Mendelsohn, N. 2004. XML Schema Part 1: Structures. Tech. rep. World Wide Web Consortium. Go online to http://www.w3.org/TR/xmlschema-1/.
[46]
van der Vlist, E. 2002. XML Schema. O'Reilly, Sebastopol, CA.
[47]
Vitali, F., Amorosi, N., and Gessa, N. 2003. Datatype- and namespace-aware DTDs: A minimal extension. In Proceedings of the Extreme Markup Languages 2003 Conference (Extreme Markup Languages 2003).

Cited By

View all
  • (2024)Validation of Modern JSON Schema: Formalization and ComplexityProceedings of the ACM on Programming Languages10.1145/36328918:POPL(1451-1481)Online publication date: 5-Jan-2024
  • (2024)A Literature Review on Schema Evolution in DatabasesComputing Open10.1142/S297237012430001202Online publication date: 12-Jul-2024
  • (2024)Checking in polynomial time whether or not a regular tree language is deterministic top-downInformation Processing Letters10.1016/j.ipl.2023.106449184(106449)Online publication date: Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 31, Issue 3
September 2006
400 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1166074
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2006
Published in TODS Volume 31, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML
  2. XML Schema
  3. validation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)3
Reflects downloads up to 01 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Validation of Modern JSON Schema: Formalization and ComplexityProceedings of the ACM on Programming Languages10.1145/36328918:POPL(1451-1481)Online publication date: 5-Jan-2024
  • (2024)A Literature Review on Schema Evolution in DatabasesComputing Open10.1142/S297237012430001202Online publication date: 12-Jul-2024
  • (2024)Checking in polynomial time whether or not a regular tree language is deterministic top-downInformation Processing Letters10.1016/j.ipl.2023.106449184(106449)Online publication date: Feb-2024
  • (2022)Towards Theory for Real-World DataProceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3517804.3526066(261-276)Online publication date: 12-Jun-2022
  • (2021)An Empirical Study on the “Usage of Not” in Real-World JSON Schema DocumentsConceptual Modeling10.1007/978-3-030-89022-3_9(102-112)Online publication date: 18-Oct-2021
  • (2021)Deciding Top-Down Determinism of Regular Tree LanguagesFundamentals of Computation Theory10.1007/978-3-030-86593-1_24(341-353)Online publication date: 12-Sep-2021
  • (2020)Inclusion algorithms for one-unambiguous regular expressions and their applicationsScience of Computer Programming10.1016/j.scico.2020.102436193(102436)Online publication date: Jul-2020
  • (2019)An effective algorithm for learning single occurrence regular expressions with interleavingProceedings of the 23rd International Database Applications & Engineering Symposium10.1145/3331076.3331100(1-10)Online publication date: 10-Jun-2019
  • (2019)Containment of Shape Expression Schemas for RDFProceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3294052.3319687(303-319)Online publication date: 25-Jun-2019
  • (2019)Alignment Distance of Regular Tree LanguagesTheoretical Computer Science10.1016/j.tcs.2019.06.022Online publication date: Jul-2019
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media