Abstract
XML query and schema languages have some obvious connections to Formal Language Theory. For example, Document Type Definitions (DTDs) can be viewed as tree grammars and use regular expressions, XML Schemas resemble tree automata. Likewise, there are immediate links to Logic, e.g., through the classical characterization of regular tree languages by monadic second-order logic.
It is therefore not surprising that concepts from Logic and Formal Language Theory played an important role in the development of the theoretical foundations of XML query and schema languages. For example, they helped to analyze the expressiveness of languages, to understand the restrictions posed by the W3C standards, and to develop algorithms for various purposes, including efficient evaluation and static analysis.
However, methods from Logic and Formal Languages have not merely been applied to XML theory, the fertilization took place both ways. XML theory posed a lot of new challenges for Logic and Formal Language Theory and triggered various new research lines, e.g., the study of deterministic regular expressions and the development of automata for trees with data values.
The aim of the talk at FoIKS 2012 is to present some of the fundamental connections between XML query and schema languages and Logic and Formal Language Theory, to report on recent developments in the area, and to highlight some current directions of research. This accompanying paper is a kind of annotated bibliography for that talk.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aho, A.V., Ullman, J.D.: Translations on a context-free grammar. Information and Control 19(5), 439–475 (1971)
Balmin, A., Papakonstantinou, Y., Vianu, V.: Incremental validation of xml documents. ACM Trans. Database Syst. 29(4), 710–751 (2004)
Barbosa, D., Mendelzon, A.O., Libkin, L., Mignet, L., Arenas, M.: Efficient incremental validation of XML documents. In: ICDE, pp. 671–682. IEEE Computer Society (2004)
Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2) (2008)
Benedikt, M., Koch, C.: Interpreting Tree-to-Tree Queries. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part II. LNCS, vol. 4052, pp. 552–564. Springer, Heidelberg (2006)
Benedikt, M., Koch, C.: From XQuery to relational logics. ACM Trans. Database Syst. 34(4) (2009)
Berstel, J., Boasson, L.: Formal properties of XML grammars and languages. Acta Inf. 38(9), 649–671 (2002)
Bex, G.J., Maneth, S., Neven, F.: A Formal Model for an Expressive Fragment of XSLT. In: Palamidessi, C., Moniz Pereira, L., Lloyd, J.W., Dahl, V., Furbach, U., Kerber, M., Lau, K.-K., Sagiv, Y., Stuckey, P.J. (eds.) CL 2000. LNCS (LNAI), vol. 1861, pp. 1137–1151. Springer, Heidelberg (2000)
Bex, G.J., Gelade, W., Martens, W., Neven, F.: Simplifying xml schema: effortless handling of nondeterministic regular expressions. In: Çetintemel, U., Zdonik, S.B., Kossmann, D., Tatbul, N. (eds.) SIGMOD Conference, pp. 731–744. ACM (2009)
Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning deterministic regular expressions for the inference of schemas from XML data. TWEB 4(4) (2010)
Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Trans. Database Syst. 35(2) (2010)
Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: VLDB, pp. 998–1009 (2007)
Bex, G.J., Neven, F., Vansummeren, S.: Schemascope: a system for inferring and cleaning xml schemas. In: Wang, J.T.-L. (ed.) SIGMOD Conference, pp. 1259–1262. ACM (2008)
Björklund, H., Bojańczyk, M.: Bounded Depth Data Trees. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 862–874. Springer, Heidelberg (2007)
Björklund, H., Gelade, W., Martens, W.: Incremental xpath evaluation. ACM Trans. Database Syst. 35(4), 29 (2010)
Björklund, H., Schwentick, T.: On notions of regularity for data languages. Theor. Comput. Sci. 411(4-5), 702–715 (2010)
Bojanczyk, M.: Data monoids. In: Schwentick, Dürr [82], pp. 105–116
Bojanczyk, M., Colcombet, T.: Tree-walking automata do not recognize all regular languages. SIAM J. Comput. 38(2), 658–701 (2008)
Bojanczyk, M., Lasota, S.: An extension of data automata that captures xpath. In: LICS, pp. 243–252. IEEE Computer Society (2010)
Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and xml reasoning. J. ACM 56(3) (2009)
Bojanczyk, M., Parys, P.: XPath evaluation in linear time. In: Lenzerini, M., Lembo, D. (eds.) PODS, pp. 241–250. ACM (2008)
Bojanczyk, M., Parys, P.: XPath evaluation in linear time. J. ACM 58(4), 17 (2011)
Bojańczyk, M., Samuelides, M., Schwentick, T., Segoufin, L.: Expressive Power of Pebble Automata. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part I. LNCS, vol. 4051, pp. 157–168. Springer, Heidelberg (2006)
Bouajjani, A., Habermehl, P., Jurski, Y., Sighireanu, M.: Rewriting Systems with Data. In: Csuhaj-Varjú, E., Ésik, Z. (eds.) FCT 2007. LNCS, vol. 4639, pp. 1–22. Springer, Heidelberg (2007)
Bouyer, P., Petit, A., Therien, D.: An algebraic approach to data languages and timed languages. Inf. Comput. 182(2), 137–162 (2003)
Bray, T., Paoli, J., Sperberg-McQueen, C.M.: Extensible markup language (XML) working draft. Technical report, W3C (1996), www.w3.org/TR/WD-xml-961114
Bray, T., Paoli, J., Sperberg-McQueen, C.M., Maler, E., Yergeau, F.: Extensible markup language (XML) 1.0 (fifth edition). Technical report, W3C (2008), www.w3.org/TR/2008/REC-xml-20081126
Brüggemann-Klein, A., Wood, D.: Deterministic regular languages. Technical Report Bericht 38, Universität Freiburg, Institut für Informatik (1991)
Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Information and Computation 140(2), 229–253 (1998)
Brüggemann-Klein, A., Murata, M., Wood, D.: Regular tree languages over non-ranked alphabets (1998), http://hdl.handle.net/1783.1/738
Büchi, J.R.: Weak second-order arithmetic and finite automata. Z. Math. Logik Grundl. Math. 6, 66–92 (1960)
Chamberlin, D., Clark, J., Florescu, D., Robie, J., Simeon, J., Stefanascu, M.: XQuery 1.0: An XML query language (2002), http://www.w3.org/TR/xquery/
Clark, J., Murata, M.: Relax ng specification. Technical report, OASIS Committee Specification Document (2001)
Clark, J.: XSL transformations version 1.0 (August 1999), http://www.w3.org/TR/xslt
Colcombet, T., Ley, C., Puppis, G.: On the Use of Guards for Logics with Data. In: Murlak, F., Sankowski, P. (eds.) MFCS 2011. LNCS, vol. 6907, pp. 243–255. Springer, Heidelberg (2011)
World Wide Web Consortium. XML Path Language (XPath), Version 1.0. W3C Recommendation, November 16 (1999), http://www.w3.org/TR/xpath
World Wide Web Consortium. XML Path Language (XPath), Version 2.0. W3C Recommendation, http://www.w3.org/TR/xpath20
World Wide Web Consortium. XML schema, http://www.w3.org/XML/Schema
Demri, S., Lazic, R.: LTL with the freeze quantifier and register automata. ACM Trans. Comput. Log. 10(3) (2009)
Demri, S., Sangnier, A.: When Model-Checking Freeze ltl Over Counter Machines Becomes Decidable. In: Ong, L. (ed.) FOSSACS 2010. LNCS, vol. 6014, pp. 176–190. Springer, Heidelberg (2010)
Doner, J.: Tree acceptors and some of their applications. J. Comput. Syst. Sci. 4(5), 406–451 (1970)
Elgot, C.C.: Decision problems of finite automata design and related arithmetics. Transactions of The American Mathematical Society 98, 21–21 (1961)
Engelfriet, J., Hoogeboom, H.J.: Automata with nested pebbles capture first-order logic with transitive closure. Logical Methods in Computer Science 3(2) (2007)
Engelfriet, J., Hoogeboom, H.J., Samwel, B.: Xml transformation by tree-walking transducers with invisible pebbles. In: Libkin, L. (ed.) PODS, pp. 63–72. ACM (2007)
Engelfriet, J., Maneth, S.: Macro tree transducers, attribute grammars, and mso definable tree translations. Inf. Comput. 154(1), 34–91 (1999)
Figueira, D.: Satisfiability of downward XPath with data equality tests. In: PODS, pp. 197–206 (2009)
Figueira, D.: Forward-XPath and extended register automata on data-trees. In: ICDT, pp. 231–241 (2010)
Figueira, D., Segoufin, L.: Bottom-up automata on data trees and vertical xpath. In: Schwentick, Dürr [82], pp. 93–104
Garofalakis, M.N., Gionis, A., Rastogi, R., Seshadri, S., Shim, K.: Xtract: Learning document type descriptors from xml document collections. Data Min. Knowl. Discov. 7(1), 23–56 (2003)
Van den Bussche, J., Bex, G.J., Neven, F.: DTDs versus XML schema: A practical study. In: International Workshop on the World Wide Web and Databases, pp. 79–84 (2004)
Gelade, W., Gyssens, M., Martens, W.: Regular Expressions with Counting: Weak versus Strong Determinism. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 369–381. Springer, Heidelberg (2009)
Gelade, W., Neven, F.: Succinctness of the complement and intersection of regular expressions. In: Albers, S., Weil, P. (eds.) STACS. LIPIcs, vol. 1, pp. 325–336. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany (2008)
Glushkov, V.M.: The abstract theory of automata. Russian Math. Surveys 16, 1–53 (1961)
Goris, E., Marx, M.: Looping caterpillars. In: LICS 2005 (2005)
Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing xpath queries. ACM Trans. Database Syst. 30(2), 444–491 (2005)
Green, T.J., Miklau, G., Onizuka, M., Suciu, D.: Processing XML Streams with Deterministic Automata. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 173–189. Springer, Heidelberg (2002)
Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329–363 (1994)
Koch, C., Scherzinger, S., Schweikardt, N., Stegmaier, B.: FluXQuery: An optimizing XQuery processor for streaming XML data. In: Proc. 30th International Conference on Very Large Data Bases (VLDB), Toronto, pp. 1309–1312 (2004)
Libkin, L.: Logics for unranked trees: An overview. Logical Methods in Computer Science 2(3) (2006)
Maneth, S., Berlea, A., Perst, T., Seidl, H.: XML type checking with macro tree transducers. In: PODS, pp. 283–294 (2005)
Maneth, S., Perst, T., Seidl, H.: Exact XML Type Checking in Polynomial Time. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 254–268. Springer, Heidelberg (2006)
Martens, W., Neven, F.: Frontiers of tractability for typechecking simple xml transformations. J. Comput. Syst. Sci. 73(3), 362–390 (2007)
Martens, W., Neven, F., Schwentick, T.: Simple off the shelf abstractions for XML schema. SIGMOD Record 36(3), 15–22 (2007)
Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)
Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)
Marx, M., de Rijke, M.: Semantic characterizations of XPath. In: TDM 2004 Workshop on XML Databases and Information Retrieval, Twente, The Netherlands, June 21 (2004)
Marx, M.: Conditional xpath. ACM Trans. Database Syst. 30(4), 929–959 (2005)
Marx, M.: Logical Foundations of XML and XQuery. In: Tessaris, S., Franconi, E., Eiter, T., Gutierrez, C., Handschuh, S., Rousset, M.-C., Schmidt, R.A. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 111–157. Springer, Heidelberg (2009)
Marx, M., de Rijke, M.: Semantic characterizations of navigational xpath. SIGMOD Record 34(2), 41–46 (2005)
Milo, T., Suciu, D., Vianu, V.: Typechecking for xml transformers. J. Comput. Syst. Sci. 66(1), 66–97 (2003)
Møller, A., Schwartzbach, M.I.: The Design Space of Type Checkers for XML Transformation Languages. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 17–36. Springer, Heidelberg (2005)
Murata, M.: Forest-regular languages and tree-regular languages. Technical report, Fuji Xerox, Japan. Technical Report (1995)
Murata, M., Lee, D., Mani, M., Kawaguchi, K.: Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn. 5(4), 660–704 (2005)
Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275, 633–674 (2002)
Neven, F.: Automata, logic, and XML. In: CSL, pp. 2–26 (2002)
Neven, F.: Automata theory for XML researchers. SIGMOD Record 31(3), 39–46 (2002)
Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275(1-2), 633–674 (2002)
Neven, F., Schwentick, T.: On the power of tree-walking automata. Information and Control 183(1), 86–103 (2003)
Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Methods in Computer Science 2(3) (2006)
Potthoff, A.: Logische Klassifizierung regulärer Baumsprachen. Doctor’s thesis, Institut für Informatik u. Prakt. Math., Universität Kiel (1994)
Schwentick, T.: Automata for XML - a survey. J. Comput. Syst. Sci. 73(3), 289–315 (2007)
Schwentick, T., Dürr, C. (eds.): 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011, Dortmund, Germany, March 10-12. LIPIcs, vol. 9. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2011)
Segoufin, L.: Automata and Logics for Words and Trees over an Infinite Alphabet. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 41–57. Springer, Heidelberg (2006)
Segoufin, L.: Static analysis of xml processing with data values. SIGMOD Record 36(1), 31–38 (2007)
Segoufin, L., Vianu, V.: Validating streaming XML documents. In: PODS 2002, pp. 53–64 (2002)
ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of xpath 2.0. J. ACM 56(6) (2009)
ten Cate, B., Marx, M.: Axiomatizing the logical core of xpath 2.0. Theory Comput. Syst. 44(4), 561–589 (2009)
ten Cate, B., Segoufin, L.: Transitive closure logic, nested tree walking automata, and xpath. J. ACM 57(3) (2010)
Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory 2(1), 57–81 (1968)
Thomas, W.: Languages, Automata, and Logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. III, pp. 389–455. Springer, Heidelberg (1997)
Trakhtenbrot, B.A.: Finite automata and logic of monadic predicates. Doklady Akademii Nauk SSSR 140, 326–329 (1961)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schwentick, T. (2012). Foundations of XML Based on Logic and Automata: A Snapshot. In: Lukasiewicz, T., Sali, A. (eds) Foundations of Information and Knowledge Systems. FoIKS 2012. Lecture Notes in Computer Science, vol 7153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28472-4_2
Download citation
DOI: https://doi.org/10.1007/978-3-642-28472-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28471-7
Online ISBN: 978-3-642-28472-4
eBook Packages: Computer ScienceComputer Science (R0)