DOI: 10.1145/1017074.1017095
Article

DTDs versus XML schema: a practical study

Published: 17 June 2004

Abstract

Among the various proposals addressing the shortcomings of Document Type Definitions (DTDs), XML Schema is the most widely used. Although DTDs and XML Schema Definitions (XSDs) differ syntactically, they are closely related at an abstract level. Indeed, stripped of syntactic sugar, XML Schemas can be seen as an extension of DTDs with a restricted form of specialization. In the present paper, we inspect a number of DTDs and XSDs harvested from the web and try to answer the following questions: (1) which of the features and extra expressiveness that XML Schema offers over DTDs are actually used in practice; and (2) how sophisticated the structural properties (i.e., the nature of the regular expressions) of the two formalisms are. It turns out that real-world XSDs currently use the new features introduced by XML Schema only sparingly: on a structural level, the vast majority of them could already be defined by DTDs. Further, we introduce a class of simple regular expressions and find that a surprisingly high fraction of the content models belong to this class. The latter result sheds light on the justification of simplifying assumptions that sometimes have to be made in XML research.
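The observation that XSDs are DTDs plus a restricted form of specialization can be made concrete with a small check. The sketch below is not the authors' procedure. It assumes a hypothetical, simplified abstraction in which each XSD type is reduced to the list of (element name, type) pairs its content model may contain, ignoring the regular expression itself, occurrence bounds, and datatypes. If every element name is bound to a single type throughout the schema, each type's content model can be rewritten as a DTD rule for that element name, so the schema is structurally definable by a DTD. The check is sufficient but not necessary: distinct types with identical content models, or unused types, can make it reject schemas that are in fact DTD-definable. The names `single_type_binding` and `to_dtd_rules` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical abstraction of an XSD (an assumption, not the paper's format):
# type name -> list of (child element name, child type name) pairs that may
# occur in that type's content model. Order, occurrence bounds and the
# regular expression itself are deliberately ignored.
Schema = Dict[str, List[Tuple[str, str]]]


def single_type_binding(schema: Schema, root_name: str,
                        root_type: str) -> Optional[Dict[str, str]]:
    """Map each element name to its type, or return None if some element
    name is bound to two different types (i.e. specialization is used)."""
    binding: Dict[str, str] = {root_name: root_type}
    for children in schema.values():
        for name, typ in children:
            # setdefault records the first type seen for `name`; a later,
            # different type means the name's type depends on its context.
            if binding.setdefault(name, typ) != typ:
                return None
    return binding


def to_dtd_rules(schema: Schema,
                 binding: Dict[str, str]) -> Dict[str, List[str]]:
    """Replace every (name, type) pair by the bare name, giving one
    DTD-style rule per element name: element name -> allowed child names."""
    return {name: [child for child, _ in schema.get(typ, [])]
            for name, typ in binding.items()}


# 'name' always has type NameT, so this schema is DTD-like.
book_schema: Schema = {
    "BookT": [("title", "TitleT"), ("author", "PersonT")],
    "PersonT": [("name", "NameT")],
    "TitleT": [],
    "NameT": [],
}

# 'name' has a different type under 'author' than under 'publisher'.
specialized_schema: Schema = {
    "BookT": [("author", "PersonT"), ("publisher", "OrgT")],
    "PersonT": [("name", "PersonNameT")],
    "OrgT": [("name", "OrgNameT")],
    "PersonNameT": [],
    "OrgNameT": [],
}

if __name__ == "__main__":
    binding = single_type_binding(book_schema, "book", "BookT")
    if binding is not None:
        print(to_dtd_rules(book_schema, binding))
    print(single_type_binding(specialized_schema, "book", "BookT"))
```

The second schema illustrates why the specialization is called restricted: XML Schema's Element Declarations Consistent constraint forbids two declarations of the same name with different types inside one content model, but the two `name` declarations here live in different types (`PersonT` and `OrgT`), so the schema is allowed and yet is not captured by a single DTD rule for `name`.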

Cited By

  • (2024) Knowledge Graph-Driven Weather Overview Generation for the Beijing 2022 Winter Olympic Games. Journal of Meteorological Research 38:5, pages 983-998. DOI: 10.1007/s13351-024-3202-2. Online publication date: 11-Nov-2024.
  • (2022) Towards Theory for Real-World Data. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 261-276. DOI: 10.1145/3517804.3526066. Online publication date: 12-Jun-2022.
  • (2022) Cryptographic Data Formats. Guide to Internet Cryptography, pages 505-523. DOI: 10.1007/978-3-031-19439-9_21. Online publication date: 26-Nov-2022.

Information

Published In

ACM Other conferences
WebDB '04: Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
June 2004
100 pages
ISBN:9781450377881
DOI:10.1145/1017074
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • INRIA: Institut National de Recherche en Informatique et en Automatique

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WebDB '04

Acceptance Rates

Overall Acceptance Rate 30 of 100 submissions, 30%

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 3

Reflects downloads up to 04 Dec 2024.