DOI: 10.1145/775152.775223
Article

The XML web: a first study

Published: 20 May 2003

Abstract

Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML will become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML both in industry and in academia. Nevertheless, to date no comprehensive study of the XML Web (i.e., the subset of the Web made of XML documents only) or of its contents has been made. This paper is the first attempt at describing the XML Web and the documents contained in it. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. Our results show that, despite its short history, XML already permeates the Web, both in terms of generic domains and geographically. Also, our results about the contents of the XML Web provide valuable input for the design of algorithms, tools and systems that use XML in one form or another.


Published In

WWW '03: Proceedings of the 12th international conference on World Wide Web
May 2003
772 pages
ISBN:1581136803
DOI:10.1145/775152
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML documents
  2. XML web
  3. statistical analysis
  4. structural properties

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2022) Bayesian Exploration. Operations Research 70:2 (1105-1127). DOI: 10.1287/opre.2021.2205. Online publication date: 1-Mar-2022
  • (2020) Dynamic Programming Approach in Conflict Resolution Algorithm of Access Control Module in Medical Information Systems. Advances in Information and Communication (672-681). DOI: 10.1007/978-3-030-39445-5_49. Online publication date: 25-Feb-2020
  • (2020) Inferring Deterministic Regular Expression with Unorder. SOFSEM 2020: Theory and Practice of Computer Science (325-337). DOI: 10.1007/978-3-030-38919-2_27. Online publication date: 17-Jan-2020
  • (2019) An effective algorithm for learning single occurrence regular expressions with interleaving. Proceedings of the 23rd International Database Applications & Engineering Symposium (1-10). DOI: 10.1145/3331076.3331100. Online publication date: 10-Jun-2019
  • (2019) Learning Restricted Deterministic Regular Expressions with Counting. Web Information Systems Engineering – WISE 2019 (98-114). DOI: 10.1007/978-3-030-34223-4_7. Online publication date: 29-Oct-2019
  • (2019) Learning a Subclass of Deterministic Regular Expression with Counting. Knowledge Science, Engineering and Management (341-348). DOI: 10.1007/978-3-030-29551-6_29. Online publication date: 21-Aug-2019
  • (2018) Inferring Deterministic Regular Expression with Counting. Conceptual Modeling (184-199). DOI: 10.1007/978-3-030-00847-5_15. Online publication date: 26-Sep-2018
  • (2016) Parallel Tree Accumulations on MapReduce. International Journal of Parallel Programming 44:3 (466-485). DOI: 10.1007/s10766-015-0355-8. Online publication date: 1-Jun-2016
  • (2015) An Empirical Study of XML Parsers across Applications. Proceedings of the 2015 International Conference on Computing Communication Control and Automation (396-401). DOI: 10.1109/ICCUBEA.2015.83. Online publication date: 26-Feb-2015
  • (2015) Profiling relational data. The VLDB Journal (The International Journal on Very Large Data Bases) 24:4 (557-581). DOI: 10.1007/s00778-015-0389-y. Online publication date: 1-Aug-2015
