DOI: 10.5555/846219.847340
Article

XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources

Published: 28 February 2000

Abstract

This paper describes the methodology and the software development of XWRAP, an XML-enabled wrapper construction system for the semi-automatic generation of wrapper programs. By XML-enabled we mean that the metadata about information content that is implicit in the original web pages is extracted and encoded explicitly as XML tags in the wrapped documents. In addition, the query-based content filtering process is performed against the XML documents.

The XWRAP wrapper generation framework has three distinct features. First, it explicitly separates the tasks of building wrappers that are specific to a Web source from the tasks that are repetitive for any source, and it uses a component library to provide the basic building blocks for wrapper programs. Second, it provides a user-friendly interface program that allows wrapper developers to generate their wrapper code with a few mouse clicks. Third, and most importantly, we introduce and develop a two-phase code generation framework.

The first phase uses an interactive interface facility to encode the source-specific metadata knowledge identified by individual wrapper developers as declarative information extraction rules. The second phase combines the information extraction rules generated in the first phase with the XWRAP component library to construct an executable wrapper program for the given web source. We report initial experiments on the performance of the XWRAP code generation system and of the wrapper programs it generates.
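The two-phase idea described in the abstract can be illustrated with a small Python sketch. This is not XWRAP's implementation, and nothing here reproduces its rule language or component library: the rule format (a regular expression with named groups), the `RULES` dictionary, the sample page, and the functions `extract_records`, `to_xml`, `filter_xml`, and `generate_wrapper` are all hypothetical. The output of phase 1 is modeled as declarative rules held in plain data; the source-independent component library is a few generic functions; phase 2 combines the two into a callable wrapper (a Python closure, rather than generated source code). The wrapper encodes field names that are only implicit in the HTML layout as explicit XML tags, and the final filter runs against that XML rather than the raw page.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Phase 1 output (hypothetical format): source-specific knowledge captured
# as declarative rules. A regex with named groups stands in for XWRAP's
# own extraction rules, which are not reproduced here.
RULES: Dict[str, str] = {
    "record": r"<tr><td>(?P<city>[^<]+)</td><td>(?P<temp>-?\d+)</td></tr>",
    "root": "weather",
    "element": "reading",
}


# --- Component library: generic, source-independent building blocks ---

def extract_records(html: str, pattern: str) -> List[Dict[str, str]]:
    """Apply a record-level pattern; return one field->value dict per match."""
    return [m.groupdict() for m in re.finditer(pattern, html)]


def to_xml(records: List[Dict[str, str]], root_tag: str, element_tag: str) -> ET.Element:
    """Encode the implicit field names explicitly as XML tags."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, element_tag)
        for field, value in rec.items():
            ET.SubElement(item, field).text = value.strip()
    return root


def filter_xml(doc: ET.Element, element_tag: str, field: str,
               predicate: Callable[[str], bool]) -> List[ET.Element]:
    """Query-based filtering applied to the XML output, not the raw HTML."""
    matches = []
    for elem in doc.iter(element_tag):
        value = elem.findtext(field)
        if value is not None and predicate(value):
            matches.append(elem)
    return matches


# --- Phase 2: combine the rules with the library into a runnable wrapper ---

def generate_wrapper(rules: Dict[str, str]) -> Callable[[str], ET.Element]:
    """Bind source-specific rules to the generic components."""
    def wrapper(html: str) -> ET.Element:
        records = extract_records(html, rules["record"])
        return to_xml(records, rules["root"], rules["element"])
    return wrapper


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    page = ("<table><tr><td>Atlanta</td><td>31</td></tr>"
            "<tr><td>Boston</td><td>18</td></tr></table>")
    wrap = generate_wrapper(RULES)
    doc = wrap(page)
    print(ET.tostring(doc, encoding="unicode"))
    hot = filter_xml(doc, "reading", "temp", lambda t: int(t) > 25)
    print([e.findtext("city") for e in hot])  # ['Atlanta']
```

Keeping the rules as data means a new source only needs a new `RULES` entry, while the library functions are reused unchanged, which mirrors the separation of source-specific and repetitive tasks the abstract describes. Regular expressions are used only to keep the sketch short; they are brittle on real HTML, and a parser-based extractor would be the sturdier choice.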

Information

Published In

Guide Proceedings
ICDE '00: Proceedings of the 16th International Conference on Data Engineering
February 2000
ISBN: 0769505066

Publisher

IEEE Computer Society, United States

Author Tags

  1. Information extraction
  2. Web data management
  3. Wrapper generation system
  4. XML

Qualifiers

  • Article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 09 Jan 2025

Citations

Cited By

  • (2020) Webpage visual feature extraction and similarity algorithm. Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies, pp. 80-85. doi:10.1145/3444370.3444552. Online publication date: 4-Dec-2020.
  • (2018) STEM. Knowledge and Information Systems 55(2), pp. 305-331. doi:10.1007/s10115-017-1062-0. Online publication date: 1-May-2018.
  • (2016) KESeDa. Proceedings of the 12th International Conference on Semantic Systems, pp. 129-136. doi:10.1145/2993318.2993335. Online publication date: 12-Sep-2016.
  • (2016) Cross-supervised synthesis of web-crawlers. Proceedings of the 38th International Conference on Software Engineering, pp. 368-379. doi:10.1145/2884781.2884842. Online publication date: 14-May-2016.
  • (2016) A survey of methods for the extraction of information from Web resources. Programming and Computing Software 42(5), pp. 279-291. doi:10.1134/S0361768816050078. Online publication date: 1-Sep-2016.
  • (2016) A tool for producing structured interoperable data from product features on the web. Information Systems 56(C), pp. 36-54. doi:10.1016/j.is.2015.09.002. Online publication date: 1-Mar-2016.
  • (2015) Generating Actionable Knowledge from Big Data. Proceedings of the 2015 ACM SIGMOD on PhD Symposium, pp. 3-8. doi:10.1145/2744680.2744687. Online publication date: 31-May-2015.
  • (2014) Entropy-based automated wrapper generation for weblog data extraction. World Wide Web 17(4), pp. 827-846. doi:10.1007/s11280-013-0269-6. Online publication date: 1-Jul-2014.
  • (2013) Robust detection of semi-structured web records using a DOM structure-knowledge-driven model. ACM Transactions on the Web 7(4), pp. 1-32. doi:10.1145/2508434. Online publication date: 1-Nov-2013.
  • (2013) Web news extraction via path ratios. Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2059-2068. doi:10.1145/2505515.2505558. Online publication date: 27-Oct-2013.