Abstract
Website archival is the task of monitoring websites and storing snapshots of their pages for future retrieval and analysis. It is particularly important for websites whose content changes over time, where older information is constantly overwritten by newer content. In this paper, we propose WebArc, a set of software tools that lets users construct a logical structure for a website to be archived. Classifiers are trained to identify relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also provided for browsing one or more versions of the archived web pages.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Lim, EP., Marissa, M. (2005). WebArc: Website Archival Using a Structured Approach. In: Fox, E.A., Neuhold, E.J., Premsmit, P., Wuwongse, V. (eds) Digital Libraries: Implementing Strategies and Sharing Experiences. ICADL 2005. Lecture Notes in Computer Science, vol 3815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599517_49
Print ISBN: 978-3-540-30850-8
Online ISBN: 978-3-540-32291-7