Article

Building a research library for the history of the web

DOI: 10.1145/1141753.1141771
Published: 11 June 2006

Abstract

This paper describes the building of a research library for studying the Web, especially research on how the structure and content of the Web change over time. The library is particularly aimed at supporting social scientists, for whom the Web is both a fascinating social phenomenon and a mirror of society. The library is built on the collections of the Internet Archive, which has been preserving a crawl of the Web every two months since 1996. The technical challenges in organizing this data for research fall into two categories: high-performance computing to transfer and manage very large amounts of data, and human-computer interfaces that empower research by non-computer specialists.

Information

Published In

JCDL '06: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
June 2006
402 pages
ISBN:1595933549
DOI:10.1145/1141753
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational social science
  2. digital libraries
  3. history of the web
  4. internet archive

Qualifiers

  • Article

Conference

JCDL06: Joint Conference on Digital Libraries 2006
June 11-15, 2006
Chapel Hill, NC, USA

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Bibliometrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Dec 2024

Cited By

  • (2016) Detecting off-topic pages within TimeMaps in Web archives. International Journal on Digital Libraries 17:3, pp. 203-221. DOI: 10.1007/s00799-016-0183-5. Online publication date: 1-Sep-2016
  • (2015) Detecting Off-Topic Pages in Web Archives. Research and Advanced Technology for Digital Libraries, pp. 225-237. DOI: 10.1007/978-3-319-24592-8_17. Online publication date: 28-Nov-2015
  • (2012) Generating content for digital libraries using an interactive content management system. Proceedings of the Second international conference on Theory and Practice of Digital Libraries, pp. 474-479. DOI: 10.1007/978-3-642-33290-6_54. Online publication date: 23-Sep-2012
  • (2010) Automatic knowledge acquisition from historical document archives. Culture and computing, pp. 161-172. DOI: 10.5555/1985559.1985575. Online publication date: 1-Jan-2010
  • (2010) Behavioral simulations in MapReduce. Proceedings of the VLDB Endowment 3:1-2, pp. 952-963. DOI: 10.14778/1920841.1920962. Online publication date: 1-Sep-2010
  • (2010) Automatic Knowledge Acquisition from Historical Document Archives: Historiographical Perspective. Culture and Computing, pp. 161-172. DOI: 10.1007/978-3-642-17184-0_13. Online publication date: 2010
  • (2009) A framework for describing web repositories. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 341-344. DOI: 10.1145/1555400.1555456. Online publication date: 15-Jun-2009
  • (2009) EverLast. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 331-340. DOI: 10.1145/1555400.1555455. Online publication date: 15-Jun-2009
  • (2009) Browsing Assistant for Changing Pages. Intelligent Agents in the Evolution of Web and Applications, pp. 137-160. DOI: 10.1007/978-3-540-88071-4_7. Online publication date: 2009
  • (2007) Using the web infrastructure to preserve web pages. International Journal on Digital Libraries 6:4, pp. 327-349. DOI: 10.5555/2794654.3269944. Online publication date: 1-Jul-2007
