DOI: 10.5555/827140.827172
Article

Repository synchronization in the OAI framework

Published: 27 May 2003

Abstract

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) began as an alternative to distributed searching of scholarly eprint repositories. The model embraced by the OAI-PMH is that of metadata harvesting, where value-added services (by a "service provider") are constructed on cached copies of the metadata extracted from the repositories of the harvester's choosing. While this model dispenses with the well-known problems of distributed searching, it introduces the problem of synchronization. Stated simply, this problem arises when the service provider's copy of the metadata does not match the metadata currently at the constituent repositories. We define some metrics for describing the synchronization problem in the OAI-PMH. Based on these metrics, we study the synchronization problem of the OAI-PMH framework and propose several approaches for harvesters to implement better synchronization. In particular, if a repository knows its update frequency, it can publish it in an OAI-PMH Identify response using an optional About container that borrows from RDF Site Syndication (RSS) Format.
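As a rough illustration of the last point, the sketch below shows how a harvester might turn a published update frequency into a harvest interval. It is not taken from the paper. The XML layout of `SAMPLE`, the function name `harvest_interval`, and the `PERIOD_LENGTH` approximations are assumptions made for this example; only the `updatePeriod` and `updateFrequency` element names and the namespace URI come from the RSS 1.0 syndication module.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta
from typing import Optional

# Namespace of the RSS 1.0 syndication module.
SY_NS = "http://purl.org/rss/1.0/modules/syndication/"

# Approximate period lengths (assumption: month = 30 days, year = 365 days).
PERIOD_LENGTH = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}

# Hypothetical, simplified Identify fragment; a real response carries more
# elements and its exact container layout may differ.
SAMPLE = """<Identify xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <repositoryName>Example Repository</repositoryName>
  <about>
    <sy:updatePeriod>daily</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
  </about>
</Identify>"""


def harvest_interval(identify_xml: str) -> Optional[timedelta]:
    """Return the suggested time between harvests, or None if not published."""
    root = ET.fromstring(identify_xml)
    period = root.find(f".//{{{SY_NS}}}updatePeriod")
    if period is None or period.text is None:
        return None
    length = PERIOD_LENGTH.get(period.text.strip())
    if length is None:
        return None
    # updateFrequency counts updates per period; treat a missing value as 1.
    freq = root.find(f".//{{{SY_NS}}}updateFrequency")
    updates = int(freq.text) if freq is not None and freq.text else 1
    if updates < 1:
        return None
    return length / updates


interval = harvest_interval(SAMPLE)
print(interval)
```

Returning `None` when no update period is published leaves the fallback policy (for example, a fixed default schedule) to the harvester rather than guessing one here.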

Cited By

  • (2006) Efficient, automatic web resource harvesting. Proceedings of the 8th annual ACM international workshop on Web information and data management, 10.1145/1183550.1183560 (43-50). Online publication date: 10-Nov-2006
  • (2006) Integration of wikipedia and a geography digital library. Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities, 10.1007/11931584_48 (449-458). Online publication date: 27-Nov-2006
  • (2003) Report on the metadata harvesting workshop at JCDL 2003. ACM SIGIR Forum, 10.1145/959258.959272, 37:2 (73-78). Online publication date: 1-Sep-2003

Published In

ACM Conferences
JCDL '03: Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
May 2003
393 pages
ISBN: 0769519393

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

JCDL03
Acceptance Rates

JCDL '03 Paper Acceptance Rate: 54 of 216 submissions, 25%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%
