The State Library of North Carolina preserves and provides access to over 162,000 digital files from its collections. It uses CONTENTdm for access and had been storing digital content on a local server. It decided to migrate this content to DuraCloud for additional preservation storage. The migration involved exporting metadata from CONTENTdm and file lists from the local server, comparing file names and checksums, and identifying and recovering missing files. Test uploads to DuraCloud were done before a full sync of the content, which ran over multiple days. Ongoing uploads are now done with DuraCloud's upload tool. Lessons learned include the need for better preservation metadata handling, more automated monitoring of local storage, and fuller integration with access systems.
Migrating from OCLC's Digital Archive to DuraCloud
2. State Library of North Carolina
• Part of the North Carolina Department of Cultural Resources
• Work closely/pool resources with the State Archives
• Digital Information Management Program
3. CONTENT
• State Publications
• Genealogy Research
• North Caroliniana
STAFF
• ~ 4.75 FTE
SYSTEMS
• CONTENTdm
• Connexion Digital Import
STORAGE
• Local server (state-supported)
• Offsite storage (vendor)
7. Local storage
• managed by department-wide IT
• includes working & preservation content
• server is shared, but our directory is restricted
• daily incremental backups
8. OCLC’s Digital Archive
• Began using in 2008
• Web interface for access
• FTP or automatic uploads
• Integrated with CONTENTdm
• Detailed reporting, broken out by CONTENTdm collection
• Fixity checks, virus checks
12. +
• Integration with CONTENTdm
• Fixity checks and virus scans
• Responsive support
• Extensive reports
-
• Integration with CONTENTdm
• Finding and retrieving items
• Manifest/batch upload requirement
• Vendor-side error reporting
• Verifying storage contents
13. DuraSpace’s DuraCloud
• Began using in 2012
• Web interface for access
• Web interface or client-side tools for upload
• Content Management System-agnostic
• Fixity checks
15. +
• Presentation is like a traditional GUI file manager
• Can designate spaces, permissions
• Can make a space public
• Powerful upload tools
• Fixity scans
• Robust reporting
• Easy to get content out
• Choice of storage services
• VERY collaborative support
• Non-profit
-
• Searching
• Sorting
• Verifying storage contents
• Overwriting isn’t hard to do
• Batch delete
• MD5
17. CONTENTdm Local server
1. Exported metadata from CONTENTdm
2. Exported file names from local server
3. Bashed preservation file names, checksums
4. Identified and recovered missing files
18. 1. Exported metadata from CONTENTdm (Onerous to impossible)
2. Exported file names from local server (Easy)
3. Bashed preservation file names, checksums (Easy but time consuming)
4. Identified and recovered missing files (Easy-ish)
19. 1. Exported metadata from CONTENTdm (Onerous to impossible)
• OCLC had to provide export for largest & most critical collection
• 363 MB tar file -> 18 x 100+ MB csv files
• Added frustration: metadata for compound objects v. multi-page pdfs
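Files of this size are awkward to open in a spreadsheet. One way to cut them down is to stream each file row by row and keep only the columns needed for preservation checks. This is a minimal sketch, assuming the export is a set of UTF-8 CSV files that share a header row. The function name `extract_columns` and the column names in the usage note are hypothetical and do not come from the CONTENTdm export.

```python
import csv


def extract_columns(csv_paths, wanted, out_path):
    """Copy only the `wanted` columns from several CSV files into one smaller CSV.

    Rows are streamed one at a time, so memory use stays flat even for
    100+ MB inputs. Returns the number of data rows written.
    """
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=wanted)
        writer.writeheader()
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    # Columns absent from a given file are written as empty strings.
                    writer.writerow({name: row.get(name, "") for name in wanted})
                    rows_written += 1
    return rows_written
```

A call such as `extract_columns(paths, ["filename", "md5"], "preservation_fields.csv")` would produce a single file small enough to review by hand, provided those column names match the real export.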
20. 2. Exported file names from local server (Easy)
3. Bashed preservation file names, checksums (Easy but time consuming)
• Spreadsheet gymnastics
• Manual review for filename/checksum inconsistencies
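Part of the manual filename/checksum review could be scripted. The sketch below assumes a metadata CSV with hypothetical `filename` and `md5` columns and a preservation directory in which file names are unique. It recomputes each file's MD5 and reports any that differ from the recorded value. It is an illustration, not the workflow the library actually used.

```python
import csv
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_checksum_mismatches(metadata_csv, storage_root):
    """Yield (filename, recorded_md5, actual_md5) for files whose checksums differ.

    Assumes file names are unique under storage_root; if two files share a
    name, only one of them is checked.
    """
    by_name = {p.name: p for p in Path(storage_root).rglob("*") if p.is_file()}
    with open(metadata_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = row["filename"].strip()
            recorded = row["md5"].strip().lower()
            path = by_name.get(name)
            if path is None:
                continue  # files absent from storage are a separate check
            actual = md5_of(path)
            if actual != recorded:
                yield name, recorded, actual
```

Files listed in the metadata but absent from storage are skipped here on purpose, since finding missing files is its own step.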
21. 4. Identified and recovered missing files (Easy-ish)
• Missing from CONTENTdm? Added by librarians
• Missing from local server? Request to OCLC or re-download from CONTENTdm
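The two "missing from" questions amount to set differences between the two file lists. This is a minimal sketch, assuming each list has already been reduced to bare file names and that names can be compared case-insensitively. Both the function name `reconcile` and the case-folding rule are assumptions made for illustration.

```python
def reconcile(metadata_names, server_names):
    """Compare two lists of file names and report what each side lacks.

    Names are stripped and lower-cased before comparison (an assumption;
    drop the .lower() calls if case is significant in your collections).
    """
    in_metadata = {n.strip().lower() for n in metadata_names if n.strip()}
    on_server = {n.strip().lower() for n in server_names if n.strip()}
    return {
        # On the server but not described in CONTENTdm: needs adding by librarians.
        "missing_from_contentdm": sorted(on_server - in_metadata),
        # Described in CONTENTdm but not on the server: needs recovering.
        "missing_from_server": sorted(in_metadata - on_server),
    }
```

Each returned list maps to one of the recovery paths above: the first goes to librarians, the second becomes a request to OCLC or a re-download from CONTENTdm.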
22. THE MOVE
Local server -> DuraCloud
1. Tested sync and upload tools
2. Discussed spaces
3. Ran sync tool on local preservation storage
4. Ongoing maintenance: upload tool
23. 1. Tested sync and upload tools (Easy)
• Helped determine flags to manage computer resources during sync
• Verified logging output, permissions
• Helped flesh out local workflow
24. 2. Discussed spaces (Easy, and Interesting)
• Many spaces or few, to accommodate different workflows?
• Assignment of permissions
25. 3. Ran sync tool on local preservation storage (Easy)
• Ran continuously for 5 2/3 days
• 94,177 items
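For planning a similar sync, the two figures above give a rough average throughput. The arithmetic below assumes the sync ran at a steady rate for the whole period, which the slides do not state.

```python
from fractions import Fraction

items = 94_177
days = Fraction(17, 3)   # 5 2/3 days, kept exact
hours = days * 24        # total elapsed hours

# Average rate, assuming the sync ran steadily the whole time.
items_per_hour = float(items / hours)
items_per_minute = items_per_hour / 60

print(f"{float(hours):.0f} hours elapsed")
print(f"about {items_per_hour:.0f} items per hour")
print(f"about {items_per_minute:.1f} items per minute")
```

The rate says nothing about data volume, since the slides give item counts rather than sizes.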
26. 4. Ongoing maintenance: upload tool (Easy-ish)
• Uploads done weekly and monthly
• Upload tool used to avoid accidental overwriting
• Have to create “mock” file structure
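One way to build such a "mock" structure is to copy only the new files into a staging directory that mirrors their paths under the preservation root. This is a sketch under the assumption that the upload tool reproduces the directory layout it is pointed at, which should be checked against the tool's documentation. The function name `stage_new_files` and the directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def stage_new_files(new_files, preservation_root, staging_root):
    """Copy new files into staging_root, preserving their paths relative to preservation_root.

    Raises ValueError if a file lies outside preservation_root.
    Returns the list of staged paths.
    """
    preservation_root = Path(preservation_root).resolve()
    staging_root = Path(staging_root)
    staged = []
    for source in new_files:
        source = Path(source).resolve()
        relative = source.relative_to(preservation_root)
        target = staging_root / relative
        # Recreate the parent directories so the staged tree mirrors the original.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        staged.append(target)
    return staged
```

Staging only the new files keeps each weekly or monthly upload small and leaves already-stored content untouched.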
27. [Diagram: working directories and staging areas (one marked Limited Access) on the local server and in DuraCloud]
28. Insights
• Room for preservation metadata improvement
• Working with full metadata dumps is problematic
• Need for more automated monitoring for local storage
• Integration with CMS not helpful unless FULL integration
in other words:
• Streamlined ingest = streamlined preservation
29. Still more thoughts
• No, really: manual management and auditing is getting less feasible
• What is acceptable content loss?
• What is acceptable preservation metadata error rate?
• Responsiveness to enhancement requests should be figured into vendor choice
• At 5 years out, PREMIS lite is just fine