Abstract
Estimating digitization costs is difficult: the large number of unknown factors makes accurate values hard to obtain. However, digitization projects need a precise idea of the economic costs and the time involved in producing their contents. The common practice when starting to digitize a new collection is to set a schedule and make a firm commitment to fulfil it (both in terms of cost and deadlines), even before the actual digitization work starts. As with software development projects, incorrect estimates produce delays and cause cost overruns. Based on methods used in Software Engineering for software development cost prediction, such as COCOMO and Function Points, and using historical data gathered over 5 years at the MCDL (Miguel de Cervantes Digital Library) project during the digitization of more than 12,000 books, we have developed a method for time-and-cost estimation named DiCoMo (Digitization Cost Model) for digital content production in general. This method can be adapted to different production processes, such as the production of digital XML or HTML texts using scanning and OCR followed by human proofreading and error correction, or the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improves with time, since the algorithms can be optimized by making adjustments based on historical data gathered from previous tasks. Finally, we consider the problem of parallelizing tasks, i.e. dividing the work among a number of encoders who work in parallel.
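The abstract does not give the DiCoMo equations, so the following is only a minimal sketch of the general idea it describes, not the authors' model. It assumes a basic COCOMO-style power law, effort = a · size^b, calibrated by a log-log least-squares fit on historical tasks, plus an Amdahl-style bound for dividing work among encoders. The function names, the units (pages and hours), and the `serial_fraction` parameter are all hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by least squares on log-transformed data.

    sizes and efforts are historical observations (assumed units:
    pages and hours); both must be positive and of equal length.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # exponent: slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # coefficient: exp of the intercept
    return a, b


def estimate_effort(a, b, size):
    """Predict effort for a new task of the given size."""
    return a * size ** b


def parallel_time(effort, workers, serial_fraction):
    """Amdahl-style elapsed time: only the non-serial share of the
    effort is divided among the workers (hypothetical parameterization)."""
    return effort * (serial_fraction + (1.0 - serial_fraction) / workers)
```

Refitting `a` and `b` each time a task finishes is one simple way to let estimates improve as historical data accumulates. The coefficients recovered from real project data would of course differ from any illustrative values.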
References
Boehm B.W.: Software engineering economics. Prentice Hall, Englewood Cliffs (1981)
Magazinovic, A.: Exploring cost estimation inaccuracy: why do practitioners still fail to predict the actuals? Technical report, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden (2008)
Galorath, D.: Software project failure costs billions... Better estimation and planning can help. http://tinyurl.com/Galorath (2008)
Bia A., Pedreño A.: The Miguel de Cervantes Digital Library: the Hispanic Voice on the Web. LLC (Literary and Linguistic Computing) J (Oxford University Press) 16(2), 161–177 (2001)
Bia A.: The use of multimedia to enhance the accessibility of digital library resources: The multicultural-scope of the services offered by the Miguel de Cervantes digital library project. In: Anderson, J., Dunning, A., Fraser, M. (eds) Digital resources for the humanities 2001 and 2002: an edited selection of papers, Office for Humanities Communication, vol. 16, pp. 1–11. King’s College, London (2003)
Nixon, P.G.: The human function curve. Practitioner pp. 765–769; 935–944 (1976)
Bauer K.: Cost analysis of a project to digitize classic articles in neurosurgery. J. Med. Libr. Assoc. (JMLA) 90(2), 230–234. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC100769/ (2002)
Tanner, S., Smith, J.L.: Digitisation: how much does it really cost? In: Digital resources for the humanities, King’s College, London (1999)
Puglia, S.: The costs of digital imaging projects. RLG DigiNews 3(5). http://chnm.gmu.edu/digitalhistory/links/cached/chapter3/link3.10b.digitalimagingcosts.html (1999)
Lee S.D.: Digitization: is it worth it? Computer Libraries 21(5), 28–31. http://www.infotoday.com/cilmag/may01/lee.htm (2001)
UMich-MoA: Assessing the costs of conversion: Making of America IV: the American Voice 1850–1876. http://www.lib.umich.edu/files/services/dlps/moa4costs.pdf (2001)
Winer, D.: Good practices in cost reduction for digitisation: resources for minerva and minerva plus WG on good practices. http://www.minervaeurope.org/structure/workinggroups/goodpract/costreduction/documents/wp6costreduction0904.pdf (2004)
Hammond, M., Davies, C.: Understanding the costs of digitisation: detail report. http://www.jisc.ac.uk/media/documents/programmes/digitisation/digitisation-costs-full.pdf (2009)
Research Library Group: RLG worksheet for estimating digital reformatting costs. http://www.oclc.org/research/activities/past/rlg/digimgtools/rlgworksheet.pdf (1998)
Presto-Space: Preservation project cost calculator. http://digitalpreservation.ssl.co.uk/hosted/d13.2/newcalc.php (2007)
Putnam, L.H.: A general empirical solution to the macro software sizing and estimating problem. IEEE Trans. Software Eng. SE-4(4), 345–361, This article introduces the SLIM method (1978)
Boehm B.W., Clark B.K., Horowitz E., Westland C., Madachy R., Selby R.: Cost models for future software life-cycle processes: COCOMO 2.0. In: Arthur, J., Henry, S. (eds) Annals of software engineering special volume on software process and product measurement, vol 1, pp. 45–60. J.C. Baltzer AG, Science Publishers, Amsterdam, The Netherlands (1995)
Clark, B.K., Devnani-Chulani, S., Boehm, B.W.: Calibrating the COCOMO II post-architecture model. In: 20th international conference on software engineering. Center for Software Engineering, Computer Science Department, University of Southern California, Los Angeles (1998)
CSE COCOMO II model definition manual: Center for Software Engineering, Computer Science Department, University of Southern California, Los Angeles (1997)
Albrecht, A.J.: Measuring application development productivity. In: Proceedings of the Joint Share/Guide/IBM Applications Development Symposium pp.83–92 (1979)
Albrecht A.J., Gaffney J.E.: Software function, source lines of code, and development effort prediction: a software science validation. IEEE Trans. Software Eng. SE-9(6), 639–648 (1983)
Banerjee, G.: Use case points, an estimation approach (2001)
LCI: Use cases and function points. Longstreet Consulting Inc., Blue Springs (2004)
Minkiewicz A.F.: Measuring object oriented software with predictive object points. PRICE Systems, LLC (1997)
Valerdi, R.: The constructive systems engineering cost model (COSYSMO). PhD thesis, University of Southern California. http://csse.usc.edu/csse/TECHRPTS/PhDDissertations/files/ValerdiDissertation.pdf (2005)
Salvetto-de-León, P.F.: Modelos automatizables de estimación muy temprana del tiempo y esfuerzo de desarrollo de sistemas de información. PhD thesis, Departamento de Lenguajes y Sistemas Informáticos e Ingeniería de Software, Universidad Politécnica de Madrid. Supervisors: Francisco Javier Segovia-Pérez, Juan Carlos Nogueira-de-León. http://oa.upm.es/367/1/PEDROSALVETTOLEON.pdf (2006)
Bia A., Muñoz R., Gómez J.: Estimating digitization costs in digital libraries using DiCoMo. Lecture Notes Comput. Sci. 6273, 136–147 (2010)
Fairley R.E.: Software engineering concepts. McGraw Hill, New York (1985)
Sackman, H., et al.: Exploratory experimental studies comparing online and offline programming performance. Communications of the ACM 11(1) (1968)
DeMarco T., Lister T.: Peopleware, productive projects and teams. Dorset House Publishing, New York (1987)
Amdahl, G.: Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS conference proceedings pp. 483–485 (1967)
Ballard J.C.: Computerized assessment of sustained attention: a review of factors affecting vigilance performance. J. Clin. Exp. Neuropsychol. 18(6), 843–863 (1996)
Kieras, D.E., Meyer, D.E.: The role of cognitive task analysis in the application of predictive models of human performance. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.2570&rep=rep1&type=pdf (1998)
Additional information
This study is a substantially revised and extended version of a paper (with the title Estimating Digitization Costs in Digital Libraries Using DiCoMo) that originally appeared in the Proceedings of the 14th European Conference on Digital Libraries (ECDL 2010).
Cite this article
Bia, A., Muñoz, R. & Gómez, J. DiCoMo: the digitization cost model. Int J Digit Libr 11, 141–153 (2010). https://doi.org/10.1007/s00799-011-0073-9
DOI: https://doi.org/10.1007/s00799-011-0073-9