The data-document distinction revisited

Published: 31 January 2006

Abstract

The Data Retrieval and Document Retrieval models differ in a number of ways that influence their design, use and management. This paper discusses the most prominent of these differences and shows that they all arise from the more fundamental problem of representational indeterminacy, which results from the effects of semantic ambiguity and system size. If the differences between the two models arise from the same problem, then the models are not as distinct as they may have appeared. They are better seen as examples of information systems with lower and higher levels of representational indeterminacy, respectively. The paper concludes with a proposal for an operational definition of representational indeterminacy and a discussion of the role of context and system size in reducing high levels of indeterminacy.



      Published In

      ACM SIGMIS Database: the DATABASE for Advances in Information Systems, Volume 37, Issue 1
      Winter 2006
      104 pages
      ISSN: 0095-0033
      EISSN: 1532-0936
      DOI: 10.1145/1120501

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. data retrieval
      2. document retrieval
      3. information context
      4. information retrieval
      5. language
      6. meaning
      7. representational indeterminacy

      Qualifiers

      • Article

      Cited By

      • (2023) Documentation to Documentality in the works of Michael Buckland. Journal of Documentation 80:3 (606-617). DOI: 10.1108/JD-04-2023-0066. Online publication date: 23-Aug-2023
      • (2018) Twinning data science with information science in schools of library and information science. Journal of Documentation 74:6 (1243-1257). DOI: 10.1108/JD-02-2018-0036. Online publication date: 8-Oct-2018
      • (2018) Enabling self-service BI. Information Systems Frontiers 20:2 (275-288). DOI: 10.1007/s10796-016-9722-2. Online publication date: 1-Apr-2018
      • (2016) The Use of Inverted Index to Information Retrieval: ADD Intelligent in Aviation Case Study. Trends and Applications in Software Engineering (211-220). DOI: 10.1007/978-3-319-48523-2_20. Online publication date: 9-Oct-2016
      • (2015) Semantically enhanced pseudo relevance feedback for Arabic information retrieval. Journal of Information Science 42:2 (246-260). DOI: 10.1177/0165551515594722. Online publication date: 9-Jul-2015
      • (2014) Impact of Stemmer on Arabic Text Retrieval. Information Retrieval Technology (314-326). DOI: 10.1007/978-3-319-12844-3_27. Online publication date: 2014
      • (2007) A grid-based infrastructure for distributed retrieval. Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries (161-173). DOI: 10.5555/2392444.2392463. Online publication date: 16-Sep-2007
      • (2007) A Grid-Based Infrastructure for Distributed Retrieval. Research and Advanced Technology for Digital Libraries (161-173). DOI: 10.1007/978-3-540-74851-9_14. Online publication date: 2007
