DOI: 10.1145/133160.133180

Experiments in automatic statistical thesaurus construction

Published: 01 June 1992

Abstract

A well-constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing a “good” from an “indifferent” or “poor” thesaurus class. In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces thesauri which are comparable in retrieval effectiveness to those produced by the first method at much lower cost.
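
The abstract does not say how a term discrimination value is computed. As a rough illustration only, the sketch below assumes the centroid-based "space density" formulation commonly associated with [14] and [15]: a term's value is the change in average document-to-centroid cosine similarity when that term is removed from the collection. The function names and the toy term-weight vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def density(docs):
    """Average similarity between each document and the collection centroid."""
    n_terms = len(docs[0])
    centroid = [sum(d[k] for d in docs) / len(docs) for k in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)

def discrimination_values(docs):
    """Assumed definition: density without term k minus density with all terms.

    A positive value means removing the term packs the documents closer
    together, i.e. the term helps tell documents apart.
    """
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        without_k = [d[:k] + d[k + 1:] for d in docs]  # drop column k
        values.append(density(without_k) - base)
    return values

# Toy collection (hypothetical): term 0 occurs in every document,
# terms 1 and 2 each occur in exactly one document.
docs = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
dv = discrimination_values(docs)
print(dv)
```

In this toy collection the ubiquitous term gets a negative value and the two rare terms get equal positive values, which matches the usual reading that very frequent terms are poor discriminators.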

References

[1]
Crouch, C. A cluster-based approach to thesaurus construction. Proceedings of the Eleventh International Conference on Research and Development in Information Retrieval; 1988; Grenoble, France.
[2]
Crouch, C. An approach to the automatic construction of global thesauri. Information Processing and Management, 26(5):629-640; 1990.
[3]
Cleverdon, C.W.; Mills, J.; Keen, M. Factors determining the performance of indexing systems. Aslib Cranfield Project, Vol. 1; 1966.
[4]
Salton, G. Scientific reports on information storage and retrieval (ISR 11). Department of Computer Science, Cornell University, 1966.
[5]
Salton, G. Scientific reports on information storage and retrieval (ISR 13). Department of Computer Science, Cornell University, 1968.
[6]
Salton, G., ed. The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs, N.J.: Prentice-Hall; 1971.
[7]
Sparck Jones, K. Automatic keyword classification for information retrieval. London: Butterworths; 1971.
[8]
Fox, E. Lexical relations: enhancing effectiveness of information retrieval systems. ACM SIGIR Forum, 15(3):5-36; Winter 1980.
[9]
Wang, Y.; Vandendorpe, J.; Evens, M. Relational thesauri in information retrieval. Journal of the American Society for Information Science, 36(1):15-27; 1985.
[10]
Fox, E.; Nutter, J.; Ahlswede, T.; Markowitz, J. Building a large thesaurus for information retrieval. Proceedings of the Second Conference on Applied Natural Language Processing; 1988; Austin, TX.
[11]
Fox, E. Building the CODER lexicon: the Collins English dictionary and its adverb definitions. Tech. Report 86-23, Department of Computer Science, VPI&SU, Blacksburg, VA.
[12]
Chen, H.; Lynch, K. Semantics-based information management and retrieval: A knowledge discovery approach. IEEE Transactions on Systems, Man and Cybernetics, 1992.
[13]
Chen, H.; Dhar, V. Cognitive process as a basis for intelligent retrieval systems design. Information Processing and Management, 27(5): 405-432; 1991.
[14]
Salton, G.; Yang, C.S.; Yu, C.T. A theory of term importance in automatic text analysis. Journal of the American Society for Information Science, 26(1):33-44; 1975.
[15]
Salton, G.; McGill, M. Introduction to modern information retrieval. New York: McGraw-Hill; 1983.
[16]
El-Hamdouchi, A.; Willett, P. An improved algorithm for the calculation of exact term discrimination values. Information Processing and Management, 24(1):17-22; 1988.
[17]
Voorhees, E. The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval. Ph.D. Thesis, Department of Computer Science, Cornell University, 1985.
[18]
Voorhees, E. Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Tech. Report 86-765, Department of Computer Science, Cornell University.
[19]
van Rijsbergen, C.J. Information retrieval, 2nd edition. London: Butterworths; 1979.
[20]
Buckley, C. Implementation of the SMART information retrieval system. Tech. Report 85-686, Department of Computer Science, Cornell University.
[21]
Fox, E. Characteristics of two new experimental collections in computer and information science containing textual and bibliographic concepts. Tech. Report 83-561, Department of Computer Science, Cornell University.
[22]
Fox, E. Extending the boolean and vector space models of information retrieval with p-norm queries and multiple concept types. Ph.D. Thesis, Department of Computer Science, Cornell University, 1983.

Published In

SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
June 1992
352 pages
ISBN: 0897915232
DOI: 10.1145/133160

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

SIGIR92
Sponsors: SIGIR; Royal School of Lib.

Cited By

• (2021) A Query Expansion Method Using Multinomial Naive Bayes. Applied Sciences, 11:21 (10284). DOI: 10.3390/app112110284. Online publication date: 2-Nov-2021.
• (2021) Automatic Creation of a Domain Specific Thesaurus Using Siamese Networks. 2021 IEEE 15th International Conference on Semantic Computing (ICSC) (355-361). DOI: 10.1109/ICSC50631.2021.00066. Online publication date: Jan-2021.
• (2020) Health Information Retrieval. Signal Processing Techniques for Computational Health Informatics (193-207). DOI: 10.1007/978-3-030-54932-9_8. Online publication date: 8-Oct-2020.
• (2018) Decision Making by Extracting Soft Information from CSR News Report. Technological and Economic Development of Economy, 24:4 (1344-1361). DOI: 10.3846/tede.2018.3121. Online publication date: 29-Jun-2018.
• (2018) Automatic Construction of Sentiment Lexicon by Analyzing SMS Bigdata. 2018 IEEE International Conference on Big Data (Big Data) (5348-5350). DOI: 10.1109/BigData.2018.8622280. Online publication date: Dec-2018.
• (2017) RQUERY. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (3936-3943). DOI: 10.5555/3298023.3298140. Online publication date: 4-Feb-2017.
• (2017) Query Expansion Using Local and Global Document Analysis. ACM SIGIR Forum, 51:2 (168-175). DOI: 10.1145/3130348.3130364. Online publication date: 2-Aug-2017.
• (2015) Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context. ACM Transactions on Information Systems, 33:2 (1-38). DOI: 10.1145/2699666. Online publication date: 17-Feb-2015.
• (2015) Towards semantically linked multilingual corpus. International Journal of Information Management: The Journal for Information Professionals, 35:3 (387-395). DOI: 10.1016/j.ijinfomgt.2015.01.004. Online publication date: 1-Jun-2015.
• (2014) A Fuzzy Algorithm for Optimizing Semantic Documental Searches. International Journal of Web Portals, 6:1 (50-63). DOI: 10.4018/ijwp.2014010104. Online publication date: 1-Jan-2014.
