
Improving the effectiveness of information retrieval with local context analysis

Published: 01 January 2000

Abstract

Techniques for automatic query expansion have been extensively studied in information retrieval research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. Global techniques rely on analysis of a whole collection to discover word relationships, while local techniques emphasize analysis of the top-ranked documents retrieved for a query. Although local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
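To make the selection criterion concrete, here is a minimal Python sketch of cooccurrence-based ranking of expansion terms over the top-ranked passages of an initial search. The exact scoring form is an assumption for illustration, not taken from the abstract: each candidate gets a per-query-term cooccurrence degree, and these are multiplied so that a candidate must co-occur with all query terms to score well. The smoothing constant `delta = 0.1`, the capped idf `min(1, log10(N/N_c)/5)`, and the function name `lca_rank` are likewise assumptions. Any non-query token is treated as a candidate, whereas a real system might restrict candidates to nouns or noun phrases.

```python
import math
from collections import Counter


def lca_rank(query_terms, top_passages, num_passages, passage_df, delta=0.1, k=10):
    """Rank candidate expansion terms by cooccurrence with all query terms.

    query_terms:  list of query term strings
    top_passages: token lists for the top-ranked passages of the initial search
    num_passages: N, number of passages in the whole collection (assumed known)
    passage_df:   term -> number of collection passages containing it
    delta, the idf cap and the /5.0 scaling are illustrative assumptions.
    """
    tfs = [Counter(p) for p in top_passages]
    # Guard against log10(1) == 0 when only one passage is retrieved.
    log_n = math.log10(max(len(tfs), 2))

    def idf(term):
        # Assumed capped idf, clamped to [0, 1] so scores stay real-valued.
        df = max(passage_df.get(term, 1), 1)
        return max(0.0, min(1.0, math.log10(num_passages / df) / 5.0))

    # Hypothetical simplification: every non-query token is a candidate.
    candidates = set().union(*tfs) - set(query_terms)
    scores = {}
    for c in candidates:
        idf_c = idf(c)
        score = 1.0
        for w in query_terms:
            # Cooccurrence of c and w, summed over the top-ranked passages.
            co = sum(tf[c] * tf[w] for tf in tfs)
            co_degree = math.log10(co + 1) * idf_c / log_n
            # delta keeps one missing cooccurrence from zeroing the product.
            score *= (delta + co_degree) ** idf(w)
        scores[c] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with the query `["oil", "spill"]`, a term such as "tanker" that appears alongside both query terms in several top passages outranks a term that co-occurs with only one of them, because the missing cooccurrence contributes only the small `delta` floor to the product.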




    Reviews

    Karen Sparck-Jones

    This good, solid paper addresses the word mismatch problem (that is, different words for a single concept) with query expansion, using the local context supplied by top-ranked documents in a presearch to identify good term associations. This strategy is intended to overcome the deficiencies both of local feedback that treats terms independently and of globally defined associations. It chooses among feedback terms as candidates for expansion by preferring those most associated with the given set of query terms (allowing for frequency normalization). Feedback terms may be drawn from whole documents or, more conveniently, from passages, and may be either single words or phrases. The paper reports on experiments with a range of collections of different sizes and in different languages, comparing a no-expansion baseline with conventional independent local expansion, global associative expansion, and the new local context associative expansion. The results vary across the collections, but expansion in any form is usually helpful, and the new technique is normally superior to the others. The paper includes supporting tests on other parameters. The most interesting of these compares local context pseudo-relevance feedback with real relevance feedback, where a non-associative approach is better, illustrating the need for different strategies in the two cases. The local context strategy has already been successfully deployed elsewhere, for example in the National Institute of Standards and Technology Text Retrieval Conference (TREC) evaluations, and is clearly a useful retrieval tool.
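    The review contrasts associative term selection with conventional local feedback that treats terms independently, and notes that feedback terms can be drawn from passages rather than whole documents. The Python sketch below illustrates both ideas under stated assumptions. `split_into_passages` cuts a document into fixed-size overlapping word windows; the 300-word size and 150-word overlap are illustrative defaults, not values given here. `independent_feedback_terms` is a generic, hypothetical baseline (not the specific method evaluated in the paper) that scores each term only by how many top passages contain it times an idf weight, ignoring the query terms except to exclude them.

```python
import math
from collections import Counter


def split_into_passages(tokens, size=300, overlap=150):
    """Cut a token list into fixed-size windows that overlap by `overlap` tokens.

    The default size and overlap are illustrative assumptions.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    # The range stops once a window reaches the end of the document,
    # so no redundant trailing window is emitted.
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]


def independent_feedback_terms(passages, query_terms, doc_freq, num_docs, k=10):
    """Hypothetical baseline: rank terms without regard to the query terms.

    Score = (number of top passages containing the term) * ln(num_docs / df).
    """
    counts = Counter()
    for p in passages:
        counts.update(set(p))  # count each term at most once per passage
    query = set(query_terms)
    scores = {
        t: c * math.log(num_docs / max(doc_freq.get(t, 1), 1))
        for t, c in counts.items()
        if t not in query
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

    Because this baseline scores each term on its own, a term that is frequent in the top passages can be selected even if it never appears near the query terms; that is the weakness an association-based criterion is meant to address when few of the top-ranked passages are relevant.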



    Published In

    ACM Transactions on Information Systems, Volume 18, Issue 1
    Jan. 2000
    112 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/333135

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 January 2000
    Published in TOIS Volume 18, Issue 1


    Author Tags

    1. cooccurrence
    2. document analysis
    3. feedback
    4. global techniques
    5. information retrieval
    6. local context analysis
    7. local techniques

    Qualifiers

    • Article


    Article Metrics

    • Downloads (last 12 months): 342
    • Downloads (last 6 weeks): 38
    Reflects downloads up to 25 Jan 2025


    Cited By

    • (2024) Data Augmentation With Semantic Enrichment for Deep Learning Invoice Text Classification. IEEE Access 12, 57326-57344. DOI: 10.1109/ACCESS.2024.3387860. Online publication date: 2024.
    • (2024) Improving the clarity of questions in Community Question Answering networks. Journal of Intelligent Information Systems 62:6, 1631-1658. DOI: 10.1007/s10844-024-00847-y. Online publication date: 2-May-2024.
    • (2024) A Hybrid Query Expansion Method for Effective Bengali Information Retrieval. Proceedings of 4th International Conference on Frontiers in Computing and Systems, 377-397. DOI: 10.1007/978-981-97-2611-0_26. Online publication date: 29-Jun-2024.
    • (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval, 333-348. DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 24-Mar-2024.
    • (2024) A Deep Learning Approach for Selective Relevance Feedback. Advances in Information Retrieval, 189-204. DOI: 10.1007/978-3-031-56060-6_13. Online publication date: 24-Mar-2024.
    • (2023) Semantics-aware query expansion using pseudo-relevance feedback. Journal of Information Science. DOI: 10.1177/01655515231184831. Online publication date: 22-Jul-2023.
    • (2023) A discriminative method for global query expansion and term reweighting using co-occurrence graphs. Journal of Information Science 49:1, 183-206. DOI: 10.1177/0165551521998047. Online publication date: 1-Feb-2023.
    • (2023) Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. Proceedings of the 15th ACM Web Science Conference 2023, 239-244. DOI: 10.1145/3578503.3583628. Online publication date: 30-Apr-2023.
    • (2022) Smart Teacher - An AI-Based Self-Learning Platform for Grade 10 ICT Students. 2022 IEEE 7th International conference for Convergence in Technology (I2CT), 1-5. DOI: 10.1109/I2CT54291.2022.9824753. Online publication date: 7-Apr-2022.
    • (2022) Neural Network Guided Fast and Efficient Query-Based Stemming by Predicting Term Co-occurrence Statistics. SN Computer Science 3:3. DOI: 10.1007/s42979-022-01081-5. Online publication date: 24-Mar-2022.
