
Processing Conjunctive and Phrase Queries with the Set-Based Model

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3246))

Abstract

This paper presents an extension to the set-based model (SBM), an effective technique for computing term weights based on co-occurrence patterns, for processing conjunctive and phrase queries. The extension takes into account the intuition that semantically related terms often occur close to each other. The novelty is that all known approaches that account for co-occurrence patterns were initially designed for processing disjunctive (OR) queries, whereas our extension provides a simple, effective, and efficient way to process conjunctive (AND) and phrase queries. The technique is time efficient and still improves retrieval effectiveness. Experimental results show that our extension improves the average precision of the answer set for all collections evaluated, while keeping computational cost small. For the TREC-8 collection, our extension led to a gain, relative to the standard vector space model, of 23.32% and 18.98% in the average precision curves for conjunctive and phrase queries, respectively.
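
To make the three query types named in the abstract concrete, the following is a minimal Python sketch of disjunctive (OR), conjunctive (AND), and phrase matching over a toy positional index. It illustrates only the query semantics and is not the set-based model's term weighting or the authors' implementation; the function names, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def disjunctive(index, terms):
    """OR query: documents containing at least one of the terms."""
    result = set()
    for term in terms:
        result |= set(index.get(term, {}))
    return result


def conjunctive(index, terms):
    """AND query: documents containing every term."""
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings) if postings else set()


def phrase(index, terms):
    """Phrase query: documents where the terms occur adjacently and in order."""
    matches = set()
    # Only documents that contain every term can contain the phrase.
    for doc_id in conjunctive(index, terms):
        for start in index[terms[0]][doc_id]:
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches


# Hypothetical sample documents, for illustration only.
docs = {
    1: "set based model for information retrieval",
    2: "information about retrieval model",
    3: "vector space model",
}
idx = build_positional_index(docs)
print(sorted(disjunctive(idx, ["vector", "retrieval"])))       # [1, 2, 3]
print(sorted(conjunctive(idx, ["information", "retrieval"])))  # [1, 2]
print(sorted(phrase(idx, ["information", "retrieval"])))       # [1]
```

Each query type is stricter than the one before: the phrase query accepts only document 1, where the two terms are adjacent, while the conjunctive query also accepts document 2, where they merely co-occur.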

This work was supported in part by the GERINDO project-grant MCT/CNPq/CT-INFO 552.087/02-5 and by CNPq grant 520.916/94-8 (Nivio Ziviani).

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pôssas, B., Ziviani, N., Ribeiro-Neto, B., Meira, W. (2004). Processing Conjunctive and Phrase Queries with the Set-Based Model. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_25

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
