A generic ranking function discovery framework by genetic programming for information retrieval

Published: 01 May 2004

Abstract

Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.
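The abstract does not specify the framework's terminals, operators, or fitness measure, so the following is only a minimal sketch of the general idea: candidate ranking functions are expression trees, and genetic programming evolves them toward higher retrieval effectiveness on training queries. The feature names (`tf`, `df`, `N`, `doc_len`), the arithmetic function set, the toy data, the use of mean average precision as fitness, and all parameter values are assumptions made for illustration, not the authors' design.

```python
import math
import random

# Hypothetical terminal set; the paper's actual features may differ.
TERMINALS = ["tf", "df", "N", "doc_len"]
# Hypothetical function set; division is "protected" against near-zero denominators.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
MAX_DEPTH = 4

def feat(tf, df, doc_len):
    return {"tf": tf, "df": df, "N": 1000, "doc_len": doc_len}

# Made-up training data: per query, a list of (per-term features, is_relevant).
QUERIES = [
    [([feat(5, 10, 100)], True), ([feat(1, 10, 300)], False),
     ([feat(3, 10, 120)], True), ([feat(0, 10, 80)], False)],
    [([feat(2, 400, 90), feat(4, 5, 90)], True),
     ([feat(6, 400, 200), feat(0, 5, 200)], False),
     ([feat(1, 400, 150), feat(1, 5, 150)], False)],
]

def random_tree(rng, depth):
    """A tree is a terminal name or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(list(FUNCTIONS)),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, f):
    if isinstance(tree, str):
        return float(f[tree])
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, f), evaluate(right, f))

def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))

def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if not isinstance(tree, str):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(rng, a, b):
    """Graft a random subtree of b into a; reject offspring that grow too deep."""
    child = replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))
    return child if depth(child) <= MAX_DEPTH else a

def average_precision(ranked_labels):
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def fitness(tree):
    """Mean average precision; a document's score sums the tree over query terms."""
    aps = []
    for docs in QUERIES:
        scored = []
        # Non-relevant documents go first so that score ties are broken pessimistically.
        for terms, rel in sorted(docs, key=lambda d: d[1]):
            s = sum(evaluate(tree, f) for f in terms)
            scored.append((s if math.isfinite(s) else float("-inf"), rel))
        scored.sort(key=lambda x: -x[0])  # stable sort, highest score first
        aps.append(average_precision([rel for _, rel in scored]))
    return sum(aps) / len(aps)

def evolve(seed=0, pop_size=30, generations=10):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = pop[:2]  # elitism: carry the two best trees over unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            child = crossover(rng, a, b)
            if rng.random() < 0.2:  # mutation: graft in a fresh random subtree
                child = crossover(rng, child, random_tree(rng, 2))
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best), history
```

Calling `evolve()` returns the best tree found, its fitness on the toy data, and the best fitness per generation. Because the two best trees are always carried over, the per-generation best never decreases; whether the evolved function generalizes to unseen queries is a separate question that this sketch does not address.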

Published In

Information Processing and Management: an International Journal, Volume 40, Issue 4
May 2004
136 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. genetic algorithms
  2. genetic programming
  3. information retrieval
  4. ranking function
  5. text mining

Qualifiers

  • Article

Cited By

  • (2021) Effects of central tendency measures on term weighting in textual information retrieval. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 25:11 (7341-7378). DOI: 10.1007/s00500-021-05694-5. Online publication date: 1-Jun-2021
  • (2019) An effective and efficient algorithm for ranking web documents via genetic programming. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (1065-1072). DOI: 10.1145/3297280.3297385. Online publication date: 8-Apr-2019
  • (2017) Exploring the space of information retrieval term scoring functions. Information Processing and Management: an International Journal, 53:2 (454-472). DOI: 10.1016/j.ipm.2016.11.003. Online publication date: 1-Mar-2017
  • (2017) Generation of simple structured information retrieval functions by genetic algorithm without stagnation. Expert Systems with Applications: An International Journal, 85:C (221-230). DOI: 10.1016/j.eswa.2017.05.019. Online publication date: 1-Nov-2017
  • (2016) Preventing database schema extraction by error message handling. Information Systems, 56:C (135-156). DOI: 10.1016/j.is.2015.09.010. Online publication date: 1-Mar-2016
  • (2016) Opinion-Based Entity Ranking using learning to rank. Applied Soft Computing, 38:C (151-163). DOI: 10.1016/j.asoc.2015.10.001. Online publication date: 1-Jan-2016
  • (2016) Learning to Assess Linked Data Relationships Using Genetic Programming. The Semantic Web – ISWC 2016 (581-597). DOI: 10.1007/978-3-319-46523-4_35. Online publication date: 17-Oct-2016
  • (2015) A new fuzzy logic based ranking function for efficient Information Retrieval system. Expert Systems with Applications: An International Journal, 42:3 (1223-1234). DOI: 10.1016/j.eswa.2014.09.009. Online publication date: 15-Feb-2015
  • (2014) Exploring the Space of IR Functions. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416 (372-384). DOI: 10.5555/2964060.2964070. Online publication date: 13-Apr-2014
  • (2014) Combining pre-retrieval query quality predictors using genetic programming. Applied Intelligence, 40:3 (525-535). DOI: 10.5555/2592907.2592915. Online publication date: 1-Apr-2014