A generic ranking function discovery framework by genetic programming for information retrieval

Authors:

Michael D. Gordon,

Praveen Pathak

Information Processing and Management: an International Journal, Volume 40, Issue 4

Pages 587 - 602

https://doi.org/10.1016/j.ipm.2003.08.001

Published: 01 May 2004

Abstract

Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.

Index Terms

A generic ranking function discovery framework by genetic programming for information retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
  2. Information systems applications
    1. Data mining

Information & Contributors

Information

Published In

Information Processing and Management: an International Journal Volume 40, Issue 4

May 2004

136 pages

ISSN: 0306-4573

Publisher

Pergamon Press, Inc.

United States


Citations

Cited By

Ghahramani F, Tahayori H, Visconti A (2021). Effects of central tendency measures on term weighting in textual information retrieval. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 25:11, 7341-7378. doi:10.1007/s00500-021-05694-5. Online publication date: 1-Jun-2021.
https://dl.acm.org/doi/10.1007/s00500-021-05694-5
Baeza-Yates R, Cuzzocrea A, Crea D, Bianco G, Hung C, Papadopoulos G (2019). An effective and efficient algorithm for ranking web documents via genetic programming. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 1065-1072. doi:10.1145/3297280.3297385. Online publication date: 8-Apr-2019.
https://dl.acm.org/doi/10.1145/3297280.3297385
Goswami P, Gaussier E, Amini M (2017). Exploring the space of information retrieval term scoring functions. Information Processing and Management: an International Journal, 53:2, 454-472. doi:10.1016/j.ipm.2016.11.003. Online publication date: 1-Mar-2017.
https://dl.acm.org/doi/10.1016/j.ipm.2016.11.003
Kulunchakov A, Strijov V (2017). Generation of simple structured information retrieval functions by genetic algorithm without stagnation. Expert Systems with Applications: An International Journal, 85:C, 221-230. doi:10.1016/j.eswa.2017.05.019. Online publication date: 1-Nov-2017.
https://dl.acm.org/doi/10.1016/j.eswa.2017.05.019
Naghdi S, Amini M (2016). Preventing database schema extraction by error message handling. Information Systems, 56:C, 135-156. doi:10.1016/j.is.2015.09.010. Online publication date: 1-Mar-2016.
https://dl.acm.org/doi/10.1016/j.is.2015.09.010
Bashir S, Afzal W, Baig A (2016). Opinion-Based Entity Ranking using learning to rank. Applied Soft Computing, 38:C, 151-163. doi:10.1016/j.asoc.2015.10.001. Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.1016/j.asoc.2015.10.001
Tiddi I, d’Aquin M, Motta E (2016). Learning to Assess Linked Data Relationships Using Genetic Programming. The Semantic Web – ISWC 2016, 581-597. doi:10.1007/978-3-319-46523-4_35. Online publication date: 17-Oct-2016.
https://dl.acm.org/doi/10.1007/978-3-319-46523-4_35
(2015). A new fuzzy logic based ranking function for efficient Information Retrieval system. Expert Systems with Applications: An International Journal, 42:3, 1223-1234. doi:10.1016/j.eswa.2014.09.009. Online publication date: 15-Feb-2015.
https://dl.acm.org/doi/10.1016/j.eswa.2014.09.009
Goswami P, Moura S, Gaussier E, Amini M, Maes F (2014). Exploring the Space of IR Functions. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416, 372-384. doi:10.5555/2964060.2964070. Online publication date: 13-Apr-2014.
https://dl.acm.org/doi/10.5555/2964060.2964070
Bashir S (2014). Combining pre-retrieval query quality predictors using genetic programming. Applied Intelligence, 40:3, 525-535. doi:10.5555/2592907.2592915. Online publication date: 1-Apr-2014.
https://dl.acm.org/doi/10.5555/2592907.2592915