A highly scalable and effective method for metasearch

Authors:

Zhuogang Li

ACM Transactions on Information Systems (TOIS), Volume 19, Issue 3

Pages 310 - 335

https://doi.org/10.1145/502115.502120

Published: 01 July 2001

Abstract

A metasearch engine is a system that supports unified access to multiple local search engines. Database selection is one of the main challenges in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful local search engines to invoke for each user query. In order to enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. This article proposes a highly scalable and accurate database selection method. This method has several novel features. First, the metadata for representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computational efficiency and storage efficiency. Second, the new selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective. An operational prototype system has been built based on the proposed approach.
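The idea of a single integrated representative that drives engine ranking can be illustrated with a minimal sketch. The abstract does not say which metadata are stored or how engines are scored, so everything here is an assumption made for illustration: the per-engine term weights, the truncation parameter `r`, the additive scoring rule, and the function names `build_integrated_representative` and `select_engines`. It is not the article's actual method.

```python
from collections import defaultdict


def build_integrated_representative(engine_stats, r=2):
    """Merge per-engine term weights into one term -> top-r engines table.

    engine_stats: {engine_name: {term: weight}} (hypothetical metadata).
    r: how many engines to keep per term (an assumed truncation parameter).
    """
    merged = defaultdict(list)
    for engine, weights in engine_stats.items():
        for term, weight in weights.items():
            merged[term].append((weight, engine))
    # Keep only the r highest-weighted engines for each term.
    return {
        term: sorted(entries, reverse=True)[:r]
        for term, entries in merged.items()
    }


def select_engines(representative, query_terms, k=1):
    """Rank engines by summed weights of the query terms; return the top k."""
    scores = defaultdict(float)
    for term in query_terms:
        for weight, engine in representative.get(term, []):
            scores[engine] += weight
    # Sort by descending score, breaking ties by engine name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [engine for engine, _ in ranked[:k]]


# Toy metadata for three hypothetical local search engines.
engine_stats = {
    "news": {"election": 0.9, "football": 0.4},
    "sports": {"football": 0.8, "tennis": 0.7},
    "science": {"genome": 0.9, "election": 0.1},
}
representative = build_integrated_representative(engine_stats, r=2)
selected = select_engines(representative, ["football", "tennis"], k=2)
print(selected)
```

In this sketch, a query touches only the table rows for its own terms, so selection cost does not depend on the total number of engines, and the table holds at most `r` entries per distinct term.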

Index Terms

A highly scalable and effective method for metasearch
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Information systems

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 19, Issue 3

July 2001

119 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/502115

Copyright © 2001 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 25
Total Downloads: 969
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Dec 2024

Citations

Cited By

Tan X, Yu W, Tan L (2023). Large-Scale Rank Aggregation from Multiple Data Sources Based D3MOPSO Method. Web and Big Data, 63-80. Online publication date: 6-Oct-2023
https://dl.acm.org/doi/10.1007/978-981-97-2303-4_5
Yu W, Li S, Tang X, Wang K (2019). An efficient top-k ranking method for service selection based on ε-ADMOPSO algorithm. Neural Computing and Applications 31:1, 77-92. Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1007/s00521-018-3640-9
Serrano W (2018). Neural Networks in Big Data and Web Search. Data 4:1, 7. Online publication date: 30-Dec-2018
https://doi.org/10.3390/data4010007
Serrano W (2018). The Random Neural Network and Web Search: Survey Paper. Intelligent Systems and Applications, 700-737. Online publication date: 9-Nov-2018
https://doi.org/10.1007/978-3-030-01054-6_51
Meng W (2018). Metasearch Engines. Encyclopedia of Database Systems, 2248-2253. Online publication date: 7-Dec-2018
https://doi.org/10.1007/978-1-4614-8265-9_217
Meng W (2016). Metasearch Engines. Encyclopedia of Database Systems, 1-6. Online publication date: 22-Dec-2016
https://doi.org/10.1007/978-1-4899-7993-3_217-2
De A, Diaz E, Raghavan V (2012). Weighted Fuzzy Aggregation for Metasearch: An Application of Choquet Integral. Advances on Computational Intelligence, 501-510. Online publication date: 2012
https://doi.org/10.1007/978-3-642-31709-5_51
Chua C, Chiang R, Storey V (2011). Improving Domain Searches through Customized Search Engines. Intelligent, Adaptive and Reasoning Technologies, 1-22. Online publication date: 2011
https://doi.org/10.4018/978-1-60960-595-7.ch001
Shokouhi M, Si L (2011). Federated Search. Foundations and Trends in Information Retrieval 5:1, 1-102. Online publication date: 1-Jan-2011
https://dl.acm.org/doi/10.1561/1500000010
Renda M (2011). Personalized Information Search and Retrieval through a Desktop Application. Web Information Systems and Technologies, 129-146. Online publication date: 2011
https://doi.org/10.1007/978-3-642-22810-0_10
