Abstract
In this paper, we investigate the trustworthiness of search engines’ hit counts, the result-count estimates returned alongside search results. Since many studies use hit counts to estimate the popularity of query terms, the reliability of hit counts is indispensable for achieving trustworthy results. However, hit counts are unreliable: they change when a user clicks the “Search” button more than once, clicks the “Next” button on the search results page, or issues the same query on different days. We analyze the characteristics of hit-count transitions by gathering various types of hit counts for 10,000 queries over two months. Our results show that the hit counts obtained with the largest search offset, just before the search engine adjusts them, are the most reliable. Moreover, hit counts are most reliable when they remain consistent for approximately one week.
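The selection rule described above can be illustrated with a minimal sketch. This is not the authors’ implementation: `fetch_hit_count` is a hypothetical stand-in that simulates a common engine behavior (a rough estimate reported for shallow offsets, adjusted downward once the user pages deep into the results), and the probing loop simply records the count seen at the largest offset before that adjustment occurs.

```python
def fetch_hit_count(query: str, offset: int) -> int:
    """Simulated search engine (hypothetical, for illustration only):
    reports a rough estimate for shallow offsets, then adjusts the
    count downward once the requested offset is deep enough."""
    estimated, adjusted = 2_000_000, 1_234
    return estimated if offset < 900 else adjusted


def reliable_hit_count(query: str, step: int = 100, max_offset: int = 1000) -> int:
    """Probe increasing result-page offsets and return the hit count
    observed at the largest offset just before the engine adjusts it,
    following the selection rule stated in the abstract."""
    last = fetch_hit_count(query, 0)
    for offset in range(step, max_offset + 1, step):
        count = fetch_hit_count(query, offset)
        if count < last:   # the engine adjusted its estimate downward
            return last    # value just before the adjustment
        last = count
    return last            # no adjustment observed within max_offset
```

Under this simulation, `reliable_hit_count("example")` returns the pre-adjustment estimate (2,000,000), since the adjusted figure reflects the engine’s result-serving limit rather than the estimate the paper deems most reliable.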
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Funahashi, T., Yamana, H. (2010). Reliability Verification of Search Engines’ Hit Counts: How to Select a Reliable Hit Count for a Query. In: Daniel, F., Facca, F.M. (eds) Current Trends in Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16985-4_11
Print ISBN: 978-3-642-16984-7
Online ISBN: 978-3-642-16985-4