Abstract
In this paper, we investigate the trustworthiness of search engines’ hit counts, the result-count estimates returned alongside search results. Since many studies use hit counts to estimate the popularity of query terms, the reliability of hit counts is indispensable for achieving trustworthy results. However, hit counts are unreliable: they change when a user clicks the “Search” button more than once, clicks the “Next” button on the search results page, or issues the same query on different days. We analyze the characteristics of hit-count transitions by gathering various types of hit counts for 10,000 queries over two months. Our results show that the hit counts obtained with the largest search offset, just before the search engine adjusts them, are the most reliable. Moreover, hit counts are most reliable when they remain consistent for approximately one week.
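The selection rule described above can be illustrated with a minimal sketch. This is not the authors’ implementation: `fetch_hit_count` is a hypothetical stand-in that simulates a common engine behavior (a rough estimate reported for shallow offsets, adjusted downward once the user pages deep into the results), and the probing loop simply records the count seen at the largest offset before that adjustment occurs.

```python
def fetch_hit_count(query: str, offset: int) -> int:
    """Simulated search engine (hypothetical, for illustration only):
    reports a rough estimate for shallow offsets, then adjusts the
    count downward once the requested offset is deep enough."""
    estimated, adjusted = 2_000_000, 1_234
    return estimated if offset < 900 else adjusted


def reliable_hit_count(query: str, step: int = 100, max_offset: int = 1000) -> int:
    """Probe increasing result-page offsets and return the hit count
    observed at the largest offset just before the engine adjusts it,
    following the selection rule stated in the abstract."""
    last = fetch_hit_count(query, 0)
    for offset in range(step, max_offset + 1, step):
        count = fetch_hit_count(query, offset)
        if count < last:   # the engine adjusted its estimate downward
            return last    # value just before the adjustment
        last = count
    return last            # no adjustment observed within max_offset
```

Under this simulation, `reliable_hit_count("example")` returns the pre-adjustment estimate (2,000,000), since the adjusted figure reflects the engine’s result-serving limit rather than the estimate the paper deems most reliable.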
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Funahashi, T., Yamana, H. (2010). Reliability Verification of Search Engines’ Hit Counts: How to Select a Reliable Hit Count for a Query. In: Daniel, F., Facca, F.M. (eds) Current Trends in Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16985-4_11
Print ISBN: 978-3-642-16984-7
Online ISBN: 978-3-642-16985-4