Abstract
Human interaction through the web generates both implicit and explicit knowledge. Searching is an example of an implicit contribution: people contribute their knowledge by clicking on retrieved documents. Thus, an important and interesting challenge is to extract semantic relations among queries and their terms from query logs. In this paper we present and discuss results on mining large graphs induced by query logs, and how they contribute to query classification and to understanding user intent and interest. Our approach consists of efficiently obtaining a hierarchical clustering for such graphs and, from it, a hierarchical query folksonomy. Results obtained with real data provide interesting insights into semantic relations among queries and are compared with conventional taxonomies, namely the ODP categorization.
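To make the idea of a query-log induced graph and its hierarchical clustering concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical click log of (query, clicked URL) pairs, links two queries when they share a clicked URL (weighted by Jaccard similarity, an assumption), and substitutes a simple single-linkage scheme: connected components at progressively looser weight thresholds. All function names, thresholds, and data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_query_graph(click_log):
    """Link two queries when users clicked at least one common URL.

    click_log: iterable of (query, clicked_url) pairs (hypothetical format).
    Returns (sorted query list, {(q1, q2): jaccard_weight}) with q1 < q2.
    """
    urls_by_query = defaultdict(set)
    for query, url in click_log:
        urls_by_query[query].add(url)

    edges = {}
    for q1, q2 in combinations(sorted(urls_by_query), 2):
        shared = urls_by_query[q1] & urls_by_query[q2]
        if shared:
            union = urls_by_query[q1] | urls_by_query[q2]
            edges[(q1, q2)] = len(shared) / len(union)
    return sorted(urls_by_query), edges


def components(nodes, edges, threshold):
    """Connected components using only edges with weight >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), weight in edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(nodes, edges, thresholds=(1.0, 0.5, 0.2)):
    """Nested clusterings, from the strictest to the loosest threshold."""
    return {t: components(nodes, edges, t) for t in thresholds}


# Invented toy data, for illustration only.
click_log = [
    ("jaguar car", "jaguar.example"),
    ("jaguar price", "jaguar.example"),
    ("jaguar price", "cars.example"),
    ("used cars", "cars.example"),
    ("jaguar animal", "zoo.example"),
]
nodes, edges = build_query_graph(click_log)
levels = hierarchy(nodes, edges)
```

In this toy log, lowering the threshold merges the three car-related queries into one cluster while "jaguar animal" stays apart, which is the kind of nested grouping a hierarchical query folksonomy would expose. The all-pairs comparison is quadratic in the number of queries, so a log of realistic size would need an inverted index from URLs to queries instead.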
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Francisco, A.P., Baeza-Yates, R., Oliveira, A.L. (2010). Mining Large Query Induced Graphs towards a Hierarchical Query Folksonomy. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_24
DOI: https://doi.org/10.1007/978-3-642-16321-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science (R0)