More Web Proxy on the site http://driver.im/

Article

Search Engine Query Clustering Using Top-k Search Results

Authors:

Jaideep Vaidya,

Haibing LuAuthors Info & Claims

WI-IAT '11: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Pages 112 - 119

https://doi.org/10.1109/WI-IAT.2011.224

Published: 22 August 2011 Publication History

Abstract

Clustering of search engine queries has attracted significant attention in recent years. Many search engine applications such as query recommendation require query clustering as a pre-requisite to function properly. Indeed, clustering is necessary to unlock the true value of query logs. However, clustering search queries effectively is quite challenging, due to the high diversity and arbitrary input by users. Search queries are usually short and ambiguous in terms of user requirements. Many different queries may refer to a single concept, while a single query may cover many concepts. Existing prevalent clustering methods, such as K-Means or DBSCAN cannot assure good results in such a diverse environment. Agglomerative clustering gives good results but is computationally quite expensive. This paper presents a novel clustering approach based on a key insight--search engine results might themselves be used to identify query similarity. We propose a novel similarity metric for diverse queries based on the ranked URL results returned by a search engine for queries. This is used to develop a very efficient and accurate algorithm for clustering queries. Our experimental results demonstrate more accurate clustering performance, better scalability and robustness of our approach against known baselines.

References

[1]

"http://www.worldwidewebsize.com/."

[2]

G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York, NY, USA: McGraw-Hill, Inc., 1986.

Digital Library

[3]

D. Beeferman and A. L. Berger, "Agglomerative clustering of a search engine query log," in KDD, pp. 407-416, 2000.

Digital Library

[4]

J.-R. Wen, J.-Y. Nie, and H. Zhang, "Query clustering using user logs," ACM Trans. Inf. Syst., vol. 20, no. 1, pp. 59-81, 2002.

Digital Library

[5]

H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li, "Context-aware query suggestion by mining click-through and session data," in KDD, pp. 875-883, ACM, 2008.

Digital Library

[6]

T. Joachims, "Optimizing search engines using clickthrough data," in KDD, pp. 133-142, ACM, 2002.

Digital Library

[7]

E. Agichtein, E. Brill, and S. T. Dumais, "Improving web search ranking by incorporating user behavior information," in SIGIR, pp. 19-26, 2006.

Digital Library

[8]

U. Irmak, V. von Brzeski, and R. Kraft, "Contextual ranking of keywords using click data," in ICDE, pp. 457-468, 2009.

Digital Library

[9]

F. Radlinski and T. Joachims, "Query chains: learning to rank from implicit feedback," in KDD, pp. 239-248, 2005.

Digital Library

[10]

T. Joachims, L. A. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, "Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search," ACM Trans. Inf. Syst., vol. 25, no. 2, 2007.

Digital Library

[11]

R. Fagin, R. Kumar, and D. Sivakumar, "Comparing top k lists," SIAM J. Discrete Math., vol. 17, no. 1, pp. 134-160, 2003.

Digital Library

[12]

C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1982.

Digital Library

[13]

M. Barbaro and T. Z. Jr., "A face is exposed for aol searcher no. 4417749," Augest 9, 2006. (New York Times).

[14]

K. Hafner, "Researchers yearn to use aol logs, but they hesitate," Augest 23, 2006. (New York Times).

[15]

H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, "Query expansion by mining user logs," IEEE Trans. Knowl. Data Eng., vol. 15, no. 4, pp. 829-839, 2003.

Digital Library

[16]

R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Query recommendation using query logs in search engines," in EDBT, 2004.

[17]

J.-R. Wen, J.-Y. Nie, and H.-J. Zhang, "Clustering user queries of a search engine," in WWW '01.

Digital Library

[18]

M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A densitybased algorithm for discovering clusters in large spatial databases with noise," in KDD, pp. 226-231, 1996.

[19]

E. Yilmaz, J. A. Aslam, and S. Robertson, "A new rank correlation coefficient for information retrieval," in SIGIR, pp. 587-594, 2008.

Digital Library

[20]

D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on In Pattern Analysis and Machine Intelligence, vol. PAMI-1, pp. 224-227, Nov. 1977.

Digital Library

[21]

S. Saitta, B. Raphael, and I. F. C. Smith, "A bounded index for cluster validity," in MLDM, pp. 174-187, 2007.

Digital Library

Index Terms

Search Engine Query Clustering Using Top-k Search Results
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

This work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
Query Recommendation for Improving Search Engine Results

As web contents grow, the importance of search engines become more critical and at the same time user satisfaction decreases. Query recommendation is a new approach to improve search results in web. In this paper a method is proposed that, given a query ...
Google Search Engine: Seo Tools You Need to Explode Your Website Traffic - Google Seo, Google Ranking

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WI-IAT '11: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

August 2011

520 pages

ISBN:9780769545134

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

IEEE Computer Society

United States

Publication History

Published: 22 August 2011

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
94
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 13 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents