[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1109/WI-IAT.2011.224acmconferencesArticle/Chapter ViewAbstractPublication PageswiConference Proceedingsconference-collections
Article

Search Engine Query Clustering Using Top-k Search Results

Published: 22 August 2011 Publication History

Abstract

Clustering of search engine queries has attracted significant attention in recent years. Many search engine applications such as query recommendation require query clustering as a pre-requisite to function properly. Indeed, clustering is necessary to unlock the true value of query logs. However, clustering search queries effectively is quite challenging, due to the high diversity and arbitrary input by users. Search queries are usually short and ambiguous in terms of user requirements. Many different queries may refer to a single concept, while a single query may cover many concepts. Existing prevalent clustering methods, such as K-Means or DBSCAN cannot assure good results in such a diverse environment. Agglomerative clustering gives good results but is computationally quite expensive. This paper presents a novel clustering approach based on a key insight--search engine results might themselves be used to identify query similarity. We propose a novel similarity metric for diverse queries based on the ranked URL results returned by a search engine for queries. This is used to develop a very efficient and accurate algorithm for clustering queries. Our experimental results demonstrate more accurate clustering performance, better scalability and robustness of our approach against known baselines.

References

[1]
"http://www.worldwidewebsize.com/."
[2]
G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York, NY, USA: McGraw-Hill, Inc., 1986.
[3]
D. Beeferman and A. L. Berger, "Agglomerative clustering of a search engine query log," in KDD, pp. 407-416, 2000.
[4]
J.-R. Wen, J.-Y. Nie, and H. Zhang, "Query clustering using user logs," ACM Trans. Inf. Syst., vol. 20, no. 1, pp. 59-81, 2002.
[5]
H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li, "Context-aware query suggestion by mining click-through and session data," in KDD, pp. 875-883, ACM, 2008.
[6]
T. Joachims, "Optimizing search engines using clickthrough data," in KDD, pp. 133-142, ACM, 2002.
[7]
E. Agichtein, E. Brill, and S. T. Dumais, "Improving web search ranking by incorporating user behavior information," in SIGIR, pp. 19-26, 2006.
[8]
U. Irmak, V. von Brzeski, and R. Kraft, "Contextual ranking of keywords using click data," in ICDE, pp. 457-468, 2009.
[9]
F. Radlinski and T. Joachims, "Query chains: learning to rank from implicit feedback," in KDD, pp. 239-248, 2005.
[10]
T. Joachims, L. A. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, "Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search," ACM Trans. Inf. Syst., vol. 25, no. 2, 2007.
[11]
R. Fagin, R. Kumar, and D. Sivakumar, "Comparing top k lists," SIAM J. Discrete Math., vol. 17, no. 1, pp. 134-160, 2003.
[12]
C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1982.
[13]
M. Barbaro and T. Z. Jr., "A face is exposed for aol searcher no. 4417749," Augest 9, 2006. (New York Times).
[14]
K. Hafner, "Researchers yearn to use aol logs, but they hesitate," Augest 23, 2006. (New York Times).
[15]
H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, "Query expansion by mining user logs," IEEE Trans. Knowl. Data Eng., vol. 15, no. 4, pp. 829-839, 2003.
[16]
R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Query recommendation using query logs in search engines," in EDBT, 2004.
[17]
J.-R. Wen, J.-Y. Nie, and H.-J. Zhang, "Clustering user queries of a search engine," in WWW '01.
[18]
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A densitybased algorithm for discovering clusters in large spatial databases with noise," in KDD, pp. 226-231, 1996.
[19]
E. Yilmaz, J. A. Aslam, and S. Robertson, "A new rank correlation coefficient for information retrieval," in SIGIR, pp. 587-594, 2008.
[20]
D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on In Pattern Analysis and Machine Intelligence, vol. PAMI-1, pp. 224-227, Nov. 1977.
[21]
S. Saitta, B. Raphael, and I. F. C. Smith, "A bounded index for cluster validity," in MLDM, pp. 174-187, 2007.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WI-IAT '11: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
August 2011
520 pages
ISBN:9780769545134

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 22 August 2011

Check for updates

Author Tags

  1. Clustering validation
  2. Search egine query clustering
  3. Top-k search results

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 94
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media