Identifying similarities, periodicities and bursts for online search queries

M Vlachos, C Meek, Z Vagena… - Proceedings of the 2004 …, 2004 - dl.acm.org
Proceedings of the 2004 ACM SIGMOD international conference on Management of …, 2004 - dl.acm.org
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries, and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identifying bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time series. We conclude the presentation with the description of a tool that uses the described methods and serves as an interactive exploratory data discovery tool for the MSN query database.
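
The three ideas named in the abstract (compressing a series to its best Fourier coefficients plus the energy of what was dropped, finding important periods, and flagging bursts) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the standardization step, the use of the single strongest periodogram peak as "the" period, and the moving-average threshold rule with its `window` and `num_std` parameters are all assumptions made for illustration.

```python
import numpy as np


def best_k_coefficients(series, k):
    """Keep the k largest-magnitude Fourier coefficients (hypothetical helper).

    Returns (indices, coefficients, omitted_energy), where omitted_energy is
    the sum of squared magnitudes of the discarded rfft coefficients.
    Assumes the series is not constant, so its standard deviation is nonzero.
    """
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()            # standardize (assumed preprocessing)
    coeffs = np.fft.rfft(x, norm="ortho")   # half-spectrum of a real signal
    power = np.abs(coeffs) ** 2
    top = np.sort(np.argsort(power)[-k:])   # indices of the k strongest terms
    omitted_energy = power.sum() - power[top].sum()
    return top, coeffs[top], omitted_energy


def dominant_period(series):
    """Return the period (in samples) of the strongest periodogram peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freq_index = int(np.argmax(power[1:])) + 1   # skip index 0 (DC)
    return len(x) / freq_index


def burst_days(series, window=7, num_std=1.5):
    """Flag days whose moving average exceeds mean + num_std * std.

    The threshold rule and default parameters are illustrative assumptions.
    """
    x = np.asarray(series, dtype=float)
    moving_avg = np.convolve(x, np.ones(window) / window, mode="same")
    threshold = moving_avg.mean() + num_std * moving_avg.std()
    return np.flatnonzero(moving_avg > threshold)


# Synthetic daily counts with a weekly cycle (made-up data, not MSN logs).
days = np.arange(364)
weekly = 10 + 5 * np.sin(2 * np.pi * days / 7)

# A flat series with a five-day spike, to exercise the burst rule.
spiky = np.full(100, 10.0)
spiky[50:55] = 100.0

top_idx, top_coeffs, leftover = best_k_coefficients(weekly, k=1)
period = dominant_period(weekly)
bursts = burst_days(spiky)
```

On the weekly series, a single coefficient captures essentially all of the energy, and the strongest periodogram peak corresponds to a seven-day period. Storing the omitted energy alongside the kept coefficients is what allows tighter distance bounds than the coefficients alone would give, which is the role the abstract assigns to it.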