Abstract
We present a statistical method that uses n-gram language models to identify session boundaries in a large collection of Livelink log data. The identified sessions are then used for association rule learning. Unlike the traditional ad hoc timeout method, which uses fixed time thresholds for session identification, our method uses an information-theoretic approach that provides a natural technique for performing dynamic session identification. The effectiveness of our approach is evaluated with respect to four different interestingness measures. We obtain a significant improvement in each interestingness measure, ranging from 26.6% to 39% on average over the best results obtained with standard timeout methods.
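The abstract does not spell out the procedure, so the following is only a minimal sketch of how an entropy-based boundary test of this kind might look. It assumes a bigram model with add-one smoothing and a fixed threshold on the increase in average entropy. The function names (`train_bigram`, `entropy`, `split_sessions`), the smoothing choice, and the threshold value are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count contexts and bigrams over training sequences of object IDs."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def log_prob(prev, cur, model):
    """log2 P(cur | prev) with add-one smoothing (an assumed choice)."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen objects
    return math.log2((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))


def entropy(seq, model):
    """Average negative log2 probability per transition in seq."""
    if len(seq) < 2:
        return 0.0
    total = sum(log_prob(p, c, model) for p, c in zip(seq, seq[1:]))
    return -total / (len(seq) - 1)


def split_sessions(requests, model, threshold):
    """Start a new session when appending a request raises the
    average entropy of the current session by more than threshold."""
    if not requests:
        return []
    sessions, current = [], [requests[0]]
    for obj in requests[1:]:
        # With fewer than two requests there is no baseline entropy yet,
        # so this sketch never splits after a single request.
        if len(current) >= 2:
            jump = entropy(current + [obj], model) - entropy(current, model)
            if jump > threshold:
                sessions.append(current)
                current = [obj]
                continue
        current.append(obj)
    sessions.append(current)
    return sessions


# Hypothetical toy data: two unrelated browsing patterns.
model = train_bigram([list("abcabc"), list("xyzxyz")])
sessions = split_sessions(list("abcxyz"), model, threshold=0.3)
print(sessions)
```

In contrast to a fixed timeout, the split here depends on how surprising the next request is given the preceding ones, so the session length adapts to the content of the request sequence rather than to the clock.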
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, X., Peng, F., An, A., Schuurmans, D., Cercone, N. (2003). Session Boundary Detection for Association Rule Learning Using n-Gram Language Models. In: Xiang, Y., Chaib-draa, B. (eds) Advances in Artificial Intelligence. Canadian AI 2003. Lecture Notes in Computer Science, vol 2671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44886-1_19
DOI: https://doi.org/10.1007/3-540-44886-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40300-5
Online ISBN: 978-3-540-44886-0
eBook Packages: Springer Book Archive