Session Boundary Detection for Association Rule Learning Using n-Gram Language Models

Xiangji Huang, Fuchun Peng, Aijun An, Dale Schuurmans & Nick Cercone

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2671)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

Abstract

We present a statistical method that uses n-gram language models to identify session boundaries in a large collection of Livelink log data. The identified sessions are then used for association rule learning. Unlike the traditional ad hoc timeout method, which uses fixed time thresholds for session identification, our method uses an information-theoretic approach that provides a natural technique for dynamic session identification. The effectiveness of our approach is evaluated with respect to four different interestingness measures. We obtain a significant improvement in each interestingness measure, with average improvements ranging from 26.6% to 39% over the best results obtained with standard timeout methods.
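
The abstract does not give the exact boundary criterion, so the following is only a minimal sketch of the general idea, under stated assumptions: a bigram model with add-one smoothing is trained on sequences of requested object IDs, and a new session is started whenever appending the next request raises the average per-transition entropy of the current session by more than a threshold. The function names, the smoothing scheme, the threshold value, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count contexts and bigrams over training sequences of object IDs."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            contexts[prev] += 1          # times prev was followed by something
            bigrams[(prev, cur)] += 1    # times prev was followed by cur
    return contexts, bigrams, vocab


def bigram_prob(model, prev, cur):
    """Add-one smoothed P(cur | prev); smoothing choice is an assumption."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1                   # +1 reserves mass for unseen objects
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)


def avg_entropy(model, seq):
    """Average negative log2 probability per transition (0.0 if no transition)."""
    if len(seq) < 2:
        return 0.0
    total = -sum(math.log2(bigram_prob(model, p, c))
                 for p, c in zip(seq, seq[1:]))
    return total / (len(seq) - 1)


def split_sessions(model, requests, threshold):
    """Start a new session when the entropy increase exceeds the threshold."""
    if not requests:
        return []
    sessions, current = [], [requests[0]]
    for obj in requests[1:]:
        increase = avg_entropy(model, current + [obj]) - avg_entropy(model, current)
        if increase > threshold:
            sessions.append(current)     # close the session at the boundary
            current = [obj]
        else:
            current.append(obj)
    sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Toy training data: two recurring access patterns (hypothetical).
    training = [["a", "b", "c"]] * 20 + [["x", "y", "z"]] * 20
    model = train_bigram(training)
    print(split_sessions(model, ["a", "b", "c", "x", "y", "z"], threshold=0.5))
```

Unlike a fixed timeout, this criterion depends only on how predictable the next request is given the preceding ones, so no time threshold is involved. Because the entropy is averaged over the session, a single surprising request has less effect in a long session than in a short one; a real system would need to choose the threshold, the n-gram order, and the smoothing method empirically.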

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
Xiangji Huang, Fuchun Peng & Dale Schuurmans
Department of Computer Science, York University, Toronto, Ontario, M3J 1P3, Canada
Aijun An
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, B3H 1W5, Canada
Nick Cercone

Editor information

Editors and Affiliations

Department of Computing and Information Science, College of Physical and Engineering Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
Yang Xiang
Dépt. Informatique-Génie Logiciel, Université Laval, Pavillon Pouliot, Ste-Foy, PQ, Canada, G1K 7P4
Brahim Chaib-draa

About this paper

Cite this paper

Huang, X., Peng, F., An, A., Schuurmans, D., Cercone, N. (2003). Session Boundary Detection for Association Rule Learning Using n-Gram Language Models. In: Xiang, Y., Chaib-draa, B. (eds) Advances in Artificial Intelligence. Canadian AI 2003. Lecture Notes in Computer Science, vol 2671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44886-1_19

DOI: https://doi.org/10.1007/3-540-44886-1_19
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40300-5
Online ISBN: 978-3-540-44886-0
eBook Packages: Springer Book Archive
