Session Boundary Detection for Association Rule Learning Using n-Gram Language Models

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2671)

Abstract

We present a statistical method that uses n-gram language models to identify session boundaries in a large collection of Livelink log data. The identified sessions are then used for association rule learning. Unlike the traditional ad hoc timeout method, which identifies sessions with fixed time thresholds, our method takes an information-theoretic approach that provides a natural technique for dynamic session identification. We evaluate the effectiveness of our approach with respect to four different interestingness measures. We obtain a significant improvement in each measure, ranging from 26.6% to 39% on average over the best results obtained with standard timeout methods.
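
The abstract does not give the model order, the smoothing scheme, or the exact boundary rule, so the following is only a minimal sketch of the general idea under stated assumptions: a bigram model with add-one smoothing is trained on sequences of log events, and a boundary is placed wherever the surprisal (negative log probability) of the next event exceeds a fixed threshold. The function names, the toy event names, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count bigrams and their left contexts over training event sequences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def surprisal(model, prev, cur):
    """Return -log2 P(cur | prev) with add-one smoothing (an assumption)."""
    context_counts, bigram_counts, vocab = model
    p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
    return -math.log2(p)


def split_sessions(model, events, threshold):
    """Start a new session whenever the next event is too surprising."""
    if not events:
        return []
    sessions = [[events[0]]]
    for prev, cur in zip(events, events[1:]):
        if surprisal(model, prev, cur) > threshold:
            sessions.append([cur])  # high surprisal: treat as a boundary
        else:
            sessions[-1].append(cur)
    return sessions


# Toy data (hypothetical event names, not real Livelink log entries).
training = [["login", "browse", "view", "logout"]] * 5 \
         + [["search", "open", "edit", "save"]] * 5
model = train_bigram(training)

stream = ["login", "browse", "view", "logout",
          "search", "open", "edit", "save"]
sessions = split_sessions(model, stream, threshold=2.0)
print(sessions)
```

In this toy setting the transition from "logout" to "search" never occurs in training, so its surprisal is high and the stream is cut there. No timestamps are involved, which is the contrast with fixed timeout thresholds that the abstract draws.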


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, X., Peng, F., An, A., Schuurmans, D., Cercone, N. (2003). Session Boundary Detection for Association Rule Learning Using n-Gram Language Models. In: Xiang, Y., Chaib-draa, B. (eds) Advances in Artificial Intelligence. Canadian AI 2003. Lecture Notes in Computer Science, vol 2671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44886-1_19

  • DOI: https://doi.org/10.1007/3-540-44886-1_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40300-5

  • Online ISBN: 978-3-540-44886-0

  • eBook Packages: Springer Book Archive
