Article (ACM Digital Library identifier: 10.5555/646111.679597)

A Hybrid Approach to Web Usage Mining

Published: 04 September 2002

Abstract

With the large number of companies using the Internet to distribute and collect information, knowledge discovery on the web has become an important research area. Web usage mining, which is the main topic of this paper, focuses on knowledge discovery from the clicks in the web log for a given site (the so-called click-stream), especially on analysis of sequences of clicks. Existing techniques for analyzing click sequences have different drawbacks, i.e., either huge storage requirements, excessive I/O cost, or scalability problems when additional information is introduced into the analysis.

In this paper we present a new hybrid approach for analyzing click sequences that aims to overcome these drawbacks. The approach is based on a novel combination of existing approaches, more specifically the Hypertext Probabilistic Grammar (HPG) and Click Fact Table approaches. The approach allows for additional information, e.g., user demographics, to be included in the analysis without introducing performance problems. The development is driven by experiences gained from industry collaboration. A prototype has been implemented and experiments are presented that show that the hybrid approach performs well compared to the existing approaches. This is especially true when mining sessions containing clicks with certain characteristics, i.e., when constraints are introduced. The approach is not limited to web log analysis, but can also be used for general sequence mining tasks.
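The combination described in the abstract can be illustrated with a small sketch. It is not the authors' prototype: the table layout, the column names (`session_id`, `seq`, `page`, `country`), the helper functions, and the sample rows are all hypothetical, and the grammar is reduced to a first-order transition model with artificial start and finish states. The idea shown is only the general one: select the sessions that satisfy a constraint from a click fact table, build transition probabilities from those sessions alone, and then enumerate click paths whose probability stays above a threshold.

```python
from collections import defaultdict

# Hypothetical click fact table: one row per click.
# Column names and values are illustrative assumptions, not from the paper.
CLICKS = [
    {"session_id": 1, "seq": 1, "page": "home", "country": "DK"},
    {"session_id": 1, "seq": 2, "page": "products", "country": "DK"},
    {"session_id": 1, "seq": 3, "page": "cart", "country": "DK"},
    {"session_id": 2, "seq": 1, "page": "home", "country": "DK"},
    {"session_id": 2, "seq": 2, "page": "products", "country": "DK"},
    {"session_id": 3, "seq": 1, "page": "home", "country": "US"},
    {"session_id": 3, "seq": 2, "page": "about", "country": "US"},
]

START, END = "<S>", "<F>"  # artificial start and finish states


def sessions_matching(clicks, predicate):
    """Page sequences of sessions with at least one click satisfying predicate."""
    by_session = defaultdict(list)
    for row in clicks:
        by_session[row["session_id"]].append(row)
    result = []
    for sid in sorted(by_session):
        rows = by_session[sid]
        if any(predicate(r) for r in rows):
            ordered = sorted(rows, key=lambda r: r["seq"])
            result.append([r["page"] for r in ordered])
    return result


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from the given sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        path = [START] + pages + [END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, out in counts.items():
        total = sum(out.values())
        probs[src] = {dst: n / total for dst, n in out.items()}
    return probs


def frequent_paths(probs, threshold):
    """Maximal paths from START whose cumulative probability is >= threshold.

    Simplification: a page is never revisited within a path, so the
    search always terminates even if the transition graph has cycles.
    """
    found = []
    stack = [([START], 1.0)]
    while stack:
        path, p = stack.pop()
        extended = False
        for nxt, q in probs.get(path[-1], {}).items():
            if nxt != END and nxt not in path and p * q >= threshold:
                stack.append((path + [nxt], p * q))
                extended = True
        if not extended and len(path) > 1:
            found.append((path[1:], p))
    return sorted(found, key=lambda item: (-item[1], item[0]))


# Constraint: only sessions containing a click from country "DK".
dk_sessions = sessions_matching(CLICKS, lambda r: r["country"] == "DK")
dk_probs = transition_probabilities(dk_sessions)
for pages, prob in frequent_paths(dk_probs, 0.6):
    print(" -> ".join(pages), prob)
```

Because the constraint is applied while selecting sessions, the transition model is built only from the matching sessions rather than from the whole log, which is roughly the property the abstract attributes to the hybrid approach when constraints are introduced.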

Published In

DaWaK 2000: Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
September 2002, 337 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

• (2011) Alternative Approach to Tree-Structured Web Log Representation and Mining. Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Volume 01, pp. 235-242. 10.1109/WI-IAT.2011.156. Online publication date: 22-Aug-2011
• (2008) Computational Intelligence techniques for Web personalization. Web Intelligence and Agent Systems 6:3, pp. 253-272. 10.5555/1454421.1454423. Online publication date: 1-Aug-2008
• (2005) Mining interesting knowledge from weblogs. Data & Knowledge Engineering 53:3, pp. 225-241. 10.1016/j.datak.2004.08.001. Online publication date: 1-Jun-2005
• (2003) Evaluating the markov assumption for web usage mining. Proceedings of the 5th ACM international workshop on Web information and data management, pp. 82-89. 10.1145/956699.956717. Online publication date: 7-Nov-2003
