Abstract
Data on the Web is noisy, huge, and dynamic. This poses enormous challenges to most data mining techniques that try to extract patterns from this data. While scalable data mining methods are expected to cope with the size challenge, coping with evolving trends in noisy data in a continuous fashion, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. Furthermore, the heterogeneity of the Web has required Web-based applications to integrate more effectively a variety of data types across multiple channels and from different sources, such as content, structure, and, more recently, semantics. Most existing Web mining and personalization methods work only at the lowest and most primitive level, namely discovering models of the user profiles from the input data stream. However, to better understand the real intention and dynamics of Web clickstreams, we need to extend reasoning and discovery beyond the usual data stream level. We propose a new multi-level framework for Web usage mining and personalization, consisting of knowledge discovery at different granularities: (i) session/user clicks and profiles, (ii) profile life events and profile communities, and (iii) sequential patterns and predicted shifts in the user profiles. One of the most promising features of the proposed framework is that it addresses challenging dynamic scenarios, including (i) defining and detecting events in the life of a synopsis profile, such as Birth, Death, and Atavism, and (ii) identifying Node Communities that can later be used to track the temporal evolution of Web profile activity events and dynamic trends within communities, such as Expansion, Shrinking, and Drift.
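To make the idea of profile life events concrete, the following is a minimal sketch and not the chapter's algorithm. The abstract names the events Birth, Death, and Atavism but does not define them, so the definitions here are assumptions: a profile is born when it is first activated, dies when it has been idle for more than `max_idle` time steps, and shows atavism when a dead profile becomes active again. The class `ProfileLifeTracker`, the function `run`, and the parameter `max_idle` are hypothetical names, and the sketch assumes each click session has already been assigned to a profile.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileLifeTracker:
    """Toy single-pass tracker of profile life events (illustrative assumption)."""
    max_idle: int = 3                               # assumed idle threshold before Death
    last_seen: dict = field(default_factory=dict)   # profile id -> last active step
    dead: set = field(default_factory=set)          # profiles currently marked dead

    def observe(self, step, profile_id):
        """Process one profile activation; return the events it triggers."""
        events = []
        # Mark every live profile that has been idle for too long as dead.
        for pid, seen in self.last_seen.items():
            if pid not in self.dead and step - seen > self.max_idle:
                self.dead.add(pid)
                events.append((step, "Death", pid))
        if profile_id not in self.last_seen:
            # First activation ever: Birth.
            events.append((step, "Birth", profile_id))
        elif profile_id in self.dead:
            # A dead profile is active again: Atavism (assumed meaning).
            self.dead.discard(profile_id)
            events.append((step, "Atavism", profile_id))
        self.last_seen[profile_id] = step
        return events


def run(stream, max_idle=3):
    """Feed (step, profile_id) pairs through the tracker and collect all events."""
    tracker = ProfileLifeTracker(max_idle=max_idle)
    events = []
    for step, profile_id in stream:
        events.extend(tracker.observe(step, profile_id))
    return events


if __name__ == "__main__":
    stream = [(0, "A"), (1, "B"), (2, "A"), (3, "A"), (6, "A"), (7, "B")]
    for event in run(stream):
        print(event)
```

Each activation is processed once and only a small per-profile summary is kept, which matches the single-pass stream setting described above. A real system would derive profile activations from a clustering step and would likely use a noise-tolerant criterion rather than a fixed idle threshold.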
© 2006 Springer
About this chapter
Cite this chapter
Nasraoui, O. (2006). A Multi-Layered and Multi-Faceted Framework for Mining Evolving Web Clickstreams. In: Sirmakessis, S. (eds) Adaptive and Personalized Semantic Web. Studies in Computational Intelligence, vol 14. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33279-0_2
DOI: https://doi.org/10.1007/3-540-33279-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30605-4
Online ISBN: 978-3-540-33279-4
eBook Packages: Engineering, Engineering (R0)