An alternative approach for clustering web user sessions considering sequential information

Authors:

Rajhans Mishra,

Bharat Bhasker

Intelligent Data Analysis, Volume 18, Issue 2

Pages 137 - 156

Published: 01 March 2014

Abstract

Clustering is a prominent technique in data mining. It groups data points that are similar to one another in some respect, and the latent similarity within each group is computed using a similarity measure. Clustering web users by their navigational patterns is both an interesting and a challenging task. A web user may, depending on navigational pattern, belong to multiple categories. Web user navigation is also intrinsically sequential. For sequence data, the chosen similarity measure should capture both the order and the content of the sequences being compared. In this paper, we use the Sequence and Set Similarity Measure (S^{3}M) with a rough set based similarity upper approximation clustering algorithm to group web users by their navigational patterns. The quality of the clusters formed by the rough set based clustering algorithm with the S^{3}M measure is compared with that of a well-known clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The experimental results show the viability of our approach.
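The abstract does not state the formula for S^{3}M. As a rough illustration only, the sketch below assumes the form usually given for this measure in the literature: a weighted sum of an order-sensitive term (length of the longest common subsequence divided by the length of the longer sequence) and a content term (Jaccard similarity of the two item sets), with weights p and 1 - p. The function names, the default weight, the handling of two empty sessions, and the sample page categories are all assumptions made for this example, not details taken from the paper.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def s3m(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.5) -> float:
    """Assumed S3M form: p * (LCS / longer length) + (1 - p) * Jaccard(item sets)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not a and not b:
        # Assumption for this sketch: two empty sessions count as identical.
        return 1.0
    # Order-sensitive term: shared subsequence relative to the longer session.
    seq_sim = lcs_length(a, b) / max(len(a), len(b))
    # Content term: overlap of visited items, ignoring order and repeats.
    set_a, set_b = set(a), set(b)
    set_sim = len(set_a & set_b) / len(set_a | set_b)
    return p * seq_sim + (1.0 - p) * set_sim


if __name__ == "__main__":
    # Hypothetical sessions over page categories: same content, reversed order.
    session_1 = ["news", "sports", "weather"]
    session_2 = ["weather", "sports", "news"]
    print(s3m(session_1, session_2, p=0.5))
```

The two sample sessions visit the same categories in opposite order, so the content term is maximal while the order term is low. Moving p toward 1 makes the measure more sensitive to visiting order, and moving it toward 0 reduces it to plain set overlap.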

Cited By

(2017) Similarity upper approximation-based clustering for recommendation system, International Journal of Business Information Systems 26:1 (33-45). DOI: 10.5555/3140773.3140776. Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.5555/3140773.3140776
Jie Hu, Tianrui Li, Chuan Luo, Shaoyong Li. Incremental fuzzy probabilistic rough sets over dual universes, 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-8). DOI: 10.1109/FUZZ-IEEE.2015.7337866
https://dl.acm.org/doi/10.1109/FUZZ-IEEE.2015.7337866

An alternative approach for clustering web user sessions considering sequential information
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
  2. Information systems applications
    1. Data mining

Published In

Intelligent Data Analysis, Volume 18, Issue 2, March 2014, 197 pages

ISSN: 1088-467X

Publisher

IOS Press, Netherlands
