An alternative approach for clustering web user sessions considering sequential information

Published: 01 March 2014

Abstract

Clustering is a prominent technique in data mining applications. It generates groups of data points that are similar to each other in a given aspect. Each group has some inherent latent similarity, which is computed using a similarity measure. Clustering web users based on their navigational patterns has always been an interesting as well as a challenging task. A web user, based on his or her navigational pattern, may belong to multiple categories. Web user navigation patterns are intrinsically sequential. When dealing with sequence data, a similarity measure should be chosen that captures both the order and the content information when computing the similarity among sequences. In this paper, we use the Sequence and Set Similarity Measure (S^{3}M) with a rough set based similarity upper approximation clustering algorithm to group web users based on their navigational patterns. The quality of the clusters formed using the rough set based clustering algorithm with the S^{3}M measure is compared with that of a well-known clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The experimental results show the viability of our approach.
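
To make concrete the idea of a measure that captures both order and content, here is a minimal Python sketch. It is not taken from the abstract: it assumes that S^{3}M is a weighted sum of an order-aware term (longest common subsequence length divided by the longer sequence length) and a content-aware term (Jaccard similarity of the page sets). The function names, the weight parameter `p`, and the sample sessions are all hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def s3m(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.5) -> float:
    """Assumed form of the measure: p * sequence similarity + (1 - p) * set similarity.

    The weight p (hypothetical name) controls how much visiting order matters
    relative to which pages were visited at all.
    """
    if not a and not b:
        return 1.0  # convention chosen here: two empty sessions are identical
    seq_sim = lcs_length(a, b) / max(len(a), len(b))  # order-aware part
    set_a, set_b = set(a), set(b)
    set_sim = len(set_a & set_b) / len(set_a | set_b)  # content-aware (Jaccard) part
    return p * seq_sim + (1 - p) * set_sim


# Hypothetical sessions: same pages, different visiting order.
session_1 = ["news", "sports", "weather"]
session_2 = ["sports", "news", "weather"]
print(s3m(session_1, session_2))         # below 1.0 because the order differs
print(s3m(session_1, session_2, p=0.0))  # set part only: the order is ignored
```

With `p = 0` the sketch reduces to a pure set comparison and treats the two sessions as identical. Any positive `p` penalizes the reordering, which is the behaviour the abstract asks of a measure for sequence data.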

Cited By

  • (2017) Similarity upper approximation-based clustering for recommendation system. International Journal of Business Information Systems 26:1 (33-45). DOI: 10.5555/3140773.3140776. Online publication date: 1-Jan-2017
  • Incremental fuzzy probabilistic rough sets over dual universes. 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-8). DOI: 10.1109/FUZZ-IEEE.2015.7337866

Published In

Intelligent Data Analysis, Volume 18, Issue 2
March 2014
197 pages

Publisher

IOS Press

Netherlands

Author Tags

  1. Clustering
  2. Sequential Data
  3. Similarity Upper Approximation
  4. Web Usage Data

Qualifiers

  • Article