Abstract
As a well-known clustering algorithm, Fuzzy C-Means (FCM) allows each input sample to belong to more than one cluster, providing more flexibility than non-fuzzy clustering methods. However, the accuracy of FCM suffers from false detections caused by noisy records, weak feature selection, and, in some cases, low certainty of the algorithm. Such false detections matter greatly in decision-making domains such as network security and medical diagnosis, where weak decisions based on them may lead to catastrophic outcomes. They mainly arise when decisions are made about a subset of records that do not provide sufficient evidence for a good decision. In this paper, we propose a method that detects such ambiguous records in FCM by introducing a certainty factor, in order to decrease invalid detections. This approach enables us to send the detected ambiguous records to another discrimination method for deeper investigation, thus increasing accuracy by lowering the error rate. Most records are still processed quickly and with a low error rate, which avoids the performance loss common in similar hybrid methods. Experimental results of applying the proposed method to several datasets from different domains show a significant decrease in error rate as well as improved sensitivity of the algorithm.
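The idea of flagging ambiguous records can be illustrated with a minimal Python sketch. The abstract does not define the certainty factor, so the version below is an assumption, not the authors' formulation: it takes the gap between a record's two largest FCM memberships as its certainty and marks the record as ambiguous when that gap falls below a hypothetical `threshold`. The function names, the fuzzifier `m = 2.0`, and the toy data are likewise illustrative only. The membership formula itself is the standard FCM update for fixed cluster centers.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update for fixed cluster centers."""
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Guard against division by zero when a sample coincides with a center.
    d = np.fmax(d, 1e-12)
    inv = d ** (-2.0 / (m - 1.0))
    # Each row is normalized so that memberships sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)


def split_by_certainty(U, threshold=0.3):
    """Assumed certainty factor: gap between the two largest memberships.

    This definition and the threshold value are hypothetical stand-ins
    for the certainty factor proposed in the paper.
    """
    top2 = np.sort(U, axis=1)[:, -2:]
    certainty = top2[:, 1] - top2[:, 0]
    ambiguous = certainty < threshold
    return certainty, ambiguous


# Toy data: three records near a center, one exactly between the two centers.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [2.5, 2.5]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

U = fcm_memberships(X, centers)
certainty, ambiguous = split_by_certainty(U)
labels = U.argmax(axis=1)

# Records that are not ambiguous keep their FCM label; the rest would be
# passed on to a second discrimination method for closer inspection.
confident_idx = np.flatnonzero(~ambiguous)
deferred_idx = np.flatnonzero(ambiguous)
```

In this toy setup only the record at (2.5, 2.5), which is equidistant from both centers, ends up in `deferred_idx`; the other three are labeled directly by FCM. The second-stage classifier is deliberately left out, since the abstract does not say which method is used.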
Cite this article
Ghaffari, M., Ghadiri, N. Ambiguity-driven fuzzy C-means clustering: how to detect uncertain clustered records. Appl Intell 45, 293–304 (2016). https://doi.org/10.1007/s10489-016-0759-1
DOI: https://doi.org/10.1007/s10489-016-0759-1