Abstract
Neighborhood rough sets have proven to be an effective tool for feature selection. In this model, the positive region of the decision is used to evaluate the classification ability of a candidate feature subset, and it is computed by considering only consistent samples. However, classification ability depends not only on consistent samples but also on the ability to discriminate between samples with different decisions. Hence the dependency function, which is constructed from the positive region, cannot reflect the actual classification ability of a feature subset. In this paper, we propose a new feature evaluation function for feature selection based on the discernibility matrix. We first introduce the concept of a neighborhood discernibility matrix to characterize the classification ability of a feature subset. We then present the relationship between the distance matrix and the discernibility matrix, and construct a feature evaluation function based on the discernibility matrix to measure the significance of a candidate feature. The proposed model not only maintains the maximal dependency function but also selects the features with the greatest discernibility. Experimental results show that the proposed method can handle heterogeneous data sets and finds effective feature subsets in comparison with some existing algorithms.
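The contrast drawn in the abstract between the positive-region dependency and pairwise discernibility can be illustrated with a small sketch. It is not the paper's evaluation function: the abstract does not give the definition of the neighborhood discernibility matrix, so `discerned_pairs` below is an assumed stand-in that counts sample pairs with different decisions lying more than `delta` apart. The function names, the Euclidean distance, the toy data, and the threshold `delta` are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def distance(a, b, features):
    """Euclidean distance restricted to the chosen feature indices (assumed metric)."""
    return sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood holds only their own decision."""
    consistent = 0
    for i, xi in enumerate(X):
        neighbors = [j for j, xj in enumerate(X)
                     if distance(xi, xj, features) <= delta]
        if all(y[j] == y[i] for j in neighbors):
            consistent += 1
    return consistent / len(X)


def discerned_pairs(X, y, features, delta):
    """Assumed stand-in measure: number of pairs with different decisions
    that are separated by more than delta on the chosen features."""
    return sum(
        1
        for i, j in combinations(range(len(X)), 2)
        if y[i] != y[j] and distance(X[i], X[j], features) > delta
    )


# Toy data (hypothetical): six samples, two numeric features, two decisions.
X = [(0.0, 0.0), (0.4, 0.5), (0.8, 0.55), (0.45, 0.5), (0.85, 0.55), (1.2, 1.0)]
y = [0, 0, 0, 1, 1, 1]
delta = 0.15

dep_a = neighborhood_dependency(X, y, [0], delta)
dep_b = neighborhood_dependency(X, y, [1], delta)
pairs_a = discerned_pairs(X, y, [0], delta)
pairs_b = discerned_pairs(X, y, [1], delta)

print(dep_a, dep_b)      # dependency of each single-feature subset
print(pairs_a, pairs_b)  # discerned pairs for each single-feature subset
```

On this toy data both single-feature subsets leave the same two samples consistent, so their dependency values tie at 1/3, while feature 0 separates 7 of the 9 different-decision pairs and feature 1 separates only 5. This is the kind of difference that a measure based only on consistent samples does not register.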
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61473111, 61572082, 61673396, and 61363056, the Foundation of Educational Committee of Liaoning Province (LZ2016003), the Natural Science Foundation of Liaoning Province (20170540012), the Program for Liaoning Innovative Research Team in University (LT2014024), the Macau Science and Technology Development Fund (Nos. 100/2013/A2 and 081/2015/A3), and the Natural Science Foundation of BUCEA under Grant KYJJ2017017.
Cite this article
Wang, C., He, Q., Shao, M. et al. Feature selection based on maximal neighborhood discernibility. Int. J. Mach. Learn. & Cyber. 9, 1929–1940 (2018). https://doi.org/10.1007/s13042-017-0712-6
DOI: https://doi.org/10.1007/s13042-017-0712-6