research-article

Nearest-neighbor guided evaluation of data reliability and its applications

Authors: Tossapon Boongoen, Qiang Shen

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Volume 40, Issue 6

Pages 1622 - 1633

https://doi.org/10.1109/TSMCB.2010.2043357

Published: 01 December 2010

Abstract

The intuition of data reliability has recently been incorporated into mainstream research on ordered weighted averaging (OWA) operators. Instead of relying on human-guided variables, the aggregation behavior is determined by the underlying characteristics of the data being aggregated. However, data-oriented operators such as the dependent OWA (DOWA) rely on centralized data structures to generate reliable weights. Despite its simplicity, the approach taken by these operators entirely neglects any local data structure that represents strong agreement or consensus. To address this issue, the cluster-based OWA (Clus-DOWA) operator has been proposed. It employs a cluster-based reliability measure that effectively differentiates the accountability of different input arguments. Yet its practical application is constrained by its high computational requirement. This paper presents a more efficient nearest-neighbor-based reliability assessment that does not require an expensive clustering process. The proposed measure can be viewed as a stress function, from which the OWA weights and associated decision-support explanations can be generated. To illustrate its potential, the measure is applied to two problems: information aggregation for alias detection, and unsupervised feature selection (in which unreliable features are excluded from the actual learning process). Experimental results demonstrate that these techniques usually outperform their conventional state-of-the-art counterparts.
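
The contrast the abstract draws between centralized and local (nearest-neighbor) reliability can be illustrated with a small sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the function names, the parameter `k`, the mean-centered similarity used for the centralized case, and the "one minus normalized average distance to the k nearest neighbors" score used for the local case. Because the weights are attached to the arguments themselves, the aggregate is computed as a plain weighted sum.

```python
from typing import List, Sequence


def _normalize(scores: Sequence[float]) -> List[float]:
    """Scale non-negative scores to sum to 1; fall back to uniform weights."""
    n = len(scores)
    total = sum(scores)
    if total <= 0:
        return [1.0 / n] * n
    return [s / total for s in scores]


def mean_centered_weights(values: Sequence[float]) -> List[float]:
    """Centralized weighting (illustrative): closer to the global mean => larger weight."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]
    total_dev = sum(deviations)
    if total_dev == 0:  # all arguments identical
        return _normalize([1.0] * len(values))
    return _normalize([1.0 - d / total_dev for d in deviations])


def nn_reliability_weights(values: Sequence[float], k: int = 2) -> List[float]:
    """Local weighting (illustrative, not the paper's definition):
    an argument is considered reliable when its k nearest neighbors are close."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    if n == 1:
        return [1.0]
    k = max(1, min(k, n - 1))
    spread = max(values) - min(values)
    if spread == 0:  # all arguments identical
        return [1.0 / n] * n
    reliability = []
    for i, v in enumerate(values):
        # Distances from argument i to every other argument, nearest first.
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        avg_nn_dist = sum(dists[:k]) / k
        reliability.append(1.0 - avg_nn_dist / spread)
    return _normalize(reliability)


def aggregate(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the arguments using argument-dependent weights."""
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    # Two tight local groups (around 0.1 and 0.9) and one isolated value (0.5).
    data = [0.10, 0.12, 0.11, 0.90, 0.92, 0.50]
    for name, fn in [("mean-centered", mean_centered_weights),
                     ("nearest-neighbor", nn_reliability_weights)]:
        w = fn(data)
        print(name, [round(x, 3) for x in w], round(aggregate(data, w), 3))
```

On this toy input the isolated value 0.5 happens to lie closest to the global mean, so the mean-centered scheme gives it the largest weight, whereas the nearest-neighbor scheme gives it the smallest weight because no other argument agrees with it. That is the kind of local consensus the abstract says centralized operators neglect; no clustering step is needed, only pairwise distances.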

Cited By

Su P, Zhao Y, Chen T, Xie J, Zhao Y, Qi H, Zheng Y, Liu J (2019). Exploiting Reliability-Guided Aggregation for the Assessment of Curvilinear Structure Tortuosity. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, pp. 12-20. DOI: 10.1007/978-3-030-32251-9_2. Online publication date: 13-Oct-2019.
https://dl.acm.org/doi/10.1007/978-3-030-32251-9_2
Qu Y, Shang C, Parthaláin N, Wu W, Shen Q (2018). Multi-functional nearest-neighbour classification. Soft Computing - A Fusion of Foundations, Methodologies and Applications 22:8, pp. 2717-2730. DOI: 10.1007/s00500-017-2528-4. Online publication date: 1-Apr-2018.
https://dl.acm.org/doi/10.1007/s00500-017-2528-4
Yi P, Li W (2018). Induced cluster-based OWA operators with reliability measures and the application in group decision-making. International Journal of Intelligent Systems 34:4, pp. 527-540. DOI: 10.1002/int.22063. Online publication date: 24-Oct-2018.
https://dl.acm.org/doi/10.1002/int.22063

Information & Contributors

Information

Published In

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics Volume 40, Issue 6

December 2010

224 pages

ISSN: 1083-4419

Copyright © 2010.

Publisher

IEEE Press

Publication History

Published: 01 December 2010

Accepted: 18 January 2010

Revised: 19 November 2009

Received: 24 July 2009

Qualifiers

Research-article

Bibliometrics

Article Metrics

Total Citations: 14
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Jan 2025

Citations

Cited By

Iam-On N, Boongoen T (2017). Generating descriptive model for student dropout. Human-centric Computing and Information Sciences 7:1, pp. 1-24. DOI: 10.1186/s13673-016-0083-0. Online publication date: 1-Dec-2017.
https://dl.acm.org/doi/10.1186/s13673-016-0083-0
Khan F, Akbar S, Basit A, Khan I, Akhlaq H (2017). Identification of Anticancer Peptides Using Optimal Feature Space of Chou's Split Amino Acid Composition and Support Vector Machine. Proceedings of the 2017 4th International Conference on Biomedical and Bioinformatics Engineering, pp. 91-96. DOI: 10.1145/3168776.3168787. Online publication date: 12-Nov-2017.
https://dl.acm.org/doi/10.1145/3168776.3168787
Su P, Shang C, Chen T, Shen Q (2017). Exploiting Data Reliability and Fuzzy Clustering for Journal Ranking. IEEE Transactions on Fuzzy Systems 25:5, pp. 1306-1319. DOI: 10.1109/TFUZZ.2016.2612265. Online publication date: 3-Oct-2017.
https://dl.acm.org/doi/10.1109/TFUZZ.2016.2612265
Chen T, Su P, Shang C, Shen Q (2017). Reliability-guided fuzzy classifier ensemble. 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1-6. DOI: 10.1109/FUZZ-IEEE.2017.8015407. Online publication date: 9-Jul-2017.
https://dl.acm.org/doi/10.1109/FUZZ-IEEE.2017.8015407
Keerin P, Kurutach W, Boongoen T (2016). A cluster-directed framework for neighbour based imputation of missing value in microarray data. International Journal of Data Mining and Bioinformatics 15:2, pp. 165-193. DOI: 10.1504/IJDMB.2016.076535. Online publication date: 1-May-2016.
https://dl.acm.org/doi/10.1504/IJDMB.2016.076535
(2016). Rough-fuzzy rule interpolation. Information Sciences: an International Journal 351:C, pp. 1-17. DOI: 10.1016/j.ins.2016.02.036. Online publication date: 10-Jul-2016.
https://dl.acm.org/doi/10.1016/j.ins.2016.02.036
Li W, Yi P, Guo Y (2016). Majority Clusters-Density Ordered Weighting Averaging. International Journal of Intelligent Systems 31:12, pp. 1166-1180. DOI: 10.1002/int.21821. Online publication date: 1-Dec-2016.
https://dl.acm.org/doi/10.1002/int.21821
