
Nearest-neighbor guided evaluation of data reliability and its applications

Published: 01 December 2010

Abstract

The intuition of data reliability has recently been incorporated into mainstream research on ordered weighted averaging (OWA) operators. Instead of relying on human-guided variables, the aggregation behavior is determined in accordance with the underlying characteristics of the data being aggregated. However, data-oriented operators such as the dependent OWA (DOWA) rely on centralized data structures to generate reliable weights. Although simple, this approach entirely neglects any local data structure that represents strong agreement or consensus. To address this issue, the cluster-based OWA (Clus-DOWA) operator has been proposed. It employs a cluster-based reliability measure that is effective in differentiating the accountability of different input arguments. Yet its practical application is constrained by its high computational cost. This paper presents a more efficient nearest-neighbor-based reliability assessment that does not require an expensive clustering process. The proposed measure can be perceived as a stress function, from which the OWA weights and associated decision-support explanations can be generated. To illustrate the potential of this measure, it is applied both to the problem of information aggregation for alias detection and to the problem of unsupervised feature selection (in which unreliable features are excluded from the actual learning process). Experimental results demonstrate that these techniques usually outperform their conventional state-of-the-art counterparts.
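
The idea of letting local agreement among the inputs drive the aggregation weights can be illustrated with a small sketch. This is not the measure defined in the paper: the reliability formula (one minus the mean distance to the `k` nearest neighbors, scaled by the data range), the function names `nn_reliability` and `reliability_weighted_aggregate`, and the default `k=2` are all assumptions made here for illustration only.

```python
from typing import List, Tuple


def nn_reliability(values: List[float], k: int = 2) -> List[float]:
    """Assumed reliability score: high when a value's k nearest neighbours are close."""
    n = len(values)
    if n < 2:
        return [1.0] * n
    k = max(1, min(k, n - 1))
    spread = max(values) - min(values)
    if spread == 0:
        # All arguments agree, so treat them as equally reliable.
        return [1.0] * n
    scores = []
    for i, v in enumerate(values):
        # Distances from this argument to every other argument, smallest first.
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        mean_nn = sum(dists[:k]) / k
        scores.append(1.0 - mean_nn / spread)
    return scores


def reliability_weighted_aggregate(
    values: List[float], k: int = 2
) -> Tuple[float, List[float]]:
    """Normalise the scores into weights and return (aggregate, weights)."""
    scores = nn_reliability(values, k)
    total = sum(scores)
    if total == 0:
        # Degenerate case (e.g. two distinct values): fall back to a plain mean.
        weights = [1.0 / len(values)] * len(values)
    else:
        weights = [s / total for s in scores]
    aggregate = sum(w * v for w, v in zip(weights, values))
    return aggregate, weights


if __name__ == "__main__":
    # Three values in close agreement and one outlier (made-up inputs).
    result, weights = reliability_weighted_aggregate([0.70, 0.72, 0.71, 0.10])
    print(result, weights)
```

With these made-up inputs the outlier receives the smallest weight, so the aggregate stays close to the three values that agree instead of being pulled toward the plain mean. Only pairwise distances are needed, which is the sense in which a neighbor-based score avoids a separate clustering step.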


Published In

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Volume 40, Issue 6
December 2010
224 pages

Publisher

IEEE Press

Publication History

Published: 01 December 2010
Accepted: 18 January 2010
Revised: 19 November 2009
Received: 24 July 2009

Author Tags

  1. alias detection
  2. data reliability
  3. nearest neighbor
  4. ordered weighted averaging (OWA) aggregation
  5. unsupervised feature selection
  6. weight determination

Qualifiers

  • Research-article


Cited By

  • (2019) Exploiting Reliability-Guided Aggregation for the Assessment of Curvilinear Structure Tortuosity. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, pp. 12-20. DOI: 10.1007/978-3-030-32251-9_2. Online publication date: 13-Oct-2019
  • (2018) Multi-functional nearest-neighbour classification. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 22:8, pp. 2717-2730. DOI: 10.1007/s00500-017-2528-4. Online publication date: 1-Apr-2018
  • (2018) Induced cluster-based OWA operators with reliability measures and the application in group decision-making. International Journal of Intelligent Systems, 34:4, pp. 527-540. DOI: 10.1002/int.22063. Online publication date: 24-Oct-2018
  • (2017) Generating descriptive model for student dropout. Human-centric Computing and Information Sciences, 7:1, pp. 1-24. DOI: 10.1186/s13673-016-0083-0. Online publication date: 1-Dec-2017
  • (2017) Identification of Anticancer Peptides Using Optimal Feature Space of Chou's Split Amino Acid Composition and Support Vector Machine. Proceedings of the 2017 4th International Conference on Biomedical and Bioinformatics Engineering, pp. 91-96. DOI: 10.1145/3168776.3168787. Online publication date: 12-Nov-2017
  • (2017) Exploiting Data Reliability and Fuzzy Clustering for Journal Ranking. IEEE Transactions on Fuzzy Systems, 25:5, pp. 1306-1319. DOI: 10.1109/TFUZZ.2016.2612265. Online publication date: 3-Oct-2017
  • (2017) Reliability-guided fuzzy classifier ensemble. 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1-6. DOI: 10.1109/FUZZ-IEEE.2017.8015407. Online publication date: 9-Jul-2017
  • (2016) A cluster-directed framework for neighbour based imputation of missing value in microarray data. International Journal of Data Mining and Bioinformatics, 15:2, pp. 165-193. DOI: 10.1504/IJDMB.2016.076535. Online publication date: 1-May-2016
  • (2016) Rough-fuzzy rule interpolation. Information Sciences: an International Journal, 351:C, pp. 1-17. DOI: 10.1016/j.ins.2016.02.036. Online publication date: 10-Jul-2016
  • (2016) Majority Clusters-Density Ordered Weighting Averaging. International Journal of Intelligent Systems, 31:12, pp. 1166-1180. DOI: 10.1002/int.21821. Online publication date: 1-Dec-2016
