
Shell-neighbor method and its application in missing data imputation

Published: 01 August 2011

Abstract

Data preparation is an important step in mining incomplete data. To handle missing values at this stage, this paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI. SNI fills in an incomplete instance (one with missing values) in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), referred to as its Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance, and the size of this set is determined by cross-validation. SNI is then generalized to handle missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Experiments evaluating the proposed approach demonstrate that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy.
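
The selection step described in the abstract can be illustrated with a minimal sketch for continuous attributes only. The abstract does not give the distance measure or the final estimator, so several choices here are assumptions rather than the authors' method: Euclidean distance over the observed attributes, ties on an attribute counted as "left", and the plain mean of the shell neighbors as the imputed value. The function name `shell_neighbor_impute` and its parameters are hypothetical, and `k` is passed in directly instead of being tuned by cross-validation.

```python
import math


def shell_neighbor_impute(data, target, k):
    """Fill the None entries of `target` from its shell neighbors.

    data   : list of complete rows (lists of floats)
    target : row of the same length, with None marking missing values
    k      : size of the nearest-neighbor set (the paper tunes this by
             cross-validation; here it is simply supplied by the caller)
    """
    observed = [j for j, v in enumerate(target) if v is not None]
    missing = [j for j, v in enumerate(target) if v is None]

    # Step 1: k nearest neighbors of the incomplete instance.
    # Assumption: Euclidean distance over the observed attributes.
    def dist(row):
        return math.sqrt(sum((row[j] - target[j]) ** 2 for j in observed))

    neighbors = sorted(data, key=dist)[:k]

    # Step 2: for each observed attribute, keep only the closest neighbor
    # on the left (value <= target) and on the right (value > target).
    shell = []
    for j in observed:
        left = [r for r in neighbors if r[j] <= target[j]]
        right = [r for r in neighbors if r[j] > target[j]]
        picks = []
        if left:
            picks.append(max(left, key=lambda r: r[j]))
        if right:
            picks.append(min(right, key=lambda r: r[j]))
        for r in picks:
            if not any(r is s for s in shell):  # avoid duplicate rows
                shell.append(r)

    # Fall back to the full neighbor set if no shell neighbor was found
    # (for example, when the target has no observed attribute).
    if not shell:
        shell = neighbors

    # Step 3: impute each missing attribute.
    # Assumption: unweighted mean over the shell neighbors.
    filled = list(target)
    for j in missing:
        filled[j] = sum(r[j] for r in shell) / len(shell)
    return filled


data = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [9.0, 90.0]]
print(shell_neighbor_impute(data, [3.0, None], k=3))  # [3.0, 30.0]
```

In this toy run the three nearest rows are `[2.0, 20.0]`, `[4.0, 40.0]` and `[1.0, 10.0]`, but only the first two bracket the target on the observed attribute, so the more distant row is excluded from the estimate. A plain kNN mean with the same `k` would include it.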

Published In

Applied Intelligence, Volume 35, Issue 1
August 2011
161 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Mining incomplete data
2. Missing data imputation
3. Shell-NN
4. kNN
