Shell-neighbor method and its application in missing data imputation

Author:

Shichao Zhang

Applied Intelligence, Volume 35, Issue 1

Pages 123 - 133

https://doi.org/10.1007/s10489-009-0207-6

Published: 01 August 2011

Abstract

Data preparation is an important step in mining incomplete data. To deal with missing values, this paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI. SNI fills in an incomplete instance (one with missing values) in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), which are referred to as its shell neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance, and the size of this set is determined by cross-validation. SNI is then generalized to handle missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Experiments demonstrate that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy.
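
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the distance measure, the tie-breaking rule, or the way shell-neighbor values are combined, so the sketch assumes Euclidean distance over the observed attributes, continuous attributes only, and a plain mean of the shell neighbors' values. The function names `shell_neighbors` and `sni_impute` are hypothetical, and `k` is passed in directly rather than chosen by cross-validation.

```python
import math


def shell_neighbors(target, complete_rows, observed, k):
    """Return sorted indices (into complete_rows) of the shell neighbors of target.

    Assumes k >= 1, at least one complete row, and at least one observed attribute.
    """
    # Step 1: the k nearest complete rows, measured on the observed attributes.
    # Euclidean distance is an assumption of this sketch.
    def dist(row):
        return math.sqrt(sum((row[a] - target[a]) ** 2 for a in observed))

    order = sorted(range(len(complete_rows)), key=lambda i: dist(complete_rows[i]))
    nearest = order[:k]

    # Step 2: for each observed attribute, keep the closest neighbor on the
    # left (value <= target) and the closest on the right (value >= target).
    shell = set()
    for a in observed:
        left = [i for i in nearest if complete_rows[i][a] <= target[a]]
        right = [i for i in nearest if complete_rows[i][a] >= target[a]]
        if left:
            shell.add(max(left, key=lambda i: complete_rows[i][a]))
        if right:
            shell.add(min(right, key=lambda i: complete_rows[i][a]))
    return sorted(shell)


def sni_impute(target, complete_rows, missing, k):
    """Estimate target[missing] as the mean over its shell neighbors (assumed rule)."""
    observed = [a for a in range(len(target)) if a != missing]
    shell = shell_neighbors(target, complete_rows, observed, k)
    return sum(complete_rows[i][missing] for i in shell) / len(shell)


# Toy data: two observed attributes and one attribute to impute (index 2).
rows = [
    [1.0, 1.0, 10.0],
    [2.0, 3.0, 20.0],
    [4.0, 2.0, 40.0],
    [5.0, 5.0, 70.0],
    [9.0, 9.0, 90.0],
]
incomplete = [3.0, 2.5, None]
estimate = sni_impute(incomplete, rows, missing=2, k=4)
```

In this toy example only two of the four nearest rows bracket the incomplete instance on each attribute, so the estimate is built from those two rows rather than from all four, which is the difference from plain kNN imputation that the abstract emphasizes.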




Published In

Applied Intelligence Volume 35, Issue 1

August 2011

161 pages

ISSN:0924-669X

Copyright © 2011 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States
