Abstract
Feature Subset Selection is a well-known task in the Machine Learning, Data Mining, Pattern Recognition, and Text Learning paradigms. In this chapter, we present a set of techniques inspired by Estimation of Distribution Algorithms (EDAs) to tackle the Feature Subset Selection problem in Machine Learning and Data Mining tasks. Bayesian networks are used to factorize the probability distribution of the best solutions in datasets of small and medium dimensionality, while simpler probabilistic models are used in higher-dimensional domains. In a comparison with different sequential and genetic-inspired algorithms on natural and artificial datasets, the EDA-based approaches obtain encouraging accuracy results and require fewer evaluations than the genetic approaches.
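As a rough illustration of the idea the abstract describes, the sketch below implements the simplest kind of EDA for feature subset selection: a univariate model (UMDA-style) that re-estimates independent per-feature marginal probabilities from the best solutions of each generation. This is an assumption-laden toy, not the chapter's actual algorithm — the function and fitness names are invented for the example, and the chapter's Bayesian-network variants additionally model dependencies between features, which this univariate sketch does not.

```python
import random

def umda_feature_selection(n_features, fitness, pop_size=40, n_select=20,
                           n_generations=30, seed=0):
    """Univariate EDA sketch for feature subset selection.

    Each individual is a 0/1 mask over the features. Every generation,
    the marginal probability of each bit is re-estimated from the best
    individuals and used to sample the next population. (Illustrative
    only; the chapter's Bayesian-network EDAs also capture dependencies
    between features.)
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start from uniform marginals
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_generations):
        # Sample a population of feature masks from the current marginals.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        selected = population[:n_select]
        # Re-estimate each marginal from the selected (best) solutions.
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_features)]
        if fitness(selected[0]) > best_fit:
            best_mask, best_fit = selected[0][:], fitness(selected[0])
    return best_mask, best_fit

# Hypothetical fitness standing in for a wrapper classifier's accuracy:
# reward selecting the first three features, penalize the rest.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.5 * sum(mask[3:])

best, score = umda_feature_selection(10, toy_fitness)
```

In a real wrapper setting, `toy_fitness` would be replaced by the estimated accuracy of a classifier trained on the selected feature subset, which is where the evaluation count compared against genetic approaches comes from.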
© 2002 Springer Science+Business Media New York
Cite this chapter
Inza, I., Larrañaga, P., Sierra, B. (2002). Feature Subset Selection by Estimation of Distribution Algorithms. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_13
Print ISBN: 978-1-4613-5604-2
Online ISBN: 978-1-4615-1539-5