Abstract
Feature Subset Selection is a well-known task in the Machine Learning, Data Mining, Pattern Recognition, and Text Learning paradigms. In this chapter, we present a set of techniques inspired by Estimation of Distribution Algorithms (EDAs) to tackle the Feature Subset Selection problem in Machine Learning and Data Mining tasks. Bayesian networks are used to factorize the probability distribution of the best solutions in datasets of small and medium dimensionality, while simpler probabilistic models are used in higher-dimensional domains. In a comparison with different sequential and genetic-inspired algorithms on natural and artificial datasets, the EDA-based approaches obtain encouraging accuracy results and require fewer evaluations than the genetic approaches.
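As a rough illustration of the idea the abstract describes, the sketch below implements the simplest kind of EDA for feature subset selection: a univariate model (UMDA-style) that re-estimates independent per-feature marginal probabilities from the best solutions of each generation. This is an assumption-laden toy, not the chapter's actual algorithm — the function and fitness names are invented for the example, and the chapter's Bayesian-network variants additionally model dependencies between features, which this univariate sketch does not.

```python
import random

def umda_feature_selection(n_features, fitness, pop_size=40, n_select=20,
                           n_generations=30, seed=0):
    """Univariate EDA sketch for feature subset selection.

    Each individual is a 0/1 mask over the features. Every generation,
    the marginal probability of each bit is re-estimated from the best
    individuals and used to sample the next population. (Illustrative
    only; the chapter's Bayesian-network EDAs also capture dependencies
    between features.)
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start from uniform marginals
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_generations):
        # Sample a population of feature masks from the current marginals.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        selected = population[:n_select]
        # Re-estimate each marginal from the selected (best) solutions.
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_features)]
        if fitness(selected[0]) > best_fit:
            best_mask, best_fit = selected[0][:], fitness(selected[0])
    return best_mask, best_fit

# Hypothetical fitness standing in for a wrapper classifier's accuracy:
# reward selecting the first three features, penalize the rest.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.5 * sum(mask[3:])

best, score = umda_feature_selection(10, toy_fitness)
```

In a real wrapper setting, `toy_fitness` would be replaced by the estimated accuracy of a classifier trained on the selected feature subset, which is where the evaluation count compared against genetic approaches comes from.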
© 2002 Springer Science+Business Media New York
Cite this chapter
Inza, I., Larrañaga, P., Sierra, B. (2002). Feature Subset Selection by Estimation of Distribution Algorithms. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_13
Print ISBN: 978-1-4613-5604-2
Online ISBN: 978-1-4615-1539-5