Feature Evaluation by Filter, Wrapper, and Embedded Approaches

Urszula Stańczyk

Part of the book series: Studies in Computational Intelligence (SCI, volume 584)

Abstract

The selection of variables that form a set of characteristic features relevant to classification can proceed in three ways: as a process external to the classification system employed in pattern recognition, as a process that depends on the performance of that system, or through an inherent mechanism built into the system. These three approaches correspond to the three categories of methodologies typically used in feature selection and reduction: filters, wrappers, and embedded solutions, respectively. They are applied when domain knowledge is unavailable or insufficient for an informed choice, or to support expert knowledge in order to achieve higher efficiency, enhanced classification, or reduced classifier size. The chapter illustrates combinations of the three approaches for feature evaluation in binary classification with balanced classes, for the task of authorship attribution, which belongs to the stylometric analysis of texts.
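The distinction between the three categories can be made concrete with a small sketch. The following is not the chapter's method (the chapter relies on DRSA, ANN, and Relief ranking, as noted in the acknowledgments); it is a minimal illustration on synthetic data, and every name in it (`make_data`, `filter_scores`, `wrapper_forward`, `embedded_weights`) is hypothetical. The filter scores features from the data alone, the wrapper searches feature subsets using the accuracy of a nearest-centroid classifier, and the embedded variant reads relevance from the weights of an L1-penalised logistic regression chosen here purely for brevity.

```python
import math
import random

def make_data(n=400, seed=0):
    """Synthetic balanced binary data: feature 0 strong, 1 weak, 2 and 3 noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2                          # alternating labels give balanced classes
        sign = 1.0 if label else -1.0
        X.append([rng.gauss(2.0 * sign, 1.0),  # strongly informative
                  rng.gauss(0.5 * sign, 1.0),  # weakly informative
                  rng.gauss(0.0, 1.0),         # noise
                  rng.gauss(0.0, 1.0)])        # noise
        y.append(label)
    return X, y

def class_means(X, y, cols):
    """Per-class mean of the selected feature columns."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return means

def filter_scores(X, y):
    """Filter: score each feature from the data alone, with no classifier involved."""
    m = class_means(X, y, list(range(len(X[0]))))
    return [abs(a - b) for a, b in zip(m[0], m[1])]

def centroid_accuracy(X_tr, y_tr, X_te, y_te, cols):
    """Accuracy of a nearest-centroid classifier restricted to the columns in cols."""
    m = class_means(X_tr, y_tr, cols)
    hits = 0
    for x, t in zip(X_te, y_te):
        dist = {lab: sum((x[c] - mu) ** 2 for c, mu in zip(cols, m[lab]))
                for lab in (0, 1)}
        hits += int(min(dist, key=dist.get) == t)
    return hits / len(y_te)

def wrapper_forward(X_tr, y_tr, X_val, y_val):
    """Wrapper: greedy forward selection driven by the classifier's validation accuracy."""
    chosen, best = [], 0.0
    remaining = list(range(len(X_tr[0])))
    while remaining:
        acc, f = max((centroid_accuracy(X_tr, y_tr, X_val, y_val, chosen + [f]), f)
                     for f in remaining)
        if acc <= best:                        # stop when no candidate improves accuracy
            break
        chosen.append(f)
        remaining.remove(f)
        best = acc
    return chosen, best

def embedded_weights(X, y, steps=200, lr=0.1, l1=0.05):
    """Embedded: L1-penalised logistic regression; |weight| serves as relevance."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        w = [math.copysign(max(abs(wj) - lr * l1, 0.0), wj) for wj in w]
    return [abs(wj) for wj in w]

if __name__ == "__main__":
    X, y = make_data()
    X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]
    print("filter scores:   ", filter_scores(X_tr, y_tr))
    print("wrapper subset:  ", wrapper_forward(X_tr, y_tr, X_val, y_val))
    print("embedded weights:", embedded_weights(X_tr, y_tr))
```

Reading weight magnitudes as relevance is only meaningful here because all synthetic features share the same scale; with real stylometric features some normalisation would presumably be needed first.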

References

1. Ahonen, H., Heinonen, O., Klemettinen, M., Verkamo, A.: Applying data mining techniques in text analysis. Technical Report C-1997-23. Department of Computer Science, University of Helsinki, Finland (1997)
2. Argamon, S., Karlgren, J., Shanahan, J.: Stylistic analysis of text for information access. In: Proceedings of the 28th International ACM Conference on Research and Development in Information Retrieval, Brazil (2005)
3. Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)
4. Baayen, H., van Halteren, H., Tweedie, F.: Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)
5. Bayardo Jr, R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)
6. Berber Sardinha, T.: Using key words in text analysis: practical aspects. Available on-line from ftp://ftp.liv.ac.uk/pub/linguistics (1999)
7. Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
8. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1, 131–156 (1997)
9. Dash, M., Liu, H.: Consistency-based search in feature selection. Artif. Intell. 151, 155–176 (2003)
10. Düntsch, I., Gediga, G.: Rough Set Data Analysis: A Road to Noninvasive Knowledge Discovery. Methodos Publishers, Bangor (2000)
11. Fiesler, E., Beale, R.: Handbook of Neural Computation. Oxford University Press, Oxford (1997)
12. Greco, S., Matarazzo, B., Słowiński, R.: Rough set theory for multicriteria decision analysis. Eur. J. Oper. Res. 129(1), 1–47 (2001)
13. Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach as a proper way of handling graduality in rough set theory. Trans. Rough Sets 7, 36–52 (2007)
14. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
16. Jelonek, J., Krawiec, K., Stefanowski, J.: Comparative study of feature subset selection techniques for machine learning tasks. In: Proceedings of the 7th Workshop on Intelligent Information Systems (1998)
17. Jensen, R., Shen, Q.: Computational Intelligence and Feature Selection. Wiley, Hoboken (2008)
18. John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Machine Learning: Proceedings of the 11th International Conference, pp. 121–129. Morgan Kaufmann Publishers (1994)
19. Kavzoglu, T., Mather, P.: Assessing artificial neural network pruning algorithms. In: Proceedings of the 24th Annual Conference and Exhibition of the Remote Sensing Society, pp. 603–609. Greenwich (2011)
20. Khmelev, D., Tweedie, F.: Using Markov chains for identification of writers. Lit. Linguist. Comput. 16(4), 299–307 (2001)
21. Kingston, G., Maier, H., Lambert, M.: A statistical input pruning method for artificial neural networks used in environmental modelling. In: Transactions of the 2nd Biennial Meeting of the International Environmental Modelling and Software Society, pp. 87–92. Osnabrueck, Germany (2004)
22. Kohavi, R., John, G.: Wrappers for feature subset selection. Artif. Intell. 97(1–2), 273–324 (1997)
23. Lal, T., Chapelle, O., Weston, J., Elisseeff, A.: Embedded methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L. (eds.) Feature Extraction: Foundations and Applications. Studies in Fuzziness and Soft Computing, vol. 207, pp. 137–165. Springer, Berlin (2006)
24. Lynam, T., Clarke, C., Cormack, G.: Information extraction with term frequencies. In: Proceedings of the Human Language Technology Conference, pp. 1–4. San Diego (2001)
25. Moshkov, M., Piliszczuk, M., Zielosko, B.: On partial covers, reducts and decision rules with weights. Trans. Rough Sets 6, 211–246 (2006)
26. Moshkov, M., Skowron, A., Suraj, Z.: On covering attribute sets by reducts. In: Kryszkiewicz, M., Peters, J., Rybinski, H., Skowron, A. (eds.) Rough Sets and Emerging Intelligent Systems Paradigms. LNCS (LNAI), vol. 4585, pp. 175–180. Springer, Berlin (2007)
27. Novaković, J., Strbac, P., Bulatović, D.: Toward optimal feature selection using ranking methods and classification algorithms. Yugosl. J. Oper. Res. 21(1), 119–135 (2011)
28. Pawlak, Z.: Computing, artificial intelligence and information technology: rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 136, 181–189 (2002)
29. Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)
30. Peng, R.: Statistical aspects of literary style. Bachelor’s Thesis, Yale University (1999)
31. Peng, R., Hengartner, N.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)
32. Sikora, M.: Rule quality measures in creation and reduction of data rule models. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Słowiński, R. (eds.) Rough Sets and Current Trends in Computing. LNCS, vol. 4259, pp. 716–725. Springer, Berlin (2006)
33. Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. LNCS (LNAI) 4585, 5–11 (2007)
34. Stańczyk, U.: Dominance-based rough set approach employed in search of authorial invariants. In: Kurzyński, M., Woźniak, M. (eds.) Computer Recognition Systems 3. AISC, vol. 57, pp. 315–323. Springer, Berlin (2009)
35. Stańczyk, U.: DRSA decision algorithm analysis in stylometric processing of literary texts. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) Rough Sets and Current Trends in Computing. LNCS (LNAI), vol. 6086, pp. 600–609. Springer, Berlin (2010)
36. Stańczyk, U.: Rough set-based analysis of characteristic features for ANN classifier. In: Grana Romay, M., Corchado, E., Garcia-Sebastian, M. (eds.) Hybrid Artificial Intelligence Systems Part 1. LNCS (LNAI), vol. 6076, pp. 565–572. Springer, Berlin (2010)
37. Stańczyk, U.: On performance of DRSA-ANN classifier. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds.) Hybrid Artificial Intelligence Systems Part 2. LNCS (LNAI), vol. 6679, pp. 172–179. Springer, Berlin (2011)
38. Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems. LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)
39. Stańczyk, U.: On preference order of DRSA conditional attributes for computational stylistics. In: Decker, H., Lhotska, L., Link, S., Basl, J., Tjoa, A. (eds.) Database and Expert Systems Applications. LNCS, vol. 8056, pp. 26–33. Springer, Berlin (2013)
40. Stańczyk, U.: Relative reduct-based estimation of relevance for stylometric features. In: Catania, B., Guerrini, G., Pokorny, J. (eds.) Advances in Databases and Information Systems. LNCS, vol. 8133, pp. 135–147. Springer, Berlin (2013)
41. Stańczyk, U.: Rough set and artificial neural network approach to computational stylistics. In: Ramanna, S., Howlett, R., Jain, L. (eds.) Emerging Paradigms in Machine Learning. Smart Innovation, Systems and Technologies, vol. 13, pp. 441–470. Springer, Berlin (2013)
42. Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions. AISC, vol. 242, pp. 475–483. Springer, Berlin (2013)
43. Sun, Y., Wu, D.: A RELIEF based feature extraction algorithm. In: Proceedings of the SIAM International Conference on Data Mining, pp. 188–195 (2008)

Acknowledgments

All texts used in the experiments are available for on-line reading and download thanks to Project Gutenberg (http://www.gutenberg.org). The 4eMka software used in DRSA processing [13, 33] was developed at the Laboratory of Intelligent Decision Support Systems (http://www-idss.cs.put.poznan.pl/), Poznan University of Technology, Poland. ANN simulation was performed with the California Scientific Brainmaker software package. Features were ranked with the Relief algorithm using the WEKA software [15].

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Urszula Stańczyk

Corresponding author

Correspondence to Urszula Stańczyk.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Urszula Stańczyk
Mawson Lakes Campus, Faculty of Education, Science, Technology and Mathematics, University of Canberra, Canberra, Australia, and University of South Australia, Adelaide, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Stańczyk, U. (2015). Feature Evaluation by Filter, Wrapper, and Embedded Approaches. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_3

DOI: https://doi.org/10.1007/978-3-662-45620-0_3
Published: 31 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0
eBook Packages: Engineering, Engineering (R0)
