Weighting of Features by Sequential Selection

Urszula Stańczyk

Part of the book series: Studies in Computational Intelligence (SCI, volume 584)

Abstract

Constructing a set of characteristic features for supervised classification can be regarded as a preliminary task, merely a step on the way to the intended purpose. Yet because of its significance and bearing on the outcome, its level of difficulty, and the computational costs involved, the problem has evolved over time into a field of intense study in its own right. We can use statistics, available expert domain knowledge, or specialised procedures; we can analyse the set of all accessible features and reduce it backward, or examine the features one by one and select them forward. When a wrapper model is exploited, the process of sequential selection is conditioned by the performance of a classification system, and the observations made with respect to the selected variables can result in the assignment of weights and a ranking. The chapter illustrates weighting of features by the procedures of sequential backward and forward selection for rule and connectionist classifiers employed in the stylometric task of authorship attribution.
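
The idea of wrapper-driven sequential selection that yields a ranking can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the chapter: a nearest-centroid classifier stands in for the rule and connectionist classifiers, the toy data are invented, and the rank-based weights (and all function names) are hypothetical choices made only for this example.

```python
from statistics import mean


def wrapper_score(train, valid, features):
    """Validation accuracy of a nearest-centroid classifier (a stand-in
    classifier, assumed for illustration) restricted to `features`."""
    labels = sorted({label for _, label in train})
    centroids = {
        label: [mean(x[f] for x, y in train if y == label) for f in features]
        for label in labels
    }

    def predict(x):
        # Squared Euclidean distance to each centroid; ties go to the first label.
        return min(labels, key=lambda label: sum(
            (x[f] - c) ** 2 for f, c in zip(features, centroids[label])))

    return sum(predict(x) == y for x, y in valid) / len(valid)


def forward_ranking(train, valid, n_features):
    """Sequential forward selection: repeatedly add the feature that gives
    the best wrapper score; the order of addition is the ranking."""
    selected, remaining = [], list(range(n_features))
    while remaining:
        best = max(remaining,
                   key=lambda f: wrapper_score(train, valid, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_ranking(train, valid, n_features):
    """Sequential backward elimination: repeatedly drop the feature whose
    removal leaves the best wrapper score; features dropped last rank first."""
    remaining, eliminated = list(range(n_features)), []
    while len(remaining) > 1:
        drop = max(remaining, key=lambda f: wrapper_score(
            train, valid, [g for g in remaining if g != f]))
        remaining.remove(drop)
        eliminated.append(drop)
    eliminated.append(remaining[0])
    return eliminated[::-1]


def rank_weights(order):
    """Hypothetical weighting: linearly decreasing weight by rank position."""
    n = len(order)
    return {f: (n - pos) / n for pos, f in enumerate(order)}


# Invented toy data: column 0 is uninformative, column 1 is the strongest
# single feature, column 2 helps only in combination with column 1.
TRAIN = [([3, 0, 1], "A"), ([7, 2, 4], "A"),
         ([3, 10, 6], "B"), ([7, 8, 9], "B")]
VALID = [([9, 1, 2], "A"), ([1, 6, 1], "A"),
         ([2, 9, 4], "B"), ([8, 10, 4], "B")]

forward = forward_ranking(TRAIN, VALID, 3)
backward = backward_ranking(TRAIN, VALID, 3)
print("forward ranking: ", forward, rank_weights(forward))
print("backward ranking:", backward, rank_weights(backward))
```

On this toy data both procedures order the columns as 1, 2, 0, so the uninformative column receives the lowest weight. In general the forward and backward rankings need not agree, and the result depends on the wrapped classifier and on how ties between candidate features are broken (here, the lowest-indexed candidate wins).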

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Urszula Stańczyk

Corresponding author

Correspondence to Urszula Stańczyk.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Urszula Stańczyk
Mawson Lakes Campus, Faculty of Education, Science, Technology and Mathematics, University of Canberra, Canberra, Australia, and University of South Australia, Adelaide, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Stańczyk, U. (2015). Weighting of Features by Sequential Selection. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_5

DOI: https://doi.org/10.1007/978-3-662-45620-0_5
Published: 31 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0
eBook Packages: Engineering, Engineering (R0)
