Active Feature Selection Using Classes

Huan Liu, Lei Yu, Manoranjan Dash & Hiroshi Motoda

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2637)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Feature selection is frequently used in data pre-processing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling to feature selection in a filter model setting. Our objective is to partition data by taking advantage of class information, so that feature selection achieves the same or better performance with fewer but more relevant instances than random sampling. Two versions of active feature selection that employ class information are proposed and empirically evaluated. We conduct extensive experiments with benchmark data sets, compare the results with random sampling, and analyze why class-based active feature selection works the way it does. The results will help us deal with large data sets and provide ideas for scaling up other feature selection algorithms.
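
The abstract does not spell out how the data are partitioned or sampled, so the following is only a minimal sketch of the general idea of using class information to guide sampling. It assumes the simplest possible scheme: group instance indices by class label and draw the same fraction from each group, with plain random sampling as the baseline. The function names `random_sample` and `class_partitioned_sample` are hypothetical, and this is not a reconstruction of either of the two versions proposed in the paper.

```python
import random
from collections import defaultdict


def random_sample(labels, fraction, rng):
    """Baseline: draw instance indices uniformly at random, ignoring classes."""
    n = max(1, round(fraction * len(labels)))
    return sorted(rng.sample(range(len(labels)), n))


def class_partitioned_sample(labels, fraction, rng):
    """Assumed scheme: partition indices by class, then sample each partition."""
    partitions = defaultdict(list)
    for idx, label in enumerate(labels):
        partitions[label].append(idx)

    chosen = []
    for members in partitions.values():
        # Keep at least one instance per class so no class disappears.
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy data: an imbalanced two-class label vector (illustrative only).
labels = ["a"] * 90 + ["b"] * 10
rng = random.Random(0)

baseline = random_sample(labels, 0.1, rng)
picked = class_partitioned_sample(labels, 0.1, rng)
minority_in_picked = sum(1 for i in picked if labels[i] == "b")
```

Under this assumed scheme every class is guaranteed to appear in the sample, whereas the random baseline can miss a small class entirely. The selected indices would then be handed to whatever filter-model feature selector is in use.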



Author information

Authors and Affiliations

Department of Computer Science & Engineering, Arizona State University, Tempe, AZ, 85287-5406
Huan Liu & Lei Yu
Department of Elec. & Computer Engineering, Northwestern University, Evanston, IL, 60201-3118
Manoranjan Dash
Institute of Scientific & Industrial Research, Osaka University, Ibaraki, Osaka, 567-0047, Japan
Hiroshi Motoda


Editor information

Editors and Affiliations

Computer Science Department, Korea Advanced Institute of Science and Technology, 373-1 Koo-Sung Dong, Yoo-Sung Ku, Daejeon, 305-701, Korea
Kyu-Young Whang
Department of Statistics, Seoul National University, Sillimdong Kwanakgu, Seoul, 151-742, Korea
Jongwoo Jeon
School of Electrical Engineering and Computer Science, Seoul National University, Kwanak P.O. Box 34, Seoul, 151-742, Korea
Kyuseok Shim
Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, Minneapolis, MN, 55455, USA
Jaideep Srivastava

Cite this paper

Liu, H., Yu, L., Dash, M., Motoda, H. (2003). Active Feature Selection Using Classes. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_48

DOI: https://doi.org/10.1007/3-540-36175-8_48
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive
