Abstract
We present a methodology for identifying the most relevant features in a large feature space. In a high-dimensional feature space, nearest neighbor search loses its meaning: the distances from a query point to its nearest and farthest neighbors tend to become nearly indistinguishable, which degrades both the quality and the performance of the search. Because many data mining algorithms rely on nearest neighbor search, it is preferable to select the relevant features rather than search over all of them. We propose a feature selection method based on Non-negative Matrix Factorization (NMF) and apply it to nearest neighbor search.
A recent clustering algorithm based on Locally Consistent Concept Factorization (LCCF) improves the quality of document clustering by exploiting the local geometric and discriminating structure of the data. We show that our feature selection method further improves clustering performance.
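The abstract does not spell out how NMF is turned into a feature ranking, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a non-negative data matrix X (samples by features), factorizes it as X ≈ WH with the standard multiplicative updates for the Frobenius objective, scores each feature by its largest scaled loading in H, and then runs a brute-force nearest neighbor query on the selected features only. The function names, the scoring rule, and the synthetic data are all hypothetical.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative X (n_samples x n_features) as W @ H
    using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, d)) + eps
    for _ in range(n_iter):
        # The updates only multiply by non-negative ratios,
        # so W and H stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def select_features(X, rank, k):
    """Assumed scoring rule (not from the paper): score a feature by its
    largest loading over the basis rows of H, after scaling each row by
    the norm of the matching column of W to remove the scale ambiguity."""
    W, H = nmf(X, rank)
    scaled = H * np.linalg.norm(W, axis=0)[:, None]
    scores = scaled.max(axis=0)
    return np.sort(np.argsort(scores)[-k:])

def nearest_neighbor(X, query_idx, features):
    """Brute-force Euclidean nearest neighbor restricted to `features`."""
    Z = X[:, features]
    dists = np.linalg.norm(Z - Z[query_idx], axis=1)
    dists[query_idx] = np.inf  # exclude the query itself
    return int(np.argmin(dists))

# Synthetic, illustrative data: 30 samples, 12 non-negative features.
rng = np.random.default_rng(42)
X = rng.random((30, 12))
X[:15, :3] += 2.0    # first group is strong on features 0-2
X[15:, 3:6] += 2.0   # second group is strong on features 3-5

selected = select_features(X, rank=2, k=6)
neighbor = nearest_neighbor(X, query_idx=0, features=selected)
print("selected features:", selected)
print("nearest neighbor of sample 0:", neighbor)
```

Other scoring rules (for example, summing loadings across basis rows) are equally defensible under this reading; the choice of the maximum here is an assumption made for brevity.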
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adhikary, J.R., Narasimha Murty, M. (2012). Feature Selection for Unsupervised Learning. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34487-9_47
DOI: https://doi.org/10.1007/978-3-642-34487-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34486-2
Online ISBN: 978-3-642-34487-9
eBook Packages: Computer Science, Computer Science (R0)