Abstract
Identifying anomalous documents in a text corpus is an important problem with wide applications. Because text data are high-dimensional and sparse, traditional outlier detection methods fail to identify the features that distinguish outliers. Inspired by the capability of Nonnegative Matrix Factorization (NMF) for text clustering, we explore its use for text outlier detection. In this paper, we introduce a novel NMF-based method, Nonnegative Orthogonal Constraint Outlier Learning (NOCOL), that effectively learns the outliers during the factorization process. Experimental results show that NOCOL identifies text outliers more accurately than state-of-the-art methods.
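The abstract does not state NOCOL's objective function or update rules, so the following is only a minimal sketch of the general idea it builds on, under stated assumptions: factorize a nonnegative term-document matrix X ≈ WH with NMF and flag documents that the low-rank factors reconstruct poorly. It uses plain Lee-Seung multiplicative updates with no orthogonality constraint and no outlier-specific term, so it is not NOCOL. The function names `nmf` and `outlier_scores`, the parameter defaults, and the toy corpus are all hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def nmf(x: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2 with W, H >= 0.

    X is terms x documents, W is terms x k (topics), H is k x documents.
    Hypothetical sketch: no orthogonality constraint is imposed (unlike NOCOL).
    """
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H); eps guards against division by zero
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def outlier_scores(x: Matrix, k: int, **kwargs) -> list[float]:
    """Score each document (column of X) by the L2 norm of its NMF residual."""
    w, h = nmf(x, k, **kwargs)
    approx = matmul(w, h)
    return [math.sqrt(sum((x[i][j] - approx[i][j]) ** 2 for i in range(len(x))))
            for j in range(len(x[0]))]


# Hypothetical toy corpus: six documents share one term profile,
# the seventh uses only terms 3 and 4.
profile = [1.0, 2.0, 1.0, 0.0, 0.0]
docs = [[p * s for p in profile] for s in (1.0, 2.0, 1.0, 3.0, 1.0, 2.0)]
docs.append([0.0, 0.0, 0.0, 3.0, 4.0])
x = transpose(docs)  # terms x documents
scores = outlier_scores(x, k=1)
print([round(s, 3) for s in scores])
```

With k = 1 the six documents that share a term profile are reconstructed almost exactly, so their scores are near 0, while the last document, built from unrelated terms, keeps a residual equal to its full norm (5). On a real corpus X would typically be TF-IDF weighted and k would match the expected number of topics; scoring residuals after the fact is a simpler stand-in for learning the outliers during factorization, as NOCOL is described as doing.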
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Balasubramaniam, T., Mohotti, W.A., Nayak, R., Yuen, C. (2021). NOCOL - Nonnegative Orthogonal Constraint Outlier Learning. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_27
DOI: https://doi.org/10.1007/978-3-030-91560-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91559-9
Online ISBN: 978-3-030-91560-5
eBook Packages: Computer Science, Computer Science (R0)