Applying Time Series for Background User Identification Based on Their Text Data Analysis

V. Yu. Korolev¹, A. Yu. Korchagin¹, I. V. Mashechkin¹, M. I. Petrovskii¹ & D. V. Tsarev¹


Abstract

We present an approach to user identification based on deviations in the topic trends of the user's work with text information. The approach performs topic analysis of the user's past behavior when working with text content of various categories (including confidential content) and forecasts the user's future behavior. Topic analysis of the user's activity consists of determining the principal topics of the user's text content and computing the weights of these topics at given time instants. Deviations of the user's actual behavior with the content from the forecast are used to identify the user. Within this approach, we propose an original time series forecasting method based on orthogonal non-negative matrix factorization (ONMF); ONMF has not previously been used to solve time series forecasting problems. Experiments on real-world corporate email drawn from the Enron data set showed the proposed user identification approach to be applicable.
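The abstract outlines a three-step pipeline: per-period topic weights, a forecast of the next period's weights, and a deviation between forecast and observation. The sketch below illustrates that pipeline under loudly stated assumptions. It is not the authors' ONMF-based forecasting method: it swaps in plain (non-orthogonal) NMF with the Lee–Seung multiplicative updates, a hypothetical moving-average forecast, and Euclidean distance as the deviation measure. The function names (`nmf`, `project`, `forecast_next`, `deviation_score`) and the synthetic term-by-period matrix are illustrative only.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def nmf(V, k, iters=500, seed=0):
    """Plain NMF (Lee-Seung multiplicative updates): V (terms x periods) ~ W @ H.

    W holds k topics as term weights; H holds the topic weights of each period.
    ASSUMPTION: this is NOT orthogonal NMF and NOT the paper's forecasting method.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + EPS
    H = rng.random((k, n)) + EPS
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project(W, v, iters=300):
    """Topic weights of a new term vector v, with the topics W held fixed."""
    h = np.ones(W.shape[1])
    for _ in range(iters):
        h *= (W.T @ v) / (W.T @ W @ h + EPS)
    return h


def forecast_next(H, window=3):
    """Hypothetical forecast: mean topic weights over the last `window` periods."""
    return H[:, -window:].mean(axis=1)


def deviation_score(h_observed, h_forecast):
    """Hypothetical deviation measure: Euclidean distance between weight vectors."""
    return float(np.linalg.norm(h_observed - h_forecast))


# Synthetic user history: 6 terms, 2 topics with disjoint vocabularies, 10 periods.
W_true = np.array([[1, 0], [2, 0], [1, 0], [0, 1], [0, 2], [0, 1]], dtype=float)
H_true = np.vstack([np.tile([1.0, 2.0], 5), np.tile([2.0, 1.0], 5)])
V = W_true @ H_true

W, H = nmf(V, k=2)
h_forecast = forecast_next(H)

# A next period consistent with the history vs. one dominated by a single topic.
v_normal = V[:, -3:].mean(axis=1)
v_anomalous = W_true @ np.array([0.0, 5.0])

normal_score = deviation_score(project(W, v_normal), h_forecast)
anomalous_score = deviation_score(project(W, v_anomalous), h_forecast)
print(f"normal: {normal_score:.3f}, anomalous: {anomalous_score:.3f}")
```

The period whose topic mix departs from the user's history receives the larger score. How such a score becomes an identification decision (for example, a threshold) is left open in this sketch.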


Similar content being viewed by others

Using Time Series Analysis for Estimating the Time Stamp of a Text

Time-series topic analysis using singular spectrum transformation for detecting political business cycles (Article, open access, 06 March 2021)

Using LDA and Time Series Analysis for Timestamping Documents



Author information

Authors and Affiliations

Faculty of Computational Mathematics and Cybernetics, Lomonosov State University, 119991, Moscow, Russia
V. Yu. Korolev, A. Yu. Korchagin, I. V. Mashechkin, M. I. Petrovskii & D. V. Tsarev


Corresponding authors

Correspondence to V. Yu. Korolev, A. Yu. Korchagin, I. V. Mashechkin, M. I. Petrovskii or D. V. Tsarev.

Additional information

Translated by M. Talacheva


About this article

Cite this article

Korolev, V.Y., Korchagin, A.Y., Mashechkin, I.V. et al. Applying Time Series for Background User Identification Based on Their Text Data Analysis. Program Comput Soft 44, 353–362 (2018). https://doi.org/10.1134/S0361768818050055


Received: 08 August 2017
Published: 21 September 2018
Issue Date: September 2018
DOI: https://doi.org/10.1134/S0361768818050055
