Process monitoring using maximum sequence divergence

Yihuang Kang¹ &
Vladimir Zadorozhny²

367 Accesses
4 Citations
2 Altmetric
Explore all metrics

Abstract

Process monitoring involves tracking a system’s behaviors, evaluating the current state of the system, and discovering interesting events that require immediate actions. In this paper, we consider monitoring temporal system state sequences to help detect the changes of dynamic systems, check the divergence of the system development, and evaluate the significance of the deviation. We begin with discussions of data reduction, symbolic data representation, and anomaly detection in temporal discrete sequences. Time-series representation methods are also discussed and used in this paper to discretize raw data into sequences of system states. Markov chains and stationary-state distributions are continuously generated from temporal sequences to represent snapshots of the system dynamics in different time frames. We use generalized Jensen–Shannon divergence as the measure to monitor changes of the stationary symbol probability distributions and evaluate the significance of system deviations. We prove that the proposed approach is able to detect deviations of the systems we monitor and assess the deviation significance in probabilistic manner.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Change Point Detection and Dealing with Gradual and Multi-order Dynamics in Process Mining

Clustering Traces Using Sequence Alignment

Simultaneous Process Drift Detection and Characterization with Pattern-Based Change Detectors

References

Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15:1–15:58
Article Google Scholar
Willsky A, Jones H (1976) A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Trans Automat Control 21(1):108–112
Article MathSciNet MATH Google Scholar
Basseville M, Nikiforov IV (1993) Detection of abrupt changes: theory and application, vol 104, Prentice Hall, Englewood Cliffs
Gustafsson F (1996) The marginalized likelihood ratio test for detecting abrupt changes. IEEE Trans Automat Control 41(1):66–78
Article MathSciNet MATH Google Scholar
Yamanishi K, Takeuchi J (2002) A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 676–681
Kawahara Y, Sugiyama M (2009) Change-point detection in time-series data by direct density-ratio estimation. In: Proceedings of 2009 SIAM International Conference on Data Mining (SDM2009), pp 389–400
Montgomery DC (2008) Introduction to statistical quality control, 6th edn. Wiley, New York
MATH Google Scholar
Fodor IK (2002) A survey of dimension reduction techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Los Alamos
Book Google Scholar
Daw CS, Finney CEA, Tracy ER (2003) A review of symbolic analysis of experimental data. Review of Scientific Instruments 74(2):915–930
Article Google Scholar
Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: String processing and information retrieval, 2000. SPIRE 2000 Proceedings. Seventh international symposium on, pp 39 –48
Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Book MATH Google Scholar
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Article MathSciNet MATH Google Scholar
Bhattacharyya A (1946) On a measure of divergence between two multinomial populations Sankhyā. Indian J Stat (1933–1960) 7(4):401–406
MATH Google Scholar
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151
Article MathSciNet MATH Google Scholar
“Markov chain,” Wikipedia, the free encyclopedia, 09 Nov 2014. http://en.wikipedia.org/w/index.php?title=Markov_chain&oldid=632678042
Langville AN, Meyer CD (2006) Google PageRank and Beyond. Princeton University Press, Princeton
MATH Google Scholar
Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley HE (2002) Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys Rev E 65(4):041905
Article MathSciNet MATH Google Scholar
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371
Article MathSciNet Google Scholar
Jolliffe IT (2002) Principal component analysis. Springer, New York
MATH Google Scholar
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. SIGMOD Rec 23(2):419–429
Article Google Scholar
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286
Article MATH Google Scholar
Keogh EJ, Pazzani MJ (2000) Scaling up dynamic time warping for datamining applications. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, Boston, pp 285–289
Agrawal R, Psaila G, Wimmers EL, Zaït M (1995) Querying shapes of histories. In: Proceedings of the 21th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc, pp 502–514
Keogh E, Lin J, Fu A (2005) Hot sax: efficiently finding the most unusual time series subsequence. In: Data mining, IEEE international conference on, Los Alamitos, pp 226–233
Robin S, Rodolphe F, Schbath S (2005) DNA, words and models. Cambridge University Press, Cambridge
MATH Google Scholar
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery, p 11
Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, pp 623–631
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144
Article MathSciNet Google Scholar
Van Der Aalst WMP (2011) Process mining: discovery, conformance and enhancement of business processes. Springer, New York
Book MATH Google Scholar
Koller D, Friedman N (2009) Probabilistic graphical models. MIT press, Cambridge
MATH Google Scholar
Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, New York
Book MATH Google Scholar
Arvey AJ, Azad RK, Raval A, Lawrence JG (2009) Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res 37(16):5255–5266
Article Google Scholar
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117
Article Google Scholar
Majtey AP, Lamberti PW, Prato DP (2005) Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states. Phys Rev A 72(5):052310
Article Google Scholar
Lamberti PW, Majtey AP, Borras A, Casas M, Plastino A (2008) Metric character of the quantum Jensen-Shannon divergence. Phys Rev A 77(5):052311
Article Google Scholar
Patil GP (2002) Weighted distributions. John Wiley & Sons, Ltd
Herzel H, Große I (1997) Correlations in DNA sequences: the role of protein coding segments. Phys Rev E 55(1):800–810
Article Google Scholar
R Core Team (2014) R: a Language and environment for statistical computing, R Foundation for Statistical Computing, Vienna. http://www.R-project.org/
Y. Kang, Supplemental contents for process monitoring using maximum sequence divergence. http://ykang.info/ProcessMon/index.html
Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci US Am 91(4):1455–1459
Article Google Scholar
Budalakoti S, Budalakoti S, Srivastava AN, Otey ME, Otey ME (2009) Anomaly detection and diagnosis algorithms for discrete symbol sequences with applications to airline safety. IEEE Trans Syst Man Cybern Part C 39(1):101–113
Article Google Scholar
Jiang T, Li M (1995) On the approximation of shortest common supersequences and longest common subsequences. SIAM J Compu 24(5):1122–1139
Article MathSciNet MATH Google Scholar
Srivastava A, Sahami M (2009) Text mining: classification, clustering, and applications, vol 10. Chapman & Hall/CRC, Boca Raton
MATH Google Scholar
Durbin R (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Book MATH Google Scholar
“UCI Machine Learning Repository,” Ozone level detection data set. http://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection. Accessed 20 Dec 2013]
Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326
Article Google Scholar
Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839
Article Google Scholar

Download references

Acknowledgments

We would like to thank editor and anonymous reviewers for their time and feedback that help improve this paper. We would also like to thank Dr. Tatjana Welzer in the Institute of Informatics, University of Maribor, for her advice on data integration and analysis.

Author information

Authors and Affiliations

Department of Information Management, National Sun Yat-sen University, Kaohsiung, 80424, Taiwan
Yihuang Kang
School of Information Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
Vladimir Zadorozhny

Authors

Yihuang Kang
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Zadorozhny
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yihuang Kang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kang, Y., Zadorozhny, V. Process monitoring using maximum sequence divergence. Knowl Inf Syst 48, 81–109 (2016). https://doi.org/10.1007/s10115-015-0858-z

Download citation

Received: 15 March 2014
Revised: 17 December 2014
Accepted: 06 July 2015
Published: 22 July 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10115-015-0858-z

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Change Point Detection and Dealing with Gradual and Multi-order Dynamics in Process Mining

Clustering Traces Using Sequence Alignment

Simultaneous Process Drift Detection and Characterization with Pattern-Based Change Detectors

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Process monitoring using maximum sequence divergence

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Change Point Detection and Dealing with Gradual and Multi-order Dynamics in Process Mining

Clustering Traces Using Sequence Alignment

Simultaneous Process Drift Detection and Characterization with Pattern-Based Change Detectors

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation