[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Process monitoring using maximum sequence divergence

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Process monitoring involves tracking a system’s behaviors, evaluating the current state of the system, and discovering interesting events that require immediate actions. In this paper, we consider monitoring temporal system state sequences to help detect the changes of dynamic systems, check the divergence of the system development, and evaluate the significance of the deviation. We begin with discussions of data reduction, symbolic data representation, and anomaly detection in temporal discrete sequences. Time-series representation methods are also discussed and used in this paper to discretize raw data into sequences of system states. Markov chains and stationary-state distributions are continuously generated from temporal sequences to represent snapshots of the system dynamics in different time frames. We use generalized Jensen–Shannon divergence as the measure to monitor changes of the stationary symbol probability distributions and evaluate the significance of system deviations. We prove that the proposed approach is able to detect deviations of the systems we monitor and assess the deviation significance in probabilistic manner.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20

Similar content being viewed by others

References

  1. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15:1–15:58

    Article  Google Scholar 

  2. Willsky A, Jones H (1976) A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Trans Automat Control 21(1):108–112

    Article  MathSciNet  MATH  Google Scholar 

  3. Basseville M, Nikiforov IV (1993) Detection of abrupt changes: theory and application, vol 104, Prentice Hall, Englewood Cliffs

  4. Gustafsson F (1996) The marginalized likelihood ratio test for detecting abrupt changes. IEEE Trans Automat Control 41(1):66–78

    Article  MathSciNet  MATH  Google Scholar 

  5. Yamanishi K, Takeuchi J (2002) A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 676–681

  6. Kawahara Y, Sugiyama M (2009) Change-point detection in time-series data by direct density-ratio estimation. In: Proceedings of 2009 SIAM International Conference on Data Mining (SDM2009), pp 389–400

  7. Montgomery DC (2008) Introduction to statistical quality control, 6th edn. Wiley, New York

    MATH  Google Scholar 

  8. Fodor IK (2002) A survey of dimension reduction techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Los Alamos

    Book  Google Scholar 

  9. Daw CS, Finney CEA, Tracy ER (2003) A review of symbolic analysis of experimental data. Review of Scientific Instruments 74(2):915–930

    Article  Google Scholar 

  10. Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: String processing and information retrieval, 2000. SPIRE 2000 Proceedings. Seventh international symposium on, pp 39 –48

  11. Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  12. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

    Article  MathSciNet  MATH  Google Scholar 

  13. Bhattacharyya A (1946) On a measure of divergence between two multinomial populations Sankhyā. Indian J Stat (1933–1960) 7(4):401–406

    MATH  Google Scholar 

  14. Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151

    Article  MathSciNet  MATH  Google Scholar 

  15. “Markov chain,” Wikipedia, the free encyclopedia, 09 Nov 2014. http://en.wikipedia.org/w/index.php?title=Markov_chain&oldid=632678042

  16. Langville AN, Meyer CD (2006) Google PageRank and Beyond. Princeton University Press, Princeton

    MATH  Google Scholar 

  17. Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley HE (2002) Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys Rev E 65(4):041905

    Article  MathSciNet  MATH  Google Scholar 

  18. Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371

    Article  MathSciNet  Google Scholar 

  19. Jolliffe IT (2002) Principal component analysis. Springer, New York

    MATH  Google Scholar 

  20. Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. SIGMOD Rec 23(2):419–429

    Article  Google Scholar 

  21. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286

    Article  MATH  Google Scholar 

  22. Keogh EJ, Pazzani MJ (2000) Scaling up dynamic time warping for datamining applications. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, Boston, pp 285–289

  23. Agrawal R, Psaila G, Wimmers EL, Zaït M (1995) Querying shapes of histories. In: Proceedings of the 21th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc, pp 502–514

  24. Keogh E, Lin J, Fu A (2005) Hot sax: efficiently finding the most unusual time series subsequence. In: Data mining, IEEE international conference on, Los Alamitos, pp 226–233

  25. Robin S, Rodolphe F, Schbath S (2005) DNA, words and models. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  26. Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery, p 11

  27. Shieh J, Keogh E (2008) iSAX: indexing and mining terabyte sized time series. In: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, pp 623–631

  28. Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144

    Article  MathSciNet  Google Scholar 

  29. Van Der Aalst WMP (2011) Process mining: discovery, conformance and enhancement of business processes. Springer, New York

    Book  MATH  Google Scholar 

  30. Koller D, Friedman N (2009) Probabilistic graphical models. MIT press, Cambridge

    MATH  Google Scholar 

  31. Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, New York

    Book  MATH  Google Scholar 

  32. Arvey AJ, Azad RK, Raval A, Lawrence JG (2009) Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res 37(16):5255–5266

    Article  Google Scholar 

  33. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117

    Article  Google Scholar 

  34. Majtey AP, Lamberti PW, Prato DP (2005) Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states. Phys Rev A 72(5):052310

    Article  Google Scholar 

  35. Lamberti PW, Majtey AP, Borras A, Casas M, Plastino A (2008) Metric character of the quantum Jensen-Shannon divergence. Phys Rev A 77(5):052311

    Article  Google Scholar 

  36. Patil GP (2002) Weighted distributions. John Wiley & Sons, Ltd

  37. Herzel H, Große I (1997) Correlations in DNA sequences: the role of protein coding segments. Phys Rev E 55(1):800–810

    Article  Google Scholar 

  38. R Core Team (2014) R: a Language and environment for statistical computing, R Foundation for Statistical Computing, Vienna. http://www.R-project.org/

  39. Y. Kang, Supplemental contents for process monitoring using maximum sequence divergence. http://ykang.info/ProcessMon/index.html

  40. Lake JA (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc Natl Acad Sci US Am 91(4):1455–1459

    Article  Google Scholar 

  41. Budalakoti S, Budalakoti S, Srivastava AN, Otey ME, Otey ME (2009) Anomaly detection and diagnosis algorithms for discrete symbol sequences with applications to airline safety. IEEE Trans Syst Man Cybern Part C 39(1):101–113

    Article  Google Scholar 

  42. Jiang T, Li M (1995) On the approximation of shortest common supersequences and longest common subsequences. SIAM J Compu 24(5):1122–1139

    Article  MathSciNet  MATH  Google Scholar 

  43. Srivastava A, Sahami M (2009) Text mining: classification, clustering, and applications, vol 10. Chapman & Hall/CRC, Boca Raton

    MATH  Google Scholar 

  44. Durbin R (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  45. “UCI Machine Learning Repository,” Ozone level detection data set. http://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection. Accessed 20 Dec 2013]

  46. Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326

    Article  Google Scholar 

  47. Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839

    Article  Google Scholar 

Download references

Acknowledgments

We would like to thank editor and anonymous reviewers for their time and feedback that help improve this paper. We would also like to thank Dr. Tatjana Welzer in the Institute of Informatics, University of Maribor, for her advice on data integration and analysis.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yihuang Kang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kang, Y., Zadorozhny, V. Process monitoring using maximum sequence divergence. Knowl Inf Syst 48, 81–109 (2016). https://doi.org/10.1007/s10115-015-0858-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-015-0858-z

Keywords