Two Novel Techniques to Improve MDL-Based Semi-Supervised Classification of Time Series

  • Chapter
Transactions on Computational Collective Intelligence XXV

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9990)

Abstract

The semi-supervised classification problem arises when only a small number of labeled instances are available in the training set. One way to classify new time series in this situation is first to use self-training to label the unlabeled instances in the training set, and then to use the resulting training set to classify the new time series. In this paper, we propose two novel improvements for Minimum Description Length-based semi-supervised classification of time series: an improved Minimum Description Length-based stopping criterion and a refinement step that makes the classifier more accurate. Our first improvement applies a non-linear alignment between two time series when computing the Reduced Description Length of one time series using the information from the other. The second improvement is a post-processing step that aims to identify the class boundary between positive and negative instances accurately. For this second improvement, we propose an algorithm called Refinement that attempts to identify the instances wrongly classified in the self-training step and then reclassifies them. We compare our method with several previous methods. Experimental results show that our two improvements yield more accurate semi-supervised time series classifiers.
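To make the self-training step and the non-linear alignment mentioned above concrete, the following is a minimal sketch under stated assumptions, not the method of this chapter. It assumes dynamic time warping as the non-linear alignment and grows the positive set one nearest unlabeled series at a time. The names `dtw_distance`, `self_train`, `window` and `n_iterations` are hypothetical, and the fixed iteration count merely stands in for a real stopping criterion such as the MDL-based one.

```python
import math


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance between two numeric sequences.

    window is an optional Sakoe-Chiba band half-width (None = unconstrained).
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final corner cell.
    window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def self_train(positives, unlabeled, n_iterations, window=None):
    """Move the unlabeled series closest to the positive set into it, one per round.

    n_iterations is a hypothetical stand-in for a stopping criterion.
    Returns (positive_set, remaining_unlabeled).
    """
    positives = list(positives)
    unlabeled = list(unlabeled)
    for _ in range(min(n_iterations, len(unlabeled))):
        # Index of the unlabeled series nearest to any current positive.
        best_idx = min(
            range(len(unlabeled)),
            key=lambda k: min(dtw_distance(unlabeled[k], p, window) for p in positives),
        )
        positives.append(unlabeled.pop(best_idx))
    return positives, unlabeled
```

In this sketch, the series left in the unlabeled list after the last round would be treated as negative; choosing when to stop is exactly the decision that a stopping criterion has to make.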

References

  1. Begum, N., Hu, B., Rakthanmanon, T., Keogh, E.J.: Towards a minimum description length based stopping criterion for semi-supervised time series classification. In: IEEE 14th International Conference on Information Reuse and Integration, IRI 2013, San Francisco, CA, USA, August 14–16, 2013, pp. 333–340 (2013)

  2. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: Knowledge Discovery in Databases: Papers from the 1994 AAAI Workshop, Seattle, Washington, July 1994. Technical report WS-94-03, pp. 359–370 (1994)

  3. Keogh, E.J., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)

  4. Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/

  5. Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29–July 2, 2000, pp. 727–734 (2000)

  6. Ratanamahatana, C.A., Wanichsan, D.: Stopping criterion selection for efficient semi-supervised time series classification. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 1–14 (2008)

  7. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978)

  8. Wei, L., Keogh, E.J.: Semi-supervised time series classification. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20–23, 2006, pp. 748–753 (2006)

  9. Begum, N.: Minimum description length based stopping criterion for semi-supervised time series classification (2013). www.cs.ucr.edu/~nbegu001/SSL_myMDL.htm

  10. Marussy, K., Buza, K.: SUCCESS: a new approach for semi-supervised classification of time-series. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2013, Part I. LNCS, vol. 7894, pp. 437–447. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38658-9_39

  11. Nguyen, M.N., Li, X.L., Ng, S.K.: Positive unlabeled learning for time series classification. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 2, pp. 1421–1426. AAAI Press (2011)

  12. Nguyen, M., Li, X.-L., Ng, S.-K.: Ensemble based positive unlabeled learning for time series classification. In: Lee, S., Peng, Z., Zhou, X., Moon, Y.-S., Unland, R., Yoo, J. (eds.) DASFAA 2012, Part I. LNCS, vol. 7238, pp. 243–257. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29038-1_19

  13. Batista, G.E.A.P.A., Keogh, E.J., Tataw, O.M., de Souza, V.M.A.: CID: an efficient complexity-invariant distance for time series. Data Min. Knowl. Discov. 28(3), 634–669 (2014)

  14. Vinh, V.T., Anh, D.T.: Compression rate distance measure for time series. In: 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Campus des Cordeliers, Paris, France, October 19–21, 2015, pp. 1–10 (2015)

  15. Vinh, V.T., Anh, D.T.: Constraint-based MDL principle for semi-supervised classification of time series. In: 2015 Seventh International Conference on Knowledge and Systems Engineering, KSE 2015, Ho Chi Minh City, Vietnam, October 8–10, 2015, pp. 43–48 (2015)

  16. Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.J.: Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2), 1542–1552 (2008)

  17. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

  18. Tanaka, Y., Iwamoto, K., Uehara, K.: Discovery of time-series motif from multidimensional data based on MDL principle. Mach. Learn. 58(2–3), 269–300 (2005)

  19. Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)

  20. Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  21. Shokoohi-Yekta, M., Chen, Y., Campana, B.J.L., Hu, B., Zakaria, J., Keogh, E.J.: Discovery of meaningful rules in time series. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10–13, 2015, pp. 1085–1094 (2015)

Acknowledgment

We would like to thank Prof. Eamonn Keogh and Nurjahan Begum for kindly sharing the datasets that helped us construct the experiments in this work.

Author information

Corresponding author

Correspondence to Vo Thanh Vinh.

Appendix A: Some More Experimental Results

This section shows the experimental results of the X-means classifier, which was used to support the Refinement step described in Subsect. 4.4. Table 4 reports the precision, recall and F-measure of the X-means classifier. The experiments show that the X-means classifier gives good results on some datasets: Synthetic Control (F-measure = 100 %), Symbols (F-measure = 94.118 %), Gun Point (F-measure = 71.795 %) and Fish (F-measure = 71.795 %). In particular, the result on Synthetic Control is perfect, with every instance classified correctly.

Table 4. Semi-supervised classification of time series by X-means-Classifier
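For readers who want to recompute measures of the kind reported in Table 4, the sketch below shows how precision, recall and F-measure can be obtained from true and predicted labels. It assumes the balanced F-measure (the harmonic mean of precision and recall), which this excerpt does not state explicitly. The function name `precision_recall_f` and the example labels are illustrative only and are not taken from the experiments.

```python
def precision_recall_f(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, F-measure) for the given positive class.

    Assumes the balanced F-measure: 2 * P * R / (P + R).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Illustrative labels only (1 = positive, 0 = negative).
truth = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
p, r, f = precision_recall_f(truth, predicted)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```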

Table 5 shows the execution time (in seconds) of three algorithms: Refinement with the improved MDL-based stopping criterion, Refinement with Ratanamahatana and Wanichsan’s stopping criterion, and the X-means classifier. Note that these figures do not include the execution time of the Self-Learning process.

Table 5. The execution time (seconds) of each algorithm

Copyright information

© 2016 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Vinh, V.T., Anh, D.T. (2016). Two Novel Techniques to Improve MDL-Based Semi-Supervised Classification of Time Series. In: Nguyen, N., Kowalczyk, R., Orłowski, C., Ziółkowski, A. (eds) Transactions on Computational Collective Intelligence XXV. Lecture Notes in Computer Science, vol 9990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53580-6_8

  • DOI: https://doi.org/10.1007/978-3-662-53580-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-53579-0

  • Online ISBN: 978-3-662-53580-6

  • eBook Packages: Computer Science, Computer Science (R0)
