Abstract
The semi-supervised classification problem arises when only a small number of labeled instances are available in the training set. One way to classify new time series in this situation is to first apply self-training to label the unlabeled instances in the training set, and then use the resulting training set to classify the new time series. In this paper, we propose two novel improvements for Minimum Description Length-based semi-supervised classification of time series: a technique that improves the Minimum Description Length-based stopping criterion, and a refinement step that makes the classifier more accurate. The first improvement applies a non-linear alignment between two time series when computing the Reduced Description Length of one time series using information from the other. The second improvement is a post-processing step that aims to identify the class boundary between positive and negative instances more accurately. For this step, we propose an algorithm called Refinement that attempts to identify instances wrongly classified during self-training and then reclassifies them. We compare our method with several previous methods. Experimental results show that the two improvements yield more accurate semi-supervised time series classifiers.
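To make the role of the non-linear alignment concrete, the following sketch estimates a Reduced Description Length of one series given another after aligning the two with dynamic time warping. It is an illustrative approximation under stated assumptions, not the exact formulation used in the paper: the quantization scheme, the entropy-based bit count, and the helper names (quantize, dtw_path, entropy_bits, reduced_description_length) are introduced only for this example.

```python
# Illustrative sketch (not the paper's exact formulation): estimating a
# "reduced description length" of series b given series a, where the two
# series are first aligned non-linearly with DTW. Quantization to integer
# levels and the entropy-based bit count are simplifying assumptions.
import numpy as np

def quantize(ts, bits=6):
    """Discretize a z-normalized series into 2**bits integer levels."""
    ts = (ts - ts.mean()) / (ts.std() + 1e-12)
    levels = 2 ** bits
    lo, hi = ts.min(), ts.max()
    return np.round((ts - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)

def dtw_path(a, b):
    """Classic O(n*m) DTW; returns the optimal warping path as index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float(a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def entropy_bits(symbols):
    """Empirical entropy (bits per symbol) times the number of symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) * len(symbols)

def reduced_description_length(a, b, bits=6):
    """Bits needed to describe b when its DTW-aligned differences from a are encoded."""
    qa, qb = quantize(a, bits), quantize(b, bits)
    diffs = [qb[j] - qa[i] for i, j in dtw_path(qa, qb)]
    return entropy_bits(np.array(diffs))
```

In a self-training loop of the kind described above, an unlabeled series whose description length is reduced the most by the already-labeled positive instances would be the natural next candidate to receive a positive label, and a stopping criterion based on this quantity decides when to halt the labeling.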
References
Begum, N., Hu, B., Rakthanmanon, T., Keogh, E.J.: Towards a minimum description length based stopping criterion for semi-supervised time series classification. In: IEEE 14th International Conference on Information Reuse and Integration, IRI 2013, San Francisco, CA, USA, August 14–16, 2013, pp. 333–340 (2013)
Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: Knowledge Discovery in Databases: Papers from the 1994 AAAI Workshop, Seattle, Washington, July 1994. Technical report WS-94-03, pp. 359–370 (1994)
Keogh, E.J., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/
Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29–July 2, 2000, pp. 727–734 (2000)
Ratanamahatana, C.A., Wanichsan, D.: Stopping criterion selection for efficient semi-supervised time series classification. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 1–14 (2008)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978)
Wei, L., Keogh, E.J.: Semi-supervised time series classification. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20–23, 2006, pp. 748–753 (2006)
Begum, N.: Minimum description length based stopping criterion for semi-supervised time series classification (2013). www.cs.ucr.edu/~nbegu001/SSL_myMDL.htm
Marussy, K., Buza, K.: SUCCESS: a new approach for semi-supervised classification of time-series. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2013, Part I. LNCS, vol. 7894, pp. 437–447. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38658-9_39
Nguyen, M.N., Li, X.L., Ng, S.K.: Positive unlabeled learning for time series classification. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 2, pp. 1421–1426. AAAI Press (2011)
Nguyen, M., Li, X.-L., Ng, S.-K.: Ensemble based positive unlabeled learning for time series classification. In: Lee, S., Peng, Z., Zhou, X., Moon, Y.-S., Unland, R., Yoo, J. (eds.) DASFAA 2012, Part I. LNCS, vol. 7238, pp. 243–257. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29038-1_19
Batista, G.E.A.P.A., Keogh, E.J., Tataw, O.M., de Souza, V.M.A.: CID: an efficient complexity-invariant distance for time series. Data Min. Knowl. Discov. 28(3), 634–669 (2014)
Vinh, V.T., Anh, D.T.: Compression rate distance measure for time series. In: 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Campus des Cordeliers, Paris, France, October 19–21, 2015, pp. 1–10 (2015)
Vinh, V.T., Anh, D.T.: Constraint-based MDL principle for semi-supervised classification of time series. In: 2015 Seventh International Conference on Knowledge and Systems Engineering, KSE 2015, Ho Chi Minh City, Vietnam, October 8–10, 2015, pp. 43–48 (2015)
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.J.: Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2), 1542–1552 (2008)
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Tanaka, Y., Iwamoto, K., Uehara, K.: Discovery of time-series motif from multidimensional data based on MDL principle. Mach. Learn. 58(2–3), 269–300 (2005)
Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)
Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Shokoohi-Yekta, M., Chen, Y., Campana, B.J.L., Hu, B., Zakaria, J., Keogh, E.J.: Discovery of meaningful rules in time series. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10–13, 2015, pp. 1085–1094 (2015)
Acknowledgment
We would like to thank Prof. Eamonn Keogh and Nurjahan Begum for kindly sharing the datasets that helped us construct the experiments in this work.
Appendix A: Some More Experimental Results
This section presents the experimental results of the X-means-classifier, which was used to support the Refinement step described in Subsect. 4.4. Table 4 reports the precision, recall, and F-measure of the X-means classifier. The experiments reveal that the X-means classifier gives good results on several datasets, for example Synthetic Control (F-measure = 100 %), Symbols (F-measure = 94.118 %), Gun Point (F-measure = 71.795 %), and Fish (F-measure = 71.795 %). In particular, on Synthetic Control the result is perfect (F-measure = 100 %).
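For reference, the precision, recall, and F-measure reported in Table 4 follow the standard definitions for a binary (positive versus negative) labeling. The sketch below is a minimal illustration, not code from the paper; the label convention (1 = positive, 0 = negative) and the function name precision_recall_f1 are assumptions made for this example.

```python
# Minimal sketch of the evaluation behind Table 4: precision, recall and
# F-measure of a binary labeling, with 1 = positive and 0 = negative.
import numpy as np

def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A labeling that matches the ground truth exactly gives F-measure = 100 %,
# as reported for Synthetic Control.
print(precision_recall_f1([1, 1, 0, 0], [1, 1, 0, 0]))  # (1.0, 1.0, 1.0)
```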
Table 5 shows the execution times (in seconds) of several algorithms: Refinement with the improved MDL-based stopping criterion, Refinement with Ratanamahatana and Wanichsan's stopping criterion, and the X-means-classifier. Note that these figures do not include the execution time of the Self-Learning process.
Copyright information
© 2016 Springer-Verlag GmbH Germany
Cite this chapter
Vinh, V.T., Anh, D.T. (2016). Two Novel Techniques to Improve MDL-Based Semi-Supervised Classification of Time Series. In: Nguyen, N., Kowalczyk, R., Orłowski, C., Ziółkowski, A. (eds.) Transactions on Computational Collective Intelligence XXV. Lecture Notes in Computer Science, vol. 9990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53580-6_8
DOI: https://doi.org/10.1007/978-3-662-53580-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53579-0
Online ISBN: 978-3-662-53580-6