Abstract
The semi-supervised classification problem arises when only a small number of labeled instances are available in the training set. One way to classify new time series in this situation is to first apply self-training to label the unlabeled instances in the training set, and then use the resulting training set to classify the new time series. In this paper, we propose two novel improvements for Minimum Description Length-based semi-supervised classification of time series: a technique that improves the Minimum Description Length-based stopping criterion, and a refinement step that makes the classifier more accurate. The first improvement applies a non-linear alignment between two time series when computing the Reduced Description Length of one time series using information from the other. The second improvement is a post-processing step that aims to identify the class boundary between positive and negative instances more accurately. For this step, we propose an algorithm called Refinement that attempts to identify instances wrongly classified during self-training and then reclassifies them. We compare our method with several previous methods. Experimental results show that the two improvements yield more accurate semi-supervised time series classifiers.
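To make the role of the non-linear alignment concrete, the following sketch estimates a Reduced Description Length of one series given another after aligning the two with dynamic time warping. It is an illustrative approximation under stated assumptions, not the exact formulation used in the paper: the quantization scheme, the entropy-based bit count, and the helper names (quantize, dtw_path, entropy_bits, reduced_description_length) are introduced only for this example.

```python
# Illustrative sketch (not the paper's exact formulation): estimating a
# "reduced description length" of series b given series a, where the two
# series are first aligned non-linearly with DTW. Quantization to integer
# levels and the entropy-based bit count are simplifying assumptions.
import numpy as np

def quantize(ts, bits=6):
    """Discretize a z-normalized series into 2**bits integer levels."""
    ts = (ts - ts.mean()) / (ts.std() + 1e-12)
    levels = 2 ** bits
    lo, hi = ts.min(), ts.max()
    return np.round((ts - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)

def dtw_path(a, b):
    """Classic O(n*m) DTW; returns the optimal warping path as index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float(a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def entropy_bits(symbols):
    """Empirical entropy (bits per symbol) times the number of symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) * len(symbols)

def reduced_description_length(a, b, bits=6):
    """Bits needed to describe b when its DTW-aligned differences from a are encoded."""
    qa, qb = quantize(a, bits), quantize(b, bits)
    diffs = [qb[j] - qa[i] for i, j in dtw_path(qa, qb)]
    return entropy_bits(np.array(diffs))
```

In a self-training loop of the kind described above, an unlabeled series whose description length is reduced the most by the already-labeled positive instances would be the natural next candidate to receive a positive label, and a stopping criterion based on this quantity decides when to halt the labeling.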
References
Begum, N., Hu, B., Rakthanmanon, T., Keogh, E.J.: Towards a minimum description length based stopping criterion for semi-supervised time series classification. In: IEEE 14th International Conference on Information Reuse and Integration, IRI 2013, San Francisco, CA, USA, August 14–16, 2013, pp. 333–340 (2013)
Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: Knowledge Discovery in Databases: Papers from the 1994 AAAI Workshop, Seattle, Washington, July 1994. Technical report WS-94-03, pp. 359–370 (1994)
Keogh, E.J., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/
Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29–July 2, 2000, pp. 727–734 (2000)
Ratanamahatana, C.A., Wanichsan, D.: Stopping criterion selection for efficient semi-supervised time series classification. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 1–14 (2008)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978)
Wei, L., Keogh, E.J.: Semi-supervised time series classification. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20–23, 2006, pp. 748–753 (2006)
Begum, N.: Minimum description length based stopping criterion for semi-supervised time series classification (2013). www.cs.ucr.edu/~nbegu001/SSL_myMDL.htm
Marussy, K., Buza, K.: SUCCESS: a new approach for semi-supervised classification of time-series. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2013, Part I. LNCS, vol. 7894, pp. 437–447. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38658-9_39
Nguyen, M.N., Li, X.L., Ng, S.K.: Positive unlabeled learning for time series classification. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 2, pp. 1421–1426. AAAI Press (2011)
Nguyen, M., Li, X.-L., Ng, S.-K.: Ensemble based positive unlabeled learning for time series classification. In: Lee, S., Peng, Z., Zhou, X., Moon, Y.-S., Unland, R., Yoo, J. (eds.) DASFAA 2012, Part I. LNCS, vol. 7238, pp. 243–257. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29038-1_19
Batista, G.E.A.P.A., Keogh, E.J., Tataw, O.M., de Souza, V.M.A.: CID: an efficient complexity-invariant distance for time series. Data Min. Knowl. Discov. 28(3), 634–669 (2014)
Vinh, V.T., Anh, D.T.: Compression rate distance measure for time series. In: 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Campus des Cordeliers, Paris, France, October 19–21, 2015, pp. 1–10 (2015)
Vinh, V.T., Anh, D.T.: Constraint-based MDL principle for semi-supervised classification of time series. In: 2015 Seventh International Conference on Knowledge and Systems Engineering, KSE 2015, Ho Chi Minh City, Vietnam, October 8–10, 2015, pp. 43–48 (2015)
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.J.: Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2), 1542–1552 (2008)
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Tanaka, Y., Iwamoto, K., Uehara, K.: Discovery of time-series motif from multidimensional data based on MDL principle. Mach. Learn. 58(2–3), 269–300 (2005)
Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)
Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Shokoohi-Yekta, M., Chen, Y., Campana, B.J.L., Hu, B., Zakaria, J., Keogh, E.J.: Discovery of meaningful rules in time series. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10–13, 2015, pp. 1085–1094 (2015)
Acknowledgment
We would like to thank Prof. Eamonn Keogh and Nurjahan Begum for kindly sharing the datasets that helped us construct the experiments in this work.
Appendix A: Some More Experimental Results
This section presents the experimental results of the X-means-classifier, which was used to support the Refinement step described in Subsect. 4.4. Table 4 reports the precision, recall, and F-measure of the X-means classifier. The experiments reveal that the X-means classifier gives good results on several datasets, for example Synthetic Control (F-measure = 100 %), Symbols (F-measure = 94.118 %), Gun Point (F-measure = 71.795 %), and Fish (F-measure = 71.795 %). In particular, on Synthetic Control the result is perfect (F-measure = 100 %).
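For reference, the precision, recall, and F-measure reported in Table 4 follow the standard definitions for a binary (positive versus negative) labeling. The sketch below is a minimal illustration, not code from the paper; the label convention (1 = positive, 0 = negative) and the function name precision_recall_f1 are assumptions made for this example.

```python
# Minimal sketch of the evaluation behind Table 4: precision, recall and
# F-measure of a binary labeling, with 1 = positive and 0 = negative.
import numpy as np

def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A labeling that matches the ground truth exactly gives F-measure = 100 %,
# as reported for Synthetic Control.
print(precision_recall_f1([1, 1, 0, 0], [1, 1, 0, 0]))  # (1.0, 1.0, 1.0)
```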
Table 5 shows the execution times (in seconds) of several algorithms: Refinement with the improved MDL-based stopping criterion, Refinement with Ratanamahatana and Wanichsan's stopping criterion, and the X-means-classifier. Note that these figures do not include the execution time of the Self-Learning process.
Copyright information
© 2016 Springer-Verlag GmbH Germany
Cite this chapter
Vinh, V.T., Anh, D.T. (2016). Two Novel Techniques to Improve MDL-Based Semi-Supervised Classification of Time Series. In: Nguyen, N., Kowalczyk, R., Orłowski, C., Ziółkowski, A. (eds.) Transactions on Computational Collective Intelligence XXV. Lecture Notes in Computer Science, vol. 9990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53580-6_8
DOI: https://doi.org/10.1007/978-3-662-53580-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53579-0
Online ISBN: 978-3-662-53580-6