[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Query-sensitive distance measure selection for time series nearest neighbor classification

Published: 18 January 2016 Publication History

Abstract

Many distance or similarity measures have been proposed for time series similarity search. However, none of these measures is guaranteed to be optimal when used for 1-Nearest Neighbor (NN) classification. In this paper we study the problem of selecting the most appropriate distance measure, given a pool of time series distance measures and a query, so as to perform NN classification of the query. We propose a framework for solving this problem, by identifying, given the query, the distance measure most likely to produce the correct classification result for that query. From this proposed framework, we derive three specific methods, that differ from each other in the way they estimate the probability that a distance measure correctly classifies a query object. In our experiments, our pool of measures consists of Dynamic Time Warping (DTW), Move-Split-Merge (MSM), and Edit distance with Real Penalty (ERP). Based on experimental evaluation with 45 datasets, the best-performing of the three proposed methods provides the best results in terms of classification error rate, compared to the competitors, which include using the Cross Validation method for selecting the distance measure in each dataset, as well as using a single specific distance measure (DTW, MSM, or ERP) across all datasets.

References

[1]
Agrawal R., Faloutsos C. and Swami A., Efficient similarity search in sequence databases, in: In Proc of FODO, Springer Verlag (1993), 69-84.
[2]
Athitsos V., Hadjieleftheriou M., Kollios G. and Sclaroff S., Query-sensitive embeddings, ACM Trans Database Syst 32(2) (2007).
[3]
Athitsos V., Papapetrou P., Potamias M., Kollios G. and Gunopulos D., Approximate embedding-based subsequence matching of time series, in: SIGMOD (2008), 365-378.
[4]
Bellman R., The theory of dynamic programming, Bull Amer Math Soc 60(6) (1954), 503-515.
[5]
Bergroth L., Hakonen H. and Raita T., A survey of longest common subsequence algorithms, in: Proceedings of the Seventh International Symposium on String Processing Information Retrieval (SPIRE'00), SPIRE '00, (2000), 39.
[6]
Bollobás B., Das G., Gunopulos D. and Mannila H., Time-series similarity problems and well-separated geometric sets, in: Symposium on Computational Geometry (1997), 454-456.
[7]
Chen L. and Ng R., On the marriage of l_p-norms and edit distance, in: VLDB (2004), 792-803.
[8]
Chen L. and Özsu M.T., Robust and fast similarity search for moving object trajectories, in: SIGMOD (2005), 491-502.
[9]
Chen Y., Nascimento M.A., Chin B., Anthony O. and Tung K.H., Spade: On shape-based pattern detection in streaming time series, in: Proceedings of the IEEE International Conference of Data Engineering (ICDE), (2007), 786-795.
[10]
Cox D., Principles of Statistical Inference, Cambridge University Press, 2006.
[11]
Domeniconi C., Peng J. and Gunopulos D., Locally adaptive metric nearest-neighbor classification, Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(9) (2002), 1281-1285.
[12]
Hastie T. and Tibshirani R., Discriminant adaptive nearest neighbor classification, Pattern Analysis and Machine Intelligence, IEEE Transactions on 18(6) (1996), 607-616.
[13]
Hu B., Chen Y. and Keogh E., Time series classification under more realistic assumptions, in: SDM (2012), 578-586.
[14]
Keogh E., Exact indexing of dynamic time warping, in: VLDB (2002), 406-417.
[15]
Keogh E., Zhu Q., Hu B., Hao Y., Xi X., Wei L. and Ratanamahatana C., The UCR time series classification/clustering homepage: www.cs.ucr.edu/eamonn/time_series_data/, 2011.
[16]
Kotsifakos A., Papapetrou P., Hollmén J. and Gunopulos D., A subsequence matching with gaps-range-tolerances framework: A query-by-humming application, PVLDB 4(11) (2011), 761-771.
[17]
Kruskall J.B. and Liberman M., The symmetric time warping algorithm: From continuous to discrete, in: Time Warps, Addison-Wesley (1983).
[18]
Levenshtein V.I., Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics 10(8) (1966), 707-710.
[19]
Lin J., Keogh E., Lonardi S. and Chiu B., A symbolic representation of time series, with implications for streaming algorithms, in: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD '03, (2003), 2-11.
[20]
Lin J., Khade R. and Li Y., Rotation-invariant similarity in time series using bag-of-patterns representation, J Intell Inf Syst 39(2) (2012), 287-315.
[21]
Lines J., Davis L.M., Hills J. and Bagnall A., A shapelet transform for time series classification, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, (2012), 289-297.
[22]
Marteau P.-F., Time warp edit distance with stiffness adjustment for time series matching, Pattern Analysis and Machine Intelligence, IEEE Transactions on 31(2) (2009), 306-318.
[23]
Mueen A., Keogh E. and Young N., Logical-shapelets: An expressive primitive for time series classification, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, (2011), 1154-1162.
[24]
Sakoe H. and Chiba S., Dynamic programming algorithm optimization for spoken word recognition, Transactions on ASSP 26 (1978), 43-49.
[25]
Stefan A., Athitsos V. and Das G., The move-split-merge metric for time series, Transactions on Knowledge and Data Engineering, (2012).
[26]
Unal E., Chew E., Georgiou P. and Narayanan S., Challenging uncertainty in query by humming systems: A fingerprinting approach, Transactions on Audio Speech and Language Processing 16(2) (2008), 359-371.
[27]
Wang X., Mueen A., Ding H., Trajcevski G., Scheuermann P. and Keogh E.J., Experimental comparison of representation methods and distance measures for time series data, Data Minining and Knowledge Discovery 26(2) (2013), 275-309.
[28]
Xing Z., Pei J., Yu P.S. and Wang K., Extracting interpretable features for early classification on time series, in: SDM (2011), 247-258.
[29]
Ye L. and Keogh E., Time series shapelets: A novel technique that allows accurate, interpretable and fast classification, Data Min Knowl Discov 22(1-2) (2011), 149-182.

Cited By

View all

Index Terms

  1. Query-sensitive distance measure selection for time series nearest neighbor classification
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Intelligent Data Analysis
        Intelligent Data Analysis  Volume 20, Issue 1
        Jan 2016
        211 pages

        Publisher

        IOS Press

        Netherlands

        Publication History

        Published: 18 January 2016

        Author Tags

        1. Time series
        2. classification
        3. distance measures

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 16 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all

        View Options

        View options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media