research-article

Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package)

Published: 13 September 2018

Abstract

Time series feature engineering is a time-consuming process because scientists and engineers have to consider the many algorithms of signal processing and time series analysis for identifying and extracting meaningful features from time series. The Python package tsfresh (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests) accelerates this process by combining 63 time series characterization methods, which by default compute a total of 794 time series features, with feature selection on the basis of automatically configured hypothesis tests. By identifying statistically significant time series characteristics at an early stage of the data science process, tsfresh closes feedback loops with domain experts and fosters the early development of domain-specific features. The package implements the standard APIs of time series and machine learning libraries (e.g. pandas and scikit-learn) and is designed both for exploratory analyses and for straightforward integration into operational data science applications.
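The abstract combines two steps: computing many characteristics for every time series, then keeping only those that a hypothesis test finds relevant for the target. The dependency-free sketch below illustrates that idea under stated assumptions. It is not tsfresh's implementation or API: the five features, the Mann-Whitney U test (normal approximation) for a binary target, and the Benjamini-Hochberg false discovery rate procedure are illustrative choices, and `extract_simple_features` and `select_relevant_features` are hypothetical names.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical, tiny set of time series characteristics (tsfresh computes hundreds).
FEATURES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "stdev": statistics.pstdev,
    "maximum": max,
    "minimum": min,
    "abs_energy": lambda xs: sum(v * v for v in xs),
}


def extract_simple_features(series: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Map each time series id to {feature name: value}."""
    return {sid: {name: f(xs) for name, f in FEATURES.items()} for sid, xs in series.items()}


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(a), len(b)
    n = n1 + n2
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvalues: Dict[str, float], fdr: float) -> List[str]:
    """Names whose null hypothesis is rejected at false discovery rate `fdr`."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    cutoff = 0
    for k, (_, p) in enumerate(ordered, start=1):
        if p <= k / m * fdr:
            cutoff = k  # keep the largest k with p_(k) <= (k / m) * fdr
    return [name for name, _ in ordered[:cutoff]]


def select_relevant_features(
    features: Dict[str, Dict[str, float]], labels: Dict[str, bool], fdr: float = 0.05
) -> List[str]:
    """Test each feature against a binary target, then control the FDR."""
    pvalues = {}
    for name in FEATURES:
        positives = [vals[name] for sid, vals in features.items() if labels[sid]]
        negatives = [vals[name] for sid, vals in features.items() if not labels[sid]]
        pvalues[name] = mann_whitney_p(positives, negatives)
    return benjamini_hochberg(pvalues, fdr)


# Usage: 20 synthetic series; the "faulty" ones (even ids) are shifted upwards by 5.
series = {
    f"s{i}": [(5.0 if i % 2 == 0 else 0.0) + math.sin(0.7 * t + i) for t in range(50)]
    for i in range(20)
}
labels = {f"s{i}": i % 2 == 0 for i in range(20)}
selected = select_relevant_features(extract_simple_features(series), labels)
print(selected)  # contains at least "mean", "maximum", "minimum" and "abs_energy"
```

Each feature is tested independently, so the procedure parallelizes naturally over features; the false discovery rate step then limits how many irrelevant features slip through when hundreds of tests run at once. In tsfresh itself the choice of test depends on whether feature and target are binary or real-valued; consult the package documentation for the exact rules.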




        Information

        Published In

        Neurocomputing, Volume 307, Issue C
        Sep 2018
        227 pages

        Publisher

        Elsevier Science Publishers B. V.

        Netherlands


        Author Tags

        1. Feature engineering
        2. Time series
        3. Feature extraction
        4. Feature selection
        5. Machine learning

        Qualifiers

        • Research-article


        Cited By

        • (2025) Trustworthy AI-based Performance Diagnosis Systems for Cloud Applications: A Review. ACM Computing Surveys 57:5, pp. 1-37. DOI: 10.1145/3701740. Online publication date: 9-Jan-2025
        • (2025) A Cooperative Game Theory-Based Feature Selection for Efficient Hand Grasp Classification Using Minimal Number of sEMG Signals. ACM Transactions on Computing for Healthcare 6:1, pp. 1-22. DOI: 10.1145/3700143. Online publication date: 8-Jan-2025
        • (2025) Bending classification from interference signals of a fiber optic sensor using shallow learning and convolutional neural networks. Pattern Recognition Letters 186:C, pp. 354-360. DOI: 10.1016/j.patrec.2024.06.029. Online publication date: 30-Jan-2025
        • (2025) Clustering and Interpretation of time-series trajectories of chronic pain using evidential c-means. Expert Systems with Applications: An International Journal 260:C. DOI: 10.1016/j.eswa.2024.125369. Online publication date: 15-Jan-2025
        • (2025) Using Decision Trees to Predict Insolvency in Spanish SMEs: Is Early Warning Possible? Computational Economics 65:1, pp. 91-116. DOI: 10.1007/s10614-024-10586-5. Online publication date: 1-Jan-2025
        • (2024) Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1445-1453. DOI: 10.5555/3635637.3663004. Online publication date: 6-May-2024
        • (2024) Fully Automated Correlated Time Series Forecasting in Minutes. Proceedings of the VLDB Endowment 18:2, pp. 144-157. DOI: 10.14778/3705829.3705835. Online publication date: 1-Oct-2024
        • (2024) AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data. Proceedings of the VLDB Endowment 17:11, pp. 2987-3002. DOI: 10.14778/3681954.3681978. Online publication date: 1-Jul-2024
        • (2024) VETT: VectorDB-Enabled Transfer-Learning for Time-Series Forecasting. Proceedings of the 4th International Conference on AI-ML Systems, pp. 1-9. DOI: 10.1145/3703412.3704452. Online publication date: 8-Oct-2024
        • (2024) Data Augmentation Strategies for Improving Time Series Classification Accuracy. Proceedings of the 2024 7th International Conference on Robot Systems and Applications, pp. 51-58. DOI: 10.1145/3702468.3702479. Online publication date: 12-Sep-2024