research-article

Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package)

Published: 13 September 2018

Abstract

Time series feature engineering is a time-consuming process because scientists and engineers have to consider the multifarious algorithms of signal processing and time series analysis for identifying and extracting meaningful features from time series. The Python package tsfresh (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests) accelerates this process by combining 63 time series characterization methods, which by default compute a total of 794 time series features, with feature selection on the basis of automatically configured hypothesis tests. By identifying statistically significant time series characteristics at an early stage of the data science process, tsfresh closes feedback loops with domain experts and fosters the development of domain-specific features early on. The package implements standard APIs of time series and machine learning libraries (e.g. pandas and scikit-learn) and is designed both for exploratory analyses and for straightforward integration into operational data science applications.
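
The abstract describes two stages: computing many characterizations of each time series, then keeping only those features that a hypothesis test flags as relevant to the target while controlling the false discovery rate. The following is a minimal, standard-library-only sketch of that idea under stated assumptions. It is not tsfresh's implementation or API. It assumes a binary target, uses four hand-picked features (names chosen for illustration), a Mann-Whitney U test with a normal approximation for each feature, and the Benjamini-Yekutieli procedure at an assumed FDR level of 0.05. The helper names (`extract`, `select_relevant`, and so on) and the toy data are hypothetical.

```python
import math
import statistics

# Per-series characterizations: a tiny, hypothetical subset with names chosen
# for illustration only (not taken from any library API).
FEATURES = {
    "mean": statistics.fmean,
    "standard_deviation": statistics.pstdev,
    "maximum": max,
    "abs_energy": lambda xs: sum(x * x for x in xs),
}


def extract(series_by_id):
    """Return {series id: {feature name: value}} for every series."""
    return {
        sid: {name: fn(xs) for name, fn in FEATURES.items()}
        for sid, xs in series_by_id.items()
    }


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Ties get average ranks, but the variance is not tie-corrected, which
    makes the test slightly conservative when ties are present.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_yekutieli(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Yekutieli procedure."""
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1))  # correction for arbitrary dependence
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= k * fdr / (m * c_m):
            k_max = k
    return set(order[:k_max])


def select_relevant(features, labels, fdr=0.05):
    """Keep features whose distribution differs between classes 0 and 1."""
    names = list(FEATURES)
    p_values = []
    for name in names:
        a = [feats[name] for sid, feats in features.items() if labels[sid] == 0]
        b = [feats[name] for sid, feats in features.items() if labels[sid] == 1]
        p_values.append(mann_whitney_p(a, b))
    return sorted(names[i] for i in benjamini_yekutieli(p_values, fdr))


# Toy data (invented): 20 phase-shifted sawtooth series of length 10.
# Class-1 series are offset by +10, so level-based features separate the
# classes while the standard deviation is identical for every series.
series, labels = {}, {}
for sid in range(20):
    labels[sid] = sid % 2
    series[sid] = [float((t + sid) % 10) + 10 * labels[sid] for t in range(10)]

print(select_relevant(extract(series), labels))  # → ['abs_energy', 'maximum', 'mean']
```

Benjamini-Yekutieli is used in this sketch because it remains valid under arbitrary dependence between the tests, which is relevant when many features are computed from the same series; Benjamini-Hochberg would be a less conservative alternative if the tests could be assumed independent.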

Information

Published In

Neurocomputing, Volume 307, Issue C
Sep 2018
227 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Feature engineering
2. Time series
3. Feature extraction
4. Feature selection
5. Machine learning

Cited By

• (2024) Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1445-1453. doi:10.5555/3635637.3663004. Online publication date: 6-May-2024.
• (2024) AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data. Proceedings of the VLDB Endowment 17(11), pp. 2987-3002. doi:10.14778/3681954.3681978. Online publication date: 1-Jul-2024.
• (2024) Practically High Performant Neural Adaptive Video Streaming. Proceedings of the ACM on Networking 2(CoNEXT4), pp. 1-23. doi:10.1145/3696401. Online publication date: 25-Nov-2024.
• (2024) Estimating Video Quality Using Coarse-Grained Features: Insights and Limitations from Gaussian Mixture Models. Proceedings of the 20th International Conference on emerging Networking EXperiments and Technologies, pp. 10-16. doi:10.1145/3680121.3697811. Online publication date: 9-Dec-2024.
• (2024) Boxing Gesture Recognition in Real-Time using Earable IMUs. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 673-678. doi:10.1145/3675094.3680524. Online publication date: 5-Oct-2024.
• (2024) BioSys: Efficient Quality Control System for Manufacturing of Sustainable Biopolymer Composites. Proceedings of the 11th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 11-21. doi:10.1145/3671127.3698165. Online publication date: 29-Oct-2024.
• (2024) A Bayesian LSTM Based Active Anomaly Detection Service for Large Online Systems. Proceedings of the 15th Asia-Pacific Symposium on Internetware, pp. 407-416. doi:10.1145/3671016.3674818. Online publication date: 24-Jul-2024.
• (2024) M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), pp. 1-30. doi:10.1145/3659591. Online publication date: 15-May-2024.
• (2024) DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 272-285. doi:10.1145/3650200.3656637. Online publication date: 30-May-2024.
• (2024) SeamSleeve: Robust Arm Movement Sensing through Powered Stitching. Proceedings of the 2024 ACM Designing Interactive Systems Conference, pp. 1134-1147. doi:10.1145/3643834.3660726. Online publication date: 1-Jul-2024.