GitHub - daniyalk20/PyPOTS: A python toolbox / library for data mining on partially-observed time series, supporting tasks of forecasting / imputation / classification / clustering on incomplete (irregularly-sampled) multivariate time series with missing values.

Welcome to PyPOTS

A Python Toolbox for Data Mining on Partially-Observed Time Series

⦿ Motivation: Missing values are common in real-world time series, for reasons such as sensor failures, communication errors, and unexpected malfunctions. This makes partially-observed time series (POTS) a pervasive problem in open-world modeling and an obstacle to advanced data analysis. Although the problem is important, data mining on POTS still lacks a dedicated toolkit. PyPOTS was created to fill this gap.

⦿ Mission: PyPOTS aims to be a handy toolbox that makes data mining on POTS easy rather than tedious, so that engineers and researchers can focus on the core problems at hand rather than on handling the missing parts of their data. PyPOTS will keep integrating classical and the latest state-of-the-art data mining algorithms for partially-observed multivariate time series. Beyond the algorithms themselves, PyPOTS will provide unified APIs across algorithms, together with detailed documentation and interactive examples as tutorials.

To make various open-source time-series datasets readily available to users, PyPOTS is supported by the TSDB (Time-Series DataBase) project, a toolbox that makes loading time-series datasets easy.

Visit TSDB to learn more about this handy tool 🛠! It currently supports 119 open-source datasets.

❖ Installation

Install the latest release from PyPI:

pip install pypots

Below is an example that applies SAITS in PyPOTS to impute missing values in the PhysioNet 2012 dataset:

import numpy as np
from sklearn.preprocessing import StandardScaler
from pypots.data import load_specific_dataset, mcar, masked_fill
from pypots.imputation import SAITS
from pypots.utils.metrics import cal_mae
# Data preprocessing. Tedious, but PyPOTS can help. 🤓
data = load_specific_dataset('physionet_2012')  # PyPOTS will automatically download and extract it.
X = data['X']
num_samples = len(X['RecordID'].unique())
X = X.drop(columns=['RecordID', 'Time'], errors='ignore')  # drop ID/time-index columns; 'Time' is skipped if absent
X = StandardScaler().fit_transform(X.to_numpy())
X = X.reshape(num_samples, 48, -1)  # 48 time steps per record
X_intact, X, missing_mask, indicating_mask = mcar(X, 0.1)  # hold out 10% of observed values as ground truth
X = masked_fill(X, 1 - missing_mask, np.nan)
# Model training. This is PyPOTS showtime. 💪
saits = SAITS(n_steps=48, n_features=X.shape[-1], n_layers=2, d_model=256, d_inner=128, n_head=4, d_k=64, d_v=64, dropout=0.1, epochs=10)
saits.fit(X)  # train on the whole dataset; the held-out ground truth is masked, so the model never sees it
imputation = saits.impute(X)  # impute the originally-missing values and artificially-missing values
mae = cal_mae(imputation, X_intact, indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)
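The evaluation protocol in the example (hide a random 10% of the observed values, impute, then score only on those hidden values) can be sketched in plain NumPy. This is an illustrative sketch, not PyPOTS's implementation: `mcar_holdout` and `masked_mae` are hypothetical helpers that mirror the roles of `mcar` and `cal_mae` above, and this sketch returns NaNs directly, so it needs no separate masking step. The trivial mean-fill "imputer" in the demo is only a stand-in for a real model such as SAITS.

```python
import numpy as np


def mcar_holdout(X, rate, seed=None):
    """Hypothetical helper: hide a random `rate` fraction of the *observed* entries of X.

    Returns (X_intact, X_masked, missing_mask, indicating_mask):
      X_intact        original values, NaN wherever originally missing
      X_masked        X_intact with the held-out entries also set to NaN
      missing_mask    1 where X_masked is observed, 0 otherwise
      indicating_mask 1 only at the artificially held-out entries
    """
    rng = np.random.default_rng(seed)
    X_intact = np.asarray(X, dtype=float).copy()
    observed = ~np.isnan(X_intact)
    obs_idx = np.flatnonzero(observed)               # flat indices of observed entries
    n_hold = int(round(len(obs_idx) * rate))
    held = rng.choice(obs_idx, size=n_hold, replace=False)

    indicating_mask = np.zeros(X_intact.size)
    indicating_mask[held] = 1.0
    indicating_mask = indicating_mask.reshape(X_intact.shape)

    # Held-out entries were observed (1) and are now hidden (1 - 1 = 0).
    missing_mask = observed.astype(float) - indicating_mask
    X_masked = np.where(missing_mask == 1.0, X_intact, np.nan)
    return X_intact, X_masked, missing_mask, indicating_mask


def masked_mae(pred, target, mask):
    """Mean absolute error computed only where mask == 1 (NaNs elsewhere are ignored)."""
    diff = np.abs(np.nan_to_num(pred) - np.nan_to_num(target)) * mask
    denom = mask.sum()
    return diff.sum() / denom if denom > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 48, 3))
    X[rng.random(X.shape) < 0.2] = np.nan            # simulate naturally-missing values
    X_intact, X_masked, missing_mask, indicating_mask = mcar_holdout(X, 0.1, seed=1)
    # Stand-in imputer: fill every gap with the per-feature mean of the visible values.
    feature_means = np.nanmean(X_masked, axis=(0, 1))
    imputed = np.where(np.isnan(X_masked), feature_means, X_masked)
    print("MAE on held-out values:", masked_mae(imputed, X_intact, indicating_mask))
```

Because the score uses only `indicating_mask`, values that were missing from the start never count toward the error, which is why the model can safely train on the whole dataset.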

❖ Available Algorithms

Task	Type	Algorithm	Year	Reference
Imputation	Neural Network	SAITS (Self-Attention-based Imputation for Time Series)	2022	¹
Imputation	Neural Network	Transformer	2017	² ¹
Imputation, Classification	Neural Network	BRITS (Bidirectional Recurrent Imputation for Time Series)	2018	³
Imputation	Naive	LOCF (Last Observation Carried Forward)	-	-
Classification	Neural Network	GRU-D	2018	⁴
Classification	Neural Network	Raindrop	2022	⁵
Clustering	Neural Network	CRLI (Clustering Representation Learning on Incomplete time-series data)	2021	⁶
Clustering	Neural Network	VaDER (Variational Deep Embedding with Recurrence)	2019	⁷
Forecasting	Probabilistic	BTTF (Bayesian Temporal Tensor Factorization)	2021	⁸
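LOCF, the naive baseline in the table, fills each gap with the most recent observed value of the same feature. The NumPy sketch below is ours, not PyPOTS's implementation. It assumes the (n_samples, n_steps, n_features) layout used in the example above, and it leaves leading gaps (before a feature's first observation) as NaN, since there is nothing to carry forward; other tools may back-fill or zero-fill those instead.

```python
import numpy as np


def locf(X):
    """Last Observation Carried Forward along axis 1 of a (n_samples, n_steps, n_features) array.

    Assumption: leading NaNs stay NaN because no earlier observation exists.
    """
    X = np.asarray(X, dtype=float)
    n_steps = X.shape[1]
    observed = ~np.isnan(X)
    # Step index where observed, 0 elsewhere; a running max then gives, for every
    # position, the index of the latest observed step at or before it.
    idx = np.where(observed, np.arange(n_steps)[None, :, None], 0)
    idx = np.maximum.accumulate(idx, axis=1)
    # Gather X[i, idx[i, t, f], f]; positions with no prior observation point at
    # step 0, which is itself missing, so they remain NaN.
    return np.take_along_axis(X, idx, axis=1)
```

Working on index arrays rather than looping over time steps keeps the function vectorized and leaves the input array unmodified.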

❖ Reference

If you find PyPOTS helpful in your research, please cite it as follows and ⭐️star this repository so that others can find this work. 🤗

@misc{du2022PyPOTS,
author = {Wenjie Du},
title = {{PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series}},
howpublished = {\url{https://github.com/wenjiedu/pypots}},
year = {2022},
doi = {10.5281/zenodo.6823222},
}

or

Wenjie Du. (2022). PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series. Zenodo. https://doi.org/10.5281/zenodo.6823222

❖ Attention 👀

The documentation and tutorials are under construction, and a short paper introducing PyPOTS is on the way! 🚀 Please stay tuned!

‼️ PyPOTS is currently under development. If you like it and look forward to its growth, please star and watch PyPOTS to stay posted on its progress and to let me know that its development is meaningful. If you have any feedback, want to contribute ideas or suggestions, or would like to share time-series related algorithms or papers, please join the PyPOTS community and chat on , or create an issue. If you have any further questions or are interested in collaboration, please take a look at my GitHub profile and feel free to contact me 🤝.

Thank you all for your attention! 😃


Footnotes

¹ Du, W., Cote, D., & Liu, Y. (2022). SAITS: Self-Attention-based Imputation for Time Series. arXiv:2202.08516.
² Vaswani, A., Shazeer, N. M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. NeurIPS 2017.
³ Cao, W., Wang, D., Li, J., Zhou, H., Li, L., & Li, Y. (2018). BRITS: Bidirectional Recurrent Imputation for Time Series. NeurIPS 2018.
⁴ Che, Z., Purushotham, S., Cho, K., Sontag, D. A., & Liu, Y. (2018). Recurrent Neural Networks for Multivariate Time Series with Missing Values. Scientific Reports, 8.
⁵ Zhang, X., Zeman, M., Tsiligkaridis, T., & Zitnik, M. (2022). Graph-Guided Network for Irregularly Sampled Multivariate Time Series. ICLR 2022.
⁶ Ma, Q., Chen, C., Li, S., & Cottrell, G. W. (2021). Learning Representations for Incomplete Time Series Clustering. AAAI 2021.
⁷ Jong, J. D., Emon, M. A., Wu, P., Karki, R., Sood, M., Godard, P., Ahmad, A., Vrooman, H. A., Hofmann-Apitius, M., & Fröhlich, H. (2019). Deep learning for clustering of multivariate clinical patient trajectories with missing values. GigaScience, 8.
⁸ Chen, X., & Sun, L. (2021). Bayesian Temporal Factorization for Multidimensional Time Series Prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence.
