
A Comprehensive Survey on Rare Event Prediction

Published: 11 November 2024

Abstract

Rare event prediction involves identifying and forecasting low-probability events using machine learning (ML) and data analysis. Because the data distributions are imbalanced, with common events vastly outnumbering rare ones, it requires specialized methods at every step of the ML pipeline, from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistics and ML. This article comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broad evaluation approaches. The article aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions to help guide practitioners and researchers.


References

[1]
1975. Access to Forecasts - ecmwf.int. Retrieved May 14, 2023 from https://www.ecmwf.int/en/forecasts/accessing-forecasts
[2]
1987. UCI Machine Learning Repository: Data Sets — archive.ics.uci.edu. Retrieved May 13, 2023 from https://archive.ics.uci.edu
[3]
1996. Adult Dataset. Retrieved May 09, 2023 from https://archive.ics.uci.edu/ml/datasets/adult
[4]
1998. UCI Machine Learning Repository: KDD Cup 1999 Data Data Set — archive.ics.uci.edu. Retrieved May 14, 2023 from https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data
[5]
1999. KDD Cup 1999 Data — kdd.ics.uci.edu. Retrieved May 14, 2023 from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[6]
1999. UCI Machine Learning Repository: Spambase Data Set — archive.ics.uci.edu. Retrieved May 09, 2023 from https://archive.ics.uci.edu/ml/datasets/spambase
[7]
2004. KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems (Regression, Classification, Clustering, Pattern Mining and so on) — sci2s.ugr.es. Retrieved May 13, 2023 from https://sci2s.ugr.es/keel/datasets.php
[8]
2005. Google Maps Platform | Google Developers — developers.google.com. Retrieved May 14, 2023 from https://developers.google.com/maps
[9]
2007. UCI Machine Learning Repository — archive.ics.uci.edu. Retrieved Jun 13, 2023 from https://archive.ics.uci.edu/dataset/159/magic+gamma+telescope
[10]
2008. Aircraft Condition Monitoring System (ACMS) | SKYbrary Aviation Safety - skybrary.aero. Retrieved May 07, 2023 from https://www.skybrary.aero/articles/aircraft-condition-monitoring-system-acms
[11]
2009. PHM Society — phmsociety.org. Retrieved August 05, 2023 from https://phmsociety.org/
[12]
2010. Find Open Datasets and Machine Learning Projects | Kaggle — kaggle.com. Retrieved May 13, 2023 from https://www.kaggle.com/datasets
[13]
2013. UCI Machine Learning Repository: Data Set — archive.ics.uci.edu. Retrieved May 07, 2023 from http://archive.ics.uci.edu/ml/datasets/seismic-bumps
[14]
2016. IEEE 39-Bus System — electricgrids.engr.tamu.edu. Retrieved from https://electricgrids.engr.tamu.edu/electric-grid-test-cases/ieee-39-bus-system/
[15]
2017. ABCD Dataset - datasets.abci.ai. Retrieved May 14, 2023 from https://datasets.abci.ai/dataset/abcd/
[16]
2017. GitHub - gistairc/ABCDdataset — github.com. Retrieved May 14, 2023 from https://github.com/gistairc/ABCDdataset
[17]
2017. UCI Machine Learning Repository: APS Failure at Scania Trucks Data Set — archive.ics.uci.edu. Retrieved May 14, 2023 from https://archive.ics.uci.edu/ml/datasets/APS+Failure+at+Scania+Trucks
[18]
2018. GitHub - MHResearchNetwork/Diagnosis-Codes: Information on Diagnosis Code Lists used in MHRN, Including Formats and Datasets — github.com. Retrieved May 07, 2023 from https://github.com/MHResearchNetwork/Diagnosis-Codes
[19]
2018. KDD Cup 1999 Data — kaggle.com. Retrieved April 19, 2023 from https://www.kaggle.com/datasets/galaxyh/kdd-cup-1999-data
[20]
2019. GitHub - subhande/APS-Failure-at-Scania-Trucks-Data-Set — github.com. Retrieved August 05, 2023 from https://github.com/subhande/APS-Failure-at-Scania-Trucks-Data-Set
[21]
2020. 5 year BSE Sensex Dataset-kaggle.com. Retrieved May 07, 2023 from https://www.kaggle.com/datasets/ravisane1/5-year-bse-sensex-dataset
[22]
2020. Stroke Prediction Dataset — kaggle.com. Retrieved May 09, 2023 from https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
[23]
2021. Bearing Data Center: Case School of Engineering: Case Western Reserve University. Retrieved May 09, 2023 from https://engineering.case.edu/bearingdatacenter
[24]
2021. NIFTY-50 Stock Market Data (2000-2021) — kaggle.com. Retrieved May 07, 2023 from https://www.kaggle.com/datasets/rohanrao/nifty50-stock-market-data
[25]
2022. Audio-Anomaly-Dataset - kaggle.com. Retrieved May 09, 2023 from https://www.kaggle.com/datasets/ahmedabbasi/audioanomalydataset
[26]
2022. Cerberus Web Client — ftp.fdot.gov. Retrieved May 17, 2023 from https://ftp.fdot.gov/file/d/FTP/FDOT/co/planning/transtat/gis/shs_maps/shsmap.pdf
[27]
[28]
2022. GitHub - Chao-Dang/Rare-Event-Estimation-by-Parallel-Adaptive-Bayesian-Quadrature — github.com. Retrieved May 15, 2023 from https://github.com/Chao-Dang/Rare-Event-Estimation-by-Parallel-Adaptive-Bayesian-Quadrature
[29]
2022. GitHub - petrobras/3W: This is the First Repository Published by Petrobras on GitHub. It Supports the 3W Project and Promotes Experimentation of Machine Learning-based Approaches and Algorithms for Specific Problems Related to Undesirable Events that Occur in Offshore Oil Wells. — github.com. Retrieved May 09, 2023 from https://github.com/petrobras/3W
[30]
2022. Intelligent Systems Division — nasa.gov. Retrieved May 15, 2023 from https://www.nasa.gov/intelligent-systems-division
[31]
2022. NASA Open APIs — api.nasa.gov. Retrieved May 13, 2023 from https://api.nasa.gov/
[32]
2023. OLGA Dynamic Multiphase Flow Simulator — software.slb.com. Retrieved May 15, 2023 from https://www.software.slb.com/products/olga
[33]
2023. Roadway Data Home - TDA, MnDOT — dot.state.mn.us. Retrieved May 07, 2023 from https://www.dot.state.mn.us/roadway/data/[Accessed 07-May-2023].
[34]
2023. S and P BSE SENSEX Dataset-bseindia.com. https://www.bseindia.com/
[35]
2023. Storm Prediction Center Storm Reports — spc.noaa.gov. Retrieved May 14, 2023 from https://www.spc.noaa.gov/climo/
[36]
Ahmed Abbasi, Abdul Rehman Rehman Javed, Amanullah Yasin, Zunera Jalil, Natalia Kryvinska, and Usman Tariq. 2022. A large-scale benchmark dataset for anomaly detection and rare event classification for audio forensics. IEEE Access 10 (2022), 38885–38894.
[37]
Mohamed Abdel-Basset, Laila Abdel-Fatah, and Arun Kumar Sangaiah. 2018. Metaheuristic algorithms: A comprehensive review. Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications (2018), 185–231.
[38]
Syed M. Adil, Cyrus Elahi, Dev N. Patel, Andreas Seas, Pranav I. Warman, Anthony T. Fuller, Michael M. Haglund, and Timothy W. Dunn. 2022. Deep learning to predict traumatic brain injury outcomes in the low-resource setting. World Neurosurgery 164 (2022), e8–e16.
[39]
Ramesh Agarwal and Mahesh V. Joshi. 2001. PNrule: A new framework for learning classifier models in data mining (a case-study in network intrusion detection). In Proceedings of the 2001 SIAM International Conference on Data Mining. SIAM, 1–17.
[40]
Azim Ahmadzadeh, Berkay Aydin, Dustin J. Kempton, Maxwell Hostetter, Rafal A. Angryk, Manolis K. Georgoulis, and Sushant S. Mahajan. 2019. Rare-event time series prediction: A case study of solar flare forecasting. In 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA’19). IEEE, 1814–1820.
[41]
S. Alestra, C. Bordry, C. Brand, E. Burnaev, P. Erofeev, A. Papanov, and C. Silveira-Freixo. 2014. Rare event anticipation and degradation trending for aircraft predictive maintenance. In Proceedings of the 11th World Congress on Computational Mechanics, WCCM, Vol. 5. 6571.
[42]
Özden Gür Ali and Umut Arıtürk. 2014. Dynamic churn prediction framework with more effective use of rare event data: The case of private banking. Expert Systems with Applications 41, 17 (2014), 7889–7903.
[43]
Omri Allouche, Asaf Tsoar, and Ronen Kadmon. 2006. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43, 6 (2006), 1223–1232.
[44]
Henri Arno, Klaas Mulier, Joke Baeck, and Thomas Demeester. 2023. From numbers to words: Multi-modal bankruptcy prediction using the ECL dataset. In Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing, Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen, Hiroki Sakaji, and Kiyoshi Izumi (Eds.). Association for Computational Linguistics, Bali, Indonesia, 11--21.
[45]
Md Tanvir Ashraf, Kakan Dey, and Sabyasachee Mishra. 2023. Identification of high-risk roadway segments for wrong-way driving crash using rare event modeling and data augmentation techniques. Accident Analysis & Prevention 181 (2023), 106933.
[46]
Yuanlu Bai, Zhiyuan Huang, Henry Lam, and Ding Zhao. 2022. Rare-event simulation for neural network and random forest predictors. ACM Transactions on Modeling and Computer Simulation 32, 3 (2022), 1–33.
[47]
Mohamed Bekkar and Taklit Akrouf Alitouche. 2013. Imbalanced data learning approaches review. International Journal of Data Mining & Knowledge Management Process 3, 4 (2013), 15.
[48]
Zied Ben Bouallègue and David S. Richardson. 2022. On the ROC area of ensemble forecasts for rare events. Weather and Forecasting 37, 5 (2022), 787–796.
[49]
Christos Berberidis and Ioannis Vlahavas. 2007. Detection and prediction of rare events in transaction databases. International Journal on Artificial Intelligence Tools 16, 05 (2007), 829–848.
[50]
Michael J. A. Berry and Gordon S. Linoff. 2004. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. John Wiley & Sons.
[51]
Samit Bhanja and Abhishek Das. 2022. A Black Swan event-based hybrid model for Indian stock markets’ trends prediction. Innovations in Systems and Software Engineering 20 (2022), 1--15.
[52]
Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen, and Thomas Seidl. 2010. Moa: Massive online analysis, a framework for stream classification and clustering. In Proceedings of the 1st Workshop on Applications of Pattern Analysis. PMLR, 44–50.
[53]
Bosch. 2016. Bosch Production Line Performance - kaggle.com. Retrieved April 19, 2023 from https://www.kaggle.com/c/bosch-production-line-performance
[54]
Richard R. Brooks and S. Sitharama Iyengar. 1996. Robust distributed computing and sensing algorithm. Computer 29, 6 (1996), 53–60.
[55]
Syed Muhammad Salman Bukhari, Muhammad Hamza Zafar, Mohamad Abou Houran, Syed Kumayl Raza Moosavi, Majad Mansoor, Muhammad Muaaz, and Filippo Sanfilippo. 2024. Secure and privacy-preserving intrusion detection in wireless sensor networks: Federated learning with SCNN-Bi-LSTM for enhanced reliability. Ad Hoc Networks 155 (2024), 103407.
[56]
Ander Carreño, Iñaki Inza, and Jose A. Lozano. 2020. Analyzing rare event, anomaly, novelty and outlier detection terms under the supervised classification framework. Artificial Intelligence Review 53, 5 (2020), 3575–3594.
[57]
Thasorn Chalongvorachai and Kuntpong Woraratpanya. 2021. 3dvae-ersg: 3d variational autoencoder for extremely rare signal generation. In Proceedings of the 2021 13th International Conference on Information Technology and Electrical Engineering. IEEE, 177–182.
[58]
Thasorn Chalongvorachai and Kuntpong Woraratpanya. 2021. A data generation framework for extremely rare case signals. Heliyon 7, 8 (2021).
[59]
Chao Chen, Andy Liaw, and Leo Breiman. 2004. Using random forest to learn imbalanced data. University of California, Berkeley 110, 1–12 (2004), 24.
[60]
Seong-Pyo Cheon, Sungshin Kim, So-Young Lee, and Chong-Bum Lee. 2009. Bayesian networks based rare event prediction with sensor data. Knowledge-Based Systems 22, 5 (2009), 336–343.
[61]
Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020), 1–13.
[62]
William W. Cohen. 1995. Fast effective rule induction. In Machine Learning Proceedings 1995. Elsevier, 115–123.
[63]
R. Yates Coley, Qinqing Liao, Noah Simon, and Susan M. Shortreed. 2023. Empirical evaluation of internal validation methods for prediction in large-scale clinical data with rare-event outcomes: A case study in suicide risk prediction. BMC Medical Research Methodology 23, 1 (2023), 33.
[64]
Florian Combes, Ricardo Fraiman, and Badih Ghattas. 2022. Time series sampling. Engineering Proceedings 18, 1 (2022), 32.
[65]
Dong Dai and Shaowen Hua. 2016. Random under-sampling ensemble methods for highly imbalanced rare disease classification. In Proceedings of the International Conference on Data Science. The Steering Committee of The World Congress in Computer Science, Computer. 54.
[66]
Chao Dang, Pengfei Wei, Matthias GR Faes, Marcos A. Valdebenito, and Michael Beer. 2022. Parallel adaptive Bayesian quadrature for rare event estimation. Reliability Engineering & System Safety 225, C (2022), 108621.
[67]
Maren David Dangut, Ian K. Jennions, Steve King, and Zakwan Skaf. 2022. Application of deep reinforcement learning for extremely rare failure prediction in aircraft maintenance. Mechanical Systems and Signal Processing 171 (2022), 108873.
[68]
Elaine Ribeiro de Faria, Isabel Ribeiro Goncalves, Joao Gama, André Carlos Ponce de Leon Ferreira. 2015. Evaluation of multiclass novelty detection algorithms for data streams. IEEE Transactions on Knowledge and Data Engineering 27, 11 (2015), 2961–2973.
[69]
Sayera Dhaubhadel, Kumkum Ganguly, Ruy M. Ribeiro, Judith D. Cohn, James M. Hyman, Nicolas W. Hengartner, Beauty Kolade, Anna Singley, Tanmoy Bhattacharya, Patrick Finley, Drew Levin, Haedi Thelen, Kelly Cho, Lauren Costa, Yuk-Lam Ho, Amy C. Justice, John P. Pestian, Daniel Santel, Rafael Zamora-Resendiz, Silvia Crivelli, Suzanne Tamang, Susana Martins, Jodie Trafton, David W. Oslin, Jean C. Beckham, Nathan A. Kimbrel, Khushbu Agarwal, Allison E. Ashley-Koch, Mihaela Aslan, Edmond Begoli, Ben Brown, Patrick Calhoun, Kei Cheung, Sutanay Choudhury, Ashley M. Cliff, Leticia Cuellar-Hengartner, Haedi E. Deangelis, Michelle F. Dennis, Patrick D. Finley, Michael R. Garvin, Joel E. Gelernter, Lauren P. Hair, Colby Ham, Phillip D. Harvey, Elizabeth R. Hauser, Michael A. Hauser, Nick W. Hengartner, Dan Jacobson, Jessica Jones, Piet C. Jones, David Kainer, Alan D. Kaplan, Ira R. Katz, Rachel L. Kember, Angela C. Kirby, John C. Ko, John Lagergren, Matthew Lane, Daniel F. Levey, Jennifer Hoff Lindquist, Xianlian Liu, Ravi K Madduri, Carrie Manore, Carianne Martinez, John F. McCarthy, Mikaela McDevitt Cashman, J. Izaak Miller, Destinee Morrow, Mirko Pavicic-Venegas, Saiju Pyarajan, Xue J. Qin, Nallakkandi Rajeevan, Christine M. Ramsey, Ruy Ribeiro, Alex Rodriguez, Jonathon Romero, Yunling Shi, Murray B. Stein, Kyle Sullivan, Ning Sun, Suzanne R. Tamang, Alice Townsend, Jodie A. Trafton, Angelica Walker, Xiange Wang, Victoria Wangia-Anderson, Renji Yang, Shinjae Yoo, Hongyu Zhao, and Benjamin H. McMahon. 2024. High dimensional predictions of suicide risk in 4.2 million US Veterans using ensemble transfer learning. Scientific Reports 14, 1 (2024), 1793.
[70]
Somayajulu L. N. Dhulipala, Michael D. Shields, Benjamin W. Spencer, Chandrakanth Bolisetti, Andrew E. Slaughter, Vincent M. Labouré, and Promit Chakroborty. 2022. Active learning with multifidelity modeling for efficient rare event simulation. Journal of Computational Physics 468 (2022), 111506.
[71]
Chuong B. Do and Serafim Batzoglou. 2008. What is the expectation maximization algorithm? Nature Biotechnology 26, 8 (2008), 897–899.
[72]
Margaret H. Dunham, Yu Meng, and Jie Huang. 2004. Extensible markov model. In Proceedings of the 4th IEEE International Conference on Data Mining. IEEE, 371–374.
[73]
Yasmin Fathy, Mona Jaber, and Alexandra Brintrup. 2020. Learning with imbalanced data in smart manufacturing: A comparative analysis. IEEE Access 9 (2020), 2734–2757.
[74]
Christina Felix, Joshua D. Johnston, Kelsey Owen, Emil Shirima, Sidney R. Hinds, Kenneth D. Mandl, Alex Milinovich, and Jay L. Alberts. 2024. Explainable machine learning for predicting conversion to neurological disease: Results from 52,939 medical records. Digital Health 10 (2024), 20552076241249286.
[75]
National Centers for Environmental Information (NCEI). [n. d.]. Web Services API (version 2) Documentation | Climate Data Online (CDO) | National Climatic Data Center (NCDC) — ncdc.noaa.gov. Retrieved May 13, 2023 from https://www.ncdc.noaa.gov/cdo-web/webservices/v2
[76]
Nicolas C. Frazee, Alexander Brace, Anthony Bogetti, Arvind Ramanathan, and Lillian T. Chong. 2024. Deepdrivewe: A deep-learning-enhanced weighted ensemble rare-event sampling method. Biophysical Journal 123, 3 (2024), 280a.
[77]
Aito Fujita, Ken Sakurada, Tomoyuki Imaizumi, Riho Ito, Shuhei Hikosaka, and Ryosuke Nakamura. 2017. Damage detection from aerial images via convolutional neural networks. In Proceedings of the 2017 15th IAPR International Conference on Machine Vision Applications. IEEE, 5–8.
[78]
Jun-Ichiro Fukuchi. 1999. Subsampling and model selection in time series analysis. Biometrika 86, 3 (1999), 591–604.
[79]
João Gama, Rita P. Ribeiro, Saulo Mastelini, Narjes Davarid, and Bruno Veloso. 2024. A neuro-symbolic explainer for rare events: A case study on predictive maintenance. arXiv preprint arXiv:2404.14455 (2024).
[80]
Manolis Georgoulis. [n. d.]. SWAN-SF — dataverse.harvard.edu. Retrieved May 11, 2023 from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EBCFKM
[81]
Paul Glasserman, Philip Heidelberger, Perwez Shahabuddin, and Tim Zajic. 1999. Multilevel splitting for estimating rare event probabilities. Operations Research 47, 4 (1999), 585–600.
[82]
Christopher Gondek, Daniel Hafner, and Oliver R. Sampson. 2016. Prediction of failures in the air pressure system of scania trucks using a random forest and feature engineering. In Proceedings of the Advances in Intelligent Data Analysis XV: 15th International Symposium, IDA 2016, Stockholm, Sweden, October 13–15, 2016, Proceedings 15. Springer, 398–402.
[83]
Xianliang Gong and Yulin Pan. 2024. Multifidelity bayesian experimental design to quantify rare-event statistics. SIAM/ASA Journal on Uncertainty Quantification 12, 1 (2024), 101–127.
[84]
Google. 2017. AudioSet Dataset. https://research.google.com/audioset/dataset/index.html[Accessed 15-May-2023].
[85]
Google. 2017. MNIST Variations. Retrieved May 19, 2023 from http://www.iro.umontreal.ca/IClisa/twiki/bin/view.cgi/Public/MnistVariations
[86]
Ruben Grewal, Paolo Tonella, and Andrea Stocco. 2024. Predicting safety misbehaviours in autonomous driving systems using uncertainty quantification. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST) (2024), 70--81.
[87]
Yijie Gui, Wensheng Gan, Yongdong Wu, and S. Yu Philip. 2024. Privacy preserving rare itemset mining. Information Sciences 662 (2024), 120262. https://api.semanticscholar.org/CorpusID:267445658
[88]
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev Thakur, and John White. 2007. A meta-learning failure predictor for blue gene/l systems. In Proceedings of the 2007 International Conference on Parallel Processing. IEEE, 40–40.
[89]
Ryuhei Hamaguchi, Ken Sakurada, and Ryosuke Nakamura. 2019. Rare event detection using disentangled representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9327–9335.
[90]
Ryuhei Hamaguchi, Ken Sakurada, and Ryosuke Nakamura. 2019. Supplementary material for CVPR submission# 4406. (2019).
[91]
Eui-Hong Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, and Jerome Moore. 1998. WebACE: A web agent for document categorization and exploration. In Proceedings of the 2nd International Conference on Autonomous Agents. 408–415.
[92]
David C. Harrison, Winston K. G. Seah, and Ramesh Rayudu. 2016. Rare event detection and propagation in wireless sensor networks. ACM Computing Surveys 48, 4 (2016), 1–22.
[93]
Peter Hart. 1968. The condensed nearest neighbor rule (corresp.). IEEE Transactions on Information Theory 14, 3 (1968), 515–516.
[94]
Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, 1322–1328.
[95]
Jia He and Maggie X. Cheng. 2021. Weighting methods for rare event identification from imbalanced datasets. Frontiers in Big Data 4 (2021), 715320.
[96]
Jeff Hebert. 2016. Predicting rare failure events using classification trees on large scale manufacturing data with complex interactions. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2024–2028.
[97]
Ruei-Jie Hsieh, Jerry Chou, and Chih-Hsiang Ho. 2019. Unsupervised online anomaly detection on multivariate sensing time series data for smart manufacturing. In Proceedings of the 2019 IEEE 12th Conference on Service-oriented Computing and Applications. IEEE, 90–97.
[98]
Kaizhu Huang, Haiqin Yang, Irwin King, and Michael R. Lyu. 2004. Learning classifiers from imbalanced data based on biased minimax probability machine. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., Vol. 2. IEEE, II–II.
[99]
NSE India. 2023. NSE - National Stock Exchange of India ltd. Retrieved May 14, 2023 from https://www.nseindia.com/
[100]
V. Iyer, S. Shetty, and S. S. Iyengar. 2015. Statistical methods in AI: Rare event learning using associative rules and higher-order statistics. (2015).
[101]
Valérian Jacques-Dumas, René M. van Westen, and Henk A. Dijkstra. 2024. Estimation of AMOC transition probabilities using a machine learning based rare-event algorithm. (2024). https://journals.ametsoc.org/view/journals/aies/aop/AIES-D-24-0002.1/AIES-D-24-0002.1.xml
[102]
Elaheh Jafarigol and Theodore B. Trafalis. 2024. A distributed approach to meteorological predictions: Addressing data imbalance in precipitation prediction models through federated learning and GANs. Computational Management Science 21, 1 (2024), 22.
[103]
Nathalie Japkowicz, Catherine Myers, and Mark Gluck. 1995. A novelty detection approach to classification. In Proceedings of the 14th international joint conference on Artificial intelligence, Vol. 1. Citeseer, 518–523.
[104]
Taeho Jo and Nathalie Japkowicz. 2004. Class imbalances versus small disjuncts. ACM Sigkdd Explorations Newsletter 6, 1 (2004), 40–49.
[105]
Mahesh V. Joshi, Ramesh C. Agarwal, and Vipin Kumar. 2001. Mining needle in a haystack: Classifying rare classes via two-phase rule induction. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 91–102.
[106]
CREST JST. 2015. Change detection from a street image pair using cnn features and superpixel segmentation. (2015).
[107]
Thongchai Kaewkiriya and Kuntpong Woraratpanya. 2022. 3DVAE-LSTM for extremely rare anomaly signal generation. In Proceedings of the 2022 14th International Conference on Information Technology and Electrical Engineering. IEEE, 229–234.
[108]
Soumyashree Kar, Jason R. McKenna, Vishwamithra Sunkara, Robert Coniglione, Steve Stanic, and Landry Bernard. 2024. XWaveNet: Enabling uncertainty quantification in short-term ocean wave height forecasts and extreme event prediction. Applied Ocean Research 148 (2024), 103994.
[109]
Lukas Kaupp, Heiko Webert, Kawa Nazemi, Bernhard Humm, and Stephan Simons. 2021. CONTEXT: An industry 4.0 dataset of contextual faults in a smart factory. Procedia Computer Science 180 (2021), 492–501.
[110]
Gary King and Langche Zeng. 2001. Explaining rare events in international relations. International Organization 55, 3 (2001), 693–715.
[111]
Sotiris Kotsiantis, Dimitris Kanellopoulos, and Panayiotis Pintelas. 2006. Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering 30, 1 (2006), 25–36.
[112]
Bartosz Krawczyk, Michał Woźniak, and Gerald Schaefer. 2014. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing 14 (2014), 554–562.
[113]
Miroslav Kubat, Robert C. Holte, and Stan Matwin. 1998. Machine learning for the detection of oil spills in satellite radar images. Machine Learning 30 (1998), 195–215.
[114]
Miroslav Kubat and Stan Matwin. 1997. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the International Conference on Machine Learning, Vol. 97. Nashville, USA, 179.
[115]
Mandar Kulkarni and Aria Abubakar. 2020. Soft attention convolutional neural networks for rare event detection in sequences. AI for Earth Sciences Workshop at NeurIPS (2020).
[116]
V. Lakshmanan, G. Stumpf, and A. Witt. 2005. A neural network for detecting and diagnosing tornadic circulations using the mesocyclone detection and near storm environment algorithms. In Proceedings of the 21st International Conference on Information Processing Systems, San Diego, CA, Amer. Meteor. Soc.
[117]
Zhiling Lan, Jiexing Gu, Ziming Zheng, Rajeev Thakur, and Susan Coghlan. 2010. A study of dynamic meta-learning for failure prediction in large-scale systems. Journal of Parallel and Distributed Computing 70, 6 (2010), 630–643.
[118]
A. Lazarevic, Jaideep Srivastava, and Vipin Kumar. 2004. Data mining for analysis of rare events: A case study in security, financial and medical applications. In Proceedings of the Pacific-asia Conference on Knowledge Discovery and Data Mining.
[119]
Wonjae Lee and Kangwon Seo. 2021. Early failure detection of paper manufacturing machinery using nearest neighbor-based feature extraction. Engineering Reports 3, 2 (2021), e12291.
[120]
Jinyan Li, Simon Fong, Shimin Hu, Victor W. Chu, Raymond K. Wong, Sabah Mohammed, and Nilanjan Dey. 2017. Rare event prediction using similarity majority under-sampling technique. In Proceedings of the Soft Computing in Data Science: 3rd International Conference, SCDS 2017, Yogyakarta, Indonesia, November 27–28, 2017, Proceedings 3. Springer, 23–39.
[121]
Jinyan Li, Lian-sheng Liu, Simon Fong, Raymond K. Wong, Sabah Mohammed, Jinan Fiaidhi, Yunsick Sung, and Kelvin KL Wong. 2017. Adaptive swarm balancing algorithms for rare-event prediction in imbalanced healthcare data. PloS One 12, 7 (2017), e0180830.
[122]
Chunyan Ling and Zhenzhou Lu. 2021. Support vector machine-based importance sampling for rare event estimation. Structural and Multidisciplinary Optimization 63, 4 (2021), 1609–1631.
[123]
Haiying Liu, Ruizhe Ma, Daiyi Li, Li Yan, and Zongmin Ma. 2021. Machinery fault diagnosis based on deep learning for time series analysis and knowledge graphs. Journal of Signal Processing Systems 93, 12 (2021), 1433–1455.
[124]
Henry X. Liu and Shuo Feng. 2022. “Curse of rarity” for autonomous vehicles. Nature Communications 15 (2022).
[125]
Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. 2021. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3, 3 (2021), 218–229.
[126]
Victor S. L’vov, Anna Pomyalov, and Itamar Procaccia. 2001. Outliers, extreme events, and multiscaling. Physical Review E 63, 5 (2001), 056118.
[127]
Maher Maalouf, Dirar Homouz, and Theodore B. Trafalis. 2018. Logistic regression in large rare events and imbalanced data: A performance comparison of prior correction and weighting methods. Computational Intelligence 34, 1 (2018), 161–174.
[128]
Maher Maalouf and Theodore B. Trafalis. 2011. Rare events and imbalanced datasets: An overview. International Journal of Data Mining, Modelling and Management 3, 4 (2011), 375–388.
[129]
Maher Maalouf and Theodore B. Trafalis. 2011. Robust weighted kernel logistic regression in imbalanced and rare events data. Computational Statistics & Data Analysis 55, 1 (2011), 168–183.
[130]
Inderjeet Mani and I. Zhang. 2003. kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of Workshop on Learning from Imbalanced Datasets, Vol. 126. ICML, 1–7.
[131]
Christina Marel, Mohammad H. Afzali, Matthew Sunderland, Maree Teesson, and Katherine L. Mills. 2024. Predicting risk of heroin overdose, remission, use, and mortality using ensemble learning methods in a cohort of people with heroin dependence. International Journal of Mental Health and Addiction (2024), 1–19.
[132]
Matheus A. Marins, Bettina D. Barros, Ismael H. Santos, Daniel C. Barrionuevo, Ricardo E. V. Vargas, Thiago de M. Prego, Amaro A. de Lima, Marcello L. R. de Campos, Eduardo A. B. da Silva, and Sergio L. Netto. 2021. Fault detection and classification in oil wells and production/service lines using random forest. Journal of Petroleum Science and Engineering 197 (2021), 107879.
[133]
Rafael H. Martello, Lucas Ranzan, Marcelo Farenzena, and Jorge O. Trierweiler. 2021. Improving autoencoder training with novel goal functions based on multivariable control concepts. IFAC-PapersOnLine 54, 3 (2021), 73–78.
[134]
Yu Meng, Margaret H. Dunham, F. Marco Marchetti, and Jie Huang. 2006. Rare event detection in a spatiotemporal environment. In Proceedings of the 2006 IEEE International Conference on Granular Computing. IEEE, 629–634.
[135]
Katerina Mitropoulou, Panagiotis Kokkinos, Polyzois Soumplis, and Emmanouel Varvarigos. 2024. Anomaly detection in cloud computing using knowledge graph embedding and machine learning mechanisms. Journal of Grid Computing 22, 1 (2024), 6.
[136]
Aman Samson Mogos, Xiaodong Liang, and Chi Yung Chung. 2023. Distribution transformer failure prediction for predictive maintenance using hybrid one-class deep SVDD classification and lightning strike failures data. IEEE Transactions on Power Delivery 38, 5 (2023), 3250--3261. https://api.semanticscholar.org/CorpusID:258232364
[137]
Maria Carolina Monard and GEAPA Batista. 2002. Learning with skewed class distributions. Advances in Logic, Artificial Intelligence and Robotics 85 (2002), 173–180.
[138]
Rodrigo Moura, Armando Mendes, José Cascalho, Sandra Mendes, Rodolfo Melo, and Emanuel Barcelos. 2024. Predicting flood events with streaming data: A preliminary approach with GRU and ARIMA. In Proceedings of the International Conference on Optimization, Learning Algorithms and Applications. Springer, 319–332.
[139]
Garimella Rama Murthy and Vasanth Iyer. 2007. Distributed wireless sensor network architecture: Fuzzy logic based sensor fusion. In New Dimensions in Fuzzy Logic and Related Technologies. Proceedings of the 5th EUSFLAT Conference, Ostrava, Czech Republic, September 11–14, 2007, Volume 2: Regular Sessions. Citeseer, 71–78.
[140]
Krystyna Napierala and Jerzy Stefanowski. 2016. Types of minority class examples and their influence on learning classifiers from imbalanced data. Journal of Intelligent Information Systems 46 (2016), 563–597.
[141]
National Climatic Data Center (NCDC). 2023. Storm Data Publication | IPS | National Climatic Data Center (NCDC) — ncdc.noaa.gov. Retrieved May 15, 2023 from https://www.ncdc.noaa.gov/IPS/sd/sd.html
[142]
NCEI. 2023. Storm Events Database | National Centers for Environmental Information — ncdc.noaa.gov. Retrieved May 15, 2023 from https://www.ncdc.noaa.gov/stormevents/
[143]
Patrick Nectoux, Rafael Gouriveau, Kamal Medjaher, Emmanuel Ramasso, Brigitte Chebel-Morello, Noureddine Zerhouni, and Christophe Varnier. 2012. PRONOSTIA: An experimental platform for bearings accelerated degradation tests. In Proceedings of the IEEE International Conference on Prognostics and Health Management, PHM’12. IEEE Catalog Number: CPF12PHM-CDR, 1–8.
[144]
Yair Neuman, Yochai Cohen, and Eden Erez. 2021. Extreme rare events identification through Jaynes inferential approach. Big Data 9, 6 (2021), 417–426.
[145]
Wahyu Nugraha, Muhammad Sony Maulana, and Agung Sasongko. 2020. Clustering based undersampling for handling class imbalance in C4.5 classification algorithm. In Proceedings of the Journal of Physics: Conference Series, Vol. 1641. IOP Publishing, 012014.
[146]
Hülya Olmuş, Ezgi Nazman, and Semra Erbaş. 2022. Comparison of penalized logistic regression models for rare event case. Communications in Statistics-Simulation and Computation 51, 4 (2022), 1578–1590.
[147]
Z. A. Omar, S. N. Chin, SRM Hashim, and N. Hamzah. 2022. Exploring clusters of rare events using unsupervised random forests. In Proceedings of the Journal of Physics: Conference Series, Vol. 2314. IOP Publishing, 012019.
[148]
World Health Organization. 2022. Retrieved May 13, 2023 from https://www.who.int/data/gho/data/themes/hiv-aids
[149]
Cara O’Brien, Benjamin A. Goldstein, Yueqi Shen, Matthew Phelan, Curtis Lambert, Armando D. Bedoya, and Rebecca C. Steorts. 2020. Development, implementation, and evaluation of an in-hospital optimized early warning score for patient deterioration. MDM Policy & Practice 5, 1 (2020).
[150]
Mohammad Parsa, Emmanuel John M. Carranza, and Bahman Ahmadi. 2021. Deep GMDH neural networks for predictive mapping of mineral prospectivity in terrains hosting few but large mineral deposits. Natural Resources Research 31 (2021), 37–50. https://api.semanticscholar.org/CorpusID:244742494
[151]
Michael Peng, Elisheva R. Stern, and Hanwen Hu. [n. d.]. Forecasting China bond defaults with severe imbalanced data: A meta-learning approach. Available at SSRN 4684217.
[152]
Ethan Pickering, Stephen Guth, George Em Karniadakis, and Themistoklis P. Sapsis. 2022. Discovering and forecasting extreme events via active learning in neural operators. Nature Computational Science 2, 12 (2022), 823–833.
[153]
Marco A. F. Pimentel, David A. Clifton, Lei Clifton, and Lionel Tarassenko. 2014. A review of novelty detection. Signal Processing 99 (2014), 215–249.
[154]
Rainer Puhr, Georg Heinze, Mariana Nold, Lara Lusa, and Angelika Geroldinger. 2017. Firth’s logistic regression with rare events: Accurate effect estimates and predictions? Statistics in Medicine 36, 14 (2017), 2302–2317.
[155]
Hai Qiu, Jay Lee, Jing Lin, and Gang Yu. 2006. Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of Sound and Vibration 289, 4–5 (2006), 1066–1090.
[156]
Noor Fadhilah Ahmad Radi, Roslinazairimah Zakaria, and Muhammad Az-zuhri Azman. 2015. Estimation of missing rainfall data using spatial interpolation and imputation methods. In Proceedings of the AIP Conference Proceedings, Vol. 1643. American Institute of Physics, 42–48.
[157]
Siam Rafsunjani, Rifat Sultana Safa, Abdullah Al Imran, Md Shamsur Rahim, and Dip Nandi. 2019. An empirical comparison of missing value imputation techniques on APS failure prediction. International Journal of Information Technology and Computer Science 2, 2 (2019), 21–29.
[158]
R. Ramya. 2024. Analysis and applications finding of wireless sensors and IoT devices with artificial intelligence/machine learning. In Proceedings of the AIoT and Smart Sensing Technologies for Smart Devices. IGI Global, 77–102.
[159]
Chitta Ranjan. 2020. Understanding Deep Learning: Application in Rare Event Prediction. Connaissance Publishing Atlanta, GA, USA.
[160]
Chitta Ranjan, Mahendranath Reddy, Markku Mustonen, Kamran Paynabar, and Karim Pourak. 2018. Dataset: Rare event classification in multivariate time series. arXiv preprint arXiv:1809.10717 (2018).
[161]
Chitta Ranjan, Mahendranath Reddy, Markku Mustonen, Kamran Paynabar, and Karim Pourak. 2019. Data challenge: Data augmentation for rare events in multivariate time series.
[162]
Manjusha Ravindranath, K. Selçuk Candan, and Maria Luisa Sapino. 2020. M2NN: Rare event inference through multi-variate multi-scale attention. In Proceedings of the 2020 IEEE International Conference on Smart Data Services. IEEE, 53–62.
[163]
Manjusha Ravindranath, K. Selçuk Candan, Maria Luisa Sapino, and Brian Appavu. 2024. MMA: Metadata supported multi-variate attention for onset detection and prediction. Data Mining and Knowledge Discovery 38, 4 (2024), 1545–1588. https://api.semanticscholar.org/CorpusID:267953544
[164]
Shebuti Rayana. 2016. ODDS Library. Retrieved May 14, 2023 from https://odds.cs.stonybrook.edu
[165]
Akthem Rehab, Islam Ali, Walid Gomaa, and M. Nashat Fors. 2021. Bearings fault detection using hidden Markov models and principal component analysis enhanced features. PHM Society European Conference 6, 1 (2021), 11.
[166]
Chris Reimann. 2024. Predicting financial crises: An evaluation of machine learning algorithms and model explainability for early warning systems. Review of Evolutionary Political Economy 5 (2024), 1–33.
[167]
L. A. Gloeckler Ries, J. L. Young, G. E. Keel, M. P. Eisner, Y. D. Lin, and M. J. Horner. 2007. SEER survival monograph: Cancer survival among adults: US SEER program, 1988–2001, patient and tumor characteristics. National Cancer Institute, SEER Program, NIH Pub 7, 6215 (2007), 193–202.
[168]
Shakila Khan Rumi, Ke Deng, and Flora Dilys Salim. 2018. Crime event prediction with dynamic features. EPJ Data Science 7, 1 (2018), 43.
[169]
Jaleal Sanjak, Jessica Binder, Arjun Singh Yadaw, Qian Zhu, and Ewy A. Mathé. 2024. Clustering rare diseases within an ontology-enriched knowledge graph. Journal of the American Medical Informatics Association 31, 1 (2024), 154–164.
[170]
Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. 2007. Mining data with rare events: A case study. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence, Vol. 2. IEEE, 132–139.
[171]
Burr Settles. 2009. Active learning literature survey. (2009).
[172]
Momina Shaheen, Muhammad S. Farooq, and Tariq Umer. 2024. AI-empowered mobile edge computing: Inducing balanced federated learning strategy over edge for balanced data and optimized computation cost. Journal of Cloud Computing 13, 1 (2024), 1–21.
[173]
Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, and Chao Wang. 2020. Few-shot acoustic event detection via meta learning. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 76–80.
[174]
Chathurangi Shyalika, Kaushik Roy, Renjith Prasad, Fadi El Kalach, Yuxin Zi, Priya Mittal, Vignesh Narayanan, Ramy Harik, and Amit Sheth. 2024. RI2AP: Robust and interpretable 2D anomaly prediction in assembly pipelines. Sensors 24, 10 (2024), 3244.
[175]
Chathurangi Shyalika, Ruwan Wickramarachchi, Fadi El Kalach, Ramy Harik, and Amit Sheth. 2024. Evaluating the role of data enrichment approaches towards rare event analysis in manufacturing. Sensors 24, 15 (2024), 5009.
[176]
Gregory E. Simon, Eric Johnson, Jean M. Lawrence, Rebecca C. Rossom, Brian Ahmedani, Frances L. Lynch, Arne Beck, Beth Waitzfelder, Rebecca Ziebell, Robert B. Penfold, and Susan M. Shortreed. 2018. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. American Journal of Psychiatry 175, 10 (2018), 951–960.
[177]
Jerzy Stefanowski. 2013. Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data. Emerging Paradigms in Machine Learning 13 (2013), 277–306.
[178]
Daniel J. Stekhoven and Peter Bühlmann. 2012. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 1 (2012), 112–118.
[179]
John Strahan, Spencer C. Guo, Chatipat Lorpaiboon, Aaron R. Dinner, and Jonathan Weare. 2023. Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction. The Journal of Chemical Physics 159, 1 (2023). https://api.semanticscholar.org/CorpusID:257663313
[180]
Yuchun Tang, Bo Jin, Yi Sun, and Yan-Qing Zhang. 2004. Granular support vector machines for medical binary classification problems. In Proceedings of the 2004 Symposium on Computational Intelligence in Bioinformatics and Computational Biology. IEEE, 73–78.
[181]
Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla, and Sven Krasser. 2008. SVMs modeling for highly imbalanced classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39, 1 (2008), 281–288.
[182]
Tippaya Thinsungnoen, Nuntawut Kaoungku, Pongsakorn Durongdumronchai, Kittisak Kerdprasop, and Nittaya Kerdprasop. 2015. The clustering validity with silhouette and sum of squared errors. Learning 3, 7 (2015), 44–51.
[183]
Ivan Tomek. 1976. An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics (SMC-6) 6 (1976), 448–452.
[184]
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B. Altman. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17, 6 (2001), 520–525.
[185]
Miet Van Den Eeckhaut, P. Reichenbach, F. Guzzetti, M. Rossi, and Jean Poesen. 2009. Combined landslide inventory and susceptibility assessment based on different mapping units: An example from the Flemish Ardennes, Belgium. Natural Hazards and Earth System Sciences 9, 2 (2009), 507–521.
[186]
Miet Van Den Eeckhaut, Tom Vanwalleghem, Jean Poesen, Gerard Govers, Gert Verstraeten, and Liesbeth Vandekerckhove. 2006. Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium). Geomorphology 76, 3–4 (2006), 392–410.
[187]
Ricardo Emanuel Vaz Vargas, Celso José Munaro, Patrick Marques Ciarelli, André Gonçalves Medeiros, Bruno Guberfain do Amaral, Daniel Centurion Barrionuevo, Jean Carlos Dias de Araújo, Jorge Lins Ribeiro, and Lucas Pierezan Magalhães. 2019. A realistic and public dataset with rare undesirable real events in oil wells. Journal of Petroleum Science and Engineering 181 (2019), 106223.
[188]
Panagiota Varsou. 2024. The DeepProbCEP system for Neuro-Symbolic Complex Event Recognition. Master’s Thesis. University of Piraeus (Πανεπιστήμιο Πειραιώς).
[189]
Ricardo Vilalta and Sheng Ma. 2002. Predicting rare events in temporal domains. In Proceedings of the 2002 IEEE International Conference on Data Mining. IEEE, 474–481.
[190]
Biao Wang, Yaguo Lei, Naipeng Li, and Ningbo Li. 2018. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Transactions on Reliability 69, 1 (2018), 401–412.
[191]
Senzhang Wang, Zhoujun Li, Wenhan Chao, and Qinghua Cao. 2012. Applying adaptive over-sampling technique based on data density and cost-sensitive SVM to imbalanced learning. In Proceedings of the 2012 International Joint Conference on Neural Networks. IEEE, 1–8.
[192]
Shuo Wang, Leandro L. Minku, and Xin Yao. 2018. A systematic study of online class imbalance learning with concept drift. IEEE Transactions on Neural Networks and Learning Systems 29, 10 (2018), 4802–4821.
[193]
Runmin Wei, Jingye Wang, Mingming Su, Erik Jia, Shaoqiu Chen, Tianlu Chen, and Yan Ni. 2018. Missing value imputation approach for mass spectrometry-based metabolomics data. Scientific Reports 8, 1 (2018), 663.
[194]
Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM Sigkdd Explorations Newsletter 6, 1 (2004), 7–19.
[195]
Gary M. Weiss and Haym Hirsh. 1998. Learning to predict rare events in categorical time-series data. In Proceedings of the International Conference on Machine Learning. 83–90.
[196]
Junfeng Wen, Bo Dai, Lihong Li, and Dale Schuurmans. 2020. Batch stationary distribution estimation. In Proceedings of the 37th International Conference on Machine Learning (ICML'20). JMLR.org, 11 pages.
[197]
Zoie Shui Yee Wong. 2016. Statistical classification of drug incidents due to look-alike sound-alike mix-ups. Health Informatics Journal 22, 2 (2016), 276–292.
[198]
Junjie Wu, Hui Xiong, Peng Wu, and Jian Chen. 2007. Local decomposition for rare class analysis. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 814–823.
[199]
Jianhua Xian and Ziqi Wang. 2024. A physics and data co-driven surrogate modeling method for high-dimensional rare event simulation. Journal of Computational Physics 510 (2024), 113069. https://api.semanticscholar.org/CorpusID:263334279
[200]
Zidi Xiu, Chenyang Tao, Michael Gao, Connor Davis, Benjamin A. Goldstein, and Ricardo Henao. 2021. Variational disentanglement for rare event modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 10469–10477.
[201]
Dongting Xu, Zhisheng Zhang, and Jinfei Shi. 2022. A new multi-sensor stream data augmentation method for imbalanced learning in complex manufacturing process. Sensors 22, 11 (2022), 4042.
[202]
Dongting Xu, Zhisheng Zhang, and Jinfei Shi. 2022. Training data selection by categorical variables for better rare event prediction in multiple products production line. Electronics 11, 7 (2022), 1056.
[203]
Hongling Xu, Ruizhe Ma, Li Yan, and Zongmin Ma. 2021. Two-stage prediction of machinery fault trend based on deep learning for time series analysis. Digital Signal Processing 117 (2021), 103150.
[204]
Hsin-Chih Yang, Ming-Chuan Yang, Guo-Wei Wong, and Meng Chang Chen. 2023. Extreme event discovery with self-attention for PM2.5 anomaly prediction. IEEE Intelligent Systems 38, 2 (2023), 36–45.
[205]
Na Yang, Zhenkai Zhang, Jianhua Yang, and Zenglin Hong. 2022. Applications of data augmentation in mineral prospectivity prediction based on convolutional neural networks. Computers & Geosciences 161 (2022), 105075.
[206]
Peng-Hui Yang, Yao Yu, Feng Gu, Meng-Jie Qu, and Jia-Ming Zhu. 2022. Prediction and risk assessment of extreme weather events based on Gumbel copula function. Journal of Function Spaces 2022 (2022), 1–13.
[207]
Quanming Yao and James T. Kwok. 2018. Accelerated and inexact soft-impute for large-scale matrix and tensor completion. IEEE Transactions on Knowledge and Data Engineering 31, 9 (2018), 1665–1679.
[208]
Sunhee Yoon and Wang-Hee Lee. 2023. Application of true skill statistics as a practical method for quantitatively assessing CLIMEX performance. Ecological Indicators 146 (2023), 109830.
[209]
Yuxuan Zhang, Xiaoyou Wang, and Yong Xia. 2024. Few-shot classification for sensor anomalies with limited samples. Journal of Infrastructure Intelligence and Resilience (2024), 100087.
[210]
Liang Zhao. 2021. Event prediction in the big data era: A systematic survey. ACM Computing Surveys 54, 5 (2021), 1–37.
[211]
Yang Zhao, Zoie Shui-Yee Wong, and Kwok Leung Tsui. 2018. A framework of rebalancing imbalanced healthcare data for rare events’ classification: A case of look-alike sound-alike mix-up incident detection. Journal of Healthcare Engineering 2018 (2018).

    Published In

    ACM Computing Surveys  Volume 57, Issue 3
    March 2025
    984 pages
    EISSN:1557-7341
    DOI:10.1145/3697147
    • Editors:
    • David Atienza,
    • Michela Milano

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 November 2024
    Online AM: 14 October 2024
    Accepted: 01 October 2024
    Revised: 16 September 2024
    Received: 12 September 2023
    Published in CSUR Volume 57, Issue 3


    Author Tags

    1. Event-prediction
    2. rare-events
    3. time-series
    4. anomaly prediction
    5. forecasting

    Qualifiers

    • Survey

    Funding Sources

    • National Science Foundation
