DOI: 10.1145/2983323.2983784
research-article

Model-Based Oversampling for Imbalanced Sequence Classification

Published: 24 October 2016

Abstract

Sequence classification is a critical task in the data mining community. It becomes more challenging when the class distribution is imbalanced, as it is in many real-world applications. Oversampling algorithms try to rebalance the skewed class distribution by generating synthetic data for the minority classes, but most existing oversampling approaches cannot account for the temporal structure of sequences or handle long and multivariate sequences. To address these problems, this paper proposes a novel oversampling algorithm based on generative models of sequences. In particular, a recurrent neural network is employed to learn the generative mechanism of each sequence, and the learned model serves as the representation of that sequence. These generative models are then used to form a kernel that captures the similarity between different sequences. Finally, oversampling is performed in the kernel feature space to generate synthetic data. The proposed approach can handle highly imbalanced sequential data and is robust to noise. Its competitiveness is demonstrated by experiments on both synthetic and benchmark data, including univariate and multivariate sequences.
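The three steps named in the abstract (fit a generative model per sequence, build a kernel on the fitted models, oversample the minority class) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a small echo state network with a fixed random reservoir as the recurrent model, a ridge-regression readout trained for one-step-ahead prediction as the per-sequence representation, and an RBF kernel on the readout weights. For simplicity it interpolates SMOTE-style directly between readout weight vectors, which stands in for the kernel-feature-space oversampling the abstract describes. All function names and parameter values are hypothetical.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=20, spectral_radius=0.9, seed=0):
    """Fixed random reservoir shared by all sequences (assumed setup)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def fit_readout(seq, w_in, w, ridge=1e-2):
    """Represent one sequence of shape (T, d) by its fitted readout weights."""
    seq = np.asarray(seq, dtype=float)
    state = np.zeros(w.shape[0])
    states = []
    for x in seq[:-1]:
        state = np.tanh(w_in @ x + w @ state)
        states.append(state)
    S = np.array(states)   # reservoir states, shape (T-1, n_units)
    Y = seq[1:]            # one-step-ahead targets, shape (T-1, d)
    A = S.T @ S + ridge * np.eye(S.shape[1])
    w_out = np.linalg.solve(A, S.T @ Y)   # ridge regression readout
    return w_out.ravel()


def model_kernel(models, gamma=1.0):
    """RBF kernel between sequences, computed on their readout weights."""
    diff = models[:, None, :] - models[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))


def oversample_models(minority_models, n_new, seed=0):
    """SMOTE-style interpolation between each sampled model and its nearest
    minority neighbour (a simplification; needs at least two models)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority_models))
        dist = np.linalg.norm(minority_models - minority_models[i], axis=1)
        dist[i] = np.inf   # exclude the point itself
        j = int(np.argmin(dist))
        lam = rng.uniform()
        synthetic.append(
            minority_models[i] + lam * (minority_models[j] - minority_models[i])
        )
    return np.array(synthetic)


# Toy usage: four univariate minority sequences with slightly different frequencies.
w_in, w = make_reservoir(n_inputs=1)
t = np.linspace(0.0, 6.0, 50)
minority = [np.sin((1.0 + 0.1 * k) * t)[:, None] for k in range(4)]
models = np.array([fit_readout(s, w_in, w) for s in minority])
K = model_kernel(models)
synthetic_models = oversample_models(models, n_new=3)
```

Because each sequence is reduced to a fixed-length weight vector, sequences of different lengths can be compared and oversampled with the same code, provided they share the same reservoir and input dimension.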



Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. imbalanced learning
  2. model space
  3. oversampling
  4. sequence classification

Qualifiers

  • Research-article

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Cited By

  • (2024) Few-shot generative model for skeleton-based human action synthesis using cross-domain adversarial learning. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3934-3943. DOI: 10.1109/WACV57701.2024.00390. Online publication date: 3-Jan-2024
  • (2024) Dynamic Ensemble Selection for Imbalanced Data Streams With Concept Drift. IEEE Transactions on Neural Networks and Learning Systems 35:1, pp. 1278-1291. DOI: 10.1109/TNNLS.2022.3183120. Online publication date: Jan-2024
  • (2024) From Data to D3 Model: Adaptive Subsurface Anomaly Detection in GPR Data. IEEE Transactions on Geoscience and Remote Sensing 62, pp. 1-12. DOI: 10.1109/TGRS.2024.3389009. Online publication date: 2024
  • (2024) Characteristic Attribute Organization System (CAOS): Identifying Classification Rules Based on Phylogenetically Organized Sequences. DNA Barcoding, pp. 335-345. DOI: 10.1007/978-1-0716-3581-0_21. Online publication date: 30-Apr-2024
  • (2023) A Background Knowledge Revising and Incorporating Dialogue Model. IEEE Transactions on Neural Networks and Learning Systems 34:8, pp. 3874-3884. DOI: 10.1109/TNNLS.2021.3123128. Online publication date: Aug-2023
  • (2023) Minimum Recall-Based Loss Function for Imbalanced Time Series Classification. IEEE Transactions on Knowledge and Data Engineering 35:10, pp. 10024-10034. DOI: 10.1109/TKDE.2023.3268994. Online publication date: 1-Oct-2023
  • (2023) Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification. 2023 IEEE International Conference on Big Data (BigData), pp. 1101-1108. DOI: 10.1109/BigData59044.2023.10386960. Online publication date: 15-Dec-2023
  • (2023) Religious Affiliation in the Twenty-First Century: A Machine Learning Perspective on the World Value Survey. Society 60:5, pp. 733-749. DOI: 10.1007/s12115-023-00887-0. Online publication date: 25-Aug-2023
  • (2022) An Encoder-Decoder Network for Automatic Clinical Target Volume Target Segmentation of Cervical Cancer in CT Images. International Journal of Crowd Science 6:3, pp. 111-116. DOI: 10.26599/IJCS.2022.9100014. Online publication date: Aug-2022
  • (2022) Evolutionary Dual-Ensemble Class Imbalance Learning for Human Activity Recognition. IEEE Transactions on Emerging Topics in Computational Intelligence 6:4, pp. 728-739. DOI: 10.1109/TETCI.2021.3079966. Online publication date: Aug-2022
