research-article

Model-Based Oversampling for Imbalanced Sequence Classification

Authors:

Huanhuan Chen

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Pages 1009 - 1018

https://doi.org/10.1145/2983323.2983784

Published: 24 October 2016

Abstract

Sequence classification is an important task in data mining. It becomes more challenging when the class distribution is imbalanced, which occurs in many real-world applications. Oversampling algorithms try to re-balance the skewed class distribution by generating synthetic data for minority classes, but most existing oversampling approaches cannot account for the temporal structure of sequences or handle multivariate and long sequences. To address these problems, this paper proposes a novel oversampling algorithm based on generative models of sequences. In particular, a recurrent neural network is employed to learn the generative mechanism of each sequence, and the learned model serves as the representation of that sequence. These generative models are then used to form a kernel that captures the similarity between sequences. Finally, oversampling is performed in the kernel feature space to generate synthetic data. The proposed approach can handle highly imbalanced sequential data and is robust to noise. Its competitiveness is demonstrated by experiments on both synthetic and benchmark data, including univariate and multivariate sequences.
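
The three steps named in the abstract (fit a generative model per sequence, build a kernel over the fitted models, oversample in the kernel feature space) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything concrete here is an assumption made for illustration and should not be read as the authors' algorithm: the recurrent network is taken to be an echo state network with a fixed random reservoir, the model of a sequence is its ridge-regression readout for one-step-ahead prediction, the kernel is an RBF over readout weights, and synthetic points are SMOTE-style convex combinations of two minority points. The names `esn_readout`, `model_kernel`, and `oversample_in_feature_space` are hypothetical.

```python
import numpy as np

def esn_readout(seq, W_in, W, ridge=1e-2):
    """Fit a one-step-ahead linear readout on a fixed reservoir (assumed ESN)
    and return the flattened readout weights as the sequence's 'model'."""
    seq = np.asarray(seq, dtype=float)            # shape (T, d)
    n_res = W.shape[0]
    states = np.zeros((len(seq) - 1, n_res))
    x = np.zeros(n_res)
    for t in range(len(seq) - 1):
        x = np.tanh(W_in @ seq[t] + W @ x)        # reservoir state update
        states[t] = x
    targets = seq[1:]                             # next-step values, (T-1, d)
    A = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets).ravel()

def model_kernel(models, gamma=1.0):
    """RBF kernel between fitted models (an assumed choice of kernel)."""
    diff = models[:, None, :] - models[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))

def oversample_in_feature_space(K, minority_idx, n_new, rng):
    """Add n_new synthetic minority points as convex combinations of two
    minority points in the kernel feature space. By linearity of the inner
    product, the augmented kernel matrix is C @ K @ C.T."""
    n = K.shape[0]
    C = np.zeros((n_new, n))
    for r in range(n_new):
        i, j = rng.choice(minority_idx, size=2, replace=False)
        lam = rng.uniform()
        C[r, i], C[r, j] = 1.0 - lam, lam
    C_full = np.vstack([np.eye(n), C])            # originals first, then synthetic
    return C_full @ K @ C_full.T

# Toy data (illustrative only): 10 majority and 3 minority univariate sequences.
rng = np.random.default_rng(0)
n_res, d = 20, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, d))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius to 0.9

t = np.linspace(0, 8 * np.pi, 200)
def noisy_sine(freq):
    phase = rng.uniform(0, 2 * np.pi)
    return (np.sin(freq * t + phase) + 0.05 * rng.standard_normal(t.size))[:, None]

sequences = [noisy_sine(1.0) for _ in range(10)] + [noisy_sine(2.0) for _ in range(3)]
models = np.array([esn_readout(s, W_in, W) for s in sequences])
K = model_kernel(models, gamma=1.0)
K_aug = oversample_in_feature_space(K, np.arange(10, 13), n_new=7, rng=rng)
```

In this sketch the synthetic points never need explicit coordinates: each is stored only as a pair of mixing weights, and the augmented matrix `K_aug` (here 10 majority versus 10 minority rows) could be passed to any classifier that accepts a precomputed kernel.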

Index Terms

Model-Based Oversampling for Imbalanced Sequence Classification
1. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution
      2. Extraction, transformation and loading

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

October 2016

2566 pages

ISBN:9781450340731

DOI:10.1145/2983323

General Chairs:
Snehasis Mukhopadhyay (Indiana University Purdue University Indianapolis, USA)
ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)

Program Chairs:
Elisa Bertino (Purdue University)
Fabio Crestani (University of Lugano)
Javed Mostafa (University of North Carolina)
Jie Tang (Tsinghua University)
Luo Si (Alibaba Group Inc & Purdue University)
Xiaofang Zhou (University of Queensland)
Yi Chang (Yahoo Research)
Yunyao Li (IBM Research - Almaden)
Parikshit Sondhi (WalmartLabs)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Fundamental Research Funds for the Central Universities
National Science Foundation of China
Ministry of Science and Technology of the People's Republic of China

Conference

CIKM'16

CIKM'16: ACM Conference on Information and Knowledge Management

October 24 - 28, 2016

Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 24
Total Downloads: 601
Downloads (Last 12 months): 37
Downloads (Last 6 weeks): 3

Reflects downloads up to 01 Jan 2025
