- research-article, August 2024
An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy
Applied Intelligence (KLU-APIN), Volume 54, Issue 22, Pages 11430–11449, https://doi.org/10.1007/s10489-024-05754-x
Abstract: Imbalanced data classification problem is a prevalent concern within the realms of machine learning and data mining. However, conventional methods primarily concentrate on between-class imbalance, ignoring noisy, overlap and within-class issues. ...
- research-article, August 2024
FIAO: Feature Information Aggregation Oversampling for imbalanced data classification
Abstract: Classification performance often deteriorates when machine learning algorithms are trained on imbalanced data. Although oversampling methods have been successfully employed to address imbalanced data, existing approaches have limitations such as ...
Highlights:
- We propose feature information aggregation oversampling for imbalanced data classification (FIAO).
- The FIAO proposed in this study can effectively improve the performance of the classifier.
- This study uses a series of feature ...
- research-article, July 2024
A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification
Information Sciences: an International Journal (ISCI), Volume 672, Issue C, https://doi.org/10.1016/j.ins.2024.120699
Highlights:
- The method can address within-class imbalance, small sample sizes and small disjuncts.
- The oversampling simultaneously considers majority and minority class distribution.
- The introduced randomness and cut-off threshold can avoid ...
Abstract: The imbalanced data classification has gained popularity in machine learning research domain due to its prevalence in numerous applications and its difficulty. However, the majority of contemporary work primarily focuses on addressing between-...
- review-article, April 2024
Fraud Detection Using Machine Learning and Deep Learning
Abstract: Detecting fraudulent activities is a major worry for businesses and financial organizations because they can result in significant financial losses and reputational harm. Traditional fraud detection methods frequently depend on present rules and ...
- research-article, February 2024
IMI2: A fuzzy clustering validity index for multiple imbalanced clusters
Expert Systems with Applications: An International Journal (EXWA), Volume 238, Issue PE, https://doi.org/10.1016/j.eswa.2023.122231
Abstract: The fuzzy c-means (FCM) clustering algorithm requires the pre-definition of the number of clusters. When used for imbalanced datasets, FCM tends to equalize the sizes of clusters and thus produces bad clustering results. Although clustering ...
Highlights:
- We propose a new clustering validity index that is effective for unbalanced data sets with multiple clusters.
- We design a merging algorithm that is able to produce majority clusters of high quality.
- This new CVI performs better ...
- research-article, February 2024
ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets
Expert Systems with Applications: An International Journal (EXWA), Volume 238, Issue PC, https://doi.org/10.1016/j.eswa.2023.122049
Abstract: Nowadays, many classification algorithms have been applied to various industries to help them work out their problems met in real-life scenarios. However, in many binary classification tasks, samples in the minority class only make up a small ...
Highlights:
- Introduce a scoring system based on anomaly detection to the resampling strategy.
- The proposed weighting functions are intuitive and easy to understand.
- Propose an efficient ensemble learning framework.
- Excellent performance on ...
- research-article, July 2024
Enhancing Spam Detection with GANs and BERT Embeddings: A Novel Approach to Imbalanced Datasets
Procedia Computer Science (PROCS), Volume 236, Issue C, Pages 420–427, https://doi.org/10.1016/j.procs.2024.05.049
Abstract: In recent years, the prevalence of imbalanced datasets has posed significant challenges to traditional machine learning models. This imbalance is especially pronounced in fields such as spam detection, where malicious or unwanted messages are ...
- research-article, February 2024
Iterative minority oversampling and its ensemble for ordinal imbalanced datasets
Engineering Applications of Artificial Intelligence (EAAI), Volume 127, Issue PA, https://doi.org/10.1016/j.engappai.2023.107211
Abstract: Ordinal classification of imbalanced datasets is a challenging problem that occurs in many real-world applications. The main challenge is to simultaneously consider the classes ordering and imbalanced distribution. Although the classic synthetic ...
- research-article, December 2023
A new oversampling approach based differential evolution on the safe set for highly imbalanced datasets
Expert Systems with Applications: An International Journal (EXWA), Volume 234, Issue C, https://doi.org/10.1016/j.eswa.2023.121039
Abstract: Oversampling method is used to solve the class imbalanced issues. Some existing oversampling methods do not well remove noisy samples and avoid synthesizing noisy samples. Therefore, we propose a new oversampling approach based differential ...
Highlights:
- SS_DEBOHID improves diversity of synthesized samples and reduces noisy samples.
- SS_DEBOHID uses the basic strategy of DE to generate samples in the safe area.
- SS_DEBOHID is better than 10 methods on 43 highly imbalanced datasets.
- research-article, February 2024
Sentiment analysis of imbalanced datasets using BERT and ensemble stacking for deep learning
Engineering Applications of Artificial Intelligence (EAAI), Volume 126, Issue PC, https://doi.org/10.1016/j.engappai.2023.106999
Abstract: The Internet is a crucial way to share information in both personal and professional areas. Sentiment analysis attracts great interest in marketing, research, and business today. The instability faced by imbalanced datasets on sentiment analysis ...
- research-article, November 2023
Addressing the class-imbalance and class-overlap problems by a metaheuristic-based under-sampling approach
Highlights:
- A method is presented to address the class-imbalance and class-overlap problems.
- The proposed method is based on a metaheuristic-based under-sampling approach.
- The under-sampling problem is mapped into an optimization problem.
- ...
Abstract: The problem of imbalanced class distribution in real-world datasets severely impairs the performance of classification algorithms. The learning task becomes more complicated and challenging when there is also the class-overlap problem in ...
- research-article, October 2023
Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification
Abstract: Learning from imbalanced datasets is a nontrivial task for supervised learning community. Traditional classifiers may have difficulties to learn the concept related to the minority class when addressing imbalanced classification and ...
Highlights:
- A self-adaptive oversampling method based on minority data complexity is presented.
- research-article, August 2023
Mega trend diffusion-siamese network oversampling for imbalanced datasets’ SVM classification
Abstract: Imbalanced class distribution is a frequent and problematic issue in the domains of data engineering and machine learning. Traditional classification algorithms or machine learning models frequently fail, in this difficult situation, to provide ...
Highlights:
- Imbalanced datasets pose common and important problems in many fields.
- A novel distance-based mega-trend-diffusion Siamese Network (DB-MTD-SN) oversampling approach is developed in this paper.
- The proposed method’s efficacy is ...
- research-article, December 2022
ESMOTE: an overproduce-and-choose synthetic examples generation strategy based on evolutionary computation
Neural Computing and Applications (NCAA), Volume 35, Issue 9, Pages 6891–6977, https://doi.org/10.1007/s00521-022-08004-8
Abstract: The class imbalance learning problem is an important topic that has attracted considerable attention in machine learning and data mining. The most common method of addressing imbalanced datasets is the synthetic minority oversampling technique (...
- Article, September 2022
Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation
Abstract: The paper focuses on methods and algorithms for oversampling two-classes imbalanced datasets. We propose a taxonomy for oversampling approaches and review state-of-the-art algorithms. The paper discusses also some strengths and weaknesses of the ...
- research-article, August 2022
PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets
Abstract: Class imbalance learning is one of the most important topics in the field of machine learning and data mining, and the Synthetic Minority Oversampling Techniques (SMOTE) is the common method to handle this issue. The main shortcomings ...
- Article, April 2023
Semantic-Based Classification of Relevant Case Law
Abstract: The challenge of information overload in the legal domain increases every day. The COLIEE competition has created four challenges which are intended to encourage the development of systems and methods to alleviate some of that pressure: a case law ...
- research-article, June 2022
Efficiency of oversampling methods for enhancing software defect prediction by using imbalanced data
Innovations in Systems and Software Engineering (SPISSE), Volume 19, Issue 3, Pages 247–263, https://doi.org/10.1007/s11334-022-00457-3
Abstract: Software defect prediction (SDP) is essential to analyze and identify defects present in a software model in early stages of software development. The identification of these defects and their early removal provides cost-efficient software. ...
- research-article, April 2022
SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning
Information Sciences: an International Journal (ISCI), Volume 588, Issue C, Pages 13–51, https://doi.org/10.1016/j.ins.2021.12.066
Highlights:
- A novel SVDD boundary-based weighted oversampling approach is presented.
- The ...
Abstract: Imbalanced dataset classification issue poses a major challenge on machine learning domain. Traditional supervised learning algorithms usually bias towards the majority class when handling imbalanced datasets, thus leading to poor ...