DOI: 10.1109/ISPDC.2012.30
Article

A Distributed Methodology for Imbalanced Classification Problems

Published: 25 June 2012

Abstract

Current important challenges in data mining research are triggered by the need to address various particularities of real-world problems, such as imbalanced data and error cost distributions. This paper presents Distributed Evolutionary Cost-Sensitive Balancing, a distributed methodology for dealing with imbalanced data and, if necessary, cost distributions. The method employs a genetic algorithm to search for an optimal cost matrix and base classifier settings, which are then used by a cost-sensitive classifier wrapped around the base classifier. Individual fitness computation is the most computationally intensive task in the algorithm, but it also offers high parallelization potential. Two parallelization alternatives have been explored: a computation-driven approach and a data-driven approach. Both have been developed within the Apache Watchmaker framework and deployed on Hadoop-based infrastructures. Experimental evaluations performed so far indicate that the computation-driven approach achieves good classification performance but does not reduce the running time significantly, whereas the data-driven approach reduces the running time for slow algorithms, such as kNN and SVM, while still yielding important performance improvements.
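
The idea described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, which is built on Watchmaker and Hadoop; it is a toy under stated assumptions. The data set is synthetic, the base classifier is a one-dimensional kNN, the cost matrix is reduced to a single free entry (`fn_cost`, the cost of missing a minority example), fitness is taken to be balanced accuracy, and a thread pool stands in for the distributed workers of the computation-driven approach. All function and parameter names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

K_CHOICES = [1, 3, 5, 7]  # assumed search space for the kNN setting

def make_data(rng, n_major=90, n_minor=10):
    """Toy 1-D imbalanced data: class 0 near 0.0, minority class 1 near 1.5."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(1.5, 1.0), 1) for _ in range(n_minor)]
    rng.shuffle(data)
    return data

def predict(train, x, k, fn_cost):
    """Cost-sensitive kNN: choose the class with the lower expected cost.

    A false alarm costs 1; a missed minority example costs fn_cost.
    """
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    p_minor = sum(label for _, label in neighbours) / k
    # Expected cost of predicting 0 is p_minor * fn_cost,
    # expected cost of predicting 1 is (1 - p_minor) * 1.
    return 1 if p_minor * fn_cost > (1.0 - p_minor) else 0

def fitness(individual, train, valid):
    """Balanced accuracy on the validation split (higher is better)."""
    k, fn_cost = individual
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in valid:
        totals[y] += 1
        hits[y] += predict(train, x, k, fn_cost) == y
    return sum(hits[c] / totals[c] for c in (0, 1)) / 2.0

def mutate(individual, rng):
    """Occasionally resample k and always perturb the cost entry."""
    k, fn_cost = individual
    if rng.random() < 0.3:
        k = rng.choice(K_CHOICES)
    fn_cost = min(20.0, max(1.0, fn_cost + rng.gauss(0.0, 2.0)))
    return (k, fn_cost)

def evolve(train, valid, generations=10, pop_size=12, seed=0, workers=4):
    """Search over (k, fn_cost) pairs with a simple truncation-selection GA."""
    rng = random.Random(seed)
    population = [(rng.choice(K_CHOICES), rng.uniform(1.0, 20.0))
                  for _ in range(pop_size)]
    best, best_fit = None, -1.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluation is the expensive step, so each individual
            # is scored by a separate worker (threads here, not Hadoop).
            scores = list(pool.map(lambda ind: fitness(ind, train, valid),
                                   population))
            ranked = sorted(zip(scores, population),
                            key=lambda sp: sp[0], reverse=True)
            if ranked[0][0] > best_fit:
                best_fit, best = ranked[0]
            parents = [ind for _, ind in ranked[: pop_size // 2]]
            children = [mutate(rng.choice(parents), rng)
                        for _ in range(pop_size - len(parents))]
            population = parents + children
    return best, best_fit

rng = random.Random(42)
train, valid = make_data(rng), make_data(rng)
best, best_fit = evolve(train, valid)
print("best (k, fn_cost):", best, "balanced accuracy:", round(best_fit, 3))
```

In this sketch only the fitness evaluations run in parallel, which loosely mirrors the computation-driven alternative. A data-driven variant would instead split the training or validation data across workers and combine the partial results for each individual.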

Cited By

  • (2021) A novel approach based on genetic algorithm to speed up the discovery of classification rules on GPUs. Knowledge-Based Systems 231:C. DOI: 10.1016/j.knosys.2021.107419. Online publication date: 14-Nov-2021.


    Published In

    ISPDC '12: Proceedings of the 2012 11th International Symposium on Parallel and Distributed Computing
    June 2012
    314 pages
    ISBN:9780769548050

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Hadoop
    2. distributed cost-sensitive balancing
    3. imbalanced classification
