Authors:
Nic Herndon and Doina Caragea
Affiliation:
Kansas State University, United States
Keyword(s):
Domain Adaptation, Naïve Bayes, Splice Site Prediction, Unbalanced Data.
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Genomics and Proteomics; Sequence Analysis
Abstract:
For many machine learning problems, training an accurate classifier in a supervised setting requires a substantial volume of labeled data. While large volumes of labeled data are currently available for some of these problems, little or no labeled data exists for others. Manually labeling data can be costly and time-consuming. An alternative is to learn classifiers in a domain adaptation setting, in which existing labeled data from a related problem, referred to as the source domain, is leveraged in conjunction with a small amount of labeled data and a large amount of unlabeled data for the problem of interest, or target domain. In this paper, we propose two similar domain adaptation classifiers based on a naïve Bayes algorithm. We evaluate these classifiers on the difficult task of splice site prediction, which is essential for gene prediction. Results show that the algorithms correctly classified instances, with the highest average area under the precision-recall curve (auPRC) values between 18.46% and 78.01%.
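
The abstract reports results as auPRC, a common choice for unbalanced data, where accuracy can be dominated by the majority class. As an illustration only, the sketch below computes a step-wise estimate of that area (often called average precision) from binary labels and classifier scores. The function name `average_precision` and the toy inputs are assumptions made for this example; it is not the evaluation code used by the authors, and it returns a fraction in [0, 1] rather than a percentage.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: sequence of 0/1 ground-truth values (1 = positive class).
    scores: sequence of classifier scores, higher = more likely positive.
    Hypothetical helper for illustration; ties keep their input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank instances from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Each positive raises recall by 1/total_pos; add the
            # precision at this rank weighted by that recall step.
            area += (true_pos / rank) / total_pos
    return area


if __name__ == "__main__":
    # Toy data (made up): two positives among five instances.
    toy_labels = [1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    print(average_precision(toy_labels, toy_scores))
```

With the toy data, the positives sit at ranks 1 and 3, so the estimate is the mean of the precisions 1/1 and 2/3. A ranking that places every positive first yields 1.0, while a classifier that ranks the rare positives last scores close to the positive-class frequency, which is why this metric is informative when positives are scarce.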