Abstract
In this study, we develop a predictive model for the strength of binding between a given transcription factor (TF) variant and a given DNA target variant. Our main focus is the DNA-binding paired domains of the Pax transcription factors, which show seemingly fuzzy and degenerate binding to various DNA targets, and paired domain-DNA binding is not a problem well suited to previously proposed algorithms. Here, we introduce a simple way to use rule induction for predicting the strength of TF-DNA binding. We have created a dataset of 597 example cases of paired domain-DNA binding by collecting information about all published and quantified interactions between TF and DNA sequence variants. Applying the rule induction based method to this dataset yields an accuracy of 69.7% (based on cross-validation), which is high although far from ideal; perhaps more importantly, it yields several useful rules for predicting binding strength. Although the primary motivation for introducing rule induction based methods is the lack of efficient algorithms for paired domain-DNA binding prediction, we also show that the method can be applied with some success to a better-studied TF-DNA binding prediction task involving the early growth response (EGR) TF family.
© 2006 The Author(s). Published by Journal of Integrative Bioinformatics.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.