2008 Volume 3 Issue 2 Pages 327-340
We present a novel algorithm for predicting transmembrane regions from a primary amino acid sequence. Previous studies have shown that the Hidden Markov Model (HMM) is one of the most powerful tools for predicting transmembrane regions. A conceptual drawback of the standard HMM, however, is that the state duration, i.e., the length of time for which the hidden dynamics remain in a particular state, necessarily follows a geometric distribution, whereas real data do not always exhibit geometrically distributed durations. To cope with this problem, the proposed algorithm utilizes a Generalized Hidden Markov Model (GHMM), an extension of the HMM in which the state duration probability can be any discrete distribution, with the geometric distribution as a special case. Specifically, the proposed algorithm employs a state duration probability based on a Poisson distribution. Instead of the 20-letter symbol sequence, we consider a two-dimensional vector trajectory consisting of the hydropathy index and the charge associated with each amino acid. In addition, a Monte Carlo method (the Forward/Backward Sampling method) is adopted for the transmembrane region prediction step. Prediction accuracies on publicly available data sets show that the proposed algorithm yields reasonably good results when compared against some existing algorithms.
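The duration issue described above can be illustrated with a minimal sketch, which is not the authors' implementation. In a standard HMM, a state with self-transition probability a is occupied for d consecutive steps with probability a^(d-1)(1-a), so the most likely duration is always a single step. The sketch compares this with a Poisson-based duration model of the same mean. The shifted parameterization (d - 1 is Poisson distributed, so that durations start at 1), the mean duration of 21 residues, and all function and variable names are assumptions made for illustration only; the abstract does not specify them.

```python
import math


def geometric_duration_pmf(d: int, self_transition_prob: float) -> float:
    """P(duration = d) implied by a standard HMM state: a^(d-1) * (1 - a)."""
    if d < 1:
        return 0.0
    return self_transition_prob ** (d - 1) * (1.0 - self_transition_prob)


def shifted_poisson_duration_pmf(d: int, rate: float) -> float:
    """P(duration = d) when d - 1 ~ Poisson(rate), for d >= 1.

    Assumed parameterization; computed in log space to avoid overflow.
    """
    if d < 1:
        return 0.0
    k = d - 1
    log_p = -rate + k * math.log(rate) - math.lgamma(k + 1)
    return math.exp(log_p)


# Illustrative assumption: a region with a mean length of 21 residues.
MEAN_DURATION = 21
a = 1.0 - 1.0 / MEAN_DURATION   # geometric mean is 1 / (1 - a)
rate = MEAN_DURATION - 1.0      # shifted Poisson mean is 1 + rate

durations = range(1, 501)
geo = [geometric_duration_pmf(d, a) for d in durations]
poi = [shifted_poisson_duration_pmf(d, rate) for d in durations]

# Most probable duration under each model.
geo_mode = max(durations, key=lambda d: geometric_duration_pmf(d, a))
poi_mode = max(durations, key=lambda d: shifted_poisson_duration_pmf(d, rate))

print("geometric mode:", geo_mode, " Poisson-based mode:", poi_mode)
for d in (1, 10, 20, 30, 60):
    print(f"d={d:3d}  geometric={geo[d - 1]:.4f}  poisson={poi[d - 1]:.4f}")
```

Both models are matched to the same mean, yet the geometric distribution puts its highest probability on the shortest duration and decays monotonically, while the Poisson-based distribution concentrates its mass around the mean. This is the kind of behavior that motivates replacing the implicit geometric duration with an explicit duration distribution in a GHMM.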