Abstract
Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins’ common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations between two proteins, accounting for the frameshift and point mutations that may have affected their coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences that have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, as confirmed by biologically significant examples.
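The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under simplifying assumptions, not the authors' implementation. Each protein's back-translations are stored implicitly as one list of synonymous codons per residue (a simplified stand-in for the paper's graph representation), and a global dynamic programming alignment picks one codon per residue on each side while aligning at the nucleotide level, so a gap whose length is not a multiple of three models a frameshift. The names `back_translate` and `align_back_translations` and the linear scores `MATCH`, `MISMATCH` and `GAP` are hypothetical; the paper's scoring system is richer than this.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical toy scores; not the scoring system described in the paper.
MATCH, MISMATCH, GAP = 2, -1, -3


def back_translate(protein):
    """Implicit back-translation: one list of synonymous codons per residue.

    O(len(protein)) storage stands for every putative coding DNA at once.
    """
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    return [synonyms[aa] for aa in protein]


def align_back_translations(prot_a, prot_b):
    """Best global nucleotide alignment over all back-translations of two proteins.

    Returns (score, aligned_dna_a, aligned_dna_b). Gaps whose length is not a
    multiple of 3 shift the reading frame, i.e. they model frameshifts.
    """
    codons_a, codons_b = back_translate(prot_a), back_translate(prot_b)
    n, m = 3 * len(prot_a), 3 * len(prot_b)
    # best[i][j] maps (codon_a, codon_b) -> (score, predecessor state); each
    # codon is the one containing the last consumed nucleotide on that side.
    best = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    best[0][0][(None, None)] = (0, None)

    def allowed(codon_lists, pos, current):
        # First base of a codon: any synonymous codon may be chosen.
        # Second or third base: keep the codon already chosen.
        return codon_lists[pos // 3] if pos % 3 == 0 else [current]

    def relax(i, j, key, score, pred):
        cell = best[i][j]
        if key not in cell or score > cell[key][0]:
            cell[key] = (score, pred)

    for i in range(n + 1):
        for j in range(m + 1):
            for (ka, kb), (score, _) in best[i][j].items():
                pred = (i, j, ka, kb)
                if i < n:  # nucleotide of A against a gap
                    for ca in allowed(codons_a, i, ka):
                        relax(i + 1, j, (ca, kb), score + GAP, pred)
                if j < m:  # nucleotide of B against a gap
                    for cb in allowed(codons_b, j, kb):
                        relax(i, j + 1, (ka, cb), score + GAP, pred)
                if i < n and j < m:  # aligned nucleotide pair
                    for ca in allowed(codons_a, i, ka):
                        for cb in allowed(codons_b, j, kb):
                            s = MATCH if ca[i % 3] == cb[j % 3] else MISMATCH
                            relax(i + 1, j + 1, (ca, cb), score + s, pred)

    # Pick the best final state, then trace back to recover the chosen DNAs.
    key = max(best[n][m], key=lambda k: best[n][m][k][0])
    total = best[n][m][key][0]
    row_a, row_b = [], []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj, pka, pkb = best[i][j][key][1]
        row_a.append(key[0][pi % 3] if pi < i else "-")
        row_b.append(key[1][pj % 3] if pj < j else "-")
        i, j, key = pi, pj, (pka, pkb)
    return total, "".join(reversed(row_a)), "".join(reversed(row_b))
```

Tracking the codon in progress, rather than independent sets of allowed bases per position, keeps all three bases of a codon from the same synonymous codon, which matters for amino acids such as Ser (TCN and AGY) whose codons do not form a per-position product. As a toy usage example, `align_back_translations("MWWW", "MGGG")` scores at least 16 with these parameters: ATGTGGTGGTGG (a back-translation of MWWW) and ATGGGTGGTGGT (one of MGGG) differ by a single-base deletion plus one extra trailing base, whereas no ungapped codon-by-codon pairing of the two proteins scores more than 15.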
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gîrdea, M., Noé, L., Kucherov, G. (2009). Back-Translation for Discovering Distant Protein Homologies. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_10
DOI: https://doi.org/10.1007/978-3-642-04241-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science, Computer Science (R0)