Back-Translation for Discovering Distant Protein Homologies

Marta Gîrdea, Laurent Noé & Gregory Kucherov

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Frameshift mutations in protein-coding DNA sequences drastically change the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when the divergence also involves a large number of substitutions, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method for inferring distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein. Its goal is to determine the two putative DNA sequences that have the best-scoring alignment under a scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that traditional alignment methods do not capture, as confirmed by biologically significant examples.
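The idea of back-translation can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes the standard genetic code, represents the putative DNA sequences of a protein simply as one list of codons per residue, and replaces the dynamic programming over graph representations with a brute-force enumeration that is only practical for very short proteins. All function names are hypothetical and chosen for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Inverse mapping: every codon that encodes a given amino acid.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def back_translate(protein):
    """Return, for each residue, the list of codons that may encode it."""
    return [AA_TO_CODONS[aa] for aa in protein]


def count_putative_dna(protein):
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(len(codons) for codons in back_translate(protein))


def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame`, ignoring trailing bases."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def frameshift_related(protein_a, protein_b, frame=1):
    """Brute force (illustration only): does some putative DNA sequence of
    `protein_a`, read in a shifted frame, start with `protein_b`?"""
    for codons in product(*back_translate(protein_a)):
        if translate("".join(codons), frame).startswith(protein_b):
            return True
    return False


if __name__ == "__main__":
    dna = "ATGGCTAAATGG"
    print(translate(dna, 0))                   # in-frame reading
    print(translate(dna, 1))                   # same DNA after a +1 frameshift
    print(count_putative_dna("MLS"))           # size of the back-translation set
    print(frameshift_related("MAKW", "WLN"))   # related through a +1 shift?
```

The number of putative DNA sequences grows as the product of the per-residue codon counts, which is why an explicit enumeration like `frameshift_related` does not scale and a compact graph representation combined with dynamic programming, as described in the abstract, is needed for real proteins.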

Author information

Authors and Affiliations

INRIA Lille - Nord Europe, LIFL/CNRS, Université Lille 1, 59655, Villeneuve d’Ascq, France
Marta Gîrdea, Laurent Noé & Gregory Kucherov

Editor information

Editors and Affiliations

Center for Bioinformatics and Computational Biology, and Department of Computer Science, University of Maryland, College Park, MD, USA
Steven L. Salzberg
Department of Computer Sciences, The University of Texas at Austin, Austin, TX, USA
Tandy Warnow

About this paper

Cite this paper

Gîrdea, M., Noé, L., Kucherov, G. (2009). Back-Translation for Discovering Distant Protein Homologies. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_10

DOI: https://doi.org/10.1007/978-3-642-04241-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science, Computer Science (R0)
