Abstract
Phylogenetic analyses of species based on single genes or parts of genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogenies to be constructed from complete genomes, an approach that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often infeasible because of their heavy computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of the distance between any two genomes. Some existing measures, such as the evolutionary edit distance of gene order and gene content, are computationally expensive or perform poorly when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distance between genomes based on the expert model, a compression algorithm for biological sequences. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, whose statistical bias would mislead conventional analysis methods. Our approach is also used successfully to construct a plausible evolutionary tree for the γ-Proteobacteria group, whose genomes are known to contain many horizontally transferred genes.
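The abstract describes the distance only at a high level. The sketch below illustrates the general idea behind compression-based distances under stated assumptions: the information content I(X) of a sequence is estimated as the number of bits an adaptive model needs to encode it, and the conditional information I(X|Y) as the bits needed after the model has first learned Y. It is not the authors' method. It swaps the expert model for a plain adaptive order-k Markov model with add-one counts, and the names `clean`, `OrderKModel`, `information`, `conditional_information`, `distance` and `distance_matrix`, the default `k=12`, and the symmetric normalisation used in `distance` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def clean(seq):
    """Upper-case seq and drop anything that is not A, C, G or T (e.g. N)."""
    return "".join(c for c in seq.upper() if c in ALPHABET)


class OrderKModel:
    """Adaptive order-k Markov model: a simple stand-in, NOT the expert model."""

    def __init__(self, k=12):
        self.k = k
        # counts[context][symbol]: times `symbol` has followed `context` so far
        self.counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))

    def prob(self, context, symbol):
        table = self.counts[context]
        # Add-one (Laplace) estimate, so an unseen symbol never costs infinite bits
        return (table[symbol] + 1) / (sum(table.values()) + len(ALPHABET))

    def process(self, seq, score=True):
        """Run seq through the model, learning as it goes.

        Returns the code length of seq in bits, or 0.0 when score is False.
        """
        bits = 0.0
        for i, symbol in enumerate(seq):
            context = seq[max(0, i - self.k):i]
            if score:
                bits -= math.log2(self.prob(context, symbol))
            self.counts[context][symbol] += 1
        return bits


def information(x, k=12):
    """I(X): bits needed to encode x with a fresh model (x assumed non-empty)."""
    return OrderKModel(k).process(clean(x))


def conditional_information(x, y, k=12):
    """I(X|Y): bits needed to encode x once the model has already learned y."""
    model = OrderKModel(k)
    model.process(clean(y), score=False)  # prime on y; its cost is not counted
    return model.process(clean(x))


def distance(x, y, k=12):
    """Hypothetical symmetric distance: mean of the normalised conditional costs.

    Smaller when each sequence is cheaper to encode given the other; close to 1
    when knowing one sequence does not help encode the other.
    """
    dxy = conditional_information(x, y, k) / information(x, k)
    dyx = conditional_information(y, x, k) / information(y, k)
    return (dxy + dyx) / 2


def distance_matrix(genomes, k=12):
    """Pairwise distances for a dict {name: sequence}, rows/columns in name order."""
    names = sorted(genomes)
    matrix = [[0.0 if a == b else distance(genomes[a], genomes[b], k)
               for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    random.seed(0)
    base = "".join(random.choice(ALPHABET) for _ in range(2000))
    # A close relative: base with roughly 2% point substitutions
    relative = "".join(
        random.choice(ALPHABET.replace(c, "")) if random.random() < 0.02 else c
        for c in base
    )
    unrelated = "".join(random.choice(ALPHABET) for _ in range(2000))
    names, matrix = distance_matrix(
        {"base": base, "relative": relative, "unrelated": unrelated}
    )
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{d:.3f}" for d in row))
```

The resulting matrix is the kind of input a distance-based tree builder such as neighbor-joining expects. Note that a low-order model captures little beyond base composition, so related sequences only come out as close when k is large enough for shared stretches to reuse contexts; a model that explicitly exploits repeated regions, as the expert model is designed to, would do this far more effectively than this sketch.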
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cao, M.D., Allison, L., Dix, T. (2009). A Distance Measure for Genome Phylogenetic Analysis. In: Nicholson, A., Li, X. (eds) AI 2009: Advances in Artificial Intelligence. AI 2009. Lecture Notes in Computer Science, vol 5866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10439-8_8
DOI: https://doi.org/10.1007/978-3-642-10439-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10438-1
Online ISBN: 978-3-642-10439-8
eBook Packages: Computer Science, Computer Science (R0)