[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Unconstrained Diameters for Deep Coalescence

Published: 01 September 2017 Publication History

Abstract

The minimizing-deep-coalescence MDC approach infers a median species tree for a given set of gene trees under the deep coalescence cost. This cost accounts for the minimum number of deep coalescences needed to reconcile a gene tree with a species tree where the leaf-genes are mapped to the leaf-species through a function called leaf labeling. In order to better understand the MDC approach we investigate here the diameter of a gene tree, which is an important property of the deep coalescence cost. This diameter is the maximal deep coalescence costs for a given gene tree under all leaf labelings for each possible species tree topology. While we prove that this diameter is generally infinite, this result relies on the diameter’s unrealistic assumption that species trees can be of infinite size. Providing a more practical definition, we introduce a natural extension of the gene tree diameter that constrains the species tree size by a given constant. For this new diameter, we describe an exact formula, present a complete classification of the trees yielding this diameter, derive formulas for its mean and variance, and demonstrate its ability using comparative studies.

References

[1]
B. L. Allen and M. Steel, "Subtree transfer operations and their induced metrics on evolutionary trees," Ann. Combinatorics, vol. 5, pp. 1-15, 2001.
[2]
M. S. Bansal, J. G. Burleigh, and O. Eulenstein, "Efficient genomescale phylogenetic analysis under the duplication-loss and deep coalescence cost models," BMC Bioinf., vol. 11, no. Suppl. 1, p. S42, 2010.
[3]
B. C. Carstens and L. L. Knowles, "Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from melanoplus grasshoppers," Syst. Biol., vol. 56, no. 3, pp. 400-11, 2007.
[4]
W.-C. Chang, P. Górecki, and O. Eulenstein, "Exact solutions for species tree inference from discordant gene trees," J. Bioinf. Comput. Biol., vol. 11, no. 05, p. 1342005, 2013.
[5]
R. Chaudhary, M. S. Bansal, A. Wehe, D. Fernández-Baca, and O. Eulenstein, "iGTP: A software package for large-scale gene tree parsimony analysis," BMC Bioinf., vol. 11, p. 574, 2010.
[6]
F.-C. Chen and W.-H. Li, "Genomic divergences between human and other hominoids and the effective population size of the common ancestor of humans and chimpanzees," Am. J. Human Genet., vol. 68, pp. 444-456, 2001.
[7]
J. H. Degnan and N. A. Rosenberg, "Gene tree discordance, phylogenetic inference and the multispecies coalescent," Trends Ecol. Evol., vol. 24, no. 6, pp. 332-40, 2009.
[8]
F. Disanto and T. Wiehe, "Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model," Math. Biosci., vol. 242, no. 2, pp. 195-200, 2013.
[9]
I. Ebersberger, P. Galgoczy, S. Taudien, S. Taenzer, M. Platzer, and A. Von Haeseller, "Mapping human genetic ancestry," Molecular Biol. Evol., vol. 24, pp. 2266-76, 2007.
[10]
P. Górecki and O. Eulenstein, "A linear time algorithm for error-corrected reconciliation of unrooted gene trees," in Proc. 7th Int. Conf. Bioinf. Res. Appl., pp. 148-159, vol. 6674, 2011.
[11]
P. Górecki and O. Eulenstein, "Deep coalescence reconciliation with unrooted gene trees: Linear time algorithms," in Proc. 18th Annu. Int. Conf. Comput. Combinator., 2012, vol. 7434, pp. 531-542.
[12]
P. Górecki and O. Eulenstein, "GTP supertrees from unrooted gene trees: Linear time algorithms for NNI based local searches," in Proc. 8th Int. Conf. Bioinf. Res. Appl., 2012, vol. 7292, pp. 83-105.
[13]
P. Górecki and O. Eulenstein, "Gene tree diameter for deep coalescence," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 12, no. 1, pp. 155-165, Jan./Feb. 2015.
[14]
P. Górecki and O. Eulenstein, "Maximizing deep coalescence cost," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 11, no. 1, pp. 231-242, Jan./Feb. 2014.
[15]
P. Górecki, O. Eulenstein, and J. Tiuryn, "Unrooted tree reconciliation: A unified approach," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 10, no. 2, pp. 552-536, Mar./Apr. 2013.
[16]
P. Górecki, J. Paszek, and O. Eulenstein, "Duplication cost diameters," in Proc. 10th Int. Symp. Bioinf. Res. Appl., 2014, vol. 8492, pp. 212-223.
[17]
P. Górecki and J. Tiuryn, "DLS-trees: A model of evolutionary scenarios," Theor. Comput. Sci., vol. 359, nos. 1-3, pp. 378-399, 2006.
[18]
P. Górecki and J. Tiuryn, "UREC: A system for unrooted reconciliation," Bioinf., vol. 23, no. 4, pp. 511-512, 2007.
[19]
R. Guigó, I. B. Muchnik, and T. F. Smith, "Reconstruction of ancient molecular phylogeny," Molecular Phylogenet. Evol., vol. 6, no. 2, pp. 189-213, 1996.
[20]
S. Guindon, F. Delsuc, J. Dufayard, and O. Gascuel, "Estimating maximum likelihood phylogenies with PhyML," Methods Mol. Biol., vol. 537, pp. 113-37, 2009.
[21]
E. F. Harding, "The probabilities of rooted tree-shapes generated by random bifurcation," Adv. Appl. Probab., vol. 3, no. 1, pp. 44-77, 1971.
[22]
S. R. Harris, E. J. P. Cartwright, M. E. Török, M. T. G. Holden, N. M. Brown, A. L. Ogilvy-Stuart, M. J. Ellington, M. A. Quail, S. D. Bentley, J. Parkhill, and S. J. Peacock, "Whole-genome sequencing for analysis of an outbreak of meticillin-resistant staphylococcus aureus: A descriptive study," Lancet Infect. Disease, vol. 13, no. 2, pp. 130-136, 2013.
[23]
M. Hasegawa, H. Kishino, and T. Yano, "Dating of human-ape splitting by a molecular clock of mitochondrial DNA," J. Molecular Evol., vol. 22, no. 2, pp. 160-174, 1985.
[24]
M. D. Hendy and D. Penny, "Branch and bound algorithms to determine minimal evolutionary trees," Math. Biosci., vol. 59, no. 2, pp. 277-290, 1982.
[25]
K. T. Huber, A. Spillner, R. Suchecki, and V. Moulton, "Metrics on multilabeled trees: Interrelationships and diameter bounds," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 8, no. 4, pp. 1029-1040, Jul./Aug. 2011.
[26]
R. A. Hufbauer, R. A. Marrs, A. K. Jackson, R. Sforza, H. P. Bais, J. M. Vivanco, and S. E. Carney, "Population structure, ploidy levels and allelopathy of Centaurea maculosa (spotted knapweed) and C. diffusa (diffuse knapweed) in North America and Eurasia," in Proc. Int. Symp. Biological Control of Weeds. New York, NY, US: CABI, 2003.
[27]
P. J. Humphries and T. Wu, "On the neighbourhoods of trees," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 10, no. 3, pp. 721-728, May/Jun. 2013.
[28]
W. B. Jennings and S. V. Edwards, "Speciational history of australian grass finches (poephila) inferred from thirty gene trees," Evolution, vol. 59, no. 9, pp. 2033-2047, 2005.
[29]
M. Li, J. Tromp, and L. Zhang, "On the nearest neighbour interchange distance between evolutionary trees," J. Theory Biol., vol. 182, no. 4, pp. 463-467, Oct. 1996.
[30]
H. T. Lin, J. G. Burleigh, and O. Eulenstein, "Consensus properties for the deep coalescence problem and their application for scalable tree search," BMC Bioinf., vol. 13, no. Suppl 10, p. S12, 2012.
[31]
W. P. Maddison, "Gene trees in species trees," Syst. Biol., vol. 46, pp. 523-536, 1997.
[32]
W. P. Maddison and L. L. Knowles, "Inferring phylogeny despite incomplete lineage sorting," Syst. Biol., vol. 55, no. 1, pp. 21-30, 2006.
[33]
A. McKenzie and M. Steel, "Distributions of cherries for two models of trees," Math. Biosci., vol. 164, no. 1, pp. 81-92, 2000.
[34]
S. Nik-Zainal, et al., "The life history of 21 breast cancers," Cell, vol. 149, no. 5, pp. 994-1007, 2012.
[35]
R. D. M. Page and M. A. Charleston, "Reconciled trees and incongruent gene and species trees," In B. Mirkin, F. Roberts, and A. Rzhetsky (ends), Math. Hierarchies in Biol., 1997, vol. 37, pp. 57-70.
[36]
R. D. M. Page and E. C. Holmes, "Molecular Evolution: A Phylogenetic Approach. Hoboken, NJ, US: Blackwell, 1998.
[37]
D. A. Pollard, V. N. Lyer, A. M. Moses, and M. B. Eisen, "Widespread discordance of gene trees with species tree in drosophila: Evidence for incomplete lineage sorting," PLoS Genet., vol. 2, no. 10, p. e173, 2006.
[38]
A. Rambaut and N. C. Grassly, "Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees," Comput. Appl. Biosci., vol. 13, no. 3, pp. 235-238, 1997.
[39]
A. Rokas, B. L. Williams, N. King, and S. B. Carroll, "Genomescale approaches to resolving incongruence in molecular phylogenies," Nature, vol. 425, pp. 798-804, 2003.
[40]
N. A. Rosenberg, "The mean and variance of the numbers of r-pronged nodes and r-caterpillars in Yule-generated genealogical trees," Ann. Comb., vol. 10, no. 1, pp. 129-146, 2006.
[41]
J. Ruan, et al., "TreeFam: 2008 update," Nucleic Acids Res., vol. 36, pp. D735-D740, 2008.
[42]
A. Sánchez-Gracia and J. Castresana, "Impact of deep coalescence on the reliability of species tree inference from different types of dna markers in mammals," PLoS ONE, vol. 7, no. 1, p. e30239, 2012.
[43]
M. J. Sanderson, "R8S: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock," Bioinf., vol. 19, no. 2, pp. 301-302, 2003.
[44]
Y. Satta, J. Klein, and N. Takahata, "DNA archives and our nearest relative: The trichotomy problem revisited," Molecular Phylogenet Evol., vol. 14, no. 2, pp. 259-275, 2000.
[45]
M. A. Steel and D. Penny, "Distributions of tree comparison metrics-- some new results," Syst. Biol., vol. 42, no. 2, pp. 126-141, 1993.
[46]
S.-J. Sul and T. L. Williams, "An experimental analysis of Robinson-Foulds distance matrix algorithms," Lect. Notes Comput. Sci., vol. 5193, pp. 793-804, 2008.
[47]
J. Syring, F. Kathleen, R. Businsky, R. Cronn, and A. Liston, "Widespread genealogical nonmonophyly in species of pinus subgenus strobus," Syst. Biol., vol. 56, no. 2, pp. 163-81, 2007.
[48]
K. Takahashi, Y. Terai, M. Nishida, and N. Okada, "Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in lake tanganyika as revealed by analysis of the insertion of retroposons," Molecular Biol. Evol., vol. 18, no. 11, pp. 2057- 2066, 2001.
[49]
C. V. Than and L. Nakhleh, "Species tree inference by minimizing deep coalescences," PLoS Comput. Biol., vol. 5, no. 9, p. e1000501, 2009.
[50]
C. V. Than and N. A. Rosenberg, "Consistency properties of species tree inference by minimizing deep coalescences," J. Comput. Biol., vol. 18, no. 1, pp. 1-15, 2011.
[51]
C. V. Than and N. A. Rosenberg, "Mathematical properties of the deep coalescence cost," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 10, no. 1, pp. 61-72, 2013.
[52]
M. Wilkinson, J. A. Cotton, F.-J. Lapointe, and D. Pisani, "Properties of supertree methods in the consensus setting," Syst. Biol., vol. 56, no. 2, pp. 330-337, 2007.
[53]
L. Zhang, "From Gene Trees to Species Trees II: Species Tree Inference by Minimizing Deep Coalescence Events," IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 8, pp. 1685-1691, Nov./Dec. 2011.
  1. Unconstrained Diameters for Deep Coalescence

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
    IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 14, Issue 5
    September 2017
    202 pages

    Publisher

    IEEE Computer Society Press

    Washington, DC, United States

    Publication History

    Published: 01 September 2017
    Published in TCBB Volume 14, Issue 5

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 22
      Total Downloads
    • Downloads (Last 12 months)2
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 12 Dec 2024

    Other Metrics

    Citations

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media