[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches

Published: 01 April 2009 Publication History

Abstract

The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the {\tt NNI} search problem, which is based on the nearest neighbor interchange operation. In this work, we 1) provide a novel near-linear time algorithm for the {\tt NNI} search problem, 2) introduce extensions that significantly enlarge the search space of the {\tt NNI} search problem, and 3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the {\tt NNI} search problem. The exceptional speedup achieved in the extended {\tt NNI} search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large randomly generated gene trees.

References

[1]
M.S. Bansal and O. Eulenstein, "The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches," Proc. Int'l Symp. Bioinformatics Research and Applications (ISBRA), pp. 14-25, May 2008.
[2]
M. Goodman, J. Czelusniak, G.W. Moore, A.E. Romero-Herrera, and G. Matsuda, "Fitting the Gene Lineage into Its Species Lineage: A Parsimony Strategy Illustrated by Cladograms Constructed from Globin Sequences," Systematic Zoology, vol. 28, pp. 132-163, 1979.
[3]
R. Guigó, I. Muchnik, and T.F. Smith, "Reconstruction of Ancient Molecular Phylogeny," Molecular Phylogenetics and Evolution, vol. 6, no. 2, pp. 189-213, 1996.
[4]
I. Wapinski, A. Pfeffer, N. Friedman, and A. Regev, "Automatic Genome-Wide Reconstruction of Phylogenetic Gene Trees," Proc. Int'l Conf. Intelligent Systems for Molecular Biology/European Conf. Computational Biology (ISMB/ECCB) (Supplement of Bioinformatics), pp. 549-558, July 2007.
[5]
I. Wapinski, A. Pfeffer, N. Friedman, and A. Regev, "Natural History and Evolutionary Principles of Gene Duplication in Fungi," Nature, vol. 449, pp. 54-61, 2007.
[6]
L. Arvestad, A.-C. Berglund, J. Lagergren, and B. Sennblad, "Bayesian Gene/Species Tree Reconciliation and Orthology Analysis Using MCMC," Proc. Int'l Conf. Intelligent Systems for Molecular Biology (ISMB) (Supplement of Bioinformatics), pp. 7-15, 2003.
[7]
L. Arvestad, A.-C. Berglund, J. Lagergren, and B. Sennblad "Gene Tree Reconstruction and Orthology Analysis Based on an Integrated Model for Duplications and Sequence Evolution," Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 326-335, 2004.
[8]
B. Ma, M. Li, and L. Zhang, "From Gene Trees to Species Trees," SIAM J. Computing, vol. 30, no. 3, pp. 729-752, 2000.
[9]
R.D.M. Page, "GeneTree: Comparing Gene and Species Phylogenies Using Reconciled Trees," Bioinformatics, vol. 14, no. 9, pp. 819- 820, 1998.
[10]
J. Felsenstein, Inferring Phylogenies, pp. 37-46. Sinauer Assoc., 2004.
[11]
C. Semple and M. Steel, Phylogenetics, pp. 30-33. Oxford Univ. Press, 2003.
[12]
B.L. Allen and M. Steel, "Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees," Annals of Combinatorics, vol. 5, pp. 1-13, 2001.
[13]
M. Bordewich and C. Semple, "On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance," Annals of Combinatorics, vol. 8, pp. 409-423, 2004.
[14]
R.D.M. Page and M.A. Charleston, "From Gene to Organismal Phylogeny: Reconciled Trees and the Gene Tree/Species Tree Problem," Molecular Phylogenetics and Evolution, vol. 7, pp. 231- 240, 1997.
[15]
D.L. Swofford, G.J. Olsen, P.J. Waddel, and D.M. Hillis, "Phylogenetic Inference," Molecular Systematics, D.M. Hillis, C. Moritz, and B.K. Mable, eds., chapter 11, pp. 407-509, Sinauer Assoc., 1996.
[16]
D. Chen, O. Eulenstein, D. Fernández-Baca, and J.G. Burleigh, "Improved Heuristics for Minimum-Flip Supertree Construction," Evolutionary Bioinformatics, vol. 2, pp. 347-356, 2006.
[17]
M.S. Bansal, J.G. Burleigh, O. Eulenstein, and A. Wehe, "Heuristics for the Gene-Duplication Problem: A ¿(n) Speedup for the Local Search," Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 238-252, 2007.
[18]
M.S. Bansal and O. Eulenstein, "An ¿(n2 / log n) Speedup of TBR Heuristics for the Gene-Duplication Problem," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 5, no. 4, pp. 514-524, Oct.-Dec. 2008.
[19]
G. Ganapathy, V. Ramachandran, and T. Warnow, "Better Hill-Climbing Searches for Parsimony," Proc. Workshop Algorithms in Bioinformatics (WABI), pp. 245-258, 2003.
[20]
G. Ganapathy, V. Ramachandran, and T. Warnow, "On Contract-and-Refine Transformations between Phylogenetic Trees," Proc. Symp. Discrete Algorithms (SODA), pp. 900-909, 2004.
[21]
R.D.M. Page, "Maps between Trees and Cladistic Analysis of Historical Associations Among Genes, Organisms, and Areas," Systematic Biology, vol. 43, no. 1, pp. 58-77, 1994.
[22]
B. Mirkin, I. Muchnik, and T.F. Smith, "A Biologically Consistent Model for Comparing Molecular Phylogenies," J. Computational Biology, vol. 2, no. 4, pp. 493-507, 1995.
[23]
O. Eulenstein, "Predictions of Gene-Duplications and Their Phylogenetic Development," PhD dissertation, Univ. of Bonn, GMD Research Series no. 20/1998, 1998.
[24]
L. Zhang, "On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies," J. Computational Biology, vol. 4, no. 2, pp. 177-187, 1997.
[25]
K. Chen, D. Durand, and M. Farach-Colton, "Notung: A Program for Dating Gene Duplications and Optimizing Gene Family Trees," J. Computational Biology, vol. 7, pp. 429-447, 2000.
[26]
P. Bonizzoni, G.D. Vedova, and R. Dondi, "Reconciling a Gene Tree to a Species Tree under the Duplication Cost Model," Theoretical Computer Science, vol. 347, nos. 1/2, pp. 36-53, 2005.
[27]
P. Górecki and J. Tiuryn, "On the Structure of Reconciliations," Proc. Research in Computational Molecular Biology (RECOMB) Comparative Genomics Workshop, pp. 42-52, Oct. 2004.
[28]
M.A. Bender and M. Farach-Colton, "The LCA Problem Revisited," Proc. Latin Am. Theoretical Informatics Symp. (LATIN), pp. 88- 94, 2000.
[29]
J.B. Slowinski, A. Knight, and A.P. Rooney, "Inferring Species Trees from Gene Trees: A Phylogenetic Analysis of the Elapidae (Serpentes) Based on the Amino Acid Sequences of Venom Proteins," Molecular Phylogenetics and Evolution, vol. 8, pp. 349- 362, 1997.
[30]
R.D.M. Page, "Extracting Species Trees from Complex Gene Trees: Reconciled Trees and Vertebrate Phylogeny," Molecular Phylogenetics and Evolution, vol. 14, pp. 89-106, 2000.
[31]
R.D.M. Page and J. Cotton, "Vertebrate Phylogenomics: Reconciled Trees and Gene Duplications," Proc. Pacific Symp. Biocomputing, pp. 536-547, 2002.
[32]
J.A. Cotton and R.D.M. Page, "Tangled Tales from Multiple Markers: Reconciling Conflict between Phylogenies to Build Molecular Supertrees," Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp. 107-125, Springer-Verlag, 2004.
[33]
M.J. Sanderson and M.M. McMahon, "Inferring Angiosperm Phylogeny from EST Data with Widespread Gene Duplication," BMC Evolutionary Biology, vol. 7, (Suppl. 1): S3, 2007.
[34]
M. Fellows, M. Hallett, C. Korostensky, and U. Stege, "Analogs & Duals of the MAST Problem for Sequences & Trees," Proc. European Symp. Algorithms (ESA), pp. 103-114, 1998.
[35]
U. Stege, "Gene Trees and Species Trees: The Gene-Duplication Problem is Fixed-Parameter Tractable," Proc. Algorithms and Data Structures Symp. (WADS), pp. 288-293, 1999.
[36]
M.T. Hallett and J. Lagergren, "New Algorithms for the Duplication-Loss Model," Proc. Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 138-146, 2000.
[37]
B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, and L. Zhang, "On Distances between Phylogenetic Trees," Proc. Symp. Discrete Algorithms (SODA), pp. 427-436, 1997.
[38]
A. Wehe, M.S. Bansal, J.G. Burleigh, and O. Eulenstein, "DupTree: A Program for Large-Scale Phylogenetic Analyses Using Gene Tree Parsimony," Bioinformatics, vol. 24, no. 13, pp. 1540-1541, 2008.

Cited By

View all
  • (2013)Efficient Algorithms for Knowledge-Enhanced Supertree and Supermatrix Phylogenetic ProblemsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2012.16210:6(1432-1441)Online publication date: 1-Nov-2013
  • (2012)Inferring evolutionary scenarios in the duplication, loss and horizontal gene transfer modelLogic and Program Semantics10.5555/2340820.2340828(83-105)Online publication date: 1-Jan-2012
  • (2012)An Efficient Method for Exploring the Space of Gene Tree/Species Tree Reconciliations in a Probabilistic FrameworkIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2011.649:1(26-39)Online publication date: 1-Jan-2012
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 6, Issue 2
April 2009
191 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 April 2009
Published in TCBB Volume 6, Issue 2

Author Tags

  1. Computational phylogenetics
  2. gene-duplication
  3. local search
  4. supertrees
  5. {\tt NNI}.

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2013)Efficient Algorithms for Knowledge-Enhanced Supertree and Supermatrix Phylogenetic ProblemsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2012.16210:6(1432-1441)Online publication date: 1-Nov-2013
  • (2012)Inferring evolutionary scenarios in the duplication, loss and horizontal gene transfer modelLogic and Program Semantics10.5555/2340820.2340828(83-105)Online publication date: 1-Jan-2012
  • (2012)An Efficient Method for Exploring the Space of Gene Tree/Species Tree Reconciliations in a Probabilistic FrameworkIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2011.649:1(26-39)Online publication date: 1-Jan-2012
  • (2012)GTP supertrees from unrooted gene treesProceedings of the 8th international conference on Bioinformatics Research and Applications10.1007/978-3-642-30191-9_11(102-114)Online publication date: 21-May-2012
  • (2012)Complexity insights of the minimum duplication problemProceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science10.1007/978-3-642-27660-6_13(153-164)Online publication date: 21-Jan-2012
  • (2011)A Note on the Fixed Parameter Tractability of the Gene-Duplication ProblemIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2010.748:3(848-850)Online publication date: 1-May-2011
  • (2010)H-trees: a Model of Evolutionary Scenarios with Horizontal Gene TransferFundamenta Informaticae10.5555/1922521.1922528103:1-4(105-128)Online publication date: 1-Jan-2010

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media