DOI: 10.1145/3388440.3412416
Research article

Identifying Evolutionary Origins of Repeat Domains in Protein Families

Published: 10 November 2020

Abstract

Arrays of repeat domains are critical to the proper function of a significant fraction of protein families. These repeats are easily identified in sequence, and are thought to have arisen primarily through the simultaneous duplication of multiple domains. However, for most repeat domain protein families, very little is typically known about the specific domain duplication events that occurred in their evolutionary histories. Here we extend existing reconciliation formulations that use domain trees and sequence trees to infer domain duplication and loss events to additionally consider simultaneous domain duplications under arbitrary cost models. We develop a novel integer linear programming (ILP) solution to this reconciliation problem, and demonstrate the accuracy and robustness of our approach on simulated datasets. Finally, as proof of principle, we apply our approach to an orthogroup containing the C2H2 zinc finger repeat domain, and identify simultaneous domain duplications that occurred at the onset of the primate lineage. Simulation and ILP code is available at https://github.com/Singh-Lab/treeSim.
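To make the reconciliation setting concrete, the following is a minimal sketch of classical LCA-based reconciliation, which maps each node of a domain tree onto a sequence tree and labels a node as a duplication when it maps to the same place as one of its children. This is not the ILP formulation described in the abstract: it scores every duplication independently and does not model simultaneous duplications or arbitrary cost models. The nested-tuple tree encoding, the function names, and the toy human/chimp/mouse data are all assumptions made for illustration.

```python
# Trees are nested 2-tuples; leaves are strings (hypothetical encoding).

def leaf_set(tree):
    """Return the set of leaf names under `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def lca(host, targets):
    """Return the smallest subtree of `host` whose leaves include all `targets`."""
    if isinstance(host, str):
        return host
    for child in host:
        if targets <= leaf_set(child):
            return lca(child, targets)
    return host


def count_duplications(domain_tree, host_tree, host_of):
    """Count domain-tree nodes that LCA mapping labels as duplications.

    `host_of` maps each domain-tree leaf to the sequence that contains it.
    """
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return frozenset([host_of[node]])
        left, right = node
        l_hosts, r_hosts = visit(left), visit(right)
        hosts = l_hosts | r_hosts
        here = lca(host_tree, hosts)
        # A node is a duplication if it maps to the same host node as a child.
        if here == lca(host_tree, l_hosts) or here == lca(host_tree, r_hosts):
            dups += 1
        return hosts

    visit(domain_tree)
    return dups


# Toy data: one sequence per species, two domain copies in the human sequence.
host_tree = (("human", "chimp"), "mouse")
host_of = {"h1": "human", "h2": "human", "c1": "chimp", "m1": "mouse"}
domain_tree = ((("h1", "h2"), "c1"), "m1")

print(count_duplications(domain_tree, host_tree, host_of))
```

In the toy input, the only node mapped to the same host as one of its children is the one joining the two human domain copies. Grouping several such events into a single simultaneous duplication is the part the paper's extended formulation addresses and this sketch omits.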

    Published In

    BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
    September 2020
    193 pages
    ISBN:9781450379649
    DOI:10.1145/3388440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. duplications
    2. phylogenetics
    3. protein domains
    4. reconciliation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 254 of 885 submissions, 29%
