Identifying Evolutionary Origins of Repeat Domains in Protein Families

Authors:

Chaitanya Aluru,

Mona Singh

BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Article No.: 2, Pages 1 - 11

https://doi.org/10.1145/3388440.3412416

Published: 10 November 2020

Abstract

Arrays of repeat domains are critical to the proper function of a significant fraction of protein families. These repeats are easily identified in sequence, and are thought to have arisen primarily through the simultaneous duplication of multiple domains. However, for most repeat domain protein families, very little is typically known about the specific domain duplication events that occurred in their evolutionary histories. Here we extend existing reconciliation formulations that use domain trees and sequence trees to infer domain duplication and loss events to additionally consider simultaneous domain duplications under arbitrary cost models. We develop a novel integer linear programming (ILP) solution to this reconciliation problem, and demonstrate the accuracy and robustness of our approach on simulated datasets. Finally, as proof of principle, we apply our approach to an orthogroup containing the C2H2 zinc finger repeat domain, and identify simultaneous domain duplications that occurred at the onset of the primate lineage. Simulation and ILP code is available at https://github.com/Singh-Lab/treeSim.
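
As a rough illustration of the kind of reconciliation the abstract builds on, the sketch below counts duplication nodes in a domain tree by mapping each domain-tree node to the lowest common ancestor (LCA) of its leaves in the sequence tree. This is the classic LCA reconciliation idea, not the authors' ILP: it ignores losses, cost models, and simultaneous duplications. The nested-tuple tree encoding and the names `host_paths`, `lca`, and `count_duplications` are assumptions made for this example and do not come from the treeSim repository.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]
Path = Tuple[int, ...]


def host_paths(host: Tree) -> Dict[str, Path]:
    """Map each host (sequence tree) leaf to its root-to-leaf path of child indices."""
    paths: Dict[str, Path] = {}

    def walk(node: Tree, path: Path) -> None:
        if isinstance(node, str):
            paths[node] = path
        else:
            for i, child in enumerate(node):
                walk(child, path + (i,))

    walk(host, ())
    return paths


def lca(p: Path, q: Path) -> Path:
    """LCA of two host nodes, each identified by its path from the root."""
    common = []
    for a, b in zip(p, q):
        if a != b:
            break
        common.append(a)
    return tuple(common)


def count_duplications(domain_tree: Tree, host: Tree,
                       leaf_to_host: Dict[str, str]) -> int:
    """Count binary domain-tree nodes that LCA reconciliation labels as duplications.

    A node is called a duplication when it maps to the same host node as
    at least one of its children; otherwise it is treated as a co-divergence.
    """
    paths = host_paths(host)
    dups = 0

    def visit(node: Tree) -> Path:
        nonlocal dups
        if isinstance(node, str):
            return paths[leaf_to_host[node]]
        left, right = (visit(child) for child in node)  # assumes a binary domain tree
        here = lca(left, right)
        if here == left or here == right:
            dups += 1
        return here

    visit(domain_tree)
    return dups


# Toy example: one domain duplication that predates all three sequences.
host = (("human", "chimp"), "mouse")
domain_tree = ((("h1", "c1"), "m1"), (("h2", "c2"), "m2"))
leaf_to_host = {"h1": "human", "h2": "human", "c1": "chimp",
                "c2": "chimp", "m1": "mouse", "m2": "mouse"}
print(count_duplications(domain_tree, host, leaf_to_host))
```

In the toy input only the domain-tree root maps to the same sequence-tree node as its children, so it is the single inferred duplication. Grouping several such duplications into one simultaneous event is the extension the paper formulates as an ILP, and it is outside the scope of this sketch.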

Identifying Evolutionary Origins of Repeat Domains in Protein Families
1. Applied computing
  1. Life and medical sciences

Information & Contributors

Information

Published In

BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 2020

193 pages

ISBN: 9781450379649

DOI: 10.1145/3388440

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Institutes of Health

Conference

BCB '20

Sponsor:

SIGBio

BCB '20: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

September 21 - 24, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
