DOI: 10.1145/3233547.3233592
research-article
Open access

Splice-Aware Multiple Sequence Alignment of Protein Isoforms

Published: 15 August 2018

Abstract

Multiple sequence alignment (MSA) is a classic problem in computational genomics. In typical use, MSA software is expected to align a collection of homologous genes, such as orthologs from multiple species or duplication-induced paralogs within a species. Recent focus on the importance of alternatively spliced isoforms in disease and cell biology has highlighted the need to create MSAs that more effectively accommodate isoforms. MSAs are traditionally constructed using scoring criteria that prefer alignments with occasional mismatches over alignments with long gaps. Alternatively spliced protein isoforms effectively contain exon-length insertions or deletions (indels) relative to each other, and demand an alternative approach. Some improvements can be achieved by making indel penalties much smaller, but this is merely a patchwork solution. In this work we present Mirage, a novel MSA software package for the alignment of alternatively spliced protein isoforms. Mirage aligns isoforms to each other by first mapping each protein sequence to its encoding genomic sequence, and then aligning isoforms to one another based on the relative genomic coordinates of their constitutive codons. Mirage is highly effective at mapping proteins back to their encoding exons, and these protein-genome mappings lead to extremely accurate intra-species alignments; splice site information in these alignments is used to improve the accuracy of inter-species alignments of isoforms. Mirage alignments have also revealed the ubiquity of dual-coding exons, in which an exon conditionally encodes multiple open reading frames as overlapping spliced segments of frame-shifted genomic sequence.
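
The coordinate-based idea described in the abstract can be illustrated with a small Python sketch. This is not Mirage's implementation. It assumes the protein-to-genome mapping has already been computed, that all isoforms lie on the forward strand of one gene, and that each codon is identified only by the genomic coordinate of its first nucleotide. The function name `align_by_genomic_coordinate`, the input layout, and the toy coordinates are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: each isoform is a list of
# (amino_acid, codon_start) pairs, where codon_start is the genomic
# coordinate of the first nucleotide of the codon (forward strand assumed).
Mapping = List[Tuple[str, int]]


def align_by_genomic_coordinate(
    isoforms: Dict[str, Mapping], gap: str = "-"
) -> Dict[str, str]:
    """Place residues that share a codon coordinate in the same column."""
    # One alignment column per distinct codon start, in genomic order.
    columns = sorted({pos for mapping in isoforms.values() for _, pos in mapping})
    column_index = {pos: i for i, pos in enumerate(columns)}

    aligned: Dict[str, str] = {}
    for name, mapping in isoforms.items():
        row = [gap] * len(columns)  # start with an all-gap row
        for residue, pos in mapping:
            row[column_index[pos]] = residue
        aligned[name] = "".join(row)
    return aligned


# Toy gene with three exons; the second isoform skips the middle exon.
toy_isoforms = {
    "full": [("M", 100), ("K", 103), ("T", 106),
             ("A", 200), ("L", 203),
             ("G", 300), ("S", 303), ("W", 306)],
    "skip": [("M", 100), ("K", 103), ("T", 106),
             ("G", 300), ("S", 303), ("W", 306)],
}

for name, row in align_by_genomic_coordinate(toy_isoforms).items():
    print(name, row)
# full MKTALGSW
# skip MKT--GSW
```

In this sketch the skipped exon appears as one contiguous gap, with no gap penalty involved, because column placement depends only on genomic position. Codons in different reading frames of the same exon have different start coordinates, so they fall into separate columns. Codons split across a splice junction and reverse-strand genes would need extra handling that the sketch leaves out.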

Published In

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. alternative splicing
  2. dual-coding exons
  3. multiple sequence alignment
  4. protein isoforms

Acceptance Rates

BCB '18 paper acceptance rate: 46 of 148 submissions (31%)
Overall acceptance rate: 254 of 885 submissions (29%)
