Predicting Core Columns of Protein Multiple Sequence Alignments for Improved Parameter Advising

Dan DeBlasio &
John Kececioglu

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9838)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics


Abstract

In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the folded three-dimensional structures of the proteins. When computing a protein multiple sequence alignment in practice, a reference alignment is not known, so the coreness of its columns can only be predicted.
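The definition of coreness above can be illustrated with a small sketch. It assumes a hypothetical representation in which a column is a list whose entry for each sequence is the index of the residue placed in that column, or `None` for a gap, and it treats a substitution as a pair of residues aligned in the same column. The function name `column_coreness` and this data layout are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def column_coreness(column, core_columns):
    """Fraction of residue pairs in `column` that also share a reference core column.

    column:       list; column[i] is the residue index of sequence i, or None for a gap
    core_columns: list of reference core columns in the same representation
    (Both layouts are assumptions made for this sketch.)
    """
    # Map each residue (sequence, position) to the core column that contains it.
    core_of = {}
    for c, ref_col in enumerate(core_columns):
        for seq, pos in enumerate(ref_col):
            if pos is not None:
                core_of[(seq, pos)] = c

    # A substitution is taken to be a pair of residues aligned in this column.
    residues = [(seq, pos) for seq, pos in enumerate(column) if pos is not None]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0  # a column with fewer than two residues has no substitutions

    in_core = sum(
        1 for a, b in pairs
        if a in core_of and core_of.get(a) == core_of.get(b)
    )
    return in_core / len(pairs)
```

Under this sketch, a computed column that reproduces a core column exactly has coreness 1, while a column that misplaces one of three residues keeps only one of its three substitutions and has coreness 1/3.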

We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment’s accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner’s scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.
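A rough sketch of the two ideas in this paragraph follows: turning a nearest-neighbor distance into a coreness prediction, and using the prediction to advise on a parameter choice. Everything specific here is an assumption for illustration. The feature vectors, the fixed `weights` (the paper learns its distance function by linear programming, which is not shown), the logistic form of the transform with its `midpoint` and `steepness`, and the use of mean predicted coreness as the accuracy estimate are all stand-ins, not the authors' method.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted L1 distance between two column feature vectors (assumed form)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def predict_coreness(x, core_examples, weights, midpoint=1.0, steepness=4.0):
    """Map the distance to the nearest known core example to a value in (0, 1).

    The logistic transform and its parameters are illustrative assumptions;
    a smaller distance gives a prediction closer to 1.
    """
    d = min(weighted_distance(x, e, weights) for e in core_examples)
    return 1.0 / (1.0 + math.exp(steepness * (d - midpoint)))

def advise(candidates, core_examples, weights):
    """Return the parameter choice whose alignment has the highest estimate.

    candidates maps a parameter-choice label to the list of column feature
    vectors of the alignment computed with that choice (hypothetical layout).
    """
    def estimate(columns):
        scores = [predict_coreness(c, core_examples, weights) for c in columns]
        return sum(scores) / len(scores)

    return max(candidates, key=lambda name: estimate(candidates[name]))
```

For example, given two candidate alignments whose columns are summarized by two features each, `advise` returns the label of the candidate whose columns lie closer, on average, to the known core examples.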



Acknowledgement

This research was supported by NSF grant IIS-1217886 to J.K.

Author information

Authors and Affiliations

Department of Computer Science, The University of Arizona, Tucson, AZ, 85721, USA
Dan DeBlasio & John Kececioglu

Authors

Dan DeBlasio
John Kececioglu

Corresponding author

Correspondence to Dan DeBlasio.

Editor information

Editors and Affiliations

AIST and University of Tokyo, Tokyo, Japan
Martin Frith
Aarhus University, Aarhus, Denmark
Christian Nørgaard Storm Pedersen

About this paper

Cite this paper

DeBlasio, D., Kececioglu, J. (2016). Predicting Core Columns of Protein Multiple Sequence Alignments for Improved Parameter Advising. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science, vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_7

DOI: https://doi.org/10.1007/978-3-319-43681-4_7
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43680-7
Online ISBN: 978-3-319-43681-4
eBook Packages: Computer Science, Computer Science (R0)
