Abstract
Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, the resulting alignments are biased by their guide-trees, especially for relatively distant sequences.
We propose MSARC, a new graph-clustering based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to that of the best progressive methods and substantially higher than that of other non-progressive algorithms. Furthermore, MSARC outperforms all other methods on sequence sets whose evolutionary distances are poorly represented by a phylogenetic tree. These datasets are the most exposed to guide-tree bias in alignments.
MSARC is available at http://bioputer.mimuw.edu.pl/msarc .
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Modzelewski, M., Dojer, N. (2013). MSARC: Multiple Sequence Alignment by Residue Clustering. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_20
DOI: https://doi.org/10.1007/978-3-642-40453-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science (R0)