[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3321707.3321773acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
research-article

A 'phylogeny-aware' multi-objective optimization approach for computing MSA

Published: 13 July 2019 Publication History

Abstract

Multiple sequence alignment (MSA) is a basic step in many analyses in bioinformatics, including predicting the structure and function of proteins, orthology prediction and estimating phylogenies. The objective of MSA is to infer the homology among the sequences of chosen species. Commonly, the MSAs are inferred by optimizing a single objective function. The alignments estimated under one criterion may be different to the alignments generated by other criteria, inferring discordant homologies and thus leading to different evolutionary histories relating the sequences. In the recent past, researchers have advocated for the multi-objective formulation of MSA, to address this issue, where multiple conflicting objective functions are being optimized simultaneously to generate a set of alignments. However, no theoretical or empirical justification with respect to a real-life application has been shown for a particular multi-objective formulation. In this study, we investigate the impact of multi-objective formulation in the context of phylogenetic tree estimation. In essence, we ask the question whether a phylogeny-aware metric can guide us in choosing appropriate multi-objective formulations. Employing evolutionary optimization, we demonstrate that trees estimated on the alignments generated by multi-objective formulation are substantially better than the trees estimated by the state-of-the-art MSA tools, including PASTA, T-Coffee, MAFFT etc.

Supplementary Material

PDF File (p577-nayeem_suppl.pdf)
Supplemental material.

References

[1]
Maryam Abbasi, Luís Paquete, Arnaud Liefooghe, Miguel Pinheiro, and Pedro Matias. 2013. Improvements on bicriteria pairwise sequence alignment: algorithms and applications. Bioinformatics 29, 8 (2013), 996--1003.
[2]
Maryam Abbasi, Luís Paquete, and Francisco B Pereira. 2015. Local search for multiobjective multiple sequence alignment. In International Conference on Bioinformatics and Biomedical Engineering. Springer, 175--182.
[3]
Robert K Bradley, Adam Roberts, Michael Smoot, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Ian Holmes, and Lior Pachter. 2009. Fast statistical alignment. PLoS computational biology 5, 5 (2009), e1000392.
[4]
Fernando José Mateus da Silva, Juan Manuel Sánchez Pérez, Juan Antonio Gómez Pulido, and Miguel A Vega Rodríguez. 2010. AlineaGA---a genetic algorithm with local search optimization for multiple sequence alignment. Applied Intelligence 32, 2 (2010), 164--172.
[5]
Kalyanmoy Deb and Himanshu Jain. 2014. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Trans. Evolutionary Computation 18, 4 (2014), 577--601.
[6]
Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation 6, 2 (2002), 182--197.
[7]
Joaquín Derrac, Salvador García, Daniel Molina, and Francisco Herrera. 2011. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation 1, 1 (2011), 3--18.
[8]
Chuong B Do, Mahathi SP Mahabhashyam, Michael Brudno, and Serafim Batzoglou. 2005. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome research 15, 2 (2005), 330--340.
[9]
Robert C Edgar. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32, 5 (2004), 1792--1797.
[10]
Milton Friedman. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the american statistical association 32, 200 (1937), 675--701.
[11]
Steven Henikoff and Jorja G Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89, 22 (1992), 10915--10919.
[12]
Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics (1979), 65--70.
[13]
Deb Kalyanmoy. 2001. Multi objective optimization using evolutionary algorithms. John Wiley and Sons.
[14]
Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma, and Takashi Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30, 14 (2002), 3059--3066.
[15]
Timo Lassmann, Oliver Frings, and Erik LL Sonnhammer. 2008. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic acids research 37, 3 (2008), 858--865.
[16]
Kevin Liu, Sindhu Raghavan, Serita Nelesen, C Randal Linder, and Tandy Warnow. 2009. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324, 5934 (2009), 1561--1564.
[17]
Yongchao Liu, Bertil Schmidt, and Douglas L Maskell. 2010. MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26, 16 (2010), 1958--1964.
[18]
Ari Löytynoja and Nick Goldman. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National academy of sciences of the United States of America 102, 30 (2005), 10557--10562.
[19]
Siavash Mirarab, Nam Nguyen, Sheng Guo, Li-San Wang, Junhyong Kim, and Tandy Warnow. 2015. PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences. Journal of Computational Biology 22, 5 (2015), 377--386.
[20]
Douglas C Montgomery, Elizabeth A Peck, and G Geoffrey Vining. 2012. Introduction to linear regression analysis. Vol. 821. John Wiley & Sons.
[21]
Cédric Notredame, Desmond G Higgins, and Jaap Heringa. 2000. T-coffee: a novel method for fast and accurate multiple sequence alignment1. Journal of molecular biology 302, 1 (2000), 205--217.
[22]
Francisco M Ortuño, Olga Valenzuela, Fernando Rojas, Hector Pomares, Javier P Florido, Jose M Urquiza, and Ignacio Rojas. 2013. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 29, 17 (2013), 2112--2121.
[23]
Jimin Pei and Nick V Grishin. 2006. MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic acids research 34, 16 (2006), 4364--4374.
[24]
Usman Roshan and Dennis R Livesay. 2006. Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 22 (2006), 2715--2721.
[25]
Álvaro Rubio-Largo, Leonardo Vanneschi, Mauro Castelli, and Miguel A Vega-Rodríguez. 2018. A characteristic-based framework for multiple sequence aligners. IEEE transactions on cybernetics 48, 1 (2018), 41--51.
[26]
Álvaro Rubio-Largo, Miguel A Vega-Rodríguez, and David L González-Álvarez. 2016. A hybrid multiobjective memetic metaheuristic for multiple sequence alignment. IEEE Transactions on Evolutionary Computation 20, 4 (2016), 499--514.
[27]
David J Sheskin. 2003. Handbook of parametric and nonparametric statistical procedures. crc Press.
[28]
Fabian Sievers, Andreas Wilm, David Dineen, Toby J Gibson, Kevin Karplus, Weizhong Li, Rodrigo Lopez, Hamish McWilliam, Michael Remmert, Johannes Söding, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology 7, 1 (2011), 539.
[29]
Wilson Soto and David Becerra. 2014. A multi-objective evolutionary algorithm for improving multiple sequence alignments. In Brazilian Symposium on Bioinformatics. Springer, 73--82.
[30]
Adrienn Szabó, Ádám Novák, István Miklós, and Jotun Hein. 2010. Reticular alignment: A progressive corner-cutting method for multiple sequence alignment. BMC bioinformatics 11, 1 (2010), 570.
[31]
Julie D Thompson, Desmond G Higgins, and Toby J Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research 22, 22 (1994), 4673--4680.
[32]
Julie D Thompson, Patrice Koehl, Raymond Ripp, and Olivier Poch. 2005. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins: Structure, Function, and Bioinformatics 61, 1 (2005), 127--136.
[33]
Julie D Thompson, Benjamin Linard, Odile Lecompte, and Olivier Poch. 2011. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PloS one 6, 3 (2011), e18093.
[34]
Tandy Warnow. 2013. Large-scale multiple sequence alignment and phylogeny estimation. In Models and algorithms for genome evolution. Springer, 85--146.
[35]
Cristian Zambrano-Vega, Antonio J Nebro, José García-Nieto, and José F Aldana-Montes. 2017. Comparing multi-objective metaheuristics for solving a three-objective formulation of multiple sequence alignment. Progress in Artificial Intelligence (2017), 1--16.
[36]
Cristian Zambrano-Vega, Antonio J Nebro, José García-Nieto, and Jose F Aldana-Montes. 2017. M2Align: parallel multiple sequence alignment with a multi-objective metaheuristic. Bioinformatics 33, 19 (2017), 3011--3017.
[37]
Cristian Zambrano-Vega, Antonio J Nebro, José García-Nieto, and José F Aldana-Montes. 2017. A Multi-objective Optimization Framework for Multiple Sequence Alignment with Metaheuristics. In International Conference on Bioinformatics and Biomedical Engineering. Springer, 245--256.

Cited By

View all
  • (2022)Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny InferenceIEEE Transactions on Cybernetics10.1109/TCYB.2020.302030852:5(2775-2786)Online publication date: May-2022
  • (2021)Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree TerracesSystematic Biology10.1093/sysbio/syab026Online publication date: 12-Apr-2021
  • (2021)Biological computation and computational biology: survey, challenges, and discussionArtificial Intelligence Review10.1007/s10462-020-09951-154:6(4169-4235)Online publication date: 27-Jan-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
July 2019
1545 pages
ISBN:9781450361118
DOI:10.1145/3321707
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. evolutionary multiobjective optimization
  2. multiple sequence alignment
  3. phylogenetic tree

Qualifiers

  • Research-article

Funding Sources

  • ICT Division, Government of People?s Republic of Bangladesh

Conference

GECCO '19
Sponsor:
GECCO '19: Genetic and Evolutionary Computation Conference
July 13 - 17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)38
  • Downloads (Last 6 weeks)3
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny InferenceIEEE Transactions on Cybernetics10.1109/TCYB.2020.302030852:5(2775-2786)Online publication date: May-2022
  • (2021)Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree TerracesSystematic Biology10.1093/sysbio/syab026Online publication date: 12-Apr-2021
  • (2021)Biological computation and computational biology: survey, challenges, and discussionArtificial Intelligence Review10.1007/s10462-020-09951-154:6(4169-4235)Online publication date: 27-Jan-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media