[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/369133.369199acmconferencesArticle/Chapter ViewAbstractPublication PagesrecombConference Proceedingsconference-collections
Article

101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem

Published: 22 April 2001 Publication History

Abstract

Structure comparison is a fundamental problem for structural genomics. A variety of structure comparison methods were proposed and several protein structure classification servers e.g., SCOP, DALI, CATH, were designed based on them, and are extensively used in practice. This area of research continues to be very active, being energized bi-annually by the CASP folding competitions, but despite the extraordinary international research effort devoted to it, progress is slow. A fundamental dimension of this bottleneck is the absence of rigorous algorithmic methods. A recent excellent survey on structure comparison by Taylor et.al. [23] records the state of the art of the area: In structure comparison, we do not even have an algorithm that guarantees an optimal answer for pairs of structures …
In this paper we provide the first rigorous algorithm for structure comparison. Our method is based on developing an effective integer linear programming (IP) formulation of protein structure contact maps overlap (CMO), and a branch-and-cut strategy that employs lower-bounding heuristics at the branch nodes. Our algorithms identified a gallery of optimal and near-optimal structure alignments for pairs of proteins from the Protein Data Bank with up to 80 amino acids and about 150 contacts each — problems of instance size of about 300. Although these sizes also reflect our current limitations, these are the first provable optimal and near-optimal algorithms in the literature for a measure of structure similarity which sees extensive practical use. At the heart of our success in finding optimal alignments is a reduction of the CMO optimization to the maximum independent set (MIS) problem on special graphs. For CMO instances of size 300, the corresponding MIS graph instance contains about 10,000 nodes. While our algorithms are able to solve to optimality MIS problem of these sizes, the known optimal algorithms for the MIS on general graphs can at present only solve instances with up to a few hundred nodes. This is the first effective use of IP methods in protein structure comparison; the biomolecular structure literature contains only one other effective IP method devoted to RNA comparison, due to Lenhof et.al. [18].
The hybrid heuristic approach that worked well for providing lower bounds in the branch and cut algorithm was tried on large proteins in a test set suggested by Jeffrey Skolnick. It involved 33 proteins classified into four families: Flavodoxin-like fold CheY-related, Plastocyanin, TIM Barrel, and Ferratin. Out of the set of all 528 pairwise structure alignments, we have validated the clustering with a 98.7% accuracy (1.3% false negatives and 0% false positives).

References

[1]
E. Balas and C. S. Yu, Finding a maximum clique in an arbitrary graph, SIAM J. on Comp., 15(4) :1054-1068, 1986.
[2]
H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank, Nucleic Acids Research, 28 pp. 235-242, 2000.
[3]
W. J. Cook, W. H. Cunningham, W. R. Pulleyblank and A. Schrijver, Combinatorial Optimization, John Wiley and Sons, New York, 1998.
[4]
P. Crescenzi and V. Kann, A compendium of NP optimization problems, http ://www. nada. kth. se/~viggo, the web.
[5]
M. Grotschel, L. Lov~sz and A. Schrijver, "The Ellipsoid Method and its Consequences in Combinatorial Optimization", Combinatorica 1 (1981), 169-197.
[6]
A. Godzik, The structural alignment between two proteins: Is there a unique answer ?, Protein Science, 5:1325-1338, 1996.
[7]
A. Godzik, J. Sklonick and A. Kolinski, A topology fingerprint approach to inverse protein folding problem, J. Mol. Bio1.,227:227-238, 1992.
[8]
A. Godzik and J. Skolnick, Flexible algorithm for direct multiple alignment of protein structures and sequences, CABIOS, 10, (6) 587-596, 1994.
[9]
Garey and Johnson, Computers and intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
[10]
D. Goldman, S. Istrail and C. Papadimitriou, Algorithmic Aspects of Protein Structure Similarity, Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 512-522, 1999.
[11]
D. Goldman, PhD. Thesis, Dept. of Computer Science, U C Berkeley, 2000.
[12]
A. Lucas, K. Dill and S. Istrail, Contact maps and the computational statistical mechanics aspects of protein folding (in preparation).
[13]
R. B. Hayward, Wealky Triangulated Graphs, J. of Comb. Theory, Series B, (39)200-209, 1985.
[14]
R.B. Hayward, C. Hoang and F. Maffray, Optimizing Wealky Triangulated Graphs, Graphs and Combinatorics, 1987.
[15]
L. Holm and C. Sander, 3-D lookup: fast protein structure searches at 90% reliability, Proceedings of the ISMB 1995, p. 179-187, AAAI., 1995.
[16]
D. S. Johnson and M. A. Trick eds, Cliques, Coloring, and Satisfiability, Dimacs Series in Discrete Mathematics and Theoretical Computer Science, the American Mathematical Society, 1996.
[17]
Kabash-W., A solution for the best rotation to relate two sets of vectors, Acta Cryst. A32, 922-923, 1978.
[18]
H. P. Lenhof, K. Reinert, M. Vingron, A Polyhedral Approach to RNA Sequence Structure Alignment, J. Comp. Biol., 5(3):517-530, 1998.
[19]
A. Lesk, 11th Lipari International Summer School in Computational Biology, 1999.
[20]
G. L. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, J. Wiley and Sons, 1988.
[21]
G. L. Nemhauser and L. E. Trotter, Vertex packings: Structural properties and algorithms, Mathematical Programming, 8:232-248, 1975.
[22]
A. Raghunathan, Algorithms for Weakly Triangulated Graphs, UC. Berkeley, Tech. Rep. CSD-89-503, 1989.
[23]
I. Eidhammer and I. Jonassen and W. R. Taylor, Structure Comparison and Structure Prediction, to appear J. Comp. Biol., x(x), 2000.
[24]
D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading: Addison-Wesley, 1989.
[25]
J. H. Holland, Adaptation in Natural and Artificial Systems, Cambridge, MA: MIT Press, 1992.

Cited By

View all
  • (2021)A COVID-19 Drug Repurposing Strategy through Quantitative Homological Similarities Using a Topological Data Analysis-Based FrameworkPharmaceutics10.3390/pharmaceutics1304048813:4(488)Online publication date: 2-Apr-2021
  • (2017)Adaptive distributed modified extremal optimisation for maximising contact map overlap and its performance evaluationInternational Journal of Computational Intelligence Studies10.1504/IJCISTUDIES.2017.0895186:4(288-310)Online publication date: 1-Jan-2017
  • (2017)Combinatorics of Contacts in Protein Contact MapsBulletin of Mathematical Biology10.1007/s11538-017-0380-480:2(385-403)Online publication date: 11-Dec-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
RECOMB '01: Proceedings of the fifth annual international conference on Computational biology
April 2001
316 pages
ISBN:1581133537
DOI:10.1145/369133
  • Chairman:
  • Thomas Lengauer
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2001

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

RECOMB01
Sponsor:

Acceptance Rates

RECOMB '01 Paper Acceptance Rate 35 of 128 submissions, 27%;
Overall Acceptance Rate 148 of 538 submissions, 28%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)1
Reflects downloads up to 21 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2021)A COVID-19 Drug Repurposing Strategy through Quantitative Homological Similarities Using a Topological Data Analysis-Based FrameworkPharmaceutics10.3390/pharmaceutics1304048813:4(488)Online publication date: 2-Apr-2021
  • (2017)Adaptive distributed modified extremal optimisation for maximising contact map overlap and its performance evaluationInternational Journal of Computational Intelligence Studies10.1504/IJCISTUDIES.2017.0895186:4(288-310)Online publication date: 1-Jan-2017
  • (2017)Combinatorics of Contacts in Protein Contact MapsBulletin of Mathematical Biology10.1007/s11538-017-0380-480:2(385-403)Online publication date: 11-Dec-2017
  • (2016)Contact map overlap maximization using adaptive distributed modified extremal optimization2016 IEEE 9th International Workshop on Computational Intelligence and Applications (IWCIA)10.1109/IWCIA.2016.7805754(87-92)Online publication date: Nov-2016
  • (2016) Enumeration of Extended m -Regular Linear Stacks Journal of Computational Biology10.1089/cmb.2016.004123:12(943-956)Online publication date: Dec-2016
  • (2016)Regular Simple Queues of Protein Contact MapsBulletin of Mathematical Biology10.1007/s11538-016-0212-y79:1(21-35)Online publication date: 14-Nov-2016
  • (2015)Amplitude spectrum distance: measuring the global shape divergence of protein fragmentsBMC Bioinformatics10.1186/s12859-015-0693-y16:1Online publication date: 14-Aug-2015
  • (2015)A new distributed modified extremal optimization for optimizing protein structure alignment2015 IEEE 8th International Workshop on Computational Intelligence and Applications (IWCIA)10.1109/IWCIA.2015.7449472(109-114)Online publication date: Nov-2015
  • (2015)Protein structure similarity using recurrence relation technique2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)10.1109/CIBCB.2015.7300329(1-7)Online publication date: Aug-2015
  • (2015)Deriving compact extended formulations via LP-based separation techniquesAnnals of Operations Research10.1007/s10479-015-2012-4240:1(321-350)Online publication date: 22-Sep-2015
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media