Protein Design by Sampling an Undirected Graphical Model of Residue Constraints

Published: 01 July 2009

Abstract

This paper develops an approach for designing protein variants by sampling sequences that satisfy residue constraints encoded in an undirected probabilistic graphical model. Due to evolutionary pressures on proteins to maintain structure and function, the sequence record of a protein family contains valuable information regarding position-specific residue conservation and coupling (or covariation) constraints. Representing these constraints with a graphical model provides two key benefits for protein design: a probabilistic semantics enabling evaluation of possible sequences for consistency with the constraints, and an explicit factorization of residue dependence and independence supporting efficient exploration of the constrained sequence space. We leverage these benefits in developing two complementary MCMC algorithms for protein design: constrained shuffling mixes wild-type sequences positionwise and evaluates graphical model likelihood, while component sampling directly generates sequences by sampling clique values and propagating to other cliques. We apply our methods to design WW domains. We demonstrate that likelihood under a model of wild-type WWs is highly predictive of foldedness of new WWs. We then show both theoretical and rapid empirical convergence of our algorithms in generating high-likelihood, diverse new sequences. We further show that these sequences capture the original sequence constraints, yielding a model as predictive of foldedness as the original one.
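The constrained shuffling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the model is pairwise, the potentials are smoothed frequencies estimated from an alignment, and the score is an unnormalized log-score rather than a true likelihood. The names `fit_model`, `score`, and `constrained_shuffle`, the `edges` list, and the pseudocount are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def fit_model(msa, edges, pseudo=1.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Estimate smoothed per-position and per-edge log-frequencies from an alignment.

    Assumption: `edges` lists the coupled position pairs (i, j); how the graph
    is learned is outside the scope of this sketch.
    """
    n, k = len(msa), len(alphabet)
    node = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        node.append({a: math.log((counts[a] + pseudo) / (n + pseudo * k))
                     for a in alphabet})
    edge = {}
    for i, j in edges:
        counts = Counter((seq[i], seq[j]) for seq in msa)
        edge[(i, j)] = {(a, b): math.log((counts[(a, b)] + pseudo) / (n + pseudo * k * k))
                        for a in alphabet for b in alphabet}
    return node, edge


def score(seq, node, edge):
    """Unnormalized log-score: conservation terms plus coupling corrections."""
    total = sum(node[i][seq[i]] for i in range(len(seq)))
    for (i, j), table in edge.items():
        # Coupling term: log f_ij(a, b) - log f_i(a) - log f_j(b)
        total += table[(seq[i], seq[j])] - node[i][seq[i]] - node[j][seq[j]]
    return total


def constrained_shuffle(pool, node, edge, steps, rng):
    """Metropolis-style shuffling: swap one position between two sequences.

    Each column keeps its residue composition, because residues are only
    exchanged between sequences at the same position.
    """
    pool = [list(s) for s in pool]
    for _ in range(steps):
        a, b = rng.sample(range(len(pool)), 2)
        i = rng.randrange(len(pool[0]))
        before = score(pool[a], node, edge) + score(pool[b], node, edge)
        pool[a][i], pool[b][i] = pool[b][i], pool[a][i]
        delta = score(pool[a], node, edge) + score(pool[b], node, edge) - before
        # Accept improvements always; accept worse swaps with probability exp(delta).
        if delta < 0 and rng.random() >= math.exp(delta):
            pool[a][i], pool[b][i] = pool[b][i], pool[a][i]  # reject: undo the swap
    return ["".join(s) for s in pool]


if __name__ == "__main__":
    toy_msa = ["ACDA", "ACDA", "GCEA", "GCEC"]  # toy data, not WW domains
    node, edge = fit_model(toy_msa, edges=[(0, 2)], alphabet="ACDEG")
    print(constrained_shuffle(toy_msa, node, edge, steps=200, rng=random.Random(0)))
```

Because the proposal is a symmetric swap, the Metropolis acceptance rule is enough here. A component-sampling variant would instead draw residue values for one clique and propagate to neighboring cliques, which this sketch does not attempt.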

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 6, Issue 3
July 2009
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Markov chain Monte Carlo (MCMC)
  2. protein design
  3. graphical models
  4. residue coupling
