Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations

PLoS Comput Biol. 2016 Dec 21;12(12):e1005294. doi: 10.1371/journal.pcbi.1005294. eCollection 2016 Dec.

Authors

Affiliations

¹ Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, Baltimore, MD, United States of America.
² National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America.

Abstract

Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high-dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this approach reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this approach can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
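
As a rough illustration of what "correlations that arise from residue patterns distinct to each subgroup" can look like in data, the sketch below computes the mutual information between pairs of columns in a toy alignment. This is not the hiHMM/MCMC method described in the abstract, which models such correlations implicitly; it is a much simpler stand-in. The alignment, the function names `column` and `mutual_information`, and the choice of mutual information as the statistic are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Uses plain empirical frequencies with no pseudocounts or corrections,
    so it is only meaningful as a toy demonstration.
    """
    n = len(alignment)
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical alignment: two "subgroups" of three sequences each.
# Columns 0 and 1 co-vary with subgroup (R,K versus Q,E),
# column 2 is invariant, and column 3 varies without a clear subgroup pattern.
toy_alignment = [
    "RKDA",
    "RKDG",
    "RKDA",
    "QEDG",
    "QEDA",
    "QEDG",
]

if __name__ == "__main__":
    for i, j in [(0, 1), (0, 2), (0, 3)]:
        print(f"columns {i},{j}: {mutual_information(toy_alignment, i, j):.3f} bits")
```

In this toy case the two subgroup-defining columns share the most information, while the invariant column shares none with any other column. Pairwise statistics of this kind ignore the hierarchical subgroup structure that the hiHMM is meant to capture.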

Publication types

Research Support, N.I.H., Intramural

MeSH terms

Acetyltransferases / chemistry*
Acetyltransferases / genetics
Acetyltransferases / metabolism
Amino Acid Sequence
Animals
Caenorhabditis elegans Proteins / chemistry
Caenorhabditis elegans Proteins / genetics
Caenorhabditis elegans Proteins / metabolism
Computational Biology
Humans
Markov Chains
Models, Molecular*
Monte Carlo Method
Sequence Alignment / methods
Sequence Analysis, Protein / methods*

Substances

Caenorhabditis elegans Proteins
Acetyltransferases

Grants and funding

SFA was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. AFN received no specific funding for this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.