Abstract
Fitness landscapes[1,2] depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology[2–7], yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness[2,4], experimentally assessing the effect on function of single mutations and their combinations in a specific sequence[2,5,8–15] or in different sequences[2,3,5,16–18]. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow, with 3/4 of the derivatives with a single mutation showing reduced fluorescence and half of the derivatives with four mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly occurred through the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for several fields including molecular evolution, population genetics and protein design.
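The threshold-like mechanism described in the abstract can be illustrated with a toy two-state folding model. This is a minimal sketch under stated assumptions, not the model fitted in the paper: the wild-type folding free energy (-3 kcal/mol), the per-mutation ΔΔG values and the rule that fluorescence is proportional to the folded fraction are all hypothetical choices. Because ΔΔG values are summed on the free-energy scale while fluorescence follows a sigmoid of that sum, two individually mild mutations can together cost far more log-fluorescence than their separate effects predict, which shows up as negative epistasis.

```python
import math

# Toy two-state folding model (illustrative assumption, not the authors' fitted model).
RT = 0.593  # kcal/mol, approximately R*T at 298 K

def fraction_folded(dg_fold):
    """Folded fraction for a folding free energy dg_fold (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))

def fluorescence(ddgs, dg_wt=-3.0):
    """Fluorescence relative to wild type, assuming it is proportional to the
    folded fraction and that the mutations' hypothetical ddG values simply add."""
    dg = dg_wt + sum(ddgs)
    return fraction_folded(dg) / fraction_folded(dg_wt)

def log_epistasis(ddg_a, ddg_b, dg_wt=-3.0):
    """Observed minus expected natural-log fluorescence of a double mutant,
    where 'expected' adds the two single-mutant log effects."""
    f_a = fluorescence([ddg_a], dg_wt)
    f_b = fluorescence([ddg_b], dg_wt)
    f_ab = fluorescence([ddg_a, ddg_b], dg_wt)
    return math.log(f_ab) - (math.log(f_a) + math.log(f_b))

if __name__ == "__main__":
    # Accumulate identical mildly destabilising mutations (1 kcal/mol each).
    for k in range(7):
        print(k, round(fluorescence([1.0] * k), 3))
    print("epistasis of two 1.5 kcal/mol mutations:", round(log_epistasis(1.5, 1.5), 3))
```

With ΔΔG = 1 kcal/mol per mutation, relative fluorescence stays above 0.8 for one or two mutations, drops to about 0.5 at three (when the summed ΔΔG equals the assumed 3 kcal/mol stability margin) and falls below 0.2 at four.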
References
Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366 (1932)
de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nature Rev. Genet. 15, 480–490 (2014)
Dean, A. M. & Thornton, J. W. Mechanistic approaches to the study of evolution: the functional synthesis. Nature Rev. Genet. 8, 675–688 (2007)
Soskine, M. & Tawfik, D. S. Mutational effects and the evolution of new protein functions. Nature Rev. Genet. 11, 572–582 (2010)
Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013)
Mackay, T. F. C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nature Rev. Genet. 15, 22–33 (2014)
Taylor, M. B. & Ehrenreich, I. M. Higher-order genetic interactions and their contribution to complex traits. Trends Genet. 31, 34–40 (2015)
Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006)
Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nature Methods 7, 741–746 (2010)
Roscoe, B. P., Thayer, K. M., Zeldovich, K. B., Fushman, D. & Bolon, D. N. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J. Mol. Biol. 425, 1363–1377 (2013)
Jacquier, H. et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proc. Natl Acad. Sci. USA 110, 13067–13072 (2013)
Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19, 1537–1551 (2013)
Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651 (2014)
Bank, C., Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A systematic survey of an intragenic epistatic landscape. Mol. Biol. Evol. 32, 229–238 (2015)
Meini, M. R., Tomatis, P. E., Weinreich, D. M. & Vila, A. J. Quantitative description of a protein fitness landscape based on molecular features. Mol. Biol. Evol. 32, 1774–1787 (2015)
Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky–Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)
Firnberg, E., Labonte, J. W., Gray, J. J. & Ostermeier, M. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol. Biol. Evol. 31, 1581–1592 (2014)
Parera, M. & Martinez, M. A. Strong epistatic interactions within a single protein. Mol. Biol. Evol. 31, 1546–1553 (2014)
Coates, M. M., Garm, A., Theobald, J. C., Thompson, S. H. & Nilsson, D. E. The spectral sensitivity of the lens eyes of a box jellyfish, Tripedalia cystophora (Conant). J. Exp. Biol. 209, 3758–3765 (2006)
DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005)
Milkman, R. Selection differentials and selection coefficients. Genetics 88, 391–403 (1978)
Kimura, M. & Crow, J. F. Effect of overall phenotypic selection on genetic change at individual loci. Proc. Natl Acad. Sci. USA 75, 6168–6171 (1978)
Crow, J. F. & Kimura, M. Efficiency of truncation selection. Proc. Natl Acad. Sci. USA 76, 396–399 (1979)
Rockah-Shmuel, L., Tóth-Petróczy, Á. & Tawfik, D. S. Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations. PLOS Comput. Biol. 11, e1004421 (2015)
Li, W. H. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24, 337–345 (1987)
Akashi, H. Inferring weak selection from patterns of polymorphism and divergence at ‘silent’ sites in Drosophila DNA. Genetics 139, 1067–1076 (1995)
Povolotskaya, I. S. & Kondrashov, F. A. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010)
Usmanova, D. R., Ferretti, L., Povolotskaya, I. S., Vlasov, P. K. & Kondrashov, F. A. A model of substitution trajectories in sequence space and long-term protein evolution. Mol. Biol. Evol. 32, 542–554 (2015)
Eyre-Walker, A. & Keightley, P. D. The distribution of fitness effects of new mutations. Nature Rev. Genet. 8, 610–618 (2007)
Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973)
Acknowledgements
We thank Y. Kulikova and G. Filion for discussion of the statistical analysis; I. Osterman, R. Moretti and J. Meiler for technical assistance; and M. Friesen for a critical reading of the manuscript. We thank H. Himmelbauer, CRG Genomic Unit and the Russian Science Foundation project (14-50-00150) for sequencing. Experiments were partially carried out using the equipment provided by the IBCH core facility (CKP IBCH). The work was supported by HHMI International Early Career Scientist Program (55007424), the EMBO Young Investigator Programme, MINECO (BFU2012-31329), Spanish Ministry of Economy and Competitiveness Centro de Excelencia Severo Ochoa 2013-2017 grant (SEV-2012-0208), Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat’s AGAUR program (2014 SGR 0974), Russian Science Foundation (14-25-00129) and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013, ERC grant agreement, 335980_EinME).
Author information
Contributions
K.S.S. and M.V.M. conceived the idea for the experiment; K.S.S., D.A.B., M.V.M., A.S.M., G.V.S., M.D.L., D.M.C., E.V.P., I.Z.M., D.S.T., K.A.L. and F.A.K. participated in experimental design; K.S.S., D.A.B., M.V.M., G.V.S., E.V.P., E.S.E. and M.D.L. performed the experiments; K.S.S., D.A.B., M.V.M., D.R.U., A.S.M., D.N.I., N.G.B., M.S.B., O.S., N.S.B., P.K.V., A.S.K. and F.A.K. performed data analysis; K.S.S., D.A.B., M.V.M., D.R.U., D.N.I. and F.A.K. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Raw sequencing data were deposited in the Sequence Read Archive (SRA) under BioProject number PRJNA282342. Processed data sets are available at Figshare http://dx.doi.org/10.6084/m9.figshare.3102154.
Extended data figures and tables
Extended Data Figure 1 Scheme of the experimental approach.
The depiction of the construct design, expression and cell sorting.
Extended Data Figure 2 Fluorescence and impact of mutations.
A violin plot of the measured levels of fluorescence for genotypes carrying different numbers of missense mutations.
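A summary like the one plotted here, and the abstract's figures for single and quadruple mutants, amounts to grouping genotypes by their number of missense mutations. The sketch below assumes hypothetical inputs: each record is a (mutations, log-fluorescence) pair, and genotypes below a user-chosen `dark_threshold` are counted as non-fluorescent. The threshold actually used in the paper is not stated in this caption, and the function name is illustrative only.

```python
from collections import defaultdict
from statistics import median

def summarize_by_mutation_count(records, dark_threshold):
    """Group genotypes by number of missense mutations (hypothetical helper).

    records: iterable of (mutations, log_fluorescence), where mutations is a
    collection of mutation labels (empty for the wild type).
    Returns {n_mutations: (n_genotypes, median_log_fluorescence, fraction_dark)},
    ordered by n_mutations; 'dark' means log-fluorescence below dark_threshold.
    """
    groups = defaultdict(list)
    for mutations, log_f in records:
        groups[len(mutations)].append(log_f)
    summary = {}
    for n, values in sorted(groups.items()):
        dark = sum(1 for v in values if v < dark_threshold)
        summary[n] = (len(values), median(values), dark / len(values))
    return summary

if __name__ == "__main__":
    # Invented labels and values, for illustration only.
    toy = [((), 3.7), (("m1",), 3.6), (("m2",), 1.5),
           (("m1", "m2"), 1.4), (("m1", "m3"), 3.0)]
    for n, (count, med, frac_dark) in summarize_by_mutation_count(toy, 2.0).items():
        print(n, count, round(med, 2), frac_dark)
```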
Extended Data Figure 3 Mutant genotypes and evolution.
a, b, Relationship between log-fluorescence and evolutionary conservation, expressed as Shannon entropy (a) or as the fraction of mutant amino acid states found in avGFP orthologues (b). The y-axis error bars in b show 68% binomial proportion confidence intervals; all other error bars denote s.e.m.
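For reference, the Shannon entropy used as a conservation measure is H = -Σ_a p_a log p_a over the amino acid frequencies p_a in an alignment column. The sketch below computes it and adds a simple 68% interval for a proportion such as the one in panel b. Ignoring gap characters, using log base 2, and using the normal approximation p ± sqrt(p(1 − p)/n) (one standard error, hence roughly 68% coverage) are all assumptions; the caption does not say which binomial interval was used, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column, base=2.0):
    """Shannon entropy H = -sum_a p_a * log(p_a) of the residues in one alignment
    column. Gap characters '-' are ignored (an assumption)."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

def binomial_interval_68(successes, trials):
    """Approximate 68% interval for a proportion via the normal approximation
    p +/- sqrt(p(1-p)/n), i.e. one standard error, clipped to [0, 1]."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    se = math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - se), min(1.0, p + se)

if __name__ == "__main__":
    print(column_entropy("AAAAAAAS"))    # a nearly invariant site
    print(column_entropy("ACDEFGHI"))    # a highly variable site
    print(binomial_interval_68(5, 10))
```

A fully conserved column gives H = 0, and a column with k equally frequent residues gives log2(k); low entropy therefore marks conserved sites.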
Extended Data Figure 4 Epistatically interacting pairs of sites in the GFP structure.
a, Pairs of amino acid sites for which at least one combination of mutations was assayed (blue, top left), and the distribution of the maximum level of epistasis observed between sites (blue scale, bottom right), with unknown values shown in white. b, Pairs of sites with exceptionally strong epistatic interactions (e < −2), connected by blue lines on the GFP structure. c, Distribution of distances in the GFP structure between sites with at least one pair of epistatically interacting mutations (red) and between all pairs of sites in the structure (grey). d, Epistasis between pairs of mutations as a function of their individual fluorescence. e, Contribution of internally and externally oriented amino acid residues in the avGFP structure among pairs of missense mutations showing no epistasis (|e| < 0.3), weak epistasis (0.3 < |e| < 0.7) and strong epistasis (|e| > 0.7).
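The caption does not define e. A common convention, assumed here, is the difference between the observed log-fluorescence of a double mutant and the value expected if the two single-mutant effects add on the log scale. Under that assumption, the bins used in panels b and e can be applied as below. Boundary cases (|e| exactly 0.3 or 0.7) are not specified in the caption and are assigned to the higher class here; the function names are hypothetical.

```python
def epistasis(log_f_double, log_f_a, log_f_b, log_f_wt):
    """Observed minus expected log-fluorescence of a double mutant, assuming
    single-mutant effects (relative to wild type) add on the log scale."""
    expected = log_f_wt + (log_f_a - log_f_wt) + (log_f_b - log_f_wt)
    return log_f_double - expected

def classify(e):
    """Bins from panel e: none (|e| < 0.3), weak (0.3-0.7), strong (> 0.7).
    Exact boundary values go to the higher class (an arbitrary choice)."""
    magnitude = abs(e)
    if magnitude < 0.3:
        return "none"
    if magnitude < 0.7:
        return "weak"
    return "strong"

def is_exceptionally_strong(e):
    """Panel b criterion: strongly negative epistasis, e < -2."""
    return e < -2.0

if __name__ == "__main__":
    # Hypothetical log-fluorescence values: each single mutant costs 0.5,
    # so 2.5 is expected for the double mutant, but 1.0 is observed.
    e = epistasis(1.0, 3.0, 3.0, 3.5)
    print(e, classify(e), is_exceptionally_strong(e))
```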
Extended Data Figure 5 Modelling effect of mutations on fluorescence.
a, A multiple linear regression in which fluorescence is a linear combination of the effects of individual single mutations. b, A multiple regression in which mutations contribute linearly to a fitness potential p and fluorescence is a sigmoidal function of p, $F \approx e^{-p}/(1 + e^{-p})$. c, d, Fluorescence predicted by the neural network approach, and the fitness function predicted by a neural network with one hidden neuron and two neurons in the outer layer. e, Scheme of our neural network approach. The genotype data were passed to the input layer of the neural network as an array of 0s and 1s corresponding, respectively, to the absence or presence of each amino acid mutation in the genotype. The first hidden layer consisted of a single neuron that calculated the weighted sum of the inputs using weights obtained during training. The output of the first hidden layer was passed through an output subnetwork that transformed this value with a nonlinear function to make the final prediction of fluorescence. The output subnetwork consisted of several neurons with a sigmoidal transfer function, allowing it to approximate a broad range of nonlinear functions. The final mapping of the hidden value to fluorescence was determined by the weights of the connections between neurons inside the output subnetwork. During training, all weights were optimized to give the best prediction of fluorescence from the hidden value. The function defined during training is shown in Fig. 4. f, Correlation between the hidden value of the neural network and Rosetta-predicted ΔΔG for single mutants.
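The architecture described in panel e can be written as a short forward pass. This is a sketch, not the authors' code: the function names, the optional biases and the example weights are hypothetical, and training (fitting all weights to measured fluorescence) is omitted. With a single output neuron of slope -1 and zero offset, the output subnetwork reduces to the sigmoidal model of panel b, with the hidden value playing the role of the fitness potential p.

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hidden_value(genotype: Sequence[int], weights: Sequence[float], bias: float = 0.0) -> float:
    """Single hidden neuron: weighted sum of 0/1 mutation indicators."""
    if len(genotype) != len(weights):
        raise ValueError("genotype and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, genotype))

def output_subnetwork(h: float, slopes: Sequence[float], offsets: Sequence[float],
                      out_weights: Sequence[float], out_bias: float = 0.0) -> float:
    """Nonlinear map from the hidden value to fluorescence: a weighted sum of
    sigmoidal neurons, each applied to slope * h + offset."""
    return out_bias + sum(v * sigmoid(a * h + c)
                          for a, c, v in zip(slopes, offsets, out_weights))

def predict(genotype, weights, slopes, offsets, out_weights, bias=0.0, out_bias=0.0):
    """Forward pass: binary genotype -> hidden value -> predicted fluorescence."""
    h = hidden_value(genotype, weights, bias)
    return output_subnetwork(h, slopes, offsets, out_weights, out_bias)

if __name__ == "__main__":
    per_mutation_weights = [0.4, 2.0, 0.1]  # hypothetical effects of three mutations
    # One output neuron with slope -1 reproduces F = e^-p / (1 + e^-p).
    for g in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]):
        print(g, round(predict(g, per_mutation_weights, [-1.0], [0.0], [1.0]), 3))
```

In practice the weights would be fitted by minimising prediction error on measured genotypes; with two sigmoidal neurons in the output subnetwork, as in panels c and d, the mapping can take shapes that a single sigmoid cannot.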
Supplementary information
Supplementary Information
This file contains Supplementary Text and Data, Supplementary Figures 1–5, Supplementary Table 1 and Supplementary References; see the contents page for details. (PDF 2608 kb)
The local GFP fitness landscape
A 3D rendering of our data set, also depicted in Fig. 1b. The protein sequence is arranged in a circle, with the N terminus and the chromophore labelled on the outer circle. Black line markers outside the fitness landscape representation are positioned every 10 sites of avGFP. The z axis (height) represents the level of fluorescence, which is colour-coded from green to black. The surface shows the median fluorescence of all mutations at a given site, and the fluorescence levels conferred by individual mutations are shown as dots. The centre represents the fluorescence of avGFP, and distance from the centre corresponds to the number of mutations in the genotype. The median surface extends up to genotypes with 10 mutations. (MP4 26274 kb)
About this article
Cite this article
Sarkisyan, K., Bolotin, D., Meer, M. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016). https://doi.org/10.1038/nature17995
DOI: https://doi.org/10.1038/nature17995
This article is cited by
- Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nature Communications (2023)
- Structure-inducing pre-training. Nature Machine Intelligence (2023)
- Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution. Nature Communications (2023)
- Deep mutational scanning of essential bacterial proteins can guide antibiotic development. Nature Communications (2023)
- Designed active-site library reveals thousands of functional GFP variants. Nature Communications (2023)