Article
Published: 13 January 2020

Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs

Nature Biotechnology volume 38, pages 355–364 (2020)

Abstract

A lack of tools to precisely control gene expression has limited our ability to evaluate relationships between expression levels and phenotypes. Here, we describe an approach to titrate expression of human genes using CRISPR interference and series of single-guide RNAs (sgRNAs) with systematically modulated activities. We used large-scale measurements across multiple cell models to characterize activities of sgRNAs containing mismatches to their target sites and derived rules governing mismatched sgRNA activity using deep learning. These rules enabled us to synthesize a compact sgRNA library to titrate expression of ~2,400 genes essential for robust cell growth and to construct an in silico sgRNA library spanning the human genome. Staging cells along a continuum of gene expression levels combined with single-cell RNA-seq readout revealed sharp transitions in cellular behaviors at gene-specific expression thresholds. Our work provides a general tool to control gene expression, with applications ranging from tuning biochemical pathways to identifying suppressors for diseases of dysregulated gene expression.

**Fig. 1: Mismatched sgRNAs titrate GFP expression at the single-cell level.**

**Fig. 2: A large-scale CRISPRi screen identifies factors governing mismatched sgRNA activity.**

**Fig. 3: Identification and characterization of intermediate-activity constant regions.**

**Fig. 4: Neural network predictions of sgRNA activity.**

**Fig. 5: Compact mismatched sgRNA library targeting essential genes.**

**Fig. 6: Rich phenotyping of cells with intermediate-activity sgRNAs by Perturb-seq.**

Data availability

Raw and processed Perturb-seq data are available at GEO under accession code GSE132080. Raw and processed sgRNA read counts from pooled screens are provided as supplementary tables. All other data will be made available by the corresponding author upon reasonable request.

Code availability

Custom scripts in this manuscript largely build on scripts published previously¹⁴,³⁴,⁵². An IPython notebook detailing the initialization of the CNN model and its use to predict mismatched sgRNA activities is included as a supplementary file. All custom scripts will be made available upon request.

References

Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
Rest, J. S. et al. Nonlinear fitness consequences of variation in expression level of a eukaryotic gene. Mol. Biol. Evol. 30, 448–456 (2013).
Bauer, C. R., Li, S. & Siegal, M. L. Essential gene disruptions reveal complex relationships between phenotypic robustness, pleiotropy, and fitness. Mol. Syst. Biol. 11, 773–773 (2015).
Keren, L. et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell 166, 1282–1294.e18 (2016).
Dykhuizen, D. E., Dean, A. M. & Hartl, D. L. Metabolic flux and fitness. Genetics 115, 25–31 (1987).
Dekel, E. & Alon, U. Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592 (2005).
Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl Acad. Sci. USA 102, 12678–12683 (2005).
Perfeito, L., Ghozzi, S., Berg, J., Schnetz, K. & Lässig, M. Nonlinear fitness landscape of a molecular pathway. PLoS Genet. 7, e1002160 (2011).
Michaels, Y. S. et al. Precise tuning of gene expression levels in mammalian cells. Nat. Commun. 10, 818 (2019).
Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175 (2009).
Moore, R., Chandrahas, A. & Bleris, L. Transcription activator-like effectors: a toolkit for synthetic biology. ACS Synth. Biol. 3, 708–716 (2014).
Dominguez, A. A., Lim, W. A. & Qi, L. S. Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat. Rev. Mol. Cell Biol. 17, 5–15 (2016).
Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
Horlbeck, M. A. et al. Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation. eLife 5, e19760 (2016).
Sanson, K. R. et al. Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nat. Commun. 9, 5416 (2018).
Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67 (2014).
Szczelkun, M. D. et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl Acad. Sci. USA 111, 9798–9803 (2014).
Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949 (2014).
Kocak, D. D. et al. Increasing the specificity of CRISPR systems with engineered RNA secondary structures. Nat. Biotechnol. 37, 657–666 (2019).
Maji, B. et al. A high-throughput platform to identify small-molecule inhibitors of CRISPR-Cas9. Cell 177, 1067–1079 (2019).
Chiarella, A. M. et al. Dose-dependent activation of gene expression is achieved using CRISPR and small molecules that recruit endogenous chromatin machinery. Nat. Biotechnol. https://doi.org/10.1038/s41587-019-0296-7 (2019).
Tian, R. et al. CRISPR interference-based platform for multimodal genetic screens in human iPSC-derived neurons. Neuron 104, 239–255 (2019).
Nakamura, M. et al. Anti-CRISPR-mediated control of gene editing and synthetic circuits in eukaryotic cells. Nat. Commun. 10, 194 (2019).
Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451 (2013).
Kampmann, M., Bassik, M. C. & Weissman, J. S. Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. Proc. Natl Acad. Sci. USA 110, E2317–E2326 (2013).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Hsu, P. D. et al. DNA-targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
Boyle, E. A. et al. High-throughput biochemical profiling reveals sequence determinants of dCas9 off-target binding and unbinding. Proc. Natl Acad. Sci. USA 114, 5461–5466 (2017).
Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
Grevet, J. D. et al. Domain-focused CRISPR screen identifies HRI as a fetal hemoglobin regulator in human erythroid cells. Science 361, 285–290 (2018).
Briner, A. E. et al. Guide RNA functional modules direct Cas9 activity and orthogonality. Mol. Cell 56, 333–339 (2014).
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Luo, J., Chen, W., Xue, L. & Tang, B. Prediction of activity and specificity of CRISPR-Cpf1 using convolutional deep learning neural networks. BMC Bioinformatics 20, 332 (2019).
Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896 (2016).
Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
Replogle, J. M. et al. Direct capture of CRISPR guides enables scalable, multiplexed, and multi-omic Perturb-seq. Preprint at bioRxiv https://doi.org/10.1101/503367 (2018).
Harding, H. P. et al. An integrated stress response regulates amino acid metabolism and resistance to oxidative stress. Mol. Cell 11, 619–633 (2003).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098–10103 (2011).
Wiedenheft, B. et al. RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc. Natl Acad. Sci. USA 108, 10092–10097 (2011).
Mandegar, M. A. et al. CRISPR interference efficiently induces specific and reversible gene silencing in human iPSCs. Cell Stem Cell 18, 541–553 (2016).
Genga, R. M. J. et al. Single-cell RNA-sequencing-based CRISPRi screening resolves molecular drivers of early human endoderm development. Cell Rep. 27, 708–718 (2019).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Perez, A. R. et al. GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 35, 347–349 (2017).
Bae, S., Park, J. & Kim, J.-S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
Bassik, M. C. et al. Rapid creation and quantitative monitoring of high coverage shRNA libraries. Nat. Methods 6, 443–445 (2009).
Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

Acknowledgements

We thank G. Ow and E. Collisson (University of California, San Francisco) for sharing the mCherry-marked sgRNA expression vector, R. Pak, J. Stern and A. Xu for help with library cloning and sequencing library preparation, B. Adamson for sharing the modified CROP-seq vector, M. Jones, J. Chen, L. Gilbert, J. Replogle and all members of the Weissman laboratory for helpful discussions and E. Chow, D. Bogdanoff and K. Chaung from the UCSF Center for Advanced Technology for help with sequencing. This work was funded by National Institutes of Health grants F32 GM116331 and K99 GM130964 (both to M.J.), U01 CA168370, U01 CA217882 and RM1 HG009490 (all to J.S.W.) and R35 GM118061 (C.A.G.) and the Innovative Genomics Institute, UC Berkeley (C.A.G.). J.S.W. is a Howard Hughes Medical Institute Investigator. D.A.S. is supported by NSF Graduate Research Fellowship 1650113 and a Moritz–Heyman Discovery Fellowship. R.A.S. is supported by a Fannie and John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. M.A.H. is a Byers Family Discovery Fellow and is supported by the UCSF Medical Scientist Training Program and the School of Medicine. T.M.N. is a fellow and J.A.H. is the Rebecca Ridley Kry Fellow of the Damon Runyon Cancer Research Foundation (T.M.N., DRG-2211–15; J.A.H., DRG-2262–16).

Author information

Thomas M. Norman
Present address: Computational & Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
These authors contributed equally: Marco Jost, Daniel A. Santos.

Authors and Affiliations

Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA
Marco Jost, Daniel A. Santos, Reuben A. Saunders, Max A. Horlbeck, Sonia M. Scaria, Thomas M. Norman, Jeffrey A. Hussmann, Christina R. Liem & Jonathan S. Weissman
Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA, USA
Marco Jost, Daniel A. Santos, Reuben A. Saunders, Max A. Horlbeck, Sonia M. Scaria, Thomas M. Norman, Jeffrey A. Hussmann, Christina R. Liem & Jonathan S. Weissman
California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA, USA
Marco Jost, Daniel A. Santos, Reuben A. Saunders, Max A. Horlbeck, Sonia M. Scaria, Thomas M. Norman, Jeffrey A. Hussmann, Christina R. Liem & Jonathan S. Weissman
Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA
Marco Jost, John S. Hawkins, Jeffrey A. Hussmann & Carol A. Gross
Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, CA, USA
Carol A. Gross

Authors

Marco Jost
Daniel A. Santos
Reuben A. Saunders
Max A. Horlbeck
John S. Hawkins
Sonia M. Scaria
Thomas M. Norman
Jeffrey A. Hussmann
Christina R. Liem
Carol A. Gross
Jonathan S. Weissman

Contributions

M.J. conducted the large-scale growth screen, supervised the constant region and Perturb-seq experiments, implemented the linear machine-learning model, analyzed the large-scale screen and Perturb-seq data, conceived experiments and wrote the manuscript. D.A.S. conducted the GFP and constant-region screens, implemented the deep-learning model, designed and conducted the compact library screens, analyzed data, conceived experiments and wrote the manuscript. R.A.S. designed the constant-region library and conducted a pilot screen, designed and conducted the Perturb-seq experiment, analyzed data, conceived experiments and edited the manuscript. M.A.H. assisted with the large-scale growth screen and, with J.S.H., designed the large-scale library. S.M.S. evaluated modified constant-region activities by RT-qPCR. J.A.H. and T.M.N. assisted with data analysis. C.R.L. assisted with library cloning and screens. C.A.G. supervised the generation of the large-scale library and edited the manuscript. J.S.W. conceived and supervised experiments and wrote the manuscript. All authors provided feedback on the manuscript.

Corresponding author

Correspondence to Jonathan S. Weissman.

Ethics declarations

Competing interests

J.S.W., M.J., D.A.S., R.A.S., M.A.H. and T.M.N. have filed patent applications related to CRISPRi/a screening, Perturb-seq and mismatched sgRNAs. J.S.W. consults for and holds equity in KSQ Therapeutics, Maze Therapeutics and Tenaya Therapeutics. J.S.W. is a venture partner at 5AM Ventures and a member of the Amgen Scientific Advisory Board. M.J., M.A.H. and T.M.N. consult for Maze Therapeutics.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Details of the GFP mismatch experiment.

(a) Representative plots illustrating gating strategy to select cells for analysis. (b) Comparison of relative activities obtained from two replicate transductions. Relative activity was defined as the fold-knockdown of each mismatched variant (GFP_{sgRNA[non-targeting]}/GFP_{sgRNA[variant]}) divided by the fold-knockdown of the perfectly matched sgRNA. The background fluorescence of a GFP⁻ strain was subtracted from all GFP values prior to other calculations. n = 57 sgRNAs; r² = squared Pearson correlation coefficient. (c) KDE plots of GFP distributions 10 days after transducing K562 GFP⁺ cells with the perfectly matched sgRNA, a non-targeting sgRNA, and each of the 57 singly-mismatched variants. Fluorescence of GFP⁻ K562 cells is shown in gray. Although most GFP distributions are unimodal, some are broadened compared to those with the perfectly matched sgRNA or the negative control sgRNA. This heterogeneity could be a consequence of the random integration of the GFP locus, cell-to-cell differences in expression of the dCas9-KRAB effector in our polyclonal cell line, the amplification of gene expression bursts by long GFP half-lives, or a combination of these factors. Two replicate transductions were evaluated for each sgRNA (see panel b); data from one replicate are shown here.
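The relative-activity definition in panel b can be illustrated with a minimal Python sketch. The function names and the fluorescence values below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def fold_knockdown(gfp_control, gfp_sgrna, background):
    """Fold-knockdown: background-subtracted GFP with the non-targeting
    sgRNA divided by background-subtracted GFP with the sgRNA of interest."""
    return (gfp_control - background) / (gfp_sgrna - background)


def relative_activity(gfp_control, gfp_variant, gfp_perfect, background):
    """Fold-knockdown of a mismatched variant divided by the fold-knockdown
    of the perfectly matched sgRNA."""
    variant = fold_knockdown(gfp_control, gfp_variant, background)
    perfect = fold_knockdown(gfp_control, gfp_perfect, background)
    return variant / perfect


# Hypothetical fluorescence values in arbitrary units (not from the paper).
example = relative_activity(
    gfp_control=1010.0,  # non-targeting sgRNA
    gfp_variant=210.0,   # mismatched variant
    gfp_perfect=60.0,    # perfectly matched sgRNA
    background=10.0,     # GFP-negative cells
)
print(example)  # 0.25
```

Under this definition a variant that knocks down as strongly as the perfectly matched sgRNA has a relative activity of 1.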

Supplementary Figure 2 Additional analysis of large-scale mismatched sgRNA screen.

(a, b) Comparison of growth phenotypes (γ) of all sgRNAs derived from replicates of the (a) K562 (n = 119,201 sgRNAs) and (b) Jurkat screens (n = 119,229 sgRNAs). Marginal distributions are normalized to element count in each category. r² = squared Pearson correlation coefficient for targeting sgRNAs (mismatched and original). (c) Comparison of γ of perfectly matched sgRNAs from the K562 screen in this work and a previously published K562 screen¹⁴ (average of two replicate screens). n = 4,830 sgRNAs; r² = squared Pearson correlation coefficient. (d) Comparison of γ of perfectly matched sgRNAs in K562 and Jurkat cells reveals substantial differences, likely reflecting cell-type specific gene essentiality (average of two replicate screens). n = 4,892 sgRNAs; r² = squared Pearson correlation coefficient. (e) Comparison of mismatched sgRNA relative activities in K562 and Jurkat cells, classified by the difference in γ of the corresponding original guide. n = 15,103 (left) and 26,409 (right) sgRNAs; r² = squared Pearson correlation coefficient. (f) Distribution of mismatched sgRNA relative activities for sgRNAs with 1 mismatch (left) or 2 mismatches (right). (g) Distribution of mismatched sgRNA relative activities stratified by sgRNA GC content, grouped by mismatches located in positions –19 to –13 (PAM-distal region), positions –12 to –9 (intermediate region), and positions –8 to –1 (PAM-proximal/seed region). n = 282-7,592 sgRNAs. (h) Distribution of mismatched sgRNA relative activities stratified by the identity of the 2 bases flanking the mismatch, grouped by mismatches located in the three regions as in g. n = 155-2,031 sgRNAs. (i) Distribution of mismatched sgRNA relative activities stratified based on whether or not the invariant first G of the sgRNA (position –20) matches the genome, grouped by mismatches located in the three regions as in g. n = 4,267-11,524 sgRNAs. 
(j) Comparison of mean CRISPRi relative activities from large-scale screen and cutting frequency determination (CFD) scores²⁷. Values are compared for identical combinations of mismatch type and mismatch position; mean relative activities were calculated by averaging relative activities for all mismatched sgRNAs with a given combination. n = 228 mismatch type/position combinations; r² = squared Pearson correlation coefficient. (k) Distribution of sgRNA series by number of sgRNAs with intermediate activity (0.1 < relative activity < 0.9), using only sgRNAs with a single mismatch (top) or all mismatched sgRNAs (bottom). Lines in violin plots in panels g, h, i denote distribution quartiles.
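The position grouping used in panels g to i and the intermediate-activity criterion in panel k can be written down as a short Python sketch. The function names are hypothetical; the region boundaries and the 0.1 to 0.9 window are taken from the legend above.

```python
def mismatch_region(position):
    """Assign a mismatch position (-19 to -1, relative to the PAM) to the
    region named in the legend. Position -20 is the invariant first G and
    is not covered by these groups."""
    if -19 <= position <= -13:
        return "PAM-distal"
    if -12 <= position <= -9:
        return "intermediate"
    if -8 <= position <= -1:
        return "PAM-proximal/seed"
    raise ValueError(f"position {position} is outside -19..-1")


def count_intermediate(relative_activities, low=0.1, high=0.9):
    """Count sgRNAs in a series with low < relative activity < high."""
    return sum(1 for activity in relative_activities if low < activity < high)


# Hypothetical series of relative activities (not data from the paper).
series = [0.05, 0.3, 0.9, 0.5, 1.0]
print(mismatch_region(-15), count_intermediate(series))  # PAM-distal 2
```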

Supplementary Figure 3 Additional analysis of modified constant regions.

(a) Comparison of growth phenotypes measured in replicate screens after 4, 6, or 8 days of growth from t₀. Data from Day 4 were used for all subsequent analyses. n = 35,830 sgRNAs; r² = squared Pearson correlation coefficient. (b) Comparison of relative % knockdown (quantified via RT-qPCR) and mean relative growth phenotype for 10 intermediate-activity constant region variants paired with two targeting sequences against DPH2. Data represent the mean of technical triplicates. (c) Relative activities of constant regions paired with all 30 targeting sequences, ranked by the average strength of each constant region and displayed as rolling means with a window size of 50. (d) Distribution of all pairwise correlations of constant region relative activities within and between gene targets. n = 30 and 1,350 for intra-gene and inter-gene comparisons, respectively; indicated p-values are derived from a two-tailed Student’s t-test; dashed lines in violin plots indicate the distribution quartiles. (e) Relative activity of each indicated target sequence:constant region pair vs. the mean relative activity of the respective constant region for all targets. Growth phenotypes (γ) with the unmodified constant region are indicated in the figure legends. Lines represent rolling means of individual data points.
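For readers unfamiliar with the smoothing used in panels c and e, the following is a minimal Python sketch of a rolling mean. It assumes that only full windows are averaged; the legend does not state how window edges were handled, and the function name is hypothetical.

```python
def rolling_mean(values, window=50):
    """Mean over each full sliding window of `window` consecutive values.
    Assumption: partial windows at the edges are dropped."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


print(rolling_mean([1, 2, 3, 4, 5], window=2))  # [1.5, 2.5, 3.5, 4.5]
```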

Supplementary Figure 4 Additional details for the neural network.

(a) Graph of the CNN model architecture. (b) Example of 5-fold cross-validation using only the training dataset, further analyzed in the subsequent two panels of this figure. A similar scheme was used to optimize hyperparameters for the CNN model, albeit with 3-fold cross-validation to allow for larger training sets in each split. (c) Model loss, measured as root mean squared error, for training and test data over 30 training epochs. Each line represents one of 5 splits diagrammed in panel b. The final models used for our predictions were trained for 8 epochs, as additional cycles only reduced training loss without significant improvement in validation loss (i.e., the model becomes overfit). (d) Stability of the model with different input data. For each split in panel b, 20 independent CNN models were trained for 8 epochs on the same data. The root mean squared error on the test set for each model is plotted as a blue dot. Box plots indicate the interquartile range of each distribution. (e) Model loss for the final CNN ensemble. Each line represents one of 20 models trained for 8 epochs on the entire training set. (f) Explained variance of validation sgRNA relative activities for each individual model (black), and for the mean prediction of all 20 models (red). n = 5,241 sgRNAs evaluated for each model; r² = squared Pearson correlation coefficient. (g) Validation error stratified by mismatch position. (h) Validation error stratified by mismatch type. (i) Comparison of CNN prediction error (difference between measured and predicted activity) and off-target specificity score for all sgRNAs in the validation set. Off-target specificity scores were calculated using CRISPRi relative activities as described in the Methods. n = 5,241 sgRNAs; r = Pearson correlation coefficient. (j) Partitioning of sgRNAs into bins based on relative activity in the large-scale K562 screen. 
(k) Confusion matrix showing the fraction of sgRNAs in each actual (measured) activity bin that were assigned to each predicted bin by the CNN model. Each row sums to 1. (l) Statistics indicating the requisite number of randomly sampled sgRNAs from each activity bin to have a given probability of selecting at least one sgRNA with true activity in that bin. Simulations are based on the probabilities outlined in the confusion matrix (panel k). (m) Similar to panel l, with random sampling from bin 2 (relative activity 0.37-0.63) to yield at least one sgRNA with intermediate activity (0.1-0.9). We tested several sampling schemes (e.g. drawing from bin 1, 2, 3, or combinations of these), and found this method to empirically give the highest success rate for selecting sgRNAs with intermediate activities.
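The row normalization of the confusion matrix and the sampling question in panels l and m can be sketched in Python as follows. This is a closed-form approximation that assumes independent draws with a fixed per-draw success probability, whereas the legend describes simulations; the function names and example numbers are hypothetical.

```python
import math


def row_normalize(counts):
    """Convert a count matrix (rows = measured bins, columns = predicted
    bins) to row fractions so that each row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each measured bin needs at least one sgRNA")
        normalized.append([value / total for value in row])
    return normalized


def draws_needed(p_success, target_probability):
    """Smallest n with 1 - (1 - p_success)**n >= target_probability,
    assuming independent draws (an assumption made for this sketch)."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - p_success))


# Hypothetical numbers: if half of the sgRNAs drawn from a predicted bin
# truly fall in that bin, five draws give at least a 95% chance of one hit.
print(draws_needed(0.5, 0.95))  # 5
```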

Supplementary Figure 5 Additional details for the linear model.

(a) Comparison of measured relative growth phenotypes from the large-scale screen and predicted activities assigned by the elastic net linear model. Marginal histograms show distributions of relative activities along the corresponding axes. n = 5,241 sgRNAs; r² = squared Pearson correlation coefficient. (b) Comparison of measured relative activity (relative knockdown) in the GFP experiment and predicted relative sgRNA activity. n = 57 sgRNAs; r² = squared Pearson correlation coefficient. (c) Comparison of predicted relative activities from the linear model and the neural network, based on the validation set of singly-mismatched sgRNAs. n = 5,241 sgRNAs; r² = squared Pearson correlation coefficient. (d) Regression coefficients assigned to each feature in the linear model. 228 features (gray, blue) describe the position and type of mismatch; 42 features (gold) carry other information about the sgRNA and genomic context surrounding the protospacer. These features are detailed in subsequent panels. (e) Linear coefficients for features of the sgRNA and targeted locus. TSS, transcription start site. (f) Linear coefficients for features covering positions in the distal, intermediate, and seed regions of the targeting sequence (highlighted blue in panel d).
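One way to arrive at 228 position-and-type features is 19 variable positions (–19 to –1) times 12 ordered base substitutions. The Python sketch below builds such a one-hot layout under that assumption. The legend does not spell out the encoding, so the feature ordering, the definition of mismatch type as an (original base, new base) pair, and all names here are hypothetical.

```python
from itertools import permutations

BASES = "ACGT"
POSITIONS = list(range(-19, 0))                # -19 ... -1 (19 positions)
MISMATCH_TYPES = list(permutations(BASES, 2))  # 12 ordered (old, new) pairs

# Assumed layout: one binary feature per (position, old base, new base).
FEATURES = [(pos, old, new) for pos in POSITIONS for old, new in MISMATCH_TYPES]
FEATURE_INDEX = {feature: i for i, feature in enumerate(FEATURES)}


def encode_single_mismatch(position, original_base, new_base):
    """Return a 228-element one-hot vector for a single mismatch."""
    vector = [0] * len(FEATURES)
    vector[FEATURE_INDEX[(position, original_base, new_base)]] = 1
    return vector


print(len(FEATURES))  # 228
```

A vector of this kind, concatenated with the remaining sgRNA and locus features, could then be passed to any elastic net implementation.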

Supplementary Figure 6 Additional analysis of the compact allelic series screen.

(a) Composition of the compact library, in terms of previously measured relative activities in the large-scale screen (dark purple), or predicted relative activities assigned by the CNN model ensemble (light purple). Perfectly matched sgRNAs, which by definition have relative activities of 1.0, comprise 20% of the library but were not included in the histogram. (b) Distribution of mismatch positions and types for singly-mismatched sgRNAs in the compact library, for previously measured (dark purple) and CNN-imputed (light purple) sgRNAs. (c) Heatmap showing the distribution of mutated positions for doubly-mismatched sgRNAs in the compact library. (d) Comparison of growth phenotypes measured in each K562 replicate screen 4 and 7 days post-transduction. Data from Day 7 were used for all subsequent analyses. n = 25,518 sgRNAs; r² = squared Pearson correlation coefficient. (e) Comparison of growth phenotypes measured in each HeLa replicate screen 6 and 8 days post-transduction. Data from Day 8 were used for all subsequent analyses. n = 25,518 sgRNAs; r² = squared Pearson correlation coefficient. (f) Comparison of growth phenotypes of original (perfectly matched) sgRNAs in HeLa and K562 cells (γ, expressed as the average of two replicate screens). n = 4,810 sgRNAs; r² = squared Pearson correlation coefficient. (g) Measured vs. predicted relative activities of CNN-imputed sgRNAs in K562 cells (left) and HeLa cells (right). A small number of points beyond the y-axis limits were excluded to more clearly display the bulk of the distribution. n = 6,147 sgRNAs; r² = squared Pearson correlation coefficient. (h) Comparison of sgRNA composition and model error for the large-scale and compact libraries. The CNN-imputed guides had substantially higher predicted activities than those for the large-scale validation set; higher predicted activity was generally associated with higher model error for the validation (red) and imputed (blue) sgRNA sets, consistent with the discrepancy in model performance on each set. (i) Distribution of the number of intermediate-activity mismatched sgRNAs targeting each gene in the compact library. The number of genes with at least 2 intermediate-activity sgRNAs is indicated above each histogram; sgRNA activities were quantified for 1,907 and 1,442 genes in K562 and HeLa cells, respectively. Note that here activities are aggregated by gene as opposed to by series, as was done in Supplementary Fig. 2k. (j) Comparison of phenotypes measured in replicate screens after 12 days of growth in the drug screen. n = 25,518 sgRNAs; r² = squared Pearson correlation coefficient. (k) Comparison of vehicle- (γ) and lovastatin-treatment (τ) growth phenotypes for all sgRNAs in the compact library. Knockdown of HMG-CoA reductase (HMGCR) greatly sensitizes cells to lovastatin, compared to knockdown of other genes such as tubulin (TUBB). n = 25,518 sgRNAs.

Supplementary Figure 7 Summary of the Perturb-seq experiment.

(a) Schematic of Perturb-seq strategy to capture single-cell transcriptomes with matched sgRNA identities. (b) Summary of sequencing and perturbation assignment statistics. (c) Distribution of number of cells captured per perturbation. Median: 122 cells per perturbation; 5th to 95th percentile: 66–277 cells per perturbation. n = 19,587 cells. (d, e) Comparison of (d) growth phenotypes (γ) and (e) relative activities measured in the large-scale mismatched sgRNA screen and in the Perturb-seq experiment. Differences are likely due to the different timescales and the different vectors used. n = 128 sgRNAs; r² = squared Pearson correlation coefficient.

Supplementary Figure 8 Target gene expression in cells with indicated perturbations.

(a) Distribution of target gene expression levels, quantified as target gene UMI count normalized to total UMI count per cell. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers). (b) Mean target gene expression levels for target genes with low basal expression levels.
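The normalization described in panel (a), target gene UMI count divided by total UMI count per cell, can be sketched as follows. The function name and the example counts are hypothetical; Supplementary Figure 9 reports the raw target gene UMI count instead, which corresponds to the numerator alone.

```python
def normalized_target_expression(umi_counts, target_gene):
    """Fraction of a cell's UMIs that come from the target gene.

    umi_counts maps gene name -> UMI count for a single cell.
    """
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("cell has no UMIs")
    return umi_counts.get(target_gene, 0) / total


# Hypothetical single cell with three detected genes.
cell = {"GATA1": 12, "ACTB": 300, "GAPDH": 188}
print(normalized_target_expression(cell, "GATA1"))
```

Dividing by the per-cell total corrects for differences in capture efficiency and cell size. It can also mask global changes in RNA content, which is one reason a raw-count view is useful alongside it.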

Supplementary Figure 9 Target gene expression in cells with indicated perturbations (different quantification).

Expression is quantified as raw target gene UMI count. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers).

Supplementary Figure 10 Phenotypes resulting from gene titration.

(a) Distributions of total UMI counts in cells with the perfectly matched sgRNA against the indicated genes. Cell numbers for each perturbation are listed in Supplementary Table 14. Box plots inside violin plots denote quartile ranges (box), median (center mark), and 1.5 × interquartile range (whiskers). (b) Left: Comparison of median UMI count per cell and relative growth phenotype in cells with sgRNAs targeting BCR, GATA1, or POLR2H or control cells. Right: Comparison of median UMI count per cell and target gene expression. (c) Cell cycle scores (Methods) for populations of cells with individual sgRNAs. (d) Fraction of cells in indicated cell cycle phase for populations with a negative control sgRNA or sgRNAs targeting CAD. (e) Magnitudes of gene expression change of populations with perfectly matched sgRNAs targeting indicated genes. Magnitude of gene expression change is calculated as sum of z-scores of genes differentially expressed in the series (FDR-corrected p < 0.05 with any sgRNA in the series, two-sided Kolmogorov-Smirnov test, Methods), with z-scores of each gene in individual cells signed by the average direction of change in the population. Cell numbers and violin plots are as in a. (f) Comparison of magnitude of gene expression change to growth phenotype (γ) for all perfectly matched sgRNAs in the experiment. (g) Comparison of relative growth phenotype and magnitude of gene expression change for all individual sgRNAs, as in Fig. 6f but without increased transparency for individual series. (h) Comparison of magnitude of gene expression change and target gene knockdown, as in Fig. 6g but without increased transparency for individual series. (i) Comparison of relative growth phenotype and target gene expression, as in Fig. 6f. (j) Comparison of measured growth phenotype (γ, not normalized to strongest sgRNA) and target gene expression, as in Fig. 6f.
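The magnitude of gene expression change defined in panel (e) can be illustrated with a short sketch. It assumes that per-cell z-scores have already been computed and that the gene list has already been restricted to differentially expressed genes; both steps are described in the Methods and are not reproduced here. The function names and toy values are hypothetical.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def expression_change_magnitudes(z_scores):
    """Per-cell magnitude of gene expression change.

    z_scores: one list per cell, each holding z-scores for the same
    ordered set of differentially expressed genes.
    Each gene's z-score is signed by that gene's average direction of
    change across the population, then the signed scores are summed.
    """
    n_cells = len(z_scores)
    n_genes = len(z_scores[0])
    directions = [
        sign(sum(cell[g] for cell in z_scores) / n_cells) for g in range(n_genes)
    ]
    return [sum(d * z for d, z in zip(directions, cell)) for cell in z_scores]


# Hypothetical population: three cells, two differentially expressed genes.
population = [[2.0, -1.0], [1.0, -3.0], [-0.5, -2.0]]
print(expression_change_magnitudes(population))
```

Signing by the population-level direction means that a cell moving with the population contributes positively for both induced and repressed genes, while a cell moving against it (the third cell for the first gene here) has that gene's contribution subtracted.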

Supplementary Figure 11 Diverse phenotypes resulting from essential gene depletion.

(a) Clustered correlation heatmap of perturbations. Gene expression profiles for genes with mean UMI count > 0.25 in the entire population were z-normalized to expression values in cells with negative control sgRNAs and then averaged for populations with the same sgRNA. Crosswise Pearson correlations of all averaged transcriptomes were clustered by the Ward variance minimization algorithm implemented in scipy. Cell numbers for each perturbation are listed in Supplementary Table 14. (b) UMAP projection, distribution of cells with indicated sgRNAs, target gene expression (rolling mean over 50 cells), and magnitudes of transcriptional changes for all differentially expressed genes and selected ISR regulon genes (rolling mean over 50 cells) for cells with knockdown of ATP5E or control cells. n = 2,781 cells total (negative control: 2,084 cells; ATP5E (0.070): 101 cells; ATP5E (0.554): 136 cells; ATP5E (0.914): 137 cells; ATP5E (1.000): 175 cells; ATP5E (1.185): 148 cells). See Methods for details.
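The preprocessing behind the heatmap in panel (a) can be sketched in three steps: z-normalize each gene to the control cells, average the z-scored profiles of cells sharing an sgRNA, and compute pairwise Pearson correlations between the averaged profiles. The sketch below uses only the Python standard library; all function names and toy values are hypothetical, and it assumes every gene has nonzero variance in the control cells. The use of the population standard deviation is also an assumption.

```python
import math
from statistics import mean, pstdev


def z_normalize_to_controls(cells, control_cells):
    """Z-score each gene in `cells` using the control-cell mean and SD."""
    n_genes = len(control_cells[0])
    mu = [mean([c[g] for c in control_cells]) for g in range(n_genes)]
    sd = [pstdev([c[g] for c in control_cells]) for g in range(n_genes)]
    return [[(cell[g] - mu[g]) / sd[g] for g in range(n_genes)] for cell in cells]


def average_profile(z_cells):
    """Average z-scored expression per gene across cells sharing one sgRNA."""
    n_genes = len(z_cells[0])
    return [mean([c[g] for c in z_cells]) for g in range(n_genes)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Pairwise Pearson correlations between named averaged profiles."""
    names = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical toy data: three genes, three control cells, three sgRNAs.
controls = [[1.0, 2.0, 3.0], [3.0, 2.5, 5.0], [2.0, 4.5, 4.0]]
perturbed = {
    "sgRNA_1": [[0.5, 4.0, 3.5], [0.8, 5.0, 3.0]],
    "sgRNA_2": [[0.4, 4.5, 3.8], [0.6, 4.8, 3.2]],
    "sgRNA_3": [[3.5, 1.0, 6.0], [4.0, 1.5, 5.5]],
}
profiles = {
    name: average_profile(z_normalize_to_controls(cells, controls))
    for name, cells in perturbed.items()
}
names, matrix = correlation_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

For the clustering step, the resulting matrix could be passed to `scipy.cluster.hierarchy.linkage` with `method="ward"`. Whether the authors clustered the correlation matrix rows directly or a derived distance is not stated in the caption, so that detail is left open here.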

Supplementary information

Supplementary Figures

Supplementary Figures 1–11

Reporting Summary

Supplementary Tables

Supplementary Tables 1–17

Supplementary File

IPython notebook detailing the initialization of a convolutional neural network to predict mismatched sgRNA activities.

About this article

Cite this article

Jost, M., Santos, D.A., Saunders, R.A. et al. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat Biotechnol 38, 355–364 (2020). https://doi.org/10.1038/s41587-019-0387-5

Received: 21 June 2019
Accepted: 05 December 2019
Published: 13 January 2020
Issue Date: March 2020
DOI: https://doi.org/10.1038/s41587-019-0387-5