WO2016114972A1 - Heterodimeric cas9 and methods of use thereof - Google Patents
- Publication number
- WO2016114972A1 (PCT/US2016/012470)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polypeptide
- amino acids
- acid sequence
- amino acid
- cas9
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/09—Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/70—Fusion polypeptide containing domain for protein-protein interaction
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/111—General methods applicable to biologically active non-coding nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
Definitions
- the present disclosure provides a Cas9 heterodimer, as well as nucleic acids encoding the Cas9 heterodimer, and host cells comprising the nucleic acids.
- the present disclosure provides a system that includes a Cas9 heterodimer of the present disclosure and at least one of: a Cas9 guide RNA, and a dimerizing agent.
- a Cas9 heterodimer of the present disclosure is useful in a wide variety of applications, which are also provided.
- the present disclosure provides a Cas9 heterodimer comprising: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
- the RuvCI polypeptide comprises an amino acid sequence having at least
- the RuvCII polypeptide comprises an amino acid sequence having at least 75% amino acid sequence, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719-775 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 40 amino acids to about 70 amino acids.
- the HNH polypeptide comprises an amino acid sequence having at least 75% amino acid sequence, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776 to 909 of the
- the RuvCIII polypeptide comprises an amino acid sequence having at least 75% amino acid sequence, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910-1099 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 170 amino acids to about 210 amino acids.
- the PAM-interacting polypeptide comprises an amino acid sequence having at least 75% amino acid sequence, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100-1367 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 250 amino acids to about 280 amino acids.
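The residue ranges quoted above can be collected into a small lookup table. The following is a minimal sketch, assuming 1-based inclusive numbering against SEQ ID NO: 1545; the names `DOMAINS`, `LENGTH_WINDOWS`, `domain_length`, `extract_domain`, and `within_window` are hypothetical and introduced only for illustration. The RuvCI range and the HNH length window are left out because they are not stated in the text above.

```python
# Residue ranges (1-based, inclusive) quoted in the text for S. pyogenes Cas9
# (SEQ ID NO: 1545). RuvCI is omitted: its range is not given in this section.
DOMAINS = {
    "RuvCII": (719, 775),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}

# Approximate length windows stated in the text ("from about X to about Y").
# The HNH window is not stated in this section, so it is left out.
LENGTH_WINDOWS = {
    "RuvCII": (40, 70),
    "RuvCIII": (170, 210),
    "PAM-interacting": (250, 280),
}


def domain_length(name):
    """Number of residues covered by the quoted range."""
    start, end = DOMAINS[name]
    return end - start + 1


def extract_domain(sequence, name):
    """Slice a domain out of a full-length sequence string (illustrative helper)."""
    start, end = DOMAINS[name]
    return sequence[start - 1:end]  # convert 1-based inclusive to Python slice


def within_window(name):
    """Check that the quoted range falls inside the stated length window."""
    low, high = LENGTH_WINDOWS[name]
    return low <= domain_length(name) <= high
```

Under these assumptions, each quoted reference range falls inside its stated length window, and the four ranges are contiguous (775/776, 909/910, 1099/1100).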
- the first fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLSs) that provide for nuclear localization.
- the NLS is at or near the N-terminus of the first fusion polypeptide. In some cases, the NLS is at or near the C-terminus of the first fusion polypeptide. In some cases, the first fusion polypeptide comprises an NLS at or near the N-terminus of the first fusion polypeptide and comprises an NLS at or near the C-terminus of the first fusion polypeptide. In some cases, the second fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLSs) that provide for nuclear localization. In some cases, the NLS is at or near the N-terminus of the second fusion polypeptide. In some cases, the NLS is at or near the C-terminus of the second fusion polypeptide.
- the second fusion polypeptide comprises an NLS at or near the N-terminus of the second fusion polypeptide and comprises an NLS at or near the C-terminus of the second fusion polypeptide.
- the first fusion partner is at or near the N-terminus of the first polypeptide.
- the first fusion partner is at or near the C-terminus of the first polypeptide.
- the first fusion partner is located internally to the first polypeptide. In some cases, the first fusion partner is located within the HNH polypeptide.
- the first fusion partner is located between amino acids 860 and 880 (within the HNH region, amino acids 776 to 909) of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is located within the RuvCIII polypeptide. In some cases, the first fusion partner is located between amino acids 1000 and 1025 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
- the second fusion partner is at or near the N-terminus of the second polypeptide. In some cases, the second fusion partner is at or near the C-terminus of the second polypeptide. In some cases, the second fusion partner is located internally to the second polypeptide.
- the present disclosure provides a Cas9 heterodimer comprising: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
- the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide.
- the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide.
- the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide.
- the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide.
- the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide.
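The whole-domain orderings listed above are rotations of a single linear order. The sketch below illustrates that relationship, assuming the non-permuted nuclease lobe runs RuvCI, RuvCII, HNH, RuvCIII, PAM-interacting from N-terminus to C-terminus (the order in which the domains are enumerated in the text). `LINEAR_ORDER`, `rotate`, and `is_circular_permutation` are hypothetical names. Permutations whose break point falls inside a domain (the split RuvCIII and split HNH cases) are not modeled here.

```python
# Assumed N-to-C order of the non-permuted nuclease lobe domains.
LINEAR_ORDER = ["RuvCI", "RuvCII", "HNH", "RuvCIII", "PAM-interacting"]


def rotate(order, k):
    """Move the first k domains to the C-terminal end (a circular permutation)."""
    k %= len(order)
    return order[k:] + order[:k]


def is_circular_permutation(candidate, reference=LINEAR_ORDER):
    """True if candidate is some rotation of the reference domain order."""
    if len(candidate) != len(reference):
        return False
    return any(rotate(reference, k) == candidate for k in range(len(reference)))
```

For example, `rotate(LINEAR_ORDER, 1)` gives the order that starts with RuvCII and ends with RuvCI, and `rotate(LINEAR_ORDER, 3)` gives the order that starts with RuvCIII and ends with HNH.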
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) the first fusion partner; and b) the first polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) the first polypeptide; and b) the first fusion partner.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a heterologous nuclear localization sequence (NLS) that provides for nuclear localization; b) the first fusion partner; and c) the first polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) the first fusion partner; c) the first polypeptide; and d) an NLS.
- the first fusion polypeptide and the second fusion polypeptide form a heterodimer in the presence of a small molecule dimerizer.
- a) the first fusion partner is FKBP1A (FK506 binding protein 1A); and the second fusion partner is FKBP1A; b) the first fusion partner is FKBP1A; and the second fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme); c) the first fusion partner is FKBP1A; and the second fusion partner is cyclophilin; d) the first fusion partner is FKBP1A; and the second fusion partner is Fkbp-Rapamycin Binding Domain (FRB); e) the first fusion partner is gyrase B (GyrB); and the second fusion partner is GyrB; f) the first fusion partner is dihydr
- a) the second fusion partner is FKBP1A (FK506 binding protein 1A); and the first fusion partner is FKBP1A; b) the second fusion partner is FKBP1A; and the first fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme); c) the second fusion partner is FKBP1A; and the first fusion partner is cyclophilin; d) the second fusion partner is FKBP1A; and the first fusion partner is Fkbp-Rapamycin Binding Domain (FRB); e) the second fusion partner is gyrase B (GyrB); and the first fusion partner is GyrB; f) the second fusion partner is dihydrofolate reductase (DHFR); and the first fusion partner is DHFR; g) the second fusion partner is DmrB; and the first fusion partner is DmrB; h) the second fusion partner is PYL; and the first fusion partner is ABI; i) the second fusion partner is Cry2; and the first fusion partner is CIP; or j) the second fusion partner is GAI; and the first fusion partner is GID1.
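The dimerization pairs enumerated above can be represented as an order-independent lookup, since the text lists each pair in both orientations (either member may be the first or the second fusion partner). This is a minimal sketch; `DIMERIZATION_PAIRS` and `is_listed_pair` are hypothetical names, and the short labels are taken directly from the list above.

```python
# Each pair is stored as a frozenset so that orientation does not matter.
# A homodimeric pair such as FKBP1A/FKBP1A collapses to a one-element set.
DIMERIZATION_PAIRS = {
    frozenset({"FKBP1A"}),              # a) FKBP1A with FKBP1A
    frozenset({"FKBP1A", "PPP3CA"}),    # b)
    frozenset({"FKBP1A", "cyclophilin"}),  # c)
    frozenset({"FKBP1A", "FRB"}),       # d)
    frozenset({"GyrB"}),                # e) GyrB with GyrB
    frozenset({"DHFR"}),                # f) DHFR with DHFR
    frozenset({"DmrB"}),                # g) DmrB with DmrB
    frozenset({"PYL", "ABI"}),          # h)
    frozenset({"Cry2", "CIP"}),         # i)
    frozenset({"GAI", "GID1"}),         # j)
}


def is_listed_pair(first_partner, second_partner):
    """True if the two fusion partners form one of the pairs listed in the text."""
    return frozenset({first_partner, second_partner}) in DIMERIZATION_PAIRS
```

A call such as `is_listed_pair("FRB", "FKBP1A")` returns `True` regardless of which polypeptide carries which member, whereas a mismatched combination such as `is_listed_pair("PYL", "GID1")` returns `False`.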
- the present disclosure provides a system comprising: a Cas9 heterodimer (e.g., as described above) and a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3.
- the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3.
- the system comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide.
- the present disclosure provides one or more nucleic acids comprising nucleotide sequences encoding the Cas9 heterodimer (e.g., as described above).
- the nucleotide sequences encoding the first fusion polypeptide and the nucleotide sequences encoding the second fusion polypeptide are in the same nucleic acid.
- the nucleotide sequences encoding the first fusion polypeptide and nucleotide sequences encoding the second fusion polypeptide are in separate nucleic acids.
- the nucleotide sequences encoding the first fusion polypeptide are operably linked to a transcription control sequence.
- the nucleotide sequences encoding the second fusion polypeptide are operably linked to a transcription control sequence.
- the present disclosure provides a recombinant vector comprising the one or more nucleic acids.
- the vector is a viral vector.
- the viral vector is an adeno-associated viral vector, a lentiviral vector, or a retroviral vector.
- the present disclosure provides a host cell genetically modified with the one or more nucleic acids or with the recombinant expression vector.
- the host cell is a prokaryotic cell.
- the host cell is a eukaryotic cell.
- the host cell is a mammalian cell, a plant cell, an amphibian cell, a reptile cell, a yeast cell, a protozoan cell, or a helminth cell.
- the present disclosure provides a method of binding a target nucleic acid, comprising:
- the method results in modification of the target nucleic acid.
- the modification is cleavage.
- the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3.
- the system comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide.
- the system comprises a donor polynucleotide.
- the system comprises a PAMmer.
- the Cas9 heterodimer has reduced nuclease activity.
- the Cas9 heterodimer has nickase activity.
- the Cas9 heterodimer includes a fusion partner that provides for an activity selected from: transcription modulation, methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, and demyristoylation activity.
- Figures 1A-B present one embodiment of a split Cas9 protein (a Cas9 heterodimer), and a crystal structure that illustrates where the nuclease lobe and alpha-helical lobe can be separated relative to a wild type Cas9 protein.
- Figure 2 presents assays related to the expression and cleavage activity of split Cas9.
- Figure 3 presents assays related to DNA cleavage activity of split Cas9 when using truncated single guide RNA (sgRNA).
- Figures 4A-B present two different suitable dimerization domain insertion sites (and/or circular permutation sites) for split Cas9.
- Figure 5 depicts a Cas9 protein sequence of S. pyogenes (SEQ ID NO: 1545).
- Figures 6A and 6B present illustrative example sequences of nuclease lobes and alpha-helical lobes (first and second polypeptides) of a subject Split Cas9.
- the examples depicted include tags and protease sites (e.g., TEV protease sites).
- Figure 7 presents illustrative example sequences of nuclease lobes and alpha-helical lobes (first and second polypeptides) of a subject Split Cas9.
- the examples depicted include fusion partners (including dimer pair members), linkers, NLSs, tags (e.g., FLAG tags, HA tags), etc.
- Figures 8A-F provide a schematic drawing of exemplary embodiments of subject compositions and methods in which the target nucleic acid is a single stranded nucleic acid.
- (A-B) Each embodiment depicted includes a PAMmer, which is hybridized to a single stranded target nucleic acid; and a guide nucleic acid (Cas9 guide RNA), which is hybridized to the target nucleic acid and is associated with a Cas9 heterodimer.
- the PAM sequence can be provided by the target nucleic acid and a PAMmer may therefore not be necessary.
- Each embodiment depicted in C and D includes a PAMmer having a specificity segment and an orientation segment.
- the PAM sequence is complementary to the target nucleic acid in C, and is not complementary to the target nucleic acid in D.
- (E-F) Each embodiment depicted in E and F includes a PAMmer having either a specificity segment or an orientation segment.
- the PAM sequence is complementary to the target nucleic acid on the right, and is not complementary to the target nucleic acid on the left.
- Figure 9 presents a schematic of one possible guide RNA.
- the depicted guide RNA is a single guide RNA with a targeter covalently linked to an activator via 4 linker nucleotides.
- the nucleotides are 5' to 3' from left to right.
- Figures 10A-D present a schematic of one way in which Cas9 can be split into two separate polypeptides that retain the ability to catalyze RNA-guided dsDNA cleavage.
- Figures 11A-E present data related to the requirements for split-Cas9 assembly.
- Figures 12A-C present genomic editing data for split-Cas9.
- Figures 13A-B present size exclusion chromatograms of purified α-helical (top) and nuclease (bottom) lobes.
- Figures 14A-C present split-Cas9 activity data and individual lobe data related to the use of single-guide and dual-guide RNAs.
- Figures 15A-B present split-Cas9 versus non-split Cas9 binding affinity data for target DNA.
- Figures 16A-D present data related to requirements for split-Cas9 heterodimerization.
- Figure 17 presents data related to the DNA cleavage activity of split-Cas9 in the presence of excess sgRNA.
- Figures 18A-C present a schematic of one strategy for inducible control of genome engineering by a split-Cas9 enzyme complex.
- Figures 19A-B present data related to split-Cas9 activity in the presence of different sgRNAs.
- Figures 20A-B present Table 1 and Table S1 of Example 2.
- "polynucleotide" and "nucleic acid" refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, these terms include, but are not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
- "polynucleotide" and "nucleic acid" should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
- "naturally occurring," as applied to a nucleic acid, a cell, or an organism, refers to a nucleic acid, cell, or organism that is found in nature.
- a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.
- isolated is meant to describe a polynucleotide, a polypeptide, or a cell that is in an environment different from that in which the polynucleotide, the polypeptide, or the cell naturally occurs.
- An isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.
- Heterologous means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively.
- a first fusion polypeptide of a Cas9 heterodimer can comprise a polypeptide comprising a Cas9 nuclease and a non-Cas9 polypeptide, where the non-Cas9 polypeptide can be a fusion partner, etc.
- exogenous nucleic acid refers to a nucleic acid that is not normally or naturally found in and/or produced by a given bacterium, organism, or cell in nature.
- endogenous nucleic acid refers to a nucleic acid that is normally found in and/or produced by a given bacterium, organism, or cell in nature.
- An “endogenous nucleic acid” is also referred to as a “native nucleic acid” or a nucleic acid that is “native” to a given bacterium, organism, or cell.
- Recombinant means that a particular nucleic acid (DNA or RNA) is the
- DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
- sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes.
- Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences", below).
- the term "recombinant" polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention.
- This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
- the term "recombinant" polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention.
- a polypeptide that comprises a heterologous amino acid sequence is recombinant.
- a "construct" or "vector" is a recombinant nucleic acid, generally recombinant DNA, which has been generated for the purpose of the expression and/or propagation of a specific nucleotide sequence(s), or is to be used in the construction of other recombinant nucleotide sequences.
- DNA regulatory sequences refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.
- transformation is used interchangeably herein with “genetic modification” and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (i.e., DNA exogenous to the cell).
- Genetic change can be accomplished either by incorporation of the new DNA into the genome of the host cell, or by transient or stable maintenance of the new DNA as an episomal element.
- a permanent genetic change is generally achieved by introduction of the DNA into the genome of the cell.
- In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell.
- Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like.
- the choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al, Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
- a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
- heterologous promoter and “heterologous control regions” refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature.
- a “transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with the coding region in nature.
- a "host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell, or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid (e.g., an expression vector that comprises a nucleotide sequence encoding one or more biosynthetic pathway gene products such as mevalonate pathway gene products), and include the progeny of the original cell which has been genetically modified by the nucleic acid.
- a “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector.
- a subject prokaryotic host cell is a genetically modified prokaryotic host cell (e.g., a bacterium), by virtue of introduction into a suitable prokaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to (not normally found in nature in) the prokaryotic host cell, or a recombinant nucleic acid that is not normally found in the prokaryotic host cell; and a subject eukaryotic host cell is a genetically modified eukaryotic host cell, by virtue of introduction into a suitable eukaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.
- a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine.
- Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine
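The side-chain groups named above can be written as a lookup keyed by one-letter amino acid codes. This is a minimal sketch that encodes only the six groups stated in the text; residues not named there (for example, aspartate and glutamate) are deliberately left unassigned. `SIDE_CHAIN_GROUPS`, `side_chain_group`, and `same_group` are hypothetical names.

```python
# Side-chain groups as listed in the text, using one-letter amino acid codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": {"G", "A", "V", "L", "I"},
    "aliphatic-hydroxyl": {"S", "T"},
    "amide-containing": {"N", "Q"},
    "aromatic": {"F", "Y", "W"},
    "basic": {"K", "R", "H"},
    "sulfur-containing": {"C", "M"},
}


def side_chain_group(residue):
    """Return the group name for a residue, or None if the text does not list it."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue.upper() in members:
            return name
    return None


def same_group(residue_a, residue_b):
    """True if both residues belong to the same listed side-chain group."""
    group = side_chain_group(residue_a)
    return group is not None and group == side_chain_group(residue_b)
```

Membership in the same group is broader than the exemplary substitution groups: `same_group("K", "H")` is `True` here, although only lysine-arginine appears in the exemplary list as quoted.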
- a polynucleotide or polypeptide has a certain percent "sequence identity" to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences.
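The definition above can be illustrated with a short calculation on two sequences that have already been aligned. This is a sketch under stated assumptions: the inputs are equal-length aligned strings, gaps are written as "-", and the denominator is the full alignment length. Alignment tools differ on the choice of denominator, so values from this function need not match those reported by BLAST or FASTA. `percent_identity` and `meets_identity_threshold` are hypothetical names.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same non-gap residue.

    Assumes both inputs come from the same pairwise alignment and therefore
    have equal length. The denominator is the alignment length (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=75.0):
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With this convention, a four-column alignment with three matching columns scores exactly at a 75% threshold, the lowest of the identity levels recited for the domain polypeptides.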
- Sequence similarity can be determined in a number of different manners.
- sequences can be aligned using the methods and computer programs, including BLAST, available over the world wide web at ncbi.nlm.nih.gov/BLAST. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
- Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc.
- Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed.
- the present disclosure provides a Cas9 heterodimer, as well as nucleic acids encoding the Cas9 heterodimer, and host cells comprising the nucleic acids.
- the present disclosure provides a system comprising a Cas9 heterodimer of the present disclosure.
- a Cas9 heterodimer of the present disclosure is useful in a wide variety of applications, which are also provided.
- the present disclosure provides a Cas9 heterodimer.
- a Cas9 heterodimer of the present disclosure includes a first fusion polypeptide comprising a first polypeptide (e.g., a Cas9 nuclease lobe) covalently linked (directly or via a linker) to a first fusion partner; and a second fusion polypeptide comprising a second polypeptide (e.g., a Cas9 alpha-helical lobe) covalently linked (directly or via a linker) to a second fusion partner.
- the first polypeptide is circularly permuted (i.e., in some cases, the first polypeptide is a circular permutant).
- a Cas9 heterodimer of the present disclosure comprises two polypeptides that can interact to form a complex (i.e., to form the heterodimeric Cas9 protein).
- a Cas9 heterodimer of the present disclosure is also referred to herein as a "split Cas9" or a "split Cas9 protein.”
- the fusion partners present in the first fusion polypeptide and the second fusion polypeptide can be induced to dimerize (e.g., by a dimerizing agent).
- When the fusion partners present in the first fusion polypeptide and the second fusion polypeptide dimerize, the first fusion polypeptide and the second fusion polypeptide dimerize. In the absence of a dimerizing agent, and in the absence of a guide RNA that includes a stem loop 2 and/or a stem loop 3, the first fusion polypeptide and the second fusion polypeptide do not dimerize.
- When the first fusion polypeptide and the second fusion polypeptide dimerize, the Cas9 heterodimer, together with a truncated guide RNA (e.g., a guide RNA that does not include stem loop 2 and/or stem loop 3), can bind a target nucleic acid (and in some cases modify, e.g., cleave or otherwise modify, the target nucleic acid).
- a Cas9 heterodimer of the present disclosure and a truncated guide RNA form a "Cas9 heterodimer system," described herein.
- a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid.
- a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and cleave the target nucleic acid. In some cases, a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and modify the target nucleic acid. In some cases, a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and modulate transcription of/from the target nucleic acid.
- a subject Cas9 heterodimer (a split Cas9 protein) includes a first polypeptide (where the first polypeptide includes a Cas9 nuclease lobe) and a second polypeptide (where the second polypeptide includes a Cas9 alpha-helical lobe) (e.g., see Figure 1).
- a nuclease lobe includes: (i) a RuvC domain, where a RuvC domain comprises a RuvCI polypeptide, a RuvCII polypeptide, and a RuvCIII polypeptide; (ii) an HNH domain (also referred to as an HNH polypeptide); and (iii) a PAM-interacting domain (also referred to as a "PAM-interacting polypeptide").
- a Cas9 alpha-helical lobe is also referred to as an "alpha-helical recognition region."
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide can have at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 2-56 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 to 57 amino acids of amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 55-60 (e.g., 55, 56, 57, 58, 59, or 60) amino acids.
- a short alpha-helix (S717-L727 in the S. pyogenes Cas9 depicted in Figure 5) can be removed, e.g., to minimize the distance between the end of RuvCI and the beginning of RuvCII.
- a short alpha-helix (S717-L727 in the S. pyogenes Cas9 depicted in Figure 5) is removed and the RuvCI polypeptide is connected to the RuvCII polypeptide with a linker (e.g., a glycine-serine-serine linker, and as described elsewhere).
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
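The domain boundaries and identity thresholds recited above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the disclosure: the names `DOMAINS`, `segment`, `percent_identity`, and `meets_threshold` are hypothetical, the coordinates are the S. pyogenes Cas9 positions quoted in the text (1-based, inclusive), and percent identity is computed here on two pre-aligned, ungapped sequences of equal length, which is a simplifying assumption rather than the alignment method the disclosure relies on.

```python
# Illustrative sketch. Coordinates are the S. pyogenes Cas9 positions quoted
# in the text (1-based, inclusive); all names here are hypothetical.
DOMAINS = {
    "RuvCI": (1, 60),
    "RuvCII": (718, 774),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}


def domain_length(name):
    """Number of residues spanned by a named domain."""
    start, end = DOMAINS[name]
    return end - start + 1


def segment(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a sequence."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("segment out of range")
    return sequence[start - 1:end]


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length, ungapped sequences.

    Assumption: real comparisons would first align the sequences with a
    dedicated alignment tool; this function only counts matching positions.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def meets_threshold(a, b, threshold=75.0):
    """True if identity is at least the threshold (75% is the lowest recited)."""
    return percent_identity(a, b) >= threshold
```

For instance, `domain_length("HNH")` evaluates to 134, which matches the upper bound of the "100 to 134 amino acids" stretch recited for the HNH polypeptide.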
- Heterologous subcellular localization sequences
- the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs.
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the first fusion polypeptide comprises an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and d) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and c) a first fusion partner.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; c) a first fusion partner; and d) an NLS.
- the NLS comprises the amino acid sequence
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
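The "at or near" criterion can be illustrated with a short Python check. This is a sketch under stated assumptions: the function name `nls_near_terminus` is hypothetical, the NLS is the sequence quoted above (SEQ ID NO: 1547), and "within N amino acids of a terminus" is interpreted here as at most N residues lying between the NLS and that terminus, which is one reasonable reading rather than a definition taken from the disclosure.

```python
# Illustrative sketch; names and the distance convention are assumptions.
NLS = "KRPAATKKAGQAKKKK"  # NLS sequence quoted in the text (SEQ ID NO: 1547)


def nls_near_terminus(protein, nls=NLS, window=50):
    """Return the set of termini ("N", "C") that have an NLS within `window`.

    Distance is counted as the number of residues between the NLS and the
    terminus, so an NLS at the very start of the protein has distance 0.
    """
    hits = set()
    pos = protein.find(nls)
    while pos != -1:
        residues_before = pos
        residues_after = len(protein) - (pos + len(nls))
        if residues_before <= window:
            hits.add("N")
        if residues_after <= window:
            hits.add("C")
        pos = protein.find(nls, pos + 1)
    return hits
```

With a toy protein such as `"M" + NLS + "A" * 300`, the function reports only the N-terminus; an NLS buried more than `window` residues from both ends yields an empty set.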
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide.
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a RuvCI polypeptide; c) a RuvCI polypeptide and a RuvCII polypeptide; and d) a PAM-interacting polypeptide and an NLS.
- the linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides of between about 6 amino acids and about 40 amino acids in length, or between about 6 amino acids and about 25 amino acids in length. These linkers are generally produced by using synthetic, linker-encoding oligonucleotides to couple the proteins. Peptide linkers with a degree of flexibility will generally be preferred. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide.
- Exemplary polypeptide linkers include glycine polymers (G)n; glycine-serine polymers including, for example, (GS)n, (GSGGS)n (SEQ ID NO: 1548), and (GGGS)n (SEQ ID NO: 1549), where n is an integer of at least one; and glycine-alanine polymers.
- Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG
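The repeat-unit notation used for the linkers above, e.g., (GS)n with n an integer of at least one, can be expanded programmatically. The following Python sketch is illustrative only: `repeat_linker` and `in_length_range` are hypothetical helper names, and the strict 6 to 40 residue bounds are an assumption standing in for the "about 6" to "about 40" amino acid range given in the text.

```python
# Illustrative sketch; helper names and the strict bounds are assumptions.
def repeat_linker(unit, n):
    """Expand a repeat unit such as "GS" or "GSGGS" into (unit)n, n >= 1."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def in_length_range(linker, minimum=6, maximum=40):
    """Check a linker length against the approximate range cited in the text."""
    return minimum <= len(linker) <= maximum


# Example: the three glycine-serine repeat units named above, with n = 3.
examples = {unit: repeat_linker(unit, 3) for unit in ("GS", "GSGGS", "GGGS")}
```

Here `examples["GSGGS"]` is a 15-residue linker, which falls inside the cited length range, whereas a single `"GS"` unit would be shorter than the lower bound.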
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner.
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a RuvCI polypeptide; b) a RuvCI polypeptide and a RuvCII polypeptide; c) a PAM-interacting polypeptide and an NLS; d) a PAM-interacting polypeptide and a second fusion partner; and e) a fusion partner and an NLS.
- Suitable linker polypeptides are as described above.
- Fusion partner located internally within first fusion polypeptide
- the first fusion partner is located internally within the first polypeptide. In some cases, the first fusion partner is inserted within the HNH polypeptide. In some cases, the first fusion partner is inserted within the RuvCIII polypeptide.
- the first fusion partner is inserted within the HNH polypeptide.
- the HNH polypeptide of S. pyogenes Cas9 is amino acids 776-909 of the amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted in a site within amino acids 800 to 900 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
- the first fusion partner is inserted at or near amino acid 868 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
- the first fusion partner is inserted at amino acid 868 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
- the first fusion partner is inserted at amino acid 860 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 861 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 862 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 863 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 864 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 865 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 866 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 867 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 869 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 870 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 871 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 872 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 873 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 874 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 875 of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide can comprise, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
- an N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 80 amino acids to 92 amino acids of amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an N-terminal portion of an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 50 amino acids to 66 amino acids of amino acids 776-841 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 35 to 42 amino acids of amino acids 868-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 50 amino acids to 67 amino acids of amino acids 842-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
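The internal-insertion layout described above, in which a domain is split into an N-terminal portion and a C-terminal portion flanking the fusion partner, can be sketched as follows. This is a hypothetical illustration: the function names are not from the disclosure, and the convention that insertion "at amino acid p" places the partner immediately before residue p is inferred from the recited portions (amino acids 776 to 867 and 868-909 for insertion at amino acid 868).

```python
# Illustrative sketch; names and the insertion convention are assumptions.
def split_portions(domain_start, domain_end, position):
    """Coordinates (1-based, inclusive) of the N- and C-terminal portions
    produced by inserting a fusion partner immediately before `position`."""
    if not domain_start < position <= domain_end:
        raise ValueError("insertion position must fall inside the domain")
    return (domain_start, position - 1), (position, domain_end)


def insert_fusion_partner(polypeptide, partner, position, linker=""):
    """Insert `partner` before residue `position` (1-based) of `polypeptide`,
    optionally flanked on both sides by a linker."""
    if not 1 <= position <= len(polypeptide):
        raise ValueError("insertion position out of range")
    n_portion = polypeptide[:position - 1]
    c_portion = polypeptide[position - 1:]
    return n_portion + linker + partner + linker + c_portion
```

For example, `split_portions(776, 909, 868)` yields `((776, 867), (868, 909))`, the HNH portions recited above, and `split_portions(910, 1099, 1016)` gives the analogous RuvCIII portions for insertion at amino acid 1016.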
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 860 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 861 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 862 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 863 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 864 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 865 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 866 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 867 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 868 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 869 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 870 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 871 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 872 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 873 of the 5.
- the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 874 of the 5.
- the first fusion partner is inserted within the RuvCIII polypeptide.
- the RuvCIII polypeptide of S. pyogenes Cas9 is amino acids 910-1099 of the amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted in a site within amino acids 950 to 1060 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at or near amino acid 1016 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1016 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1010 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1011 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1012 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1013 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1014 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1015 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1017 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1018 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1019 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1020 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1021 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1022 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1023 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1024 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion partner is inserted at amino acid 1025 of amino acids 910-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide can comprise, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
- an N-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 80 amino acids to 106 amino acids of amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an N-terminal portion of a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 75 amino acids to 84 amino acids of amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal portion of a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1010 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1011 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1012 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1013 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1014 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1016 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1017 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1018 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1019 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1020 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1021 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1022 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1023 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; and iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1024 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide of a Cas9 heterodimer of the present disclosure is a Cas9 heterodimer of the present disclosure.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 61 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 61-718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 650 amino acids to 660 amino acids (e.g., 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, or 660 amino acids).
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 400 amino acids to 624 amino acids of amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 620 amino acids to 630 amino acids (e.g., 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, or 630 amino acids).
- G56 (of the S. pyogenes sequence set forth in SEQ ID NO: 1545) can be selected as the N-terminus for the alpha-helical lobe (e.g., due to its location in a poorly-conserved linker just before the arginine-rich bridge helix, which has been shown to be critical for Cas9 cleavage activity in human cells).
- the second polypeptide of a Cas9 heterodimer of the present disclosure comprises an alpha-helical lobe (also referred to as "an alpha-helical recognition region") of a Cas9 polypeptide.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 56 to 714 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 56-714 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the C-terminus of the alpha-helical lobe can be at the beginning, end, or within the linker between the two lobes of the WT Cas9 protein.
- the C-terminus of the alpha-helical lobe can be at or near S714 of the WT Cas9 protein set forth in SEQ ID NO: 1545.
- the C-terminus of the alpha-helical lobe can be S714 of the WT Cas9 protein set forth in SEQ ID NO: 1545.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second fusion partner; and b) a second polypeptide that comprises an alpha-helical recognition region. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner.
- the second fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- the second fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs.
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the second fusion polypeptide comprises an NLS.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; and c) a second polypeptide that comprises an alpha-helical recognition region.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; c) a second polypeptide that comprises an alpha-helical recognition region; and d) an NLS.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; and c) a second fusion partner.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; c) a second fusion partner; and d) an NLS.
- the NLS comprises the amino acid sequence
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the second fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the second fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
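The "at or near" criterion can be illustrated with a short check. This is only a sketch: the function name `nls_position`, the test construct, and the convention that "within N amino acids" counts the residues lying between the NLS and the terminus are assumptions made for illustration, not definitions taken from the disclosure. Only the first occurrence of the NLS is examined.

```python
def nls_position(protein: str, nls: str, window: int = 50) -> dict:
    """Report whether the first occurrence of `nls` lies near either terminus.

    Assumption: "within `window` amino acids" is read as at most `window`
    residues lying between the NLS and the terminus in question.
    """
    start = protein.find(nls)  # 0-based index equals the number of residues before the NLS
    if start == -1:
        return {"found": False, "near_n": False, "near_c": False}
    residues_after = len(protein) - (start + len(nls))
    return {
        "found": True,
        "near_n": start <= window,
        "near_c": residues_after <= window,
    }


NLS = "KRPAATKKAGQAKKKK"  # SEQ ID NO: 1547, as quoted in the text

# Hypothetical construct: 21 filler residues, the NLS, then 300 filler residues.
construct = "M" + "A" * 20 + NLS + "G" * 300
result = nls_position(construct, NLS, window=50)
```

With this made-up construct the NLS counts as near the N-terminus but not near the C-terminus; changing `window` to 75 or 30 applies the other thresholds listed above.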
- the second fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an alpha-helical lobe; and c) an alpha-helical lobe and an NLS. Suitable linker polypeptides are described elsewhere herein.
- Cas9 heterodimer comprising a circularly permuted polypeptide
- the Cas9 nuclease lobe of a Cas9 heterodimer of the present disclosure is a circular permutant.
- the term "circular permutant" refers to a variant polypeptide (e.g., of a subject Cas9 heterodimer) in which one section of the primary amino acid sequence has been moved to a different position within the primary amino acid sequence of the polypeptide, but where the local order of amino acids has not been changed, and where the three dimensional architecture of the protein is conserved.
- a circular permutant of a wild type 500 amino acid polypeptide may have an N-terminal residue of residue number 50 (relative to the wild type protein), where residues 1-49 of the wild type protein are added to the C-terminus.
- Such a circular permutant, relative to the wild type protein sequence, would have, from N-terminus to C-terminus, amino acid numbers 50-500 followed by 1-49 (amino acid 49 would be the C-terminal residue).
- such an example circular permutant would have the same total number of amino acids as the wild type reference protein, and the amino acids would even be in the same order (locally), but the overall primary amino acid sequence is changed.
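The 500-residue example above can be reproduced in a few lines. This is a minimal sketch: the helper name `circular_permute` and the placeholder sequence are illustrative assumptions, and the code operates only on the primary sequence (it says nothing about whether the three dimensional architecture is conserved).

```python
def circular_permute(sequence: str, new_start: int) -> str:
    """Return the permutant whose N-terminal residue is wild-type residue
    number `new_start` (1-based); residues before it move to the C-terminus."""
    if not 1 <= new_start <= len(sequence):
        raise ValueError("new_start must be a valid 1-based residue number")
    cut = new_start - 1
    return sequence[cut:] + sequence[:cut]


# Placeholder 500-residue "wild type" sequence (not a real protein).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
wild_type = "".join(ALPHABET[i % len(ALPHABET)] for i in range(500))

# Residues 50-500 followed by residues 1-49, as in the example in the text.
permutant = circular_permute(wild_type, 50)
```

The permutant has the same length and composition as the placeholder wild type; its first residue is wild-type residue 50 and its last residue is wild-type residue 49.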
- a Cas9 heterodimer of the present disclosure comprises: a) a first, circular permuted, polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; where the first polypeptide comprises a first member of a dimerization pair; and b) a second polypeptide comprising an alpha-helical recognition region and a second member of a dimerization pair.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
- the first fusion partner (first member of the dimerization pair) is covalently linked, directly or via a linker, at or near (e.g., within 1 to 50 amino acids of) the amino terminus (N-terminus) of the first, circular permuted, polypeptide.
- the first member of the dimerization pair is covalently linked, directly or via a linker, at or near (e.g., within 1 to 50 amino acids of) the carboxyl terminus (C-terminus) of the first, circular permuted, polypeptide.
- the first polypeptide comprises a nuclease lobe of a Cas9 polypeptide.
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of the various possible components (polypeptides) of a first fusion polypeptide. Examples of suitable positions for a linker polypeptide include, but are not limited to, interposed between: a) an NLS and a fusion partner; b) a fusion partner and a RuvCII polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; d) a RuvCI polypeptide and an NLS; e) a RuvCI polypeptide and a fusion partner; and f) a RuvCI polypeptide and a RuvCII polypeptide.
- the linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Currently, it is contemplated that the most useful linker sequences will generally be peptides of between about 6 and about 40 amino acids in length, or between about 6 and about 25 amino acids in length. These linkers are generally produced by using synthetic, linker- encoding oligonucleotides to couple the proteins. Peptide linkers with a degree of flexibility will generally be preferred. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide.
- Exemplary polypeptide linkers include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n (SEQ ID NO: 1548) and (GGGS)n (SEQ ID NO: 1549), where n is an integer of at least one), and glycine-alanine polymers.
- Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 1550), GGSGG (SEQ ID NO: 1551), GSGSG (SEQ ID NO: 1552), GSGGG (SEQ ID NO: 1553), GGGSG (SEQ ID NO: 1554), GSSSG (SEQ ID NO: 1555), and the like.
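As an illustration of the repeat notation, the following sketch expands a repeat unit into the linkers whose length falls in the 6 to 40 amino acid range mentioned above. The function names are hypothetical, and treating "about 6" and "about 40" as hard limits is an assumption made only to keep the example concrete.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat unit such as 'GSGGS' into (unit)n, with n >= 1."""
    if not unit or n < 1:
        raise ValueError("unit must be non-empty and n must be at least one")
    return unit * n


def linkers_in_range(unit: str, min_len: int = 6, max_len: int = 40) -> list:
    """List every (unit)n whose total length lies in [min_len, max_len].

    Assumption: the 'about 6 to about 40' guidance is applied as strict bounds.
    """
    linkers = []
    n = 1
    while len(unit) * n <= max_len:
        if len(unit) * n >= min_len:
            linkers.append(repeat_linker(unit, n))
        n += 1
    return linkers


gsggs_linkers = linkers_in_range("GSGGS")  # unit of SEQ ID NO: 1548
gs_linkers = linkers_in_range("GS")        # (GS)n
```

For the five-residue unit GSGGS this yields the repeats n = 2 through n = 8 (10 to 40 residues); a single copy is excluded only because of the assumed lower bound.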
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and b) a first fusion partner.
- the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
- the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the first fusion polypeptide comprises a nuclear localization signal (NLS).
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and d) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and c) a first fusion partner.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; b) a first fusion partner; and c) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and c) an NLS.
- the first fusion partner is a first member of a dimerization pair.
- the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546).
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 to 57 amino acids of amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 55-60 (e.g., 55, 56, 57, 58, 59, or 60) amino acids.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
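The residue ranges quoted above can be tabulated to check segment lengths and to assemble the circularly permuted domain order (RuvCII, HNH, RuvCIII, PAM-interacting, RuvCI). The sketch below is illustrative only: it uses a placeholder string in place of SEQ ID NO: 1545, joins the segments directly without linkers, uses the 718-774 range for RuvCII, and omits residue 775, which falls between the quoted RuvCII and HNH ranges. All function and variable names are hypothetical.

```python
# Residue ranges (1-based, inclusive) as quoted in the text for SEQ ID NO: 1545.
DOMAINS = {
    "RuvCI": (1, 60),
    "RuvCII": (718, 774),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}

# Domain order of the circularly permuted first polypeptide described above.
PERMUTED_ORDER = ["RuvCII", "HNH", "RuvCIII", "PAM-interacting", "RuvCI"]


def segment_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive residue range."""
    return end - start + 1


def extract(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive residue range out of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range falls outside the sequence")
    return sequence[start - 1:end]


def permuted_lobe(sequence: str) -> str:
    """Concatenate the domain segments in the permuted order (no linkers)."""
    return "".join(extract(sequence, *DOMAINS[name]) for name in PERMUTED_ORDER)


# Placeholder sequence long enough to cover residue 1367 (not the real protein).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
placeholder = "".join(ALPHABET[i % len(ALPHABET)] for i in range(1367))

lengths = {name: segment_length(*DOMAINS[name]) for name in PERMUTED_ORDER}
lobe = permuted_lobe(placeholder)
```

The computed lengths (57, 134, 190, 268, and 60 residues) agree with the upper bounds of the "contiguous stretch" ranges given for each domain.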
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; and b) a first fusion partner.
- the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
- the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the first fusion polypeptide comprises a nuclear localization signal (NLS).
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; b) a first fusion partner; and c) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; and d) an NLS.
- the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546).
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547).
- Other suitable NLS are described elsewhere herein.
- the first fusion partner is a first member of a dimerization pair.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a C-terminal portion of an HNH polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) an N-terminal portion of an HNH polypeptide and a fusion partner.
- Suitable linker polypeptides are as described above.
- a C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 35 to 42 amino acids of amino acids 868-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 50 amino acids to 67 amino acids of amino acids 842-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 80 amino acids to 92 amino acids of amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an N-terminal portion of an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 50 amino acids to 66 amino acids of amino acids 776-841 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the 5.
- Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
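
The identity thresholds recited above (at least 75% up to 100%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length without gaps; real comparisons would use an alignment tool, and the function names `percent_identity` and `thresholds_met` are hypothetical, chosen only for this illustration.

```python
from typing import List

# Identity thresholds recited in the text, in percent.
THRESHOLDS = (75, 80, 85, 90, 95, 98, 99, 100)


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences.

    Assumption: both sequences are ungapped and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def thresholds_met(identity: float) -> List[int]:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

For example, two eight-residue segments that differ at one position share 87.5% identity, which satisfies the 75%, 80%, and 85% thresholds but not the 90% threshold.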
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; and b) a first fusion partner.
- the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
- the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the first fusion polypeptide comprises a nuclear localization signal (NLS).
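- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide.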
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; b) a first fusion partner; and c) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; and d) an NLS.
- the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546).
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- the first fusion partner is a first member of a dimerization pair.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
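
The "at or near" placement described above can be sketched as a simple positional check. The measure used here (number of residues between the NLS and the terminus) is an assumption made for illustration, as is the helper name `nls_near_terminus`; the two NLS sequences are the ones recited in the text (SEQ ID NOs: 1546 and 1547), and the flanking residues in the usage note are arbitrary filler.

```python
from typing import Dict

# NLS sequences recited in the text.
NLS_1546 = "MAPKKKRKVGIHGVPAA"   # SEQ ID NO: 1546
NLS_1547 = "KRPAATKKAGQAKKKK"    # SEQ ID NO: 1547


def nls_near_terminus(protein: str, nls: str, window: int = 100) -> Dict[str, bool]:
    """Report whether any occurrence of `nls` lies within `window`
    residues of the N-terminus and/or the C-terminus of `protein`.

    Assumption: distance is counted as the number of residues between
    the NLS and the respective terminus.
    """
    near_n = False
    near_c = False
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)              # exclusive end index
        if start <= window:                 # residues preceding the NLS
            near_n = True
        if len(protein) - end <= window:    # residues following the NLS
            near_c = True
        start = protein.find(nls, start + 1)
    return {"N": near_n, "C": near_c}
```

With a window of 30, an NLS preceded by 10 residues and followed by 200 residues is reported as near the N-terminus only.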
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an HNH polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) a RuvCII polypeptide and a fusion partner.
- Suitable linker polypeptides are as described above.
- a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
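
The residue ranges recited above can be collected into a small lookup table to check that the stated maximum stretch lengths follow from the boundaries. This is a sketch only: the ranges are treated as 1-based and inclusive, the RuvCII range is taken as 729-775 (the text also recites 728-774), and the names `DOMAINS`, `segment_length`, and `extract_segment` are hypothetical.

```python
from typing import Dict, Tuple

# 1-based, inclusive residue ranges as recited in the text for the
# S. pyogenes Cas9 sequence (SEQ ID NO: 1545).
# Assumption: RuvCII uses 729-775; the text also recites 728-774.
DOMAINS: Dict[str, Tuple[int, int]] = {
    "RuvCI": (1, 60),
    "RuvCII": (729, 775),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}


def segment_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract_segment(seq: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive residue range out of a sequence."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("range falls outside the sequence")
    return seq[start - 1:end]
```

Under these assumptions, the RuvCIII, PAM-interacting, and HNH ranges span 190, 268, and 134 residues respectively, matching the upper bounds of the contiguous stretches recited for each polypeptide.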
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; and b) a first fusion partner.
- the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
- the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the first fusion polypeptide comprises a nuclear localization signal (NLS).
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; b) a first fusion partner; and c) an NLS.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; and d) an NLS.
- the first fusion partner is a first member of a dimerization pair.
- the NLS comprises the amino acid sequence
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- the first fusion partner is a first member of a dimerization pair.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
- a first fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a RuvCIII polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) an HNH polypeptide and a fusion partner.
- Suitable linker polypeptides are as described above.
- a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide.
- the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide; and b) a first fusion partner.
- the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described elsewhere herein.
- a C-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 75 amino acids to 84 amino acids of amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a C-terminal RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An N-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 80 amino acids to 106 amino acids of amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
- a RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- a first fusion polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph.
- the fusion partner is linked, directly or via a linker, to the N-terminus of the polypeptide.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a fusion partner; and b) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph.
- Suitable fusion partners include a first member of a dimerization pair, where suitable first members of a dimerization pair are described elsewhere herein.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; and c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph; and d) a fusion partner.
- a first fusion polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph.
- the fusion partner is linked, directly or via a linker, to the N-terminus of the polypeptide.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a fusion partner; and b) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph.
- Suitable fusion partners include a first member of a dimerization pair, where suitable first members of a dimerization pair are described elsewhere herein.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; and c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph.
- a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph; and d) a fusion partner.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region (e.g., an alpha helical lobe); and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
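
The circular permutation of the nuclease-lobe polypeptides can be sketched as a rotation of the domain order. The list below takes the order recited in the preceding paragraph as the starting arrangement; the helper name `circular_permutation` is hypothetical, and the sketch models only domain order, not sequences or linkers.

```python
from typing import List

# Domain order of the first polypeptide as recited above.
NATIVE_ORDER = ["RuvCI", "RuvCII", "HNH", "RuvCIII", "PAM-interacting"]


def circular_permutation(order: List[str], first: str) -> List[str]:
    """Rotate the domain order so that `first` becomes the N-terminal domain.

    Domains that preceded `first` are moved, in the same relative order,
    to the C-terminal end.
    """
    i = order.index(first)  # raises ValueError if `first` is not present
    return order[i:] + order[:i]
```

Rotating to start at HNH gives HNH, RuvCIII, PAM-interacting, RuvCI, RuvCII, and rotating to start at RuvCIII gives RuvCIII, PAM-interacting, RuvCI, RuvCII, HNH; both arrangements appear among the first-polypeptide orders recited earlier in this description.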
- the fusion partner is at or near (e.g., within the first 50 amino acids of the N-terminus) the N-terminus of the second polypeptide. In some cases, the fusion partner is at or near (e.g., within the first 50 amino acids of the C-terminus) the C-terminus of the second polypeptide. In some cases, the fusion partner is located internally within the second fusion polypeptide.
- the second polypeptide comprises an α-helical lobe (also referred to as "an
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 61 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 61-718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 650 amino acids to 660 amino acids (e.g., 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, or 660 amino acids).
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 624 amino acids of amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545.
- the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 620 amino acids to 630 amino acids (e.g., 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, or 630 amino acids).
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second fusion partner; and b) a second polypeptide that comprises an alpha-helical recognition region.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner.
- the second fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
- the second fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs.
- an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
- the second fusion polypeptide comprises an NLS.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; and c) a second polypeptide that comprises an alpha-helical recognition region.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; c) a second polypeptide that comprises an alpha-helical recognition region; and d) an NLS.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; and c) a second fusion partner.
- the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; c) a second fusion partner; and d) an NLS.
- the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546).
- the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
- an NLS can be at or near the N-terminus and/or the C-terminus.
- the second fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs).
- the second fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus.
- the term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
- the second fusion polypeptide comprises one or more linker polypeptides.
- a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an alpha-helical lobe; and c) an alpha-helical lobe and an NLS.
- the first fusion partner of the first fusion polypeptide, and the second fusion partner of the second fusion polypeptide, of a Cas9 heterodimer of the present disclosure constitute a "dimer pair."
- a dimer pair is a pair of polypeptides that can dimerize with one another.
- Each member (each polypeptide) of the dimer pair can be part of a different polypeptide, and when the members of the binding pair (the dimer pair) are brought into close proximity with one another (e.g., bind to one another), the two different polypeptides (heterologous polypeptides) to which the dimer pair members are fused are brought into proximity with one another and can be said to dimerize (i.e., as a consequence of the members of the dimer pair dimerizing).
- a Cas9 heterodimer of the present disclosure comprises two polypeptides that can interact to form a complex (i.e., to form the heterodimeric Cas9 protein).
- a Cas9 heterodimer of the present disclosure is also referred to herein as a "split Cas9" or a "split Cas9 protein."
- the fusion partners present in the first fusion polypeptide and the second fusion polypeptide can be induced to dimerize by a dimerizing agent. When the fusion partners present in the first fusion polypeptide and the second fusion polypeptide dimerize, the first fusion polypeptide and the second fusion polypeptide dimerize.
- the first fusion polypeptide and the second fusion polypeptide do not dimerize.
- the Cas9 heterodimer together with a truncated guide RNA (e.g., a guide RNA that does not include stem loop 2 and/or stem loop 3), can bind a target nucleic acid.
- a Cas9 heterodimer of the present disclosure and a truncated guide RNA form a "Cas9 heterodimer system," described hereinbelow.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide (comprising a Cas9 nuclease lobe) and a first fusion partner ("a first member of a dimer pair"); and B) a second fusion polypeptide (comprising a Cas9 alpha-helical lobe) and a second fusion partner ("a second member of the dimer pair").
- the first and second fusion polypeptides dimerize when the first and second binding members dimerize (when the first and second binding members are brought into close proximity with one another, e.g., via a dimerizer, via binding to one another, etc.).
- the dimer pair is inducible such that the members of the dimer pair do not associate (e.g., come into proximity with one another, bind to one another, etc.) in the absence of induction (e.g., chemical induction, light induction, etc.). In some cases, the dimer pair is not inducible such that the members of the dimer pair bind to one another when both members are present (e.g., synzip polypeptides).
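The distinction between inducible and non-inducible dimer pairs can be sketched as a small data model. This is only an illustrative sketch: the class name `DimerPair`, its fields, and the `associates` method are hypothetical names chosen here, not terms from the disclosure, and the logic encodes nothing beyond the two statements above (inducible pairs need induction; non-inducible pairs bind whenever both members are present).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimerPair:
    """Hypothetical model of a dimer pair (names are illustrative only)."""
    first_member: str
    second_member: str
    inducible: bool

    def associates(self, both_present: bool, induced: bool = False) -> bool:
        """Return True if the two members are expected to associate.

        - If either member is absent, no association is possible.
        - An inducible pair associates only when induction (e.g., a chemical
          dimerizer or light) is applied.
        - A non-inducible pair associates whenever both members are present.
        """
        if not both_present:
            return False
        return induced if self.inducible else True


# Example pairs named in the text: an inducible FKBP1A/FRB pair and a
# non-inducible synzip pair.
fkbp_frb = DimerPair("FKBP1A", "FRB", inducible=True)
synzip = DimerPair("SYNZIP17", "SYNZIP18", inducible=False)
```

Modeling induction as a boolean keeps the sketch minimal; it says nothing about dose, kinetics, or reversibility.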
- dimer pairs suitable for use in a subject heterodimeric Cas9 protein can be any convenient dimer pair.
- Example dimer pairs suitable for use in a subject heterodimeric Cas9 protein include non-inducible binding pairs.
- each member of the binding pair is a protein domain that binds to the other member.
- each member of the binding pair is a coiled-coil domain.
- suitable coiled-coil domains include, but are not limited to:
- each of the two members of a non-inducible binding pair comprise an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to a coiled coil domain.
- a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP14 (the amino acid sequence set forth in SEQ ID NO: 1556).
- a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557).
- a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP18 (the amino acid sequence set forth in SEQ ID NO: 1558).
- one member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557); and the other member of the non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP18 (the amino acid sequence set forth in SEQ ID NO: 1558).
- the two members of a non-inducible binding pair are SYNZIP17 and SYNZIP18.
- one member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP14 (the amino acid sequence set forth in SEQ ID NO: 1556); and the other member of the non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557).
- the two members of a non-inducible binding pair are SYNZIP14 and SYNZIP17.
- Example dimer pairs suitable for use in a subject Cas9 heterodimer also include inducible
- dimerizer-binding pairs suitable for use in a Cas9 heterodimer of the present disclosure are in some embodiments polypeptides (e.g. protein domains) that bind to a different site of the same molecule (referred to herein as a "dimerizer").
- in the presence of a dimerizer, both members of a dimerizer-binding pair bind to the dimerizer (e.g., in some cases each binding to a different site of the dimerizer) and are thus brought into proximity with one another.
- binding to the dimerizer is reversible. In some embodiments, binding to the dimerizer is irreversible. In some embodiments, binding to the dimerizer is non-covalent. In some embodiments, binding to the dimerizer is covalent.
- Dimer pairs suitable for use include dimerizer-binding pairs that dimerize upon binding of a first member of a dimer pair to a dimerizing agent and of a second member of the dimer pair to the same dimerizing agent. Dimer pairs suitable for use also include dimerizer-binding pairs that dimerize upon binding of a first member of a dimer pair to a dimerizing agent, where the dimerizing agent induces a conformational change in the first member of the dimer pair, and where the conformational change allows the first member of the dimer pair to bind (covalently or non-covalently) to a second member of the dimer pair.
- Other dimer pairs suitable for use include dimer pairs in which exposure to light (e.g., blue light) induces dimerization of the dimer pair.
- an inducible dimer pair will dimerize upon exposure to an agent that induces dimerization, where the agent is in some cases a small molecule, or, for example, in other cases, light.
- the discussion below referring to "dimerizer-binding pairs" includes dimer pairs that dimerize regardless of the mechanism.
- dimers include, but are not limited to:
- FKBP1A (FK506 binding protein) (e.g., a rapamycin binding portion) paired with FKBP1A (e.g., a rapamycin binding portion): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
- FKBP1A (e.g., a rapamycin binding portion) and FRB (Fkbp-Rapamycin Binding): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
- FKBP1A (e.g., a rapamycin binding portion) and CnA (calcineurin catalytic subunit): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
- FKBP1A (e.g., a rapamycin binding portion) and cyclophilin: dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
- GyrB (Gyrase B) and GyrB: dimerization induced by coumermycin;
- DHFR (dihydrofolate reductase) and DHFR: dimerization induced by methotrexate;
- DmrB and DmrB: dimerization induced by AP20187;
- a member (a first and/or a second member) of a binding pair (e.g., a dimerizer-binding pair) of a subject Cas9 heterodimer can have a length in a range of from 35 to 300 amino acids
- a member of a dimer e.g., a dimerizer-binding pair of a subject Cas9
- FKBP1A also known as FKBP12, FKBP1 ; PKC12; PKCI2;
- a suitable dimerizer-binding pair member can include a rapamycin binding portion of FKBP1A.
- a suitable dimerizer-binding pair member can comprise an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (a rapamycin binding portion of FKBP1A):
- a member of a dimerizer-binding pair of a Cas9 heterodimer is derived from protein phosphatase 3, catalytic subunit, alpha isozyme (PPP3CA) (also known as
- a suitable dimerizer-binding pair member can include a binding portion of PPP3CA.
- a suitable dimerizer-binding pair member can comprise an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (PP2Ac domain):
- a member of a dimer is derived from cyclophilin (also known as cyclophilin A, PPIA, CYPA, CYPH, PPIase A, etc.).
- a suitable dimerizer-binding pair member can include a binding portion of cyclophilin.
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
- a member of a dimer is derived from MTOR (also known as FKBP-rapamycin associated protein; FK506 binding protein 12-rapamycin associated protein 1; FK506 binding protein 12-rapamycin associated protein 2; FK506-binding protein 12-rapamycin complex-associated protein 1; FRAP; FRAP1; FRAP2; RAFT1; and RAPT1).
- a suitable dimerizer-binding pair member can include the Fkbp-Rapamycin Binding Domain (also known as FRB).
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (FRB):
- VAILWHEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFNQAYGRDLMEAQEWCRKYMKSGNVKDLTQAWDLYYHVFRRIS (SEQ ID NO: 1562).
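The recurring "75% or more amino acid sequence identity" criterion can be illustrated with a short calculation. The following is a minimal sketch under a strong simplifying assumption: the two sequences are already aligned and of equal length, so identity is simply the fraction of matching positions. The disclosure excerpt here does not specify an alignment method, and the function names `percent_identity` and `meets_identity_threshold` are hypothetical. The reference string is the FRB sequence given above.

```python
# FRB sequence as listed above (SEQ ID NO: 1562).
FRB = (
    "VAILWHEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFNQAYG"
    "RDLMEAQEWCRKYMKSGNVKDLTQAWDLYYHVFRRIS"
)


def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions with identical residues.

    Assumes (for illustration only) that both sequences are pre-aligned,
    gap-free, and of equal length; real comparisons would first run a
    pairwise alignment.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 75.0) -> bool:
    """True if the candidate has `threshold` percent identity or more."""
    return percent_identity(candidate, reference) >= threshold
```

The default threshold of 75.0 corresponds to the lowest bound recited above; the higher recited bounds (80%, 85%, 90%, 95%, 98%, 100%) can be passed as `threshold`.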
- a member of a dimer is derived from GyrB (also known as DNA gyrase subunit B).
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 200 amino acids (aa), from about 200 aa to about 300 aa, from about 300 aa to about 400 aa, from about 400 aa to about 500 aa, from about 500 aa to about 600 aa, from about 600 aa to about 700 aa, or from about 700 aa to about 800 aa, of the following GyrB amino acid sequence from Escherichia coli (or to the DNA gyrase subunit B sequence from any organism):
- a member of a dimerizer-binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to amino acids 1-220 of the above-listed GyrB amino acid sequence from Escherichia coli.
- a member of a dimer is derived from DHFR (also known as dihydrofolate reductase, DHFRP1, and DYR).
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
- a member of a dimer is derived from the DmrB binding domain (i.e., DmrB homodimerization domain).
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
- MASRGVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKVDSSRDRNKPFKFMLGKQEVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE (SEQ ID NO: 1565).
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from a PYL protein (also known as abscisic acid receptor and as RCAR).
- a member of a subject dimerizer-binding pair can be derived from proteins such as those of Arabidopsis thaliana: PYR1, RCAR1(PYL9), PYL1, PYL2, PYL3, PYL4, PYL5, PYL6, PYL7, PYL8 (RCAR3), PYL10, PYL11, PYL12, PYL13.
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequences:
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from an ABI protein (also known as Abscisic Acid-Insensitive).
- a member of a subject dimerizer-binding pair can be derived from proteins such as those of Arabidopsis thaliana: ABI1 (also known as ABSCISIC ACID-INSENSITIVE 1, Protein phosphatase 2C 56, AtPP2C56, P2C56, and PP2C ABI1) and/or ABI2 (also known as P2C77, Protein phosphatase 2C 77, AtPP2C77, ABSCISIC ACID-INSENSITIVE 2, Protein phosphatase 2C ABI2, and PP2C ABI2).
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of any of the following amino acid sequences:
- ABI1:
- ABI2:
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from a Cry2 protein (also known as cryptochrome 2).
- a member of a subject dimer (e.g., a dimerizer-binding pair) can be derived from Cry2 proteins from any organism (e.g., a plant) such as, but not limited to, those of Arabidopsis thaliana.
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of any of the following amino acid sequences:
- Cry2 (Arabidopsis thaliana)
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from the
- a suitable dimer (e.g., a dimerizer-binding pair) member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from the
- a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa
- a member of a dimer (e.g., a dimerizer-binding pair) is derived from a
- a suitable dimer member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of any of the following amino acid sequences:
- GID1A:
- GID1B:
- GID1C:
- Dimerizers ("dimerizing agents") that can provide for dimerization of a first member of a dimerizer-binding pair and a second member of a dimerizer-binding pair include, e.g. (where the dimerizer is in parentheses following the dimerizer-binding pair):
- FKBP1A and FKBP1A (rapamycin and/or a rapamycin analog, rapalog);
- FKBP1A and FRB (rapamycin and/or a rapamycin analog, rapalog);
- FKBP1A and PPP3CA (rapamycin and/or a rapamycin analog, rapalog);
- FKBP1A and cyclophilin (rapamycin and/or a rapamycin analog, rapalog);
- GyrB and GyrB (coumermycin);
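The pairing of dimerizer-binding pairs with their dimerizers listed above can be represented as a simple lookup table. This is a hedged sketch: the names `DIMERIZERS`, `_pair_key`, and `dimerizer_for` are hypothetical, and the table contains only the five pair/dimerizer entries from the list immediately above. Keys are order-independent so that either member can be named first.

```python
from typing import Dict, Optional, Tuple


def _pair_key(member_a: str, member_b: str) -> Tuple[str, ...]:
    """Order-independent key for a dimerizer-binding pair."""
    return tuple(sorted((member_a, member_b)))


RAPAMYCIN_OR_RAPALOG = "rapamycin and/or a rapamycin analog (rapalog)"

# Entries taken from the list above; nothing else is assumed.
DIMERIZERS: Dict[Tuple[str, ...], str] = {
    _pair_key("FKBP1A", "FKBP1A"): RAPAMYCIN_OR_RAPALOG,
    _pair_key("FKBP1A", "FRB"): RAPAMYCIN_OR_RAPALOG,
    _pair_key("FKBP1A", "PPP3CA"): RAPAMYCIN_OR_RAPALOG,
    _pair_key("FKBP1A", "cyclophilin"): RAPAMYCIN_OR_RAPALOG,
    _pair_key("GyrB", "GyrB"): "coumermycin",
}


def dimerizer_for(member_a: str, member_b: str) -> Optional[str]:
    """Return the listed dimerizer for a pair, or None if the pair is not listed."""
    return DIMERIZERS.get(_pair_key(member_a, member_b))
```

Returning `None` for an unlisted pair avoids implying that any combination outside the list is known to dimerize.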
- rapamycin can serve as a dimerizer.
- a rapamycin derivative or analog can be used. See, e.g., WO 96/41865; WO 99/36553; WO 01/14387; and Ye et al (1999) Science 283:88-91.
- analogs, homologs, derivatives and other compounds related structurally to rapamycin include, among others, variants of rapamycin having one or more of the following modifications relative to rapamycin: demethylation, elimination or replacement of the methoxy at C7, C42 and/or C29; elimination, derivatization or replacement of the hydroxy at C13, C43 and/or C28; reduction, elimination or derivatization of the ketone at C14, C24 and/or C30; replacement of the 6-membered pipecolate ring with a 5-membered prolyl ring; and alternative substitution on the cyclohexyl ring or replacement of the cyclohexyl ring with a substituted cyclopentyl ring.
- Rapamycin has the structure:
- Suitable rapalogs include, e.g.,
- Also suitable as a rapalog is a compound of the formula:
- n is 1 or 2; R and R are independently H, or a substituted or unsubstituted aliphatic or acyl moiety; one of R 7a and R 7b is H and the other is halo, R A , OR A , SR A , -OC(O)R A , -OC(O)NR A R B , -NR A R B , -NR B C(OR)R A , NR B C(O)OR A , -NR B SO 2 R A , or NR B SO 2 NR A R B' ; or R 7a and R 7b , taken together, are H in the tetraene moiety:
- R A is H or a substituted or unsubstituted aliphatic, heteroaliphatic, aryl, or heteroaryl moiety and where R B and R B are independently H, OH, or a substituted or unsubstituted aliphatic, heteroaliphatic, aryl, or heteroaryl moiety.
- coumermycin can serve as a dimerizing agent.
- a coumermycin analog can be used. See, e.g., Farrar et al. (1996) Nature 383: 178-181 ; and U.S. Pat. No.
- the dimerizing agent is methotrexate, e.g., a non-cytotoxic, homo-bifunctional methotrexate dimer. See, e.g., U.S. Patent No. 8,236,925.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FRB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FRB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PPP3CA polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PPP3CA polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a cyclophilin polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a cyclophilin polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GyrB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GyrB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DHFR polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DHFR polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DmrB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DmrB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PYL polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an ABI polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an ABI polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PYL polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a Cry2 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a CIB1 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a CIB1 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a Cry2 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GAI polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GID1 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GID1 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GAI polypeptide.
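The specific fusion-partner assignments recited above follow a regular pattern: each dimerization pair appears once if both members are the same polypeptide, and in both orientations otherwise. A minimal sketch of that pattern is shown below. The class `Cas9Heterodimer`, the constant names, and `enumerate_heterodimers` are hypothetical names used only for illustration; the component and partner names are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Components of the first polypeptide, as recited above.
FIRST_POLYPEPTIDE_PARTS: Tuple[str, ...] = (
    "RuvCI", "RuvCII", "HNH", "RuvCIII", "PAM-interacting",
)
# Component of the second fusion polypeptide.
SECOND_POLYPEPTIDE_PART = "alpha-helical recognition region"

# Dimerization pairs named in the specific embodiments above.
PARTNER_PAIRS: List[Tuple[str, str]] = [
    ("FKBP1A", "FKBP1A"), ("FKBP1A", "FRB"), ("FKBP1A", "PPP3CA"),
    ("FKBP1A", "cyclophilin"), ("GyrB", "GyrB"), ("DHFR", "DHFR"),
    ("DmrB", "DmrB"), ("PYL", "ABI"), ("Cry2", "CIB1"), ("GAI", "GID1"),
]


@dataclass(frozen=True)
class Cas9Heterodimer:
    """Hypothetical record of one fusion-partner assignment."""
    first_parts: Tuple[str, ...]
    first_partner: str
    second_part: str
    second_partner: str


def enumerate_heterodimers() -> List[Cas9Heterodimer]:
    """List each pair once if homotypic, otherwise in both orientations."""
    result = []
    for a, b in PARTNER_PAIRS:
        orientations = [(a, b)] if a == b else [(a, b), (b, a)]
        for first, second in orientations:
            result.append(Cas9Heterodimer(
                FIRST_POLYPEPTIDE_PARTS, first,
                SECOND_POLYPEPTIDE_PART, second,
            ))
    return result
```

Calling `enumerate_heterodimers()` yields one record per specific partner assignment; the generic "first member / second member of a dimerization pair" embodiment is not included because it names no particular partners.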
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FRB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FRB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PPP3CA polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PPP3CA polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a cyclophilin polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a cyclophilin polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GyrB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GyrB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DHFR polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DHFR polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DmrB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DmrB polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PYL polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an ABI polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an ABI polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PYL polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a Cry2 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a CIB1 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a CIB1 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a Cry2 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GAI polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GID1 polypeptide.
- a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GID1 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GAI polypeptide.
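The fusion-partner pairings enumerated in the preceding embodiments can be collected into a small lookup table. The following is only an illustrative sketch: the names `DIMERIZER_PAIRS`, `is_homotypic`, and `second_partners_for` are hypothetical and do not appear in the disclosure, and the table covers only the pairings listed in this passage.

```python
# (first fusion partner, second fusion partner) pairs as listed in the
# embodiments above. The first partner is fused to the circular permuted
# nuclease-lobe polypeptide, the second to the alpha-helical recognition region.
DIMERIZER_PAIRS = [
    ("FRB", "FKBP1A"),
    ("FKBP1A", "PPP3CA"),
    ("PPP3CA", "FKBP1A"),
    ("FKBP1A", "cyclophilin"),
    ("cyclophilin", "FKBP1A"),
    ("GyrB", "GyrB"),
    ("DHFR", "DHFR"),
    ("DmrB", "DmrB"),
    ("PYL", "ABI"),
    ("ABI", "PYL"),
    ("Cry2", "CIB1"),
    ("CIB1", "Cry2"),
    ("GAI", "GID1"),
    ("GID1", "GAI"),
]


def is_homotypic(pair):
    """Return True when both fusion partners are the same polypeptide."""
    first, second = pair
    return first == second


def second_partners_for(first_partner):
    """Return the sorted second partners listed for a given first partner."""
    return sorted({second for first, second in DIMERIZER_PAIRS
                   if first == first_partner})


if __name__ == "__main__":
    homotypic = [pair[0] for pair in DIMERIZER_PAIRS if is_homotypic(pair)]
    print("Homotypic pairings:", homotypic)
    print("Partners listed for FKBP1A:", second_partners_for("FKBP1A"))
```

Most of the listed pairings appear in both orientations, so either lobe can carry either member of a pair; the GyrB, DHFR, and DmrB pairings use the same partner on both lobes.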
- the components of a Cas9 heterodimer can be derived from any convenient Cas9 protein.
- Example Cas9 proteins include, but are not limited to, the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 (depicted in Figure 5 and used as a reference Cas9 sequence in the examples below), as well as the Cas9 amino acid sequences set forth in any of SEQ ID NOs: 1-259 and 795-1346.
- a subject Cas9 guide RNA and a subject Cas9 heterodimer form a complex.
- the guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence (the target site) of a target nucleic acid.
- the Cas9 heterodimer of the complex provides the site-specific activity.
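The complementarity relationship between the guide RNA and the target site can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not a guide design tool: it requires perfect complementarity, ignores any PAM requirement and mismatch tolerance, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def find_target_sites(guide_rna, target_strand):
    """Return 0-based start positions, on a 5'->3' DNA strand, of sites that
    are perfectly complementary to the guide RNA targeting sequence."""
    # Express the RNA guide in the DNA alphabet; the sequence it base-pairs
    # with is its reverse complement.
    guide_dna = guide_rna.upper().replace("U", "T")
    site = reverse_complement(guide_dna)
    positions = []
    start = target_strand.find(site)
    while start != -1:
        positions.append(start)
        start = target_strand.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(find_target_sites("GACUUA", "CCTAAGTCGG"))
```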
- the Cas9 heterodimer is guided to a target site within a target nucleic acid sequence (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g.
- a subject Cas9 heterodimer can bind and/or modify (e.g., cleave, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with target nucleic acid (e.g., methylation or acetylation of a histone tail).
- a subject Cas9 heterodimer can exhibit site-directed enzymatic activity (e.g., nuclease activity, activity for DNA and/or RNA methylation, activity for DNA and/or RNA cleavage, activity for histone acetylation, activity for histone methylation, activity for RNA modification, activity for RNA-binding, activity for RNA splicing, etc.).
- the Cas9 heterodimer exhibits reduced nuclease activity relative to the corresponding portion of a wild type Cas9 polypeptide.
- the Cas9 heterodimer is enzymatically inactive (sometimes referred to
- Assays to determine whether a subject Cas9 heterodimer interacts with a subject guide RNA and/or a target nucleic acid can be any convenient binding assay that tests for binding between a protein and a nucleic acid.
- Assays to determine whether a subject Cas9 heterodimer has an activity can be any convenient assay (e.g., a nucleic acid cleavage assay) that tests for the desired activity (e.g., cleavage, methylation, etc.).
- a subject Cas9 heterodimer e.g., a Cas9 heterodimer having a fusion partner that provides an activity
- has enzymatic activity that modifies target nucleic acid e.g., nuclease activity (which can be provided by the Cas9 sequences, but can alternatively be provided by exogenous amino acid sequences), methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity) and/or a polypeptide (e.g., a histone) associated with target nucleic acid (e.g., methyltransferase activity, de
- a suitable Cas9 polypeptide comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more or 100% amino acid sequence identity to motifs 1-4 of a Cas9 amino acid sequence (e.g., SEQ ID NO: 1545).
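The per-motif identity criterion can be sketched as a simple check. The sketch below assumes the candidate and reference motifs have already been aligned to equal length without gaps, which a real comparison would obtain from a sequence alignment tool; the function names and the default 60% threshold (the lowest value recited above) are illustrative choices, and the sequences in the usage example are placeholders rather than actual Cas9 motifs.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two pre-aligned, equal-length
    amino acid sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def motifs_meet_threshold(candidate_motifs, reference_motifs, threshold=60.0):
    """Return True when each of the 4 candidate motifs has at least
    `threshold` percent identity to the corresponding reference motif."""
    if len(candidate_motifs) != 4 or len(reference_motifs) != 4:
        raise ValueError("exactly 4 motifs are expected on each side")
    return all(percent_identity(c, r) >= threshold
               for c, r in zip(candidate_motifs, reference_motifs))


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    reference = ["ACDEF", "GHIKL", "MNPQR", "STVWY"]
    candidate = ["ACDEG", "GHIKL", "MNPQR", "STVWY"]
    print(percent_identity(candidate[0], reference[0]))
    print(motifs_meet_threshold(candidate, reference, threshold=75.0))
```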
- compositions and methods that include a variant Cas9
- a variant Cas9 heterodimer has an amino acid sequence that is different by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of a wild type Cas9 polypeptide.
- the variant Cas9 heterodimer has an amino acid change (e.g., deletion, insertion, or substitution) that reduces the nuclease activity of the Cas9 heterodimer.
- the variant Cas9 heterodimer has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding unmutated Cas9 heterodimer.
- the variant Cas9 heterodimer has substantially no nuclease activity.
- when a subject Cas9 heterodimer is a variant Cas9 heterodimer that has no substantial nuclease activity, it can be referred to as a "dead Cas9 heterodimer" or "dCas9 heterodimer."
- a subject Cas9 heterodimer can have reduced cleavage activity (e.g., can have nickase activity).
- a Cas9 heterodimer can have a functional HNH domain, but a defective RuvC domain.
- a Cas9 heterodimer can have a functional RuvC domain, but a defective HNH domain.
- a Cas9 heterodimer can substantially lack nuclease activity (e.g., can have defective HNH and RuvC domains).
- Any convenient amino acid mutation can be used to elicit the desired variant Cas9. For example, many amino acid mutations are known in the art that result in a Cas9 having a defective HNH domain and/or a defective RuvC domain.
- a variant Cas9 heterodimer has reduced nuclease activity.
- a variant Cas9 heterodimer suitable for use in a binding method of the present disclosure can exhibit less than about 20%, less than about 15%, less than about 10%, less than about 5%, less than about 1%, or less than about 0.1%, of the endonuclease activity of a corresponding unmutated Cas9 heterodimer, e.g., a Cas9 heterodimer comprising a wild type Cas9 amino acid sequence.
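The percentage thresholds above are relative to a corresponding unmutated Cas9 heterodimer. As a rough sketch, assuming activity has been measured for both proteins in the same assay and the same arbitrary units, the comparison could be expressed as follows; the function names are hypothetical, and the disclosure does not prescribe a particular assay or calculation.

```python
# Thresholds recited for variants suitable for use in a binding method (percent).
BINDING_METHOD_THRESHOLDS = (20, 15, 10, 5, 1, 0.1)


def relative_activity_percent(variant_activity, reference_activity):
    """Variant endonuclease activity as a percent of the unmutated reference."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant_activity / reference_activity


def tightest_threshold_met(percent, thresholds=BINDING_METHOD_THRESHOLDS):
    """Return the smallest recited threshold that the variant falls below,
    or None if it is not below any of them."""
    met = [t for t in thresholds if percent < t]
    return min(met) if met else None


if __name__ == "__main__":
    pct = relative_activity_percent(variant_activity=3.0, reference_activity=100.0)
    print(pct, tightest_threshold_met(pct))
```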
- a variant Cas9 heterodimer can cleave the complementary strand of a target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid.
- the variant Cas9 polypeptide can have a mutation (amino acid substitution) that reduces the function of the RuvC domain.
- a variant Cas9 heterodimer has a D10A (aspartate to alanine at amino acid position 10 of SEQ ID NO: 1545) mutation (or the corresponding mutation of any of the proteins presented in SEQ ID NOs: 1-256 and 795-1346) and can therefore cleave the complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid (thus resulting in a single strand break (SSB) instead of a double strand break (DSB) (nickase activity) when the variant Cas9 heterodimer cleaves a double stranded target nucleic acid) (see, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21).
- a variant Cas9 heterodimer can cleave the non-complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid.
- the variant Cas9 heterodimer can have a mutation (amino acid substitution) that reduces the function of the HNH domain.
- a variant Cas9 heterodimer has an H839A mutation (histidine to alanine at amino acid position 839 of SEQ ID NO: 1545; or the corresponding mutation of any of the proteins set forth as SEQ ID NOs: 1-256 and 795-1346) and can therefore cleave the non-complementary strand of the target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid (thus resulting in a SSB instead of a DSB when the variant Cas9 polypeptide cleaves a double stranded target nucleic acid) (nickase activity).
- Such a Cas9 heterodimer has a reduced ability to cleave a target nucleic acid (e.g., a single or double stranded target nucleic acid) and retains the ability to bind a target nucleic acid (e.g., a single or double stranded target nucleic acid).
- a variant Cas9 heterodimer has a reduced ability to cleave both the
- the variant Cas9 polypeptide harbors both the D10A and the H839A mutations (of SEQ ID NO: 1545, or the corresponding mutations of any of the proteins set forth as SEQ ID NOs: 1-256 and 795-1346) such that the heterodimer has a reduced ability to cleave both the complementary and the non-complementary strands of a double stranded target nucleic acid.
- Such a Cas9 heterodimer has a reduced ability to cleave a target nucleic acid (e.g., a single stranded or double stranded target nucleic acid) but retains the ability to bind a target nucleic acid (e.g., a single or double stranded target nucleic acid).
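The relationship between the D10A and H839A mutations and the strands that remain cleavable can be summarized in a small sketch. It treats "reduced ability to cleave" as a simple on/off switch, which is a simplifying assumption made only for illustration; the function and constant names are hypothetical.

```python
RUVC_MUTATION = "D10A"   # reduces cleavage of the non-complementary strand
HNH_MUTATION = "H839A"   # reduces cleavage of the complementary strand


def cleaved_strands(mutations):
    """Return the set of target strands still cleaved, given the mutations
    present (positions relative to SEQ ID NO: 1545)."""
    mutations = set(mutations)
    strands = set()
    if HNH_MUTATION not in mutations:
        strands.add("complementary")
    if RUVC_MUTATION not in mutations:
        strands.add("non-complementary")
    return strands


def expected_outcome(mutations):
    """Classify the expected outcome on a double stranded target."""
    count = len(cleaved_strands(mutations))
    if count == 2:
        return "double strand break (DSB)"
    if count == 1:
        return "single strand break (SSB; nickase activity)"
    return "binding with reduced cleavage of both strands"


if __name__ == "__main__":
    for variant in ([], ["D10A"], ["H839A"], ["D10A", "H839A"]):
        print(variant, "->", expected_outcome(variant))
```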
- a Cas9 heterodimer is a chimeric Cas9 heterodimer that is fused to a fusion partner and can bind and/or modify a target nucleic acid (e.g., cleave, methylate, demethylate, etc.) and/or modify a polypeptide associated with target nucleic acid (e.g., methylation, acetylation, etc., of, for example, a histone tail) and/or modulate transcription of a target nucleic acid.
- a Cas9 heterodimer with a fusion partner has reduced nuclease activity (e.g., as described above) and is fused to a heterologous polypeptide that provides an activity that will be exhibited by the Cas9 heterodimer (e.g., target cleavage, target methylation, transcription modulation, etc.).
- a method of binding e.g., in some cases where the Cas9 heterodimer is a variant Cas9 heterodimer having a fusion partner (i.e., having a heterologous polypeptide) with an activity (e.g., an enzymatic activity) that modifies the target nucleic acid
- the method can also be considered to be a method of modifying the target nucleic acid.
- a method of binding a target nucleic acid e.g., a single or double stranded target nucleic acid
- a method of binding a target nucleic acid can be a method of modifying the target nucleic acid.
- a heterologous sequence provides for subcellular localization, i.e., the heterologous sequence is a subcellular localization sequence (e.g., a nuclear localization signal (NLS) for targeting to the nucleus, a sequence to keep the fusion protein out of the nucleus, e.g., a nuclear export sequence (NES), a sequence to keep the fusion protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like).
- a variant Cas9 does not include a NLS so that the protein is not targeted to the nucleus (which can be advantageous, e.g., when the target nucleic acid is an RNA that is present in the cytosol).
- the heterologous sequence can provide a tag (i.e., the heterologous sequence is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
- the heterologous sequence can provide for increased or decreased
- the heterologous sequence is a stability control peptide, e.g., a degron, which in some cases is controllable (e.g., a temperature sensitive or drug controllable degron sequence).
- the heterologous sequence can provide for increased or decreased transcription from the target nucleic acid (i.e., the heterologous sequence is a transcription modulation sequence, e.g., a transcription factor/activator or a fragment thereof, a protein or fragment thereof that recruits a transcription factor/activator, a transcription repressor or a fragment thereof, a protein or fragment thereof that recruits a transcription repressor, a small molecule/drug-responsive transcription regulator, etc.).
- the heterologous sequence can provide a binding domain (i.e., the heterologous sequence is a protein binding sequence, e.g., to provide the ability of a Cas9 heterodimer to bind to another protein of interest, e.g., a DNA or histone modifying protein, a transcription factor or transcription repressor, a recruiting protein, an RNA modification enzyme, an RNA-binding protein, a translation initiation factor, an RNA splicing factor, etc.).
- a subject Cas9 heterodimer can have multiple (1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, etc.) fusion partners in any combination.
- a Cas9 heterodimer can have a heterologous sequence that provides an activity (e.g., for transcription modulation, target modification, modification of a protein associated with a target nucleic acid, etc.) and can also have a subcellular localization sequence (e.g., one or more NLSs).
- such a Cas9 heterodimer might also have a tag for ease of tracking and/or purification (e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
- a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at or near the C-terminus of the first and/or second polypeptide (nuclease lobe / alpha-helical lobe). In some cases a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at the N-terminus of the first and/or second polypeptide (nuclease lobe / alpha-helical lobe).
- a Cas9 heterodimer (either on the nuclease lobe, the alpha-helical lobe, or both) has a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) at both the N-terminus and C-terminus.
- Suitable fusion partners that provide for increased or decreased stability include, but are not limited to degron sequences.
- Degrons are readily understood by one of ordinary skill in the art to be amino acid sequences that control the stability of the protein of which they are part. For example, the stability of a protein comprising a degron sequence is controlled in part by the degron sequence.
- a suitable degron is constitutive such that the degron exerts its influence on protein stability independent of experimental control (i.e., the degron is not drug inducible, temperature inducible, etc.)
- the degron provides the Cas9 heterodimer with controllable stability such that the Cas9 heterodimer can be turned "on" (i.e., stable) or "off" (i.e., unstable, degraded) depending on the desired conditions.
- the variant Cas9 polypeptide may be functional (i.e., "on", stable) below a threshold temperature (e.g., 42°C, 41°C, 40°C, 39°C, 38°C, 37°C, 36°C, 35°C, 34°C, 33°C, 32°C, 31°C, 30°C, etc.) but non-functional (i.e., "off", degraded) above the threshold temperature.
- an exemplary drug inducible degron is derived from the FKBP12 protein.
- the stability of the degron is controlled by the presence or absence of a small molecule that binds to the degron.
- suitable degrons include, but are not limited to, those degrons controlled by Shield-1, DHFR, auxins, and/or temperature.
- suitable degrons are known in the art (e.g., Dohmen et al., Science, 1994. 263(5151): p. 1273-1276: Heat-inducible degron: a method for constructing temperature-sensitive mutants; Schoeber et al., Am J Physiol Renal Physiol. 2009 Jan;296(1):F204-11: Conditional fast expression and function of multimeric TRPV5 channels using Shield-1; Chu et al., Bioorg Med Chem Lett.
- Suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity, any of which can be directed at modifying nucleic acid directly (e.g., methylation of DNA or RNA) or at modifying a nucleic acid-associated polypeptide (e.g., a histone, a DNA binding protein, an RNA binding protein, and the like).
- fusion partners include, but are not limited to, boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
- Cas9 heterodimer include, but are not limited to those described in the PCT patent applications: WO2010075303, WO2012068627, and WO2013155555, and can be found, for example, in U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445;
- suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target nucleic acid or on a polypeptide (e.g., a histone, a DNA-binding protein, an RNA-binding protein, an RNA editing protein, etc.) associated with the target nucleic acid.
- Suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
- Additional suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription and/or translation of a target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription and/or translation regulator, a translation-regulating protein, etc.).
- Non-limiting examples of fusion partners to accomplish increased or decreased transcription include transcription activator and transcription repressor domains (e.g., the Krüppel associated box (KRAB or SKD); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), etc.).
- a Cas9 heterodimer is targeted by the guide nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a polypeptide associated with the target nucleic acid).
- the changes are transient (e.g., transcription repression or activation).
- the changes are inheritable (e.g., when epigenetic modifications are made to the target nucleic acid or to proteins associated with the target nucleic acid, e.g., nucleosomal histones).
- Non-limiting examples of fusion partners for use when targeting ssRNA target nucleic acids include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; RNA-binding proteins; and the like. It is understood that a fusion partner can include the entire protein or in some cases can include a fragment of the protein (e.g., a functional domain).
- the heterologous sequence can be fused to the C-terminus of the Cas9 heterodimer (e.g., of either lobe of the heterodimer). In some embodiments, the heterologous sequence can be fused to the N-terminus of the Cas9 heterodimer (e.g., of either lobe of the heterodimer). In some embodiments, the heterologous sequence can be fused to an internal portion (i.e., a portion other than the N- or C- terminus) of the Cas9 heterodimer (e.g., of either lobe of the heterodimer).
- the fusion partner of a Cas9 heterodimer can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: Endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); Exonucleases (for example XRN-1 or Exonuclease T); Deadenylases (for example HNT3); proteins and protein domains responsible
- the effector domain may be selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domain
- Another suitable fusion partner is a PUF RNA-binding domain, which is described in more detail in WO2012068627.
- RNA splicing factors that can be used (in whole or as fragments thereof) as fusion
- RNA binding modules have modular organization, with separate sequence-specific RNA binding modules and splicing effector domains.
- members of the Serine/Arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion.
- the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal Glycine-rich domain.
- splicing factors can regulate alternative use of splice site (ss) by binding to regulatory sequences between the two alternative sites.
- ASF/SF2 can recognize ESEs and promote the use of intron proximal sites
- hnRNP A1 can bind to ESSs and shift splicing towards the use of intron distal sites.
- One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes.
- Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5' splice sites to encode proteins of opposite functions.
- the long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals.
- the short isoform Bcl-xS is a pro-apoptotic isoform and is expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes).
- the ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5' splice sites). For more examples, see WO2010075303.
- a Cas9 heterodimer e.g., having a wild type Cas9 cleavage activity, having reduced nuclease activity, etc.
- a linker as described elsewhere.
- a heterologous polypeptide (e.g., a fusion partner) may include a "Protein Transduction
- PTD also known as a CPP - cell penetrating peptide
- polypeptide polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane.
- a PTD attached to another molecule which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle.
- a PTD is covalently linked to the amino terminus of a polypeptide (e.g., a Cas9 heterodimer).
- a PTD is covalently linked to the carboxyl terminus of a polypeptide (e.g., a Cas9 heterodimer). In some embodiments, a PTD is covalently linked to the carboxyl terminus and the amino terminus of a Cas9 heterodimer. In some cases, a subject Cas9 heterodimer includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
- a Cas9 heterodimer (e.g., a Cas9 heterodimer that includes a heterologous polypeptide sequence) includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs).
- a PTD is covalently linked to a nucleic acid (e.g., a Cas9 guide nucleic acid, a polynucleotide encoding a Cas9 guide nucleic acid, a polynucleotide encoding a Cas9 heterodimer, a donor polynucleotide, etc.).
- PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 264); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008);
- Exemplary PTDs include but are not limited to, YGRKKRRQRRR (SEQ ID NO: 264), RKKRRQRRR (SEQ ID NO: 269); an arginine homopolymer of from 3 arginine residues to 50 arginine residues;
- Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 264); RKKRRQRR (SEQ ID NO: 269); YARAAARQARA (SEQ ID NO: 271); THRLPRRRRRR (SEQ ID NO: 272); and GGRRARRRRRR (SEQ ID NO: 273).
- the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6): 371-381).
- ACPPs comprise a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells.
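The charge-neutralization idea behind an ACPP can be illustrated with a rough calculation. The sketch below is not part of the disclosure; it assumes a simplified model in which Arg and Lys each contribute +1, Asp and Glu each contribute -1, and histidine, the termini, and the cleavable linker are ignored. The function name `net_charge` is hypothetical.

```python
# Simplified formal side-chain charges near neutral pH (an assumption of this
# sketch): Arg/Lys = +1, Asp/Glu = -1; His, termini and the linker are ignored.
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Return the approximate net formal charge of a one-letter peptide string."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide.upper())


r9 = "R" * 9  # polycationic CPP ("R9")
e9 = "E" * 9  # matching polyanion ("E9")

print(net_charge(r9))       # charge of the CPP alone
print(net_charge(e9))       # charge of the polyanion alone
print(net_charge(r9 + e9))  # combined charge while the two remain linked
```

Under this model the polyanion cancels the charge of the CPP while the two are linked, and cleaving the linker leaves the cationic CPP free, which is consistent with the inhibition of uptake described above.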
- the present invention provides a composition comprising a Cas9 heterodimer of the present disclosure.
- a subject composition is useful for carrying out a method of the present disclosure.
- the composition can comprise (e.g., in addition to a Cas9 guide RNA), one or more of: a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, N-(2-Hydroxyethyl)piperazine-N'-(2-ethanesulfonic acid) (HEPES), 2-(N-Morpholino)ethanesulfonic acid (MES), MES sodium salt, 3-(N-Morpholino)propanesulfonic acid (MOPS), N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS), etc.; a solubilizing agent; a detergent, e.g.
- a Cas9 heterodimer of the present disclosure present in a subject composition is pure, e.g., at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure, where "% purity" means that the Cas9 heterodimer is the recited percent free from other proteins, other macromolecules, or contaminants that may be present during the production of the Cas9 heterodimer.
- the present disclosure provides nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer of the present disclosure.
- a single nucleic acid comprises nucleotide sequences encoding the first fusion polypeptide and the second fusion polypeptide of a Cas9 heterodimer of the present disclosure.
- the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to the same transcriptional control element (e.g., a promoter).
- the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to two different transcriptional control elements (e.g., two different promoters).
- the promoters are constitutive.
- the promoters are inducible. In some cases, the promoters are functional in eukaryotic cells. In some cases, the promoters are cell type-specific promoters. In some cases, the promoters are tissue-specific promoters.
- nucleotide sequences encoding the first fusion polypeptide and the second fusion polypeptide of a Cas9 heterodimer of the present disclosure are on separate nucleic acids.
- the present disclosure provides: a) a first nucleic acid comprising a nucleotide sequence encoding the first fusion polypeptide of a Cas9 heterodimer of the present disclosure; and b) a second nucleic acid comprising a nucleotide sequence encoding the second fusion polypeptide of a Cas9 heterodimer of the present disclosure.
- the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to the same transcriptional control element (e.g., a promoter). In some cases, the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to two different transcriptional control elements (e.g., two different promoters). In some cases, the promoters are constitutive. In some cases, the promoters are inducible. In some cases, the promoters are functional in eukaryotic cells. In some cases, the promoters are cell type-specific promoters. In some cases, the promoters are tissue-specific promoters.
- nucleic acid(s) comprising a nucleotide sequence(s) encoding a Cas9 heterodimer of the present disclosure is a recombinant expression vector.
- the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Patent No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc.
- Suitable expression vectors include, but are not limited to, viral vectors (e.g. viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., H Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest
- a retroviral vector e.g., Murine Leuk
- any of a number of suitable transcription and translation control elements including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).
- a control element e.g., a transcriptional control element, such as a promoter.
- the transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell, a yeast cell, a plant cell, an amphibian cell, etc.; or a prokaryotic cell (e.g., bacterial or archaeal cell).
- a nucleotide sequence encoding a Cas9 heterodimer of the present disclosure is operably linked to multiple control elements that allow expression of the Cas9 heterodimer in both prokaryotic and eukaryotic cells.
- Non-limiting examples of suitable eukaryotic promoters include those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
- the expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator.
- the expression vector may also include appropriate sequences for amplifying expression.
- the expression vector may also include nucleotide sequences encoding protein tags (e.g., 6xHis tag, hemagglutinin tag, green fluorescent protein, yellow fluorescent protein, red fluorescent protein, etc.) that are fused to the Cas9 heterodimer.
- a nucleotide sequence encoding the first fusion polypeptide and/or the second fusion polypeptide of a Cas9 heterodimer of the present disclosure is operably linked to an inducible promoter.
- a nucleotide sequence encoding a Cas9 heterodimer of the present disclosure is operably linked to a constitutive promoter.
- Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
- a "promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence.
- a promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
- within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase.
- Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT” boxes.
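The positional definition above (a promoter bounded at its 3' end by the transcription initiation site and extending upstream) can be sketched as a simple slice. This is a hedged illustration only: the function names, the fixed upstream length, and the use of the simplified consensus string "TATAAA" as a stand-in for a TATA box are assumptions of the sketch, not definitions taken from the disclosure.

```python
def promoter_region(sequence: str, tss_index: int, upstream_length: int) -> str:
    """Return the bases immediately upstream (5') of the transcription initiation site.

    `sequence` is read 5'->3'. `tss_index` is the 0-based index of the first
    transcribed base, so the returned region ends at the base just before it.
    """
    if not 0 <= tss_index <= len(sequence):
        raise ValueError("tss_index must lie within the sequence")
    if upstream_length < 0:
        raise ValueError("upstream_length must be non-negative")
    start = max(0, tss_index - upstream_length)  # clamp at the 5' end
    return sequence[start:tss_index]


def has_tata_like_motif(region: str) -> bool:
    """Crude check for a TATA-like element (simplified consensus, an assumption)."""
    return "TATAAA" in region.upper()


example = "GGGCTATAAAGGCCATGC"  # made-up sequence for illustration
region = promoter_region(example, tss_index=12, upstream_length=10)
print(region, has_tata_like_motif(region))
```

As the text notes, many eukaryotic promoters lack such boxes, so a negative result from the motif check says nothing about promoter activity.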
- Various promoters, including inducible promoters may be used to drive the various vectors of the present invention.
- a promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/"ON" state), it may be an inducible promoter (i.e., a promoter whose state, active/"ON" or inactive/"OFF", is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., a transcriptional control element, e.g., a tissue specific promoter, cell type specific promoter, etc.), or it may be a temporally restricted promoter (i.e., the promoter is in the "ON" state or "OFF" state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).
- Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III).
- Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature
- an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17))
- a human H1 promoter (H1)
- inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc.
- Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
- the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., "ON") in a subset of specific cells.
- spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc.
- any convenient spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on the organism.
- various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc.
- a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a subject site-directed modifying polypeptide in a wide variety of different tissues and cell types, depending on the organism.
- Some spatially restricted promoters are also temporally restricted such that the promoter is in the "ON" state or "OFF" state during specific stages of embryonic development or during specific stages of a biological process (e.g., hair follicle cycle in mice).
- examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, etc.
- Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med.
- Adipocyte-specific spatially restricted promoters include, but are not limited to, an aP2 gene promoter/enhancer, e.g., a region from -5.4 kb to +21 bp of a human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); a glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); a fatty acid translocase
- an adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408)
- an adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490)
- a resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); and the like.
- Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.
- Smooth muscle-specific spatially restricted promoters include, but are not limited to, an SM22α promoter (see, e.g., Akyurek et al. (2000) Mol. Med. 6:983; and U.S. Patent No. 7,169,874); a smoothelin promoter (see, e.g., WO 2001/018048); an α-smooth muscle actin promoter; and the like.
- a 0.4 kb region of the SM22α promoter, within which lie two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see,
- Photoreceptor-specific spatially restricted promoters include, but are not limited to, a rhodopsin promoter; a rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); a beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); a retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); an interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); an IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); and the like.
- the present disclosure provides host cells comprising (e.g., genetically modified to comprise) a nucleic acid of the present disclosure.
- the present disclosure provides host cells comprising (e.g., genetically modified to comprise) a recombinant vector of the present disclosure.
- Suitable host cells include, e.g., a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell from a mammal (e.g., a cell from a rodent, a cell from a human, etc.).
- a suitable host cell can be a stem cell (e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell); a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.
- Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture.
- primary cultures include cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
- Primary cell lines can be maintained for fewer than 10 passages in vitro.
- Host cells are in many embodiments unicellular organisms, or are grown in culture.
- the cells may be harvested from an organism (e.g., an individual) by any convenient method.
- leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc.
- cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy.
- An appropriate solution may be used for dispersion or suspension of the harvested cells.
- Such solution will generally be a balanced salt solution, e.g.
- fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, e.g., from 5-25 mM.
- Convenient buffers include HEPES, phosphate buffers, lactate buffers, etc.
- the cells may be used immediately, or they may be stored, frozen, for long periods of time, being thawed and capable of being reused. In such cases, the cells can be frozen in 10% dimethyl sulfoxide (DMSO), 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing
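The example freezing solution above (10% DMSO, 50% serum, 40% buffered medium) translates into component volumes by simple proportion. The sketch below is illustrative only; it assumes the percentages are volume fractions of the final solution, and the function name `freezing_medium_volumes` is hypothetical.

```python
# Volume percentages of the example freezing solution named in the text.
# Treating them as v/v fractions of the final volume is an assumption.
FREEZING_MIX_PERCENT = {"DMSO": 10, "serum": 50, "buffered medium": 40}


def freezing_medium_volumes(total_ml: float) -> dict:
    """Return the volume (mL) of each component needed for `total_ml` of solution."""
    if total_ml <= 0:
        raise ValueError("total_ml must be positive")
    return {
        component: total_ml * percent / 100
        for component, percent in FREEZING_MIX_PERCENT.items()
    }


for component, volume in freezing_medium_volumes(10).items():
    print(f"{component}: {volume} mL")
```

Keeping the recipe in one table makes it easy to substitute "some other such solution" by editing the percentages, provided they still sum to 100.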
- a subject genetically modified host cell is in vitro.
- a subject genetically modified host cell is in vivo. In some embodiments, a subject genetically modified host cell is a prokaryotic cell or is derived from a prokaryotic cell. In some embodiments, a subject genetically modified host cell is a bacterial cell or is derived from a bacterial cell. In some embodiments, a subject genetically modified host cell is an archaeal cell or is derived from an archaeal cell. In some embodiments, a subject genetically modified host cell is a eukaryotic cell or is derived from a eukaryotic cell. In some embodiments, a subject genetically modified host cell is a plant cell or is derived from a plant cell. In some embodiments, a subject genetically modified host cell is an animal cell or is derived from an animal cell. In some embodiments, a subject genetically modified host cell is an invertebrate cell or is derived from an invertebrate cell. In some embodiments, a subject genetically modified host cell is a vertebrate cell or is derived from a vertebrate cell. In some embodiments, a subject genetically modified host cell is a mammalian cell or is derived from a mammalian cell. In some embodiments, a subject genetically modified host cell is a rodent cell or is derived from a rodent cell. In some embodiments, a subject genetically modified host cell is a human cell or is derived from a human cell.
- the present disclosure further provides progeny of a subject genetically modified cell, where the progeny can comprise the same exogenous nucleic acid or polypeptide as the subject genetically modified cell from which it was derived.
- the present disclosure further provides a composition comprising a subject genetically modified host cell.
- a genetically modified host cell has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.; and the like).
- a subject genetically modified non-human host cell (e.g., a cell that has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer, e.g., a circular permuted Cas9 heterodimer, a Cas9 heterodimer having a dimer pair, etc.) can generate a subject genetically modified non-human organism (e.g., a mouse, a fish, a frog, a fly, a worm, etc.).
- the genetically modified host cell is a pluripotent stem cell (i.e., PSC) or a germ cell (e.g., sperm, oocyte, etc.)
- an entire genetically modified organism can be derived from the genetically modified host cell.
- the genetically modified host cell is a pluripotent stem cell (e.g., ESC, iPSC, pluripotent plant stem cell, etc.) or a germ cell (e.g., sperm cell, oocyte, etc.), either in vivo or in vitro, that can give rise to a genetically modified organism.
- the genetically modified host cell is a vertebrate PSC (e.g., ESC, iPSC, etc.) and is used to generate a genetically modified organism (e.g. by injecting a PSC into a blastocyst to produce a chimeric/mosaic animal, which could then be mated to generate non-chimeric/non-mosaic genetically modified organisms; grafting in the case of plants; etc.).
- Any convenient method/protocol for producing a genetically modified organism is suitable for producing a genetically modified host cell comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., a circular permuted Cas9 heterodimer, a Cas9 heterodimer having a dimer pair; etc.).
- Methods of producing genetically modified organisms are known in the art. For example, see Cho et al., Curr Protoc Cell Biol. 2009 Mar;Chapter 19:Unit 19.11: Generation of transgenic mice; Gama et al., Brain Struct Funct. 2010 Mar;214(2-3):91-109. Epub 2009 Nov 25: Animal transgenesis: an overview; Husaini et al., GM Crops. 2011 Jun-Dec;2(3):150-62. Epub 2011 Jun 1: Approaches for gene targeting and targeted gene expression in plants.
- a genetically modified organism comprises a target cell for methods of the invention, and thus can be considered a source for target cells.
- a genetically modified cell comprising one or more exogenous nucleic acids comprising nucleotide sequences encoding the two polypeptides of a subject Cas9 heterodimer
- the cells of the genetically modified organism comprise the one or more exogenous nucleic acids comprising nucleotide sequences encoding the two polypeptides of a subject Cas9 heterodimer.
- the DNA of a cell or cells of the genetically modified organism can be targeted for modification by introducing into the cell or cells a Cas9 guide RNA (e.g., a truncated Cas9 guide RNA) (or a nucleic acid encoding the Cas9 guide RNA) and in some cases a dimerizer (e.g., and/or in some cases a donor polynucleotide).
- the introduction of a Cas9 guide RNA (or a DNA encoding the same) into a subset of cells (e.g., brain cells, intestinal cells, kidney cells, lung cells, blood cells, etc.) of the genetically modified organism can target the DNA of such cells for modification, the genomic location of which will depend on the targeting sequence of the introduced Cas9 guide RNA.
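The statement that the genomic location of the modification depends on the targeting sequence can be illustrated with a toy lookup. The sketch below is a deliberate simplification and not the method of the disclosure: it reports only exact matches of the targeting sequence (written as DNA) on either strand of a short string, and it ignores PAM requirements, mismatch tolerance, and chromatin context. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(genome: str, targeting_sequence: str) -> list:
    """Return (strand, start) for each exact match of the targeting sequence.

    `start` is a 0-based index on the given (top) strand. RNA input is
    accepted by converting U to T. Matches may overlap.
    """
    genome = genome.upper()
    query = targeting_sequence.upper().replace("U", "T")
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(pattern, start + 1)
    return hits


toy_genome = "AAACGTTTGACCA"  # made-up sequence for illustration
print(find_target_sites(toy_genome, "GACC"))
print(find_target_sites(toy_genome, "AACG"))
```

Changing only the query string moves the reported site, which mirrors the point made above: the same Cas9 heterodimer is redirected by swapping the guide.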
- a genetically modified organism is a source of target cells for methods of the invention.
- a genetically modified organism comprising cells that are genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer can provide a source of genetically modified cells, for example PSCs (e.g., ESCs, iPSCs, sperm, oocytes, etc.), neurons, progenitor cells, cardiomyocytes, etc.
- a genetically modified cell is a PSC comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer.
- the PSC can be a target cell such that the DNA of the PSC can be targeted for modification by introducing into the PSC a Cas9 guide RNA (e.g., a truncated Cas9 guide RNA) (or a nucleic acid encoding the Cas9 guide RNA) and in some cases a dimerizer (e.g., light, a dimerizing agent, etc.), and optionally a donor nucleic acid (donor polynucleotide), and the genomic location of the modification will depend on the targeting sequence of the introduced Cas9 guide RNA.
- the methods described herein can be used to modify the DNA (e.g., delete and/or replace any desired genomic location) of PSCs derived from a subject genetically modified organism.
- modified PSCs can then be used to generate organisms having both (i) an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer and (ii) a DNA modification that was introduced into the PSC.
- An exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or a known promoter.
- Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.
- a subject genetically modified non-human organism can be any organism other than a human, including for example, a plant; algae; an invertebrate (e.g., a cnidarian, an echinoderm, a worm, a fly, etc.); a vertebrate (e.g., a fish (e.g., zebrafish, puffer fish, gold fish, etc.), an amphibian (e.g., salamander, frog, etc.), a reptile, a bird, a mammal, etc.); an ungulate (e.g., a goat, a pig, a sheep, a cow, etc.); a rodent (e.g., a mouse, a rat, a hamster, a guinea pig); a lagomorpha (e.g., a rabbit); etc.
- a subject nucleic acid e.g., one or more nucleic acids comprising nucleotide sequences encoding a subject Cas9 heterodimer, e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.
- the present disclosure further provides a transgenic non-human animal, which animal comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a Cas9 heterodimer (e.g., one or more nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer).
- the genome of the transgenic non-human animal comprises a subject nucleotide sequence encoding a Cas9 heterodimer.
- the transgenic non-human animal is homozygous for the genetic modification.
- the transgenic non-human animal is heterozygous for the genetic modification.
- the transgenic non-human animal is a vertebrate, for example, a fish (e.g., zebra fish, gold fish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), a mammal (e.g., an ungulate, e.g., a pig, a cow, a goat, a sheep, etc.; a lagomorph (e.g., a rabbit); a rodent (e.g., a rat, a mouse); a non-human primate; etc.), etc.
- Nucleotide sequences encoding a Cas9 heterodimer can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter.
- Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.
- a subject nucleic acid e.g., one or more nucleic acids comprising nucleotide sequences encoding a subject Cas9 heterodimer, e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.
- the present disclosure further provides a transgenic plant, which plant comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a Cas9 heterodimer (e.g., one or more nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer).
- the genome of the transgenic plant comprises a subject nucleic acid.
- the transgenic plant is homozygous for the genetic modification. In some embodiments, the transgenic plant is heterozygous for the genetic modification.
- Methods of introducing exogenous nucleic acids into plant cells are well known in the art. Such plant cells are considered “transformed,” as defined above. Suitable methods include viral infection (such as double stranded DNA viruses), transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, silicon carbide whiskers technology, Agrobacterium-mediated transformation and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo).
- Transformation methods based upon the soil bacterium Agrobacterium tumefaciens are
- the wild type form of Agrobacterium contains a Ti (tumor-inducing) plasmid that directs production of tumorigenic crown gall growth on host plants. Transfer of the tumor-inducing T-DNA region of the Ti plasmid to a plant genome requires the Ti plasmid-encoded virulence genes as well as T-DNA borders, which are a set of direct DNA repeats that delineate the region to be transferred.
- An Agrobacterium-based vector is a modified form of a Ti plasmid, in which the tumor inducing functions are replaced by the nucleic acid sequence of interest to be introduced into the plant host.
- Agrobacterium-mediated transformation generally employs cointegrate vectors or binary vector systems, in which the components of the Ti plasmid are divided between a helper vector, which resides permanently in the Agrobacterium host and carries the virulence genes, and a shuttle vector, which contains the gene of interest bounded by T-DNA sequences.
- binary vectors are well known in the art and are commercially available, for example, from Clontech (Palo Alto, Calif.).
- Methods of coculturing Agrobacterium with cultured plant cells or wounded tissue such as leaf tissue, root explants, hypocotyledons, stem pieces or tubers, for example, also are well known in the art. See, e.g., Glick and Thompson, (eds.), Methods in Plant Molecular Biology and Biotechnology, Boca Raton, Fla.: CRC Press (1993).
- Microprojectile-mediated transformation also can be used to produce a subject transgenic plant.
- a subject nucleic acid may be introduced into a plant in a manner such that the nucleic acid is able to enter a plant cell(s), e.g., via an in vivo or ex vivo protocol.
- by "in vivo," it is meant that the nucleic acid is administered to a living body of a plant, e.g., by infiltration.
- by "ex vivo," it is meant that cells or explants are modified outside of the plant, and then such cells or organs are regenerated to a plant.
- non-Ti vectors can be used to transfer the DNA into plants and cells by using free DNA delivery techniques.
- transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-9 and 4462) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced.
- An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol 102: 1077-1084; Vasil (1993)
- Exemplary methods for introduction of DNA into chloroplasts are biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection (Danieli et al Nat.
- Any vector suitable for the methods of biolistic bombardment, polyethylene glycol transformation of protoplasts and microinjection will be suitable as a targeting vector for chloroplast transformation.
- Any double stranded DNA vector may be used as a transformation vector, especially when the method of introduction does not utilize Agrobacterium.
- Plants which can be genetically modified include grains, forage crops, fruits, vegetables, oil seed crops, palms, forestry, and vines. Specific examples of plants which can be modified follow: maize, banana, peanut, field peas, sunflower, tomato, canola, tobacco, wheat, barley, oats, potato, soybeans, cotton, carnations, sorghum, lupin and rice.
- transformed plant cells, tissues, plants and products that contain the transformed plant cells.
- a feature of the subject transformed cells, and tissues and products that include the same is the presence of a subject nucleic acid integrated into the genome, and production by plant cells of a heterodimeric Cas9.
- Recombinant plant cells of the present invention are useful as populations of recombinant cells, or as a tissue, seed, whole plant, stem, fruit, leaf, root, flower, tuber, grain, animal feed, a field of plants, and the like.
- Nucleotide sequences encoding a heterodimeric Cas9 can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter.
- Suitable known promoters can be any known promoter and include constitutively active promoters, inducible promoters, spatially restricted and/or temporally restricted promoters, etc.
- the present disclosure provides a system and/or kit comprising: a) a Cas9 heterodimer of the present disclosure; and b) a Cas9 guide RNA (e.g., a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3).
- the Cas9 guide RNA does not by itself induce heterodimerization of the Cas9 heterodimer; instead, dimerization of the first fusion polypeptide and the second fusion polypeptide of the Cas9 heterodimer is induced when the first fusion partner and the second fusion partner dimerize with one another.
- the first fusion partner and the second fusion partner dimerize in the presence of a small molecule.
- the first fusion partner and the second fusion partner dimerize in the presence of light of a particular wavelength range.
- a system and/or kit of the present disclosure comprises a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3. In some cases, the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3. In some cases, a system and/or kit of the present disclosure comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide. Small molecule dimerizers (also referred to herein as "small molecule dimerizing agents") are described elsewhere herein. In some cases, a system and/or kit of the present disclosure comprises a PAMmer (described in more detail below). In some cases, a system and/or kit of the present disclosure comprises a donor polynucleotide (described in more detail below).
- Components of a subject kit can be in separate containers; or can be combined in a single
- kits described herein can further include one or more additional reagents, where such additional reagents can be selected from: a dilution buffer; a reconstitution solution; a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of a Cas9 heterodimer from DNA, and the like.
- a subject kit can further include instructions for using the components of the kit to practice the subject methods.
- the instructions for practicing the subject methods are generally recorded on a suitable recording medium.
- the instructions may be printed on a substrate, such as paper or plastic, etc.
- the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc.
- the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc.
- the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided.
- An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
- a nucleic acid molecule that binds to a Cas9 protein (and therefore also binds to a subject Cas9 heterodimer) and targets the Cas9 protein to a specific location within the target nucleic acid is referred to herein as a "guide nucleic acid” or "Cas9 guide RNA.”
- a guide nucleic acid is RNA, and in some cases, can be a hybrid nucleic acid that includes both
- guide RNA e.g., the terms “Cas9 guide RNA”, “truncated guide RNA”, “guide RNA”, and such
- guide RNAs and guide nucleic acids that include components/regions/sections other than RNA (e.g., deoxyribonucleotide regions; modified nucleotides such as base modifications, sugar modifications, nucleotide linkage modifications, and the like; etc).
- Cas9 guide RNA is herein used to refer to a guide RNA (and to modified guide RNAs having deoxyribonucleotides and/or other modifications) that interacts with a Cas9 protein (e.g., a Cas9 heterodimer) and targets the protein to a particular location (the target sequence) within a target nucleic acid.
- a subject Cas9 guide RNA comprises two segments, a first segment (referred to herein as a "targeting segment") and a second segment (referred to herein as a "protein-binding segment").
- by "segment" it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule.
- a segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule.
- the first segment (targeting segment) of a Cas9 guide RNA comprises a nucleotide sequence that is complementary to a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.).
- the protein-binding segment (or "protein-binding sequence") interacts with a Cas9 polypeptide. Site-specific binding and/or cleavage of the target nucleic acid can occur at locations determined by base-pairing complementarity between the Cas9 guide RNA and the target nucleic acid.
- the protein-binding segment of a subject Cas9 guide RNA comprises two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
- a subject Cas9 guide RNA and a subject Cas9 heterodimer form a complex (i.e., bind via non-covalent interactions).
- the Cas9 guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target nucleic acid.
- the Cas9 heterodimer of the complex provides the site-specific activity.
- the Cas9 heterodimer is guided to a target nucleic acid sequence (e.g. a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g.
- the targeting sequence (the targeting segment) of a Cas9 guide RNA can be modified so that the Cas9 guide RNA can target a Cas9 heterodimer to any desired sequence of any desired target nucleic acid, with the exception (as is known in the art) that the PAM sequence is taken into account.
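- As an illustration of taking the PAM into account when choosing a target sequence, the following is a minimal sketch, not a method described in the disclosure. It assumes a 5'-NGG-3' PAM (typical of S. pyogenes Cas9) located immediately 3' of a 20-nt protospacer on the strand supplied; the function name `find_candidate_sites` and the default length are hypothetical, and only the given strand is scanned.

```python
import re


def find_candidate_sites(dna: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each site followed by an NGG PAM.

    Assumptions (not from the source text): the PAM is 5'-NGG-3' and lies
    immediately 3' of the protospacer on the strand that is passed in.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    for start, protospacer, pam in find_candidate_sites("A" * 20 + "TGG"):
        print(start, protospacer, pam)
```

A different Cas9 ortholog would generally require a different PAM pattern, so the regular expression is the part to adapt.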
- a Cas9 guide RNA can have a targeting segment with a sequence that can hybridize to a sequence in a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.).
- a subject Cas9 guide RNA comprises two separate nucleic acid molecules: an "activator" and a "targeter" and is referred to herein as a "dual Cas9 guide RNA", a "double-molecule Cas9 guide RNA", a "two-molecule Cas9 guide RNA", a "dual guide RNA", or a "dgRNA."
- the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the resulting guide RNA is referred to as a "single guide RNA", a “single Cas9 guide RNA”, a “single-molecule Cas9 guide RNA,” or a “one-molecule Cas9 guide RNA”, or simply "sgRNA.”
- a Cas9 guide RNA is a DNA/RNA hybrid molecule.
- the protein-binding segment of the Cas9 guide RNA is RNA and forms an RNA duplex.
- the targeting segment of a Cas9 guide RNA can be DNA.
- if a DNA/RNA hybrid guide nucleic acid is a dual guide nucleic acid, the "targeter" molecule can be a hybrid molecule (e.g., the targeting segment can be DNA and the duplex-forming segment can be RNA).
- the duplex-forming segment of the "activator” molecule can be RNA (e.g., in order to form an RNA-duplex with the duplex-forming segment of the targeter molecule), while nucleotides of the "activator” molecule that are outside of the duplex-forming segment can be DNA (in which case the activator molecule is a hybrid DNA/RNA molecule) or can be RNA (in which case the activator molecule is RNA).
- the targeting segment can be DNA
- the duplex-forming segments (which make up the protein-binding segment) can be RNA
- nucleotides outside of the targeting and duplex-forming segments can be RNA or DNA.
- targeter can also be referred to as a "targeter RNA" (even though in some cases a targeter RNA can have deoxyribonucleotides and/or other modifications) and the "activator" can be referred to as an "activator RNA" (even though in some cases a targeter RNA can have deoxyribonucleotides and/or other
- An example dual Cas9 guide RNA comprises a crRNA-like (“CRISPR RNA” / "targeter” / "crRNA” / “crRNA repeat”) molecule and a corresponding tracrRNA-like (“trans-acting CRISPR RNA” / "activator” / "tracrRNA”) molecule.
- a crRNA-like molecule comprises both the targeting segment (single stranded) of the guide nucleic acid and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA.
- a corresponding tracrRNA-like molecule comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the guide nucleic acid.
- a stretch of nucleotides of a crRNA-like molecule are complementary to and hybridize with a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding domain of the Cas9 guide RNA.
- each targeter molecule can be said to have a corresponding activator molecule (which has a region that hybridizes with the targeter).
- the targeter molecule additionally provides the single stranded targeting segment.
- a targeter and an activator molecule hybridize to form a dual Cas9 guide RNA.
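- The pairing between the duplex-forming segments of a targeter and an activator can be sketched as a reverse-complement check. This is a simplified illustration under stated assumptions: strict antiparallel Watson-Crick pairing only (no G-U wobble, bulges, or partial duplexes, which natural crRNA/tracrRNA duplexes may contain), and the function names and the example sequences are hypothetical.

```python
# Watson-Crick partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def can_form_duplex(targeter_duplex: str, activator_duplex: str) -> bool:
    """True if the two segments (both written 5'->3') pair perfectly end to end.

    Simplifying assumption: only strict Watson-Crick pairs are counted.
    """
    return activator_duplex.upper() == reverse_complement(targeter_duplex)


if __name__ == "__main__":
    print(can_form_duplex("GUUUUAGA", "UCUAAAAC"))
```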
- the exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found.
- a subject dual Cas9 guide RNA can include any corresponding activator and targeter pair.
- the term "activator" is used herein to mean a tracrRNA-like molecule (tracrRNA: "trans-acting CRISPR RNA") of a Cas9 dual guide RNA (and of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together by, e.g., intervening nucleotides).
- a Cas9 guide RNA (dgRNA or sgRNA) comprises an activator sequence (e.g., a tracrRNA sequence).
- a tracr molecule is a naturally existing molecule that hybridizes with a CRISPR RNA molecule (a crRNA) to form a Cas9 dual guide RNA.
- the term "activator" is used herein to encompass naturally existing tracrRNAs, but also to encompass tracrRNAs with modifications (e.g., truncations, sequence variations, base modifications, backbone modifications, linkage modifications, etc.) where the activator retains at least one function of a tracrRNA (e.g., contributes to the dsRNA duplex to which Cas9 binds).
- the activator provides one or more stem loops that can interact with Cas9; in some cases, the activator contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loop 1; in some cases, contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1 and 2; in some cases, contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1 and 3; in some cases, contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1, 2, and 3; etc.
- an activator can be referred to as having a
- the term "targeter" is used herein to refer to a crRNA-like molecule (crRNA: "CRISPR RNA") of a Cas9 dual guide RNA (and of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together, e.g., by intervening nucleotides).
- a Cas9 guide RNA comprises a targeting segment (single stranded) (which comprises nucleotides that hybridize with a target nucleic acid), and a duplex-forming segment (e.g., a duplex forming segment of a crRNA, which can also be referred to as a crRNA repeat).
- because the sequence of a targeting segment (the segment that hybridizes with a target sequence of a target nucleic acid) of a targeter is modified by a user to hybridize with a desired target nucleic acid, the sequence of a targeter will often be a non-naturally occurring sequence.
- the duplex-forming segment of a targeter (described in more detail below), which hybridizes with the duplex-forming segment of an activator, can include a naturally existing sequence (e.g., can include the sequence of a duplex-forming segment of a naturally existing crRNA, which can also be referred to as a crRNA repeat).
- the term targeter is used herein to distinguish from naturally occurring crRNAs, despite the fact that part of a targeter (e.g., the duplex-forming segment) can include a naturally occurring sequence from a crRNA.
- targeter encompasses naturally occurring crRNAs.
- the term "duplex-forming segment" is used herein to refer to the stretch of nucleotides of an activator or a targeter that contributes to the formation of the dsRNA duplex by hybridizing to a stretch of nucleotides of a corresponding activator or targeter.
- an activator comprises a duplex-forming segment that is complementary to the duplex-forming segment of the corresponding targeter.
- an activator comprises a duplex-forming segment while a targeter comprises both a duplex-forming segment and the targeting segment of the Cas9 guide RNA (sgRNA or dgRNA).
- a subject Cas9 single guide RNA comprises an "activator” and a “targeter” where the "activator” and the “targeter” are covalently linked (e.g., by intervening nucleotides).
- a subject Cas9 dual guide RNA comprises an "activator” and a “targeter” where the "activator” and the “targeter” are not covalently linked by intervening nucleotides.
- a Cas9 guide RNA can also be said to include 3 parts: (i) a targeting sequence (a nucleotide sequence that hybridizes with a sequence of the target nucleic acid); (ii) an activator sequence (as described above)(in some cases, referred to as a tracr sequence); and (iii) a sequence that hybridizes to at least a portion of the activator sequence to form a double stranded duplex.
- a targeter has (i) and (iii); while an activator has (ii).
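- The three-part description above, and the distinction between dual and single guide RNAs, can be modeled with a small data structure. This is a hedged sketch only: the class name `Cas9GuideRNA`, its field names, the 5'-to-3' ordering (targeting sequence placed 5' of the duplex-forming sequence), and the default "GAAA" linker are assumptions made for illustration, not details taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cas9GuideRNA:
    targeting_sequence: str  # part (i): hybridizes with the target nucleic acid
    activator_sequence: str  # part (ii): tracr-like sequence
    targeter_duplex: str     # part (iii): hybridizes with part of the activator

    @property
    def targeter(self) -> str:
        # A targeter has (i) and (iii); the 5'->3' order used here is an assumption.
        return self.targeting_sequence + self.targeter_duplex

    @property
    def activator(self) -> str:
        # An activator has (ii).
        return self.activator_sequence

    def as_dual_guide(self) -> tuple[str, str]:
        """Two separate molecules: (targeter, activator)."""
        return (self.targeter, self.activator)

    def as_single_guide(self, linker: str = "GAAA") -> str:
        """One molecule: targeter and activator joined by intervening nucleotides.

        The default linker is a hypothetical placeholder.
        """
        return self.targeter + linker + self.activator


if __name__ == "__main__":
    guide = Cas9GuideRNA("ACGU", "CCCC", "GGGG")  # toy sequences
    print(guide.as_dual_guide())
    print(guide.as_single_guide())
```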
- a Cas9 guide RNA (e.g. a dual guide RNA or a single guide RNA) can be comprised of any corresponding activator and targeter pair.
- Non-limiting examples of nucleotide sequences that can be included in a Cas9 guide RNA (dgRNA or sgRNA) include sequences set forth in SEQ ID NOs:431-679 and 1535-1544, or complements thereof.
- sequences from SEQ ID NOs: 431-562 and 1535-1544 (which are from tracrRNAs) or complements thereof, can pair with sequences from SEQ ID NOs: 563-679 (which are from crRNAs), or complements thereof, to form a dsRNA duplex of a protein binding segment.
- a targeter comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA.
- a corresponding tracrRNA-like molecule comprises a stretch of nucleotides (a duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the
- each targeter can be said to have a
- a targeter and an activator hybridize to form a Cas9 guide RNA.
- the exact sequence of a given naturally existing crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. Examples of suitable activator and targeter sequences include, but are not limited to, those set forth in SEQ ID NOs: 431-679 and 1535-1544.
- a subject Cas9 guide RNA (dgRNA or sgRNA) can include any corresponding activator and targeter sequence pair.
- the first segment of a subject guide nucleic acid comprises a nucleotide sequence that can be complementary to a sequence (a target site) in a target nucleic acid.
- the targeting segment of a subject guide nucleic acid can interact with a target nucleic acid (e.g., a single stranded RNA (ssRNA) and/or a single stranded DNA (ssDNA)) in a sequence-specific manner via hybridization (i.e., base pairing).
- the nucleotide sequence of the targeting segment may vary and can determine the location within the target nucleic acid that the guide nucleic acid and the target nucleic acid will interact.
- the targeting segment of a Cas9 guide RNA can be modified (e.g., by genetic engineering) /designed to hybridize to any desired sequence (target site) within a target nucleic acid (e.g., a eukaryotic target nucleic acid).
- the targeting segment can have a length of from about 12 nucleotides to about 100 nucleotides.
- the targeting segment can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt.
- the targeting segment can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60
- the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 12 nt or more.
- the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid can have a length of 12 nt or more, 15 nt or more, 18 nt or more, 19 nt or more, 20 nt or more, 25 nt or more, 30 nt or more, 35 nt or more or 40 nt.
- the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 n
- the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
- the targeting segment can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides).
- the targeting segment can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100 nt, from 12 to 80 nt, from 12 to 60 nt
- the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides).
- the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100
- the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more over about 20 contiguous nucleotides. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the fourteen contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder.
- in such a case, the targeting sequence can be considered to be 7 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA).
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 20 contiguous nucleotides.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 8 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 9 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 10 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 11 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 11 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 12 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 12 nucleotides in length.
- the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 13 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 13 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 14 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length.
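The complementarity statements above can be illustrated with a minimal sketch. It assumes both sequences are written 5' to 3', have equal length, and pair antiparallel, so that the 5'-most nucleotides of the target site face the 3'-most nucleotides of the targeting sequence. It also assumes only Watson-Crick pairs count (G-U wobble is ignored). The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
# Watson-Crick pairs between a guide RNA base and a target DNA base.
# Assumption: G-U wobble pairs are NOT counted as complementary here.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_flags(targeting_rna, target_dna):
    """Return one flag per target-site position, ordered from the target's 5' end."""
    if not targeting_rna or len(targeting_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    # Both inputs are written 5'->3'. The strands pair antiparallel, so the
    # 5'-most target nucleotide faces the 3'-most targeting-sequence nucleotide.
    rna_3_to_5 = targeting_rna.upper()[::-1]
    return [(r, d) in RNA_DNA_PAIRS for r, d in zip(rna_3_to_5, target_dna.upper())]


def percent_complementarity(targeting_rna, target_dna, five_prime_most=None):
    """Percent of paired positions, optionally over the N 5'-most target nucleotides."""
    flags = pairing_flags(targeting_rna, target_dna)
    if five_prime_most is not None:
        if not 0 < five_prime_most <= len(flags):
            raise ValueError("five_prime_most is out of range")
        flags = flags[:five_prime_most]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical 20-nt target site and its fully complementary targeting sequence.
target_site = "ATGC" * 5
perfect_guide = "GCAU" * 5
# Replace the six 5'-most guide nucleotides with bases that cannot pair with
# the six 3'-most target nucleotides they face.
mismatched_guide = "CGUACG" + perfect_guide[6:]

print(percent_complementarity(perfect_guide, target_site))
print(percent_complementarity(mismatched_guide, target_site))
print(percent_complementarity(mismatched_guide, target_site, five_prime_most=14))
```

In this toy case the mismatched guide is complementary over the fourteen 5'-most target nucleotides only, which corresponds to the situation in which the targeting sequence "can be considered to be 14 nucleotides in length".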
- Second segment: protein-binding segment
- the protein-binding segment of a subject guide nucleic acid interacts with a Cas9 polypeptide.
- the subject guide nucleic acid guides the bound polypeptide to a specific nucleotide sequence within target nucleic acid via the above mentioned targeting segment.
- the protein-binding segment of a subject guide nucleic acid comprises two stretches of nucleotides that are complementary to one another. The complementary nucleotides of the protein-binding segment hybridize to form a double stranded RNA duplex (dsRNA).
- the protein-binding segment of a subject Cas9 guide RNA interacts with a Cas9 protein.
- the Cas9 guide RNA guides the bound Cas9 protein to a specific nucleotide sequence within target nucleic acid via the above mentioned targeting segment.
- the protein-binding segment of a Cas9 guide RNA comprises two stretches of nucleotides that are complementary to one another and hybridize to form a double stranded RNA duplex (dsRNA duplex).
- the protein-binding segment includes a dsRNA duplex.
- the protein-binding segment also includes stem loop 1 (the "nexus") of a Cas9 guide RNA (e.g., see Figure 9).
- the activator of a Cas9 guide RNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) nucleotides 3' of the duplex forming segment, e.g., that form stem loop 1 (the "nexus").
- the protein-binding segment includes stem loop 1 (the "nexus") of a Cas9 guide RNA.
- the protein-binding segment includes 5 or more nucleotides (nt) (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 75 or more, or 80 or more nt) 3' of the dsRNA duplex (where 3' is relative to the duplex-forming segment of the activator sequence).
- the dsRNA duplex of the guide RNA (sgRNA or dgRNA) that forms between the activator and targeter is sometimes referred to herein as the "stem loop".
- the activator (activator RNA, tracrRNA) of many naturally existing Cas9 guide RNAs (e.g., S. pyogenes guide RNAs) has 3 stem loops (3 hairpins) that are 3' of the duplex-forming segment of the activator.
- The closest stem loop to the duplex-forming segment of the activator (3' of the duplex forming segment) is called "stem loop 1" (and is also referred to herein as the "nexus"); the next stem loop is called "stem loop 2" (and is also referred to herein as the "hairpin 1"); and the next stem loop is called "stem loop 3" (and is also referred to herein as the "hairpin 2").
- truncated guide RNA refers to a Cas9 guide RNA (single guide or dual guide) that has the nexus ("stem loop 1"), but is missing one or both of stem loops 2 and 3.
- a "truncated guide RNA" is truncated from the 3' end of the activator and can have: (i) stem loop 1 only; (ii) stem loop 1 plus stem loop 2; or (iii) stem loop 1 plus stem loop 3.
- some guide RNAs (e.g., some naturally existing guide RNAs) have only one stem loop 3' of the nexus ("stem loop 1"), and thus, for purposes herein, such guide RNAs are referred to as having a nexus ("stem loop 1") and a "stem loop 2/3" (or "hairpin 1/2").
- truncated guide RNA refers to a Cas9 guide RNA (single guide or dual guide) that does not include one or both of: stem loop 2 and stem loop 3.
- a Cas9 guide RNA has stem loop 1, but does not have stem loop 2 and does not have stem loop 3.
- a truncated Cas9 guide RNA has stem loop 1 and stem loop 2, but does not have stem loop 3.
- a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1 and stem loop 3, but does not have stem loop 2.
- a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1, but does not have at least one of: stem loop 2 and stem loop 3.
- a Cas9 guide RNA (sgRNA or dgRNA) (e.g., a full length Cas9 guide RNA) has stem loops 1, 2, and 3.
- a Cas9 guide RNA having stem loops 1, 2, and 3 (e.g., a full length guide RNA)
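The stem loop terminology above can be summarized as a small classification sketch. The class and function names are hypothetical. The three flags follow the definitions given above (stem loop 1 is the "nexus", stem loop 2 is "hairpin 1", stem loop 3 is "hairpin 2"), and a guide lacking stem loop 1 is reported separately because the definitions above only address guides that retain the nexus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideStemLoops:
    """Which stem loops 3' of the activator's duplex-forming segment are present."""
    stem_loop_1: bool  # the "nexus"
    stem_loop_2: bool  # "hairpin 1"
    stem_loop_3: bool  # "hairpin 2"


def classify_guide(loops):
    """Label a guide RNA as full length or truncated, per the definitions above."""
    if not loops.stem_loop_1:
        # Not addressed by the definitions above; flagged rather than guessed.
        return "no nexus"
    if loops.stem_loop_2 and loops.stem_loop_3:
        return "full length"
    # Nexus present, but stem loop 2 and/or stem loop 3 is missing.
    return "truncated"


print(classify_guide(GuideStemLoops(True, True, True)))
print(classify_guide(GuideStemLoops(True, False, False)))
print(classify_guide(GuideStemLoops(True, True, False)))
print(classify_guide(GuideStemLoops(True, False, True)))
```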
- the nuclease lobe and the alpha-helical lobe of a split Cas9 protein i.e., a Cas9 heterodimer
- complex formation between the nuclease lobe and the alpha-helical lobe does not require an additional dimerization mechanism (e.g., dimerization domains and/or dimerization inducers).
- the Cas9 heterodimer does not assemble (e.g., in the absence of a dimerizer such as light or a dimerizing agent).
- a Cas9 heterodimer having a dimerizer pair as fusion partners (e.g., in which the nuclease lobe includes a first member of a dimer pair and the alpha-helical lobe includes a second member of the dimer pair), in the presence of a truncated Cas9 guide RNA (a Cas9 guide RNA that does not include stem loops 2 or 3 but does include stem loop 1; a guide RNA that does not include stem loop 2 but does include stem loops 1 and 3; and/or a guide RNA that does not include stem loop 3 but does include stem loops 1 and 2).
- a guide RNA that does not include stem loop 3 but does include stem loops 1 and 2 complex formation of the nuclease lobe and the alpha-helical lobe to form
- an activator of a Cas9 guide RNA
- an activator (of a Cas9 guide RNA) has stem loop 1, but does not have at least one of: stem loop 2 and stem loop 3.
- the activator (e.g., tracr sequence) of a Cas9 guide RNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) nucleotides 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)).
- the additional nucleotides 3' of the duplex forming segment form stem loop 1.
- the activator (e.g., tracr sequence) of a Cas9 guide RNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) 5 or more nucleotides (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 70 or more, or 75 or more nucleotides) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)).
- dgRNA or sgRNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) 5 or more nucleotides (e.g., 6 or more, 7 or more, 8 or more
- the activator of a Cas9 guide RNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) 5 or more nucleotides (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 70 or more, or 75 or more nucleotides) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)).
- the activator (e.g., tracr sequence) of a Cas9 guide RNA includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) a stretch of nucleotides (e.g., referred to herein as a 3' tail) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)).
- the stretch of nucleotides 3' of the duplex forming segment has a length in a range of from 5 to 200 nucleotides (nt) (e.g., from 5 to 150 nt, from 5 to 130 nt, from 5 to 120 nt, from 5 to 100 nt, from 5 to 80 nt, from 10 to 200 nt, from 10 to 150 nt, from 10 to 130 nt, from 10 to 120 nt, from 10 to 100 nt, from 10 to 80 nt, from 12 to 200 nt, from 12 to 150 nt, from 12 to 130 nt, from 12 to 120 nt, from 12 to 100 nt, from 12 to 80 nt, from 15 to 200 nt, from 15 to 150 nt, from 15 to 130 nt, from 15 to 120 nt, from 15 to 100 nt, from 15 to 80 nt, from 20 to 200 nt, from 20 to 150 nt, from 20 to 130 n
- the duplex-forming segment of the activator is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 65% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 70% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 75% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contig
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 80% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 85% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contig
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 95% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 98% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contig
- the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 65% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 70% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 75% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 80% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 85% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 90% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 95% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 98% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 99% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
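The "X% or more identical over a stretch of 8 or more contiguous nucleotides" criterion used for the activator and targeter duplex-forming segments can be sketched as a sliding-window comparison. This is only one possible reading: it assumes an ungapped, position-by-position comparison and reports the best-scoring window pair, whereas the text above does not specify an alignment method. The function name and the toy sequences are hypothetical and do not come from the SEQ ID NOs.

```python
def best_window_identity(query, reference, window=8):
    """Best percent identity between any ungapped `window`-nt stretch of
    `query` and any same-length stretch of `reference`."""
    if window <= 0 or len(query) < window or len(reference) < window:
        raise ValueError("both sequences must be at least `window` nucleotides long")
    query, reference = query.upper(), reference.upper()
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            # Count identical positions in this pair of windows.
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, 100.0 * matches / window)
    return best


def meets_identity(query, reference, threshold=60.0, window=8):
    """True if some window pair reaches the identity threshold (e.g., 60%)."""
    return best_window_identity(query, reference, window) >= threshold


# Toy sequences for illustration only.
print(best_window_identity("GUUUUAGAGCUA", "GUUUUAGAGCUAUGCU", window=8))
print(best_window_identity("AAAAAAAA", "AAAACCCC", window=8))
print(meets_identity("AAAAAAAA", "AAAACCCC", threshold=60.0, window=8))
```

The brute-force double loop is quadratic in sequence length, which is acceptable for short duplex-forming segments; a dedicated alignment tool would be the more usual choice for longer sequences.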
- a dual guide RNA can be designed to allow for controlled (i.e., conditional) binding of a
- RNA aptamers can be used to regulate (i.e., control) the binding of the activator with the targeter.
- the activator and/or the targeter can include an RNA aptamer sequence.
- RNA aptamers are known in the art and are generally a synthetic version of a riboswitch.
- the terms "RNA aptamer” and “riboswitch” are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the nucleic acid molecule (e.g., RNA, DNA/RNA hybrid, etc.) of which they are part.
- RNA aptamers usually comprise a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule).
- Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part.
- an activator with an aptamer may not be able to bind to the cognate targeter unless the aptamer is bound by the appropriate drug;
- a targeter with an aptamer may not be able to bind to the cognate activator unless the aptamer is bound by the appropriate drug;
- a targeter and an activator, each comprising a different aptamer that binds a different drug may not be able to bind to each other unless both drugs are present.
- a Cas9 dual guide RNA can be designed to be inducible.
- aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May;17(5):344-64; Vavalle et al., Future Cardiol. 2012 May;8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr 15;34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-Jun;3(3):369-84; all of which are herein incorporated by reference in their entirety.
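The aptamer-controlled pairing of activator and targeter described above amounts to simple gating logic, sketched below. The model is deliberately simplified: it treats "may not be able to bind unless" as a strict rule, and the class name, function name, and drug labels are hypothetical placeholders rather than anything named in the text.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class GuideComponent:
    """An activator or targeter, optionally carrying an aptamer."""
    name: str
    aptamer_ligand: Optional[str] = None  # drug the aptamer must bind, if any


def can_pair(activator: GuideComponent, targeter: GuideComponent,
             drugs_present: Set[str]) -> bool:
    """Simplified rule: each aptamer-bearing component must see its drug."""
    for component in (activator, targeter):
        ligand = component.aptamer_ligand
        if ligand is not None and ligand not in drugs_present:
            return False
    return True


# Hypothetical components: each carries a different aptamer.
activator = GuideComponent("activator", aptamer_ligand="drug_A")
targeter = GuideComponent("targeter", aptamer_ligand="drug_B")

print(can_pair(activator, targeter, set()))
print(can_pair(activator, targeter, {"drug_A"}))
print(can_pair(activator, targeter, {"drug_A", "drug_B"}))
```

With two different aptamers the pairing behaves like a logical AND of the two drugs, which matches the case above in which both drugs must be present.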
- a Cas9 single guide RNA comprises two stretches of nucleotides (a "targeter" and an
- a subject single Cas9 guide RNA (e.g., a single guide RNA) can comprise a targeter and an activator, each having a duplex-forming segment, where the duplex-forming segments of the targeter and the activator hybridize with one another to form a dsRNA duplex.
- targeter and the activator can be covalently linked via the 3' end of the targeter and the 5' end of the activator.
- targeter and the activator can be covalently linked via the 5' end of the targeter and the 3' end of the activator.
- the linker of a Cas9 single guide RNA can have a length of from 3 nucleotides to 100
- the linker can have a length of from 3 nucleotides (nt) to 90 nt, from 3 nucleotides (nt) to 80 nt, from 3 nucleotides (nt) to 70 nt, from 3 nucleotides (nt) to 60 nt, from 3 nucleotides (nt) to 50 nt, from 3 nucleotides (nt) to 40 nt, from 3 nucleotides (nt) to 30 nt, from 3 nucleotides (nt) to 20 nt or from 3 nucleotides (nt) to 10 nt.
- the linker can have a length of from 3 nt to 5 nt, from 5 nt to 10 nt, from 10 nt to 15 nt, from 15 nt to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 50 nt, from 50 nt to 60 nt, from 60 nt to 70 nt, from 70 nt to 80 nt, from 80 nt to 90 nt, or from 90 nt to 100 nt.
- the linker of a Cas9 single guide RNA is 4 nt.
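As an illustration of the single guide RNA layout described above, the sketch below joins a targeter and an activator through a linker and checks the linker length against the 3 to 100 nt range. All sequences are written 5' to 3'. The function name, the placeholder sequences, and the default 4-nt linker "GAAA" are assumptions made for the example, not sequences specified in the text.

```python
def build_single_guide(targeter, activator, linker="GAAA", targeter_first=True):
    """Join targeter and activator (both written 5'->3') through a linker.

    targeter_first=True  links the 3' end of the targeter to the 5' end of the activator.
    targeter_first=False links the 3' end of the activator to the 5' end of the targeter.
    """
    if not 3 <= len(linker) <= 100:
        raise ValueError("linker length must be from 3 to 100 nucleotides")
    if targeter_first:
        return targeter + linker + activator
    return activator + linker + targeter


# Placeholder sequences for illustration only.
toy_targeter = "AAAA"
toy_activator = "CCCC"

print(build_single_guide(toy_targeter, toy_activator))
print(build_single_guide(toy_targeter, toy_activator, targeter_first=False))
```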
- a single Cas9 guide RNA comprises two complementary stretches of nucleotides (a targeter and an activator) that hybridize to form a dsRNA duplex.
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA is 60% or more identical to one of the targeter (crRNA) sequences or activator (tracrRNA) sequences set forth in SEQ ID NOs:431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- one of the two complementary stretches of nucleotides of the single Cas9 guide RNA can be 65% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- One of the two complementary stretches of nucleotides of the single Cas9 guide RNA can be 70% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
- One of the two complementary stretches of nucleotides of the single Cas9 guide RNA can be 75% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides)
- One of the two complementary stretches of nucleotides of the single Cas9 guide RNA can be 80% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
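The identity thresholds recited in the items above (a given percent identity over a stretch of 8 or more contiguous nucleotides) can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of every window of the chosen length, which is only one possible reading of the criterion. The function names and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def best_window_identity(query: str, reference: str, window: int = 8) -> float:
    """Return the highest fraction of matching positions between any
    `window`-long stretch of `query` and any `window`-long stretch of
    `reference`, compared position by position without gaps."""
    query, reference = query.upper(), reference.upper()
    if window <= 0 or len(query) < window or len(reference) < window:
        return 0.0
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, matches / window)
    return best


def meets_threshold(query: str, reference: str,
                    percent: float = 60.0, window: int = 8) -> bool:
    """True if some window pair reaches at least `percent` identity."""
    return best_window_identity(query, reference, window) * 100 >= percent


if __name__ == "__main__":
    # Toy RNA strings (hypothetical): 6 of 8 positions match, i.e. 75%.
    print(meets_threshold("AAAAAAAA", "AAAAAACC", percent=75, window=8))
```

The window length and percent threshold are parameters so that the same helper covers each recited combination (for example 65% over 10 nucleotides or 80% over 20 nucleotides).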
Abstract
The present disclosure provides a Cas9 heterodimer, as well as nucleic acids encoding the Cas9 heterodimer, and host cells comprising the nucleic acids. The present disclosure provides a system that includes a Cas9 heterodimer of the present disclosure and at least one of: a Cas9 guide RNA, and a dimerizing agent. A Cas9 heterodimer of the present disclosure is useful in a wide variety of applications, which are also provided.
Description
HETERODIMERIC CAS9 AND METHODS OF USE THEREOF
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/102,485, filed January 12, 2015, and of U.S. Provisional Patent Application No. 62/116,263, filed February 13, 2015, which applications are incorporated herein by reference in their entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE
[0002] A Sequence Listing is provided herewith as a text file, "BERK-267PRV2-SeqList_ST25.txt" created on February 11, 2015 and having a size of 7,769 KB. The contents of the text file are incorporated by reference herein in their entirety.
SUMMARY
[0003] The present disclosure provides a Cas9 heterodimer, as well as nucleic acids encoding the Cas9 heterodimer, and host cells comprising the nucleic acids. The present disclosure provides a system that includes a Cas9 heterodimer of the present disclosure and at least one of: a Cas9 guide RNA, and a dimerizing agent. A Cas9 heterodimer of the present disclosure is useful in a wide variety of applications, which are also provided.
[0004] The present disclosure provides a Cas9 heterodimer comprising: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair. In some cases, the RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1-60 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 50 amino acids to about 70 amino acids. In some cases, the RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 719-775 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 40 amino acids to about 70 amino acids. In some cases, the HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 776 to 909 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 120 amino acids to about 145 amino acids. In some cases, the RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 910-1099 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 170 amino acids to about 210 amino acids. In some cases, the PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% amino acid sequence identity to amino acids 1100-1367 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 250 amino acids to about 280 amino acids. In some cases, the first fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLS) that provide for nuclear localization. In some cases, the NLS is at or near the N-terminus of the first fusion polypeptide. In some cases, the NLS is at or near the C-terminus of the first fusion polypeptide. In some cases, the first fusion polypeptide comprises an NLS at or near the N-terminus of the first fusion polypeptide and comprises an NLS at or near the C-terminus of the first fusion polypeptide. In some cases, the second fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLS) that provide for nuclear localization. In some cases, the NLS is at or near the N-terminus of the second fusion polypeptide. In some cases, the NLS is at or near the C-terminus of the second fusion polypeptide. In some cases, the second fusion polypeptide comprises an NLS at or near the N-terminus of the second fusion polypeptide and comprises an NLS at or near the C-terminus of the second fusion polypeptide. In some cases, the first fusion partner is at or near the N-terminus of the first polypeptide. In some cases, the first fusion partner is at or near the C-terminus of the first polypeptide. In some cases, the first fusion partner is located internally to the first polypeptide. In some cases, the first fusion partner is located within the HNH polypeptide. In some cases, the first fusion partner is located between amino acids 860 and 880 (within amino acids 776 to 909) of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is located within the RuvCIII polypeptide. In some cases, the first fusion partner is located between amino acids 1000 and 1025 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the second fusion partner is at or near the N-terminus of the second polypeptide. In some cases, the second fusion partner is at or near the C-terminus of the second polypeptide. In some cases, the second fusion partner is located internally to the second polypeptide.
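The residue ranges recited above for the S. pyogenes Cas9 sequence (SEQ ID NO: 1545) can be collected into a small lookup table. The following is a minimal sketch, assuming 1-based inclusive coordinates; the dictionary and function names are hypothetical, and only the numbers come from the text. It checks each region's length against the recited "about" ranges and reports which region contains a given position, such as a candidate fusion-partner insertion site.

```python
from typing import Dict, Optional, Tuple

# 1-based, inclusive residue ranges as recited for S. pyogenes Cas9.
SPY_CAS9_REGIONS: Dict[str, Tuple[int, int]] = {
    "RuvCI": (1, 60),
    "RuvCII": (719, 775),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}

# Recited approximate length ranges (in amino acids) for each polypeptide.
LENGTH_RANGES: Dict[str, Tuple[int, int]] = {
    "RuvCI": (50, 70),
    "RuvCII": (40, 70),
    "HNH": (120, 145),
    "RuvCIII": (170, 210),
    "PAM-interacting": (250, 280),
}


def region_length(name: str) -> int:
    """Length of a region, counting both end residues."""
    start, end = SPY_CAS9_REGIONS[name]
    return end - start + 1


def length_in_recited_range(name: str) -> bool:
    """True if the region length falls inside the recited range."""
    low, high = LENGTH_RANGES[name]
    return low <= region_length(name) <= high


def region_of(position: int) -> Optional[str]:
    """Name of the listed region containing `position`, or None.
    Positions outside the five listed ranges return None."""
    for name, (start, end) in SPY_CAS9_REGIONS.items():
        if start <= position <= end:
            return name
    return None


if __name__ == "__main__":
    for name in SPY_CAS9_REGIONS:
        print(name, region_length(name), length_in_recited_range(name))
    print(region_of(870), region_of(1010))
```

Under these coordinates, the recited insertion windows (amino acids 860 to 880, and 1000 to 1025) fall inside the HNH and RuvCIII ranges, respectively, which is consistent with the text.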
[0005] The present disclosure provides a Cas9 heterodimer comprising: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair. In some cases, the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide. In some cases, the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide. In some cases, the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide. In some cases, the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide. In some cases, the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) the first fusion partner; and b) the first polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) the first polypeptide; and b) the first fusion partner. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a heterologous nuclear localization sequence (NLS) that provides for nuclear localization; b) the first fusion partner; and c) the first polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) the first fusion partner; c) the first polypeptide; and d) an NLS. In some cases, the first fusion polypeptide and the second fusion polypeptide form a heterodimer in the presence of a small molecule dimerizer. In any of the embodiments set forth above or herein, in some cases, a) the first fusion partner is FK506 binding protein 1A (FKBP1A); and the second fusion partner is FKBP1A; b) the first fusion partner is FKBP1A; and the second fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme); c) the first fusion partner is FKBP1A; and the second fusion partner is cyclophilin; d) the first fusion partner is FKBP1A; and the second fusion partner is Fkbp-Rapamycin Binding Domain (FRB); e) the first fusion partner is gyrase B (GyrB); and the second fusion partner is GyrB; f) the first fusion partner is dihydrofolate reductase (DHFR); and the second fusion partner is DHFR; g) the first fusion partner is DmrB; and the second fusion partner is DmrB; h) the first fusion partner is PYL; and the second fusion partner is ABI; i) the first fusion partner is Cry2; and the second fusion partner is CIP; or j) the first fusion partner is GAI; and the second fusion partner is GID1. In some cases, a) the second fusion partner is FK506 binding protein 1A (FKBP1A); and the first fusion partner is FKBP1A; b) the second fusion partner is FKBP1A; and the first fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme); c) the second fusion partner is FKBP1A; and the first fusion partner is cyclophilin; d) the second fusion partner is FKBP1A; and the first fusion partner is Fkbp-Rapamycin Binding Domain (FRB); e) the second fusion partner is gyrase B (GyrB); and the first fusion partner is GyrB; f) the second fusion partner is dihydrofolate reductase (DHFR); and the first fusion partner is DHFR; g) the second fusion partner is DmrB; and the first fusion partner is DmrB; h) the second fusion partner is PYL; and the first fusion partner is ABI; i) the second fusion partner is Cry2; and the first fusion partner is CIP; or j) the second fusion partner is GAI; and the first fusion partner is GID1.
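The circular permuted region orders recited above can be reproduced by rotating a list. The sketch below assumes that the native N-terminus to C-terminus order of the five regions follows the i) to v) listing in the text; the function names are hypothetical. `rotate` gives a permutation that begins at a whole region, and `split_within` gives a permutation whose new termini fall inside a region, so that region's C-terminal portion comes first and its N-terminal portion comes last.

```python
from typing import List

# Assumed native N-to-C order of the regions of the first polypeptide.
NATIVE_ORDER: List[str] = [
    "RuvCI", "RuvCII", "HNH", "RuvCIII", "PAM-interacting",
]


def rotate(order: List[str], new_first: str) -> List[str]:
    """Circularly permute `order` so that `new_first` is N-terminal."""
    i = order.index(new_first)
    return order[i:] + order[:i]


def split_within(order: List[str], region: str) -> List[str]:
    """Circularly permute `order` with the new termini inside `region`."""
    i = order.index(region)
    return (
        [f"{region} (C-terminal portion)"]
        + order[i + 1:]
        + order[:i]
        + [f"{region} (N-terminal portion)"]
    )


if __name__ == "__main__":
    print(rotate(NATIVE_ORDER, "RuvCII"))
    print(rotate(NATIVE_ORDER, "HNH"))
    print(rotate(NATIVE_ORDER, "RuvCIII"))
    print(split_within(NATIVE_ORDER, "RuvCIII"))
    print(split_within(NATIVE_ORDER, "HNH"))
```

The five calls in the usage section correspond to the five orders recited in the text. A rotation that starts at the PAM-interacting region is also produced by `rotate`, but that order is not among those recited.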
[0006] The present disclosure provides a system comprising: a Cas9 heterodimer (e.g., as described above) and a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3. In some cases, the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3. In some cases, the system comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide.
[0007] The present disclosure provides one or more nucleic acids comprising nucleotide sequences encoding the Cas9 heterodimer (e.g., as described above). In some cases, the nucleotide sequences encoding the first fusion polypeptide and the nucleotide sequences encoding the second fusion polypeptide are in the same nucleic acid. In some cases, the nucleotide sequences encoding the first fusion polypeptide and nucleotide sequences encoding the second fusion polypeptide are in separate nucleic acids. In some cases, the nucleotide sequences encoding the first fusion polypeptide are operably linked to a transcription control sequence. In some cases, the nucleotide sequences encoding the second fusion polypeptide are operably linked to a transcription control sequence. The present disclosure provides a recombinant vector comprising the one or more nucleic acids. In some cases, the vector is a viral vector. In some cases, the viral vector is an adeno-associated viral vector, a lentiviral vector, or a retroviral vector. The present disclosure provides a host cell genetically modified with the one or more nucleic acids or with the recombinant expression vector. In some cases, the host cell is a prokaryotic cell. In some cases, the host cell is a eukaryotic cell. In some cases, the host cell is a mammalian cell, a plant cell, an amphibian cell, a reptile cell, a yeast cell, a protozoan cell, or a helminth cell.
[0008] The present disclosure provides a method of binding a target nucleic acid, comprising: contacting a target nucleic acid with the system described above (e.g., comprising a Cas9 heterodimer and a Cas9 guide RNA, e.g., a truncated Cas9 guide RNA that does not include stem loops 2 or 3). In some cases, the method results in modification of the target nucleic acid. In some cases, the modification is cleavage. In some cases, the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3. In some cases, the system comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide. In some cases, the system comprises a donor polynucleotide. In some cases, the system comprises a PAMmer. In some cases, the Cas9 heterodimer has reduced nuclease activity. In some cases, the Cas9 heterodimer has nickase activity. In some cases, the Cas9 heterodimer includes a fusion partner that provides for an activity selected from: transcription modulation, methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, and demyristoylation activity.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Figures 1A-B present one embodiment of a split Cas9 protein (a Cas9 heterodimer), and a crystal structure that illustrates where the nuclease lobe and alpha-helical lobe can be separated relative to a wild type Cas9 protein.
[0010] Figure 2 presents assays related to the expression and cleavage activity of split Cas9.
[0011] Figure 3 presents assays related to DNA cleavage activity of split Cas9 when using truncated single guide RNA (sgRNA).
[0012] Figures 4A-B present two different suitable dimerization domain insertion sites (and/or circular permutation sites) for split Cas9.
[0013] Figure 5 depicts a Cas9 protein sequence of S. pyogenes (SEQ ID NO: 1545).
[0014] Figures 6A and 6B present illustrative example sequences of nuclease lobes and alpha-helical lobes (first and second polypeptides) of a subject Split Cas9. The examples depicted include tags (e.g., his tag, maltose binding protein (MBP)), protease sites (e.g., TEV protease sites), and the like.
[0015] Figure 7 presents illustrative example sequences of nuclease lobes and alpha-helical lobes (first and second polypeptides) of a subject Split Cas9. The examples depicted include fusion partners (including dimer pair members), linkers, NLSs, tags (e.g., FLAG tags, HA tags), etc.
[0016] Figures 8A-F provide a schematic drawing of exemplary embodiments of subject compositions and methods in which the target nucleic acid is a single stranded nucleic acid. (A-B) Each embodiment depicted includes a PAMmer, which is hybridized to a single stranded target nucleic acid; and a guide nucleic acid (Cas9 guide RNA), which is hybridized to the target nucleic acid and is associated with a Cas9 heterodimer. In cases where the target nucleic acid is double stranded, the PAM sequence can be provided by the target nucleic acid and a PAMmer may therefore not be necessary. (C-D) Each embodiment depicted in C and D includes a PAMmer having a specificity segment and an orientation segment. The PAM sequence is complementary to the target nucleic acid in 8C, and is not complementary to the target nucleic acid in 8D. (E-F) Each embodiment depicted in E and F includes a PAMmer having either a specificity segment or an orientation segment. The PAM sequence is complementary to the target nucleic acid on the right, and is not complementary to the target nucleic acid on the left.
[0017] Figure 9 presents a schematic of one possible guide RNA. The depicted guide RNA is a single guide RNA with a targeter covalently linked to an activator via 4 linker nucleotides. The nucleotides are 5' to 3' from left to right.
[0018] Figures 10A-D present a schematic of one way in which Cas9 can be split into two separate polypeptides that retain the ability to catalyze RNA-guided dsDNA cleavage.
[0019] Figures 11A-E present data related to the requirements for split-Cas9 assembly.
[0020] Figures 12A-C present genomic editing data for split-Cas9.
[0021] Figures 13A-B present size exclusion chromatograms of purified α-helical (top) and nuclease (bottom) lobes.
[0022] Figures 14A-C present split-Cas9 activity data and individual lobe data related to the use of single-guide and dual-guide RNAs.
[0023] Figures 15A-B present split-Cas9 versus non-split Cas9 binding affinity data for target DNA.
[0024] Figures 16A-D present data related to requirements for split-Cas9 heterodimerization.
[0025] Figure 17 presents data related to the DNA cleavage activity of split-Cas9 in the presence of excess sgRNA.
[0026] Figures 18A-C present a schematic of one strategy for inducible control of genome engineering by a split-Cas9 enzyme complex.
[0027] Figures 19A-B present data related to split-Cas9 activity in the presence of different sgRNAs.
[0028] Figures 20A-B present Table 1 and Table S1 of Example 2.
DEFINITIONS
[0029] The terms "polynucleotide" and "nucleic acid," used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The terms "polynucleotide" and "nucleic acid" should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
[0030] The term "naturally-occurring" as used herein as applied to a nucleic acid, a cell, or an organism, refers to a nucleic acid, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.
[0031] As used herein the term "isolated" is meant to describe a polynucleotide, a polypeptide, or a cell that is in an environment different from that in which the polynucleotide, the polypeptide, or the cell naturally occurs. An isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.
[0032] "Heterologous," as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, a first fusion polypeptide of a Cas9 heterodimer can comprise a polypeptide comprising a Cas9 nuclease and a non-Cas9 polypeptide, where the non-Cas9 polypeptide can be a fusion partner, etc.
[0033] As used herein, the term "exogenous nucleic acid" refers to a nucleic acid that is not normally or naturally found in and/or produced by a given bacterium, organism, or cell in nature. As used herein, the term "endogenous nucleic acid" refers to a nucleic acid that is normally found in and/or produced by a given bacterium, organism, or cell in nature. An "endogenous nucleic acid" is also referred to as a "native nucleic acid" or a nucleic acid that is "native" to a given bacterium, organism, or cell.
[0034] "Recombinant," as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. Generally, DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Such sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see "DNA regulatory sequences", below).
[0035] Thus, e.g., the term "recombinant" polynucleotide or "recombinant" nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
[0036] Similarly, the term "recombinant" polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention. Thus, e.g., a polypeptide that comprises a heterologous amino acid sequence is recombinant.
[0037] By "construct" or "vector" is meant a recombinant nucleic acid, generally recombinant DNA, which has been generated for the purpose of the expression and/or propagation of a specific nucleotide sequence(s), or is to be used in the construction of other recombinant nucleotide sequences.
[0038] The terms "DNA regulatory sequences," "control elements," and "regulatory elements," used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.
[0039] The term "transformation" is used interchangeably herein with "genetic modification" and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (i.e., DNA exogenous to the cell). Genetic change ("modification") can be accomplished either by incorporation of the new DNA into the genome of the host cell, or by transient or stable maintenance of the new DNA as an episomal element. Where the cell is a eukaryotic cell, a permanent genetic change is generally achieved by introduction of the DNA into the genome of the cell. In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell. Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al, Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
[0040] "Operably linked" refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression. As used herein, the terms "heterologous promoter" and "heterologous control regions" refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature. For example, a "transcriptional control region heterologous to a coding region" is a transcriptional control region that is not normally associated with the coding region in nature.
[0041] A "host cell," as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell, or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid (e.g., an expression vector that comprises a nucleotide sequence encoding one or more biosynthetic pathway gene products such as mevalonate pathway gene products), and include the progeny of the original cell which has been genetically modified by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a subject prokaryotic host cell is a genetically modified prokaryotic host cell (e.g., a bacterium), by virtue of introduction into a suitable prokaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to (not normally found in nature in) the prokaryotic host cell, or a recombinant nucleic acid that is not normally found in the prokaryotic host cell; and a subject eukaryotic host cell is a genetically modified eukaryotic host cell, by virtue of introduction into a suitable eukaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.
[0042] The term "conservative amino acid substitution" refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
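The side-chain groups listed above can be encoded as sets to test whether a substitution is conservative in the sense of this definition. This is a minimal sketch: the one-letter amino acid codes and the names `CONSERVATIVE_GROUPS`, `shared_groups`, and `is_conservative` are introduced here for illustration, and only the six groups recited in the text are included (so, for example, an aspartate to glutamate change is not classified as conservative by this table).

```python
from typing import Dict, List, Set

# Side-chain groups as listed in the definition, in one-letter codes.
CONSERVATIVE_GROUPS: Dict[str, Set[str]] = {
    "aliphatic": {"G", "A", "V", "L", "I"},
    "aliphatic-hydroxyl": {"S", "T"},
    "amide-containing": {"N", "Q"},
    "aromatic": {"F", "Y", "W"},
    "basic": {"K", "R", "H"},
    "sulfur-containing": {"C", "M"},
}


def shared_groups(a: str, b: str) -> List[str]:
    """Names of the listed groups that contain both residues."""
    a, b = a.upper(), b.upper()
    return [name for name, members in CONSERVATIVE_GROUPS.items()
            if a in members and b in members]


def is_conservative(a: str, b: str) -> bool:
    """True if `a` and `b` differ and share at least one listed group."""
    return a.upper() != b.upper() and bool(shared_groups(a, b))


if __name__ == "__main__":
    print(is_conservative("V", "L"))  # both aliphatic
    print(is_conservative("K", "R"))  # both basic
    print(is_conservative("S", "F"))  # different groups
```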
[0043] A polynucleotide or polypeptide has a certain percent "sequence identity" to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence similarity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using the methods and computer programs, including BLAST, available over the world wide web at ncbi.nlm.nih.gov/BLAST. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10. Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Of particular interest are alignment programs that permit gaps in the sequence. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. See J. Mol. Biol. 48: 443-453 (1970).
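As an illustration of the percent identity definition above, the following is a minimal sketch of a Needleman and Wunsch style global alignment followed by an identity count. It is not the GAP, BLAST, or FASTA implementation; the function names, the scoring values (match +1, mismatch -1, gap -1), and the choice to divide by the full alignment length are assumptions made for this example only, and other programs use other scoring schemes and denominators.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of aligned columns holding the same residue in both sequences."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

Under these assumed settings, a gapped column counts against identity because the denominator is the number of alignment columns; a tool that divides by the shorter sequence length would report a different value for the same pair.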
[0044] Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
[0045] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
[0046] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
[0047] It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a Cas9 polypeptide" includes a plurality of such polypeptides and reference to "the guide RNA" includes reference to one or more guide RNAs and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements, or use of a "negative" limitation.
[0048] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
[0049] The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
DETAILED DESCRIPTION
[0050] The present disclosure provides a Cas9 heterodimer, as well as nucleic acids encoding the Cas9 heterodimer, and host cells comprising the nucleic acids. The present disclosure provides a system comprising a Cas9 heterodimer of the present disclosure. A Cas9 heterodimer of the present disclosure is useful in a wide variety of applications, which are also provided.
CAS9 HETERODIMERS
[0051] The present disclosure provides a Cas9 heterodimer. A Cas9 heterodimer of the present disclosure comprises two polypeptides, where the two polypeptides are not covalently linked to one another. A Cas9 heterodimer of the present disclosure is also referred to herein as a "heterodimeric Cas9 complex" and/or a "split Cas9 protein" and/or a "heterodimeric Cas9 protein." A Cas9 heterodimer of the present disclosure includes a first fusion polypeptide comprising a first polypeptide (e.g., a Cas9 nuclease lobe) covalently linked (directly or via a linker) to a first fusion partner; and a second fusion polypeptide comprising a second polypeptide (e.g., a Cas9 alpha-helical lobe) covalently linked (directly or via a linker) to a second fusion partner. In some cases, the first polypeptide (e.g., a Cas9 nuclease lobe) is circularly permuted (i.e., in some cases, the first polypeptide is a circular permutant).
[0052] A Cas9 heterodimer of the present disclosure comprises two polypeptides that can interact to form a complex (i.e., to form the heterodimeric Cas9 protein). A Cas9 heterodimer of the present disclosure is also referred to herein as a "split Cas9" or a "split Cas9 protein." The fusion partners present in the first fusion polypeptide and the second fusion polypeptide can be induced to dimerize (e.g., by a dimerizing agent). When the fusion partners present in the first fusion polypeptide and the second fusion polypeptide dimerize, the first fusion polypeptide and the second fusion polypeptide dimerize. In the absence of a dimerizing agent, and in the absence of a guide RNA that includes a stem loop 2 and/or a stem loop 3, the first fusion polypeptide and the second fusion polypeptide do not dimerize. When the first fusion polypeptide and the second fusion polypeptide dimerize, the Cas9 heterodimer, together with a truncated guide RNA (e.g., a guide RNA that does not include stem loop 2 and/or stem loop 3), can bind a target nucleic acid (and in some cases modify, e.g., cleave or otherwise modify, the target nucleic acid). A Cas9 heterodimer of the present disclosure and a truncated guide RNA form a "Cas9 heterodimer system," described herein. A Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid. In some cases, a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and cleave the target nucleic acid. In some cases, a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and modify the target nucleic acid. In some cases, a Cas9 heterodimer system of the present disclosure can bind to a target nucleic acid and modulate transcription of/from the target nucleic acid.
[0053] A subject Cas9 heterodimer (a split Cas9 protein) includes a first polypeptide (where the first polypeptide includes a Cas9 nuclease lobe) and a second polypeptide (where the second polypeptide includes a Cas9 alpha-helical lobe) (e.g., see Figure 1). A nuclease lobe includes: (i) a RuvC domain, where a RuvC domain comprises a RuvCI polypeptide, a RuvCII polypeptide, and a RuvCIII polypeptide; (ii) an HNH domain (also referred to as an HNH polypeptide); and (iii) a PAM-interacting domain (also referred to as a "PAM-interacting polypeptide"). A Cas9 alpha-helical lobe is also referred to as an "alpha-helical recognition region."
Cas9 heterodimers with nuclease lobe and alpha-helical lobe
[0054] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
First fusion polypeptide
[0055] As noted above, in some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
[0056] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids). For example, in some cases, a RuvCI polypeptide can have at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 2-56 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
[0057] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 57 amino acids of amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 70 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, from 55 amino acids to 60 amino acids, from 60 amino acids to 65 amino acids, or from 65 amino acids to 70 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 55-60 (e.g., 55, 56, 57, 58, 59, or 60) amino acids.
[0058] In some cases, a short alpha-helix (S717-L727 in the S. pyogenes Cas9 depicted in Figure 5) can be removed, e.g., to minimize the distance between the end of RuvCI and the beginning of RuvCII. In some cases, a short alpha-helix (S717-L727 in the S. pyogenes Cas9 depicted in Figure 5) is removed and the RuvCI polypeptide is connected to the RuvCII polypeptide with a linker (e.g., a glycine-serine-serine linker, and as described elsewhere).
[0059] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
[0060] An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 90 amino acids to 150 amino acids, e.g., from 90 amino acids to 95 amino acids, from 95 amino acids to 100 amino acids, from 100 amino acids to 125 amino acids, from 125 amino acids to 130 amino acids, from 130 amino acids to 135 amino acids, from 135 amino acids to 140 amino acids, from 140 amino acids to 145 amino acids, or from 145 amino acids to 150 amino acids. In some cases, an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 130 amino acids to 140 amino acids (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 amino acids).
[0061] A RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 150 amino acids to 160 amino acids, from 160 amino acids to 170 amino acids, from 170 amino acids to 180 amino acids, from 180 amino acids to 190 amino acids, from 190 amino acids to 200 amino acids, from 200 amino acids to 210 amino acids, or from 210 amino acids to 220 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
[0062] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
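The residue ranges recited in paragraphs [0056] to [0062] can be collected into a small lookup table, which makes the segment lengths easy to check. The sketch below is illustrative only: the dictionary and function names are hypothetical, the coordinates are the 1-based inclusive ranges quoted above for SEQ ID NO: 1545, and the placeholder string stands in for a real sequence, which is not reproduced here.

```python
# 1-based, inclusive residue ranges quoted in paragraphs [0056]-[0062]
# for the S. pyogenes Cas9 sequence of SEQ ID NO: 1545.
NUCLEASE_LOBE_SEGMENTS = {
    "RuvCI": (1, 60),
    "RuvCII": (718, 774),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "PAM-interacting": (1100, 1367),
}


def segment_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def extract_segment(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a sequence string."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("range %d-%d is outside the sequence" % (start, end))
    return sequence[start - 1:end]


# Placeholder only: a string of the right length, NOT the Cas9 sequence.
placeholder_sequence = "A" * 1367

segment_lengths = {
    name: len(extract_segment(placeholder_sequence, start, end))
    for name, (start, end) in NUCLEASE_LOBE_SEGMENTS.items()
}
```

The computed lengths (60, 57, 134, 190, and 268 residues) agree with the upper bounds of the contiguous stretches recited for each polypeptide above.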
Heterologous subcellular localization sequences
[0063] In some cases, the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
[0064] In some cases, the first fusion polypeptide comprises an NLS. For example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and d) an NLS. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and c) a first fusion partner. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; c) a first fusion partner; and d) an NLS. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
[0065] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
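The "at or near" criterion can be expressed as a simple distance check. The sketch below is one possible reading, offered as an assumption: an NLS counts as near the N-terminus when no more than `window` residues precede it, and near the C-terminus when no more than `window` residues follow it. The function name and the default window of 100 are illustrative; the two NLS strings are those of SEQ ID NOs: 1546 and 1547 quoted above.

```python
# NLS sequences quoted in the text (SEQ ID NO: 1546 and SEQ ID NO: 1547).
NLS_EXAMPLES = ("MAPKKKRKVGIHGVPAA", "KRPAATKKAGQAKKKK")


def nls_near_termini(sequence, nls, window=100):
    """Return (near_n, near_c) for all occurrences of `nls` in `sequence`.

    Assumed reading of "at or near": at most `window` residues lie between
    the NLS and the respective terminus.
    """
    near_n = near_c = False
    start = sequence.find(nls)
    while start != -1:
        end = start + len(nls)             # exclusive end index
        if start <= window:                # residues preceding the NLS
            near_n = True
        if len(sequence) - end <= window:  # residues following the NLS
            near_c = True
        start = sequence.find(nls, start + 1)
    return near_n, near_c
```

For instance, calling `nls_near_termini(seq, NLS_EXAMPLES[0], window=30)` applies the narrowest window listed above rather than the 100-residue default.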
Fusion partner at or near N-terminus of first fusion polypeptide
[0066] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide.
[0067] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a RuvCI polypeptide; c) a RuvCI polypeptide and a RuvCII polypeptide; and d) a PAM-interacting polypeptide and an NLS.
[0068] The linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides of between about 6 amino acids and about 40 amino acids in length, or between about 6 amino acids and about 25 amino acids in length. These linkers are generally produced by using synthetic, linker-encoding oligonucleotides to couple the proteins. Peptide linkers with a degree of flexibility will generally be preferred. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide. Small amino acids, such as glycine and alanine, are of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art. A variety of different linkers are commercially available and are considered suitable for use.
[0069] Exemplary polypeptide linkers include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, (GSGGS)n (SEQ ID NO: 1548) and (GGGS)n (SEQ ID NO: 1549), where n is an integer of at least one), glycine-alanine polymers, and alanine-serine polymers. Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 1550), GGSGG (SEQ ID NO: 1551), GSGSG (SEQ ID NO: 1552), GSGGG (SEQ ID NO: 1553), GGGSG (SEQ ID NO: 1554), GSSSG (SEQ ID NO: 1555), and the like. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any elements described above can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
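The repeat-unit linkers named above are simple to enumerate. The following sketch builds a linker from a repeat unit and a repeat count n, and checks the result against the approximate 6 to 40 amino acid length range given for suitable linkers. The function names are hypothetical, and treating the "about 6" and "about 40" bounds as exact integers is a simplifying assumption.

```python
def repeat_linker(unit, n):
    """Return the linker formed by repeating `unit` n times, e.g. (GS)n."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def in_length_range(linker, min_len=6, max_len=40):
    """True if the linker length falls within the assumed inclusive bounds."""
    return min_len <= len(linker) <= max_len


# Glycine-serine repeat units named in the text, each repeated three times.
candidates = {unit: repeat_linker(unit, 3) for unit in ("GS", "GSGGS", "GGGS")}
suitable = {unit: seq for unit, seq in candidates.items() if in_length_range(seq)}
```

Under these bounds a single short motif such as GGSG (4 residues) falls below the stated range on its own, while a repeat such as (GGGS)2 (8 residues) falls within it.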
Fusion partner at or near C-terminus of first fusion polypeptide
[0070] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner.
[0071] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a RuvCI polypeptide; b) a RuvCI polypeptide and a RuvCII polypeptide; c) a PAM-interacting polypeptide and an NLS; d) a PAM-interacting polypeptide and a second fusion partner; and e) a fusion partner and an NLS. Suitable linker polypeptides are as described above.
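The two terminal arrangements described above (fusion partner at the N-terminal side or at the C-terminal side of the nuclease lobe) differ only in the order in which the parts are joined. The sketch below assembles either layout from N-terminus to C-terminus with a linker at each junction. All names are hypothetical, the partner and lobe strings are toy placeholders rather than real sequences, and using the same linker at every junction is an assumption; the text allows a linker at any subset of the junctions.

```python
def join_parts(parts, linker=""):
    """Concatenate sequences in N- to C-terminal order, linker at each junction."""
    return linker.join(parts)


def partner_n_terminal(nls, partner, lobe, linker=""):
    """Layout: NLS, fusion partner, nuclease lobe, NLS."""
    return join_parts([nls, partner, lobe, nls], linker)


def partner_c_terminal(nls, partner, lobe, linker=""):
    """Layout: NLS, nuclease lobe, fusion partner, NLS."""
    return join_parts([nls, lobe, partner, nls], linker)


# Toy placeholders (NOT real sequences) so the layouts are easy to read.
NLS = "KRPAATKKAGQAKKKK"   # SEQ ID NO: 1547 as quoted in the text
PARTNER = "FFFF"           # hypothetical stand-in for a dimerization-pair member
LOBE = "CCCCCC"            # hypothetical stand-in for the nuclease lobe

n_layout = partner_n_terminal(NLS, PARTNER, LOBE, linker="GGSGG")
c_layout = partner_c_terminal(NLS, PARTNER, LOBE, linker="GGSGG")
```

Both layouts have the same total length; only the position of the fusion partner relative to the nuclease lobe changes.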
Fusion partner located internally within first fusion polypeptide
[0072] In some cases, the fusion partner is located internally within the first polypeptide. In some cases, the first fusion partner is inserted within the HNH polypeptide. In some cases, the first fusion partner is inserted within the RuvCIII polypeptide.
Fusion partner inserted into HNH polypeptide
[0073] In some cases, the first fusion partner is inserted within the HNH polypeptide. The HNH polypeptide of S. pyogenes Cas9 is amino acids 776-909 of the amino acid sequence set forth in SEQ ID NO: 1545. For example, in some cases, the first fusion partner is inserted in a site within amino acids 800 to 900 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. For example, in some cases, the first fusion partner is inserted at or near amino acid 868 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 868 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 860 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 861 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 862 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 863 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 864 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 865 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 866 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 867 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 869 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 870 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 871 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 872 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 873 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 874 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 875 of amino acids 776-909 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
[0074] As one non-limiting example, the first fusion polypeptide can comprise, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0075] An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 80 amino acids to 92 amino acids of amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 80 amino acids to 110 amino acids, e.g., from 80 amino acids to 90 amino acids, from 90 amino acids to 100 amino acids, or from 100 amino acids to 110 amino acids. In some cases, an N-terminal portion of an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 85 amino acids to 95 amino acids (85, 86, 87, 88, 89, 90, 91, 92, 93, 94, or 95 amino acids). An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 50 amino acids to 66 amino acids of amino acids 776-841 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 50 amino acids to 80 amino acids, e.g., from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids.
[0076] A C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 35 to 42 amino acids of amino acids 868-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 35 to 42 amino acids (e.g., 35, 36, 37, 38, 39, 40, 41, or 42 amino acids). A C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 50 amino acids to 67 amino acids of amino acids 842-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 50 amino acids to 80 amino acids, e.g., from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids.
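By way of illustration only, the identity and length criteria recited for the N-terminal and C-terminal portions of an HNH polypeptide can be checked computationally. The following Python sketch assumes that the candidate and reference segments have already been aligned to equal length without gaps; in practice a sequence alignment tool would normally be used first. The function names `percent_identity` and `meets_criteria` and the short toy sequences are hypothetical and are not taken from SEQ ID NO: 1545.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions carrying identical residues.

    Assumption: both strings are pre-aligned, ungapped, and of equal length.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return 100.0 * matches / len(reference)


def meets_criteria(candidate: str, reference: str,
                   min_identity: float, min_len: int, max_len: int) -> bool:
    """True if the candidate satisfies both the length range and the identity threshold."""
    if not min_len <= len(candidate) <= max_len:
        return False
    return percent_identity(candidate, reference) >= min_identity


# Toy 10-residue sequences differing at one position (hypothetical, for illustration).
toy_reference = "ACDEFGHIKL"
toy_candidate = "ACDEFGHIKV"
print(percent_identity(toy_candidate, toy_reference))
print(meets_criteria(toy_candidate, toy_reference, 75.0, 5, 12))
```

The same check could be applied at any of the recited thresholds (75% through 100%) by changing `min_identity`.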
[0077] For example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 860 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 861 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0078] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 861 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 862 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0079] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 862 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 863 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0080] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 863 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 864 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0081] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 864 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 865 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0082] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 865 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 866 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0083] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 866 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 867 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0084] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 868 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0085] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 868 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 869 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0086] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 869 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 870 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0087] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 870 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 871 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0088] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 871 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 872 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0089] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 872 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 873 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0090] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 873 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 874 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0091] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an N-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 719 to 874 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; iv) a first fusion partner; v) a C-terminal portion of an HNH polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 875 to 909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; vi) a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
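The series of examples above follows a regular pattern: the segment before the first fusion partner ends at amino acid k and the segment after it begins at amino acid k+1 and ends at amino acid 909, for k from 860 to 874. Purely as an illustrative sketch, the coordinate pairs can be enumerated as follows; the function name `hnh_split_ranges` and its parameter names are hypothetical, and the coordinates are the 1-based, inclusive residue numbers used in the text.

```python
def hnh_split_ranges(first=719, last=909, k_min=860, k_max=874):
    """List ((start, k), (k + 1, end)) pairs, one per example in the series.

    Each pair gives the residue range preceding the fusion partner and the
    residue range following it, using 1-based inclusive numbering.
    """
    if not first <= k_min <= k_max < last:
        raise ValueError("split points must fall inside the overall range")
    return [((first, k), (k + 1, last)) for k in range(k_min, k_max + 1)]


for n_range, c_range in hnh_split_ranges():
    print(f"amino acids {n_range[0]} to {n_range[1]} | fusion partner | "
          f"amino acids {c_range[0]} to {c_range[1]}")
```

Every pair covers the overall range exactly once, with no residue duplicated or omitted on either side of the fusion partner.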
Fusion partner inserted within RuvCIII polypeptide
In some cases, the first fusion partner is inserted within the RuvCIII polypeptide. The RuvCIII polypeptide of S. pyogenes Cas9 is amino acids 910-1099 of the amino acid sequence set forth in SEQ ID NO: 1545. For example, in some cases, the first fusion partner is inserted in a site within amino acids 950 to 1060 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. For example, in some cases, the first fusion partner is inserted at or near amino acid 1016 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1016 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1010 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1011 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1012 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1013 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1014 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1015 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1017 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1018 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1019 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1020 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1021 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1022 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1023 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1024 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346. In some cases, the first fusion partner is inserted at amino acid 1025 of amino acids 910-1099 of the amino acid sequence of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
As one non-limiting example, the first fusion polypeptide can comprise, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide; and vii) a PAM-interacting polypeptide.
[0094] An N-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 80 amino acids to 106 amino acids of amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 80 amino acids to 120 amino acids, from 80 amino acids to 90 amino acids, from 90 amino acids to 100 amino acids, from 100 amino acids to 110 amino acids, or from 110 amino acids to 120 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 100 amino acids to 106 amino acids (e.g., 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, or 110 amino acids).
[0095] A C-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 75 amino acids to 84 amino acids of amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 70 amino acids to 100 amino acids, from 70 amino acids to 80 amino acids, from 80 amino acids to 90 amino acids, or from 90 amino acids to 100 amino acids. In some cases, a C-terminal RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 80 amino acids to 90 amino acids (e.g., 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 amino acids).
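As a non-limiting illustration of how the N-terminal and C-terminal portions of a RuvCIII polypeptide relate to an insertion site, the following Python sketch splits a domain spanning amino acids 910-1099 after amino acid 1015 and places a fusion partner between the two portions. The function names `segment` and `insert_within_domain`, the 1200-residue placeholder string, and the six-residue partner string are all hypothetical; the placeholder is of arbitrary length and is not SEQ ID NO: 1545. Coordinates are 1-based and inclusive, as in the text.

```python
def segment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def insert_within_domain(sequence, domain_start, domain_end, last_n_residue, partner):
    """Split a domain after `last_n_residue` and place `partner` between the portions.

    Returns the fused sequence plus the N-terminal and C-terminal domain portions.
    """
    if not domain_start <= last_n_residue < domain_end:
        raise ValueError("split point must lie inside the domain")
    n_portion = segment(sequence, domain_start, last_n_residue)
    c_portion = segment(sequence, last_n_residue + 1, domain_end)
    upstream = sequence[:domain_start - 1]   # everything before the domain
    downstream = sequence[domain_end:]       # everything after the domain
    fused = upstream + n_portion + partner + c_portion + downstream
    return fused, n_portion, c_portion


# Hypothetical placeholder sequence and partner, for illustration only.
placeholder = "A" * 1200
fused, n_portion, c_portion = insert_within_domain(placeholder, 910, 1099, 1015, "GGSGGS")
print(len(n_portion), len(c_portion), len(fused))  # 106 84 1206
```

With this split, the N-terminal portion (amino acids 910 to 1015) is 106 residues long and the C-terminal portion (amino acids 1016 to 1099) is 84 residues long, matching the upper bounds of the contiguous stretches recited above.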
[0096] For example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1010 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1011-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[0097] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1011 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1012-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[0098] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1012 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1013-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[0099] As another example, in some cases, the first fusion polypeptide comprises, in order from N- terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1013 of the 5. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795- 1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1014-1099 of the 5. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00100] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1014 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1015-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00101] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1016-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00102] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1016 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1017-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00103] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1017 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1018-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00104] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1018 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1019-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00105] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1019 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1020-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00106] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1020 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1021-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00107] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1021 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1022-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00108] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1022 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1023-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00109] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1023 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1024-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
[00110] As another example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) an N-terminal portion of a RuvCIII polypeptide, comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1024 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; v) a first fusion partner; vi) a C-terminal portion of a RuvCIII polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1025-1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and vii) a PAM-interacting polypeptide.
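The preceding examples differ only in where the RuvCIII polypeptide is split to receive the first fusion partner: the N-terminal portion runs from residue 910 to a chosen residue, and the C-terminal portion runs from the next residue to 1099. The following Python sketch enumerates those coordinate pairs for the split points 1011 through 1024 recited above. The function and constant names are hypothetical and serve only as illustration; coordinates are 1-based and inclusive, relative to SEQ ID NO: 1545.

```python
from typing import List, Tuple

# Residue boundaries of the RuvCIII segment as recited in the text
# (1-based, inclusive, relative to SEQ ID NO: 1545).
RUVCIII_START = 910
RUVCIII_END = 1099

Range = Tuple[int, int]


def split_ruvciii(last_n_terminal_residue: int) -> Tuple[Range, Range]:
    """Return (N-terminal portion, C-terminal portion) for one split point.

    Hypothetical helper: the fusion partner would be inserted between
    the two returned ranges.
    """
    if not RUVCIII_START <= last_n_terminal_residue < RUVCIII_END:
        raise ValueError("split point must leave both portions non-empty")
    n_portion = (RUVCIII_START, last_n_terminal_residue)
    c_portion = (last_n_terminal_residue + 1, RUVCIII_END)
    return n_portion, c_portion


def enumerate_splits(first: int, last: int) -> List[Tuple[Range, Range]]:
    """Enumerate the split pairs for every split point from first to last."""
    return [split_ruvciii(k) for k in range(first, last + 1)]


# Split points recited in the examples above.
splits = enumerate_splits(1011, 1024)
for n_portion, c_portion in splits:
    print(f"N-terminal {n_portion[0]}-{n_portion[1]}, "
          f"C-terminal {c_portion[0]}-{c_portion[1]}")
```

Each pair covers the full 910-1099 segment with no gap or overlap, which is the property shared by all of the recited examples.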
Second fusion polypeptide
[00111] In some cases, the second polypeptide of a Cas9 heterodimer of the present disclosure comprises an α-helical lobe (also referred to as "an alpha-helical recognition region") of a Cas9 polypeptide. For example, in some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 61 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 400 amino acids to 800 amino acids, e.g., from 400 amino acids to 450 amino acids, from 450 amino acids to 500 amino acids, from 500 amino acids to 550 amino acids, from 550 amino acids to 600 amino acids, from 600 amino acids to 650 amino acids, from 650 amino acids to 700 amino acids, from 700 amino acids to 750 amino acids, or from 750 amino acids to 800 amino acids. In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 61-718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 650 amino acids to 660 amino acids (e.g., 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, or 660 amino acids).
[00112] In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 624 amino acids of amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from about 400 amino acids to 800 amino acids, e.g., from 400 amino acids to 450 amino acids, from 450 amino acids to 500 amino acids, from 500 amino acids to 550 amino acids, from 550 amino acids to 600 amino acids, from 600 amino acids to 650 amino acids, from 650 amino acids to 700 amino acids, from 700 amino acids to 750 amino acids, or from 750 amino acids to 800 amino acids. In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 620 amino acids to 630 amino acids (e.g., 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, or 630 amino acids).
[00113] In some cases, G56 (of the S. pyogenes sequence set forth in SEQ ID NO: 1545) can be selected as the N-terminus for the alpha-helical lobe (e.g., due to its location in a poorly-conserved linker just before the arginine-rich bridge helix, which has been shown to be critical for Cas9 cleavage activity in human cells). In some cases, the second polypeptide of a Cas9 heterodimer of the present disclosure comprises an α-helical lobe (also referred to as "an alpha-helical recognition region") of a Cas9 polypeptide. For example, in some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 56 to 714 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 400 amino acids to 800 amino acids, e.g., from 400 amino acids to 450 amino acids, from 450 amino acids to 500 amino acids, from 500 amino acids to 550 amino acids, from 550 amino acids to 600 amino acids, from 600 amino acids to 650 amino acids, from 650 amino acids to 700 amino acids, from 700 amino acids to 750 amino acids, or from 750 amino acids to 800 amino acids. In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 56-714 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 650 amino acids to 660 amino acids (e.g., 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, or 660 amino acids).
[00114] In some cases, the C-terminus of the alpha-helical lobe can be at the beginning, end, or within the linker between the two lobes of the WT Cas9 protein. For example, the C-terminus of the alpha-helical lobe can be at or near S714 of the WT Cas9 protein set forth in SEQ ID NO: 1545. For example, the C-terminus of the alpha-helical lobe can be S714 of the WT Cas9 protein set forth in SEQ ID NO: 1545.
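The residue ranges recited for the alpha-helical lobe (61-718, 95-718, and 56-714) are inclusive and numbered from 1, so the length of a segment is the last residue minus the first residue plus one. The following Python sketch, using hypothetical helper names and a placeholder sequence rather than SEQ ID NO: 1545, shows the arithmetic and how such a range maps onto zero-based string slicing.

```python
def segment_length(first: int, last: int) -> int:
    """Length of a 1-based, inclusive residue range."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return last - first + 1


def extract_segment(sequence: str, first: int, last: int) -> str:
    """Slice a 1-based, inclusive residue range out of a sequence string."""
    if last > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1.
    return sequence[first - 1:last]


# Ranges recited in the text for the alpha-helical lobe.
for first, last in [(61, 718), (95, 718), (56, 714)]:
    print(f"{first}-{last}: {segment_length(first, last)} amino acids")

# Placeholder sequence for illustration only (not a real Cas9 sequence).
toy_sequence = "MKRNYILGLDIGITSVGYGII"
print(extract_segment(toy_sequence, 3, 7))
```

The computed lengths (658, 624, and 659 amino acids) fall within the 650-660 and 620-630 amino acid length ranges recited for the corresponding embodiments.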
[00115] In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second fusion partner; and b) a second polypeptide that comprises an alpha-helical recognition region. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner.
[00116] In some cases, the second fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the second fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus.
[00117] In some cases, the second fusion polypeptide comprises an NLS. For example, in some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; and c) a second polypeptide that comprises an alpha-helical recognition region. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; c) a second polypeptide that comprises an alpha-helical recognition region; and d) an NLS. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; and c) a second fusion partner. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; c) a second fusion partner; and d) an NLS. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
[00118] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the second fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the second fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
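The "at or near" criterion can be expressed as a distance check. The following Python sketch is one possible reading, under the assumption (not stated in the text) that distance is counted as the number of residues between the terminus and the nearest edge of the NLS. The function name and the choice of 1-based inclusive coordinates are hypothetical.

```python
from typing import Dict


def nls_near_termini(protein_length: int, nls_start: int, nls_end: int,
                     window: int = 100) -> Dict[str, bool]:
    """Report whether an NLS lies within `window` residues of each terminus.

    Coordinates are 1-based and inclusive. Assumption: distance is the
    count of residues between the terminus and the nearest NLS edge.
    """
    if not 1 <= nls_start <= nls_end <= protein_length:
        raise ValueError("NLS coordinates must lie within the protein")
    residues_before = nls_start - 1              # residues N-terminal to the NLS
    residues_after = protein_length - nls_end    # residues C-terminal to the NLS
    return {
        "N": residues_before <= window,
        "C": residues_after <= window,
    }


# A 17-residue NLS at the very N-terminus of a 1000-residue fusion polypeptide.
print(nls_near_termini(1000, 1, 17))
# The same check with the narrowest window recited in the text (30 residues).
print(nls_near_termini(1000, 960, 976, window=30))
```

The default window of 100 and the alternative value of 30 are taken from the ranges recited above; any other recited value (e.g., 75 or 50) can be passed as `window`.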
[00119] In some cases, the second fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an alpha-helical lobe; and c) an alpha-helical lobe and an NLS. Suitable linker polypeptides are described elsewhere herein.
Cas9 heterodimer comprising a circularly permuted polypeptide
[00120] In some embodiments, the Cas9 nuclease lobe of a Cas9 heterodimer of the present disclosure is a circular permutant. As used herein, the term "circular permutant" refers to a variant polypeptide (e.g., of a subject Cas9 heterodimer) in which one section of the primary amino acid sequence has been moved to a different position within the primary amino acid sequence of the polypeptide, but where the local order of amino acids has not been changed, and where the three dimensional architecture of the protein is conserved. For example, a circular permutant of a wild type 500 amino acid polypeptide may have an N-terminal residue of residue number 50 (relative to the wild type protein), where residues 1-49 of the wild type protein are added to the C-terminus. Such a circular permutant, relative to the wild type protein sequence, would have, from N-terminus to C-terminus, amino acid numbers 50-500 followed by 1-49 (amino acid 49 would be the C-terminal residue). Thus, such an example circular permutant would have the same total number of amino acids as the wild type reference protein, and the amino acids would even be in the same order (locally), but the overall primary amino acid sequence is changed.
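The 500 amino acid example above can be reproduced with a simple rotation. The following Python sketch uses a hypothetical function name and operates on residue numbers rather than a real sequence; it illustrates only the reordering of the primary sequence and says nothing about linkers or folding.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def circular_permutant(residues: Sequence[T], new_n_terminal_residue: int) -> List[T]:
    """Rotate a sequence so that the given 1-based residue becomes the N-terminus.

    Residues that preceded the new N-terminus are appended, in their
    original order, to the C-terminus.
    """
    if not 1 <= new_n_terminal_residue <= len(residues):
        raise ValueError("new N-terminal residue must lie within the sequence")
    cut = new_n_terminal_residue - 1  # convert to a 0-based index
    return list(residues[cut:]) + list(residues[:cut])


# Wild type protein represented by its residue numbers 1..500.
wild_type_numbers = list(range(1, 501))
permutant_numbers = circular_permutant(wild_type_numbers, 50)

print(permutant_numbers[0], permutant_numbers[-1], len(permutant_numbers))
```

The permutant begins at residue 50, ends at residue 49, and contains the same 500 residues as the wild type reference, which matches the description in the text.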
[00121] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: a) a first, circularly permuted, polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; where the first polypeptide comprises a first member of a dimerization pair; and b) a second polypeptide comprising an alpha-helical recognition region and a second member of a dimerization pair.
[00122] For example, in some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
First fusion polypeptide
[00123] As described above, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair. In some cases, the first fusion partner (first member of the dimerization pair) is covalently linked, directly or via a linker, at or near (e.g., within 1 to 50 amino acids of) the amino terminus (N-terminus) of the first, circular permuted, polypeptide. In some cases, the first member of the dimerization pair is covalently linked, directly or via a linker, at or near (e.g., within 1 to 50 amino acids of) the carboxyl terminus (C-terminus) of the first, circular permuted, polypeptide. In some cases, the first polypeptide comprises a nuclease lobe of a Cas9 polypeptide.
[00124] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. A linker polypeptide can be interposed between any of the various possible components (polypeptides) of a first fusion polypeptide. Examples of suitable positions for a linker polypeptide include, but are not limited to, interposed between: a) an NLS and a fusion partner; b) a fusion partner and a RuvCII polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; d) a RuvCI polypeptide and an NLS; e) a RuvCI polypeptide and a fusion partner; and f) a RuvCI polypeptide and a RuvCII polypeptide.
[00125] The linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Currently, it is contemplated that the most useful linker sequences will generally be peptides of between about 6 and about 40 amino acids in length, or between about 6 and about 25 amino acids in length. These linkers are generally produced by using synthetic, linker-encoding oligonucleotides to couple the proteins. Peptide linkers with a degree of flexibility will generally be preferred. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide. Small amino acids, such as glycine and alanine, are of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art. A variety of different linkers are commercially available and are considered suitable for use.
[00126] Exemplary polypeptide linkers include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, GSGGSn (SEQ ID NO: 1548) and GGGSn (SEQ ID NO: 1549), where n is an integer of at least one), glycine-alanine polymers, and alanine-serine polymers. Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 1550), GGSGG (SEQ ID NO: 1551), GSGSG (SEQ ID NO: 1552), GSGGG (SEQ ID NO: 1553), GGGSG (SEQ ID NO: 1554), GSSSG (SEQ ID NO: 1555), and the like. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any elements described above can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
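Repeat-based linkers of the kind listed above can be generated by repeating a unit n times. The following Python sketch assumes that the subscript n applies to the whole repeat unit (e.g., GSGGS repeated n times); the helper names are hypothetical, and the length bounds of 6 and 40 amino acids are taken from the "about 6 and about 40" range recited for linker length.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Build a linker by repeating `unit` n times (n is an integer of at least one)."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def within_contemplated_length(linker: str, minimum: int = 6, maximum: int = 40) -> bool:
    """Check a linker against the roughly 6 to 40 amino acid range in the text."""
    return minimum <= len(linker) <= maximum


# Exemplary short linkers recited in the text.
EXEMPLARY_LINKERS = ["GGSG", "GGSGG", "GSGSG", "GSGGG", "GGGSG", "GSSSG"]

for unit in ("G", "GS", "GSGGS", "GGGS"):
    linker = repeat_linker(unit, 3)
    print(unit, linker, within_contemplated_length(linker))
```

Note that the exemplary 4 and 5 residue linkers are shorter than the length range described as most useful, so a length check of this kind is a design aid rather than a requirement.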
Cas9 nuclease lobe circular permutant 1
[00127] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and b) a first fusion partner. In some cases, the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
[00128] In some cases, the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus. In some cases, the first fusion polypeptide comprises a nuclear localization signal (NLS). For example, in some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and d) an NLS. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and c) a first fusion partner. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; b) a first fusion partner; and c) an NLS. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; b) a first polypeptide comprising: i) a RuvCII polypeptide; ii) an HNH polypeptide; iii) a RuvCIII polypeptide; iv) a PAM-interacting polypeptide; and v) a RuvCI polypeptide; and c) an NLS. In some cases, the first fusion partner is a first member of a dimerization pair. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
[00129] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
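The "at or near" criterion can be illustrated as a simple positional check. The following Python sketch is illustrative only: the function name, the use of 1-based inclusive coordinates, the requirement that the entire NLS fall inside the window, and the default window of 100 amino acids are assumptions made for the example and are not part of the disclosure.

```python
def nls_at_or_near_terminus(nls_start, nls_end, protein_length, window=100):
    """Report whether an NLS lies within `window` residues of either terminus.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    Returns a tuple (near_n_terminus, near_c_terminus).
    """
    if not (1 <= nls_start <= nls_end <= protein_length):
        raise ValueError("NLS coordinates must lie within the protein")
    # Assumption: the whole NLS must fall inside the first `window` residues.
    near_n = nls_end <= window
    # Assumption: the whole NLS must fall inside the last `window` residues.
    near_c = nls_start > protein_length - window
    return near_n, near_c


# Hypothetical 700-residue fusion polypeptide with a 17-residue NLS at each end.
print(nls_at_or_near_terminus(1, 17, 700, window=30))     # N-terminal NLS
print(nls_at_or_near_terminus(684, 700, 700, window=30))  # C-terminal NLS
```

A narrower window (e.g., 30, 50, or 75 amino acids) can be passed to reflect the tighter ranges recited above.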
[00130] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 57 amino acids of amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 70 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, from 55 amino acids to 60 amino acids, from 60 amino acids to 65 amino acids, or from 65 amino acids to 70 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 718-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 55-60 (e.g., 55, 56, 57, 58, 59, or 60) amino acids.
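As an illustration of comparing a candidate sequence against a contiguous stretch of a reference segment, the following Python sketch slides the candidate along the reference without gaps and reports the highest position-by-position identity. This is a simplified stand-in: sequence identity is ordinarily determined with an alignment program that permits gaps, and the function names and toy sequences below are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_identity_to_contiguous_stretch(candidate, reference):
    """Highest ungapped identity of `candidate` to any contiguous stretch
    of `reference` having the same length as `candidate`."""
    n = len(candidate)
    if n == 0 or n > len(reference):
        raise ValueError("candidate must be non-empty and no longer than reference")
    return max(
        percent_identity(candidate, reference[i:i + n])
        for i in range(len(reference) - n + 1)
    )


# Toy sequences for illustration only (not from SEQ ID NO: 1545).
reference = "MKTAYIAKQRQISFVK"
print(best_identity_to_contiguous_stretch("AYIAK", reference))  # exact stretch
print(best_identity_to_contiguous_stretch("AYLAK", reference))  # one mismatch
```

The returned value can then be compared against a chosen threshold, such as 75% or 90%.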
[00131] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
[00132] An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 90 amino acids to 150 amino acids, e.g., from 90 amino acids to 95 amino acids, from 95 amino acids to 100 amino acids, from 100 amino acids to 125 amino acids, from 125 amino acids to 130 amino acids, from 130 amino acids to 135 amino acids, from 135 amino acids to 140 amino acids, from 140 amino acids to 145 amino acids, or from 145 amino acids to 150 amino acids. In some cases, an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 130 amino acids to 140 amino acids (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 amino acids).
[00133] A RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 150 amino acids to 160 amino acids, from 160 amino acids to 170 amino acids, from 170 amino acids to 180 amino acids, from 180 amino acids to 190 amino acids, from 190 amino acids to 200 amino acids, from 200 amino acids to 210 amino acids, or from 210 amino acids to 220 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
[00134] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
[00135] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids).
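The segment order recited in paragraph [00128] (RuvCII, HNH, RuvCIII, PAM-interacting, RuvCI) can be viewed as a rotation of the order in which these segments occur in the S. pyogenes Cas9 sequence. The following Python sketch is illustrative only: it uses the residue ranges recited in paragraphs [00130] and [00132]-[00135] (taking 718-774 for RuvCII), and the names `NUCLEASE_LOBE_SEGMENTS`, `segment_length`, and `circular_permutation` are hypothetical helpers introduced for the example.

```python
# Residue ranges (1-based, inclusive) as recited above for S. pyogenes Cas9,
# listed in order of occurrence in the unpermuted sequence.
NUCLEASE_LOBE_SEGMENTS = [
    ("RuvCI", 1, 60),
    ("RuvCII", 718, 774),
    ("HNH", 776, 909),
    ("RuvCIII", 910, 1099),
    ("PAM-interacting", 1100, 1367),
]


def segment_length(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


def circular_permutation(segments, first_name):
    """Rotate the segment list so that `first_name` comes first."""
    names = [name for name, _, _ in segments]
    i = names.index(first_name)  # raises ValueError for an unknown name
    return segments[i:] + segments[:i]


for name, start, end in circular_permutation(NUCLEASE_LOBE_SEGMENTS, "RuvCII"):
    print(name, start, end, segment_length(start, end))
```

Rotating to begin at "HNH" or at "RuvCIII" instead yields the segment orders recited in paragraphs [00147] and [00157], respectively.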
Cas9 nuclease lobe circular permutant 2
[00136] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; and b) a first fusion partner. In some cases, the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
[00137] In some cases, the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus. In some cases, the first fusion polypeptide comprises a nuclear localization signal (NLS).
[00138] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; b) a first fusion partner; and c) an NLS. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a C-terminal portion of an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; v) a RuvCII polypeptide; and vi) an N-terminal portion of an HNH polypeptide; and d) an NLS. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLSs are described elsewhere herein. In some cases, the first fusion partner is a first member of a dimerization pair.
[00139] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
[00140] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a C-terminal portion of an HNH polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) an N-terminal portion of an HNH polypeptide and a fusion partner. Suitable linker polypeptides are as described above.
[00141] A C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 35 to 42 amino acids of amino acids 868-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 35 to 42 amino acids (e.g., 35, 36, 37, 38, 39, 40, 41, or 42 amino acids). A C-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 50 amino acids to 67 amino acids of amino acids 842-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 50 amino acids to 80 amino acids, e.g., from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids.
[00142] An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 80 amino acids to 92 amino acids of amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 80 amino acids to 110 amino acids, e.g., from 80 amino acids to 90 amino acids, from 90 amino acids to 100 amino acids, or from 100 amino acids to 110 amino acids. In some cases, an N-terminal portion of an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776 to 867 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 85 amino acids to 95 amino acids (e.g., 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, or 95 amino acids). An N-terminal portion of an HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 50 amino acids to 66 amino acids of amino acids 776-841 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 50 amino acids to 80 amino acids, e.g., from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids.
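The two HNH split points recited in paragraphs [00141] and [00142] divide residues 776-909 into complementary N-terminal and C-terminal portions. The following Python sketch is illustrative only: it checks the residue arithmetic for both split points and lists the segment order of paragraph [00136]; the helper names `split_segment`, `span_length`, and `permutant_2_order` are hypothetical and are introduced for the example.

```python
HNH_START, HNH_END = 776, 909  # HNH residue range recited in the description


def split_segment(start, end, last_n_terminal_residue):
    """Split an inclusive range into N-terminal and C-terminal portions."""
    if not (start <= last_n_terminal_residue < end):
        raise ValueError("split point must leave two non-empty portions")
    n_portion = (start, last_n_terminal_residue)
    c_portion = (last_n_terminal_residue + 1, end)
    return n_portion, c_portion


def span_length(portion):
    """Number of residues in an inclusive (start, end) portion."""
    start, end = portion
    return end - start + 1


def permutant_2_order(n_portion, c_portion):
    """Segment order of paragraph [00136], N-terminus to C-terminus."""
    return [
        ("HNH C-terminal portion", c_portion),
        ("RuvCIII", None),
        ("PAM-interacting", None),
        ("RuvCI", None),
        ("RuvCII", None),
        ("HNH N-terminal portion", n_portion),
    ]


for split_after in (867, 841):  # the two split points recited above
    n_part, c_part = split_segment(HNH_START, HNH_END, split_after)
    print(n_part, span_length(n_part), c_part, span_length(c_part))
```

In both cases the two portions together account for all 134 residues of the 776-909 range.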
[00143] A RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 150 amino acids to 160 amino acids, from 160 amino acids to 170 amino acids, from 170 amino acids to 180 amino acids, from 180 amino acids to 190 amino acids, from 190 amino acids to 200 amino acids, from 200 amino acids to 210 amino acids, or from 210 amino acids to 220 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
[00144] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
[00145] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids).
[00146] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
Cas9 nuclease lobe circular permutant 3
[00147] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; and b) a first fusion partner. In some cases, the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
[00148] In some cases, the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus. In some cases, the first fusion polypeptide comprises a nuclear localization signal (NLS).
[00149] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; b) a first fusion partner; and c) an NLS. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) an HNH polypeptide; ii) a RuvCIII polypeptide; iii) a PAM-interacting polypeptide; iv) a RuvCI polypeptide; and v) a RuvCII polypeptide; and d) an NLS. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLSs are described elsewhere herein. In some cases, the first fusion partner is a first member of a dimerization pair.
[00150] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
[00151] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an HNH polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) a RuvCII polypeptide and a fusion partner. Suitable linker polypeptides are as described above.
[00152] A RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 150 amino acids to 160 amino acids, from 160 amino acids to 170 amino acids, from 170 amino acids to 180 amino acids, from 180 amino acids to 190 amino acids, from 190 amino acids to 200 amino acids, from 200 amino acids to 210 amino acids, or from 210 amino acids to 220 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
[00153] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
[00154] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids).
[00155] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
[00156] An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 90 amino acids to 150 amino acids, e.g., from 90 amino acids to 95 amino acids, from 95 amino acids to 100 amino acids, from 100 amino acids to 125 amino acids, from 125 amino acids to 130 amino acids, from 130 amino acids to 135 amino acids, from 135 amino acids to 140 amino acids, from 140 amino acids to 145 amino acids, or from 145 amino acids to 150 amino acids. In some cases, an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 130 amino acids to 140 amino acids (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 amino acids).
Cas9 nuclease lobe circular permutant 4
[00157] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; and b) a first fusion partner. In some cases, the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described herein.
[00158] In some cases, the first fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the first fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus. In some cases, the first fusion polypeptide comprises a nuclear localization signal (NLS).
[00159] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; and c) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; b) a first fusion partner; and c) a fusion partner. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a first fusion partner; c) a first polypeptide comprising: i) a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; and v) an HNH polypeptide; d) an NLS. In some cases, the first fusion partner is a first member of a dimerization pair. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein. In some cases, the first fusion partner is a first member of a dimerization pair.
[00160] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the first fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the first fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
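The "at or near" criterion can be illustrated with a minimal Python sketch. The function name `nls_near_terminus`, the default 50-residue window, and the toy construct are assumptions made for illustration only; the two NLS sequences are those quoted above (SEQ ID NOs: 1546 and 1547). The sketch treats an NLS as "near" a terminus when the whole motif lies within the chosen window, which is one possible reading of the term.

```python
# NLS sequences quoted in the text (SEQ ID NOs: 1546 and 1547).
NLS_1546 = "MAPKKKRKVGIHGVPAA"
NLS_1547 = "KRPAATKKAGQAKKKK"


def nls_near_terminus(protein: str, nls: str, window: int = 50) -> dict:
    """Report whether `nls` lies entirely within `window` residues of a terminus.

    Hypothetical helper: `window` mirrors the "within 30/50/75 amino acids"
    language of the text, but the exact rule is an assumption.
    """
    hits = {"N": False, "C": False}
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)  # exclusive end index of this occurrence
        if end <= window:
            hits["N"] = True  # motif ends within the first `window` residues
        if len(protein) - start <= window:
            hits["C"] = True  # motif starts within the last `window` residues
        start = protein.find(nls, start + 1)
    return hits


# Toy construct (not a real protein): N-terminal NLS, 300 alanines, C-terminal NLS.
toy = NLS_1546 + "A" * 300 + NLS_1547
print(nls_near_terminus(toy, NLS_1546))
print(nls_near_terminus(toy, NLS_1547))
```

Tightening or loosening `window` corresponds to the alternative distances listed in the paragraph above.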
[00161] In some cases, a first fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and a RuvCIII polypeptide; c) a PAM-interacting polypeptide and a RuvCI polypeptide; and d) an HNH polypeptide and a fusion partner. Suitable linker polypeptides are as described above.
[00162] A RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 150 amino acids to 190 amino acids of amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 150 amino acids to 160 amino acids, from 160 amino acids to 170 amino acids, from 170 amino acids to 180 amino acids, from 180 amino acids to 190 amino acids, from 190 amino acids to 200 amino acids, from 200 amino acids to 210 amino acids, or from 210 amino acids to 220 amino acids. In some cases, a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 180 amino acids to 190 amino acids (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 amino acids).
[00163] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
[00164] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids).
[00165] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
[00166] An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 90 amino acids to 150 amino acids, e.g., from 90 amino acids to 95 amino acids, from 95 amino acids to 100 amino acids, from 100 amino acids to 125 amino acids, from 125 amino acids to 130 amino acids, from 130 amino acids to 135 amino acids, from 135 amino acids to 140 amino acids, from 140 amino acids to 145 amino acids, or from 145 amino acids to 150 amino acids. In some cases, an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 130 amino acids to 140 amino acids (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 amino acids).
Cas9 nuclease lobe circular permutant 5
[00167] In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first fusion partner; and b) a first polypeptide comprising: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide. In some cases, the first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a first polypeptide comprising: i) a C-terminal portion of a RuvCIII polypeptide; ii) a PAM-interacting polypeptide; iii) a RuvCI polypeptide; iv) a RuvCII polypeptide; v) an HNH polypeptide; and vi) an N-terminal portion of a RuvCIII polypeptide; and b) a first fusion partner. In some cases, the first fusion partner is a first member of a dimerization pair. Suitable first members of a dimerization pair are described elsewhere herein.
[00168] A C-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 75 amino acids to 84 amino acids of amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 70 amino acids to 100 amino acids, from 70 amino acids to 80 amino acids, from 80 amino acids to 90 amino acids, or from 90 amino acids to 100 amino acids. In some cases, a C-terminal RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1016 to 1099 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 80 amino acids to 90 amino acids (e.g., 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 amino acids).
[00169] An N-terminal portion of a RuvCIII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 80 amino acids to 106 amino acids of amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 80 amino acids to 120 amino acids, from 80 amino acids to 90 amino acids, from 90 amino acids to 100 amino acids, from 100 amino acids to 110 amino acids, or from 110 amino acids to 120 amino acids. In some cases, an N-terminal portion of a RuvCIII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 910 to 1015 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 100 amino acids to 106 amino acids (e.g., 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, or 110 amino acids).
[00170] A PAM-interacting polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 200 amino acids to 268 amino acids of amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 240 amino acids to 280 amino acids, e.g., from 240 amino acids to 250 amino acids, from 250 amino acids to 260 amino acids, from 260 amino acids to 270 amino acids, or from 270 amino acids to 280 amino acids. In some cases, a PAM-interacting polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1100 to 1367 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 260 amino acids to 270 amino acids (e.g., 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, or 270 amino acids).
[00171] A RuvCI polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 amino acids to 60 amino acids of amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to 80 amino acids, e.g., from 40 amino acids to 50 amino acids, from 50 amino acids to 60 amino acids, from 60 amino acids to 70 amino acids, or from 70 amino acids to 80 amino acids. In some cases, a RuvCI polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 1-60 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 50 amino acids to 60 amino acids (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 amino acids).
[00172] A RuvCII polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 40 to 46 amino acids of amino acids 729-775 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 40 amino acids to about 60 amino acids, e.g., from 40 amino acids to 45 amino acids, from 45 amino acids to 50 amino acids, from 50 amino acids to 55 amino acids, or from 55 amino acids to 60 amino acids. In some cases, a RuvCII polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 728-774 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of 45-50 (e.g., 45, 46, 47, 48, 49, or 50) amino acids.
[00173] An HNH polypeptide can comprise an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 100 to 134 amino acids of amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 90 amino acids to 150 amino acids, e.g., from 90 amino acids to 95 amino acids, from 95 amino acids to 100 amino acids, from 100 amino acids to 125 amino acids, from 125 amino acids to 130 amino acids, from 130 amino acids to 135 amino acids, from 135 amino acids to 140 amino acids, from 140 amino acids to 145 amino acids, or from 145 amino acids to 150 amino acids. In some cases, an HNH polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 776-909 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 130 amino acids to 140 amino acids (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 amino acids).
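The domain orders of circular permutants 4 and 5 can be summarized in a short Python sketch. The residue ranges are the S. pyogenes Cas9 (SEQ ID NO: 1545) positions recited above; the dictionary keys, the helper names `segment_length` and `total_length`, and the omission of linkers, fusion partners, and NLSs are illustrative assumptions. For RuvCII the sketch uses the 729-775 range; the text also recites 728-774, which has the same length.

```python
# Residue ranges (1-based, inclusive) recited in the text for S. pyogenes Cas9.
SEGMENTS = {
    "RuvCI": (1, 60),
    "RuvCII": (729, 775),
    "HNH": (776, 909),
    "RuvCIII": (910, 1099),
    "RuvCIII_N": (910, 1015),   # N-terminal portion of RuvCIII
    "RuvCIII_C": (1016, 1099),  # C-terminal portion of RuvCIII
    "PAM-interacting": (1100, 1367),
}

# Domain order, N-terminus to C-terminus, of the circularly permuted polypeptide.
PERMUTANT_4 = ["RuvCIII", "PAM-interacting", "RuvCI", "RuvCII", "HNH"]
PERMUTANT_5 = ["RuvCIII_C", "PAM-interacting", "RuvCI", "RuvCII", "HNH", "RuvCIII_N"]


def segment_length(name: str) -> int:
    """Length of an inclusive residue range."""
    start, end = SEGMENTS[name]
    return end - start + 1


def total_length(order: list) -> int:
    """Summed length of the listed segments (linkers and partners ignored)."""
    return sum(segment_length(name) for name in order)


for label, order in (("permutant 4", PERMUTANT_4), ("permutant 5", PERMUTANT_5)):
    print(label, [(name, segment_length(name)) for name in order], total_length(order))
```

Both orders cover the same residues, so the summed segment lengths agree; the permutants differ only in where the nuclease lobe sequence is opened.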
Examples of first fusion polypeptides
[00174] Non-limiting examples of suitable first fusion polypeptides are depicted in Figure 1. In some embodiments, a first fusion polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph. In some cases, the fusion partner is linked, directly or via a linker, to the N-terminus of the polypeptide. For example, in some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a fusion partner; and b) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph. Suitable fusion partners include a first member of a dimerization pair, where suitable first members of a dimerization pair are described elsewhere herein. In some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; and c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph. In some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-829 of the amino acid sequence depicted in the following paragraph; and d) a fusion partner.
[00175] MDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGIHGVPAASIAATLENDLAR LENENARLEKDIANLERDLAKLEREEAYFGGSGGSGGSASGQGDSLHEHIANLAGSPAI KKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKEL GSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKD DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERG GLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIA KSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVR KVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSV LVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLF
ELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQH KHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAA FKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGSGGSGGSGGSGGSG GSGGSGGSGGVDDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLI GALLFDSGEKRPAATKKAGQAKKKK (SEQ ID NO://)
[00176] In some embodiments, a first fusion polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph. In some cases, the fusion partner is linked, directly or via a linker, to the N-terminus of the polypeptide. For example, in some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a fusion partner; and b) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph. Suitable fusion partners include a first member of a dimerization pair, where suitable first members of a dimerization pair are described elsewhere herein. In some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; and c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph. In some cases, a first fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a fusion partner; c) a polypeptide comprising an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 82-820 of the amino acid sequence depicted in the following paragraph; and d) a fusion partner.
[00177] MDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGIHGVPAASIAATLENDLARLENENARLEKDI ANLERDLAKLEREEAYFGGSGGSGGSASGQGDNVPSEEWKKMKNYWRQLLNAKLITQRKFDNL TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVS DFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQE IGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLWAKVEKGKSKK LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLI IKLPKYSLFELENGRKRMLASAGEL QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEI IEQISEFSKRVILAD ANLDKVLSAYNKHRDKPIREQAE I IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLI HQSITGLYETRIDLSQLGGDGSGGSGGSGGSGGSGGSGGSGGSGGVDDKKYS IGLDIGTNSVGW AVITDEYKVPSKKFKVLGNTDRHS IKKNLIGALLFDSGGSSGSPAIKKGILQTVKVVDELVKVM
GRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYY LQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDS IDNKVLTRSDKNRGKSEKRPAATKKAGQ AKKKK (SEQ ID NO : / / ) .
Second fusion polypeptide
[00178] As described above, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region (e.g., an alpha helical lobe); and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair. In some cases, the fusion partner is at or near (e.g., within the first 50 amino acids of the N-terminus) the N-terminus of the second polypeptide. In some cases, the fusion partner is at or near (e.g., within the first 50 amino acids of the C-terminus) the C-terminus of the second polypeptide. In some cases, the fusion partner is located internally within the second fusion polypeptide.
[00179] In some cases, the second polypeptide comprises an alpha-helical lobe (also referred to as "an alpha-helical recognition region") of a Cas9 polypeptide. For example, in some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 658 amino acids of amino acids 61 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and can have a length of from 400 amino acids to 800 amino acids, e.g., from 400 amino acids to 450 amino acids, from 450 amino acids to 500 amino acids, from 500 amino acids to 550 amino acids, from 550 amino acids to 600 amino acids, from 600 amino acids to 650 amino acids, from 650 amino acids to 700 amino acids, from 700 amino acids to 750 amino acids, or from 750 amino acids to 800 amino acids. In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 61-718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 650 amino acids to 660 amino acids (e.g., 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, or 660 amino acids).
[00180] In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to a contiguous stretch of from 400 amino acids to 624 amino acids of amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from about 400 amino acids to 800 amino acids, e.g., from 400 amino acids to 450 amino acids, from 450 amino acids to 500 amino acids, from 500 amino acids to 550 amino acids, from 550 amino acids to 600 amino acids, from 600 amino acids to 650 amino acids, from 650 amino acids to 700 amino acids, from 700 amino acids to 750 amino acids, or from 750 amino acids to 800 amino acids. In some cases, the second polypeptide comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to amino acids 95 to 718 of the S. pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545, or a corresponding segment of a Cas9 polypeptide amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346; and has a length of from 620 amino acids to 630 amino acids (e.g., 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, or 630 amino acids).
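The identity thresholds recited throughout this section can be illustrated with a minimal Python sketch that scores two sequences that have already been aligned to equal length (gaps written as "-"). This is not the alignment method of the present disclosure: the function names, the choice of denominator (alignment columns that are not gap-only), and the one-residue variant used in the example are assumptions for illustration.

```python
# Identity thresholds recited in the text, highest first.
THRESHOLDS = (100, 99, 98, 95, 90, 85, 80, 75)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: identical, non-gap columns are divided by the number of
    columns that are not gap-in-both. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not pairs:
        raise ValueError("alignment has no scorable columns")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold satisfied, or None if below 75%."""
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


reference = "MAPKKKRKVGIHGVPAA"  # NLS of SEQ ID NO: 1546, used only as a short test string
variant = "MAPKKKRKVGIHGVPAG"    # hypothetical single-substitution variant
identity = percent_identity(reference, variant)
print(round(identity, 1), highest_threshold_met(identity))
```

In practice the alignment itself would come from an established tool; the sketch covers only the final counting step.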
[00181] In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second fusion partner; and b) a second polypeptide that comprises an alpha-helical recognition region. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner.
[00182] In some cases, the second fusion polypeptide comprises a heterologous sequence that provides for subcellular localization (e.g., an NLS for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some cases, the second fusion polypeptide includes 2 or more, 3 or more, 4 or more, or 5 or more NLSs. In some cases, an NLS is located at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the N-terminus and/or at or near (e.g., within 75 amino acids, 50 amino acids, or 30 amino acids) the C-terminus. In some cases, the second fusion polypeptide comprises an NLS.
[00183] For example, in some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; and c) a second polypeptide that comprises an alpha-helical recognition region. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second fusion partner; c) a second polypeptide that comprises an alpha-helical recognition region; and d) an NLS. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; and c) a second fusion partner. In some cases, the second fusion polypeptide comprises, in order from N-terminus to C-terminus: a) an NLS; b) a second polypeptide that comprises an alpha-helical recognition region; c) a second fusion partner; and d) an NLS. In some cases, the NLS comprises the amino acid sequence MAPKKKRKVGIHGVPAA (SEQ ID NO: 1546). In some cases, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID NO: 1547). Other suitable NLS are described elsewhere herein.
[00184] An NLS can be at or near the N-terminus and/or the C-terminus. In some cases, the second fusion polypeptide comprises two or more NLSs (e.g., 3 or more, 4 or more, or 5 or more NLSs). In some cases, the second fusion polypeptide comprises one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the N-terminus and/or one or more NLSs (e.g., 2 or more, 3 or more, or 4 or more NLSs) at or near the C-terminus. The term "at or near" is used here because, as is known in the art, the NLS need not be at the actual terminus of a protein, but can be positioned near (e.g., within 100 amino acids of) an N- and/or C-terminus (e.g., within 80, within 75, within 60, within 55, within 50, within 45, within 40, within 35, or within 30 amino acids of an N- and/or C-terminus).
[00185] In some cases, the second fusion polypeptide comprises one or more linker polypeptides. For example, a linker polypeptide can be interposed between any of: a) an NLS and a fusion partner; b) a fusion partner and an alpha-helical lobe; and c) an alpha-helical lobe and an NLS.
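The N-to-C-terminal arrangements and optional linkers described above can be pictured with a short Python sketch. Everything other than the NLS sequence (SEQ ID NO: 1546) is hypothetical: the `Segment` class, the function names, the "GGS" linker, and the placeholder strings standing in for the fusion partner and the alpha-helical recognition region are illustrative assumptions, not sequences from the disclosure.

```python
from dataclasses import dataclass

# Illustrative sketch only: all names and placeholder sequences are assumptions.
NLS_SEQ_ID_1546 = "MAPKKKRKVGIHGVPAA"  # NLS sequence quoted in the text


@dataclass(frozen=True)
class Segment:
    label: str      # human-readable name of the segment
    sequence: str   # amino acid sequence (placeholder where noted)


def assemble_fusion(segments: list[Segment], linker: str = "") -> str:
    """Join segments in N- to C-terminal order, with an optional linker
    interposed between every pair of adjacent segments."""
    return linker.join(s.sequence for s in segments)


def layout(segments: list[Segment]) -> str:
    """Return the segment labels in N- to C-terminal order."""
    return " - ".join(s.label for s in segments)


# One arrangement from the text: NLS, fusion partner, alpha-helical region, NLS.
second_fusion = [
    Segment("NLS", NLS_SEQ_ID_1546),
    Segment("second fusion partner", "P" * 10),               # placeholder
    Segment("alpha-helical recognition region", "H" * 10),    # placeholder
    Segment("NLS", NLS_SEQ_ID_1546),
]
print(layout(second_fusion))
print(assemble_fusion(second_fusion, linker="GGS"))           # hypothetical linker
```

Reordering the list entries gives the alternative arrangements recited above without changing the assembly function.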
First and second fusion partners
[00186] The first fusion partner of the first fusion polypeptide, and the second fusion partner of the second fusion polypeptide, of a Cas9 heterodimer of the present disclosure constitute a "dimer pair." A dimer pair is a pair of polypeptides that can dimerize with one another. Each member (each polypeptide) of the dimer pair can be part of a different polypeptide, and when the members of the binding pair (the dimer pair) are brought into close proximity with one another (e.g., bind to one another), the two different polypeptides (heterologous polypeptides) to which the dimer pair members are fused are brought into proximity with one another and can be said to dimerize (i.e., as a consequence of the members of the dimer pair dimerizing).
[00187] A Cas9 heterodimer of the present disclosure comprises two polypeptides that can interact to form a complex (i.e., to form the heterodimeric Cas9 protein). A Cas9 heterodimer of the present disclosure is also referred to herein as a "split Cas9" or a "split Cas9 protein." The fusion partners present in the first fusion polypeptide and the second fusion polypeptide can be induced to dimerize by a dimerizing agent. When the fusion partners present in the first fusion polypeptide and the second fusion polypeptide dimerize, the first fusion polypeptide and the second fusion polypeptide dimerize. In the absence of the dimerizing agent, and in the absence of a guide RNA that includes a stem loop 2 and/or a stem loop 3, the first fusion polypeptide and the second fusion polypeptide do not dimerize. When the first fusion polypeptide and the second fusion polypeptide dimerize, the Cas9 heterodimer, together with a truncated guide RNA (e.g., a guide RNA that does not include stem loop 2 and/or stem loop 3), can bind a target nucleic acid. A Cas9 heterodimer of the present disclosure and a truncated guide RNA form a "Cas9 heterodimer system," described hereinbelow.
[00188] As an illustrative example, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide (comprising a Cas9 nuclease lobe) and a first fusion partner ("a first member of a dimer pair"); and B) a second fusion polypeptide (comprising a Cas9 alpha-helical lobe) and a second fusion partner ("a second member of the dimer pair"). The first and second fusion polypeptides dimerize when the first and second binding members dimerize (when the first and second binding members are brought into close proximity with one another, e.g., via a dimerizer, via binding to one another, etc.). In some cases, the dimer pair is inducible such that the members of the dimer pair do not associate (e.g., come into proximity with one another, bind to one another, etc.) in the absence of induction (e.g., chemical induction, light induction, etc.). In some cases, the dimer pair is not inducible such that the members of the dimer pair bind to one another when both members are present (e.g., synzip polypeptides).
[00189] Any convenient dimer pair can be used. Example dimer pairs suitable for use in a subject heterodimeric Cas9 protein include non-inducible binding pairs. For example, in some cases, each member of the binding pair is a protein domain that binds to the other member. As an illustrative example, in some cases, each member of the binding pair is a coiled-coil domain. Examples of suitable coiled-coil domains include, but are not limited to:
SYNZIP14:
NDLDAYEREAEKLEKKNEVLRNRLAALENELATLRQEVASMKQELQS
(SEQ ID NO: 1556)
SYNZIP17:
NEKEELKSKKAELRNRIEQLKQKREQLKQKIANLRKEIEAYK
(SEQ ID NO: 1557)
SYNZIP18:
SIAATLENDLARLENENARLEKDIANLERDLAKLEREEAYF
(SEQ ID NO: 1558)
[00190] In some cases, each of the two members of a non-inducible binding pair comprises an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to a coiled-coil domain. In some cases, a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP14 (the amino acid sequence set forth in SEQ ID NO: 1556). In some cases, a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557). In some cases, a member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP18 (the amino acid sequence set forth in SEQ ID NO: 1558).
[00191] In some cases, one member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557); and the other member of the non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP18 (the amino acid sequence set forth in SEQ ID NO: 1558). For example, in some cases, the two members of a non-inducible binding pair are SYNZIP17 and SYNZIP18.
[00192] In some cases, one member of a non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP14 (the amino acid sequence set forth in SEQ ID NO: 1556); and the other member of the non-inducible binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100%, amino acid sequence identity) to SYNZIP17 (the amino acid sequence set forth in SEQ ID NO: 1557). For example, in some cases, the two members of a non-inducible binding pair are SYNZIP14 and SYNZIP17.
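The percent-identity thresholds recited above can be illustrated with a deliberately simplified Python sketch. It compares two equal-length sequences position by position, without gaps; real identity calculations normally start from a pairwise alignment, and this document does not specify the method here, so the ungapped comparison, the function names, and the single-substitution variant are assumptions for illustration. Only the SYNZIP17 sequence (SEQ ID NO: 1557) is taken from the text.

```python
# Illustrative sketch only: ungapped, equal-length comparison is an assumption.
SYNZIP17 = "NEKEELKSKKAELRNRIEQLKQKREQLKQKIANLRKEIEAYK"  # SEQ ID NO: 1557


def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch compares equal-length, ungapped sequences only")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, reference: str, threshold: float = 75.0) -> bool:
    """True when the candidate has `threshold` percent or more identity."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical variant: first residue of SYNZIP17 replaced with alanine.
variant = "A" + SYNZIP17[1:]
print(round(percent_identity(variant, SYNZIP17), 1))
print(meets_threshold(variant, SYNZIP17, threshold=95.0))
```

Raising `threshold` from 75.0 through 80, 85, 90, 95, and 98 to 100 walks through the same tiers the paragraphs list.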
[00193] Example dimer pairs suitable for use in a subject Cas9 heterodimer also include inducible binding pairs (binding pairs that can be induced to dimerize, e.g., with a dimerizer, as discussed in more detail below). Dimerizer-binding pairs suitable for use in a Cas9 heterodimer of the present disclosure are in some embodiments polypeptides (e.g., protein domains) that bind to a different site of the same molecule (referred to herein as a "dimerizer"). In the presence of a dimerizer, both members of a dimerizer-binding pair bind to the dimerizer (e.g., in some cases each binding to a different site of the dimerizer) and are thus brought into proximity with one another. This can also be referred to as chemically-inducible dimerization (CID) (e.g., see DeRose et al., Pflugers Arch. 2013 Mar;465(3):409-17, which is hereby incorporated by reference in its entirety). In some embodiments, binding to the dimerizer is reversible. In some embodiments, binding to the dimerizer is irreversible. In some embodiments, binding to the dimerizer is non-covalent. In some embodiments, binding to the dimerizer is covalent.
[00194] Dimer pairs suitable for use include dimerizer-binding pairs that dimerize upon binding of a first member of a dimer pair to a dimerizing agent and of a second member of the dimer pair to the same dimerizing agent. Dimer pairs suitable for use also include dimerizer-binding pairs that dimerize upon binding of a first member of a dimer pair to a dimerizing agent, where the dimerizing agent induces a conformational change in the first member of the dimer pair, and where the conformational change allows the first member of the dimer pair to bind (covalently or non-covalently) to a second member of the dimer pair. Other dimer pairs suitable for use include dimer pairs in which exposure to light (e.g., blue light) induces dimerization of the dimer pair.
[00195] Regardless of the mechanism, an inducible dimer pair will dimerize upon exposure to an agent that induces dimerization, where the agent is in some cases a small molecule, or, for example, in other cases, light. Thus, for simplicity, the discussion below referring to "dimerizer-binding pairs" includes dimer pairs that dimerize regardless of the mechanism.
[00196] Non-limiting examples of suitable dimers (e.g., dimerizer-binding pairs) include, but are not limited to:
[00197] (a) FKBP1A (FK506 binding protein) (e.g., a rapamycin binding portion) paired with FKBP1A (e.g., a rapamycin binding portion): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
[00198] (b) FKBP1A (e.g., a rapamycin binding portion) and FRB (Fkbp-Rapamycin Binding Domain): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
[00199] (c) FKBP1A (e.g., a rapamycin binding portion) and CnA (calcineurin catalytic subunit A): dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
[00200] (d) FKBP1A (e.g., a rapamycin binding portion) and cyclophilin: dimerization induced by rapamycin and/or rapamycin analogs known as rapalogs;
[00201] (e) GyrB (Gyrase B) and GyrB: dimerization induced by coumermycin;
[00202] (f) DHFR (dihydrofolate reductase) and DHFR: dimerization induced by methotrexate;
[00203] (g) DmrB and DmrB: dimerization induced by AP20187;
[00204] (h) PYL and ABI: dimerization induced by abscisic acid;
[00205] (i) Cry2 and CIB1: dimerization induced by blue light; and
[00206] (j) GAI and GID1: dimerization induced by gibberellin.
[00207] A member (a first and/or a second member) of a binding pair (e.g., a dimerizer-binding pair) of a subject Cas9 heterodimer can have a length in a range of from 35 to 300 amino acids (e.g., from 35 to 250, from 35 to 200, from 35 to 150, from 35 to 100, from 35 to 50, from 50 to 300, from 50 to 250, from 50 to 200, from 50 to 150, from 50 to 100, from 100 to 300, from 100 to 250, from 100 to 200, from 100 to 150, from 150 to 300, from 150 to 250, from 150 to 200, from 200 to 300, from 200 to 250, or from 250 to 300 amino acids).
[00208] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) of a subject Cas9 heterodimer is derived from FKBP1A (also known as FKBP12; FKBP1; PKC12; PKCI2; PPIASE; FKBP-12; FKBP-1A). For example, a suitable dimerizer-binding pair member can include a rapamycin binding portion of FKBP1A. For example, a suitable dimerizer-binding pair member can comprise an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (a rapamycin binding portion of FKBP1A):
GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRG WEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE (SEQ ID NO: 1559).
[00209] In some cases, a member of a dimerizer-binding pair of a Cas9 heterodimer is derived from protein phosphatase 3, catalytic subunit, alpha isozyme (PPP3CA) (also known as "Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform"; CNA; CALN; CALNA; CALNA1; CCN1; CNA1; PPP2B; "CAM-PRP catalytic subunit"; and "calmodulin-dependent calcineurin A subunit alpha isoform"). For example, a suitable dimerizer-binding pair member can include a binding portion of PPP3CA. For example, a suitable dimerizer-binding pair member can comprise an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (PP2Ac domain):
LEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFL GDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVY DACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWS DPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMY
RKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFM (SEQ ID NO: 1560).
[00210] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from cyclophilin (also known as cyclophilin A, PPIA, CYPA, CYPH, PPIase A, etc.). For example, a suitable dimerizer-binding pair member can include a binding portion of cyclophilin. For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGF MCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKT EWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE (SEQ ID NO: 1561).
[00211] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from MTOR (also known as FKBP-rapamycin associated protein; FK506 binding protein 12-rapamycin associated protein 1; FK506 binding protein 12-rapamycin associated protein 2; FK506-binding protein 12-rapamycin complex-associated protein 1; FRAP; FRAP1; FRAP2; RAFT1; and RAPT1). For example, a suitable dimerizer-binding pair member can include the Fkbp-Rapamycin Binding Domain (also known as FRB). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence (FRB):
VAILWHEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFNQAYG RDLMEAQEWCRKYMKSGNVKDLTQAWDLYYHVFRRIS (SEQ ID NO: 1562).
[00212] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from GyrB (also known as DNA gyrase subunit B). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 200 amino acids (aa), from about 200 aa to about 300 aa, from about 300 aa to about 400 aa, from about 400 aa to about 500 aa, from about 500 aa to about 600 aa, from about 600 aa to about 700 aa, or from about 700 aa to about 800 aa, of the following GyrB amino acid sequence from Escherichia coli (or to the DNA gyrase subunit B sequence from any organism):
MSNSYDSSSIKVLKGLDAVRKRPGMYIGDTDDGTGLHHMVFEVVDNAIDEALAGHCKE IIVTIHADNSVSVQDDGRGIPTGIHPEEGVSAAEVIMTVLHAGGKFDDNSYKVSGGLHGV
GVSVVNALSQKLELVIQREGKIHRQIYEHGVPQAPLAVTGETEKTGTMVRFWPSLETFT NVTEFEYEILAKRLRELSFLNSGVSIRLRDKRDGKEDHFHYEGGIKAFVEYLNKNKTPIH PNIFYFSTEKDGIGVEVALQWNDGFQENIYCFTNNIPQRDGGTHLAGFRAAMTRTLNAY MDKEGYSKKAKVSATGDDAREGLIAVVSVKVPDPKFSSQTKDKLVSSEVKSAVEQQM NELLAEYLLENPTDAKIVVGKIIDAARAREAARRAREMTRRKGALDLAGLPGKLADCQ ERDPALSELYLVEGDSAGGSAKQGRNRKNQAILPLKGKILNVEKARFDKMLSSQEVATL ITALGCGIGRDEYNPDKLRYHSIIIMTDADVDGSHIRTLLLTFFYRQMPEIVERGHVYIAQ PPLYKVKKGKQEQYIKDDEAMDQYQISIALDGATLHTNASAPALAGEALEKLVSEYNA TQKMINRMERRYPKAMLKELIYQPTLTEADLSDEQTVTRWVNALVSELNDKEQHGSQ WKFDVHTNAEQNLFEPIVRVRTHGVDTDYPLDHEFITGGEYRRICTLGEKLRGLLEEDA FIERGERRQPVASFEQALDWLVKESRRGLSIQRYKGLGEMNPEQLWETTMDPESRRML RVTVKDAIAADQLFTTLMGDAVEPRRAFIEENALKAANIDI (SEQ ID NO: 1563). In some cases, a member of a dimerizer-binding pair includes an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to amino acids 1-220 of the above-listed GyrB amino acid sequence from Escherichia coli.
[00213] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from DHFR (also known as dihydrofolate reductase, DHFRP1, and DYR). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
MVGSLNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKT WFSIPEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIV GGSSVYKEAMNHPGHLKLFVTRIMQDFESDTFFPEIDLEKYKLLPEYPGVLSDVQEEKGI KYKFEVYEKND (SEQ ID NO: 1564).
[00214] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from the DmrB binding domain (i.e., DmrB homodimerization domain). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to the following amino acid sequence:
MASRGVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKVDSSRDRNKPFKFMLGKQ EVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE (SEQ ID NO: 1565).
[00215] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from a PYL protein (also known as abscisic acid receptor and as RCAR). For example, a member of a subject dimerizer-binding pair can be derived from proteins such as those of Arabidopsis thaliana: PYR1, RCAR1 (PYL9), PYL1, PYL2, PYL3, PYL4, PYL5, PYL6, PYL7, PYL8 (RCAR3), PYL10, PYL11, PYL12, PYL13. For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to any of the following amino acid sequences:
[00216] PYL10:
MNGDETKKVESEYIKKHHRHELVESQCSSTLVKHIKAPLHLVWSIVRRFDEPQKYKPFIS RCVVQGKKLEVGSVREVDLKSGLPATKSTEVLEILDDNEHILGIRIVGGDHRLKNYSSTI SLHSETIDGKTGTLAIESFVVDVPEGNTKEETCFFVEALIQCNLNSLADVTERLQAESME KKI (SEQ ID NO: 1566).
[00217] PYL11:
METSQKYHTCGSTLVQTIDAPLSLVWSILRRFDNPQAYKQFVKTCNLSSGDGGEGSVRE VTVVSGLPAEFSRERLDELDDESHVMMISIIGGDHRLVNYRSKTMAFVAADTEEKTVVV ESYVVDVPEGNSEEETTSFADTIVGFNLKSLAKLSERVAHLKL (SEQ ID NO: 1567)
[00218] PYL12:
MKTSQEQHVCGSTVVQTINAPLPLVWSILRRFDNPKTFKHFVKTCKLRSGDGGEGSVRE VTVVSDLPASFSLERLDELDDESHVMVISIIGGDHRLVNYQSKTTVFVAAEEEKTVVVES YVVDVPEGNTEEETTLFADTIVGCNLRSLAKLSEKMMELT (SEQ ID NO:1568).
[00219] PYL13:
MESSKQKRCRSSVVETIEAPLPLVWSILRSFDKPQAYQRFVKSCTMRSGGGGGKGGEGK GSVRDVTLVSGFPADFSTERLEELDDESHVMVVSIIGGNHRLVNYKSKTKVVASPEDMA KKTVVVESYVVDVPEGTSEEDTIFFVDNIIRYNLTSLAKLTKKMMK (SEQ ID NO: 1569).
[00220] PYL1:
MANSESSSSPVNEEENSQRISTLHHQTMPSDLTQDEFTQLSQSIAEFHTYQLGNGRCSSLL AQRIHAPPETVWSVVRRFDRPQIYKHFIKSCNVSEDFEMRVGCTRDVNVISGLPANTSRE RLDLLDDDRRVTGFSITGGEHRLRNYKSVTTVHRFEKEEEEERIWTVVLESYVVDVPEG NSEEDTRLFADTVIRLNLQKLASITEAMNRNNNNNNSSQVR (SEQ ID NO:1570).
[00221] PYL2:
MSSSPAVKGLTDEEQKTLEPVIKTYHQFEPDPTTCTSLITQRIHAPASVVWPLIRRFDNPE RYKHFVKRCRLISGDGDVGSVREVTVISGLPASTSTERLEFVDDDHRVLSFRVVGGEHR LKNYKSVTSVNEFLNQDSGKVYTVVLESYTVDIPEGNTEEDTKMFVDTVVKLNLQKLG VAATSAPMHDDE (SEQ ID NO: 1571).
[00222] PYL3:
MNLAPIHDPSSSSTTTTSSSTPYGLTKDEFSTLDSIIRTHHTFPRSPNTCTSLIAHRVDAPA HAIWRFVRDFANPNKYKHFIKSCTIRVNGNGIKEIKVGTIREVSVVSGLPASTSVEILEVL DEEKRILSFRVLGGEHRLNNYRSVTSVNEFVVLEKDKKKRVYSVVLESYIVDIPQGNTEE DTRMFVDTVVKSNLQNLAVISTASPT (SEQ ID NO: 1572).
[00223] PYL4:
MLAVHRPSSAVSDGDSVQIPMMIASFQKRFPSLSRDSTAARFHTHEVGPNQCCSAVIQEI SAPISTVWSVVRRFDNPQAYKHFLKSCSVIGGDGDNVGSLRQVHVVSGLPAASSTERLD ILDDERHVISFSVVGGDHRLSNYRSVTTLHPSPISGTVVVESYVVDVPPGNTKEETCDFV DVIVRCNLQSLAKIAENTAAESKKKMSL (SEQ ID NO: 1573).
[00224] PYL5:
MRSPVQLQHGSDATNGFHTLQPHDQTDGPIKRVCLTRGMHVPEHVAMHHTHDVGPDQ CCSSVVQMIHAPPESVWALVRRFDNPKVYKNFIRQCRIVQGDGLHVGDLREVMVVSGL PAVSSTERLEILDEERHVISFSVVGGDHRLKNYRSVTTLHASDDEGTVVVESYIVDVPPG NTEEETLSFVDTIVRCNLQSLARSTNRQ (SEQ ID NO: 1574).
[00225] PYL6:
MPTSIQFQRSSTAAEAANATVRNYPHHHQKQVQKVSLTRGMADVPEHVELSHTHVVGP SQCFSVVVQDVEAPVSTVWSILSRFEHPQAYKHFVKSCHVVIGDGREVGSVREVRVVSG LPAAFSLERLEIMDDDRHVISFSVVGGDHRLMNYKSVTTVHESEEDSDGKKRTRVVESY VVDVPAGNDKEETCSFADTIVRCNLQSLAKLAENTSKFS (SEQ ID NO: 1575).
[00226] PYL7:
MEMIGGDDTDTEMYGALVTAQSLRLRHLHHCRENQCTSVLVKYIQAPVHLVWSLVRR FDQPQKYKPFISRCTVNGDPEIGCLREVNVKSGLPATTSTERLEQLDDEEHILGINIIGGD HRLKNYSSILTVHPEMIDGRSGTMVMESFVVDVPQGNTKDDTCYFVESLIKCNLKSLAC VSERLAAQDITNSIATFCNASNGYREKNHTETNL (SEQ ID NO: 1576).
[00227] PYL8:
MEANGIENLTNPNQEREFIRRHHKHELVDNQCSSTLVKHINAPVHIVWSLVRRFDQPQK YKPFISRCVVKGNMEIGTVREVDVKSGLPATRSTERLELLDDNEHILSIRIVGGDHRLKN YSSIISLHPETIEGRIGTLVIESFVVDVPEGNTKDETCYFVEALIKCNLKSLADISERLAVQ DTTESRV (SEQ ID NO: 1577).
[00228] PYL9:
MMDGVEGGTAMYGGLETVQYVRTHHQHLCRENQCTSALVKHIKAPLHLVWSLVRRF DQPQKYKPFVSRCTVIGDPEIGSLREVNVKSGLPATTSTERLELLDDEEHILGIKIIGGDHR
LKNYSSILTVHPEIIEGRAGTMVIESFVVDVPQGNTKDETCYFVEALIRCNLKSLADVSER LASQDITQ (SEQ ID NO: 1578).
[00229] PYR1:
MPSELTPEERSELKNSIAEFHTYQLDPGSCSSLHAQRIHAPPELVWSIVRRFDKPQTYKHF IKSCSVEQNFEMRVGCTRDVIVISGLPANTSTERLDILDDERRVTGFSIIGGEHRLTNYKS VTTVHRFEKENRIWTVVLESYVVDMPEGNSEDDTRMFADTVVKLNLQKLATVAEAMA RNSGDGSGSQVT (SEQ ID NO: 1579).
[00230] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from an ABI protein (also known as Abscisic Acid-Insensitive). For example, a member of a subject dimerizer-binding pair can be derived from proteins such as those of Arabidopsis thaliana: ABI1 (also known as ABSCISIC ACID-INSENSITIVE 1, Protein phosphatase 2C 56, AtPP2C56, P2C56, and PP2C ABI1) and/or ABI2 (also known as P2C77, Protein phosphatase 2C 77, AtPP2C77, ABSCISIC ACID-INSENSITIVE 2, Protein phosphatase 2C ABI2, and PP2C ABI2). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of any of the following amino acid sequences:
[00231] ABI1:
MEEVSPAIAGPFRPFSETQMDFTGIRLGKGYCNNQYSNQDSENGDLMVSLPETSSCSVS GSHGSESRKVLISRINSPNLNMKESAAADIVVVDISAGDEINGSDITSEKKMISRTESRSLF EFKSVPLYGFTSICGRRPEMEDAVSTIPRFLQSSSGSMLDGRFDPQSAAHFFGVYDGHGG SQVANYCRERMHLALAEEIAKEKPMLCDGDTWLEKWKKALFNSFLRVDSEIESVAPET VGSTSVVAVVFPSHIFVANCGDSRAVLCRGKTALPLSVDHKPDREDEAARIEAAGGKVI QWNGARVFGVLAMSRSIGDRYLKPSIIPDPEVTAVKRVKEDDCLILASDGVWDVMTDE EACEMARKRILLWHKKNAVAGDASLLADERRKEGKDPAAMSAAEYLSKLAIQRGSKD NISVVVVDLKPRRKLKSKPLN (SEQ ID NO: 1580).
[00232] ABI2:
MDEVSPAVAVPFRPFTDPHAGLRGYCNGESRVTLPESSCSGDGAMKDSSFEINTRQDSL
TSSSSAMAGVDISAGDEINGSDEFDPRSMNQSEKKVLSRTESRSLFEFKCVPLYGVTSICG
RRPEMEDSVSTIPRFLQVSSSSLLDGRVTNGFNPHLSAHFFGVYDGHGGSQVANYCRER
MHLALTEEIVKEKPEFCDGDTWQEKWKKALFNSFMRVDSEIETVAHAPETVGSTSVVA VVFPTHIFVANCGDSRAVLCRGKTPLALSVDHKPDRDDEAARIEAAGGKVIRWNGARV FGVLAMSRSIGDRYLKPSVIPDPEVTSVRRVKEDDCLILASDGLWDVMTNEEVCDLARK RILLWHKKNAMAGEALLPAEKRGEGKDPAAMSAAEYLSKMALQKGSKDNISVVVVDL KGIRKFKSKSLN (SEQ ID NO: 1581).
[00233] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from a Cry2 protein (also known as cryptochrome 2). For example, a member of a subject dimer (e.g., a dimerizer-binding pair) can be derived from Cry2 proteins from any organism (e.g., a plant) such as, but not limited to, those of Arabidopsis thaliana. For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of the following amino acid sequence:
[00234] Cry2 (Arabidopsis thaliana)
MKMDKKTIVWFRRDLRIEDNPALAAAAHEGSVFPVFIWCPEEEGQFYPGRASRWWMK
QSLAHLSQSLKALGSDLTLIKTHNTISAILDCIRVTGATKVVFNHLYDPVSLVRDHTVKE
KLVERGISVQSYNGDLLYEPWEIYCEKGKPFTSFNSYWKKCLDMSIESVMLPPPWRLMP
ITAAAEAIWACSIEELGLENEAEKPSNALLTRAWSPGWSNADKLLNEFIEKQLIDYAKNS
KKVVGNSTSLLSPYLHFGEISVRHVFQCARMKQIIWARDKNSEGEESADLFLRGIGLREY
SRYICFNFPFTHEQSLLSHLRFFPWDADVDKFKAWRQGRTGYPLVDAGMRELWATGW
MHNRIRVIVSSFAVKFLLLPWKWGMKYFWDTLLDADLECDILGWQYISGSIPDGHELDR
LDNPALQGAKYDPEGEYIRQWLPELARLPTEWIHHPWDAPLTVLKASGVELGTNYAKPI
VDIDTARELLAKAISRTREAQIMIGAAPDEIVADSFEALGANTIKEPGLCPSVSSNDQQVP
SAVRYNGSKRVKPEEEEERDMKKSRGFDERELFSTAESSSSSSVFFVSQSCSLASEGKNL
EGIQDSSDQITTSLGKNGCK (SEQ ID NO: 1582).
[00235] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from the CIB1 Arabidopsis thaliana protein (also known as transcription factor bHLH63). For example, a suitable dimer (e.g., a dimerizer-binding pair) member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of the following amino acid sequence:
MNGAIGGDLLLNFPDMSVLERQRAHLKYLNPTFDSPLAGFFADSSMITGGEMDSYLSTA
GLNLPMMYGETTVEGDSRLSISPETTLGTGNFKKRKFDTETKDCNEKKKKMTMNRDDL
VEEGEEEKSKITEQNNGSTKSIKKMKHKAKKEENNFSNDSSKVTKELEKTDYIHVRARR
GQATDSHSIAERVRREKISERMKFLQDLVPGCDKITGKAGMLDEIINYVQSLQRQIEFLS
MKLAIVNPRPDFDMDDIFAKEVASTPMTVVPSPEMVLSGYSHEMVHSGYSSEMVNSGY
LHVNPMQQVNTSSDPLSCFNNGEAPSMWDSHVQNLYGNLGV (SEQ ID NO: 1583).
[00236] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from the GAI Arabidopsis thaliana protein (also known as Gibberellic Acid Insensitive, and DELLA protein GAI). For example, a suitable dimerizer-binding pair member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of the following amino acid sequence:
MKRDHHHHHHQDKKTMMMNEEDDGNGMDELLAVLGYKVRSSEMADVAQKLEQLEV MMSNVQEDDLSQLATETVHYNPAELYTWLDSMLTDLNPPSSNAEYDLKAIPGDAILNQ FAIDSASSSNQGGGGDTYTTNKRLKCSNGVVETTTATAESTRHVVLVDSQENGVRLVH ALLACAEAVQKENLTVAEALVKQIGFLAVSQIGAMRKVATYFAEALARRIYRLSPSQSPI DHSLSDTLQMHFYETCPYLKFAHFTANQAILEAFQGKKRVHVIDFSMSQGLQWPALMQ ALALRPGGPPVFRLTGIGPPAPDNFDYLHEVGCKLAHLAEAIHVEFEYRGFVANTLADL DASMLELRPSEIESVAVNSVFELHKLLGRPGAIDKVLGVVNQIKPEIFTVVEQESNHNSPI FLDRFTESLHYYSTLFDSLEGVPSGQDKVMSEVYLGKQICNVVACDGPDRVERHETLSQ WRNRFGSAGFAAAHIGSNAFKQASMLLALFNGGEGYRVEESDGCLMLGWHTRPLIATS AWKLSTN (SEQ ID NO: 1584).
[00237] In some cases, a member of a dimer (e.g., a dimerizer-binding pair) is derived from a GID1 Arabidopsis thaliana protein (also known as Gibberellin receptor GID1). For example, a suitable dimer member can include an amino acid sequence having 75% or more amino acid sequence identity (e.g., 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% amino acid sequence identity) to a contiguous stretch of from about 100 amino acids to about 110 amino acids (aa), from about 110 aa to about 115 aa, from about 115 aa to about 120 aa, from about 120 aa to about 130 aa, from about 130 aa to about 140 aa, from about 140 aa to about 150 aa, from about 150 aa to about 160 aa, from about 160 aa to about 170 aa, from about 170 aa to about 180 aa, from about 180 aa to about 190 aa, or from about 190 aa to about 200 aa of any of the following amino acid sequences:
[00238] GID1A:
MAASDEVNLIESRTVVPLNTWVLISNFKVAYNILRRPDGTFNRHLAEYLDRKVTANANP VDGVFSFDVLIDRRINLLSRVYRPAYADQEQPPSILDLEKPVDGDIVPVILFFHGGSFAHS SANSAIYDTLCRRLVGLCKCVVVSVNYRRAPENPYPCAYDDGWIALNWVNSRSWLKS KKDSKVHIFLAGDSSGGNIAHNVALRAGESGIDVLGNILLNPMFGGNERTESEKSLDGK YFVTVRDRDWYWKAFLPEGEDREHPACNPFSPRGKSLEGVSFPKSLVVVAGLDLIRDW QLAYAEGLKKAGQEVKLMHLEKATVGFYLLPNNNHFHNVMDEISAFVNAEC (SEQ ID NO: 1585).
[00239] GID1B:
MAGGNEVNLNECKRIVPLNTWVLISNFKLAYKVLRRPDGSFNRDLAEFLDRKVPANSFP LDGVFSFDHVDSTTNLLTRIYQPASLLHQTRHGTLELTKPLSTTEIVPVLIFFHGGSFTHSS ANSAIYDTFCRRLVTICGVVVVSVDYRRSPEHRYPCAYDDGWNALNWVKSRVWLQSG KDSNVYVYLAGDSSGGNIAHNVAVRATNEGVKVLGNILLHPMFGGQERTQSEKTLDGK YFVTIQDRDWYWRAYLPEGEDRDHPACNPFGPRGQSLKGVNFPKSLVVVAGLDLVQD WQLAYVDGLKKTGLEVNLLYLKQATIGFYFLPNNDHFHCLMEELNKFVHSIEDSQSKSS PVLLTP (SEQ ID NO: 1586)
[00240] GID1C:
MAGSEEVNLIESKTVVPLNTWVLISNFKLAYNLLRRPDGTFNRHLAEFLDRKVPANANP
VNGVFSFDVIIDRQTNLLSRVYRPADAGTSPSITDLQNPVDGEIVPVIVFFHGGSFAHSSA
NSAIYDTLCRRLVGLCGAVVVSVNYRRAPENRYPCAYDDGWAVLKWVNSSSWLRSKK
DSKVRIFLAGDSSGGNIVHNVAVRAVESRIDVLGNILLNPMFGGTERTESEKRLDGKYF
VTVRDRDWYWRAFLPEGEDREHPACSPFGPRSKSLEGLSFPKSLVVVAGLDLIQDWQL
KYAEGLKKAGQEVKLLYLEQATIGFYLLPNNNHFHTVMDEIAAFVNAECQ (SEQ ID NO: 1587).
Dimerizers
[00241] Dimerizers ("dimerizing agents") that can provide for dimerization of a first member of a dimerizer-binding pair and a second member of a dimerizer-binding pair include, e.g. (where the dimerizer is in parentheses following the dimerizer-binding pair):
a) FKBP1A and FKBP1A (rapamycin and/or a rapamycin analog, rapalog);
b) FKBP1A and FRB (rapamycin and/or a rapamycin analog, rapalog);
c) FKBP1A and PPP3CA (rapamycin and/or a rapamycin analog, rapalog);
d) FKBP1A and cyclophilin (rapamycin and/or a rapamycin analog, rapalog);
e) GyrB and GyrB (coumermycin);
f) DHFR and DHFR (methotrexate);
g) DmrB and DmrB (AP20187);
h) PYL and ABI (abscisic acid);
i) Cry2 and CIB1 (blue light); and
j) GAI and GID1 (gibberellin).
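The pairings listed above amount to a lookup from an unordered pair of binding domains to the agent that induces their dimerization. The Python sketch below encodes the list that way; the dictionary name, the function name, and the choice of `frozenset` keys (so that argument order does not matter and homodimeric pairs collapse to a single-member key) are illustrative assumptions, while the pairings themselves are taken from the list.

```python
from typing import Optional

# Illustrative sketch only: the data structure and names are assumptions;
# the pair-to-dimerizer assignments follow items a) through j) above.
DIMERIZER_BY_PAIR = {
    frozenset({"FKBP1A"}): "rapamycin or a rapalog",
    frozenset({"FKBP1A", "FRB"}): "rapamycin or a rapalog",
    frozenset({"FKBP1A", "PPP3CA"}): "rapamycin or a rapalog",
    frozenset({"FKBP1A", "cyclophilin"}): "rapamycin or a rapalog",
    frozenset({"GyrB"}): "coumermycin",
    frozenset({"DHFR"}): "methotrexate",
    frozenset({"DmrB"}): "AP20187",
    frozenset({"PYL", "ABI"}): "abscisic acid",
    frozenset({"Cry2", "CIB1"}): "blue light",
    frozenset({"GAI", "GID1"}): "gibberellin",
}


def dimerizer_for(first: str, second: str) -> Optional[str]:
    """Return the listed dimerizer for a pair of binding domains,
    or None when the pair is not in the list."""
    return DIMERIZER_BY_PAIR.get(frozenset({first, second}))


print(dimerizer_for("FRB", "FKBP1A"))
print(dimerizer_for("GyrB", "GyrB"))
print(dimerizer_for("PYL", "GID1"))
```

A pair that is absent from the table returns `None`, which simply means it is not among the examples listed, not that such a pair could never dimerize.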
[00242] As noted above, rapamycin can serve as a dimerizer. Alternatively, a rapamycin derivative or analog can be used. See, e.g., WO 96/41865; WO 99/36553; WO 01/14387; and Ye et al. (1999) Science 283:88-91. For example, analogs, homologs, derivatives and other compounds related structurally to rapamycin ("rapalogs") include, among others, variants of rapamycin having one or more of the following modifications relative to rapamycin: demethylation, elimination or replacement of the methoxy at C7, C42 and/or C29; elimination, derivatization or replacement of the hydroxy at C13, C43 and/or C28; reduction, elimination or derivatization of the ketone at C14, C24 and/or C30; replacement of the 6-membered pipecolate ring with a 5-membered prolyl ring; and alternative substitution on the cyclohexyl ring or replacement of the cyclohexyl ring with a substituted cyclopentyl ring. Additional information is presented in, e.g., U.S. Pat. Nos. 5,525,610; 5,310,903; 5,362,718; and 5,527,907. Selective epimerization of the C-28 hydroxyl group has been described; see, e.g., WO 01/14387. Additional synthetic dimerizing agents suitable for use as an alternative to rapamycin include those described in U.S. Patent Publication No. 2012/0130076.
[00243] Rapamycin has the structure:
Rapamycin
[00244] Suitable rapalogs include, e.g.,
28-epirapamycin
[00245] Also suitable as a rapalog is a compound of the formula:
[00246] where n is 1 or 2; R and R are independently H, or a substituted or unsubstituted aliphatic or acyl moiety; one of R7a and R7b is H and the other is halo, RA, ORA, SRA, -OC(O)RA, -OC(O)NRARB, -NRARB, -NRBC(OR)RA, NRBC(O)ORA, -NRBSO2RA, or NRBSO2NRARB'; or R7a and R7b, taken together, are H in the tetraene moiety:
[00247] where RA is H or a substituted or unsubstituted aliphatic, heteroaliphatic, aryl, or heteroaryl moiety and where RB and RB' are independently H, OH, or a substituted or unsubstituted aliphatic, heteroaliphatic, aryl, or heteroaryl moiety.
[00248] As noted above, coumermycin can serve as a dimerizing agent. Alternatively, a coumermycin analog can be used. See, e.g., Farrar et al. (1996) Nature 383:178-181; and U.S. Pat. No. 6,916,846.
[00249] As noted above, in some cases, the dimerizing agent is methotrexate, e.g., a non-cytotoxic, homo-bifunctional methotrexate dimer. See, e.g., U.S. Patent No. 8,236,925.
Examples of Cas9 heterodimers
[00250] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
[00251] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00252] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FRB polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FRB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00253] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PPP3CA polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PPP3CA polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00254] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a cyclophilin polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a cyclophilin polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00255] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GyrB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GyrB polypeptide.
[00256] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DHFR polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DHFR polypeptide.
[00257] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DmrB polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DmrB polypeptide.
[00258] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PYL polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an ABI polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an ABI polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PYL polypeptide.
[00259] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a Cry2 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a CIB1 polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a CIB1 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a Cry2 polypeptide.
[00260] In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GAI polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GID1 polypeptide. In some embodiments, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first polypeptide comprising: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GID1 polypeptide; and B) a second fusion polypeptide comprising: a) an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GAI polypeptide.
[00261] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
[00262] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00263] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FRB polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FRB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00264] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PPP3CA polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PPP3CA polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00265] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an FKBP1A polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a cyclophilin polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a cyclophilin polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an FKBP1A polypeptide.
[00266] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GyrB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GyrB polypeptide.
[00267] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DHFR polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DHFR polypeptide.
[00268] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a DmrB polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a DmrB polypeptide.
[00269] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a PYL polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is an ABI polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is an ABI polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a PYL polypeptide.
[00270] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a Cry2 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a CIB1 polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a CIB1 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a Cry2 polypeptide.
[00271] In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GAI polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GID1 polypeptide. In some cases, a Cas9 heterodimer of the present disclosure comprises: A) a first fusion polypeptide comprising: a) a first, circular permuted, polypeptide that comprises: i) a RuvCI polypeptide; ii) a RuvCII polypeptide; iii) an HNH polypeptide; iv) a RuvCIII polypeptide; and v) a PAM-interacting polypeptide; and b) a first fusion partner, where the first fusion partner is a GID1 polypeptide; and B) a second fusion polypeptide comprising: a) a second polypeptide that comprises an alpha-helical recognition region; and b) a second fusion partner, where the second fusion partner is a GAI polypeptide.
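The example heterodimers above follow a regular pattern: each dimerization pair is recited with either member on the first (nuclease lobe) fusion polypeptide, and each combination is recited once with a non-permuted and once with a circular permuted first polypeptide. The Python sketch below enumerates that pattern under those assumptions; the function name `enumerate_embodiments` and the dictionary keys are hypothetical and used only for illustration.

```python
# Dimerization pairs recited in the example embodiments above.
PAIRS = [
    ("FKBP1A", "FKBP1A"),
    ("FKBP1A", "FRB"),
    ("FKBP1A", "PPP3CA"),
    ("FKBP1A", "cyclophilin"),
    ("GyrB", "GyrB"),
    ("DHFR", "DHFR"),
    ("DmrB", "DmrB"),
    ("PYL", "ABI"),
    ("Cry2", "CIB1"),
    ("GAI", "GID1"),
]


def enumerate_embodiments():
    """List each recited combination of fusion partners.

    The first partner is fused to the nuclease-lobe polypeptide and the
    second to the alpha-helical recognition region polypeptide.
    """
    embodiments = []
    for circular_permuted in (False, True):
        for a, b in PAIRS:
            # Homodimerizing pairs have one orientation; others have two.
            orientations = [(a, b)] if a == b else [(a, b), (b, a)]
            for first, second in orientations:
                embodiments.append(
                    {
                        "circular_permuted": circular_permuted,
                        "first_partner": first,
                        "second_partner": second,
                    }
                )
    return embodiments


if __name__ == "__main__":
    for e in enumerate_embodiments():
        print(e)
```

Under these assumptions the enumeration yields 16 partner combinations for each form of the first polypeptide, 32 in total, not counting the two generic recitations that name only "a dimerization pair".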
[00272] Cas9 protein
[00273] The components of a Cas9 heterodimer (e.g., the nuclease lobe, sub-regions of a nuclease lobe, the alpha-helical lobe, etc.) can be derived from any convenient Cas9 protein. Example Cas9 proteins include, but are not limited to, the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 (depicted in Figure 5 and used as a reference Cas9 sequence in the examples below), as well as the Cas9 amino acid sequences set forth in any of SEQ ID NOs: 1-259 and 795-1346.
[00274] A subject Cas9 guide RNA and a subject Cas9 heterodimer form a complex. The guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence (the target site) of a target nucleic acid. The Cas9 heterodimer of the complex provides the site-specific activity. In other words, the Cas9 heterodimer is guided to a target site within a target nucleic acid sequence (e.g. a chromosomal sequence or an extrachromosomal sequence, e.g. an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, a plasmid sequence, etc.) by virtue of its association with the protein-binding segment of the Cas9 guide RNA (described elsewhere).
[00275] A subject Cas9 heterodimer can bind and/or modify (e.g., cleave, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with target nucleic acid (e.g., methylation or acetylation of a histone tail). A subject Cas9 heterodimer can exhibit site-directed enzymatic activity (e.g., nuclease activity, activity for DNA and/or RNA methylation, activity for DNA and/or RNA cleavage, activity for histone acetylation, activity for histone methylation, activity for RNA modification, activity for RNA-binding, activity for RNA splicing, etc.). In some cases the Cas9 heterodimer exhibits reduced nuclease activity relative to the corresponding portion of a wild type Cas9 polypeptide. In some cases, the Cas9 heterodimer is enzymatically inactive (sometimes referred to as a "dead Cas9 heterodimer").
[00276] Assays to determine whether a subject Cas9 heterodimer interacts with a subject guide RNA and/or a target nucleic acid can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Assays to determine whether a subject Cas9 heterodimer has an activity (e.g., to determine if the polypeptide has nuclease activity that cleaves a target nucleic acid) can be any convenient assay (e.g., a nucleic acid cleavage assay) that tests for the desired activity (e.g., cleavage, methylation, etc.).
[00277] In some cases, a subject Cas9 heterodimer (e.g., a Cas9 heterodimer having a fusion partner that provides an activity) has enzymatic activity that modifies target nucleic acid (e.g., nuclease activity (which can be provided by the Cas9 sequences, but can alternatively be provided by exogenous amino acid sequences), methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity) and/or a polypeptide (e.g., a histone) associated with target nucleic acid (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).
[00278] Many Cas9 orthologs from a wide variety of species have been identified and the proteins share only a few identical amino acids. Identified Cas9 orthologs have the same domain architecture with a central HNH endonuclease domain and a split RuvC/RNaseH domain (Figure 20A (Table 1)). Cas9 proteins share 4 key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs while motif 3 is an HNH motif. In some cases, a suitable Cas9 polypeptide comprises an amino acid sequence having 4 motifs, each of motifs 1-4 having 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 99% or more, or 100% amino acid sequence identity to motifs 1-4 of a Cas9 amino acid sequence (e.g., SEQ ID NO: 1545).
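As a rough illustration of the per-motif identity criterion, the Python sketch below computes percent identity between pre-aligned motif sequences and checks each motif against a threshold. It is a minimal sketch under stated assumptions: the sequences are already aligned to equal length with "-" for gaps, identity is counted over aligned columns (other conventions for the denominator exist), and the function names are hypothetical. No motif sequences from the disclosure are reproduced here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def motifs_meet_threshold(candidate_motifs, reference_motifs, threshold=60.0):
    """True if every candidate motif meets the threshold against its reference."""
    if len(candidate_motifs) != len(reference_motifs):
        raise ValueError("motif lists must have the same length")
    return all(
        percent_identity(c, r) >= threshold
        for c, r in zip(candidate_motifs, reference_motifs)
    )
```

A caller would pass the four motifs of a candidate protein and the four corresponding reference motifs, with `threshold` set to the level of interest (for example 60.0 or 90.0).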
[00279] Variant Cas9 polypeptides (variant Cas9 heterodimers)
[00280] The present disclosure provides compositions and methods that include a variant Cas9 heterodimer. A variant Cas9 heterodimer has an amino acid sequence that is different by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of a wild type Cas9 polypeptide. In some instances, the variant Cas9 heterodimer has an amino acid change (e.g., deletion, insertion, or substitution) that reduces the nuclease activity of the Cas9 heterodimer. For example, in some instances, the variant Cas9 heterodimer has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding unmutated Cas9 heterodimer. In some cases, the variant Cas9 heterodimer has substantially no nuclease activity. When a subject Cas9 heterodimer is a variant Cas9 heterodimer that has no substantial nuclease activity, it can be referred to as a "dead Cas9 heterodimer" or "dCas9 heterodimer."
[00281] Thus, a subject Cas9 heterodimer can have reduced cleavage activity (e.g., can have nickase activity). For example, a Cas9 heterodimer can have a functional HNH domain, but a defective RuvC domain. A Cas9 heterodimer can have a functional RuvC domain, but a defective HNH domain. A Cas9 heterodimer can substantially lack nuclease activity (e.g., can have defective HNH and RuvC domains). Any convenient amino acid mutation can be used to elicit the desired variant Cas9. For example, many amino acid mutations are known in the art that result in a Cas9 having a defective HNH domain and/or a defective RuvC domain.
[00282] In some cases, a variant Cas9 heterodimer has reduced nuclease activity. For example, a variant Cas9 heterodimer suitable for use in a binding method of the present disclosure can exhibit less than about 20%, less than about 15%, less than about 10%, less than about 5%, less than about 1%, or less than about 0.1%, of the endonuclease activity of a corresponding unmutated Cas9 heterodimer, e.g., a Cas9 heterodimer comprising a wild type Cas9 amino acid sequence.
[00283] In some cases, a variant Cas9 heterodimer can cleave the complementary strand of a target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid. For example, the variant Cas9 polypeptide can have a mutation (amino acid substitution) that reduces the function of the RuvC domain. As a non-limiting example, in some embodiments, a variant Cas9 heterodimer has a D10A (aspartate to alanine at amino acid position 10 of SEQ ID NO: 1545) mutation (or the corresponding mutation of any of the proteins presented in SEQ ID NOs: 1-256 and 795-1346) and can therefore cleave the complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the non-complementary strand of a double stranded target nucleic acid (thus resulting in a single strand break (SSB) instead of a double strand break (DSB) (nickase activity) when the variant Cas9 heterodimer cleaves a double stranded target nucleic acid) (see, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21).
[00284] In some cases, a variant Cas9 heterodimer can cleave the non-complementary strand of a double stranded target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid. For example, the variant Cas9 heterodimer can have a mutation (amino acid substitution) that reduces the function of the HNH domain. As a non-limiting example, in some embodiments, a variant Cas9 heterodimer has an H839A mutation (histidine to alanine at amino acid position 839 of SEQ ID NO: 1545; or the corresponding mutation of any of the proteins set forth as SEQ ID NOs: 1-256 and 795-1346) and can therefore cleave the non-complementary strand of the target nucleic acid but has reduced ability to cleave the complementary strand of the target nucleic acid (thus resulting in a SSB instead of a DSB when the variant Cas9 polypeptide cleaves a double stranded target nucleic acid) (nickase activity). Such a Cas9 heterodimer has a reduced ability to cleave a target nucleic acid (e.g., a single or double stranded target nucleic acid) and retains the ability to bind a target nucleic acid (e.g., a single or double stranded target nucleic acid).
[00285] In some cases, a variant Cas9 heterodimer has a reduced ability to cleave both the complementary and the non-complementary strands of a double stranded target nucleic acid. As a non-limiting example, in some cases, the variant Cas9 polypeptide harbors both the D10A and the H839A mutations (of SEQ ID NO: 1545, or the corresponding mutations of any of the proteins set forth as SEQ ID NOs: 1-256 and 795-1346) such that the heterodimer has a reduced ability to cleave both the complementary and the non-complementary strands of a double stranded target nucleic acid. Such a Cas9 heterodimer has a reduced ability to cleave a target nucleic acid (e.g., a single stranded or double stranded target nucleic acid) but retains the ability to bind a target nucleic acid (e.g., a single or double stranded target nucleic acid).
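The relationship between the example substitutions and strand cleavage described in the preceding paragraphs can be summarized in a small lookup. The Python sketch below is a deliberate simplification: it treats "reduced ability to cleave" as a binary loss of cleavage, covers only the two example substitutions named above (numbered relative to SEQ ID NO: 1545), and uses hypothetical names (`DOMAIN_BY_MUTATION`, `STRAND_BY_DOMAIN`, `cleavage_profile`).

```python
# Domain impaired by each example substitution, as described in the text.
DOMAIN_BY_MUTATION = {"D10A": "RuvC", "H839A": "HNH"}

# Strand whose cleavage is reduced when each domain is impaired,
# as described in the text.
STRAND_BY_DOMAIN = {"RuvC": "non-complementary", "HNH": "complementary"}

STRANDS = ("complementary", "non-complementary")


def cleavage_profile(mutations):
    """Return (strands still cleaved, expected outcome) for a set of mutations.

    Simplification: "reduced ability to cleave" is modeled as no cleavage.
    """
    impaired = set()
    for m in mutations:
        if m not in DOMAIN_BY_MUTATION:
            raise ValueError(f"unsupported mutation in this sketch: {m}")
        impaired.add(STRAND_BY_DOMAIN[DOMAIN_BY_MUTATION[m]])
    cleaved = [s for s in STRANDS if s not in impaired]
    if len(cleaved) == 2:
        outcome = "double strand break (DSB)"
    elif len(cleaved) == 1:
        outcome = "single strand break (SSB), nickase activity"
    else:
        outcome = "binding without substantial cleavage"
    return cleaved, outcome
```

For instance, `cleavage_profile(["D10A"])` reports that only the complementary strand is cleaved, matching the nickase behavior described for the RuvC-defective variant.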
[00286] Additional examples of Cas9 mutations that can reduce the cleavage activity of Cas9 can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; and U.S. patents and patent applications:
8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945;
8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843;
20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664;
20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037;
20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234;
20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853;
20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620;
20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867;
20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[00287] Chimeric polypeptides (fusion polypeptides)
[00288] In some embodiments, a Cas9 heterodimer is a chimeric Cas9 heterodimer that is fused to a fusion partner and can bind and/or modify a target nucleic acid (e.g., cleave, methylate, demethylate, etc.) and/or modify a polypeptide associated with the target nucleic acid (e.g., methylation, acetylation, etc., of, for example, a histone tail) and/or modulate transcription of a target nucleic acid. In some cases, a Cas9 heterodimer with a fusion partner has reduced nuclease activity (e.g., as described above) and is fused to a heterologous polypeptide that provides an activity that will be exhibited by the Cas9 heterodimer (e.g., target cleavage, target methylation, transcription modulation, etc.).
[00289] In some cases, e.g., where the Cas9 heterodimer is a variant Cas9 heterodimer having a fusion partner (i.e., having a heterologous polypeptide) with an activity (e.g., an enzymatic activity) that modifies the target nucleic acid, a method of binding can also be considered to be a method of modifying the target nucleic acid. In some cases, a method of binding a target nucleic acid (e.g., a single or double stranded target nucleic acid) can result in modification of the target nucleic acid. Thus, in some cases, a method of binding a target nucleic acid (e.g., a single or double stranded target nucleic acid) can be a method of modifying the target nucleic acid.
[00290] In some cases, a heterologous sequence provides for subcellular localization, i.e., the heterologous sequence is a subcellular localization sequence (e.g., a nuclear localization signal (NLS) for targeting to the nucleus, a sequence to keep the fusion protein out of the nucleus, e.g., a nuclear export sequence (NES), a sequence to keep the fusion protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like). In some embodiments, a variant Cas9 does not include an NLS so that the protein is not targeted to the nucleus (which can be advantageous, e.g., when the target nucleic acid is an RNA that is present in the cytosol). In some embodiments, the heterologous sequence can provide a tag (i.e., the heterologous sequence is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
[00291] In some embodiments, the heterologous sequence can provide for increased or decreased stability (i.e., the heterologous sequence is a stability control peptide, e.g., a degron, which in some cases is controllable (e.g., a temperature sensitive or drug controllable degron sequence)). In some embodiments, the heterologous sequence can provide for increased or decreased transcription from the target nucleic acid (i.e., the heterologous sequence is a transcription modulation sequence, e.g., a transcription factor/activator or a fragment thereof, a protein or fragment thereof that recruits a transcription factor/activator, a transcription repressor or a fragment thereof, a protein or fragment thereof that recruits a transcription repressor, a small molecule/drug-responsive transcription regulator, etc.). In some embodiments, the heterologous sequence can provide a binding domain (i.e., the heterologous sequence is a protein binding sequence, e.g., to provide the ability of a Cas9 heterodimer to bind to another protein of interest, e.g., a DNA or histone modifying protein, a transcription factor or transcription repressor, a recruiting protein, an RNA modification enzyme, an RNA-binding protein, a translation initiation factor, an RNA splicing factor, etc.).
[00292] A subject Cas9 heterodimer can have multiple (1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, etc.) fusion partners in any combination. As an illustrative example, a Cas9 heterodimer can have a heterologous sequence that provides an activity (e.g., for transcription modulation, target modification, modification of a protein associated with a target nucleic acid, etc.) and can also have a subcellular localization sequence (e.g., one or more NLSs). In some cases, such a Cas9 heterodimer might also have a tag for ease of tracking and/or purification (e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and
the like). A Cas9 heterodimer can have one or more NLSs (e.g., two or more, three or more, four or more, five or more, 1, 2, 3, 4, or 5 NLSs). In some cases, a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at or near the C-terminus of the first and/or second polypeptide (nuclease lobe / alpha-helical lobe). In some cases, a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) is located at the N-terminus of the first and/or second polypeptide (nuclease lobe / alpha-helical lobe). In some cases, a Cas9 heterodimer (either on the nuclease lobe, the alpha-helical lobe, or both) has a fusion partner (or multiple fusion partners) (e.g., an NLS, a tag, a fusion partner providing an activity, etc.) at both the N-terminus and C-terminus.
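As a non-limiting illustration, the placement of fusion partners at the N-terminus and/or C-terminus of one lobe can be sketched as a simple data structure. The class name `FusionPolypeptide`, its fields, and the partner labels are hypothetical and chosen only for this sketch; internal fusions and linkers are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FusionPolypeptide:
    """One lobe of a heterodimer with optional terminal fusion partners.

    Hypothetical bookkeeping structure; it tracks labels only, not sequences.
    """
    core: str                                             # e.g. "nuclease lobe"
    n_terminal: list[str] = field(default_factory=list)   # partners listed N -> C
    c_terminal: list[str] = field(default_factory=list)   # partners listed N -> C

    def layout(self) -> str:
        """Return the N-to-C order of elements as a dash-separated string."""
        return "-".join([*self.n_terminal, self.core, *self.c_terminal])

    def count(self, partner: str) -> int:
        """Count how many times a given partner label occurs on this lobe."""
        return (self.n_terminal + self.c_terminal).count(partner)


# Example: an NLS at each terminus plus a C-terminal purification tag.
lobe = FusionPolypeptide(
    core="nuclease lobe",
    n_terminal=["NLS"],
    c_terminal=["NLS", "6xHis"],
)
print(lobe.layout())
print(lobe.count("NLS"))
```

The same structure could be instantiated a second time for the alpha-helical lobe, since fusion partners can be present on either lobe or on both.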
[00293] Suitable fusion partners that provide for increased or decreased stability include, but are not limited to, degron sequences. Degrons are readily understood by one of ordinary skill in the art to be amino acid sequences that control the stability of the protein of which they are part. For example, the stability of a protein comprising a degron sequence is controlled in part by the degron sequence. In some cases, a suitable degron is constitutive such that the degron exerts its influence on protein stability independent of experimental control (i.e., the degron is not drug inducible, temperature inducible, etc.). In some cases, the degron provides the Cas9 heterodimer with controllable stability such that the Cas9 heterodimer can be turned "on" (i.e., stable) or "off" (i.e., unstable, degraded) depending on the desired conditions. For example, if the degron is a temperature sensitive degron, the variant Cas9 polypeptide may be functional (i.e., "on", stable) below a threshold temperature (e.g., 42°C, 41°C, 40°C, 39°C, 38°C, 37°C, 36°C, 35°C, 34°C, 33°C, 32°C, 31°C, 30°C, etc.) but non-functional (i.e., "off", degraded) above the threshold temperature. As another example, if the degron is a drug inducible degron, the presence or absence of drug can switch the protein from an "off" (i.e., unstable) state to an "on" (i.e., stable) state or vice versa. An exemplary drug inducible degron is derived from the FKBP12 protein. The stability of the degron is controlled by the presence or absence of a small molecule that binds to the degron.
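The "on"/"off" behavior described above can be sketched, purely for illustration, as two small decision rules. The class names and the treatment of a temperature exactly at the threshold (taken here as "off") are assumptions of this sketch and are not specified in the disclosure; real degrons show graded, time-dependent degradation rather than a sharp switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureSensitiveDegron:
    threshold_c: float  # threshold temperature in degrees Celsius (e.g. 37.0)

    def is_stable(self, temperature_c: float) -> bool:
        # "on" (stable) below the threshold, "off" (degraded) above it.
        # Assumption: exactly at the threshold counts as "off".
        return temperature_c < self.threshold_c


@dataclass(frozen=True)
class DrugInducibleDegron:
    # True if the drug stabilizes the protein, False if the drug destabilizes
    # it ("or vice versa" in the text above).
    stable_when_drug_present: bool

    def is_stable(self, drug_present: bool) -> bool:
        return drug_present == self.stable_when_drug_present


ts = TemperatureSensitiveDegron(threshold_c=37.0)
print(ts.is_stable(30.0))   # below threshold
print(ts.is_stable(39.0))   # above threshold

drug_on = DrugInducibleDegron(stable_when_drug_present=True)
print(drug_on.is_stable(drug_present=False))
```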
[00294] Examples of suitable degrons include, but are not limited to, those degrons controlled by Shield-1, DHFR, auxins, and/or temperature. Non-limiting examples of suitable degrons are known in the art (e.g., Dohmen et al., Science, 1994. 263(5151): p. 1273-1276: Heat-inducible degron: a method for constructing temperature-sensitive mutants; Schoeber et al., Am J Physiol Renal Physiol. 2009 Jan;296(1):F204-11: Conditional fast expression and function of multimeric TRPV5 channels using Shield-1; Chu et al., Bioorg Med Chem Lett. 2008 Nov 15;18(22):5941-4: Recent progress with FKBP-derived destabilizing domains; Kanemaki, Pflugers Arch. 2012 Dec 28: Frontiers of protein expression control with conditional degrons; Yang et al., Mol Cell. 2012 Nov 30;48(4):487-8: Titivated for destruction: the methyl degron; Barbour et al., Biosci Rep. 2013 Jan 18;33(1): Characterization of the bipartite degron that regulates ubiquitin-independent degradation of thymidylate synthase; and Greussing et al., J Vis Exp. 2012 Nov 10;(69): Monitoring of ubiquitin-proteasome activity in living cells using a Degron (dgn)-destabilized green fluorescent protein (GFP)-based reporter protein; all of which are hereby incorporated in their entirety by reference).
[00295] Suitable fusion partners (for either the nuclease lobe, the alpha-helical lobe, or both) include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity, any of which can be directed at modifying nucleic acid directly (e.g., methylation of DNA or RNA) or at modifying a nucleic acid-associated polypeptide (e.g., a histone, a DNA binding protein, an RNA binding protein, and the like). Further suitable fusion partners include, but are not limited to, boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
[00296] Examples of various additional suitable fusion partners (or fragments thereof) for a subject
Cas9 heterodimer include, but are not limited to those described in the PCT patent applications: WO2010075303, WO2012068627, and WO2013155555, and can be found, for example, in U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445;
8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006;
20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787;
20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702;
20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232;
20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557;
20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985;
20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400;
20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523;
20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[00297] For example, suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target nucleic acid or on a polypeptide (e.g., a histone, a DNA-binding protein, an RNA-binding protein, an RNA editing protein, etc.) associated with the target nucleic acid. Suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase
activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
[00298] Additional suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription and/or translation of a target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription and/or translation regulator, a translation-regulating protein, etc.).
[00299] Non-limiting examples of fusion partners to accomplish increased or decreased transcription include transcription activator and transcription repressor domains (e.g., the Krüppel associated box (KRAB or SKD); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD), etc.). In some such cases, a Cas9 heterodimer is targeted by the guide nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a polypeptide associated with the target nucleic acid). In some cases, the changes are transient (e.g., transcription repression or activation). In some cases, the changes are inheritable (e.g., when epigenetic modifications are made to the target nucleic acid or to proteins associated with the target nucleic acid, e.g., nucleosomal histones).
[00300] Non-limiting examples of fusion partners for use when targeting ssRNA target nucleic acids include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; RNA-binding proteins; and the like. It is understood that a fusion partner can include the entire protein or in some cases can include a fragment of the protein (e.g., a functional domain).
[00301] In some embodiments, the heterologous sequence can be fused to the C-terminus of the Cas9 heterodimer (e.g., of either lobe of the heterodimer). In some embodiments, the heterologous sequence can be fused to the N-terminus of the Cas9 heterodimer (e.g., of either lobe of the heterodimer). In some embodiments, the heterologous sequence can be fused to an internal portion (i.e., a portion other than the N- or C- terminus) of the Cas9 heterodimer (e.g., of either lobe of the heterodimer).
[00302] The fusion partner of a Cas9 heterodimer (e.g., either lobe) can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: Endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); Exonucleases (for example XRN-1 or Exonuclease T); Deadenylases (for example HNT3); proteins and protein domains responsible for nonsense mediated RNA decay (for example UPF1, UPF2, UPF3, UPF3b, RNP S1, Y14, DEK, REF2, and SRm160); proteins and protein domains responsible for stabilizing RNA (for example PABP); proteins and protein domains responsible for repressing translation (for example Ago2 and Ago4); proteins and protein domains responsible for stimulating translation (for example Staufen); proteins and protein domains responsible for (e.g., capable of) modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains responsible for polyadenylation of RNA (for example PAP1, GLD-2, and Star-PAP); proteins and protein domains responsible for polyuridinylation of RNA (for example CID1 and terminal uridylate transferase); proteins and protein domains responsible for RNA localization (for example from IMP1, ZBP1, She2p, She3p, and Bicaudal-D); proteins and protein domains responsible for nuclear retention of RNA (for example Rrp6); proteins and protein domains responsible for nuclear export of RNA (for example TAP, NXF1, THO, TREX, REF, and Aly); proteins and protein domains responsible for repression of RNA splicing (for example PTB, Sam68, and hnRNP A1); proteins and protein domains responsible for stimulation of RNA splicing (for example Serine/Arginine-rich (SR) domains); proteins and protein domains responsible for reducing the efficiency of transcription (for example FUS (TLS)); and proteins and protein domains responsible for stimulating transcription (for example CDK7 and HIV Tat). Alternatively, the effector domain may be selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domains capable of reducing the efficiency of transcription; and proteins and protein domains capable of stimulating transcription. Another suitable fusion partner is a PUF RNA-binding domain, which is described in more detail in WO2012068627.
[00303] Some RNA splicing factors that can be used (in whole or as fragments thereof) as fusion partners for a Cas9 heterodimer have modular organization, with separate sequence-specific RNA binding modules and splicing effector domains. For example, members of the Serine/Arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion. As another example, the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal Glycine-rich domain. Some splicing factors can regulate alternative use of splice sites (ss) by binding to regulatory sequences between the two alternative sites. For example, ASF/SF2 can recognize ESEs and promote the use of intron proximal sites, whereas hnRNP A1 can bind to ESSs and shift splicing towards the use of intron distal sites. One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes. For example, Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5' splice sites to encode proteins of opposite functions. The long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals. The short isoform Bcl-xS is a pro-apoptotic isoform and is expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes). The ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5' splice sites). For more examples, see WO2010075303.
[00304] In some embodiments, a Cas9 heterodimer (e.g., having a wild type Cas9 cleavage activity, having reduced nuclease activity, etc.) can be linked to a fusion partner via a linker as described elsewhere.
[00305] A heterologous polypeptide (e.g., a fusion partner) may include a "Protein Transduction
Domain" or PTD (also known as a CPP - cell penetrating peptide), which refers to a
polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large
macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some embodiments, a PTD is covalently linked to the amino terminus of a polypeptide (e.g., a Cas9 heterodimer). In some embodiments, a PTD is covalently linked to the carboxyl terminus of a polypeptide (e.g., a Cas9 heterodimer). In some embodiments, a PTD is covalently linked to the carboxyl terminus and the amino terminus of a Cas9 heterodimer. In some cases, a subject Cas9 heterodimer includes (is conjugated to, is fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases a PTD includes a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a Cas9 heterodimer (e.g., a Cas9 heterodimer that includes a heterologous polypeptide sequence) includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some embodiments, a PTD is covalently linked to a nucleic acid (e.g., a Cas9 guide nucleic acid, a polynucleotide encoding a Cas9 guide nucleic acid, a polynucleotide encoding a Cas9 heterodimer, a donor polynucleotide, etc.). Examples of PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 264); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008);
RRQRRTSKLMKR (SEQ ID NO: 265); Transportan
GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 266);
KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 267); and
RQIKIWFQNRRMKWKK (SEQ ID NO: 268). Exemplary PTDs include but are not limited to, YGRKKRRQRRR (SEQ ID NO: 264), RKKRRQRRR (SEQ ID NO: 269); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 264); RKKRRQRR (SEQ ID NO: 269); YARAAARQARA (SEQ ID NO: 271); THRLPRRRRRR (SEQ ID NO: 272); and GGRRARRRRRR (SEQ ID NO: 273). In some embodiments, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June;1(5-6):371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or "R9") connected via a cleavable linker to a matching polyanion (e.g., Glu9 or "E9"), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus "activating" the ACPP to traverse the membrane.
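The charge masking that underlies an ACPP can be illustrated with a rough side-chain charge count. The sketch below is a simplification offered only as an example: it assigns +1 to Arg/Lys and -1 to Asp/Glu, ignores histidine, terminal groups, and pH, and uses a hypothetical uncharged placeholder linker ("GGGG") that is not taken from the disclosure.

```python
# Approximate side-chain charges near neutral pH (simplifying assumption).
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Sum of approximate side-chain charges for a one-letter-code peptide."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide.upper())


POLYCATION = "R" * 9   # Arg9 ("R9")
POLYANION = "E" * 9    # Glu9 ("E9")
LINKER = "GGGG"        # hypothetical uncharged stand-in for a cleavable linker

intact_acpp = POLYANION + LINKER + POLYCATION

print(net_charge(intact_acpp))    # masked: polyanion cancels the polycation
print(net_charge(POLYCATION))     # after linker cleavage releases the polyanion
print(net_charge("YGRKKRRQRRR"))  # TAT-derived PTD recited above
```

Under these assumptions the intact construct is approximately neutral, while the released polyarginine carries a strong positive charge, consistent with the "activation" described above.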
COMPOSITIONS
[00306] The present invention provides a composition comprising a Cas9 heterodimer of the present disclosure. A subject composition is useful for carrying out a method of the present disclosure. The composition can comprise (e.g., in addition to a Cas9 guide RNA), one or more of: a salt, e.g., NaCl, MgCl2, KCl, MgSO4, etc.; a buffering agent, e.g., a Tris buffer, N-(2-Hydroxyethyl)piperazine-N'-(2-ethanesulfonic acid) (HEPES), 2-(N-Morpholino)ethanesulfonic acid (MES), MES sodium salt, 3-(N-Morpholino)propanesulfonic acid (MOPS), N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS), etc.; a solubilizing agent; a detergent, e.g., a non-ionic detergent such as Tween-20, etc.; a protease inhibitor; and the like.
[00307] In some embodiments, a Cas9 heterodimer of the present disclosure present in a subject
composition is pure, e.g., at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more than 99% pure, where "% purity" means that the Cas9 heterodimer is the recited percent free from other proteins, other macromolecules, or contaminants that may be present during the production of the Cas9 heterodimer.
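For illustration only, "% purity" as used above can be computed as a fraction of total material. The sketch assumes purity is measured as a mass fraction (the disclosure does not specify the measurement basis) and treats each "at least about" tier as a strict lower bound; the function names are hypothetical.

```python
# Purity tiers recited in the text, in percent.
PURITY_TIERS = (75, 80, 85, 90, 95, 98, 99)


def percent_purity(target_mass: float, total_mass: float) -> float:
    """Percent of the preparation, by mass, that is the Cas9 heterodimer."""
    if total_mass <= 0:
        raise ValueError("total_mass must be positive")
    if not 0 <= target_mass <= total_mass:
        raise ValueError("target_mass must lie between 0 and total_mass")
    return 100.0 * target_mass / total_mass


def highest_tier_met(purity: float):
    """Return the highest recited tier that the purity meets, or None."""
    met = [tier for tier in PURITY_TIERS if purity >= tier]
    return max(met) if met else None


purity = percent_purity(target_mass=9.5, total_mass=10.0)
print(purity, highest_tier_met(purity))
```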
NUCLEIC ACIDS
[00308] The present disclosure provides nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer of the present disclosure.
[00309] In some cases, a single nucleic acid comprises nucleotide sequences encoding the first fusion polypeptide and the second fusion polypeptide of a Cas9 heterodimer of the present disclosure. In some cases, the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to the same transcriptional control element (e.g., a promoter). In some cases, the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to two different transcriptional control elements (e.g., two different promoters). In some cases, the promoters are constitutive. In some cases, the promoters are inducible. In some cases, the promoters are functional in eukaryotic cells. In some cases, the promoters are cell type-specific promoters. In some cases, the promoters are tissue-specific promoters.
[00310] In some cases, nucleotide sequences encoding the first fusion polypeptide and the second fusion polypeptide of a Cas9 heterodimer of the present disclosure are on separate nucleic acids. Thus, in some cases, the present disclosure provides: a) a first nucleic acid comprising a nucleotide sequence encoding the first fusion polypeptide of a Cas9 heterodimer of the present disclosure; and b) a second nucleic acid comprising a nucleotide sequence encoding the second fusion polypeptide of a Cas9 heterodimer of the present disclosure. In some cases, the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to the same transcriptional control element (e.g., a promoter). In some cases, the nucleotide sequence encoding the first fusion polypeptide and the nucleotide sequence encoding the second fusion polypeptide are operably linked to two different transcriptional control elements (e.g., two different promoters). In some cases, the promoters are constitutive. In some cases, the promoters are inducible. In some cases, the promoters are functional in eukaryotic cells. In some cases, the promoters are cell type-specific promoters. In some cases, the promoters are tissue-specific promoters.
[00311] In some cases, a nucleic acid(s) comprising a nucleotide sequence(s) encoding a Cas9
heterodimer of the present disclosure is a recombinant expression vector. In some embodiments, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct (see, e.g., U.S. Patent No. 7,078,387), a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc.
[00312] Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., H Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997, Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239, Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like).
[00313] Numerous suitable expression vectors are known to those of skill in the art, and many are
commercially available. The following vectors are provided by way of example; for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.
[00314] Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).
[00315] In some embodiments, a nucleotide sequence encoding a Cas9 heterodimer of the present
disclosure (e.g., a nucleotide sequence encoding the first fusion polypeptide and/or the second fusion polypeptide of a Cas9 heterodimer of the present disclosure) is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell, a yeast cell, a plant cell, an amphibian cell, etc.; or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a Cas9 heterodimer of the present disclosure (e.g., a nucleotide sequence encoding the first fusion polypeptide and/or the second fusion polypeptide of a Cas9 heterodimer of the present disclosure) is operably linked to multiple control elements that allow expression of the Cas9 heterodimer in both prokaryotic and eukaryotic cells.
[00316] Non-limiting examples of suitable eukaryotic promoters (promoters functional in a eukaryotic cell) include those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also include appropriate sequences for amplifying expression. The expression vector may also include nucleotide sequences encoding protein tags (e.g., 6xHis tag, hemagglutinin tag, green fluorescent protein, yellow fluorescent protein, red fluorescent protein, etc.) that are fused to the Cas9 heterodimer.
[00317] In some embodiments, a nucleotide sequence encoding a Cas9 heterodimer of the present disclosure (e.g., a nucleotide sequence encoding the first fusion polypeptide and/or the second fusion polypeptide of a Cas9 heterodimer of the present disclosure) is operably linked to an inducible promoter. In some embodiments, a nucleotide sequence encoding a Cas9 heterodimer of the present disclosure (e.g., a nucleotide sequence encoding the first fusion polypeptide and/or the second fusion polypeptide of a Cas9 heterodimer of the present disclosure) is operably linked to a constitutive promoter.
[00318] Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
[00319] As used herein, a "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence. For purposes of the present disclosure, a promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT" boxes. Various promoters, including inducible promoters, may be used to drive the various vectors of the present invention.
[00320] A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/"ON" state), it may be an inducible promoter (i.e., a promoter whose state, active/"ON" or inactive/"OFF", is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), or it may be a temporally restricted promoter (i.e., the promoter is in the "ON" state or "OFF" state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).
[00321] Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17)); a human H1 promoter (H1); and the like.
[00322] Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
[00323] In some embodiments, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., "ON") in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on the organism. For example, various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc. Thus, a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a subject site-directed modifying polypeptide in a wide variety of different tissues and cell types, depending on the organism. Some spatially restricted promoters are also temporally restricted such that the promoter is in the "ON" state or "OFF" state during specific stages of embryonic development or during specific stages of a biological process (e.g., hair follicle cycle in mice).
[00324] For illustration purposes, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, etc. Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med. 16(10):1161-1166); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Oh et al. (2009) Gene Ther 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); a GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); an L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); a DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); an enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); a myelin basic protein (MBP) promoter; a Ca2+-calmodulin-dependent protein kinase II-alpha (CamKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.
[00325] Adipocyte-specific spatially restricted promoters include, but are not limited to, an aP2 gene promoter/enhancer, e.g., a region from -5.4 kb to +21 bp of a human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); a glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); a fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); a stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); a leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Comm. 262:187); an adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); an adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); a resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); and the like.
[00326] Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.
[00327] Smooth muscle-specific spatially restricted promoters include, but are not limited to, an SM22α promoter (see, e.g., Akyurek et al. (2000) Mol. Med. 6:983; and U.S. Patent No. 7,169,874); a smoothelin promoter (see, e.g., WO 2001/018048); an α-smooth muscle actin promoter; and the like. For example, a 0.4 kb region of the SM22α promoter, within which lie two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim, et al. (1997) Mol. Cell. Biol. 17, 2266-2278; Li, et al., (1996) J. Cell Biol. 132, 849-859; and Moessler, et al. (1996) Development 122, 2415-2425).
[00328] Photoreceptor-specific spatially restricted promoters include, but are not limited to, a rhodopsin promoter; a rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); a beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); a retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); an interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); an IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); and the like.
HOST CELLS
[00329] The present disclosure provides host cells comprising (e.g., genetically modified to comprise) a nucleic acid of the present disclosure. The present disclosure provides host cells comprising (e.g., genetically modified to comprise) a recombinant vector of the present disclosure.
[00330] Suitable host cells include, e.g., a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell from a mammal (e.g., a cell from a rodent, a cell from a human, etc.); and the like.
[00331] A suitable host cell can be a stem cell (e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell); a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc. Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e., splittings, of the culture. For example, primary cultures include cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Primary cell lines can be maintained for fewer than 10 passages in vitro. Host cells are in many embodiments unicellular organisms, or are grown in culture.
[00332] If the cells are primary cells, they may be harvested from an organism (e.g., an individual) by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such solution will generally be a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., conveniently supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, e.g., from 5-25 mM. Convenient buffers include HEPES, phosphate buffers, lactate buffers, etc. The cells may be used immediately, or they may be stored, frozen, for long periods of time, being thawed and capable of being reused. In such cases, the cells can be frozen in 10% dimethyl sulfoxide (DMSO), 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner as commonly known in the art for thawing frozen cultured cells.
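The freezing solution composition recited above (10% DMSO, 50% serum, 40% buffered medium) can be turned into component volumes for a given batch size. The following is a minimal illustrative sketch only; the function name, the dictionary keys, and the assumption that the percentages are volume fractions of the final solution are not taken from the disclosure.

```python
def freezing_medium_volumes(total_ml, dmso_pct=10.0, serum_pct=50.0, medium_pct=40.0):
    """Return component volumes (mL) for a freezing solution.

    Defaults follow the 10% DMSO / 50% serum / 40% buffered medium example.
    Assumption: percentages are volume fractions of the final solution.
    """
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    # The three components must account for the whole solution.
    if abs(dmso_pct + serum_pct + medium_pct - 100.0) > 1e-9:
        raise ValueError("component percentages must sum to 100")
    return {
        "DMSO": total_ml * dmso_pct / 100.0,
        "serum": total_ml * serum_pct / 100.0,
        "buffered_medium": total_ml * medium_pct / 100.0,
    }


if __name__ == "__main__":
    # Example: component volumes for a 10 mL batch.
    for name, volume in freezing_medium_volumes(10.0).items():
        print(f"{name}: {volume:.1f} mL")
```

Other freezing solutions commonly used in the art can be represented by passing different percentages, provided they sum to 100.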
[00333] In some embodiments, a subject genetically modified host cell is in vitro. In some embodiments, a subject genetically modified host cell is in vivo. In some embodiments, a subject genetically modified host cell is a prokaryotic cell or is derived from a prokaryotic cell. In some embodiments, a subject genetically modified host cell is a bacterial cell or is derived from a bacterial cell. In some embodiments, a subject genetically modified host cell is an archaeal cell or is derived from an archaeal cell. In some embodiments, a subject genetically modified host cell is a eukaryotic cell or is derived from a eukaryotic cell. In some embodiments, a subject genetically modified host cell is a plant cell or is derived from a plant cell. In some embodiments, a subject genetically modified host cell is an animal cell or is derived from an animal cell. In some embodiments, a subject genetically modified host cell is an invertebrate cell or is derived from an invertebrate cell. In some embodiments, a subject genetically modified host cell is a vertebrate cell or is derived from a vertebrate cell. In some embodiments, a subject genetically modified host cell is a mammalian cell or is derived from a mammalian cell. In some embodiments, a subject genetically modified host cell is a rodent cell or is derived from a rodent cell. In some embodiments, a subject genetically modified host cell is a human cell or is derived from a human cell.
[00334] The present disclosure further provides progeny of a subject genetically modified cell, where the progeny can comprise the same exogenous nucleic acid or polypeptide as the subject genetically modified cell from which it was derived. The present disclosure further provides a composition comprising a subject genetically modified host cell.
NON-HUMAN GENETICALLY MODIFIED ORGANISMS
[00335] In some embodiments, a genetically modified host cell has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.; and the like). If such a cell is a eukaryotic single-cell organism, then the modified cell can be considered a genetically modified organism. In some embodiments, a subject non-human genetically modified organism is a Cas9 transgenic multicellular organism.
[00336] In some embodiments, a subject genetically modified non-human host cell (e.g., a cell that has been genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., a circular permuted Cas9 heterodimer, a Cas9 heterodimer having a dimer pair; etc.)) can generate a subject genetically modified non-human organism (e.g., a mouse, a fish, a frog, a fly, a worm, etc.). For example, if the genetically modified host cell is a pluripotent stem cell (i.e., PSC) or a germ cell (e.g., sperm, oocyte, etc.), an entire genetically modified organism can be derived from the genetically modified host cell. In some embodiments, the genetically modified host cell is a pluripotent stem cell (e.g., ESC, iPSC, pluripotent plant stem cell, etc.) or a germ cell (e.g., sperm cell, oocyte, etc.), either in vivo or in vitro, that can give rise to a genetically modified organism. In some embodiments the genetically modified host cell is a vertebrate PSC (e.g., ESC, iPSC, etc.) and is used to generate a genetically modified organism (e.g., by injecting a PSC into a blastocyst to produce a chimeric/mosaic animal, which could then be mated to generate non-chimeric/non-mosaic genetically modified organisms; grafting in the case of plants; etc.). Any convenient method/protocol for producing a genetically modified organism is suitable for producing a genetically modified host cell comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., a circular permuted Cas9 heterodimer, a Cas9 heterodimer having a dimer pair; etc.). Methods of producing genetically modified organisms are known in the art. For example, see Cho et al., Curr Protoc Cell Biol. 2009 Mar;Chapter 19:Unit 19.11: Generation of transgenic mice; Gama et al., Brain Struct Funct. 2010 Mar;214(2-3):91-109. Epub 2009 Nov 25: Animal transgenesis: an overview; Husaini et al., GM Crops. 2011 Jun-Dec;2(3):150-62. Epub 2011 Jun 1: Approaches for gene targeting and targeted gene expression in plants.
[00337] In some embodiments, a genetically modified organism comprises a target cell for methods of the invention, and thus can be considered a source for target cells. For example, if a genetically modified cell comprising one or more exogenous nucleic acids comprising nucleotide sequences encoding the two polypeptides of a subject Cas9 heterodimer is used to generate a genetically modified organism, then the cells of the genetically modified organism comprise the one or more exogenous nucleic acids comprising nucleotide sequences encoding the two polypeptides of a subject Cas9 heterodimer. In some such embodiments, the DNA of a cell or cells of the genetically modified organism can be targeted for modification by introducing into the cell or cells a Cas9 guide RNA (e.g., a truncated Cas9 guide RNA) (or a nucleic acid encoding the Cas9 guide RNA) and in some cases a dimerizer and/or a donor polynucleotide. For example, the introduction of a Cas9 guide RNA (or a DNA encoding the same) into a subset of cells (e.g., brain cells, intestinal cells, kidney cells, lung cells, blood cells, etc.) of the genetically modified organism can target the DNA of such cells for modification, the genomic location of which will depend on the targeting sequence of the introduced Cas9 guide RNA.
[00338] In some embodiments, a genetically modified organism is a source of target cells for methods of the invention. For example, a genetically modified organism comprising cells that are genetically modified with an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer can provide a source of genetically modified cells, for example PSCs (e.g., ESCs, iPSCs, sperm, oocytes, etc.), neurons, progenitor cells, cardiomyocytes, etc.
[00339] In some embodiments, a genetically modified cell is a PSC comprising an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer. As such, the PSC can be a target cell such that the DNA of the PSC can be targeted for modification by introducing into the PSC a Cas9 guide RNA (e.g., a truncated Cas9 guide RNA) (or a nucleic acid encoding the Cas9 guide RNA) and in some cases a dimerizer (e.g., light, a dimerizing agent, etc.), and optionally a donor nucleic acid (donor polynucleotide), and the genomic location of the modification will depend on the targeting sequence of the introduced Cas9 guide RNA. Thus, in some embodiments, the methods described herein can be used to modify the DNA (e.g., delete and/or replace any desired genomic location) of PSCs derived from a subject genetically modified organism. Such modified PSCs can then be used to generate organisms having both (i) an exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer and (ii) a DNA modification that was introduced into the PSC.
[00340] An exogenous nucleic acid comprising a nucleotide sequence encoding a subject Cas9 heterodimer (e.g., comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.
[00341] A subject genetically modified non-human organism can be any organism other than a human, including for example, a plant; algae; an invertebrate (e.g., a cnidarian, an echinoderm, a worm, a fly, etc.); a vertebrate (e.g., a fish (e.g., zebrafish, puffer fish, gold fish, etc.), an amphibian (e.g., salamander, frog, etc.), a reptile, a bird, a mammal, etc.); an ungulate (e.g., a goat, a pig, a sheep, a cow, etc.); a rodent (e.g., a mouse, a rat, a hamster, a guinea pig); a lagomorph (e.g., a rabbit); etc.
Transgenic non-human animals
[00342] As described above, in some embodiments, a subject nucleic acid (e.g., one or more nucleic acids comprising nucleotide sequences encoding a subject Cas9 heterodimer, e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.; and the like), e.g., a recombinant expression vector, is used as a transgene to generate a transgenic animal that produces a Cas9 heterodimer. Thus, the present disclosure further provides a transgenic non-human animal, which animal comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a Cas9 heterodimer (e.g., one or more nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer). In some embodiments, the genome of the transgenic non-human animal comprises a subject nucleotide sequence encoding a Cas9 heterodimer. In some embodiments, the transgenic non-human animal is homozygous for the genetic modification. In some embodiments, the transgenic non-human animal is heterozygous for the genetic modification. In some embodiments, the transgenic non-human animal is a vertebrate, for example, a fish (e.g., zebrafish, gold fish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), a mammal (e.g., an ungulate, e.g., a pig, a cow, a goat, a sheep, etc.; a lagomorph (e.g., a rabbit); a rodent (e.g., a rat, a mouse); a non-human primate; etc.), etc.
[00343] Nucleotide sequences encoding a Cas9 heterodimer (e.g., one or more nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer) can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.
Transgenic plants
[00344] As described above, in some embodiments, a subject nucleic acid (e.g., one or more nucleic acids comprising nucleotide sequences encoding a subject Cas9 heterodimer, e.g., a circular permuted Cas9 heterodimer; a Cas9 heterodimer having a dimer pair; comprising WT Cas9 sequences; a variant Cas9, having mutations relative to a WT Cas9, such as a Cas9 with reduced nuclease activity and/or a heterologous sequence that provides an activity such as transcription modulation, methylation, etc.; and the like), e.g., a recombinant expression vector, is used as a transgene to generate a transgenic plant that produces a Cas9 heterodimer. Thus, the present disclosure further provides a transgenic plant, which plant comprises a transgene comprising a subject nucleic acid comprising a nucleotide sequence encoding a Cas9 heterodimer (e.g., one or more nucleic acids comprising nucleotide sequences encoding a Cas9 heterodimer). In some embodiments, the genome of the transgenic plant comprises a subject nucleic acid. In some embodiments, the transgenic plant is homozygous for the genetic modification. In some embodiments, the transgenic plant is heterozygous for the genetic modification.
[00345] Methods of introducing exogenous nucleic acids into plant cells are well known in the art. Such plant cells are considered "transformed," as defined above. Suitable methods include viral infection (such as double stranded DNA viruses), transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, silicon carbide whiskers technology, Agrobacterium-mediated transformation and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo).
[00346] Transformation methods based upon the soil bacterium Agrobacterium tumefaciens are particularly useful for introducing an exogenous nucleic acid molecule into a vascular plant. The wild type form of Agrobacterium contains a Ti (tumor-inducing) plasmid that directs production of tumorigenic crown gall growth on host plants. Transfer of the tumor-inducing T-DNA region of the Ti plasmid to a plant genome requires the Ti plasmid-encoded virulence genes as well as T-DNA borders, which are a set of direct DNA repeats that delineate the region to be transferred. An Agrobacterium-based vector is a modified form of a Ti plasmid, in which the tumor inducing functions are replaced by the nucleic acid sequence of interest to be introduced into the plant host.
[00347] Agrobacterium-mediated transformation generally employs cointegrate vectors or binary vector systems, in which the components of the Ti plasmid are divided between a helper vector, which resides permanently in the Agrobacterium host and carries the virulence genes, and a shuttle vector, which contains the gene of interest bounded by T-DNA sequences. A variety of binary vectors are well known in the art and are commercially available, for example, from Clontech (Palo Alto, Calif.). Methods of coculturing Agrobacterium with cultured plant cells or wounded tissue such as leaf tissue, root explants, hypocotyledons, stem pieces or tubers, for example, also are well known in the art. See, e.g., Glick and Thompson, (eds.), Methods in Plant Molecular Biology and Biotechnology, Boca Raton, Fla.: CRC Press (1993).
[00348] Microprojectile-mediated transformation also can be used to produce a subject transgenic plant. This method, first described by Klein et al. (Nature 327:70-73 (1987)), relies on microprojectiles such as gold or tungsten that are coated with the desired nucleic acid molecule by precipitation with calcium chloride, spermidine or polyethylene glycol. The microprojectile particles are accelerated at high speed into an angiosperm tissue using a device such as the BIOLISTIC PD-1000 (Bio-Rad; Hercules, Calif.).
[00349] A subject nucleic acid may be introduced into a plant in a manner such that the nucleic acid is able to enter a plant cell(s), e.g., via an in vivo or ex vivo protocol. By "in vivo," it is meant that the nucleic acid is administered to a living body of a plant, e.g., by infiltration. By "ex vivo" it is meant that cells or explants are modified outside of the plant, and then such cells or organs are regenerated to a plant. A number of vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described, including those described in Weissbach and Weissbach, (1989) Methods for Plant Molecular Biology Academic Press, and Gelvin et al., (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et al. (1983) Nature 303: 209, Bevan (1984) Nucl Acid Res. 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642. Alternatively, non-Ti vectors can be used to transfer the DNA into plants and cells by using free DNA delivery techniques. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-9 and 4462) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemaux (1994) Plant Physiol 104: 37-48) and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotech 14: 745-750). Exemplary methods for introduction of DNA into chloroplasts are biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection (Daniell et al. Nat. Biotechnol 16:345-348, 1998; Staub et al. Nat. Biotechnol 18: 333-338, 2000; O'Neill et al. Plant J. 3:729-738, 1993; Knoblauch et al. Nat. Biotechnol 17: 906-909; U.S. Pat. Nos. 5,451,513, 5,545,817, 5,545,818, and 5,576,198; in Intl. Application No. WO 95/16783; and in Boynton et al., Methods in Enzymology 217: 510-536 (1993), Svab et al., Proc. Natl. Acad. Sci. USA 90: 913-917 (1993), and McBride et al., Proc. Natl. Acad. Sci. USA 91: 7301-7305 (1994)). Any vector suitable for the methods of biolistic bombardment, polyethylene glycol transformation of protoplasts and microinjection will be suitable as a targeting vector for chloroplast transformation. Any double stranded DNA vector may be used as a transformation vector, especially when the method of introduction does not utilize Agrobacterium.
[00350] Plants which can be genetically modified include grains, forage crops, fruits, vegetables, oil seed crops, palms, forestry, and vines. Specific examples of plants which can be modified follow: maize, banana, peanut, field peas, sunflower, tomato, canola, tobacco, wheat, barley, oats, potato, soybeans, cotton, carnations, sorghum, lupin and rice.
[00351] Also provided by the subject disclosure are transformed plant cells, tissues, plants and products that contain the transformed plant cells. A feature of the subject transformed cells, and tissues and products that include the same is the presence of a subject nucleic acid integrated into the genome, and production by plant cells of a heterodimeric Cas9. Recombinant plant cells of the present invention are useful as populations of recombinant cells, or as a tissue, seed, whole plant, stem, fruit, leaf, root, flower, tuber, grain, animal feed, a field of plants, and the like.
[00352] Nucleotide sequences encoding a heterodimeric Cas9 can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters, inducible promoters, spatially restricted and/or temporally restricted promoters, etc.
SYSTEMS AND KITS
[00353] The present disclosure provides a system and/or kit comprising: a) a Cas9 heterodimer of the present disclosure; and b) a Cas9 guide RNA (e.g., a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3). The Cas9 guide RNA does not by itself induce heterodimerization of the Cas9 heterodimer; instead, dimerization of the first fusion polypeptide and the second fusion polypeptide of the Cas9 heterodimer is induced when the first fusion partner and the second fusion partner dimerize with one another. In some cases, the first fusion partner and the second fusion partner dimerize in the presence of a small molecule. In some cases, the first fusion partner and the second fusion partner dimerize in the presence of light of a particular wavelength range.
[00354] A system and/or kit of the present disclosure comprises a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3. In some cases, the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3. In some cases, a system and/or kit of the present disclosure comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide. Small molecule dimerizers (also referred to herein as "small molecule dimerizing agents") are described elsewhere herein. In some cases, a system and/or kit of the present disclosure comprises a PAMmer (described in more detail below). In some cases, a system and/or kit of the present disclosure comprises a donor polynucleotide (described in more detail below).
[00355] Components of a subject kit can be in separate containers; or can be combined in a single container. Any of the kits described herein can further include one or more additional reagents, where such additional reagents can be selected from: a dilution buffer; a reconstitution solution; a wash buffer; a control reagent; a control expression vector or RNA polynucleotide; a reagent for in vitro production of a Cas9 heterodimer from DNA, and the like.
[00356] In addition to above-mentioned components, a subject kit can further include instructions for using the components of the kit to practice the subject methods. The instructions for practicing the subject methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging) etc. In other embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g. CD-ROM, diskette, flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g. via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
[00357] Cas9 Guide RNA
[00358] A nucleic acid molecule that binds to a Cas9 protein (and therefore also binds to a subject Cas9 heterodimer) and targets the Cas9 protein to a specific location within the target nucleic acid is referred to herein as a "guide nucleic acid" or "Cas9 guide RNA." In some cases, a guide nucleic acid is RNA, and in some cases, can be a hybrid nucleic acid that includes both deoxyribonucleotides and ribonucleotides. For the sake of simplicity, as used herein, the terms that include the phrase "guide RNA" (e.g., the terms "Cas9 guide RNA", "truncated guide RNA", "guide RNA", and such) are meant to encompass guide RNAs and guide nucleic acids that include components/regions/sections other than RNA (e.g., deoxyribonucleotide regions; modified nucleotides such as base modifications, sugar modifications, nucleotide linkage modifications, and the like; etc.). Also, to distinguish a guide RNA that interacts with and guides a Cas9 protein (e.g., a Cas9 heterodimer) from other guide RNAs in the art, the term "Cas9 guide RNA" is herein used to refer to a guide RNA (and to modified guide RNAs having deoxyribonucleotides and/or other modifications) that interacts with a Cas9 protein (e.g., a Cas9 heterodimer) and targets the protein to a particular location (the target sequence) within a target nucleic acid.
[00359] A subject Cas9 guide RNA comprises two segments, a first segment (referred to herein as a "targeting segment"); and a second segment (referred to herein as a "protein-binding segment"). By "segment" it is meant a segment/section/region of a molecule, e.g., a contiguous stretch of nucleotides in a nucleic acid molecule. A segment can also mean a region/section of a complex such that a segment may comprise regions of more than one molecule.
[00360] The first segment (targeting segment) of a Cas9 guide RNA comprises a nucleotide sequence that is complementary to a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or "protein-binding sequence") interacts with a Cas9 polypeptide. Site-specific binding and/or cleavage of the target nucleic acid can occur at locations determined by base-pairing complementarity between the Cas9 guide RNA and the target nucleic acid.
[00361] The protein-binding segment of a subject Cas9 guide RNA comprises two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
[00362] A subject Cas9 guide RNA and a subject Cas9 heterodimer form a complex (i.e., bind via non-covalent interactions). The Cas9 guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target nucleic acid. The Cas9 heterodimer of the complex provides the site-specific activity. In other words, the Cas9 heterodimer is guided to a target nucleic acid sequence (e.g., a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g., an episomal nucleic acid, a minicircle, an ssRNA, an ssDNA, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the Cas9 guide RNA.
[00363] The targeting sequence (the targeting segment) of a Cas9 guide RNA can be modified so that the Cas9 guide RNA can target a Cas9 heterodimer to any desired sequence of any desired target nucleic acid, with the exception (as is known in the art) that the PAM sequence is taken into account. Thus, for example, a Cas9 guide RNA can have a targeting segment with a sequence that can hybridize to a sequence in a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.).
[00364] In some embodiments, a subject Cas9 guide RNA comprises two separate nucleic acid molecules: an "activator" and a "targeter" and is referred to herein as a "dual Cas9 guide RNA", a "double-molecule Cas9 guide RNA", a "two-molecule Cas9 guide RNA", a "dual guide RNA", or a "dgRNA." In some embodiments, the activator and targeter are covalently linked to one another (e.g., via intervening nucleotides) and the resulting guide RNA is referred to as a "single guide RNA", a "single Cas9 guide RNA", a "single-molecule Cas9 guide RNA", a "one-molecule Cas9 guide RNA", or simply "sgRNA."
[00365] As noted above, in some cases, a Cas9 guide RNA is a DNA/RNA hybrid molecule. In such cases, the protein-binding segment of the Cas9 guide RNA is RNA and forms an RNA duplex. However, the targeting segment of a Cas9 guide RNA can be DNA. Thus, if a DNA/RNA hybrid guide nucleic acid is a dual guide nucleic acid, the "targeter" molecule can be a hybrid molecule (e.g., the targeting segment can be DNA and the duplex-forming segment can be RNA). In such cases, the duplex-forming segment of the "activator" molecule can be RNA (e.g., in order to form an RNA-duplex with the duplex-forming segment of the targeter molecule), while nucleotides of the "activator" molecule that are outside of the duplex-forming segment can be DNA (in which case the activator molecule is a hybrid DNA/RNA molecule) or can be RNA (in which case the activator molecule is RNA). If a DNA/RNA hybrid guide nucleic acid is a single guide nucleic acid, then the targeting segment can be DNA, the duplex-forming segments (which make up the protein-binding segment) can be RNA, and nucleotides outside of the targeting and duplex-forming segments can be RNA or DNA. The "targeter" can also be referred to as a "targeter RNA" (even though in some cases a targeter RNA can have deoxyribonucleotides and/or other modifications) and the "activator" can be referred to as an "activator RNA" (even though in some cases an activator RNA can have deoxyribonucleotides and/or other modifications).
[00366] An example dual Cas9 guide RNA comprises a crRNA-like ("CRISPR RNA" / "targeter" / "crRNA" / "crRNA repeat") molecule and a corresponding tracrRNA-like ("trans-acting CRISPR RNA" / "activator" / "tracrRNA") molecule. A crRNA-like molecule (targeter) comprises both the targeting segment (single stranded) of the guide nucleic acid and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. A corresponding tracrRNA-like molecule (activator / tracrRNA) comprises a stretch of nucleotides (duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the guide nucleic acid. In other words, a stretch of nucleotides of a crRNA-like molecule is complementary to and hybridizes with a stretch of nucleotides of a tracrRNA-like molecule to form the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. As such, each targeter molecule can be said to have a corresponding activator molecule (which has a region that hybridizes with the targeter). The targeter molecule additionally provides the single stranded targeting segment. Thus, a targeter and an activator molecule (as a corresponding pair) hybridize to form a dual Cas9 guide RNA. The exact sequence of a given crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. A subject dual Cas9 guide RNA can include any corresponding activator and targeter pair.
[00367] The term "activator" is used herein to mean a tracrRNA-like molecule (tracrRNA: "trans-acting CRISPR RNA") of a Cas9 dual guide RNA (and of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together by, e.g., intervening nucleotides). Thus, for example, a Cas9 guide RNA (dgRNA or sgRNA) comprises an activator sequence (e.g., a tracrRNA sequence). As is known in the art, a tracr molecule (a tracrRNA) is a naturally existing molecule that hybridizes with a CRISPR RNA molecule (a crRNA) to form a Cas9 dual guide RNA. The term "activator" is used herein to encompass naturally existing tracrRNAs, but also to encompass tracrRNAs with modifications (e.g., truncations, sequence variations, base modifications, backbone modifications, linkage modifications, etc.) where the activator retains at least one function of a tracrRNA (e.g., contributes to the dsRNA duplex to which Cas9 binds). In some cases the activator provides one or more stem loops that can interact with Cas9; in some cases, the activator contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loop 1; in some cases, the activator contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1 and 2; in some cases, the activator contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1 and 3; in some cases, the activator contributes to the dsRNA duplex to which Cas9 binds (the dsRNA duplex formed from the hybridization of the targeter and the activator) and contributes stem loops 1, 2, and 3; etc. Thus, an activator can be referred to as having a tracr sequence (tracrRNA sequence) and in some cases is a tracrRNA, but the term "activator" is not limited to naturally existing tracrRNAs.
[00368] The term "targeter" is used herein to refer to a crRNA-like molecule (crRNA: "CRISPR RNA") of a Cas9 dual guide RNA (and of a Cas9 single guide RNA when the "activator" and the "targeter" are linked together, e.g., by intervening nucleotides). Thus, for example, a Cas9 guide RNA (dgRNA or sgRNA) comprises a targeting segment (single stranded) (which comprises nucleotides that hybridize with a target nucleic acid), and a duplex-forming segment (e.g., a duplex forming segment of a crRNA, which can also be referred to as a crRNA repeat). Because the sequence of a targeting segment (the segment that hybridizes with a target sequence of a target nucleic acid) of a targeter is modified by a user to hybridize with a desired target nucleic acid, the sequence of a targeter will often be a non-naturally occurring sequence. However, the duplex-forming segment of a targeter (described in more detail below), which hybridizes with the duplex-forming segment of an activator, can include a naturally existing sequence (e.g., can include the sequence of a duplex-forming segment of a naturally existing crRNA, which can also be referred to as a crRNA repeat). Thus, the term targeter is used herein to distinguish from naturally occurring crRNAs, despite the fact that part of a targeter (e.g., the duplex-forming segment) can include a naturally occurring sequence from a crRNA. However, the term "targeter" encompasses naturally occurring crRNAs.
[00369] The term "duplex-forming segment" is used herein to refer to the stretch of nucleotides of an activator or a targeter that contributes to the formation of the dsRNA duplex by hybridizing to a stretch of nucleotides of a corresponding activator or targeter. In other words, an activator comprises a duplex-forming segment that is complementary to the duplex-forming segment of the corresponding targeter. As such, an activator comprises a duplex-forming segment while a targeter comprises both a duplex-forming segment and the targeting segment of the Cas9 guide RNA (sgRNA or dgRNA). A subject Cas9 single guide RNA comprises an "activator" and a "targeter" where the "activator" and the "targeter" are covalently linked (e.g., by intervening nucleotides). A subject Cas9 dual guide RNA comprises an "activator" and a "targeter" where the "activator" and the "targeter" are not covalently linked by intervening nucleotides.
[00370] A Cas9 guide RNA can also be said to include 3 parts: (i) a targeting sequence (a nucleotide sequence that hybridizes with a sequence of the target nucleic acid); (ii) an activator sequence (as described above)(in some cases, referred to as a tracr sequence); and (iii) a sequence that hybridizes to at least a portion of the activator sequence to form a double stranded duplex. A targeter has (i) and (iii); while an activator has (ii).
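The three-part description above can be illustrated with a minimal Python sketch. It is offered only as an aid to reading, under stated assumptions: the class and function names (Targeter, Activator, dual_guide, single_guide) are hypothetical, the short sequences are arbitrary placeholders that are not taken from any SEQ ID NO, and the 5' to 3' order used for the single guide (targeter, then intervening nucleotides, then activator) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targeter:
    """Hypothetical container for parts (i) and (iii) of a Cas9 guide RNA."""
    targeting_sequence: str      # part (i): hybridizes with the target nucleic acid
    duplex_forming_segment: str  # part (iii): hybridizes with the activator

    def sequence(self) -> str:
        # Assumption: the targeting sequence is placed 5' of the duplex-forming segment.
        return self.targeting_sequence + self.duplex_forming_segment


@dataclass(frozen=True)
class Activator:
    """Hypothetical container for part (ii) of a Cas9 guide RNA."""
    activator_sequence: str      # part (ii): the tracr-like sequence


def dual_guide(targeter: Targeter, activator: Activator) -> tuple:
    """Two separate molecules (dgRNA): the targeter and the activator."""
    return (targeter.sequence(), activator.activator_sequence)


def single_guide(targeter: Targeter, activator: Activator, linker: str) -> str:
    """One molecule (sgRNA): targeter and activator joined by intervening nucleotides."""
    return targeter.sequence() + linker + activator.activator_sequence


# Arbitrary placeholder sequences, for illustration only.
example_targeter = Targeter("ACGUACGUACGUACGUACGU", "GGGG")
example_activator = Activator("CCCCAAAA")
example_dgRNA = dual_guide(example_targeter, example_activator)
example_sgRNA = single_guide(example_targeter, example_activator, linker="GAAA")
```

In this sketch the targeter carries parts (i) and (iii) and the activator carries part (ii), matching the allocation stated above; the only difference between the dual and single forms is whether intervening nucleotides join the two molecules.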
[00371] A Cas9 guide RNA (e.g., a dual guide RNA or a single guide RNA) can be comprised of any corresponding activator and targeter pair. Non-limiting examples of nucleotide sequences that can be included in a Cas9 guide RNA (dgRNA or sgRNA) include sequences set forth in SEQ ID NOs: 431-679 and 1535-1544, or complements thereof. For example, in some cases, sequences from SEQ ID NOs: 431-562 and 1535-1544 (which are from tracrRNAs) or complements thereof, can pair with sequences from SEQ ID NOs: 563-679 (which are from crRNAs), or complements thereof, to form a dsRNA duplex of a protein binding segment.
[00372] As noted above, a targeter comprises both the targeting segment (single stranded) of the Cas9 guide RNA and a stretch ("duplex-forming segment") of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. A corresponding tracrRNA-like molecule (activator) comprises a stretch of nucleotides (a duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the Cas9 guide RNA. In other words, a stretch of nucleotides of the targeter is complementary to and hybridizes with a stretch of nucleotides of the activator to form the dsRNA duplex of the protein-binding segment of a Cas9 guide RNA. As such, each targeter can be said to have a corresponding activator (which has a region that hybridizes with the targeter). The targeter molecule additionally provides the single stranded targeting segment. Thus, a targeter and an activator (as a corresponding pair) hybridize to form a Cas9 guide RNA. The exact sequence of a given naturally existing crRNA or tracrRNA molecule is characteristic of the species in which the RNA molecules are found. Examples of suitable activator and targeter sequences include, but are not limited to, those set forth in SEQ ID NOs: 431-679 and 1535-1544. A subject Cas9 guide RNA (dgRNA or sgRNA) can include any corresponding activator and targeter sequence pair.
[00373] Targeting segment of a Cas9 guide RNA
[00374] The first segment of a subject guide nucleic acid comprises a nucleotide sequence that can be complementary to a sequence (a target site) in a target nucleic acid. In other words, the targeting segment of a subject guide nucleic acid can interact with a target nucleic acid (e.g., a single stranded RNA (ssRNA) and/or a single stranded DNA (ssDNA)) in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the targeting segment may vary and can determine the location within the target nucleic acid at which the guide nucleic acid and the target nucleic acid will interact. The targeting segment of a Cas9 guide RNA can be modified (e.g., by genetic engineering)/designed to hybridize to any desired sequence (target site) within a target nucleic acid (e.g., a eukaryotic target nucleic acid).
[00375] The targeting segment can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the targeting segment can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt. For example, the targeting segment can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt.
[00376] The nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 12 nt or more. For example, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid can have a length of 12 nt or more, 15 nt or more, 18 nt or more, 19 nt or more, 20 nt or more, 25 nt or more, 30 nt or more, 35 nt or more, or 40 nt or more. For example, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt.
[00377] In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
[00378] In some cases, the targeting segment can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides). In some cases, the targeting segment can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100 nt, from 12 to 80 nt, from 12 to 60 nt, from 12 to 40 nt, from 12 to 30 nt, from 12 to 25 nt, from 12 to 22 nt, from 12 to 20 nt, from 12 to 18 nt, from 14 to 100 nt, from 14 to 80 nt, from 14 to 60 nt, from 14 to 40 nt, from 14 to 30 nt, from 14 to 25 nt, from 14 to 22 nt, from 14 to 20 nt, from 14 to 18 nt, from 16 to 100 nt, from 16 to 80 nt, from 16 to 60 nt, from 16 to 40 nt, from 16 to 30 nt, from 16 to 25 nt, from 16 to 22 nt, from 16 to 20 nt, from 16 to 18 nt, from 18 to 100 nt, from 18 to 80 nt, from 18 to 60 nt, from 18 to 40 nt, from 18 to 30 nt, from 18 to 25 nt, from 18 to 22 nt, from 18 to 20 nt, 18 nt, 19 nt, or 20 nt).
[00379] In some cases, the nucleotide sequence (the targeting sequence) of the targeting segment that is complementary to a nucleotide sequence (target site) of the target nucleic acid can have a length of 7 or more nucleotides (nt) (e.g., 8 or more, 9 or more, 10 or more, 12 or more, 15 or more, 20 or more, 25 or more, 30 or more, or 40 or more nucleotides). In some cases, the targeting sequence of the targeting segment that is complementary to a target sequence of the target nucleic acid can have a length of from 7 to 100 nucleotides (nt) (e.g., from 7 to 80 nt, from 7 to 60 nt, from 7 to 40 nt, from 7 to 30 nt, from 7 to 25 nt, from 7 to 22 nt, from 7 to 20 nt, from 7 to 18 nt, from 8 to 80 nt, from 8 to 60 nt, from 8 to 40 nt, from 8 to 30 nt, from 8 to 25 nt, from 8 to 22 nt, from 8 to 20 nt, from 8 to 18 nt, from 10 to 100 nt, from 10 to 80 nt, from 10 to 60 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 22 nt, from 10 to 20 nt, from 10 to 18 nt, from 12 to 100 nt, from 12 to 80 nt, from 12 to 60 nt, from 12 to 40 nt, from 12 to 30 nt, from 12 to 25 nt, from 12 to 22 nt, from 12 to 20 nt, from 12 to 18 nt, from 14 to 100 nt, from 14 to 80 nt, from 14 to 60 nt, from 14 to 40 nt, from 14 to 30 nt, from 14 to 25 nt, from 14 to 22 nt, from 14 to 20 nt, from 14 to 18 nt, from 16 to 100 nt, from 16 to 80 nt, from 16 to 60 nt, from 16 to 40 nt, from 16 to 30 nt, from 16 to 25 nt, from 16 to 22 nt, from 16 to 20 nt, from 16 to 18 nt, from 18 to 100 nt, from 18 to 80 nt, from 18 to 60 nt, from 18 to 40 nt, from 18 to 30 nt, from 18 to 25 nt, from 18 to 22 nt, from 18 to 20 nt, 18 nt, 19 nt, or 20 nt).
[00380] In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 20 nucleotides in length. In some cases, the targeting sequence of the targeting segment that is complementary to a target site of the target nucleic acid is 19 nucleotides in length.
[00381] The percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more over about 20 contiguous nucleotides. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the fourteen contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the seven contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length.
[00382] In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid (which can be complementary to the 3'-most nucleotides of the targeting sequence of the Cas9 guide RNA). In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 20 contiguous nucleotides.
[00383] In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 7 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 7 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 8 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 8 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 9 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 9 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 10 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 10 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 11 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 11 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 12 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 12 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 13 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 13 nucleotides in length. In some cases, the percent complementarity between the targeting sequence of the targeting segment and the target site of the target nucleic acid is 100% over the 14 contiguous 5'-most nucleotides of the target site of the target nucleic acid and as low as 0% or more over the remainder. In such a case, the targeting sequence can be considered to be 14 nucleotides in length.
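The percent complementarity described above can be computed as in the following minimal Python sketch, which is illustrative only. The function names (paired_flags, percent_complementarity) and the example sequences are hypothetical and arbitrary. The sketch assumes that both sequences are written 5' to 3', that they are of equal length, that they pair in antiparallel orientation (so that the 5'-most nucleotides of the target site pair with the 3'-most nucleotides of the targeting sequence), and that only Watson-Crick pairs are counted (G-U wobble pairs are ignored).

```python
# (targeting-sequence base, target-site base) combinations counted as paired.
# Assumption: Watson-Crick pairs only; the target may be DNA (T) or RNA (U),
# and the targeting sequence may itself contain T when it is DNA.
WATSON_CRICK = {
    ("A", "U"), ("A", "T"), ("U", "A"), ("T", "A"), ("G", "C"), ("C", "G"),
}


def paired_flags(targeting_sequence: str, target_site: str) -> list:
    """Return one True/False flag per target-site position, 5' to 3'.

    Position 0 of the target site is paired against the 3'-most nucleotide
    of the targeting sequence, reflecting antiparallel hybridization.
    """
    if len(targeting_sequence) != len(target_site):
        raise ValueError("this sketch assumes sequences of equal length")
    guide_3_to_5 = targeting_sequence.upper()[::-1]
    return [(g, t) in WATSON_CRICK for g, t in zip(guide_3_to_5, target_site.upper())]


def percent_complementarity(targeting_sequence: str, target_site: str, first_n=None) -> float:
    """Percent of paired positions, optionally over only the first_n
    contiguous 5'-most nucleotides of the target site."""
    flags = paired_flags(targeting_sequence, target_site)
    if first_n is not None:
        flags = flags[:first_n]
    return 100.0 * sum(flags) / len(flags)


# Arbitrary 10-nt example: a DNA target site and two RNA targeting sequences.
target_site = "ACGTACGTAC"
full_match = "GUACGUACGU"   # reverse complement of the target site
partial = "CCCCGUACGU"      # 5'-most three nucleotides altered

overall_full = percent_complementarity(full_match, target_site)
overall_partial = percent_complementarity(partial, target_site)
first7_partial = percent_complementarity(partial, target_site, first_n=7)
```

In the example, the altered targeting sequence remains 100% complementary over the 7 contiguous 5'-most nucleotides of the target site while the overall percent complementarity drops to 70%, which corresponds to the situation in which the targeting sequence can be considered to be 7 nucleotides in length.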
[00384] Second segment: protein-binding segment
[00385] The protein-binding segment of a subject guide nucleic acid interacts with a Cas9 polypeptide.
The subject guide nucleic acid guides the bound polypeptide to a specific nucleotide sequence within target nucleic acid via the above mentioned targeting segment. The protein-binding segment of a subject guide nucleic acid comprises two stretches of nucleotides that are complementary to one another. The complementary nucleotides of the protein-binding segment hybridize to form a double stranded RNA duplex (dsRNA).
[00386] The protein-binding segment of a subject Cas9 guide RNA interacts with a Cas9 protein. The Cas9 guide RNA guides the bound Cas9 protein to a specific nucleotide sequence within target nucleic acid via the above mentioned targeting segment. The protein-binding segment of a Cas9 guide RNA comprises two stretches of nucleotides that are complementary to one another and hybridize to form a double stranded RNA duplex (dsRNA duplex). Thus, the protein-binding segment includes a dsRNA duplex. In some cases, the protein-binding segment also includes stem loop 1 (the "nexus") of a Cas9 guide RNA (e.g., see Figure 9). For example, in some cases, the activator of a Cas9 guide RNA (dgRNA or sgRNA) includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) nucleotides 3' of
the duplex forming segment, e.g., that form stem loop 1 (the "nexus"). For example, in some cases, the protein-binding segment includes stem loop 1 (the "nexus") of a Cas9 guide RNA. In some cases, the protein-binding segment includes 5 or more nucleotides (nt) (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 75 or more, or 80 or more nt) 3' of the dsRNA duplex (where 3' is relative to the duplex-forming segment of the activator sequence).
[00387] The dsRNA duplex of the guide RNA (sgRNA or dgRNA) that forms between the activator and targeter is sometimes referred to herein as the "stem loop". In addition, the activator (activator RNA, tracrRNA) of many naturally existing Cas9 guide RNAs (e.g., S. pyogenes guide RNAs) has 3 stem loops (3 hairpins) that are 3' of the duplex-forming segment of the activator. The closest stem loop to the duplex-forming segment of the activator (3' of the duplex forming segment) is called "stem loop 1" (and is also referred to herein as the "nexus"); the next stem loop is called "stem loop 2" (and is also referred to herein as the "hairpin 1"); and the next stem loop is called "stem loop 3" (and is also referred to herein as the "hairpin 2"). For example, see Figure 9 for clarification of the nomenclature.
[00388] The term "truncated guide RNA", as used herein, refers to a Cas9 guide RNA (single guide or dual guide) that has the nexus ("stem loop 1"), but is missing one or both of stem loops 2 and 3. Thus, a "truncated guide RNA" is truncated from the 3' end of the activator and can have: (i) stem loop 1 only; (ii) stem loop 1 plus stem loop 2; or (iii) stem loop 1 plus stem loop 3. In some cases, a guide RNA (e.g., some naturally existing guide RNAs) has only one stem loop 3' of the nexus ("stem loop 1") and thus for purposes herein, such guide RNAs are referred to herein as having a nexus ("stem loop 1") and a "stem loop 2/3" (or "hairpin 1/2"). For more information regarding guide RNAs, see Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9, which is hereby incorporated by reference in its entirety.
[00389] Thus, the term "truncated guide RNA", as used herein, refers to a Cas9 guide RNA (single guide or dual guide) that does not include one or both of: stem loop 2 and stem loop 3. In some cases, a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1, but does not have stem loop 2 and does not have stem loop 3. In some cases, a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1 and stem loop 2, but does not have stem loop 3. In some cases, a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1 and stem loop 3, but does not have stem loop 2. For example, in some cases, a Cas9 guide RNA (sgRNA or dgRNA) (a truncated Cas9 guide RNA) has stem loop 1, but does not have at least one of: stem loop 2 and stem loop 3. In some cases, a Cas9 guide RNA (sgRNA or dgRNA) (e.g., a full length Cas9 guide RNA) has stem loops 1, 2, and 3.
[00390] The inventors have discovered that in the presence of a Cas9 guide RNA having stem loops 1, 2, and 3 (e.g., a full length guide RNA), the nuclease lobe and the alpha-helical lobe of a split Cas9 protein (i.e., a Cas9 heterodimer) form a functional complex. Thus, complex formation between the nuclease lobe and the alpha-helical lobe does not require an additional dimerization mechanism (e.g., dimerization domains and/or dimerization inducers). However, in the presence of a Cas9 guide RNA that does not have stem loops 2 or 3 (a truncated guide RNA), the Cas9 heterodimer does not assemble (e.g., in the absence of a dimerizer such as light or a dimerizing agent). As such, when one wishes to control the assembly of a Cas9 heterodimer, one can use a Cas9 heterodimer having a dimerizer pair as fusion partners (e.g., in which the nuclease lobe includes a first member of a dimer pair and the alpha-helical lobe includes a second member of the dimer pair), in the presence of a truncated Cas9 guide RNA (a Cas9 guide RNA that does not include stem loops 2 or 3 but does include stem loop 1; a guide RNA that does not include stem loop 2 but does include stem loops 1 and 3; and/or a guide RNA that does not include stem loop 3 but does include stem loops 1 and 2). Thus, complex formation of the nuclease lobe and the alpha-helical lobe to form a functional Cas9 complex can be induced by using the proper inducing agent.
[00391] Thus, in some cases, an activator (of a Cas9 guide RNA) has stem loop 1, but does not have stem loop 2 and does not have stem loop 3. In some cases, an activator (of a Cas9 guide RNA) has stem loop 1 and stem loop 2, but does not have stem loop 3. In some cases, an activator (of a Cas9 guide RNA) has stem loop 1 and stem loop 3, but does not have stem loop 2. In some cases, an activator (of a Cas9 guide RNA) has stem loops 1, 2, and 3. For example, in some cases, an activator (of a Cas9 guide RNA) has stem loop 1, but does not have at least one of: stem loop 2 and stem loop 3.
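The relationship described above between the stem loops of a guide RNA and assembly of the Cas9 heterodimer can be summarized, purely for illustration, as a small Boolean model. The Python sketch below is a simplification: the function names and arguments are hypothetical, assembly is treated as all-or-nothing, and guides lacking stem loop 1 are rejected because the passage above does not describe them.

```python
FULL_LENGTH = frozenset({1, 2, 3})


def classify_guide(stem_loops):
    """Classify a Cas9 guide RNA by which of stem loops 1, 2 and 3 it has."""
    loops = frozenset(stem_loops)
    if not loops <= FULL_LENGTH:
        raise ValueError("stem loops must be drawn from 1, 2 and 3")
    if 1 not in loops:
        # Assumption: guides without the nexus are outside this sketch.
        raise ValueError("guides lacking stem loop 1 are not modeled")
    return "full length" if loops == FULL_LENGTH else "truncated"


def heterodimer_assembles(stem_loops, has_dimerizer_pair=False, inducer_present=False):
    """Simplified reading of the text: a full length guide brings the two
    lobes together on its own; a truncated guide does so only when the lobes
    carry a dimerizer pair and the inducing agent (e.g., light or a
    dimerizing agent) is supplied."""
    if classify_guide(stem_loops) == "full length":
        return True
    return has_dimerizer_pair and inducer_present
```

For example, under this model a guide having only stem loop 1 yields no assembly unless both `has_dimerizer_pair` and `inducer_present` are true, which is the situation in which assembly is placed under the control of the inducing agent.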
[00392] In some cases, the activator (e.g., tracr sequence) of a Cas9 guide RNA (dgRNA or sgRNA) includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) nucleotides 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)). In some cases, the additional nucleotides 3' of the duplex forming segment form stem loop 1. In some cases, the activator (e.g., tracr sequence) of a Cas9 guide RNA (dgRNA or sgRNA) includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) 5 or more nucleotides (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 70 or more, or 75 or more nucleotides) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)). In some cases, the activator of a Cas9 guide RNA (dgRNA or sgRNA) includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) 5 or more nucleotides (e.g., 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 60 or more, 70 or more, or 75 or more nucleotides) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)).
[00393] In some cases, the activator (e.g., tracr sequence) of a Cas9 guide RNA (dgRNA or sgRNA) includes (i) a duplex forming segment that contributes to the dsRNA duplex of the protein-binding segment; and (ii) a stretch of nucleotides (e.g., referred to herein as a 3' tail) 3' of the duplex forming segment (and therefore the Cas9 guide RNA includes (ii)). In some cases, the stretch of nucleotides 3' of the duplex forming segment has a length in a range of from 5 to 200 nucleotides (nt) (e.g., from 5 to 150 nt, from 5 to 130 nt, from 5 to 120 nt, from 5 to 100 nt, from 5 to 80 nt, from 10 to 200 nt, from 10 to 150 nt, from 10 to 130 nt, from 10 to 120 nt, from 10 to 100 nt, from 10 to 80 nt, from 12 to 200 nt, from 12 to 150 nt, from 12 to 130 nt, from 12 to 120 nt, from 12 to 100 nt, from 12 to 80 nt, from 15 to 200 nt, from 15 to 150 nt, from 15 to 130 nt, from 15 to 120 nt, from 15 to 100 nt, from 15 to 80 nt, from 20 to 200 nt, from 20 to 150 nt, from 20 to 130 nt, from 20 to 120 nt, from 20 to 100 nt, from 20 to 80 nt, from 30 to 200 nt, from 30 to 150 nt, from 30 to 130 nt, from 30 to 120 nt, from 30 to 100 nt, or from 30 to 80 nt).
[00394] In some embodiments, the duplex-forming segment of the activator is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 65% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 70% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 75% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-
1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 80% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 85% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 90% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 95% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 98% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 99% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The
duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00395] In some embodiments, the duplex-forming segment of the targeter is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 65% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 70% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 75% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 80% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 85% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous
nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 90% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 95% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 98% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 99% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). 
The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
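One way to evaluate an identity threshold "over a stretch of 8 or more contiguous nucleotides", as recited above for activator and targeter duplex-forming segments, is sketched below in Python. This is only one possible reading: the comparison is ungapped, only stretches of exactly the requested length are tested, the "or a complement thereof" alternative is not handled, and the function names and example strings are hypothetical rather than taken from the sequence listing.

```python
def best_window_identity(query, reference, window=8):
    """Highest ungapped percent identity between any stretch of `window`
    contiguous nucleotides in `query` and any such stretch in `reference`."""
    q, r = query.upper(), reference.upper()
    if window <= 0 or len(q) < window or len(r) < window:
        raise ValueError("both sequences must be at least `window` long")
    best = 0.0
    for i in range(len(q) - window + 1):
        for j in range(len(r) - window + 1):
            same = sum(a == b for a, b in zip(q[i:i + window], r[j:j + window]))
            best = max(best, 100.0 * same / window)
    return best


def meets_identity(query, reference, threshold=60.0, stretch=8):
    """True if some pair of `stretch`-long windows reaches `threshold` percent."""
    return best_window_identity(query, reference, stretch) >= threshold
```

A longer stretch (e.g., 10, 12, 15, or 20 contiguous nucleotides) can be tested by passing a larger `stretch` value, and a higher identity (e.g., 65% through 100%) by raising `threshold`.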
[00396] A dual guide RNA can be designed to allow for controlled (i.e., conditional) binding of a
targeter with an activator. Because a Cas9 dual guide RNA is not functional unless both the activator and the targeter are bound in a functional complex with Cas9, a dual guide RNA can be inducible (e.g., drug inducible) by rendering the binding between the activator and the targeter to be inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator with the targeter. Accordingly, the activator and/or the targeter can include an RNA aptamer sequence.
[00397] Aptamers (e.g., RNA aptamers) are known in the art and are generally a synthetic version of a riboswitch. The terms "RNA aptamer" and "riboswitch" are used interchangeably herein to
encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the nucleic acid molecule (e.g., RNA, DNA/RNA hybrid, etc.) of which they are part. RNA aptamers usually comprise a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part. As non-limiting examples: (i) an activator with an aptamer may not be able to bind to the cognate targeter unless the aptamer is bound by the appropriate drug; (ii) a targeter with an aptamer may not be able to bind to the cognate activator unless the aptamer is bound by the appropriate drug; and (iii) a targeter and an activator, each comprising a different aptamer that binds a different drug, may not be able to bind to each other unless both drugs are present. As illustrated by these examples, a Cas9 dual guide RNA can be designed to be inducible.
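The three non-limiting aptamer examples above amount to a simple gating rule: each aptamer-bearing component can pair with its partner only when its cognate drug is present. The following Python sketch models that rule under stated assumptions; the class, the function, and the drug names are hypothetical, and binding is treated as strictly on or off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuideComponent:
    """An activator or a targeter, optionally carrying an RNA aptamer."""
    name: str
    aptamer_ligand: Optional[str] = None  # drug the aptamer binds; None means no aptamer

    def can_pair(self, drugs_present):
        # Without an aptamer the component is always available for pairing;
        # with one, it is available only when its drug is present.
        return self.aptamer_ligand is None or self.aptamer_ligand in drugs_present


def dual_guide_can_form(activator, targeter, drugs_present):
    """True if the activator and the targeter are both able to bind each other."""
    drugs = set(drugs_present)
    return activator.can_pair(drugs) and targeter.can_pair(drugs)


# Example (iii): each component carries a different aptamer, so both drugs are needed.
activator = GuideComponent("activator", aptamer_ligand="drug_A")
targeter = GuideComponent("targeter", aptamer_ligand="drug_B")
formed_with_one_drug = dual_guide_can_form(activator, targeter, {"drug_A"})
formed_with_both = dual_guide_can_form(activator, targeter, {"drug_A", "drug_B"})
```

Here `formed_with_one_drug` is false and `formed_with_both` is true, reflecting a dual guide RNA that is inducible only when both drugs are supplied.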
[00398] Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May;17(5):344-64; Vavalle et al., Future Cardiol. 2012 May;8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr 15;34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-Jun;3(3):369-84; all of which are herein incorporated by reference in their entirety.
[00399] A Cas9 single guide RNA comprises two stretches of nucleotides (a "targeter" and an
"activator") that are complementary to one another, hybridize to form the double stranded RNA duplex (dsRNA duplex) of the protein-binding segment (thus resulting in a stem-loop structure), and are covalently linked, e.g., by a linker of intervening nucleotides ("linker nucleotides"). Thus, a subject single Cas9 guide RNA (e.g., a single guide RNA) can comprise a targeter and an activator, each having a duplex-forming segment, where the duplex-forming segments of the targeter and the activator hybridize with one another to form a dsRNA duplex. The targeter and the activator can be covalently linked via the 3' end of the targeter and the 5' end of the activator. Alternatively, the targeter and the activator can be covalently linked via the 5' end of the targeter and the 3' end of the activator.
[00400] The linker of a Cas9 single guide RNA can have a length of from 3 nucleotides to 100
nucleotides. For example, the linker can have a length of from 3 nucleotides (nt) to 90 nt, from 3 nucleotides (nt) to 80 nt, from 3 nucleotides (nt) to 70 nt, from 3 nucleotides (nt) to 60 nt, from 3 nucleotides (nt) to 50 nt, from 3 nucleotides (nt) to 40 nt, from 3 nucleotides (nt) to 30 nt, from 3 nucleotides (nt) to 20 nt or from 3 nucleotides (nt) to 10 nt. For example, the linker can have a length of from 3 nt to 5 nt, from 5 nt to 10 nt, from 10 nt to 15 nt, from 15 nt to 20 nt, from 20 nt to 25 nt, from 25 nt to 30 nt, from 30 nt to 35 nt, from 35 nt to 40 nt, from 40 nt to 50 nt, from 50
nt to 60 nt, from 60 nt to 70 nt, from 70 nt to 80 nt, from 80 nt to 90 nt, or from 90 nt to 100 nt. In some embodiments, the linker of a Cas9 single guide RNA is 4 nt.
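The two linkage orientations and the linker length range described above can be illustrated with a short Python sketch that joins a targeter and an activator into a single guide sequence written 5' to 3'. The function name, the default 4-nt linker "GAAA", and the example sequences are placeholders chosen for illustration and are not taken from the sequence listing.

```python
def build_single_guide(targeter, activator, linker="GAAA", targeter_first=True):
    """Join a targeter and an activator (each written 5'->3') through a linker.

    targeter_first=True links the 3' end of the targeter to the 5' end of the
    activator; targeter_first=False links the 3' end of the activator to the
    5' end of the targeter.
    """
    if not 3 <= len(linker) <= 100:
        raise ValueError("linker length must be from 3 to 100 nucleotides")
    if targeter_first:
        return targeter + linker + activator
    return activator + linker + targeter


# Hypothetical placeholder sequences, used only to show the two orientations.
sg_forward = build_single_guide("GGGAAACC", "GGUUUCCC")
sg_reverse = build_single_guide("GGGAAACC", "GGUUUCCC", targeter_first=False)
```

A linker shorter than 3 nt or longer than 100 nt raises an error in this sketch, mirroring the broadest range recited above; narrower ranges could be enforced by tightening the check.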
[00401] A single Cas9 guide RNA comprises two complementary stretches of nucleotides (a targeter and an activator) that hybridize to form a dsRNA duplex. In some embodiments, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00402] In some embodiments, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00403] In some embodiments, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the targeter (crRNA) sequences or activator (tracrRNA) sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides,
15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 65% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00404] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 70% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00405] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 75% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00406] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 80% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00407] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 85% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00408] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 90% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or
more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00409] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 95% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00410] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 98% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00411] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 99% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00412] One of the two complementary stretches of nucleotides of the single Cas9 guide RNA (or the DNA encoding the stretch) can be 100% identical to one of the sequences set forth in SEQ ID NOs: 431-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00413] Appropriate cognate pairs of targeters and activators can be routinely determined for SEQ ID NOs:431-679 by taking into account the species name and base-pairing (for the dsRNA duplex of the protein-binding domain). Any activator/targeter pair can be used as part of a subject dual Cas9 guide RNA or as part of a subject single Cas9 guide RNA.
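As a rough illustration of determining cognate pairs by species name and base-pairing, the Python sketch below pairs targeters with activators from the same species whose duplex-forming segments pair above a chosen fraction. It is a deliberately crude model: the alignment is ungapped and antiparallel (natural duplexes can contain bulges and mismatches), G-U wobble pairs are counted, and the record format, function names, threshold, and sequences are all hypothetical.

```python
# RNA-RNA pairs counted as base-paired in this sketch (Watson-Crick plus G-U wobble).
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_fraction(targeter_seg, activator_seg):
    """Fraction of positions that pair when the two segments (each written
    5'->3') are aligned antiparallel without gaps, starting from the 5' end
    of the targeter and the 3' end of the activator."""
    t = targeter_seg.upper()
    a_3_to_5 = activator_seg.upper()[::-1]
    n = min(len(t), len(a_3_to_5))
    if n == 0:
        return 0.0
    return sum((x, y) in RNA_PAIRS for x, y in zip(t, a_3_to_5)) / n


def cognate_pairs(targeters, activators, min_fraction=0.8):
    """Each record is (identifier, species, duplex_forming_segment).
    Returns (targeter_id, activator_id) tuples for same-species records
    whose segments pair at or above min_fraction."""
    pairs = []
    for t_id, t_species, t_seq in targeters:
        for a_id, a_species, a_seq in activators:
            if t_species == a_species and paired_fraction(t_seq, a_seq) >= min_fraction:
                pairs.append((t_id, a_id))
    return pairs
```

For example, a hypothetical targeter record `("t1", "species X", "GGGAAACC")` and activator record `("a1", "species X", "GGUUUCCC")` would be reported as a pair, whereas an otherwise identical targeter annotated with a different species would not.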
[00414] In some embodiments, one of the two complementary stretches of nucleotides (that hybridize to form the dsRNA duplex of the protein-binding segment) of the Cas9 single guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more
contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00415] In some embodiments, one of the two complementary stretches of nucleotides (that hybridize to form the dsRNA duplex of the protein-binding segment) of the Cas9 single guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) is 65% or more identical, 70% or more identical, 75% or more identical, 80% or more identical, 85% or more identical, 90% or more identical, 95% or more identical, 98% or more identical, 99% or more identical or 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00416] In some embodiments, one of the two complementary stretches of nucleotides (that hybridize to form the dsRNA duplex of the protein-binding segment) of the Cas9 single guide RNA (or the DNA encoding the stretch) is 60% or more identical to one of the targeter (crRNA) sequences or activator (tracrRNA) sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, one of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 65% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or
more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 70% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 75% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 80% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 85% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 90% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 95% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 98% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a
complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 99% or more identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). One of the two complementary stretches of nucleotides of the Cas9 single guide RNA (or the DNA encoding the stretch) can be 100% identical to one of the sequences set forth in SEQ ID NOs: 431-679, and 1535-1544, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00417] Appropriate cognate pairs of targeters and activators can be routinely determined for SEQ ID NOs:431-679 and 1535-1544 by taking into account the species name and base-pairing (for the dsRNA duplex of the protein-binding domain). Any activator/targeter pair can be used as part of a subject Cas9 dual guide RNA or as part of a subject Cas9 single guide RNA.
[00418] In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide
RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 60% or more sequence identity (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% sequence identity) with a naturally existing activator (tracrRNA) molecule. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 60% or more sequence identity (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% sequence identity) with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 70% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 75% or more sequence identity with an activator
(tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 80% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 85% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 90% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 95% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 98% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. 
In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 100% sequence identity with an activator
(tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof.
In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide
RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 60% or more sequence identity (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 100% sequence identity) with a nucleotide sequence set forth in any one of SEQ ID NOs:431-679, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 70% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-
1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 75% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 80% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 85% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 90% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 95% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) 
of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 98% or more sequence identity with an activator (tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes a stretch of nucleotides with 100% sequence identity with an activator
(tracrRNA) molecule set forth in any one of SEQ ID NOs:431-562, and 1535-1544, or a complement thereof.
In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide
RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) includes 30 or more nucleotides (nt) (e.g., 40 or more, 50 or more, 60 or more, 70 or more, 75 or more nt). In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a Cas9 dual guide RNA (e.g., a dual guide RNA) or a Cas9 single guide RNA (e.g., a single guide RNA) has a length in a range of from 25 to 300 nucleotides (nt) (e.g., 30 to 300 nt, 40 to 300 nt, 50 to 300 nt,
60 to 300 nt, 65 to 300 nt, 70 to 300 nt, 75 to 300 nt, 30 to 200 nt, 40 to 200 nt, 50 to 200 nt, 60 to 200 nt, 65 to 200 nt, 70 to 200 nt, 75 to 200 nt, 30 to 150 nt, 40 to 150 nt, 50 to 150 nt, 60 to 150 nt, 65 to 150 nt, 70 to 150 nt, 75 to 150 nt, 30 to 100 nt, 40 to 100 nt, 50 to 100 nt, 60 to 100 nt, 65 to 100 nt, 70 to 100 nt, 75 to 100 nt, 30 to 75 nt, 30 to 65 nt, 30 to 50 nt, or 30 to 40 nt). In some cases, an activator (e.g., a trRNA, trRNA-like molecule, etc.) of a dual Cas9 guide RNA (e.g., a dual guide RNA) or a single Cas9 guide RNA (e.g., a single guide RNA) has a length in a range of from 30 to 200 nucleotides (nt) (e.g., 40 to 200 nucleotides, 50 to 200 nucleotides, 60 to 200 nucleotides, 65 to 200 nucleotides, 70 to 200 nucleotides, 75 to 200 nucleotides, 40 to 150 nucleotides, 50 to 150 nucleotides, 60 to 150 nucleotides, 65 to 150 nucleotides, 70 to 150 nucleotides, 75 to 150 nucleotides, 40 to 100 nucleotides, 50 to 100 nucleotides, 60 to 100 nucleotides, 65 to 100 nucleotides, 70 to 100 nucleotides, or 75 to 100 nucleotides).
In some cases, the protein-binding segment has a length of from 10 nucleotides to 300 nucleotides. Also with regard to both a subject Cas9 single guide RNA and to a subject Cas9 dual guide RNA, the dsRNA duplex of the protein-binding segment can have a length from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the protein-binding segment can have a length from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp. In some embodiments, the dsRNA duplex of the protein-binding segment has a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be 60% or more. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 99% or more. In some cases, the dsRNA duplex of the protein-binding segment includes a "bulge", e.g., a region of non-complementarity (which, e.g., can result in two (or more) sub-regions of complementarity separated by one region (or more) of non-complementarity).
In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment is 100%.
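The percent complementarity discussed above can be illustrated with a short sketch. It assumes two equal-length duplex-forming stretches written 5' to 3' that pair antiparallel and without gaps, so that a bulge simply shows up as unpaired positions; the function name `percent_complementarity` and the optional counting of G-U wobble pairs are assumptions of this sketch rather than definitions taken from the disclosure.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(strand: str, partner: str, allow_wobble: bool = False) -> float:
    """Percent of positions that base-pair when `strand` (5'->3') is aligned
    antiparallel to `partner` (also written 5'->3'), with no gaps."""
    a = strand.upper()
    b = partner.upper()[::-1]  # read the partner 3'->5' so positions line up
    if len(a) != len(b) or not a:
        raise ValueError("this sketch expects two non-empty stretches of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    paired = sum((x, y) in allowed for x, y in zip(a, b))
    return 100.0 * paired / len(a)
```

With `allow_wobble=False` only Watson-Crick pairs are counted; whether wobble pairs should count toward a stated threshold is a choice the text above does not settle, so the flag is left to the user.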
[00422] In some embodiments, a suitable Cas9 guide RNA comprises two separate molecules (an activator and a targeter). In some cases, the first of the two separate molecules (the activator) comprises a nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562, and 1535-1544, or a complement thereof. In some cases, the second of the two separate molecules (the targeter) comprises a nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:563-679, or a complement thereof.
[00423] In some embodiments, a suitable Cas9 guide RNA is a single RNA polynucleotide and
comprises a first nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562, and 1535-1544, and a second nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs: 463-679.
[00424] In some embodiments, the targeter comprises the sequence 5'-GUUUUAGAGCUA-3' (SEQ ID NO: 679) linked at its 5' end to a stretch of nucleotides that are complementary to a target nucleic acid. In some embodiments, the activator comprises the sequence 5'-UAGCAAGUUAAAAUAAGGCUAGUCCG-3' (SEQ ID NO:397).
[00425] In some embodiments, a Cas9 guide RNA comprises the sequence 5'-GUUUUAGAGCUA-linker-UAGCAAGUUAAAAUAAGGCUAGUCCG-3' (SEQ ID NO:680) linked at its 5' end to a stretch of nucleotides that are complementary to a target nucleic acid (where "linker" denotes a linker nucleotide sequence that can comprise any nucleotide sequence). Illustrative examples of Cas9 single guide RNAs include those set forth in SEQ ID NOs: 680-682.
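The arrangement just described (targeting stretch, then the targeter sequence, then a linker, then the activator sequence, read 5' to 3') can be sketched as simple string assembly. The two fixed sequences are those recited above; the function name `assemble_single_guide` and the default linker `GAAA` are assumptions chosen for illustration only, since the text states that the linker can comprise any nucleotide sequence.

```python
TARGETER_SEQUENCE = "GUUUUAGAGCUA"                 # 5'-GUUUUAGAGCUA-3'
ACTIVATOR_SEQUENCE = "UAGCAAGUUAAAAUAAGGCUAGUCCG"  # 5'-UAGCAAGUUAAAAUAAGGCUAGUCCG-3'
RNA_ALPHABET = set("ACGU")


def assemble_single_guide(targeting_segment: str, linker: str = "GAAA") -> str:
    """Join targeting segment, targeter sequence, linker and activator
    sequence in 5'->3' order. The default linker is a hypothetical choice."""
    targeting_segment = targeting_segment.upper()
    linker = linker.upper()
    if not targeting_segment:
        raise ValueError("a targeting segment is required")
    guide = targeting_segment + TARGETER_SEQUENCE + linker + ACTIVATOR_SEQUENCE
    # Reject anything that is not a plain RNA base.
    if not set(guide) <= RNA_ALPHABET:
        raise ValueError("sequences must contain only A, C, G and U")
    return guide
```

The targeting segment is passed in already written as RNA complementary to the target; deriving it from a target strand is left out of this sketch.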
[00426] A subject dual guide RNA comprises two separate nucleic acid molecules. Each of the two molecules of a subject dual guide RNA comprises a stretch of nucleotides that are
complementary to one another such that the complementary nucleotides of the two molecules hybridize to form the double stranded RNA duplex of the protein-binding segment.
[00427] In some embodiments, the duplex-forming segment of the activator is 60% or more identical to one of the activator (tracrRNA) molecules set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, the duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 65% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00428] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 70% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00429] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 75% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00430] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 80% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00431] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming segment of the activator) can be 85% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00432] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 90% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00433] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 95% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00434] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 98% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00435] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 99% or more identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00436] The duplex-forming segment of the activator (or the DNA encoding the duplex-forming
segment of the activator) can be 100% identical to one of the tracrRNA sequences set forth in SEQ ID NOs:431-562, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or
more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00437] In some embodiments, the duplex-forming segment of the targeter is 60% or more identical to one of the targeter (crRNA) sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides). For example, the duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 65% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00438] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 70% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00439] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 75% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00440] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 80% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00441] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 85% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00442] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 90% or more identical to one of the crRNA sequences set forth in SEQ ID
NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00443] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 95% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00444] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 98% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00445] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 99% or more identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
[00446] The duplex-forming segment of the targeter (or the DNA encoding the duplex-forming segment of the targeter) can be 100% identical to one of the crRNA sequences set forth in SEQ ID NOs:563-679, or a complement thereof, over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides).
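As a non-limiting illustration of the identity thresholds recited above, the following minimal sketch computes the highest percent identity between a candidate sequence and a reference sequence over any stretch of a given number of contiguous nucleotides. The function name `best_window_identity` is hypothetical, and the ungapped, position-by-position comparison is a simplifying assumption; the disclosure does not prescribe a particular alignment method.

```python
def best_window_identity(query: str, reference: str, window: int = 8) -> float:
    """Return the highest fraction of identical positions found when any
    stretch of `window` contiguous nucleotides of `query` is compared,
    without gaps, against any stretch of the same length in `reference`.

    Assumption: ungapped comparison; real alignment tools may score differently.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(query) < window or len(reference) < window:
        return 0.0  # no stretch of the requested length exists
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, matches / window)
    return best


# Illustrative use: a candidate differing from the reference at its last position.
reference = "GUUUUAGAGCUA"
candidate = "GUUUUAGAGCUG"
identity_8 = best_window_identity(candidate, reference, window=8)
identity_12 = best_window_identity(candidate, reference, window=12)
```

Under these assumptions, a candidate would satisfy the "95% or more identical over 8 or more contiguous nucleotides" criterion when `best_window_identity(candidate, reference, 8) >= 0.95`.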
[00447] A dual guide RNA can be designed to allow for controlled (i.e., conditional) binding of a targeter with an activator. Because a dual Cas9 guide RNA is not functional unless both the activator and the targeter are bound in a functional complex with Cas9, a dual Cas9 guide RNA can be made inducible (e.g., drug inducible) by rendering the binding between the activator and the targeter inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator with the targeter. Accordingly, the activator and/or the targeter can include an RNA aptamer sequence.
[00448] Aptamers (e.g., RNA aptamers) are known in the art and are generally a synthetic version of a riboswitch. The terms "RNA aptamer" and "riboswitch" are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the nucleic
acid molecule (e.g., RNA, DNA/RNA hybrid, etc.) of which they are part. RNA aptamers usually comprise a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part. As non-limiting examples: (i) an activator with an aptamer may not be able to bind to the cognate targeter unless the aptamer is bound by the appropriate drug; (ii) a targeter with an aptamer may not be able to bind to the cognate activator unless the aptamer is bound by the appropriate drug; and (iii) a targeter and an activator, each comprising a different aptamer that binds a different drug, may not be able to bind to each other unless both drugs are present. As illustrated by these examples, a dual Cas9 guide RNA can be designed to be inducible.
[00449] Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May;17(5):344-64; Vavalle et al., Future Cardiol. 2012 May;8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr 15;34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-Jun;3(3):369-84; all of which are herein incorporated by reference in their entirety.
[00450] Non-limiting examples of nucleotide sequences that can be included in a dual Cas9 guide RNA include any of the sequences set forth in SEQ ID NOs:431-562, or complements thereof, pairing with any of the sequences set forth in SEQ ID NOs:563-679, or complements thereof, that can hybridize to form a protein-binding segment.
[00451] Hybrid Cas9 guide RNAs
[00452] In some cases, a Cas9 guide RNA is two RNA molecules (dual guide RNA). In some cases, a Cas9 guide RNA is one RNA molecule (single guide RNA). In some cases, a Cas9 guide RNA is a DNA/RNA hybrid molecule. In such cases, the protein-binding segment of the Cas9 guide RNA is RNA and forms an RNA duplex. Thus, the duplex-forming segments of the activator and the targeter are RNA. However, the targeting segment of a Cas9 guide RNA can be DNA. Thus, if a DNA/RNA hybrid Cas9 guide RNA is a dual Cas9 guide RNA, the "targeter" molecule can be a hybrid molecule (e.g., the targeting segment can be DNA and the duplex-forming segment can be RNA). In such cases, the duplex-forming segment of the "activator" molecule can be RNA (e.g., in order to form an RNA duplex with the duplex-forming segment of the targeter molecule), while nucleotides of the "activator" molecule that are outside of the duplex-forming segment can be DNA (in which case the activator molecule is a hybrid DNA/RNA molecule) or can be RNA (in which case the activator molecule is RNA). If a DNA/RNA hybrid Cas9 guide RNA is a single Cas9 guide RNA, then the targeting segment can be DNA, the duplex-forming segments (which make up the protein-binding segment of the single Cas9 guide RNA) can be RNA, and nucleotides outside of the targeting and duplex-forming segments can be RNA or DNA.
[00453] A DNA/RNA hybrid guide nucleic acid can be useful in some cases, for example, when a target nucleic acid is an RNA. Cas9 normally associates with a guide RNA that hybridizes with a target DNA, thus forming a DNA-RNA duplex at the target site. Therefore, when the target nucleic acid is an RNA, it is sometimes advantageous to recapitulate a DNA-RNA duplex at the target site by using a targeting segment (of the Cas9 guide RNA) that is DNA instead of RNA. However, because the protein-binding segment of a Cas9 guide RNA is an RNA duplex, the targeter molecule is DNA in the targeting segment and RNA in the duplex-forming segment. Hybrid Cas9 guide RNAs can bias Cas9 binding to single stranded target nucleic acids relative to double stranded target nucleic acids.
[00454] Example Cas9 guide RNAs
[00455] In some embodiments, a suitable Cas9 guide RNA comprises two separate RNA polynucleotide molecules. In some cases, the first of the two separate RNA polynucleotide molecules (the activator) comprises a nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562, or a complement thereof. In some cases, the second of the two separate RNA polynucleotide molecules (the targeter) comprises a nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:563-679, or a complement thereof.
[00456] In some embodiments, a suitable Cas9 guide RNA is a single RNA polynucleotide and
comprises a first nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more,
75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides
(e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs:431-562 and a
second nucleotide sequence having 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, 99% or more, or 100%) nucleotide sequence identity over a stretch of 8 or more contiguous nucleotides (e.g., 8 or more contiguous nucleotides, 10 or more contiguous nucleotides, 12 or more contiguous nucleotides, 15 or more contiguous nucleotides, or 20 or more contiguous nucleotides) to any one of the nucleotide sequences set forth in SEQ ID NOs: 463-679.
[00457] In some embodiments, the Cas9 guide RNA is a dual Cas9 guide RNA and the targeter comprises the sequence 5'-GUUUUAGAGCUA-3' (SEQ ID NO:679) linked at its 5' end to a stretch of nucleotides that are complementary to a target nucleic acid. In some embodiments, the Cas9 guide RNA is a dual Cas9 guide RNA and the activator comprises the sequence 5'-UAGCAAGUUAAAAUAAGGCUAGUCCG-3' (SEQ ID NO:397).
[00458] In some embodiments, the Cas9 guide RNA is a single Cas9 guide RNA and comprises the sequence 5'-GUUUUAGAGCUA-linker-UAGCAAGUUAAAAUAAGGCUAGUCCG-3' (SEQ ID NO:680) linked at its 5' end to a stretch of nucleotides that are complementary to a target nucleic acid (where "linker" denotes a linker nucleotide sequence that can comprise any nucleotide sequence). Other exemplary single Cas9 guide RNAs include those set forth in SEQ ID NOs:680-682.
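By way of illustration only, the arrangement described above (targeting nucleotides, followed by the targeter-derived sequence, a linker, and the activator-derived sequence) can be sketched as a simple string assembly. The function name `build_single_guide`, the 20-nucleotide example targeting sequence, and the default `GAAA` linker are hypothetical choices made for this sketch; the disclosure states only that the linker can comprise any nucleotide sequence.

```python
TARGETER_DUPLEX = "GUUUUAGAGCUA"                  # targeter-derived sequence recited above
ACTIVATOR_DUPLEX = "UAGCAAGUUAAAAUAAGGCUAGUCCG"   # activator-derived sequence recited above
RNA_ALPHABET = set("ACGU")


def build_single_guide(targeting_segment: str, linker: str = "GAAA") -> str:
    """Assemble a single guide RNA sequence, written 5' to 3'.

    Assumptions: `targeting_segment` is already the RNA sequence complementary
    to the target; `linker` defaults to a hypothetical 4-nt loop.
    """
    for name, seq in (("targeting_segment", targeting_segment), ("linker", linker)):
        if not seq:
            raise ValueError(f"{name} must not be empty")
        if not set(seq) <= RNA_ALPHABET:
            raise ValueError(f"{name} must contain only A, C, G, U")
    # 5'-[targeting segment][targeter duplex]-linker-[activator duplex]-3'
    return targeting_segment + TARGETER_DUPLEX + linker + ACTIVATOR_DUPLEX


example_guide = build_single_guide("ACGU" * 5)  # hypothetical 20-nt targeting segment
```

The targeting segment is placed at the 5' end, consistent with the statement that the recited sequence is linked at its 5' end to the nucleotides complementary to the target nucleic acid.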
[00459] Stability control sequence (e.g., transcriptional terminator segment)
[00460] In some embodiments, a Cas9 guide RNA comprises a stability control sequence. A stability control sequence influences the stability of a nucleic acid (e.g., a Cas9 guide RNA, a targeter, an activator, etc.). One example of a suitable stability control sequence for use with an RNA is a transcriptional terminator segment (i.e., a transcription termination sequence). A transcriptional terminator segment of a subject Cas9 guide RNA can have a total length of from about 10 nucleotides to about 100 nucleotides, e.g., from about 10 nucleotides (nt) to about 20 nt, from about 20 nt to about 30 nt, from about 30 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. For example, the transcriptional terminator segment can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.
[00461] In some cases, the transcription termination sequence is one that is functional in a eukaryotic cell. In some cases, the transcription termination sequence is one that is functional in a prokaryotic cell.
[00462] Non-limiting examples of nucleotide sequences that can be included in a stability control sequence (e.g., transcriptional termination segment, or in any segment of the Cas9 guide RNA to provide for increased stability) include sequences set forth in SEQ ID NOs:683-696 and, for example, 5'-UAAUCCCACAGCCGCCAGUUCCGCUGGCGGCAUUUU-3' (SEQ ID NO:1349) (a Rho-independent trp termination site).
Additional sequences
[00463] In some embodiments, a Cas9 guide RNA comprises an additional segment or segments (in some cases at the 5' end, in some cases at the 3' end, in some cases at either the 5' or 3' end, in some cases embedded within the sequence (i.e., not at the 5' and/or 3' end), in some cases at both the 5' end and the 3' end, in some cases embedded and at the 5' end and/or the 3' end, etc.). For example, a suitable additional segment can comprise a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (i.e., a 3' poly(A) tail); a ribozyme sequence (e.g., to allow for self-cleavage of a Cas9 guide RNA (or component of a Cas9 guide RNA, e.g., a targeter, an activator, etc.) and release of a mature PAMmer in a regulated fashion); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a sequence that forms a dsRNA duplex (i.e., a hairpin); a sequence that targets an RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., a direct label (e.g., direct conjugation to a fluorescent molecule (i.e., fluorescent dye)), conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, proteins that bind RNA (e.g., RNA aptamers), labeled proteins, fluorescently labeled proteins, and the like); a modification or sequence that provides for increased, decreased, and/or controllable stability; and combinations thereof.
[00464] Examples of various Cas9 guide RNAs can be found in the art, for example, see Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; Briner et al., Mol Cell. 2014 Oct 23;56(2):333-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[00465] Donor polynucleotide
[00466] In some cases, the contacting occurs under conditions that are permissive for nonhomologous end joining or homology-directed repair. In some cases, the method further comprises contacting the target DNA with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not comprise contacting a cell with a donor polynucleotide, and the target DNA is modified such that nucleotides within the target DNA are deleted.
[00467] In cases in which a Cas9 guide RNA and a Cas9 heterodimer are coadministered (e.g., contacted with a target nucleic acid, administered to cells, etc.) with a donor polynucleotide sequence that includes at least a segment with homology to the target DNA sequence, the subject methods may be used to add, i.e., insert or replace, nucleic acid material to a target DNA sequence (e.g., to "knock in" a nucleic acid that encodes a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein, a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, etc.), to modify a nucleic acid sequence (e.g., introduce a mutation), and the like. As such, a complex comprising a Cas9 guide RNA and a Cas9 heterodimer is useful in any in vitro or in vivo application in which it is desirable to modify DNA in a site-specific, i.e., "targeted", way, for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g., to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, the production of genetically modified organisms in agriculture, the large scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of iPS cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.
[00468] In applications in which it is desirable to insert a polynucleotide sequence into a target DNA sequence, a polynucleotide comprising a donor sequence to be inserted is also provided to the cell. By a "donor sequence" or "donor polynucleotide" it is meant a nucleic acid sequence to be inserted at the cleavage site induced by a Cas9 heterodimer. The donor polynucleotide will contain sufficient homology to a genomic sequence at the cleavage site, e.g. 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the cleavage site, e.g. within about 50 bases or less of the cleavage site, e.g. within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the cleavage site, to support homology-directed repair between it and the genomic sequence to which it bears homology. Approximately 25, 50, 100, or 200 nucleotides, or more than 200 nucleotides, of sequence homology between a donor and a genomic sequence (or any integral value between 10 and 200 nucleotides, or more) will support homology-directed repair. Donor sequences can be of any length, e.g. 10 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc.
[00469] The donor sequence is typically not identical to the genomic sequence that it replaces. Rather, the donor sequence may contain one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair. In some embodiments, the donor sequence comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. Donor sequences may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest and that are not intended for insertion into the DNA region of interest. Generally, the homologous region(s) of a donor sequence will have at least 50% sequence identity to a genomic sequence with which recombination is desired. In certain embodiments, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity can be present, depending upon the length of the donor polynucleotide.
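As a non-limiting illustration of a donor sequence comprising a non-homologous sequence flanked by two regions of homology, the following minimal sketch takes the flanking regions directly from the genomic sequence on either side of a cleavage site. The function name `build_donor`, the 25-nucleotide default arm length (one of the approximate homology lengths mentioned above), and the example sequences are hypothetical and chosen only for this sketch.

```python
def build_donor(genome: str, cut_site: int, payload: str, arm_length: int = 25) -> str:
    """Return left homology arm + payload + right homology arm.

    `cut_site` is the 0-based index of the first nucleotide 3' of the cleavage
    site, so the arms immediately flank the cleavage site.
    Assumption: both arms are copied unchanged from `genome` (100% identity).
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if cut_site - arm_length < 0 or cut_site + arm_length > len(genome):
        raise ValueError("genome too short for the requested homology arms")
    left_arm = genome[cut_site - arm_length:cut_site]
    right_arm = genome[cut_site:cut_site + arm_length]
    return left_arm + payload + right_arm


# Hypothetical 60-nt locus cleaved in the middle, with a 6-nt insertion.
locus = "A" * 30 + "C" * 30
donor = build_donor(locus, cut_site=30, payload="GGATCC", arm_length=25)
```

A donor with less than 100% identity in the homologous regions, as contemplated above, could be represented by introducing base changes into `left_arm` or `right_arm` before concatenation.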
[00470] The donor sequence may comprise certain sequence differences as compared to the genomic sequence, e.g., restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), etc., which may be used to assess for successful insertion of the donor sequence at the cleavage site or in some cases may be used for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes which do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences such as FLPs, loxP sequences, or the like, that can be activated at a later time for removal of the marker sequence.
[00471] The donor sequence may be provided to the cell as single-stranded DNA, single-stranded RNA, double-stranded DNA, or double-stranded RNA. It may be introduced into a cell in linear or circular form. If introduced in linear form, the ends of the donor sequence may be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3' terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor sequence, additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination. A donor sequence can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance. Moreover, donor sequences can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus, AAV), as described above for nucleic acids encoding a Cas9 guide RNA and/or a Cas9 heterodimer and/or donor polynucleotide.
[00472] PAMmer (and PAM sequences)
[00473] In some cases, e.g., when a target nucleic acid is single stranded, a PAMmer can be used to provide a PAM sequence. PAMmers can be present in subject compositions, systems, kits, and/or methods.
[00474] A "PAMmer" is a single stranded oligonucleotide (e.g., DNA, RNA, a modified nucleic acid, etc.) that hybridizes to a single stranded target nucleic acid (thus converting the single stranded target nucleic acid into a double stranded target nucleic acid at a desired position), and provides a protospacer adjacent motif (PAM) sequence. For information regarding PAMmers in addition to the discussion below, see, for example, O'Connell et al., Nature. 2014 Dec 11;516(7530):263-6; and Sternberg et al., Nature. 2014 Mar 6;507(7490):62-7; both of which are hereby incorporated by reference in their entirety.
[00475] A PAMmer includes a PAM sequence and at least one of: an orientation segment (which is positioned 3' of the PAM sequence), and a specificity segment (which is positioned 5' of the PAM sequence). A specificity segment has a nucleotide sequence that is complementary to a first target nucleotide sequence in a target nucleic acid (i.e., the sequence that is targeted by the specificity segment), where the first target nucleotide sequence overlaps (in some cases 100%) with the sequence targeted by the targeting segment of the guide nucleic acid. In other words, the specificity segment is complementary with (and hybridizes to) the target site of the target nucleic acid (see Figures 8A-F).
[00476] In some cases, a PAMmer having a specificity segment is referred to herein as a "5' extended PAMmer." The term "5' extended PAMmer" refers to a situation in which a PAMmer includes nucleotides 5' of the PAM sequence. The term "5' extended PAMmer" encompasses a PAMmer having a specificity segment, but also encompasses a PAMmer that has nucleotides 5' of the PAM sequence that do not constitute a specificity segment. Thus, in some cases, the nucleotides that are 5' of the PAM sequence constitute a specificity segment (i.e., the nucleotides hybridize to the target nucleic acid) (see below for a more detailed discussion regarding a specificity segment), and in some cases, a PAMmer has nucleotides that are 5' of the PAM sequence that do not constitute a specificity segment (do not hybridize with the target nucleic acid).
[00477] An orientation segment has a nucleotide sequence that is complementary to a second target nucleotide sequence in a target nucleic acid (i.e., the sequence that is targeted by the orientation segment). In some cases, a subject PAMmer includes a PAM sequence and an orientation segment, but does not include a specificity segment. In some cases, a subject PAMmer includes a PAM sequence and a specificity segment, but does not include an orientation segment.
[00478] In some cases, a subject PAMmer includes a PAM sequence, an orientation segment, and a specificity segment. The number of nucleotides (nt) present in the PAMmer between a specificity segment and an orientation segment can depend on a number of factors that include, but are not limited to: the length of the PAM sequence (which is present between the specificity segment and the orientation segment); the number of nucleotides present between the target site and the orientation site of the target nucleic acid; the presence or absence of additional sequences (e.g., aptamers, protein binding sequences, linker nucleotides, stability sequences, etc.) between the specificity segment and the orientation segment; etc. In some embodiments, the number of nucleotides (nt) present in the PAMmer between a specificity segment and an orientation segment is in a range of from 2 nt to 100 nt (e.g., 2 nt to 90 nt, 2 nt to 80 nt, 2 nt to 70 nt, 2 nt to 60 nt, 2 nt to 50 nt, 2 nt to 40 nt, 2 nt to 30 nt, 2 nt to 25 nt, 2 nt to 20 nt, 2 nt to 15 nt, or 2 nt to 10 nt). In some embodiments, the number of nucleotides (nt) present in the PAMmer between the specificity segment and the orientation segment is 100 nt or less (e.g., 90 nt or less, 80 nt or less, 70 nt or less, 60 nt or less, 50 nt or less, 40 nt or less, 30 nt or less, 25 nt or less, 20 nt or less, 15 nt or less, or 10 nt or less).
[00479] In some embodiments, the PAM sequence is immediately adjacent to the orientation segment, immediately adjacent to the specificity segment, and/or immediately adjacent to both the orientation segment and the specificity segment. In some embodiments, the number of nucleotides (nt) present in the PAMmer between the PAM sequence and the specificity segment of the PAMmer is in a range of from 0 nt to 10 nt (e.g., 0 nt to 9 nt, 0 nt to 8 nt, 0 nt to 7 nt, 0 nt to 6 nt, 0 nt to 5 nt, 0 nt to 4 nt, 0 nt to 3 nt, 1 nt to 9 nt, 1 nt to 8 nt, 1 nt to 7 nt, 1 nt to 6 nt, 1 nt to 5 nt, 1 nt to 4 nt, 1 nt to 3 nt, 2 nt to 9 nt, 2 nt to 8 nt, 2 nt to 7 nt, 2 nt to 6 nt, 2 nt to 5 nt, 2 nt to 4 nt, or 2 nt to 3 nt). In some embodiments, 10 or less nt (e.g., 9 or less nt, 8 or less nt, 7 or less nt, 6 or less nt, 5 or less nt, 4 or less nt, 3 or less nt, 2 or less nt, 1 or less nt, or no nt) are present in the PAMmer between the PAM sequence and the specificity segment. In some embodiments, the number of nucleotides (nt) present in the PAMmer between the PAM sequence and the orientation segment of the PAMmer is in a range of from 0 nt to 10 nt (e.g., 0 nt to 9 nt, 0 nt to 8 nt, 0 nt to 7 nt, 0 nt to 6 nt, 0 nt to 5 nt, 0 nt to 4 nt, 0 nt to 3 nt, 1 nt to 9 nt, 1 nt to 8 nt, 1 nt to 7 nt, 1 nt to 6 nt, 1 nt to 5 nt, 1 nt to 4 nt, 1 nt to 3 nt, 2 nt to 9 nt, 2 nt to 8 nt, 2 nt to 7 nt, 2 nt to 6 nt, 2 nt to 5 nt, 2 nt to 4 nt, or 2 nt to 3 nt). In some embodiments, 10 or less nt (e.g., 9 or less nt, 8 or less nt, 7 or less nt, 6 or less nt, 5 or less nt, 4 or less nt, 3 or less nt, 2 or less nt, 1 or less nt, or no nt) are present in the PAMmer between the PAM sequence and the orientation segment.
[00480] In some embodiments, a PAMmer has a length (e.g., the PAM sequence and the orientation segment have a combined length) in a range of from 2 nt to 100 nt (e.g., 2 nt to 70 nt, 2 nt to 50 nt, 2 nt to 45 nt, 2 nt to 40 nt, 2 nt to 35 nt, 2 nt to 30 nt, 2 nt to 25 nt, 2 nt to 20 nt, 2 nt to 10 nt, 2 nt to 5 nt, 3 nt to 70 nt, 3 nt to 50 nt, 3 nt to 45 nt, 3 nt to 40 nt, 3 nt to 35 nt, 3 nt to 30 nt, 3 nt to 25 nt, 3 nt to 20 nt, 3 nt to 10 nt, 3 nt to 5 nt, 5 nt to 70 nt, 5 nt to 50 nt, 5 nt to 45 nt, 5 nt to 40 nt, 5 nt to 35 nt, 5 nt to 30 nt, 5 nt to 25 nt, 5 nt to 20 nt, 10 nt to 70 nt, 10 nt to 50 nt, 10 nt to 45 nt, 10 nt to 40 nt, 10 nt to 35 nt, 10 nt to 30 nt, 10 nt to 25 nt, 10 nt to 20 nt, 10 nt to 15 nt, 15 nt
to 70 nt, 15 nt to 50 nt, 15 nt to 45 nt, 15 nt to 40 nt, 15 nt to 35 nt, 15 nt to 30 nt, 15 nt to 25 nt, or 15 nt to 20 nt).
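By way of illustration only, the PAMmer layout described above (an optional specificity segment positioned 5' of the PAM sequence, and an optional orientation segment positioned 3' of the PAM sequence, with at least one of the two present) can be sketched as follows. The function name `build_pammer` and the example sequences are hypothetical; the length check uses the 2 nt to 100 nt range recited above, and the segments are assumed to be immediately adjacent to the PAM sequence.

```python
def build_pammer(pam: str, specificity: str = "", orientation: str = "") -> str:
    """Assemble a PAMmer sequence, written 5' to 3'.

    Layout: [specificity segment] + [PAM sequence] + [orientation segment].
    Assumption: no additional nucleotides between the PAM and either segment.
    """
    if not pam:
        raise ValueError("a PAM sequence is required")
    if not specificity and not orientation:
        raise ValueError("at least one of specificity or orientation is required")
    pammer = specificity + pam + orientation
    if not 2 <= len(pammer) <= 100:
        raise ValueError("PAMmer length outside the 2 nt to 100 nt range")
    return pammer


# Hypothetical PAMmers carrying a 5'-NGG-3' type PAM (here TGG).
orientation_only = build_pammer("TGG", orientation="ACGTACGT")
both_segments = build_pammer("TGG", specificity="AAAA", orientation="CCCC")
```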
[00481] In some cases, a PAMmer is a DNA molecule. In some cases, a PAMmer is an RNA molecule. In some cases, a PAMmer is a hybrid DNA/RNA molecule (e.g., in some cases, at least the PAM sequence of the PAMmer is DNA). In some cases the PAMmer has one or more modified nucleic acids (described in more detail below with respect to nucleic acid modifications). In some embodiments, a subject PAMmer has one or more nucleotides that are 2'-O-Methyl modified nucleotides. In some embodiments, a subject PAMmer has one or more 2' Fluoro modified nucleotides. In some embodiments, a subject PAMmer has one or more LNA bases. In some embodiments, a subject PAMmer has one or more nucleotides that are linked by a phosphorothioate bond (i.e., the subject nucleic acid has one or more phosphorothioate linkages). In some embodiments, a subject PAMmer has a 5' cap (e.g., a 7-methylguanylate cap (m7G)). In some embodiments, a subject PAMmer has a combination of modified nucleotides. For example, a subject PAMmer can have a 5' cap (e.g., a 7-methylguanylate cap (m7G)) in addition to having one or more nucleotides with other modifications (e.g., a 2'-O-Methyl nucleotide and/or a 2' Fluoro modified nucleotide and/or a LNA base and/or a phosphorothioate linkage).
[00482] PAM sequence
[00483] A wild type Cas9 polypeptide normally has nuclease activity that cleaves a target nucleic acid (e.g., a double stranded DNA (dsDNA)) at a target site defined by the region of complementarity between the targeting segment of the guide nucleic acid and the target nucleic acid. In some cases, site-specific cleavage of the target nucleic acid occurs at locations determined by both (i) base-pairing complementarity between the guide nucleic acid and the target nucleic acid; and (ii) a short motif referred to as the protospacer adjacent motif (PAM) in the target nucleic acid. When a Cas9 polypeptide binds to (in some cases cleaves) a dsDNA target nucleic acid, the PAM sequence that is recognized (bound) by the Cas9 polypeptide is present on the non-complementary strand (the strand that does not hybridize with the targeting segment of the guide nucleic acid) of the target DNA. Thus, when a Cas9 polypeptide binds to (in some cases cleaves) a single stranded target nucleic acid, no PAM sequence is present because there is no non-complementary strand (see Figure 8). A subject PAMmer provides a PAM sequence, which is positioned near the target site (the sequence targeted by the targeting segment of the guide nucleic acid) by the orientation segment and/or the specificity segment of the PAMmer.
[00484] In some embodiments, the PAM sequence of the PAMmer is complementary to (i.e., hybridizes with) the target nucleic acid. In some embodiments, the PAM sequence of the PAMmer is not complementary to (i.e., does not hybridize with) the target nucleic acid. In some embodiments, a PAM sequence of a PAMmer has a length in a range of from 1 nt to 15 nt (e.g., 1 nt to 14 nt, 1 nt to 13 nt, 1 nt to 12 nt, 1 nt to 11 nt, 1 nt to 10 nt, 1 nt to 9 nt, 1 nt to 8 nt, 1 nt to 7 nt, 1 nt to 6 nt, 1 nt to 5 nt, 1 nt to 4 nt, 1 nt to 3 nt, 2 nt to 15 nt, 2 nt to 14 nt, 2 nt to 13 nt, 2 nt to 12 nt, 2 nt to 11 nt, 2 nt to 10 nt, 2 nt to 9 nt, 2 nt to 8 nt, 2 nt to 7 nt, 2 nt to 6 nt, 2 nt to 5 nt, 2 nt to 4 nt, 2 nt to 3 nt, 2 nt, or 3 nt).
[00485] In some embodiments (e.g., when the Cas9 heterodimer is derived from S. pyogenes or a closely related Cas9 is used; see for example, Chylinski et al., RNA Biol. 2013 May;10(5):726-37; and Jinek et al., Science. 2012 Aug 17;337(6096):816-21; both of which are hereby incorporated by reference in their entirety), a PAM sequence (e.g., of a target nucleic acid, of a PAMmer, etc.) can be GG (5'-GG-3'), or can be 5'-NGG-3', where N is any nucleotide. In some embodiments (e.g., when a Cas9 heterodimer is derived from the Cas9 protein of Neisseria meningitidis or a closely related Cas9 is used), the PAM sequence (e.g., of a target nucleic acid, of a PAMmer, etc.) can be 5'-NNNNGANN-3', 5'-NNNNGTTN-3', 5'-NNNNGNNT-3', 5'-NNNNGTNN-3', 5'-NNNNGNTN-3', or 5'-NNNNGATT-3', where N is any nucleotide. In some embodiments (e.g., when a Cas9 heterodimer is derived from Streptococcus thermophilus #1 or a closely related Cas9 is used), the PAM sequence (e.g., of a target nucleic acid, of a PAMmer, etc.) can be 5'-NNAGAA-3', 5'-NNAGGA-3', 5'-NNGGAA-3', 5'-NNANAA-3', or 5'-NNGGGA-3', where N is any nucleotide. In some embodiments (e.g., when a Cas9 heterodimer is derived from Treponema denticola (TD) or a closely related Cas9 is used), the PAM sequence (e.g., of a target nucleic acid, of a PAMmer, etc.) can be 5'-NAAAAN-3', 5'-NAAAAC-3', 5'-NAAANC-3', 5'-NANAAC-3', or 5'-NNAAAC-3', where N is any nucleotide. As would be known by one of ordinary skill in the art, additional PAM sequences for other Cas9 polypeptides can readily be determined using bioinformatic analysis (e.g., analysis of genomic sequencing data). See Esvelt et al., Nat Methods. 2013 Nov;10(11):1116-21, for additional information.
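The PAM patterns listed above can be read as simple wildcard patterns in which N stands for any nucleotide. The following is a minimal sketch, not part of the disclosure, of how a candidate sequence might be checked against those patterns. The dictionary labels, the function names, and the restriction to the DNA letters A, C, G and T are assumptions made for illustration only.

```python
import re

# PAM patterns quoted from the paragraph above (5' to 3').
# The dictionary keys are illustrative labels, not terms defined by the disclosure.
PAM_PATTERNS = {
    "S. pyogenes": ["GG", "NGG"],
    "N. meningitidis": ["NNNNGANN", "NNNNGTTN", "NNNNGNNT",
                        "NNNNGTNN", "NNNNGNTN", "NNNNGATT"],
    "S. thermophilus #1": ["NNAGAA", "NNAGGA", "NNGGAA", "NNANAA", "NNGGGA"],
    "T. denticola": ["NAAAAN", "NAAAAC", "NAAANC", "NANAAC", "NNAAAC"],
}


def pam_to_regex(pattern):
    """Translate a PAM pattern into a regex; N matches any of A, C, G, T (assumption)."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in pattern))


def matches_pam(candidate, pattern):
    """Return True if the whole candidate sequence fits the PAM pattern."""
    return pam_to_regex(pattern).fullmatch(candidate.upper()) is not None


def cas9_sources_recognizing(candidate):
    """List the labels whose quoted PAM patterns the candidate satisfies."""
    return [label for label, patterns in PAM_PATTERNS.items()
            if any(matches_pam(candidate, p) for p in patterns)]
```

For instance, `matches_pam("TGG", "NGG")` evaluates to true, while `matches_pam("TGA", "NGG")` does not. The full-length match is a simplification: it checks a candidate PAM in isolation rather than scanning a longer PAMmer for an embedded PAM.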
[00486] Also as known in the art, the PAM-interacting domain can be derived from a Cas9 protein from a first species, and the PAM sequence can correspond to that domain. Thus, in some cases, a subject Cas9 heterodimer has a PAM-interacting domain that is derived from a Cas9 protein of a first species, and other portions of the Cas9 heterodimer (e.g., the rest of the Cas9 heterodimer) can be derived from the Cas9 protein of a second species.
[00487] Specificity segment
[00488] A specificity segment can be present or absent in a subject PAMmer (the PAMmer has a specificity segment, an orientation segment, or both a specificity segment and an orientation segment), and when present, the specificity segment is positioned 5' of the PAM sequence. As noted above, in some cases, a PAMmer having a specificity segment is referred to herein as a "5'-extended PAMmer." The specificity segment hybridizes to (i.e., targets) a sequence of a target nucleic acid that overlaps with the target site such that the PAM sequence is positioned near the target site (i.e., the sequence of the target nucleic acid that is targeted by the targeting segment of the guide nucleic acid). Thus, the PAMmer provides a PAM sequence at any desired location within a target nucleic acid (e.g., by designing the specificity segment of the PAMmer to hybridize to any desired nucleotide sequence of the target nucleic acid).
[00489] In cases where a PAMmer is used in a method of cleavage, the targeting segment of the guide nucleic acid (which associates with a Cas9 heterodimer) is complementary to the target nucleic acid, and this is true whether or not the PAMmer has a specificity segment. In cases where a PAMmer is used in a method of binding, the targeting segment of the guide nucleic acid (which associates with a Cas9 heterodimer) is complementary to the target nucleic acid when the PAMmer has a specificity segment, but the targeting segment of the guide nucleic acid need not be complementary to the target nucleic acid when the PAMmer does not have a specificity segment (i.e., when the PAMmer has a PAM sequence and an orientation segment, but not a specificity segment).
[00490] A specificity segment can have a length of from 3 nucleotides (nt) to 100 nt (e.g., from 3 nt to 80 nt, from 3 nt to 50nt, from 3 nt to 40 nt, from 5 nt to 40 nt, from 5 nt to 35 nt, from 5 nt to 30 nt, from 5 nt to 25 nt, from 10 nt to 40 nt, from 10 nt to 35 nt, from 10 nt to 30 nt, from 10 nt to 25 nt, from 10 nt to 20 nt, from 12 nt to 40 nt, from 12 nt to 35 nt, from 12 nt to 30 nt, from 12 nt to 25 nt, from 12 nt to 20 nt, from 15 nt to 40 nt, from 15 nt to 35 nt, from 15 nt to 30 nt, from 15 nt to 25 nt, from 15 nt to 20 nt, from 17 nt to 40 nt, from 17 nt to 35 nt, from 17 nt to 30 nt, from 17 nt to 25 nt, from 17 nt to 20 nt, from 18 nt to 40 nt, from 18 nt to 35 nt, from 18 nt to 30 nt, from 18 nt to 25 nt, from 18 nt to 20 nt, from 20 nt to 40 nt, from 20 nt to 35 nt, from 20 nt to 30 nt, or from 20 nt to 25 nt). In some cases, the specificity segment is 20 nucleotides in length. In some cases, the specificity segment is 19 nucleotides in length.
[00491] The percent complementarity between the specificity segment and the sequence of the target nucleic acid targeted by the specificity segment (e.g., the target site, i.e., the site targeted by the targeting segment of the guide nucleic acid) can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the specificity segment and the sequence of the target nucleic acid targeted by the specificity segment is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 10 to 30 contiguous nucleotides (nt) (e.g., 15 to 30 contiguous nt, 15 to 25 contiguous nt, 17 to 30 contiguous nt, 17 to 25 contiguous nt, or 18 to 22 contiguous nt). In some cases, the percent complementarity between the specificity segment and the sequence of the target nucleic acid targeted by the specificity segment is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 10 or more contiguous nucleotides (nt) (e.g., 12 or more contiguous nt, 15 or more contiguous nt, 17 or more contiguous nt, 18 or more contiguous nt, 19 or more contiguous nt, or 20 or more contiguous nt).
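The percent complementarity thresholds above can be made concrete with a small calculation. The sketch below is illustrative only and rests on several assumptions that the disclosure does not state: both sequences are written 5' to 3' and have equal length, only Watson-Crick pairs are counted (no G-U wobble), and U is treated as T so that RNA and DNA can be compared. The function names are hypothetical.

```python
# Watson-Crick pairs only; wobble pairing is deliberately ignored (assumption).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(segment, target_region):
    """Percent of positions at which the segment pairs with the target region.

    Both sequences are given 5' to 3'. Hybridization is antiparallel, so the
    segment is compared against the target region read in reverse.
    """
    seg = segment.upper().replace("U", "T")
    tgt = target_region.upper().replace("U", "T")
    if not seg or len(seg) != len(tgt):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((s, t) in WATSON_CRICK for s, t in zip(seg, reversed(tgt)))
    return 100.0 * paired / len(seg)


def meets_threshold(segment, target_region, minimum_percent=60.0):
    """Check the '60% or more' style criterion for a chosen minimum."""
    return percent_complementarity(segment, target_region) >= minimum_percent
```

Under these assumptions, a 4 nt segment `AAAA` compared with the target region `TTTA` pairs at three of four positions, giving 75%, which satisfies a 60% minimum but not an 80% minimum.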
[00492] In some cases, the sequence targeted by the specificity segment of a PAMmer is 100% identical to the target site (i.e., the sequence targeted by the targeting segment of the guide nucleic acid). However, the sequence targeted by the specificity segment of a PAMmer need not be 100% identical to the target site. For example, in some cases, the sequence targeted by the specificity segment of a PAMmer overlaps with the sequence targeted by the targeting segment of the guide nucleic acid, but the overlap is not 100%. For example, the sequence targeted by the specificity segment of a PAMmer can be a subset of the target site. In some cases, the sequence targeted by the specificity segment of a PAMmer is shorter than the sequence targeted by the targeting segment of the guide nucleic acid. In some cases, the sequence targeted by the specificity segment of a PAMmer is longer than the sequence targeted by the targeting segment of the guide nucleic acid. In some cases, the sequence targeted by the specificity segment of a PAMmer is the same length as the sequence targeted by the targeting segment of the guide nucleic acid.
[00493] In some cases, the sequence targeted by the specificity segment of a PAMmer shares 2 nucleotides (nt) or more with the sequence targeted by the targeting segment of the guide nucleic acid (e.g., 3 nt or more, 5 nt or more, 8 nt or more, 10 nt or more, 12 nt or more, 15 nt or more, 18 nt or more, etc.). In some cases, the sequence targeted by the specificity segment of a PAMmer shares 2 nucleotides (nt) to 30 nt with the sequence targeted by the targeting segment of the guide nucleic acid (e.g., 5 nt to 30 nt, 5 nt to 25 nt, 5 nt to 22 nt, 8 nt to 30 nt, 8 nt to 25 nt, 8 nt to 22 nt, 8 nt to 20 nt, 10 nt to 30 nt, 10 nt to 25 nt, 10 nt to 22 nt, 10 nt to 20 nt, 12 nt to 30 nt, 12 nt to 25 nt, 12 nt to 22 nt, 12 nt to 20 nt, 15 nt to 30 nt, 15 nt to 25 nt, 15 nt to 22 nt, 15 nt to 20 nt, 18 nt to 30 nt, 18 nt to 25 nt, 18 nt to 22 nt, or 18 nt to 20 nt).
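The number of nucleotides shared between the sequence targeted by the specificity segment and the sequence targeted by the targeting segment can be pictured as the overlap of two intervals on the target nucleic acid. The following sketch assumes, purely for illustration, that each targeted sequence is described by half-open coordinates `(start, end)` on the target; the coordinate convention and the function name are not taken from the disclosure.

```python
def shared_nucleotides(specificity_site, target_site):
    """Count positions common to two half-open (start, end) intervals."""
    start = max(specificity_site[0], target_site[0])
    end = min(specificity_site[1], target_site[1])
    return max(0, end - start)
```

With these conventions, a specificity-segment site at positions 5 to 25 and a target site at positions 0 to 20 share 15 nt, which falls within the 2 nt to 30 nt range recited above.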
[00494] In some embodiments, a PAMmer has a specificity segment, but does not have an orientation segment (i.e., the PAMmer does not have a nucleotide sequence 3' of the PAM sequence that hybridizes with the target nucleic acid) (Figure 8E). In some such cases, the PAM sequence can be at the 3' end of the PAMmer (i.e., the PAMmer can have 0 nucleotides 3' of the PAM sequence), or the PAMmer can have 1 or more nucleotides (nt) 3' of the PAM sequence (e.g., 2 or more nt, 3 or more nt, 4 or more nt, 5 or more nt, 10 or more nt, 15 or more nt, 20 or more nt, etc.), where the nucleotides 3' of the PAM sequence do not hybridize to the target nucleic acid. In some cases in which a PAMmer does not have an orientation segment, a PAMmer can have a nucleotide sequence, 3' of the PAM sequence, with a length in a range of from 1 nucleotide (nt) to 20 nt (e.g., from 1 nt to 18 nt, from 1 nt to 16 nt, from 1 nt to 14 nt, from 1 nt to 12 nt, from 1 nt to 10 nt, from 1 nt to 9 nt, from 1 nt to 8 nt, from 1 nt to 7 nt, from 1 nt to 6 nt, from 1 nt to 5 nt, from 1 nt to 4 nt, or from 1 nt to 3 nt), where the nucleotides 3' of the PAM sequence do not hybridize to the target nucleic acid. For example, if a PAMmer has nucleotides 3' of the PAM sequence that do hybridize to the target nucleic acid, then the nucleotides that hybridize would be considered an (or part of an) orientation segment.
[00495] In some cases, the length of the specificity segment inversely correlates with efficiency of the cleavage reaction and positively correlates with specificity (i.e., reduction of off-target effects). Thus, there can be a trade-off between the desired level of cleavage and the desired level of specificity. The presence (as well as the length) of a specificity segment can be determined based on the particular target nucleic acid, the nature/purpose of the method, and/or the desired outcome. For example, if maximum specificity is desired, but cleavage efficiency is not a concern, then a long specificity segment may be desirable. On the other hand, if maximum cleavage is desired, but specificity is not a concern (e.g., the orientation segment of the PAMmer provides for adequate specificity), then a shorter specificity segment (e.g., no specificity segment) may be desirable.
[00496] For methods of binding, the presence of a specificity segment can increase binding specificity. Not to be bound by theory, it is believed that this is because the specificity segment provides an energetic barrier to binding that can be overcome by the presence of a targeting segment in the guide nucleic acid that has complementarity to (i.e., can hybridize with) that target nucleic acid, thus displacing the specificity segment of the PAMmer.
[00497] Orientation segment
[00498] An orientation segment can be present or absent in a subject PAMmer (the PAMmer has a specificity segment, an orientation segment, or both a specificity segment and an orientation segment), and when present, the orientation segment is positioned 3' of the PAM sequence. The orientation segment hybridizes to (i.e., targets) a sequence of a target nucleic acid (the orientation site) such that the PAM sequence is positioned near the target site (i.e., the sequence of the target nucleic acid that is targeted by the targeting segment of the guide nucleic acid). Thus, the PAMmer provides a PAM sequence at any desired location within a target nucleic acid (e.g., by designing the orientation segment of the PAMmer to hybridize to any desired nucleotide sequence of the target nucleic acid).
[00499] The orientation segment can have a length of from 3 nucleotides (nt) to 100 nt (e.g., from 3 nt to 80 nt, from 3 nt to 50 nt, from 3 nt to 40 nt, from 5 nt to 40 nt, from 5 nt to 35 nt, from 5 nt to 30 nt, from 5 nt to 25 nt, from 10 nt to 40 nt, from 10 nt to 35 nt, from 10 nt to 30 nt, from 10 nt to 25 nt, from 10 nt to 20 nt, from 12 nt to 40 nt, from 12 nt to 35 nt, from 12 nt to 30 nt, from 12 nt to 25 nt, from 12 nt to 20 nt, from 15 nt to 40 nt, from 15 nt to 35 nt, from 15 nt to 30 nt, from 15 nt to 25 nt, from 15 nt to 20 nt, from 17 nt to 40 nt, from 17 nt to 35 nt, from 17 nt to 30 nt, from 17 nt to 25 nt, from 17 nt to 20 nt, from 18 nt to 40 nt, from 18 nt to 35 nt, from 18 nt to 30 nt, from 18 nt to 25 nt, from 18 nt to 20 nt, from 20 nt to 40 nt, from 20 nt to 35 nt, from 20 nt to 30 nt, or from 20 nt to 25 nt). In some cases, the orientation segment is 20 nucleotides in length. In some cases, the orientation segment is 19 nucleotides in length.
[00500] The percent complementarity between the orientation segment and the sequence of the target nucleic acid targeted by the orientation segment can be 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the orientation segment and the sequence of the target nucleic acid targeted by the orientation segment is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over about 10 to 30 contiguous nucleotides (nt) (e.g. 15 to 30 contiguous nt, 15 to 25 contiguous nt, 17 to 30 contiguous nt, 17 to 25 contiguous nt, or 18 to 22 contiguous nt). In some cases, the percent complementarity between the orientation segment and the sequence of the target nucleic acid targeted by the orientation segment is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 10 or more contiguous nucleotides (nt) (e.g. 12 or more contiguous nt, 15 or more contiguous nt, 17 or more contiguous nt, 18 or more contiguous nt, 19 or more contiguous nt, or 20 or more contiguous nt).
[00501] In some cases, the sequence targeted by the orientation segment of a PAMmer is immediately adjacent to the sequence targeted by the targeting segment of the guide nucleic acid. In some embodiments, 10 or fewer nt (e.g., 9 or fewer nt, 8 or fewer nt, 7 or fewer nt, 6 or fewer nt, 5 or fewer nt, 4 or fewer nt, 3 or fewer nt, 2 or fewer nt, 1 or fewer nt, or no nt) are present in the target nucleic acid between the sequence targeted by the targeting segment of the guide nucleic acid (i.e., the target site) and the sequence targeted by the orientation segment of the PAMmer. In some cases, the sequence of the target nucleic acid that is targeted by the orientation segment of a PAMmer is within 10 or fewer nucleotides (nt) (e.g., 9 or fewer nt, 8 or fewer nt, 7 or fewer nt, 6 or fewer nt, 5 or fewer nt, 4 or fewer nt, 3 or fewer nt, 2 or fewer nt, 1 or fewer nt, or no nt) of the sequence targeted by the targeting segment of the guide nucleic acid. In some embodiments, the number of nucleotides (nt) present in the target nucleic acid between the sequence targeted by the targeting segment of the guide nucleic acid (i.e., the target site) and the sequence targeted by the orientation segment of the PAMmer is in a range of from 0 nt to 10 nt (e.g., 0 nt to 9 nt, 0 nt to 8 nt, 0 nt to 7 nt, 0 nt to 6 nt, 0 nt to 5 nt, 0 nt to 4 nt, 0 nt to 3 nt, 1 nt to 9 nt, 1 nt to 8 nt, 1 nt to 7 nt, 1 nt to 6 nt, 1 nt to 5 nt, 1 nt to 4 nt, 1 nt to 3 nt, 2 nt to 9 nt, 2 nt to 8 nt, 2 nt to 7 nt, 2 nt to 6 nt, 2 nt to 5 nt, 2 nt to 4 nt, or 2 nt to 3 nt).
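The spacing between the target site and the orientation site described above can be sketched as the gap between two non-overlapping intervals on the target nucleic acid. The half-open `(start, end)` coordinates, the function names, and the default limit of 10 nt (taken from the broadest range recited above) are illustrative assumptions rather than part of the disclosure.

```python
def gap_between_sites(target_site, orientation_site):
    """Number of nucleotides lying between two half-open (start, end) intervals.

    Returns 0 when the sites are immediately adjacent or overlap.
    """
    left_end = min(target_site[1], orientation_site[1])
    right_start = max(target_site[0], orientation_site[0])
    return max(0, right_start - left_end)


def spacing_is_acceptable(target_site, orientation_site, max_gap_nt=10):
    """Check the 'from 0 nt to 10 nt' style criterion for a chosen maximum."""
    return gap_between_sites(target_site, orientation_site) <= max_gap_nt
```

For example, a target site at positions 0 to 20 and an orientation site at positions 23 to 43 are separated by 3 nt, and sites at 0 to 20 and 20 to 40 are immediately adjacent (0 nt).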
[00502] In some cases, a PAMmer has an orientation segment, but does not have a specificity segment (i.e., the PAMmer does not have a nucleotide sequence 5' of the PAM sequence that hybridizes with the target nucleic acid) (Figure 8F). In some such cases, the PAM sequence can be at the 5' end of the PAMmer (i.e., the PAMmer can have 0 nucleotides 5' of the PAM sequence), or the PAMmer can have 1 or more nucleotides (nt) 5' of the PAM sequence (e.g., 2 or more nt, 3 or more nt, 4 or more nt, 5 or more nt, 10 or more nt, 15 or more nt, 20 or more nt, etc.), where the nucleotides 5' of the PAM sequence do not hybridize to the target nucleic acid. In some cases in which a PAMmer does not have a specificity segment, a PAMmer can have a nucleotide sequence, 5' of the PAM sequence, with a length in a range of from 1 nucleotide (nt) to 20 nt (e.g., from 1 nt to 18 nt, from 1 nt to 16 nt, from 1 nt to 14 nt, from 1 nt to 12 nt, from 1 nt to 10 nt, from 1 nt to 9 nt, from 1 nt to 8 nt, from 1 nt to 7 nt, from 1 nt to 6 nt, from 1 nt to 5 nt, from 1 nt to 4 nt, or from 1 nt to 3 nt), where the nucleotides 5' of the PAM sequence do not hybridize to the target nucleic acid. For example, if a PAMmer has nucleotides 5' of the PAM sequence that do hybridize to the target nucleic acid, then the nucleotides that hybridize would be considered a (or part of a) specificity segment.
[00503] In some cases (e.g., those involving methods of binding, where the PAMmer does not have a specificity segment), the target site of the target nucleic acid can be determined by the orientation segment of the PAMmer and not by the targeting segment of the guide nucleic acid. In some cases, the targeting segment of the guide nucleic acid does not have complementarity to a nucleotide sequence of the target nucleic acid. In some cases, the targeting segment of the guide nucleic acid does not have complementarity to a nucleotide sequence of the target nucleic acid that is near (e.g., within 20 or fewer nucleotides (nt), within 30 or fewer nt, within 40 or fewer nt, within 50 or fewer nt, within 60 or fewer nt, within 70 or fewer nt, within 80 or fewer nt, within 90 or fewer nt, or within 100 or fewer nt) the orientation site. However, the orientation segment of the PAMmer still positions the PAM sequence of the PAMmer such that the target nucleic acid can still be bound and/or cleaved by a subject Cas9 polypeptide (Cas9 heterodimer).
METHODS
[00504] A Cas9 heterodimer of the present disclosure finds use in a variety of methods. A subject Cas9 heterodimer (split Cas9) can be used in any method in which a Cas9 protein can be used. For example, a Cas9 heterodimer (a split Cas9, in some cases a variant split Cas9 protein) can be used to (i) modify (e.g., cleave, e.g., nick; methylate; etc.) target nucleic acid (DNA or RNA; single stranded or double stranded); (ii) modulate transcription of a target nucleic acid; (iii) label a target nucleic acid (e.g., using a PAMmer); (iv) bind a target nucleic acid (e.g., for purposes of isolation, labeling, imaging, tracking, etc.); (v) modify a polypeptide (e.g., a histone) associated with a target nucleic acid; and the like. Because a method that uses a Cas9 heterodimer includes binding of the heterodimer to a particular region in a target nucleic acid (by virtue of being targeted there by an associated Cas9 guide RNA), the methods are generally referred to herein as methods of binding (e.g., a method of binding a target nucleic acid). However, it is to be understood that in some cases, while a method of binding may result in nothing more than binding of the target nucleic acid, in other cases, the method can have different final results (e.g., the method can result in modification of the target nucleic acid, e.g., cleavage/methylation/etc., modulation of transcription from the target nucleic acid, modulation of translation of the target nucleic acid, genome editing, modulation of a protein associated with the target nucleic acid, isolation of the target nucleic acid, etc.). For examples of suitable methods, Cas9 variants, guide RNAs, etc., see, for example, Jinek et al., Science. 2012 Aug 17;337(6096):816-21; Chylinski et al., RNA Biol. 2013 May;10(5):726-37; Ma et al., Biomed Res Int. 2013;2013:270805; Hou et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15644-9; Jinek et al., Elife. 2013;2:e00471; Pattanayak et al., Nat Biotechnol. 2013 Sep;31(9):839-43; Qi et al., Cell. 2013 Feb 28;152(5):1173-83; Wang et al., Cell. 2013 May 9;153(4):910-8; Auer et al., Genome Res. 2013 Oct 31; Chen et al., Nucleic Acids Res. 2013 Nov 1;41(20):e19; Cheng et al., Cell Res. 2013 Oct;23(10):1163-71; Cho et al., Genetics. 2013 Nov;195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 Apr;41(7):4336-43; Dickinson et al., Nat Methods. 2013 Oct;10(10):1028-34; Ebina et al., Sci Rep. 2013;3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov 1;41(20):e187; Hu et al., Cell Res. 2013 Nov;23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov 1;41(20):e188; Larson et al., Nat Protoc. 2013 Nov;8(11):2180-96; Mali et al., Nat Methods. 2013 Oct;10(10):957-63; Nakayama et al., Genesis. 2013 Dec;51(12):835-43; Ran et al., Nat Protoc. 2013 Nov;8(11):2281-308; Ran et al., Cell. 2013 Sep 12;154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec 9;3(12):2233-8; Walsh et al., Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15514-5; Xie et al., Mol Plant. 2013 Oct 9; Yang et al., Cell. 2013 Sep 12;154(6):1370-9; and U.S. patents and patent applications: 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.
[00505] For example, the present disclosure provides (but is not limited to) methods of cleaving a target nucleic acid; methods of editing a target nucleic acid; methods of modulating transcription from a target nucleic acid; methods of isolating a target nucleic acid, methods of binding a target nucleic acid, methods of imaging a target nucleic acid, methods of modifying a target nucleic acid, and the like.
[00506] As used herein, the terms/phrases "contact a target nucleic acid" and "contacting a target nucleic acid", for example, with a split Cas9 (a Cas9 heterodimer), with a subject system, etc. encompass all methods for contacting the target nucleic acid. For example, a Cas9 heterodimer can be provided as protein, RNA (encoding the split Cas9), or DNA (encoding the split Cas9); while a Cas9 guide RNA can be provided as a guide RNA or as a nucleic acid encoding the guide RNA. As such, when, for example, performing a method in a cell (e.g., inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo), a method that includes contacting the target nucleic acid encompasses the introduction into the cell of any or all of the components in their active/final state (e.g., in the form of a protein(s) for split Cas9, in the form of an RNA in some cases for the guide RNA), and also encompasses the introduction into the cell of one or more nucleic acids encoding one or more of the components (e.g., nucleic acid(s) having nucleotide sequence(s) encoding split Cas9 protein(s), nucleic acid(s) having nucleotide sequence(s) encoding guide RNA(s), and the like). Because the methods can also be performed in vitro outside of a cell, a method that includes contacting a target nucleic acid (unless otherwise specified) encompasses contacting outside of a cell in vitro, inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo, etc.
[00507] In some cases, a subject method is a method that includes contacting a target nucleic acid with a subject Cas9 heterodimer. In some cases, a subject method includes contacting a target nucleic acid with a Cas9 heterodimer and a Cas9 guide RNA (e.g., a truncated Cas9 guide RNA, e.g., not having stem loops 2 or 3). In some cases, a subject method includes contacting a target nucleic acid with a Cas9 heterodimer and a Cas9 guide RNA (e.g., a truncated guide RNA, e.g., not having stem loops 2 or 3) and a dimerizer (e.g., light, a dimerizing agent, etc.). In some cases, a method is a method of contacting a target nucleic acid with a system. In some cases, the system can include: (i) a subject Cas9 heterodimer and a Cas9 guide RNA; (ii) a subject Cas9 heterodimer and a Cas9 guide RNA and a dimerizer; or (iii) a subject Cas9 heterodimer and a Cas9 guide RNA and a dimerizer and a donor polynucleotide.
[00508] Target nucleic acids and target cells of interest
[00509] A target nucleic acid can be any nucleic acid (e.g., DNA, RNA), can be double stranded or single stranded, can be any type of nucleic acid (e.g., a chromosome, derived from a chromosome, chromosomal, plasmid, viral, extracellular, intracellular, mitochondrial, chloroplast, linear, circular, etc.) and can be from any organism (e.g., as long as the Cas9 guide RNA can hybridize to a target sequence in a target nucleic acid, that target nucleic acid can be targeted).
[00510] A target nucleic acid can be DNA or RNA. A target nucleic acid can be double stranded (e.g., dsDNA, dsRNA) or single stranded (e.g., ssRNA, ssDNA). In some cases, a target nucleic acid is single stranded. In some cases, a target nucleic acid is a single stranded RNA (ssRNA). In some cases, a target ssRNA (e.g., a target cell ssRNA, a viral ssRNA, etc.) is selected from: mRNA, rRNA, tRNA, non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and microRNA (miRNA). In some cases, a target nucleic acid is a single stranded DNA (ssDNA) (e.g., a viral DNA). In some cases in which the target nucleic acid is single stranded, the method includes the use of a PAMmer (e.g., so that a PAM sequence is present at the target).
[00511] A target nucleic acid can be located anywhere, for example, outside of a cell in vitro, inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo. Suitable target cells (which can comprise target nucleic acids) include, but are not limited to: a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell from a mammal (e.g., a cell from a rodent, a cell from a human, etc.); and the like. Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.), a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where "primary cells", "primary cell lines", and "primary cultures" are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e., splittings, of the culture. For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages in vitro. Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
[00512] In some of the above applications, the subject methods may be employed to induce target nucleic acid cleavage, target nucleic acid modification, and/or to bind target nucleic acids (e.g., for visualization, for collecting and/or analyzing, etc.) in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro (e.g., to disrupt production of a protein encoded by a targeted mRNA). Because the guide RNA provides specificity by hybridizing to target nucleic acid, a mitotic and/or post-mitotic cell of interest in the disclosed methods may include a cell from any organism (e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like, a fungal cell (e.g., a yeast cell), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal, a cell from a rodent, a cell from a human, etc.).
Introducing components into a target cell
[00513] A Cas9 guide RNA (or a nucleic acid comprising a nucleotide sequence encoding same), a PAMmer (or a nucleic acid comprising a nucleotide sequence encoding same), and/or a Cas9 heterodimer (or a nucleic acid comprising a nucleotide sequence encoding same) and/or a donor polynucleotide can be introduced into a host cell by any of a variety of well-known methods.
[00514] Methods of introducing a nucleic acid into a host cell are known in the art, and any known method can be used to introduce a nucleic acid (e.g., an expression construct) into a stem cell or progenitor cell. Suitable methods include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like. Any or all of the components can be introduced into a cell as a composition (e.g., including any convenient combination of: a Cas9 heterodimer, a Cas9 guide RNA, a dimerizing agent, a donor polynucleotide, etc.) using known methods, e.g., nucleofection.
[00515] Cell synchronization
[00516] In some embodiments, a subject method includes a step of blocking a cell at a desired phase in the cell cycle (e.g., blocking a cell at S phase, blocking a cell at M phase, etc.), which can increase efficiency of Cas9 mediated methods (e.g., methods that include cleavage). In some cases, a subject method includes a step of contacting a target cell with a cell cycle blocking agent that blocks the target cell at a desired phase in the cell cycle. In some embodiments, a subject method includes a step of enriching a cell population for cells that are in a desired phase(s) of the cell cycle.
[00517] Thus, in some embodiments, subject methods include (i) the step of enriching a cell population for cells that are in a desired phase(s) of the cell cycle, and/or (ii) the step of blocking a cell at a desired phase in the cell cycle. The cell cycle is the series of events that take place in a cell leading to its division and duplication (replication) that produces two daughter cells. Two major phases of the cell cycle are the S phase (DNA synthesis phase), in which DNA duplication occurs, and the M phase (mitosis), in which chromosome segregation and cell division occur. The eukaryotic cell cycle is traditionally divided into four sequential phases: G1, S, G2, and M. G1, S, and G2 together can collectively be referred to as "interphase". Under certain conditions, cells can delay progress through G1 and can enter a specialized resting state known as G0 (G zero), in which they can remain for days, weeks, or even years before resuming proliferation. The period of transition from one state to another can be referred to using a slash, for example, G1/S, G2/M, etc. As is known in the art, various checkpoints exist throughout the cell cycle at which a cell can monitor conditions to determine whether cell cycle progression should occur. For example, the G2/M DNA damage checkpoint serves to prevent cells from entering mitosis (M phase) with genomic DNA damage.
[00518] A step of enriching a population of eukaryotic cells for cells in a desired phase of the cell cycle (e.g., Gl, S, G2, M, Gl/S, G2/M, GO, etc., or any combination thereof), and can be performed using any convenient method (e.g., a cell separation method and/or a cell synchronization method).
[00519] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the G0 phase of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the G0 phase of the cell cycle; and (b) contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent.
[00520] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the G1 phase of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the G1 phase of the cell cycle; and (b) contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent.
[00521] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the G2 phase of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the G2 phase of the cell cycle; and (b) contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent.
[00522] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the S phase of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the S phase of the cell cycle; and (b) contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent.
[00523] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the M phase of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the M phase of the cell cycle; and (b) contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent.
[00524] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the G1/S transition of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the G1/S transition of the cell cycle; and (b) contacting the target DNA with a Cas9 targeting complex (e.g., via introducing into the target eukaryotic cell(s) at least one component of a Cas9 targeting complex) (e.g., contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent).
[00525] In some cases, a subject method includes a step of enriching a population of eukaryotic cells for cells in the G2/M transition of the cell cycle. For example, in some cases, a subject method includes: (a) enriching a population of eukaryotic cells for cells in the G2/M transition of the cell cycle; and (b) contacting the target DNA with a Cas9 targeting complex (e.g., via introducing into the target eukaryotic cell(s) at least one component of a Cas9 targeting complex) (e.g., contacting the target DNA with a split-Cas9, a Cas9 guide RNA, and a dimerizing agent).
[00526] By "enrich" is meant increasing the fraction of desired cells in the resulting cell population. For example, in some cases, enriching includes selecting desirable cells (e.g., cells that are in the desired phase of the cell cycle) away from undesirable cells (e.g., cells that are not in the desired phase of the cell cycle), which can result in a smaller population of cells, but a greater fraction (i.e., higher percentage) of the cells of the resulting cell population will be desirable cells (e.g., cells that are in the desired phase of the cell cycle). Cell separation methods (described below) can be an example of this type of enrichment. In other cases, enriching includes converting undesirable cells (e.g., cells that are not in the desired phase of the cell cycle) into desirable cells (e.g., cells that are in the desired phase of the cell cycle), which can result in a population of cells similar in size to the starting population, but a greater fraction of those cells will be desirable cells (e.g., cells that are in the desired phase of the cell cycle). Cell synchronization methods (described below) can be an example of this type of enrichment. In some cases, enrichment can both change the overall size of the resulting cell population (compared to the size of the starting population) and increase the fraction of desirable cells. For example, multiple methods/techniques can be combined (e.g., to improve enrichment, to enrich for cells at more than one desired phase of the cell cycle, etc.).
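As a non-limiting numerical illustration of the two types of enrichment described above, the following Python sketch compares a separation-type enrichment with a synchronization-type enrichment. The function names, the starting cell counts, and the recovery, carryover, and conversion values are hypothetical and were chosen only for illustration; they are not measured values.

```python
def fraction_desired(desired: int, undesired: int) -> float:
    """Return the fraction of cells that are in the desired phase."""
    total = desired + undesired
    if total <= 0:
        raise ValueError("population must contain at least one cell")
    return desired / total


def separate(desired: int, undesired: int, recovery: float, carryover: float) -> tuple[int, int]:
    """Separation-type enrichment: keep a share of the desired cells
    (recovery) and a smaller share of the undesired cells (carryover)."""
    return round(desired * recovery), round(undesired * carryover)


def synchronize(desired: int, undesired: int, conversion: float) -> tuple[int, int]:
    """Synchronization-type enrichment: convert a share of the undesired
    cells into desired cells without discarding any cells."""
    converted = round(undesired * conversion)
    return desired + converted, undesired - converted


# Hypothetical starting population: 200 of 1000 cells in the desired phase.
start = (200, 800)
separated = separate(*start, recovery=0.8, carryover=0.05)
synchronized = synchronize(*start, conversion=0.75)

for label, cells in (("start", start), ("separated", separated), ("synchronized", synchronized)):
    print(label, sum(cells), round(fraction_desired(*cells), 2))
```

With these assumed inputs, both approaches raise the desired fraction from 0.2 to 0.8, but the separated population shrinks to 200 cells while the synchronized population keeps all 1000 cells.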
[00527] In some cases, enriching includes a cell separation method. Any convenient cell separation method can be used to enrich for cells that are at various phases of the cell cycle. Suitable cell separation techniques for enrichment of cells at particular phases of the cell cycle include, but are not limited to: (i) mitotic shake-off (M-phase; mechanical separation on the basis of cell adhesion properties, e.g., adherent cells in the mitotic phase detach from the surface upon gentle shaking, tapping, or rinsing); (ii) countercurrent centrifugal elutriation (CCE) (G1, S, G2/M, and intermediate states; physical separation on the basis of cell size and density); and (iii) flow cytometry and cell sorting (e.g., G0, G1, S, G2/M; physical separation based on specific intracellular (e.g., DNA) content and on cell surface and/or size properties).
[00528] Mitotic shake-off generally includes dislodgment of low adhesive, mitotic cells by agitation (see for example, Beyrouthy et al., PLoS ONE 3, e3943 (2008); Schorl, C. & Sedivy, Methods 41, 143-150 (2007)). CCE generally includes the separation of cells according to their sedimentation velocity in a gravitational field where the liquid containing the cells is made to flow against the centrifugal force with the sedimentation rate of cells being proportional to their size (see for example, Grosse et al., Prep Biochem Biotechnol. 2012;42(3):217-33; Banfalvi et al., Nat. Protoc. 3, 663-673 (2008)). Flow cytometry methods generally include the characterization of cells according to antibody and/or ligand and/or dye-mediated fluorescence and scattered light in a hydrodynamically focused stream of liquid with subsequent electrostatic, mechanical or fluidic switching sorting (see for example, Coquelle et al., Biochem. Pharmacol. 72, 1396-1404 (2006); Juan et al., Cytometry 49, 170-175 (2002)). For more information related to cell separation techniques, refer to, for example, Rosner et al., Nat Protoc. 2013 Mar;8(3):602-26.
[00529] In some cases, enriching includes a cell synchronization method (i.e., synchronizing the cells of a cell population). Cell synchronization is a process by which cells at different stages of the cell cycle within a cell population (i.e., a population of cells in which various individual cells are in different phases of the cycle) are brought into the same phase. Any convenient cell synchronization method can be used in the subject methods to enrich for cells that are at a desired phase(s) of the cell cycle. For example, cell synchronization can be achieved by blocking cells at a desired phase in the cell cycle, which allows the other cells to cycle until they reach the blocked phase. For example, suitable methods of cell synchronization include, but are not limited to: (i) inhibition of DNA replication, DNA synthesis, and/or mitotic spindle formation (e.g., sometimes referred to herein as contacting a cell with a cell cycle blocking composition); (ii) mitogen or growth factor withdrawal (G0, G1, G0/G1; growth restriction-induced quiescence via, e.g., serum starvation and/or amino acid starvation); and (iii) density arrest (G1; cell-cell contact-induced activation of specific transcriptional programs) (see for example, Rosner et al., Nat Protoc. 2013 Mar;8(3):602-26 (e.g., see Table 1 of Rosner et al.), which is hereby incorporated by reference in its entirety, and see references cited therein).
[00530] Various methods for cell synchronization will be known to one of ordinary skill in the art and any convenient method can be used. For additional methods for cell synchronization (e.g., synchronization of plant cells), see, for example, Sharma, Methods in Cell Science, 1999, Volume 21, Issue 2-3, pp 73-78 ("Synchronization in plant cells— an introduction"); Dolezel et al, Methods in Cell Science, 1999, Volume 21, Issue 2-3, pp 95-107 ("Cell cycle
synchronization in plant root meristems"); Kumagai-Sano et al., Nat Protoc. 2006;1(6):2621-7; Cools et al., The Plant Journal (2010) 64, 705-714; and Rosner et al., Nat Protoc. 2013 Mar;8(3):602-26; all of which are hereby incorporated by reference in their entirety.
[00531] Cell cycle blocking compositions
[00532] In some embodiments, a cell (or cells of a cell population) is blocked at a desired phase of the cell cycle (e.g., by contacting the cell with a cell cycle blocking composition). In some embodiments, cells of a cell population are synchronized (e.g., by contacting the cells with a cell cycle blocking composition). A cell cycle blocking composition can include one or more cell cycle blocking agents. The term "cell cycle blocking agent" is used herein to refer to an agent that blocks (e.g., reversibly blocks (pauses), irreversibly blocks) a cell at a particular point in the cell cycle such that the cell cannot proceed further. Suitable cell cycle blocking agents include reversible cell cycle blocking agents. Reversible cell cycle blocking agents do not render the cell permanently blocked. In other words, when a reversible cell cycle blocking agent is removed from the cell medium, the cell is free to proceed through the cell cycle. Cell cycle blocking agents are sometimes referred to in the art as cell synchronization agents because when such agents contact a cell population (e.g., a population having cells that are at different stages of the cell cycle), the cells of the population become blocked at the same phase of the cell cycle, thus synchronizing the population of cells relative to that particular phase of the cell cycle. When the cell cycle blocking agent used is reversible, the cells can then be "released" from the cell cycle block.
[00533] Suitable cell cycle blocking agents include, but are not limited to: nocodazole (G2, M, G2/M; inhibition of microtubule polymerization); colchicine (G2, M, G2/M; inhibition of microtubule polymerization); demecolcine (colcemid) (G2, M, G2/M; inhibition of microtubule polymerization); hydroxyurea (G1, S, G1/S; inhibition of ribonucleotide reductase); aphidicolin (G1, S, G1/S; inhibition of DNA polymerase-α and DNA polymerase-δ); lovastatin (G1; inhibition of HMG-CoA reductase / cholesterol synthesis and the proteasome); mimosine (G1, S, G1/S; inhibition of thymidine, nucleotide biosynthesis, inhibition of Ctf4/chromatin binding); thymidine (G1, S, G1/S; excess thymidine-induced feedback inhibition of DNA replication); latrunculin A (M; delays anaphase onset, actin polymerization inhibitor, disrupts interpolar microtubule stability); and latrunculin B (M; actin polymerization inhibitor).
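The agent-to-phase assignments listed above can be kept in a small lookup structure so that candidate agents for a desired block point are easy to retrieve. The Python sketch below is only one illustrative way to organize the list; the names `BLOCKING_AGENTS` and `agents_for_phase` are hypothetical, and the mapping restates only the phases given in the preceding paragraph.

```python
# Phases at which each agent is described as blocking, restated from the
# list above. The dictionary and function names are illustrative only.
BLOCKING_AGENTS = {
    "nocodazole": {"G2", "M", "G2/M"},
    "colchicine": {"G2", "M", "G2/M"},
    "demecolcine": {"G2", "M", "G2/M"},
    "hydroxyurea": {"G1", "S", "G1/S"},
    "aphidicolin": {"G1", "S", "G1/S"},
    "lovastatin": {"G1"},
    "mimosine": {"G1", "S", "G1/S"},
    "thymidine": {"G1", "S", "G1/S"},
    "latrunculin A": {"M"},
    "latrunculin B": {"M"},
}


def agents_for_phase(phase: str) -> list[str]:
    """Return the listed agents annotated with the given phase, sorted by name."""
    return sorted(agent for agent, phases in BLOCKING_AGENTS.items() if phase in phases)


print(agents_for_phase("G1/S"))
print(agents_for_phase("M"))
```

A lookup of this kind does not replace empirical optimization of dose and exposure time for a given cell type.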
[00534] Suitable cell cycle blocking agents can include any agent that has the same or similar function as the agents above (e.g., an agent that inhibits microtubule polymerization, an agent that inhibits ribonucleotide reductase, an agent that inhibits DNA polymerase-α and/or DNA polymerase-δ, an agent that inhibits HMG-CoA reductase and/or cholesterol synthesis, an agent that inhibits nucleotide biosynthesis, an agent that inhibits DNA replication, i.e., inhibits DNA synthesis, an agent that inhibits initiation of DNA replication, an agent that inhibits deoxycytosine synthesis, an agent that induces excess thymidine-induced feedback inhibition of DNA replication, an agent that disrupts interpolar microtubule stability, an agent that inhibits actin polymerization, and the like). Suitable agents that block G1 can include: staurosporine, dimethyl sulfoxide (DMSO), glucocorticosteroids, and/or mevalonate synthesis inhibitors. Suitable agents that block G2 phase can include CDK1 inhibitors (e.g., RO-3306). Suitable agents that block M can include cytochalasin D.
[00535] In some cases, suitable cell cycle blocking agents include: cobtorin; dinitroaniline; benefin (benfluralin); butralin; dinitramine; ethalfluralin; oryzalin; pendimethalin; trifluralin; amiprophos-methyl; butamiphos; dithiopyr; thiazopyr; propyzamide-pronamide; tebutam; DCPA (chlorthal-dimethyl); anisomycin; alpha amanitin; jasmonic acid; abscisic acid; menadione; cryptogein; hydrogen peroxide; sodium permanganate; indomethacin; epoxomicin; lactacystin; ICRF-193; olomoucine; roscovitine; bohemine; K252a; okadaic acid; endothal; caffeine; MG132; cyclin-dependent kinase inhibitors; and the like.
[00536] For more information regarding cell cycle blocking agents, see Merrill GF, Methods Cell Biol. 1998;57:229-49, which is hereby incorporated by reference in its entirety.
EXAMPLES
[00537] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.
[00538] While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.
Example 1
[00539] All amino acid positions listed below (unless otherwise denoted) are relative to S. pyogenes Cas9 (GenBank ESU85303; SEQ ID NO: 1545), but can readily be adapted to the corresponding positions/domains of any Cas9 protein.
RESULTS
Design of Split-Cas9 (Cas9 heterodimer)
[00540] Sites were selected for the N and C termini of the alpha-helical lobe and sites were also selected for the junction between the RuvCI and RuvCII polypeptides in the nuclease lobe (Figure 1A). G56 was selected as the N-terminus for the alpha-helical lobe due to its location in a poorly conserved linker just before the arginine-rich bridge helix, which has been shown to be critical for Cas9 cleavage activity in human cells. The C-terminus of the alpha-helical lobe was selected to be at S714, located in a linker between the two lobes. For the nuclease lobe, RuvCI was maintained through E57 and RuvCII began with G729. RuvCI was connected to RuvCII with a linker (in this case glycine-serine-serine). This construct removes a short alpha-helix (S717-L727) to minimize the distance between the end of RuvCI and the beginning of RuvCII and allow for a short linker.
[00541] Figure 1. (A) Design of a split Cas9 protein (heterodimeric Cas9 complex, Cas9 heterodimer). To generate the nuclease lobe, RuvCI was connected to RuvCII (and therefore to the polypeptide RuvCII : HNH domain : RuvCIII : C-terminal PAM-interacting domain) by a GSS linker, and a short alpha-helix was removed to minimize the distance between the end of RuvCI and the beginning of RuvCII. The alpha-helical lobe was amino acids 56-714 as depicted. (B) Crystal structure of split Cas9 showing the nuclease lobe (residues 1-57, 729-1368) in blue and the alpha-helical lobe (residues 58-717) in grey, with the sgRNA and DNA substrates colored in orange and red, respectively. The inset shows where new N and C termini were introduced to generate split-Cas9: the short alpha-helix colored red (residues 718-728) was removed, and residues 57 and 729 were connected with a short GSS linker (blue line). The N- and C-termini of the nuclease lobe are the same as in the full-length Cas9; the new N- and C-termini of the alpha-helical lobe are shown.
Split-Cas9 cleaves dsDNA in vitro
[00542] The individual lobes (alpha-helical lobe and nuclease lobe) of Split-Cas9 were overexpressed in and purified from Escherichia coli using protocols similar to those used for full-length Spy Cas9. Both lobes expressed and purified well (Figure 2). The lobes were tested alone and together and were compared to WT Cas9 (full length, "FL Cas9") for their ability to cleave a double-stranded DNA with a target sequence complementary to the single guide RNA (sgRNA). The lobes in isolation showed no detectable DNA cleavage activity over the course of an hour, while the reconstituted Split-Cas9 complex exhibited site-specific cleavage matching that of full-length Cas9, with an approximately 10-fold reduced rate of cleavage (Figure 2).
[00543] Figure 2. Expression and cleavage activity of Split-Cas9. 10x-His-MBP-tagged alpha-helical lobe and nuclease lobe were expressed individually in E. coli BL21 and purified. Cleavage assays were performed with about 1 nM 5' 32P-labeled double-stranded DNA. Protein-guide RNA complexes (RNPs) were preformed by a 10-minute incubation at 37 °C. The final concentration of guide RNA was 100 nM in all conditions. Full-length Spy Cas9 was used at 100 nM. 200 nM alpha-helical lobe was used for both "alpha-helical" and "Split-Cas9". 100 nM nuclease lobe was used for both "nuclease" and "Split-Cas9". Time points were taken at 0, 1, 2, 5, and 30 minutes and immediately quenched with formamide-EDTA buffer. Quenched samples were resolved with 10% urea-polyacrylamide gel electrophoresis (PAGE) and visualized using a phosphorimager.
sgRNA induces Split-Cas9 heterodimerization
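To illustrate what an approximately 10-fold difference in cleavage rate implies at the sampling times listed in the Figure 2 legend, the following Python sketch evaluates a single-exponential cleavage model. The single-exponential form and the rate constant assigned to full-length Cas9 are assumptions made only for illustration; the text above states a relative rate, not absolute rate constants.

```python
import math

TIME_POINTS_MIN = (0, 1, 2, 5, 30)  # sampling times from the Figure 2 legend


def fraction_cleaved(k_per_min: float, t_min: float) -> float:
    """Fraction of substrate cleaved at time t under a single-exponential model."""
    return 1.0 - math.exp(-k_per_min * t_min)


def half_life_min(k_per_min: float) -> float:
    """Time at which half of the substrate is cleaved under the same model."""
    return math.log(2.0) / k_per_min


K_FULL_LENGTH = 1.0             # hypothetical rate constant (per minute)
K_SPLIT = K_FULL_LENGTH / 10.0  # approximately 10-fold reduced rate

for t in TIME_POINTS_MIN:
    full = fraction_cleaved(K_FULL_LENGTH, t)
    split = fraction_cleaved(K_SPLIT, t)
    print(f"{t:>2} min  full-length {full:.3f}  split {split:.3f}")
```

Under this model, a 10-fold lower rate constant corresponds to a 10-fold longer half-life, so early time points separate the two enzymes most clearly.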
[00544] It was hypothesized that sgRNA serves to recruit the lobes of split Cas9 into an active complex. Negative stain electron microscopy of a mixture of the lobes in the absence of sgRNA gave particles that appeared to be isolated lobes with no formed complex, suggesting that the lobes are not capable of dimerizing on their own. The addition of sgRNA yielded particles with class averages that closely resembled those of full-length Cas9 with sgRNA bound. 3-dimensional reconstruction confirmed that in the presence of sgRNA Split-Cas9 takes on a conformation similar to that of full-length Cas9. The necessity of sgRNA for dimerization was confirmed with analytical size-exclusion chromatography and Förster resonance energy transfer (FRET) with Cy3- and Cy5-labeled lobes. The affinity of the lobes for sgRNA was examined with filter-binding assays (see Example 2 below, Figure 11). Full-length Cas9 has extremely high affinity for sgRNA, binding with a KD of about 10 pM. Both lobes also have high affinity for sgRNA on their own: the alpha-helical lobe has a KD of about 2 nM, and the nuclease lobe has a KD of about 0.5 nM. These high-affinity interactions are consistent with the ability of the sgRNA to efficiently recruit the lobes into an active complex.
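The dissociation constants reported above can be related to the expected fraction of sgRNA bound at a given protein concentration. The Python sketch below uses the standard quadratic solution for 1:1 binding, which allows for depletion of free protein; the 1:1 model, the function name, and the 1 nM protein concentration are assumptions chosen for illustration, while the KD values and the approximately 0.02 nM sgRNA concentration are taken from the text.

```python
import math


def fraction_rna_bound(protein_nM: float, rna_nM: float, kd_nM: float) -> float:
    """Fraction of RNA in complex for simple 1:1 binding (quadratic solution)."""
    if protein_nM < 0 or rna_nM <= 0 or kd_nM <= 0:
        raise ValueError("concentrations must be non-negative and rna/kd positive")
    s = protein_nM + rna_nM + kd_nM
    root = math.sqrt(s * s - 4.0 * protein_nM * rna_nM)
    # Numerically stable form of (s - root) / 2, the complex concentration.
    complex_nM = 2.0 * protein_nM * rna_nM / (s + root)
    return complex_nM / rna_nM


KD_NM = {  # approximate values stated in the text, converted to nM
    "full-length Cas9": 0.01,
    "alpha-helical lobe": 2.0,
    "nuclease lobe": 0.5,
}
RNA_NM = 0.02  # approximate labeled sgRNA concentration in the filter-binding assay

for name, kd in KD_NM.items():
    print(name, round(fraction_rna_bound(1.0, RNA_NM, kd), 3))
```

The quadratic form is used instead of the simpler hyperbolic isotherm because the labeled sgRNA concentration is not far below the tightest KD.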
3'-truncated sgRNAs do not activate Split-Cas9 for cleavage
[00545] The two 3' hairpins (stem loop 2 and stem loop 3) of sgRNA are not required for Cas9 DNA-cleavage activity in living cells or in vitro, but they contribute to the RNA's interaction with the nuclease lobe. Truncating these hairpins (i.e., truncating the guide RNA such that it did not include stem loops 2 or 3) reduced the affinity of the nuclease lobe for sgRNA and eliminated the ability of the sgRNA to productively dimerize the two lobes (Figure 3). For example, sgRNAs missing the final hairpin (stem loop 3, nucleotides 103-124) and both 3' hairpins (stem loops 2 and 3, nucleotides 85-124) were generated and tested for their ability to guide Split-Cas9 dsDNA cleavage in vitro. Removal of the final hairpin severely reduced DNA cleavage by Split-Cas9, while removal of both hairpins led to no detectable cleavage activity. Full-length Cas9 was not significantly affected by either truncation. If these sgRNA truncations are unable to induce dimerization of Split-Cas9 but are still able to activate sequence-specific cleavage activity of the two lobes when the lobes are linked in the context of full-length Cas9, then truncated sgRNAs can be used in conjunction with dimerization pairs (e.g., small molecule-inducible dimerization pairs) on the two lobes to allow for temporal control over the activation of Split-Cas9.
[00546] Figure 3. DNA cleavage with truncated single guide RNA. Cleavage assays were performed as described above. The truncated single guide RNA (missing stem loops 2 and 3 as well as the poly-U termination sequence) was used. Note that little to no cleavage was observed when using the truncated single guide RNA with split Cas9 (the alpha-helical lobe plus the nuclease lobe), but efficient cleavage was observed when using full-length single guide RNA (including stem loops 2 and 3) with split Cas9. Also note that when using WT Cas9 (full-length), full-length and truncated sgRNA both supported efficient cleavage of the target nucleic acid.
MATERIALS AND METHODS
[00547] Protein purification. The sequence encoding the alpha-helical lobe was polymerase chain reaction (PCR) amplified from a Spy Cas9 expression vector and restriction cloned into a pET-based expression vector to generate a fusion with an N-terminal decahistidine-maltose binding protein (His10-MBP) tag followed by a tobacco etch virus (TEV) protease cleavage site. The nuclease lobe was generated by around-the-horn PCR of a Spy Cas9 expression vector, resulting in a fusion with an N-terminal His10-MBP tag followed by a TEV protease cleavage site. The proteins were expressed and purified as previously described for full-length His6-MBP-tagged Spy Cas9, with the exception that the nuclease lobe was not subjected to ion exchange chromatography.
[00548] Cleavage assays. Cas9-sgRNA (single guide RNA), alpha-helical lobe-sgRNA, nuclease lobe-sgRNA, and Split-Cas9-sgRNA complexes were reconstituted before cleavage experiments by incubating for 10 min at 37 °C in reaction buffer (20 mM Tris-HCl, pH 7.5, 75 mM KCl, 5 mM MgCl2, 1 mM dithiothreitol (DTT), 5% glycerol). Cleavage reactions were conducted at room temperature and contained ~1 nM 5'-radiolabeled 55 bp DNA containing a target sequence and a PAM site and 100 nM of the appropriate protein-sgRNA complex, with the exception of the alpha-helical lobe, which was maintained at 200 nM both in its reactions alone and when part of the Split-Cas9 complex. Aliquots were removed at each time point and quenched by the addition of RNA gel-loading buffer (95% deionized formamide, 0.025% (w/v) bromophenol blue, 0.025% (w/v) xylene cyanol, 50 mM EDTA (pH 8.0), 0.025% (w/v) sodium dodecyl sulfate (SDS)). Samples were boiled for 10 min at 95 °C before being resolved by 10% denaturing PAGE. Reaction products were visualized by phosphorimaging and quantified with ImageQuant (GE Healthcare).
[00549] DNA and sgRNA binding assays. Electromobility shift assays were performed by incubating ~0.2 nM 5'-radiolabeled target DNA with increasing concentrations of full-length Cas9, alpha-helical lobe, nuclease lobe, and both lobes in a one-to-one ratio, with sgRNA held in excess of the maximum protein concentration (500 nM for full-length Cas9 and 5 μM for the lobes and Split-Cas9). Binding reactions were incubated at room temperature for one hour in a reaction buffer containing 20 mM Tris, pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 0.01% Tween-20, 100 μg/mL bovine serum albumin (BSA), and 50 μg/mL heparin and resolved with 10% native PAGE with 5 mM MgCl2 at 4 °C. Reaction products were visualized by phosphorimaging and quantified with ImageQuant.
[00550] Filter-binding assays were performed by incubating ~0.02 nM 5'-radiolabeled sgRNA with increasing concentrations of full-length Cas9, alpha-helical lobe, nuclease lobe, and both lobes in a one-to-one ratio at room temperature for one hour in a reaction buffer containing 20 mM Tris, pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 0.01% Igepal, 10 μg/mL yeast tRNA, and 50 μg/mL BSA. Samples were drawn through a dot blot with HT Tuffryn polysulfone membrane, Protran nitrocellulose membrane, and Hybond N+ nylon membrane. The membranes were visualized by phosphorimaging and quantified with ImageQuant.
[00551] Figure 4. (A) Two different views of the crystal structure of Cas9. The red positions (N-terminus of the nuclease lobe) and the green positions (C-terminus of the alpha helical lobe) represent the insertion sites (e.g., for a dimerization pair) for nuclease lobe circular permutant 1.
[Nuclease lobe (blue): Red = Nuclease lobe N-terminal insertion site (e.g., Dimerization Domain : Linker : Cas9 amino acids 718-1368 : 20-30 residue linker : Cas9 amino acids 1-56). Purple = Nuclease lobe C-terminus] [Alpha helical lobe (gray): Green = Alpha helical lobe C-terminal insertion site (e.g., Cas9 amino acids 56-717 : Linker : Dimerization Domain). Yellow = Alpha helical lobe N-terminus]. (B) Two different views of the crystal structure of Cas9. The red positions (beginning and end of loop of nuclease lobe) and the green position (C-terminus of the alpha helical lobe) represent the insertion sites (e.g., for a dimerization pair: one red position and the green position) for nuclease lobe circular permutant 5. [Nuclease lobe (blue): Red = Nuclease lobe Beginning and end of loop insertion sites (e.g., Cas9 amino acids 1016-1368 : 20-30 residue linker : Cas9 amino acids 1-57 : GSS linker : Cas9 amino acids 728-1015 : linker : Dimerization Domain)(or e.g., Dimerization Domain : linker : Cas9 amino acids 1016-1368 : 20-30 residue linker : Cas9 amino acids 1-57 : GSS linker : Cas9 amino acids 728-1015)] [Alpha helical lobe (gray): Green = Alpha helical lobe C-terminal insertion site (e.g., Cas9 amino acids 56-717 : Linker : Dimerization Domain). Yellow = Alpha helical lobe N-terminus]. While the exact amino acid positions above are suitable, the exact positions are not necessarily required, as long as the functional part of each region is maintained. As an illustrative example, when "Cas9 amino acids 1-57" is recited above, a region having amino acids 3-57, 5-57, 1-55, or 3-55, etc. can also be suitable.
Example 2
[00552] Cas9, an RNA-guided DNA endonuclease found in clustered regularly interspaced short palindromic repeats (CRISPR) bacterial immune systems, is a versatile tool for genome editing, transcriptional regulation and cellular imaging applications. Structures of Streptococcus pyogenes Cas9 alone or bound to single-guide RNA (sgRNA) and target DNA revealed a bi-lobed protein architecture that undergoes major conformational changes upon guide RNA and DNA binding. To investigate the molecular determinants and relevance of the inter-lobe rearrangement for target recognition and cleavage, a split Cas9 enzyme was designed in which the nuclease lobe and α-helical lobe are expressed as separate polypeptides. Although the lobes do not interact on their own, the sgRNA recruited them into a ternary complex that recapitulates the activity of full-length Cas9 and catalyzes site-specific DNA cleavage. The use of a modified sgRNA abrogated split-Cas9 activity by preventing dimerization, allowing for the development of an inducible dimerization system. Split-Cas9 can act as a highly regulatable platform for genome engineering applications.
[00553] The following experiments show that two distinct Cas9 polypeptides encompassing the α-helical and nuclease lobes can be stably expressed and purified. Filter binding and negative-stain EM experiments demonstrated that the split-Cas9 assembles with sgRNA into a ternary complex resembling that of full-length Cas9-RNA. Furthermore, DNA cleavage assays revealed that the enzymatic activity of split-Cas9 closely mimics that of wild-type (WT) Cas9. Split-Cas9 is functional for genome editing in human cells with full-length sgRNAs, but can be inactivated with shortened sgRNAs that give rise to destabilized complexes. Together these data show how the Cas9 protein can be re-engineered as a split enzyme whose assembly and function are regulatable by the sgRNA, providing a new platform for controlled use of Cas9 for genome engineering applications in cells.
MATERIALS AND METHODS
[00554] Cloning and protein purification. The expression vector for purification of the nuclease lobe was generated by Around the Horn (ATH) polymerase chain reaction (PCR) using a pre-existing pET-based expression vector for S. pyogenes Cas9. The final construct encodes an N-terminal decahistidine-maltose binding protein (His10-MBP) tag, a tobacco etch virus (TEV) protease cleavage site, residues 1-57, a glycine-serine-serine linker, and residues 729-1368. The vector for the catalytically inactive dNuclease lobe was generated by ATH PCR of a similar dCas9 (D10A/H840A) vector. The vector for expression of the α-helical lobe was generated by PCR amplification of S. pyogenes Cas9 residues 56-714 and assembly of the resulting fragment into a His10-MBP expression vector via ligation-independent cloning.
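The residue ranges recited above can be checked with a short script that assembles the two lobe sequences (without the affinity tag or TEV site) from a full-length sequence using 1-based, inclusive numbering. In the Python sketch below, the helper names are hypothetical and the input is a placeholder string of the same length as S. pyogenes Cas9 (1368 residues), not the actual sequence of SEQ ID NO: 1545.

```python
def slice_residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"residue range {start}-{end} is outside the sequence")
    return seq[start - 1:end]


def build_nuclease_lobe(cas9: str, linker: str = "GSS") -> str:
    """Residues 1-57, a glycine-serine-serine linker, then residues 729-1368."""
    return slice_residues(cas9, 1, 57) + linker + slice_residues(cas9, 729, 1368)


def build_alpha_helical_lobe(cas9: str) -> str:
    """Residues 56-714."""
    return slice_residues(cas9, 56, 714)


# Placeholder sequence of the correct length; NOT the real Cas9 sequence.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
placeholder_cas9 = "".join(ALPHABET[i % len(ALPHABET)] for i in range(1368))

nuclease_lobe = build_nuclease_lobe(placeholder_cas9)
alpha_helical_lobe = build_alpha_helical_lobe(placeholder_cas9)
print(len(nuclease_lobe), len(alpha_helical_lobe))
```

With these ranges, the nuclease lobe is 57 + 3 + 640 = 700 residues and the α-helical lobe is 659 residues, before any tag is added.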
[00555] Each protein was over-expressed in E. coli BL21 Rosetta 2(DE3) (EMD Biosciences) by growing in 2xYT medium at 37 °C to an optical density of 0.5, inducing with 0.5 mM IPTG, and growing an additional 16 hours at 18 °C. Cells were lysed by sonication in a buffer containing 50 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 5% glycerol, and a protease inhibitor cocktail (Roche). The clarified lysate was bound in batch to Ni-NTA agarose (Qiagen). The resin was washed extensively with 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 10 mM imidazole, 5% glycerol; and the bound protein was eluted in 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 300 mM imidazole, 5% glycerol. The His10-MBP affinity tag was removed with His6-tagged TEV protease during overnight dialysis against 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 5% glycerol. The protein was then flowed over Ni-NTA agarose to remove TEV protease and the cleaved affinity tag.
[00556] The α-helical lobe was dialyzed for 2 h against 20 mM Tris pH 7.5, 125 mM KCl, 1 mM TCEP, 5% glycerol, and purified on a 5 mL HiTrap SP Sepharose column (GE Healthcare), with elution over a linear gradient from 125 mM to 1 M KCl. The α-helical lobe was further purified by size exclusion chromatography on a Superdex 200 16/60 column (GE Healthcare) in 20 mM Tris pH 7.5, 200 mM KCl, 1 mM TCEP, 5% glycerol. The nuclease lobe was purified via size exclusion chromatography immediately following the ortho-Ni-NTA step. The dNuclease lobe was purified as described for the nuclease lobe, except the size exclusion chromatography was performed with 20 mM Tris pH 7.5, 500 mM KCl, 1 mM TCEP, 5% glycerol. Full-length Cas9 was purified as previously described (8).
[00557] In vitro transcription of sgRNA. Linearized plasmid DNA was used as a template for in vitro transcription of full-length, Δhairpins1-2, and Δhairpin2 λ1 sgRNA. The appropriate region of the sgRNA, along with a T7 RNA polymerase promoter sequence, was PCR-amplified and restriction-cloned into EcoRI/BamHI sites of a pUC19 vector, and the resulting vector was digested with BamHI to enable run-off transcription. The Δhairpin1 and Δspacer-nexus λ1 sgRNA, as well as λ1 crRNA and tracrRNA, were transcribed from a single-stranded DNA template with an annealed T7 promoter oligonucleotide. The DNA template for EMX1 sgRNA was produced by overlapping PCR as previously described (13).
[00558] Transcription reactions (1 mL) were conducted in buffer containing 50 mM Tris, pH 8.1, 25 mM MgCl2, 0.01% Triton X-100, 2 mM spermidine, and 10 mM DTT, along with 5 mM each of ATP, GTP, CTP, and UTP, 100 μg/mL T7 polymerase, and approximately 1 μM DNA template. Reactions were incubated at 37 °C overnight and subsequently treated with 5 units DNase (Promega) for 1 hour. Reactions were then quenched with 800 μL of 95% formamide, 0.05% bromophenol blue, and 20 mM EDTA, and loaded onto a 7M urea 10% polyacrylamide gel. The appropriate band was excised, and the RNA was eluted from the gel overnight in DEPC-treated water. The sgRNA was ethanol-precipitated and resuspended in DEPC-treated water. Concentrations were determined by A260nm using a NanoDrop (Thermo Scientific). For filter binding assays, the sgRNA was dephosphorylated with calf intestinal phosphatase (New England Biolabs) prior to radiolabeling with T4 polynucleotide kinase (New England Biolabs) and γ-[32P]-ATP (Perkin Elmer). The radiolabeled sgRNA was gel purified as described above.
[00559] Split-Cas9 complex reconstitution. Split-Cas9 complexes were reconstituted prior to cleavage and binding assays at 37 °C for 10 minutes in a buffer containing 20 mM Tris pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, and 5% glycerol. For binding assays, reactions contained equimolar amounts of the dNuclease lobe, the α-helical lobe, and in vitro transcribed sgRNA. For cleavage assays, reactions contained equimolar amounts of the nuclease lobe and sgRNA (or crRNA:tracrRNA), with a two-fold molar excess of the α-helical lobe.
[00560] DNA cleavage assays. All cleavage assays were performed in 1X Cleavage Buffer, which contained 20 mM Tris pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, and 5% glycerol. Preformed complexes were diluted in Cleavage Buffer, and reactions were initiated with the addition of radiolabeled dsDNA substrates. Final reaction concentrations were 100 nM protein:RNA complex and ~1 nM radiolabeled DNA target. The concentration of Cas9 was chosen to be sufficiently above the Kd for the sgRNA such that complex assembly is unlikely to be rate limiting, except in the case of split-Cas9 and the Δhairpins1-2 sgRNA (Kd >100 nM). Reactions proceeded at room temperature, and aliquots were removed at selected time points and quenched with an equal volume of buffer containing 50 mM EDTA, 0.02% bromophenol blue, and 90% formamide. Reaction products were resolved by 7M urea-PAGE, gels were dried, and DNA was visualized by phosphorimaging and quantified using ImageQuant software (GE Healthcare). The percentage of DNA cleaved was determined by dividing the amount of cleaved DNA by the sum of uncleaved and cleaved DNA. Kinetic analysis was performed using Prism (GraphPad Software). Reported observed rate constants (kobs) are the average of three independent experiments ± standard error of the mean. Graphed values are the averaged timepoints of three independent experiments with error bars representing the standard deviation.
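The quantification described in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the analysis actually used: the source states only that kinetic analysis was performed in Prism, so the single-exponential model is an assumption, and all function names and input values are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_cleaved(cleaved, uncleaved):
    """Percent of DNA cleaved: cleaved / (cleaved + uncleaved) x 100."""
    total = cleaved + uncleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cleaved / total


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def single_exponential(t, k_obs, amplitude=100.0):
    """Percent cleaved at time t for a first-order process.

    Assumed model (not stated in the source): amplitude * (1 - exp(-k_obs * t)).
    """
    return amplitude * (1.0 - math.exp(-k_obs * t))
```

For example, `mean_and_sem` could be applied to three hypothetical kobs values from independent experiments to report them as mean ± SEM, while the standard deviation (`stdev`) would be used for graphed error bars.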
[00561] Electrophoretic mobility shift assays (EMSAs). All binding assays were performed in 1X Binding Buffer, which contains 20 mM Tris pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 50 μg/mL heparin, 0.01% Tween-20, and 100 μg/mL BSA. Preformed complexes were diluted into 1X Binding Buffer, after which radiolabeled dsDNA substrates were added to a final concentration of <0.2 nM. Reactions were incubated at room temperature for 60 min and then resolved at 4 °C on a native 8% polyacrylamide gel containing 0.5X TBE and 5 mM MgCl2. Gels were dried and DNA was visualized by phosphorimaging and quantified using ImageQuant software (GE Healthcare). The fraction of DNA bound (amount of bound DNA divided by the sum of free and bound DNA) was plotted versus concentration of protein and fit to a binding isotherm using Prism (GraphPad Software).
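As an illustration of fitting fraction bound to a binding isotherm, the sketch below assumes a standard single-site hyperbolic model, which the source does not specify, and uses a simple grid search in place of Prism's nonlinear regression. Function names and the Kd grid are hypothetical.

```python
def fraction_bound(protein_nM, kd_nM, max_bound=1.0):
    """Single-site isotherm (assumed model): max_bound * [P] / (Kd + [P])."""
    return max_bound * protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, observed, kd_grid=None):
    """Return the Kd (nM) on a log-spaced grid that minimizes squared error."""
    if kd_grid is None:
        # 0.001 nM to 10 uM, 20 points per decade (illustrative choice)
        kd_grid = [10 ** (e / 20.0) for e in range(-60, 81)]

    def sum_squared_error(kd):
        return sum(
            (fraction_bound(p, kd) - y) ** 2
            for p, y in zip(protein_nM, observed)
        )

    return min(kd_grid, key=sum_squared_error)
```

Because the labeled nucleic acid is kept well below the expected Kd (<0.2 nM here), the total protein concentration is a reasonable stand-in for free protein, which is what this simple model relies on.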
[00562] Filter-binding assays. All filter binding assays were performed in 1X RNA Binding Buffer, which contains 20 mM Tris pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 0.01% Igepal CA-630, 10 μg/mL yeast tRNA, and 10 μg/mL BSA. WT Cas9, α-helical lobe, nuclease lobe, or an equimolar mix of both lobes (split-Cas9) were incubated with <0.02 nM radiolabeled sgRNA for 60 min at room temperature. Tufryn (Pall Corporation), Protran (Whatman), and Hybond-N+ (GE Healthcare) membranes were soaked in buffer containing 20 mM Tris pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, and arranged on a dot blot apparatus. Binding reactions were separated through the membranes by the application of vacuum, and after drying, the membranes were visualized by phosphorimaging and quantified using ImageQuant Software (GE Healthcare). The fraction of sgRNA bound was plotted versus the concentration of protein and fit to a binding isotherm using Prism (GraphPad Software). Reported Kd values are the average of three independent experiments, and errors represent the standard error of the mean.
[00563] Negative-stain electron microscopy and image processing. Negatively-stained samples of the α-helical lobe, nuclease lobe, and split-Cas9 (α-helical lobe and nuclease lobe) were prepared and imaged with and without sgRNA as described previously (4). Data were acquired using a Tecnai F20 Twin transmission electron microscope operated at 120 keV at a nominal magnification of 80,000x (1.45 Å at the specimen level) using low-dose exposures (~20 e⁻ Å⁻²) with a randomly set defocus ranging from -0.7 to -1.6 μm. A total of 150-200 images of each Cas9 sample was automatically recorded on a Gatan 4k x 4k CCD camera using the MSI-Raster application within LEGINON (22). Low-resolution negative stain class averages of Lid particles from the yeast 26S proteasome (23) were used as references for template-based particle picking. The Lid complex was used as a template to avoid selection bias because it bears minimal to no structural resemblance to Cas9. Cas9 complexes were extracted using a 224 x 224-pixel box size. These particles were subjected to 2D reference-free alignment and classification using multivariate statistical analysis and multi-reference alignment in IMAGIC (24).
[00564] Cas9 and split-Cas9 RNP assembly and nucleofection. The split-Cas9 RNP was prepared immediately before the experiment by incubating both lobes with sgRNA at molar ratios of 1.2:1:1.2 (α-helical lobe:nuclease lobe:sgRNA) for 10 min at 37 °C in 20 mM HEPES pH 7.5, 150 mM KCl, 1 mM MgCl2, 10% glycerol, and 1 mM TCEP. The nucleofections were carried out as previously described for Cas9, using 10, 30, and 100 pmol of RNP complex for approximately 2 x 10^5 cells (13). Where indicated, cells were synchronized with 200 ng/mL nocodazole for 17 hr prior to nucleofection. Neither WT Cas9 nor the split-Cas9 lobes had nuclear localization signals, which may have led to reduced editing levels, particularly for the unsynchronized cells.
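The 1.2:1:1.2 assembly ratio can be turned into component amounts with a short calculation. The sketch below assumes that the stated RNP amount (10, 30, or 100 pmol) is defined by the nuclease lobe, the component at ratio 1; the source does not say which component sets the RNP amount, and the function and key names are hypothetical.

```python
def rnp_components_pmol(rnp_pmol, ratios=(1.2, 1.0, 1.2)):
    """Amounts (pmol) of each component for a split-Cas9 RNP.

    ratios = (alpha-helical lobe, nuclease lobe, sgRNA).
    Assumption: the RNP amount equals the nuclease-lobe amount.
    """
    alpha_ratio, nuclease_ratio, sgrna_ratio = ratios
    scale = rnp_pmol / nuclease_ratio
    return {
        "alpha_helical_lobe": alpha_ratio * scale,
        "nuclease_lobe": nuclease_ratio * scale,
        "sgRNA": sgrna_ratio * scale,
    }
```

Under this assumption, a 100 pmol RNP would combine about 120 pmol of α-helical lobe, 100 pmol of nuclease lobe, and 120 pmol of sgRNA.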
[00565] Analysis of in-cell genome editing efficiency. Determination of the percentage of indels induced at the target region was performed as previously described (13). In brief, 640-nt regions of the EMX1 locus containing the target sites were PCR amplified, and the resulting products were denatured, re-annealed, and digested with T7 endonuclease I (New England Biolabs), which cleaves mismatched heteroduplex DNA (25). The products were resolved on a 2% agarose gel containing SYBR Gold (Life Technologies), and band intensities were determined using Image Lab (Bio-Rad Laboratories). Editing efficiency was determined using the formula (1 - (1 - (b + c)/(a + b + c))^(1/2)) x 100, where "a" is the band intensity of DNA substrate and "b" and "c" are the cleavage products (26).
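The editing-efficiency formula, (1 - sqrt(1 - (b + c)/(a + b + c))) x 100, can be sketched as a small function. The function name and the example intensities are hypothetical; the square root reflects that re-annealing of edited and unedited strands is treated as random, so the cleaved fraction overstates the indel fraction.

```python
import math


def indel_percent(a, b, c):
    """Estimated indel percentage from T7 endonuclease I band intensities.

    a: intensity of the uncleaved substrate band
    b, c: intensities of the two cleavage products
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = (b + c) / total
    return (1.0 - math.sqrt(1.0 - fraction_cleaved)) * 100.0
```

For hypothetical intensities a = 64, b = 18, c = 18, the cleaved fraction is 0.36 and the estimated editing efficiency is about 20%.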
RESULTS
Design and functional validation of split-Cas9
[00566] The nuclease lobe of Cas9 includes the RuvC and HNH nuclease domains, as well as a C-terminal domain that is involved in PAM recognition (Figure 10A) (8-10). The RuvC domain comprises three distinct motifs: motifs II and III are interrupted by the HNH domain, and motifs I and II are interrupted by a large lobe composed entirely of α-helices. This α-helical lobe, also referred to as the recognition (REC) lobe (9), forms a broad cleft that makes extensive contacts with the sgRNA and target DNA.
[00567] Using available crystal structures as a guide, a split-Cas9 was designed in which the native structure of both lobes was kept as intact as possible (Figure 10A). In particular, rather than simply splitting the full-length Cas9 sequence internally at a single junction, the nuclease lobe was constructed by directly linking the N-terminal RuvCI motif to the remainder of the nuclease lobe located ~650 amino acids away in primary sequence, with the intervening polypeptide comprising the α-helical lobe. Two crossover points between the lobes occur at residues ~56 and ~720 (Figure 10B): the C-terminal connection is disordered in both apo-Cas9 and sgRNA/DNA-bound structures, and the N-terminal connection occurs between the RuvCI motif and the bridge helix. Residue E57 from RuvCI was connected with residue G729 from RuvCII using a three-amino acid linker, and a short, poorly conserved α-helix that does not appear to play an important structural role in the sgRNA/DNA-bound state was removed from the RuvCII motif (Figure 10B). The α-helical lobe spans residues G56-S714, with the N-terminus encompassing the entirety of the bridge helix.
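The residue boundaries in this design can be expressed as simple slicing of a full-length sequence. The sketch below is illustrative only: it assumes 1-based residue numbering, that the nuclease lobe starts at residue 1 and runs to the native C-terminus, and that the linker is the GGS linker named in the Figure 10 legend. The function name is hypothetical, and any extra residues left by tag cleavage are ignored.

```python
def split_cas9(full_length, linker="GGS"):
    """Return (alpha_helical_lobe, nuclease_lobe) from a full-length sequence.

    Boundaries follow the text: the alpha-helical lobe spans residues 56-714,
    and the nuclease lobe joins residue 57 to residue 729 through the linker.
    """
    def residues(first, last):
        # Convert 1-based inclusive residue numbers to a Python slice.
        return full_length[first - 1:last]

    alpha_helical_lobe = residues(56, 714)
    nuclease_lobe = residues(1, 57) + linker + residues(729, len(full_length))
    return alpha_helical_lobe, nuclease_lobe
```

With a 1,368-residue input, this yields a 659-residue α-helical lobe and a 700-residue nuclease lobe, in line with the similar molecular weights reported for the two purified polypeptides.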
[00568] To determine whether the lobes could function as separate polypeptides, both lobes were separately overexpressed in Escherichia coli and purified by affinity and size-exclusion chromatography (Figure 10C and Figure 13). Whether split-Cas9 (α-helical lobe plus nuclease lobe) would recapitulate the activity of WT Cas9 was investigated using a standard cleavage assay with sgRNA and a radiolabeled double-stranded DNA (dsDNA) target (Figure 10D). No cleavage was observed with either lobe individually, but the reconstituted split-Cas9 enzyme complex exhibited robust target DNA cleavage (Figure 10D and Figure 14). Split-Cas9 maintained the same site and pattern of cleavage as WT Cas9, including the "trimming" of the non-target strand observed previously (4), and functioned equally well with a dual guide RNA composed of crRNA and tracrRNA (Figure 14). In addition, split-Cas9 activity was dependent on complementarity between the sgRNA and target DNA as well as the presence of a 5'-NGG-3' PAM (Figure 14).
[00569] Figure 10. Cas9 can be split into two separate polypeptides that retain the ability to catalyze RNA-guided dsDNA cleavage. (A) Domain organization of WT Cas9 (top) and split-Cas9 (bottom), composed of the α-helical lobe and nuclease lobe. Domain junctions are numbered according to Nishimasu et al. (9). BH, bridge helix; REC, recognition lobe; PI, PAM-interacting. The PI domain can be further subdivided into Topo-homology and C-terminal domains (8). (B) Crystal structures of sgRNA/DNA-bound Cas9 (PDB ID: 4OO8) (9) colored according to domain (left) or by lobe (right), with the α-helical and nuclease lobes depicted in grey and blue, respectively. Nucleic acids are omitted for clarity. In the observed interface between the lobes (inset, left), the dashed line represents a disordered linker spanning residues V713-D718. In the engineered interface (inset, right), the dashed line represents a GGS linker connecting E57 to G729, and new N- and C-termini of the α-helical lobe are shown. (C) Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) analysis of purified WT Cas9 (159 kDa), the α-helical lobe (77 kDa), and the nuclease lobe (81 kDa). The gel was stained with Coomassie Brilliant Blue. (D) DNA cleavage assay with the indicated Cas9 construct, analyzed by denaturing PAGE. Reactions contained ~1 nM radiolabeled dsDNA and 100 nM protein-sgRNA complex; split-Cas9 contained a two-fold molar excess of the α-helical lobe. Quantified data and kinetic analysis can be found in Figure 14 and Table S1.
[00570] Figure 13. Size exclusion chromatograms of purified α-helical (top) and nuclease (bottom) lobes. Following cleavage of the affinity tag by TEV protease and further clean-up using ortho-Ni-NTA and ion exchange columns (α-helical lobe) or an ortho-Ni-NTA column alone (nuclease lobe), the polypeptides were concentrated and injected onto a HiLoad 16/60 Superdex 200 gel filtration column. The α-helical (77 kDa) and nuclease lobes (81 kDa) eluted at 88.4 mL and 76.8 mL, respectively. Both polypeptides were soluble and exhibited consistent activity across multiple rounds of freeze-thawing.
[00571] Figure 14. Split-Cas9 activity is mediated by single-guide and dual-guide RNAs, and requires RNA:DNA complementarity and a PAM. (A) DNA cleavage time courses using a single-guide RNA and WT Cas9, individual α-helical and nuclease lobes, or split-Cas9. Values for WT and split-Cas9 were averaged from three independent experiments, and error bars represent the standard deviation. Rate constants can be found in Figure 20B (Table S1). (B) DNA cleavage time courses using a dual-guide RNA (crRNA:tracrRNA hybrid) and WT Cas9 or split-Cas9. Data are presented as in A. (C) DNA cleavage assay with split-Cas9 and DNA substrates containing a mismatched target or mutated PAM, analyzed by denaturing PAGE. Reactions contained ~1 nM radiolabeled dsDNA and 100 nM Cas9-sgRNA complex.
[00572] When the kinetics of DNA cleavage under pseudo-first order conditions was investigated using excess enzyme, split-Cas9 was ~10-fold slower than WT, though it reached the same endpoint after 5 minutes (Figure 14 and Figure 20B (Table S1)). This may result from slower kinetics of protein-RNA complex formation, a reduced rate of dsDNA recognition and unwinding, or a minor defect in nuclease domain activation. DNA binding experiments using nuclease-inactive split-dCas9 (D10A/H840A mutations) revealed a significantly weaker affinity of split-Cas9 for target DNA than WT Cas9 (Figure 15), suggesting that slower kinetics of dsDNA binding likely limit the observed rate of cleavage. Collectively, these results demonstrate that the enzymatic activity of WT Cas9 does not require a direct linkage between the α-helical and nuclease lobes, though their physical connection within RNA-protein complexes increases the affinity for the target DNA substrate. Remarkably, while previous work shows that the RNA-induced large-scale rearrangement of both lobes is necessary for WT Cas9 to achieve an active conformation (8), the experiments reveal that the sgRNA is entirely sufficient to recruit and dimerize the separate lobes into an active enzyme complex. Furthermore, communication through the sgRNA enables PAM recognition, dsDNA unwinding, and DNA cleavage, despite the absence of extensive protein-protein interactions between the lobes.
[00573] Figure 15. Split-Cas9 exhibits substantially weaker binding affinity for target DNA than WT Cas9. (A) Radiolabeled target dsDNA was incubated with increasing concentrations of Cas9-sgRNA complexes using catalytically inactive mutants of WT Cas9 and the nuclease lobe, and reaction products were resolved by native PAGE. The distinct Cas9 constructs in each titration are indicated (top). (B) Quantified binding data from A. Split-dCas9-RNA binds dsDNA with an apparent equilibrium dissociation constant of ~700 nM, which is more than 3 orders of magnitude greater than that determined for dCas9-RNA (Kd ~ 0.2 nM). However, the apparent affinity measured here is likely to be much weaker than the actual affinity, since the low split-dCas9-sgRNA concentrations that were tested will also favor dissociation of the ternary complex formed between the sgRNA, α-helical lobe, and nuclease lobe. Thus, the observed binding curve is likely a convolution of equilibria between the protein and sgRNA, and between the protein-sgRNA complex and dsDNA. Individual lobes together with sgRNA do not appreciably bind dsDNA at the tested concentrations.
sgRNA motifs recruit both Cas9 lobes to form a ternary complex
[00574] RNA molecular determinants that promote heterodimerization of the α-helical and nuclease lobes were next investigated. Crystal structures of sgRNA/DNA-bound Cas9 show that the spacer (guide) and stem-loop motifs at the 5' end of the sgRNA primarily contact the α-helical lobe, whereas two hairpins at the 3' end bind the outside face of the nuclease lobe (Figure 11A). The nexus motif, recently shown to be critical for activity (11), occupies a central position between the lobes and forms extensive interactions with the bridge helix. Based on this interaction profile, a full-length sgRNA and two shorter sgRNA constructs that were selectively truncated from either the 5' or 3' end were generated (Figure 11B), and their affinities were determined for binding to WT Cas9, the individual α-helical and nuclease lobes, and split-Cas9 using a filter binding assay.
[00575] The full-length sgRNA is bound by WT Cas9 with an equilibrium dissociation constant (Kd) of 10 ± 2 pM, whereas the lobes individually and together have Kd values in the range of 0.2-0.8 nM (Figure 11C and Figure 20A (Table 1)). The difference between WT and split-Cas9 likely reflects the increased entropic cost required to assemble a ternary versus binary complex. Interestingly, WT Cas9 bound a truncated sgRNA comprising only the 3' hairpins (Δspacer-nexus) with an affinity that was indistinguishable from the full-length sgRNA (Figure 11D and Figure 20A (Table 1)), indicating that these hairpins provide the major source of binding energy for the WT protein-RNA complex. Consistent with the crystal structure, the nuclease lobe still bound the 5'-truncated sgRNA as tightly as the full-length sgRNA, while the affinity of the α-helical lobe was reduced by over three orders of magnitude (Kd > 100 nM).
[00576] It was then hypothesized that removing the two hairpins from the 3' end of the sgRNA (Δhairpins1-2) would selectively perturb interactions with the nuclease lobe. Indeed, the affinity of the 3'-truncated sgRNA for the nuclease lobe decreased by over three orders of magnitude relative to full-length sgRNA (Kd >100 nM), whereas the affinity of the α-helical lobe was unchanged (Figure 11D and Figure 20A (Table 1)). The results demonstrate that sgRNA truncations specifically destabilize binding to only one of the two lobes, and that the affinity of split-Cas9 is limited by the highest affinity interaction with either lobe. These findings highlight the multiple, independent molecular contacts formed between the sgRNA and the two lobes of Cas9.
[00577] Based on the binding data and on the minimal contacts observed between the two Cas9 lobes in available structures, it was then hypothesized that the sgRNA would be required for heterodimerization of the α-helical and nuclease lobes. To test this, analytical negative-stain electron microscopy was performed with each lobe alone and together in the presence and absence of sgRNA. Raw micrographs of a sample containing both polypeptides and the sgRNA revealed bi-lobed densities that had dimensions consistent with earlier reconstructions of the Cas9-RNA complex (Figure 16) (8), and the resulting class averages were indistinguishable from those obtained using WT Cas9 (Figure 11E). In contrast, smaller particles that had dimensions more consistent with single lobes were observed when the polypeptides were mixed together in the absence of sgRNA (Figure 11E and Figure 16). These results indicate that a full-length sgRNA acts as a molecular scaffold in dimerizing the two lobes.
[00578] Figure 11. Split-Cas9 assembly requires the sgRNA. (A) Crystal structure of sgRNA/DNA-bound Cas9 (PDB ID: 4OO8) (9): Cas9 is colored by lobe and shown as a transparent surface, the sgRNA is colored by motif according to Briner et al. (11), and the DNA is omitted for clarity. (B) Cartoon representations of full-length and truncated sgRNA variants used in binding experiments; specific motifs of the sgRNA are colored as in (A). (C-D) Results from binding experiments using full-length sgRNA (C), and Δhairpins1-2 and Δspacer-nexus sgRNA truncations (D). Radiolabeled RNAs were incubated with increasing concentrations of WT Cas9, individual α-helical and nuclease lobes, or split-Cas9, and the fraction of protein-bound RNA was determined by nitrocellulose filter binding. Equilibrium dissociation constants (Kd) determined from three independent experiments are shown in Figure 20A (Table 1). (E) Reference-free class averages from negative-stain EM images of split-Cas9 reconstituted with single-guide RNA (top left), WT Cas9 reconstituted with dual-guide RNA (top right), and split-Cas9 in the absence of guide RNA (bottom). For split-Cas9 without sgRNA, several class averages are shown. The width of the boxes corresponds to ~336 Å. Data with WT Cas9 are adapted from Jinek et al. (8).
[00579] Figure 16. Split-Cas9 heterodimerization requires the sgRNA. (A-D) Raw electron micrographs of negatively-stained α-helical and nuclease lobes alone (A,B), together (C), or together with sgRNA (D). Particles having dimensions consistent with WT Cas9-RNA complexes, and thus indicative of heterodimer formation, are only observed in the presence of sgRNA. Representative particles are circled (yellow), and the scale bar indicates 50 nm.
Split-Cas9 functions in mammalian cells for genome editing
[00580] To determine whether split-Cas9 would retain the ability to generate site-specific genomic edits in vivo, the EMX1 locus was targeted in HEK293T cells by nucleofection using reconstituted Cas9-sgRNA ribonucleoprotein (RNP) complexes (Figure 12A) (13). Split-Cas9 generated indels with efficiencies of up to 0.6% and 2% in cells that were unsynchronized or nocodazole synchronized, respectively, compared to 22% and 34% with WT Cas9 (Figure 12B). The reduced levels of editing may be due in part to disruption of the ternary complex during dilution and nucleofection, as the complex is limited by the affinity of the α-helical lobe for sgRNA, or to slower kinetics of DNA cleavage in cells. Additionally, because each copy of sgRNA must recruit both lobes to form an active complex, the activity in cells may be sensitive to the stoichiometry between the sgRNA and either lobe. In agreement with this, the in vitro DNA cleavage activity of split-Cas9 decreased as the sgRNA concentration was increased above that of both lobes (Figure 17), suggesting that excess sgRNA can titrate the lobes apart from each other. While the results leave room for optimization of split-Cas9 activity in cells, they demonstrate that the intrinsic genome editing capabilities are retained when Cas9 comprises two individual polypeptides.
[00581] Figure 12. Genomic editing function and selective inactivation of split-Cas9. (A) Schematic of the split-Cas9 RNP nucleofection assay using a full-length EMX1-targeting sgRNA. Illustration and protocol adapted from Lin et al. (13). (B) Analysis of editing efficiencies by nonhomologous end joining (NHEJ) using a T7 endonuclease I assay and agarose gel electrophoresis. Cells were nucleofected with 100, 30, or 10 pmol of WT or split-Cas9 ribonucleoprotein (RNP) complexes after arrest at mitosis with nocodazole (Sync) or during normal growth (Unsync). Editing efficiencies are shown at the bottom. (C) DNA cleavage time courses using WT and split-Cas9 with either a full-length sgRNA (top) or the Δhairpins1-2 sgRNA (bottom). Values were averaged from three independent experiments, and error bars represent the standard deviation. Rate constants can be found in Figure 20B (Table S1).
[00582] Figure 17. Excess sgRNA reduces the DNA cleavage activity of split-Cas9. DNA cleavage assay with varying molar ratios of protein to sgRNA, analyzed by denaturing PAGE. Reactions contained ~1 nM radiolabeled dsDNA, 100 nM α-helical and nuclease lobes, and 50-1000 nM sgRNA. The extent of product formation decreases substantially as the sgRNA concentration surpasses the lobe concentration. This observation suggests that stoichiometric excesses of sgRNA titrate the individual lobes away from each other and onto independent sgRNA molecules, a hypothesis supported by the finding that distinct sgRNA motifs interact with either lobe.
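The titration hypothesis can be illustrated with a deliberately simple toy model; it is not taken from the source. It assumes that binding is effectively irreversible, that each lobe binds sgRNA molecules independently and at random, and that each sgRNA holds at most one copy of each lobe. Under these assumptions, the probability that a given sgRNA carries both lobes falls off as the square of the lobe:sgRNA ratio once sgRNA is in excess. The function name is hypothetical.

```python
def ternary_complex_nM(sgrna_nM, alpha_nM, nuclease_nM):
    """Toy estimate of sgRNA molecules carrying both lobes (nM).

    Assumptions (illustrative only): tight, independent, random binding of
    each lobe to sgRNA, with at most one copy of each lobe per sgRNA.
    """
    if sgrna_nM <= 0:
        return 0.0
    # Probability that a given sgRNA carries each lobe, capped at 1.
    p_alpha = min(1.0, alpha_nM / sgrna_nM)
    p_nuclease = min(1.0, nuclease_nM / sgrna_nM)
    return sgrna_nM * p_alpha * p_nuclease
```

With 100 nM of each lobe, this model predicts roughly 50, 100, and 10 nM of fully assembled complex at 50, 100, and 1000 nM sgRNA, which mirrors the qualitative trend described in the legend without claiming to reproduce the measured values.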
Engineered sgRNAs selectively preclude split-Cas9 but not WT Cas9 activity
[00583] The potential for enhanced spatiotemporal control of genome engineering events with split-Cas9 prompted the investigation of ways in which sgRNA-mediated dimerization of the α-helical and nuclease lobes could be perturbed. In particular, certain 3'-truncated or modified sgRNAs, which have weak affinity for the nuclease lobe (Figure 11E) but still support robust DNA cleavage activity of WT Cas9 (4), could selectively inactivate split-Cas9 activity through their inability to effectively recruit and dimerize both lobes into a functional enzyme complex. Thus, the activity of split-Cas9 in cells could be made dependent upon inducible protein-protein dimerization domains (Figure 18).
[00584] When sgRNA variants that lacked one or both hairpins at the 3' end were tested for their ability to support in vitro cleavage, split-Cas9 activity was either severely compromised or completely abolished relative to WT Cas9 activity (Figure 12C, Figure 19, and Figure 20B (Table S1)). A recent report found that sgRNAs in which only the first hairpin is deleted function robustly in cells (11), and similar designs supported DNA cleavage activity of WT Cas9 but not split-Cas9 in vitro (Figure 19). Thus, rationally designed variants of the sgRNA scaffold can be used to prevent RNA-mediated heterodimerization of the two lobes without compromising the intrinsic RNA-guided DNA cleaving capabilities of Cas9. Split-Cas9 can be converted into a regulatable system using exogenous dimerization domains (Figure 18). Fusing both lobes to domains that selectively dimerize upon chemical or optical induction, such as the abscisic acid-inducible PYL-ABI dimer (17) or the blue light-inducible CRY2-CIB1 dimer (18), would allow for enhanced spatiotemporal control of genome engineering events. Dimerization domains might also overcome inefficient complex formation by making lobe assembly independent of the sgRNA. The combined use of inducible dimerization domains with compromised sgRNA variants that enable DNA targeting but not split-Cas9 assembly would eliminate leaky activity in the absence of inducer while still allowing for robust activation, creating an extremely sensitive inducible system.
[00585] Figure 18. Strategy for inducible control of genome engineering by a split-Cas9 enzyme complex. (A) Because the α-helical and nuclease lobes dimerize in the presence of sgRNA, both WT and split-Cas9 are functional genome editing tools in cells using full-length sgRNA. (B) sgRNA variants with 3'-hairpin truncations have substantially weaker affinity for the nuclease lobe and thus do not efficiently assemble a functional split-Cas9 complex, leading to an inactive enzyme. In contrast, in vitro DNA cleavage by WT Cas9 is minimally affected by these truncations, indicating that the intrinsic activity of the Cas9-sgRNA enzyme complex does not require hairpins at the 3' end. (C) A schematic of an inducible split-Cas9 system, in which exogenous dimerization domains control the assembly of a functional ternary complex between a 3'-truncated sgRNA and the α-helical and nuclease lobes. By fusing both lobes to domains that dimerize only upon some external stimulus (e.g. a small molecule; red trapezoid), split-Cas9 can be specifically activated for a desired genome engineering outcome.
[00586] Figure 19. 3'-truncated sgRNA variants selectively inactivate split-Cas9. (A-B) DNA cleavage assays with WT and split-Cas9 and a panel of four different sgRNAs, analyzed by denaturing PAGE. (A) Full-length sgRNAs promote DNA cleavage activity of both WT and split-Cas9, whereas split-Cas9 activity is completely lost with an sgRNA lacking both hairpins at the 3' end (Δhairpins1-2). (B) sgRNA variants where only one hairpin is removed show minimal effects on WT Cas9 activity but severely (Δhairpin2) or completely (Δhairpin1) inactivate split-Cas9.
[00587] References
1. Barrangou R, Marraffini LA (2014) CRISPR-Cas systems: Prokaryotes upgrade to adaptive immunity. Mol Cell 54:234-244.
2. van der Oost J, Westra ER, Jackson RN, Wiedenheft B (2014) Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol 12:479-492.
3. Sapranauskas R et al. (2011) The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucleic Acids Res 39:9275-9282.
4. Jinek M et al. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337:816-821.
5. Mali P, Esvelt KM, Church GM (2013) Cas9 as a versatile tool for engineering biology. Nat Meth 10:957-963.
6. Hsu PD, Lander ES, Zhang F (2014) Development and applications of CRISPR-Cas9 for genome engineering. Cell 157:1262-1278.
7. Doudna JA, Charpentier E (2014) Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346:1258096.
8. Jinek M et al. (2014) Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343:1247997.
9. Nishimasu H et al. (2014) Crystal structure of cas9 in complex with guide RNA and target DNA. Cell 156:935-949.
10. Anders C, Niewoehner O, Duerst A, Jinek M (2014) Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513:569-573.
11. Briner AE et al. (2014) Guide RNA functional modules direct cas9 activity and orthogonality. Mol Cell 56:333-339.
12. Hsu PD et al. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31:827-832.
13. Lin S, Staahl B, Alla RK, Doudna JA (2014) Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. Elife 3.
14. Shekhawat SS, Ghosh I (2011) Split-protein systems: beyond binary protein-protein interactions. Curr Opin Chem Biol 15:789-797.
15. Konermann S et al. (2013) Optical control of mammalian endogenous transcription and epigenetic states. Nature 500:472-476.
16. Lienert F et al. (2013) Two- and three-input TALE-based AND logic computation in embryonic stem cells. Nucleic Acids Res 41:9967-9975.
17. Liang F-S, Ho WQ, Crabtree GR (2011) Engineering the ABA plant stress pathway for regulation of induced proximity. Sci Signal 4:rs2-rs2.
18. Kennedy MJ et al. (2010) Rapid blue-light-mediated induction of protein interactions in living cells. Nat Meth 7:973-975.
19. Wiedenheft B et al. (2011) Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature 477:486-489.
20. Jackson RN et al. (2014) Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science 345:1473-1479.
21. Mulepati S, Heroux A, Bailey S (2014) Structural biology. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science 345:1479-1484.
22. Suloway C et al. (2005) Automated molecular microscopy: the new Leginon system. J Struct Biol 151:41-60.
23. Lander GC et al. (2012) Complete subunit architecture of the proteasome regulatory particle. Nature 482:186-191.
24. van Heel M, Harauz G, Orlova EV, Schmidt R, Schatz M (1996) A new generation of the IMAGIC image processing system. J Struct Biol 116:17-24.
25. Kim HJ, Lee HJ, Kim H, Cho SW, Kim J-S (2009) Targeted genome editing in human cells with zinc finger nucleases constructed via modular assembly. Genome Res 19:1279-1288.
26. Ran FA et al. (2013) Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154:1380-1389.
Claims
1. A Cas9 heterodimer comprising:
A) a first fusion polypeptide comprising:
a) a first polypeptide comprising:
i) a RuvCI polypeptide;
ii) a RuvCII polypeptide;
iii) an HNH polypeptide;
iv) a RuvCIII polypeptide; and
v) a PAM-interacting polypeptide; and
b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and
B) a second fusion polypeptide comprising:
a) an alpha-helical recognition region; and
b) a second fusion partner, where the second fusion partner is a second member of a dimerization pair.
2. The Cas9 heterodimer of claim 1, wherein the RuvCI polypeptide comprises an amino acid sequence having at least 75% amino acid sequence identity to amino acids 1-60 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 50 amino acids to about 70 amino acids.
3. The Cas9 heterodimer of claim 1, wherein the RuvCII polypeptide comprises an amino acid sequence having at least 75% amino acid sequence identity to amino acids 719-775 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 40 amino acids to about 70 amino acids.
4. The Cas9 heterodimer of claim 1, wherein the HNH polypeptide comprises an amino acid sequence having at least 75% amino acid sequence identity to amino acids 776 to 909 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 120 amino acids to about 145 amino acids.
5. The Cas9 heterodimer of claim 1, wherein the RuvCIII polypeptide comprises an amino acid sequence having at least 75% amino acid sequence identity to amino acids 910-1099 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 170 amino acids to about 210 amino acids.
6. The Cas9 heterodimer of claim 1, wherein the PAM-interacting polypeptide comprises an amino acid sequence having at least 75% amino acid sequence identity to amino acids 1100-1367 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346, and has a length of from about 250 amino acids to about 280 amino acids.
7. The Cas9 heterodimer of any one of claims 1-6, wherein the first fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLS) that provide for nuclear localization.
8. The Cas9 heterodimer of claim 7, wherein the NLS is at or near the N-terminus of the first fusion polypeptide.
9. The Cas9 heterodimer of claim 7, wherein the NLS is at or near the C-terminus of the first fusion polypeptide.
10. The Cas9 heterodimer of claim 7, wherein the first fusion polypeptide comprises an NLS at or near the N-terminus of the first fusion polypeptide and comprises an NLS at or near the C-terminus of the first fusion polypeptide.
11. The Cas9 heterodimer of any one of claims 1-6, wherein the second fusion polypeptide comprises one or more heterologous nuclear localization sequences (NLS) that provide for nuclear localization.
12. The Cas9 heterodimer of claim 11, wherein the NLS is at or near the N-terminus of the second fusion polypeptide.
13. The Cas9 heterodimer of claim 11, wherein the NLS is at or near the C-terminus of the second fusion polypeptide.
14. The Cas9 heterodimer of claim 11, wherein the second fusion polypeptide comprises an NLS at or near the N-terminus of the second fusion polypeptide and comprises an NLS at or near the C-terminus of the second fusion polypeptide.
15. The Cas9 heterodimer of any one of claims 1-14, wherein the first fusion partner is at or near the N-terminus of the first polypeptide.
16. The Cas9 heterodimer of any one of claims 1-14, wherein the first fusion partner is at or near the C-terminus of the first polypeptide.
17. The Cas9 heterodimer of any one of claims 1-14, wherein the first fusion partner is located internally to the first polypeptide.
18. The Cas9 heterodimer of claim 17, wherein the first fusion partner is located within the HNH polypeptide.
19. The Cas9 heterodimer of claim 18, wherein the first fusion partner is located between amino acids 860 and 880 of amino acids 776 to 909 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
20. The Cas9 heterodimer of claim 17, wherein the first fusion partner is located within the RuvCIII polypeptide.
21. The Cas9 heterodimer of claim 20, wherein the first fusion partner is located between amino acids 1000 and 1025 of the Streptococcus pyogenes Cas9 amino acid sequence set forth in SEQ ID NO: 1545 or a corresponding Cas9 amino acid sequence set forth in any of SEQ ID NOs: 1-259 and 795-1346.
22. The Cas9 heterodimer of any one of claims 1-21, wherein the second fusion partner is at or near the N-terminus of the second polypeptide.
23. The Cas9 heterodimer of any one of claims 1-21, wherein the second fusion partner is at or near the C-terminus of the second polypeptide.
24. The Cas9 heterodimer of any one of claims 1-21, wherein the second fusion partner is located internally to the second polypeptide.
25. A Cas9 heterodimer comprising:
A) a first fusion polypeptide comprising:
a) a first, circular permuted, polypeptide that comprises:
i) a RuvCI polypeptide;
ii) a RuvCII polypeptide;
iii) an HNH polypeptide;
iv) a RuvCIII polypeptide; and
v) a PAM-interacting polypeptide; and
b) a first fusion partner, where the first fusion partner is a first member of a dimerization pair; and
B) a second fusion polypeptide comprising:
a) a second polypeptide that comprises an alpha-helical recognition region; and
b) a second fusion partner, where the second fusion partner is a second member of the dimerization pair.
26. The Cas9 heterodimer of claim 25, wherein the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus:
i) a RuvCII polypeptide;
ii) an HNH polypeptide;
iii) a RuvCIII polypeptide;
iv) a PAM-interacting polypeptide; and
v) a RuvCI polypeptide.
27. The Cas9 heterodimer of claim 25, wherein the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus:
i) an HNH polypeptide;
ii) a RuvCIII polypeptide;
iii) a PAM-interacting polypeptide;
iv) a RuvCI polypeptide; and
v) a RuvCII polypeptide.
28. The Cas9 heterodimer of claim 25, wherein the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus:
i) a RuvCIII polypeptide;
ii) a PAM-interacting polypeptide;
iii) a RuvCI polypeptide;
iv) a RuvCII polypeptide; and
v) an HNH polypeptide.
29. The Cas9 heterodimer of claim 25, wherein the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus:
i) a C-terminal portion of a RuvCIII polypeptide;
ii) a PAM-interacting polypeptide;
iii) a RuvCI polypeptide;
iv) a RuvCII polypeptide;
v) an HNH polypeptide; and
vi) an N-terminal portion of a RuvCIII polypeptide.
30. The Cas9 heterodimer of claim 25, wherein the first, circular permuted, polypeptide comprises, in order from N-terminus to C-terminus:
i) a C-terminal portion of an HNH polypeptide;
ii) a RuvCIII polypeptide;
iii) a PAM-interacting polypeptide;
iv) a RuvCI polypeptide;
v) a RuvCII polypeptide; and
vi) an N-terminal portion of an HNH polypeptide.
31. The Cas9 heterodimer of any one of claims 26 to 30, wherein the first fusion polypeptide comprises, in order from N-terminus to C-terminus:
a) the first fusion partner; and
b) the first polypeptide.
32. The Cas9 heterodimer of any one of claims 26 to 30, wherein the first fusion polypeptide comprises, in order from N-terminus to C-terminus:
a) the first polypeptide; and
b) the first fusion partner.
33. The Cas9 heterodimer of any one of claims 26 to 30, wherein the first fusion polypeptide comprises, in order from N-terminus to C-terminus:
a) heterologous nuclear localization sequences (NLS) that provides for nuclear localization;
b) the first fusion partner; and
c) the first polypeptide.
34. The Cas9 heterodimer of any one of claims 26 to 30, wherein the first fusion polypeptide comprises, in order from N-terminus to C-terminus:
a) an NLS;
b) the first fusion partner;
c) the first polypeptide; and
d) an NLS.
35. The Cas9 heterodimer of any one of claims 1-34, wherein the first fusion polypeptide and the second fusion polypeptide form a heterodimer in the presence of a small molecule dimerizer.
36. The Cas9 heterodimer of any one of claims 1-35, wherein:
a) the first fusion partner is FK506 binding protein 1A (FKBP1A); and the second fusion partner is FKBP1A;
b) the first fusion partner is FKBP1A; and the second fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme);
c) the first fusion partner is FKBP1A; and the second fusion partner is cyclophilin;
d) the first fusion partner is FKBP1A; and the second fusion partner is Fkbp-Rapamycin Binding Domain (FRB);
e) the first fusion partner is gyrase B (GyrB); and the second fusion partner is GyrB;
f) the first fusion partner is dihydrofolate reductase (DHFR); and the second fusion partner is DHFR;
g) the first fusion partner is DmrB; and the second fusion partner is DmrB;
h) the first fusion partner is PYL; and the second fusion partner is ABI;
i) the first fusion partner is Cry2; and the second fusion partner is CIP; or
j) the first fusion partner is GAI; and the second fusion partner is GID1.
37. The Cas9 heterodimer of any one of claims 1-35, wherein:
a) the second fusion partner is FK506 binding protein 1A (FKBP1A); and the first fusion partner is FKBP1A;
b) the second fusion partner is FKBP1A; and the first fusion partner is PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme);
c) the second fusion partner is FKBP1A; and the first fusion partner is cyclophilin;
d) the second fusion partner is FKBP1A; and the first fusion partner is Fkbp-Rapamycin Binding Domain (FRB);
e) the second fusion partner is gyrase B (GyrB); and the first fusion partner is GyrB;
f) the second fusion partner is dihydrofolate reductase (DHFR); and the first fusion partner is DHFR;
g) the second fusion partner is DmrB; and the first fusion partner is DmrB;
h) the second fusion partner is PYL; and the first fusion partner is ABI;
i) the second fusion partner is Cry2; and the first fusion partner is CIP; or
j) the second fusion partner is GAI; and the first fusion partner is GID1.
38. A system comprising:
a) a Cas9 heterodimer of any one of claims 1-37; and
b) a Cas9 guide RNA that comprises stem loop 1 but does not comprise at least one of: a stem loop 2 and a stem loop 3.
39. The system of claim 38, wherein the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3.
40. The system of claim 38 or 39, comprising a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide.
41. One or more nucleic acids comprising nucleotide sequences encoding the Cas9 heterodimer of any one of claims 1-37.
42. The one or more nucleic acids of claim 41, wherein the nucleotide sequences encoding the first fusion polypeptide and the nucleotide sequences encoding the second fusion polypeptide are in the same nucleic acid.
43. The one or more nucleic acids of claim 41, wherein nucleotide sequences encoding the first fusion polypeptide and nucleotide sequences encoding the second fusion polypeptide are in separate nucleic acids.
44. The one or more nucleic acids of claim 41, wherein the nucleotide sequences encoding the first fusion polypeptide are operably linked to a transcription control sequence.
45. The one or more nucleic acids of claim 41, wherein the nucleotide sequences encoding the second fusion polypeptide are operably linked to a transcription control sequence.
46. A recombinant vector comprising the one or more nucleic acids of any one of claims 41-45.
47. The recombinant vector of claim 46, wherein the vector is a viral vector.
48. The recombinant vector of claim 47, wherein the viral vector is an adeno-associated viral vector, a lentiviral vector, or a retroviral vector.
49. A host cell genetically modified with the one or more nucleic acids of any one of claims 41-45, or with the recombinant expression vector of any one of claims 46-48.
50. The host cell of claim 49, wherein the host cell is a prokaryotic cell.
51. The host cell of claim 49, wherein the host cell is a eukaryotic cell.
52. The host cell of claim 51, wherein the host cell is a mammalian cell, a plant cell, an amphibian cell, a reptile cell, a yeast cell, a protozoan cell, or a helminth cell.
53. A method of binding a target nucleic acid, the method comprising:
contacting the target nucleic acid with the system of claim 38.
54. The method of claim 53, wherein the method results in modification of the target nucleic acid.
55. The method of claim 54, wherein said modification is cleavage.
56. The method of any of claims 53 to 55, wherein the Cas9 guide RNA does not comprise a stem loop 2 and does not comprise a stem loop 3.
57. The method of any of claims 53 to 56, wherein the system comprises a small molecule dimerizer that induces dimerization of the first fusion polypeptide and the second fusion polypeptide.
58. The method of any of claims 53 to 57, wherein the system comprises a donor polynucleotide.
59. The method of any of claims 53 to 58, wherein the system comprises a PAMmer.
60. The method of any of claims 53 to 59, wherein the Cas9 heterodimer has reduced nuclease activity.
61. The method of any of claims 53 to 60, wherein the Cas9 heterodimer has nickase activity.
62. The method of any of claims 53 to 61, wherein the Cas9 heterodimer includes a fusion partner that provides for an activity selected from: transcription modulation, methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, and demyristoylation activity.
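As a non-limiting illustration of the domain arrangements recited in claims 2-6 and 26-28, the following Python sketch tabulates the Streptococcus pyogenes Cas9 residue ranges (SEQ ID NO: 1545) recited in those claims and derives the circularly permuted domain orders by rotating the native order. The names `DOMAINS`, `domain_length`, and `circular_permutation` are hypothetical and do not appear in the application. The sketch treats each domain as an indivisible unit, so it does not model the split-domain arrangements of claims 29 and 30.

```python
# Illustrative only. Residue ranges are those recited in claims 2-6 for
# Streptococcus pyogenes Cas9 (SEQ ID NO: 1545); all names are hypothetical.
DOMAINS = [
    ("RuvCI", 1, 60),
    ("RuvCII", 719, 775),
    ("HNH", 776, 909),
    ("RuvCIII", 910, 1099),
    ("PAM-interacting", 1100, 1367),
]


def domain_length(start, end):
    """Return the inclusive length, in amino acids, of a residue range."""
    return end - start + 1


def circular_permutation(order, first):
    """Rotate the domain order so that `first` sits at the N-terminus."""
    i = order.index(first)
    return order[i:] + order[:i]


# Native N-terminus to C-terminus order of the first polypeptide (claim 1).
names = [name for name, _, _ in DOMAINS]

for name, start, end in DOMAINS:
    print(f"{name}: residues {start}-{end}, {domain_length(start, end)} aa")

# Rotations starting at RuvCII, HNH, and RuvCIII give the orders
# recited in claims 26, 27, and 28, respectively.
for first in ("RuvCII", "HNH", "RuvCIII"):
    print(", ".join(circular_permutation(names, first)))
```

Each computed length falls inside the length window recited for the corresponding domain in claims 2-6 (for example, 134 amino acids for the HNH polypeptide, within about 120 to about 145).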
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16737638.3A EP3245232B1 (en) | 2015-01-12 | 2016-01-07 | Heterodimeric cas9 and methods of use thereof |
US15/536,626 US11208638B2 (en) | 2015-01-12 | 2016-01-07 | Heterodimeric Cas9 and methods of use thereof |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562102485P | 2015-01-12 | 2015-01-12 | |
US62/102,485 | 2015-01-12 | | |
US201562116263P | 2015-02-13 | 2015-02-13 | |
US62/116,263 | 2015-02-13 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016114972A1 (en) | 2016-07-21 |
Family
ID=56406236
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2016/012470 WO2016114972A1 (en) | 2015-01-12 | 2016-01-07 | Heterodimeric cas9 and methods of use thereof |
Country Status (3)
Country | Link |
---|---|
US (1) | US11208638B2 (en) |
EP (1) | EP3245232B1 (en) |
WO (1) | WO2016114972A1 (en) |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9526784B2 (en) | 2013-09-06 | 2016-12-27 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
WO2017075335A1 (en) | 2015-10-28 | 2017-05-04 | Voyager Therapeutics, Inc. | Regulatable expression using adeno-associated virus (aav) |
US9840699B2 (en) | 2013-12-12 | 2017-12-12 | President And Fellows Of Harvard College | Methods for nucleic acid editing |
WO2018089664A1 (en) * | 2016-11-11 | 2018-05-17 | The Regents Of The University Of California | Variant rna-guided polypeptides and methods of use |
US10077453B2 (en) | 2014-07-30 | 2018-09-18 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US10113163B2 (en) | 2016-08-03 | 2018-10-30 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10167457B2 (en) | 2015-10-23 | 2019-01-01 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10208317B2 (en) | 2013-12-11 | 2019-02-19 | Regeneron Pharmaceuticals, Inc. | Methods and compositions for the targeted modification of a mouse embryonic stem cell genome |
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US10385359B2 (en) | 2013-04-16 | 2019-08-20 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US10428319B2 (en) | 2017-06-09 | 2019-10-01 | Editas Medicine, Inc. | Engineered Cas9 nucleases |
US10508298B2 (en) | 2013-08-09 | 2019-12-17 | President And Fellows Of Harvard College | Methods for identifying a target site of a CAS9 nuclease |
US10597679B2 (en) | 2013-09-06 | 2020-03-24 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
US20200224221A1 (en) * | 2017-02-20 | 2020-07-16 | Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences | Genome editing method |
US10745677B2 (en) | 2016-12-23 | 2020-08-18 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
US10858639B2 (en) | 2013-09-06 | 2020-12-08 | President And Fellows Of Harvard College | CAS9 variants and uses thereof |
US11046948B2 (en) | 2013-08-22 | 2021-06-29 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US11236313B2 (en) | 2016-04-13 | 2022-02-01 | Editas Medicine, Inc. | Cas9 fusion molecules, gene editing systems, and methods of use thereof |
US11268082B2 (en) | 2017-03-23 | 2022-03-08 | President And Fellows Of Harvard College | Nucleobase editors comprising nucleic acid programmable DNA binding proteins |
US11306324B2 (en) | 2016-10-14 | 2022-04-19 | President And Fellows Of Harvard College | AAV delivery of nucleobase editors |
US11319532B2 (en) | 2017-08-30 | 2022-05-03 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11390884B2 (en) | 2015-05-11 | 2022-07-19 | Editas Medicine, Inc. | Optimized CRISPR/cas9 systems and methods for gene editing in stem cells |
US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11499151B2 (en) | 2017-04-28 | 2022-11-15 | Editas Medicine, Inc. | Methods and systems for analyzing guide RNA molecules |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US11542496B2 (en) | 2017-03-10 | 2023-01-03 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
WO2023010133A2 (en) | 2021-07-30 | 2023-02-02 | Tune Therapeutics, Inc. | Compositions and methods for modulating expression of frataxin (fxn) |
WO2023010135A1 (en) | 2021-07-30 | 2023-02-02 | Tune Therapeutics, Inc. | Compositions and methods for modulating expression of methyl-cpg binding protein 2 (mecp2) |
US11597924B2 (en) | 2016-03-25 | 2023-03-07 | Editas Medicine, Inc. | Genome editing systems comprising repair-modulating enzyme molecules and methods of their use |
US11661590B2 (en) | 2016-08-09 | 2023-05-30 | President And Fellows Of Harvard College | Programmable CAS9-recombinase fusion proteins and uses thereof |
US11667911B2 (en) | 2015-09-24 | 2023-06-06 | Editas Medicine, Inc. | Use of exonucleases to improve CRISPR/CAS-mediated genome editing |
US11680268B2 (en) | 2014-11-07 | 2023-06-20 | Editas Medicine, Inc. | Methods for improving CRISPR/Cas-mediated genome-editing |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
US11795443B2 (en) | 2017-10-16 | 2023-10-24 | The Broad Institute, Inc. | Uses of adenosine base editors |
WO2023250511A2 (en) | 2022-06-24 | 2023-12-28 | Tune Therapeutics, Inc. | Compositions, systems, and methods for reducing low-density lipoprotein through targeted gene repression |
US11866726B2 (en) | 2017-07-14 | 2024-01-09 | Editas Medicine, Inc. | Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites |
WO2024015881A2 (en) | 2022-07-12 | 2024-01-18 | Tune Therapeutics, Inc. | Compositions, systems, and methods for targeted transcriptional activation |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
WO2024040254A2 (en) | 2022-08-19 | 2024-02-22 | Tune Therapeutics, Inc. | Compositions, systems, and methods for regulation of hepatitis b virus through targeted gene repression |
US11911415B2 (en) | 2015-06-09 | 2024-02-27 | Editas Medicine, Inc. | CRISPR/Cas-related methods and compositions for improving transplantation |
US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
WO2024064642A2 (en) | 2022-09-19 | 2024-03-28 | Tune Therapeutics, Inc. | Compositions, systems, and methods for modulating t cell function |
US11970710B2 (en) | 2015-10-13 | 2024-04-30 | Duke University | Genome engineering with Type I CRISPR systems in eukaryotic cells |
US11976307B2 (en) | 2012-04-27 | 2024-05-07 | Duke University | Genetic correction of mutated genes |
WO2024163683A2 (en) | 2023-02-01 | 2024-08-08 | Tune Therapeutics, Inc. | Systems, compositions, and methods for modulating expression of methyl-cpg binding protein 2 (mecp2) and x-inactive specific transcript (xist) |
WO2024163678A2 (en) | 2023-02-01 | 2024-08-08 | Tune Therapeutics, Inc. | Fusion proteins and systems for targeted activation of frataxin (fxn) and related methods |
US12110545B2 (en) | 2017-01-06 | 2024-10-08 | Editas Medicine, Inc. | Methods of assessing nuclease cleavage |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11661599B1 (en) | 2017-12-14 | 2023-05-30 | National Technology & Engineering Solutions Of Sandia, Llc | CRISPR-Cas based system for targeting single-stranded sequences |
WO2020077293A1 (en) * | 2018-10-11 | 2020-04-16 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for compartment spatial organization |
CN110577971B (en) * | 2019-08-08 | 2022-11-18 | 复旦大学 | CRISPR/Sa-SauriCas9 gene editing system and application thereof |
CN115052979A (en) * | 2019-11-12 | 2022-09-13 | 马里兰大学派克分院 | Plant vectors, compositions and uses related thereto |
WO2024105633A1 (en) * | 2022-11-18 | 2024-05-23 | Kyoto Prefectural Public University Corporation | Compositions for mitophagy induction and uses thereof |
Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140068797A1 (en) | 2012-05-25 | 2014-03-06 | University Of Vienna | Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription |
US8697359B1 (en) | 2012-12-12 | 2014-04-15 | The Broad Institute, Inc. | CRISPR-Cas systems and methods for altering expression of gene products |
US20140179770A1 (en) | 2012-12-12 | 2014-06-26 | Massachusetts Institute Of Technology | Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications |
US20140179006A1 (en) | 2012-12-12 | 2014-06-26 | Massachusetts Institute Of Technology | Crispr-cas component systems, methods and compositions for sequence manipulation |
US20140186843A1 (en) | 2012-12-12 | 2014-07-03 | Massachusetts Institute Of Technology | Methods, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof |
US20140186919A1 (en) | 2012-12-12 | 2014-07-03 | Feng Zhang | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140186958A1 (en) | 2012-12-12 | 2014-07-03 | Feng Zhang | Engineering and optimization of systems, methods and compositions for sequence manipulation with functional domains |
US20140234972A1 (en) | 2012-12-12 | 2014-08-21 | Massachusetts Institute Of Technology | CRISPR-CAS Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes |
US20140242664A1 (en) | 2012-12-12 | 2014-08-28 | The Broad Institute, Inc. | Engineering of systems, methods and optimized guide compositions for sequence manipulation |
US20140242702A1 (en) | 2013-02-25 | 2014-08-28 | Sigma Aldrich Co. Llc | Methods and compositions for enhancing nuclease-mediated gene disruption |
US20140273233A1 (en) | 2013-03-15 | 2014-09-18 | Sigma-Aldrich Co., Llc | Crispr-based genome modification and regulation |
US20140273226A1 (en) | 2013-03-15 | 2014-09-18 | System Biosciences, Llc | Crispr/cas systems for genomic modification and gene modulation |
US20140273235A1 (en) | 2013-03-15 | 2014-09-18 | Regents Of The University Of Minnesota | ENGINEERING PLANT GENOMES USING CRISPR/Cas SYSTEMS |
US20140287938A1 (en) | 2013-03-15 | 2014-09-25 | The Broad Institute, Inc. | Recombinant virus and preparations thereof |
US20140298547A1 (en) | 2012-12-13 | 2014-10-02 | Dow Agrosciences Llc | Dna detection methods for site specific nuclease activity |
US20140295556A1 (en) | 2013-03-15 | 2014-10-02 | The General Hospital Corporation | Using RNA-guided FokI Nucleases (RFNs) to Increase Specificity for RNA-Guided Genome Editing |
US20140304853A1 (en) | 2013-04-05 | 2014-10-09 | Sangamo Biosciences, Inc. | Methods and compositions for integration of an exogenous sequence within the genome of plants |
US20140310828A1 (en) | 2013-04-16 | 2014-10-16 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US20140315985A1 (en) | 2013-03-14 | 2014-10-23 | Caribou Biosciences, Inc. | Compositions and methods of nucleic acid-targeting nucleic acids |
US20140335063A1 (en) | 2013-05-10 | 2014-11-13 | Sangamo Biosciences, Inc. | Delivery methods and compositions for nuclease-mediated genome engineering |
US20140342458A1 (en) | 2012-12-17 | 2014-11-20 | President And Fellows Of Harvard College | RNA-Guided Human Genome Engineering |
US20140349400A1 (en) | 2013-03-15 | 2014-11-27 | Massachusetts Institute Of Technology | Programmable Modification of DNA |
US20140349405A1 (en) | 2013-05-22 | 2014-11-27 | Wisconsin Alumni Research Foundation | Rna-directed dna cleavage and gene editing by cas9 enzyme from neisseria meningitidis |
US20140357530A1 (en) | 2012-12-12 | 2014-12-04 | The Broad Institute Inc. | Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof |
US20140357523A1 (en) | 2013-05-29 | 2014-12-04 | Agilent Technologies, Inc. | Method for fragmenting genomic dna using cas9 |
US20140356956A1 (en) | 2013-06-04 | 2014-12-04 | President And Fellows Of Harvard College | RNA-Guided Transcriptional Regulation |
US20140364333A1 (en) | 2013-03-15 | 2014-12-11 | President And Fellows Of Harvard College | Methods for Live Imaging of Cells |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9322037B2 (en) * | 2013-09-06 | 2016-04-26 | President And Fellows Of Harvard College | Cas9-FokI fusion proteins and uses thereof |
WO2015089364A1 (en) * | 2013-12-12 | 2015-06-18 | The Broad Institute Inc. | Crystal structure of a crispr-cas system, and uses thereof |
EP3080260B1 (en) * | 2013-12-12 | 2019-03-06 | The Broad Institute, Inc. | Crispr-cas systems and methods for altering expression of gene products, structural information and inducible modular cas enzymes |
2016
- 2016-01-07 EP EP16737638.3A (EP3245232B1): Active
- 2016-01-07 WO PCT/US2016/012470 (WO2016114972A1): Application Filing
- 2016-01-07 US US15/536,626 (US11208638B2): Active
Patent Citations (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140068797A1 (en) | 2012-05-25 | 2014-03-06 | University Of Vienna | Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription |
US20140242700A1 (en) | 2012-12-12 | 2014-08-28 | Massachusetts Institute Of Technology | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140179770A1 (en) | 2012-12-12 | 2014-06-26 | Massachusetts Institute Of Technology | Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications |
US8906616B2 (en) | 2012-12-12 | 2014-12-09 | The Broad Institute Inc. | Engineering of systems, methods and optimized guide compositions for sequence manipulation |
US20140179006A1 (en) | 2012-12-12 | 2014-06-26 | Massachusetts Institute Of Technology | Crispr-cas component systems, methods and compositions for sequence manipulation |
US20140186843A1 (en) | 2012-12-12 | 2014-07-03 | Massachusetts Institute Of Technology | Methods, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof |
US20140189896A1 (en) | 2012-12-12 | 2014-07-03 | Feng Zhang | Crispr-cas component systems, methods and compositions for sequence manipulation |
US20140186919A1 (en) | 2012-12-12 | 2014-07-03 | Feng Zhang | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140186958A1 (en) | 2012-12-12 | 2014-07-03 | Feng Zhang | Engineering and optimization of systems, methods and compositions for sequence manipulation with functional domains |
US8771945B1 (en) | 2012-12-12 | 2014-07-08 | The Broad Institute, Inc. | CRISPR-Cas systems and methods for altering expression of gene products |
US8795965B2 (en) | 2012-12-12 | 2014-08-05 | The Broad Institute, Inc. | CRISPR-Cas component systems, methods and compositions for sequence manipulation |
US20140227787A1 (en) | 2012-12-12 | 2014-08-14 | The Broad Institute, Inc. | Crispr-cas systems and methods for altering expression of gene products |
US20140234972A1 (en) | 2012-12-12 | 2014-08-21 | Massachusetts Institute Of Technology | CRISPR-CAS Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes |
US20140242699A1 (en) | 2012-12-12 | 2014-08-28 | Massachusetts Institute Of Technology | Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications |
US20140310830A1 (en) | 2012-12-12 | 2014-10-16 | Feng Zhang | CRISPR-Cas Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes |
US8865406B2 (en) | 2012-12-12 | 2014-10-21 | The Broad Institute Inc. | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140170753A1 (en) | 2012-12-12 | 2014-06-19 | Massachusetts Institute Of Technology | Crispr-cas systems and methods for altering expression of gene products |
US20140248702A1 (en) | 2012-12-12 | 2014-09-04 | The Broad Institute, Inc. | CRISPR-Cas Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes |
US20140256046A1 (en) | 2012-12-12 | 2014-09-11 | Massachusetts Institute Of Technology | Engineering and optimization of systems, methods and compositions for sequence manipulation with functional domains |
US20140273232A1 (en) | 2012-12-12 | 2014-09-18 | The Broad Institute, Inc. | Engineering of systems, methods and optimized guide compositions for sequence manipulation |
US20140273231A1 (en) | 2012-12-12 | 2014-09-18 | The Broad Institute, Inc. | Crispr-cas component systems, methods and compositions for sequence manipulation |
US20140357530A1 (en) | 2012-12-12 | 2014-12-04 | The Broad Institute Inc. | Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof |
US8895308B1 (en) | 2012-12-12 | 2014-11-25 | The Broad Institute Inc. | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US8697359B1 (en) | 2012-12-12 | 2014-04-15 | The Broad Institute, Inc. | CRISPR-Cas systems and methods for altering expression of gene products |
US20140273234A1 (en) | 2012-12-12 | 2014-09-18 | The Broad Institute, Inc. | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US8889356B2 (en) | 2012-12-12 | 2014-11-18 | The Broad Institute Inc. | CRISPR-Cas nickase systems, methods and compositions for sequence manipulation in eukaryotes |
US8889418B2 (en) | 2012-12-12 | 2014-11-18 | The Broad Institute Inc. | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140335620A1 (en) | 2012-12-12 | 2014-11-13 | The Broad Institute, Inc. | Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation |
US20140242664A1 (en) | 2012-12-12 | 2014-08-28 | The Broad Institute, Inc. | Engineering of systems, methods and optimized guide compositions for sequence manipulation |
US8871445B2 (en) | 2012-12-12 | 2014-10-28 | The Broad Institute Inc. | CRISPR-Cas component systems, methods and compositions for sequence manipulation |
US20140298547A1 (en) | 2012-12-13 | 2014-10-02 | Dow Agrosciences Llc | Dna detection methods for site specific nuclease activity |
US20140342458A1 (en) | 2012-12-17 | 2014-11-20 | President And Fellows Of Harvard College | RNA-Guided Human Genome Engineering |
US20140356958A1 (en) | 2012-12-17 | 2014-12-04 | President And Fellows Of Harvard College | RNA-Guided Human Genome Engineering |
US20140342456A1 (en) | 2012-12-17 | 2014-11-20 | President And Fellows Of Harvard College | RNA-Guided Human Genome Engineering |
US20140342457A1 (en) | 2012-12-17 | 2014-11-20 | President And Fellows Of Harvard College | RNA-Guided Human Genome Engineering |
US20140242702A1 (en) | 2013-02-25 | 2014-08-28 | Sigma Aldrich Co. Llc | Methods and compositions for enhancing nuclease-mediated gene disruption |
US20140315985A1 (en) | 2013-03-14 | 2014-10-23 | Caribou Biosciences, Inc. | Compositions and methods of nucleic acid-targeting nucleic acids |
US20140295557A1 (en) | 2013-03-15 | 2014-10-02 | The General Hospital Corporation | Using Truncated Guide RNAs (tru-gRNAs) to Increase Specificity for RNA-Guided Genome Editing |
US20140287938A1 (en) | 2013-03-15 | 2014-09-25 | The Broad Institute, Inc. | Recombinant virus and preparations thereof |
US20140273037A1 (en) | 2013-03-15 | 2014-09-18 | System Biosciences, Llc | Compositions and methods directed to crispr/cas genomic engineering systems |
US20140273230A1 (en) | 2013-03-15 | 2014-09-18 | Sigma-Aldrich Co., Llc | Crispr-based genome modification and regulation |
US20140273235A1 (en) | 2013-03-15 | 2014-09-18 | Regents Of The University Of Minnesota | ENGINEERING PLANT GENOMES USING CRISPR/Cas SYSTEMS |
US20140295556A1 (en) | 2013-03-15 | 2014-10-02 | The General Hospital Corporation | Using RNA-guided FokI Nucleases (RFNs) to Increase Specificity for RNA-Guided Genome Editing |
US20140273226A1 (en) | 2013-03-15 | 2014-09-18 | System Biosciences, Llc | Crispr/cas systems for genomic modification and gene modulation |
US20140349400A1 (en) | 2013-03-15 | 2014-11-27 | Massachusetts Institute Of Technology | Programmable Modification of DNA |
US20140364333A1 (en) | 2013-03-15 | 2014-12-11 | President And Fellows Of Harvard College | Methods for Live Imaging of Cells |
US20140273233A1 (en) | 2013-03-15 | 2014-09-18 | Sigma-Aldrich Co., Llc | Crispr-based genome modification and regulation |
US20140304853A1 (en) | 2013-04-05 | 2014-10-09 | Sangamo Biosciences, Inc. | Methods and compositions for integration of an exogenous sequence within the genome of plants |
US20140310828A1 (en) | 2013-04-16 | 2014-10-16 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US20140309487A1 (en) | 2013-04-16 | 2014-10-16 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US20140335063A1 (en) | 2013-05-10 | 2014-11-13 | Sangamo Biosciences, Inc. | Delivery methods and compositions for nuclease-mediated genome engineering |
US20140349405A1 (en) | 2013-05-22 | 2014-11-27 | Wisconsin Alumni Research Foundation | Rna-directed dna cleavage and gene editing by cas9 enzyme from neisseria meningitidis |
US20140356867A1 (en) | 2013-05-29 | 2014-12-04 | Agilent Technologies, Inc. | Nucleic acid enrichment using cas9 |
US20140357523A1 (en) | 2013-05-29 | 2014-12-04 | Agilent Technologies, Inc. | Method for fragmenting genomic dna using cas9 |
US20140356959A1 (en) | 2013-06-04 | 2014-12-04 | President And Fellows Of Harvard College | RNA-Guided Transcriptional Regulation |
US20140356956A1 (en) | 2013-06-04 | 2014-12-04 | President And Fellows Of Harvard College | RNA-Guided Transcriptional Regulation |
Non-Patent Citations (39)
Title |
---|
"GenBank", Database accession no. S62283 |
"Methods in Enzymology", vol. 266, 1996, ACADEMIC PRESS, INC., article "Computer Methods for Macromolecular Sequence Analysis" |
AUER, GENOME RES., 31 October 2013 (2013-10-31) |
BARTGE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 3648 - 3652 |
BISHOP AND CHEN, J CHEM BIOL, vol. 2, 2009, pages 1 - 9 |
BOUNDY ET AL., J. NEUROSCI., vol. 18, 1998, pages 9989 |
BRINER ET AL., MOL CELL, vol. 56, no. 2, 23 October 2014 (2014-10-23), pages 333 - 9 |
CHEN ET AL., CELL, vol. 51, 1987, pages 7 - 19 |
CHENG, CELL RES., vol. 23, no. 10, October 2013 (2013-10-01), pages 1163 - 71 |
CHO, GENETICS, vol. 195, no. 3, November 2013 (2013-11-01), pages 1177 - 80 |
CHYLINSKI ET AL., RNA BIOL., vol. 10, no. 5, May 2013 (2013-05-01), pages 726 - 37 |
COMB ET AL., EMBO J., vol. 17, 1988, pages 3793 - 3805 |
DICARLO ET AL., NUCLEIC ACIDS RES., vol. 41, no. 7, April 2013 (2013-04-01), pages 4336 - 43 |
DICKINSON, NAT METHODS, vol. 10, no. 10, October 2013 (2013-10-01), pages 1028 - 34 |
EBINA, SCI REP., vol. 3, 2013, pages 2510 |
HOU ET AL., PROC NATL ACAD SCI USA, vol. 110, no. 39, 24 September 2013 (2013-09-24), pages 15644 - 9 |
HSU ET AL.: "Development and applications of CRISPR-Cas9 for genome engineering", CELL, vol. 157, no. 6, 5 June 2014 (2014-06-05), pages 1262 - 78, XP028849523 * |
HU, CELL RES., vol. 23, no. 11, November 2013 (2013-11-01), pages 1322 - 5 |
J. MOL. BIOL., vol. 48, 1970, pages 443 - 453 |
JIANG, NUCLEIC ACIDS RES., vol. 41, no. 20, 1 November 2013 (2013-11-01), pages e188 |
JINEK ET AL., ELIFE, vol. 2, 2013, pages e00471 |
JINEK ET AL., SCIENCE, vol. 337, no. 6096, 17 August 2012 (2012-08-17), pages 816 - 21 |
KANEDA ET AL., NEURON, vol. 6, 1991, pages 583 - 594 |
LARSON, NAT PROTOC., vol. 8, no. 11, November 2013 (2013-11-01), pages 2281 - 308 |
LLEWELLYN ET AL., NAT. MED., vol. 16, no. 10, 2010, pages 1161 - 1166 |
MA ET AL., BIOMED RES INT., 2013 |
METH. MOL. BIOL., vol. 70, 1997, pages 173 - 187 |
NAKAYAMA, GENESIS, vol. 51, no. 12, December 2013 (2013-12-01), pages 835 - 43 |
OBERDICK ET AL., SCIENCE, vol. 248, 1990, pages 223 - 226 |
OH ET AL., GENE THER, vol. 16, 2009, pages 437 |
PATTANAYAK ET AL., NAT BIOTECHNOL., vol. 31, no. 9, September 2013 (2013-09-01), pages 839 - 43 |
PUTYRSKI AND SCHULTZ, FEBS LETTERS, vol. 586, 2012, pages 2097 - 2105 |
QI ET AL., CELL, vol. 152, no. 5, 28 February 2013 (2013-02-28), pages 1173 - 83 |
RADOVICK ET AL., PROC. NATL. ACAD. SCI. USA, vol. 88, 1991, pages 3402 - 3406 |
SASAOKA ET AL., MOL. BRAIN RES., vol. 16, 1992, pages 274 |
UPADHYAY, G3 (BETHESDA), vol. 3, no. 12, 9 December 2013 (2013-12-09), pages 2233 - 8 |
WANG ET AL., CELL, vol. 153, no. 4, 9 May 2013 (2013-05-09), pages 910 - 8 |
XIE, MOL PLANT., 9 October 2013 (2013-10-09) |
YANG, CELL, vol. 154, no. 6, 12 September 2013 (2013-09-12), pages 1370 - 9 |
Cited By (82)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10323236B2 (en) | 2011-07-22 | 2019-06-18 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US12006520B2 (en) | 2011-07-22 | 2024-06-11 | President And Fellows Of Harvard College | Evaluation and improvement of nuclease cleavage specificity |
US11976307B2 (en) | 2012-04-27 | 2024-05-07 | Duke University | Genetic correction of mutated genes |
US10975390B2 (en) | 2013-04-16 | 2021-04-13 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US12037596B2 (en) | 2013-04-16 | 2024-07-16 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US10385359B2 (en) | 2013-04-16 | 2019-08-20 | Regeneron Pharmaceuticals, Inc. | Targeted modification of rat genome |
US10954548B2 (en) | 2013-08-09 | 2021-03-23 | President And Fellows Of Harvard College | Nuclease profiling system |
US11920181B2 (en) | 2013-08-09 | 2024-03-05 | President And Fellows Of Harvard College | Nuclease profiling system |
US10508298B2 (en) | 2013-08-09 | 2019-12-17 | President And Fellows Of Harvard College | Methods for identifying a target site of a CAS9 nuclease |
US11046948B2 (en) | 2013-08-22 | 2021-06-29 | President And Fellows Of Harvard College | Engineered transcription activator-like effector (TALE) domains and uses thereof |
US9999671B2 (en) | 2013-09-06 | 2018-06-19 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US11299755B2 (en) | 2013-09-06 | 2022-04-12 | President And Fellows Of Harvard College | Switchable CAS9 nucleases and uses thereof |
US9526784B2 (en) | 2013-09-06 | 2016-12-27 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
US10858639B2 (en) | 2013-09-06 | 2020-12-08 | President And Fellows Of Harvard College | CAS9 variants and uses thereof |
US10597679B2 (en) | 2013-09-06 | 2020-03-24 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
US10682410B2 (en) | 2013-09-06 | 2020-06-16 | President And Fellows Of Harvard College | Delivery system for functional nucleases |
US9737604B2 (en) | 2013-09-06 | 2017-08-22 | President And Fellows Of Harvard College | Use of cationic lipids to deliver CAS9 |
US10912833B2 (en) | 2013-09-06 | 2021-02-09 | President And Fellows Of Harvard College | Delivery of negatively charged proteins using cationic lipids |
US10208317B2 (en) | 2013-12-11 | 2019-02-19 | Regeneron Pharmaceuticals, Inc. | Methods and compositions for the targeted modification of a mouse embryonic stem cell genome |
US10711280B2 (en) | 2013-12-11 | 2020-07-14 | Regeneron Pharmaceuticals, Inc. | Methods and compositions for the targeted modification of a mouse ES cell genome |
US11820997B2 (en) | 2013-12-11 | 2023-11-21 | Regeneron Pharmaceuticals, Inc. | Methods and compositions for the targeted modification of a genome |
US11053481B2 (en) | 2013-12-12 | 2021-07-06 | President And Fellows Of Harvard College | Fusions of Cas9 domains and nucleic acid-editing domains |
US10465176B2 (en) | 2013-12-12 | 2019-11-05 | President And Fellows Of Harvard College | Cas variants for gene editing |
US9840699B2 (en) | 2013-12-12 | 2017-12-12 | President And Fellows Of Harvard College | Methods for nucleic acid editing |
US11124782B2 (en) | 2013-12-12 | 2021-09-21 | President And Fellows Of Harvard College | Cas variants for gene editing |
US10077453B2 (en) | 2014-07-30 | 2018-09-18 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US11578343B2 (en) | 2014-07-30 | 2023-02-14 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US10704062B2 (en) | 2014-07-30 | 2020-07-07 | President And Fellows Of Harvard College | CAS9 proteins including ligand-dependent inteins |
US11680268B2 (en) | 2014-11-07 | 2023-06-20 | Editas Medicine, Inc. | Methods for improving CRISPR/Cas-mediated genome-editing |
US11390884B2 (en) | 2015-05-11 | 2022-07-19 | Editas Medicine, Inc. | Optimized CRISPR/cas9 systems and methods for gene editing in stem cells |
US11911415B2 (en) | 2015-06-09 | 2024-02-27 | Editas Medicine, Inc. | CRISPR/Cas-related methods and compositions for improving transplantation |
US11667911B2 (en) | 2015-09-24 | 2023-06-06 | Editas Medicine, Inc. | Use of exonucleases to improve CRISPR/CAS-mediated genome editing |
US11970710B2 (en) | 2015-10-13 | 2024-04-30 | Duke University | Genome engineering with Type I CRISPR systems in eukaryotic cells |
US11214780B2 (en) | 2015-10-23 | 2022-01-04 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US10167457B2 (en) | 2015-10-23 | 2019-01-01 | President And Fellows Of Harvard College | Nucleobase editors and uses thereof |
US12043852B2 (en) | 2015-10-23 | 2024-07-23 | President And Fellows Of Harvard College | Evolved Cas9 proteins for gene editing |
WO2017075335A1 (en) | 2015-10-28 | 2017-05-04 | Voyager Therapeutics, Inc. | Regulatable expression using adeno-associated virus (aav) |
US11597924B2 (en) | 2016-03-25 | 2023-03-07 | Editas Medicine, Inc. | Genome editing systems comprising repair-modulating enzyme molecules and methods of their use |
US11236313B2 (en) | 2016-04-13 | 2022-02-01 | Editas Medicine, Inc. | Cas9 fusion molecules, gene editing systems, and methods of use thereof |
US12049651B2 (en) | 2016-04-13 | 2024-07-30 | Editas Medicine, Inc. | Cas9 fusion molecules, gene editing systems, and methods of use thereof |
US10113163B2 (en) | 2016-08-03 | 2018-10-30 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11999947B2 (en) | 2016-08-03 | 2024-06-04 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11702651B2 (en) | 2016-08-03 | 2023-07-18 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US10947530B2 (en) | 2016-08-03 | 2021-03-16 | President And Fellows Of Harvard College | Adenosine nucleobase editors and uses thereof |
US11661590B2 (en) | 2016-08-09 | 2023-05-30 | President And Fellows Of Harvard College | Programmable CAS9-recombinase fusion proteins and uses thereof |
US11542509B2 (en) | 2016-08-24 | 2023-01-03 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US12084663B2 (en) | 2016-08-24 | 2024-09-10 | President And Fellows Of Harvard College | Incorporation of unnatural amino acids into proteins using base editing |
US11306324B2 (en) | 2016-10-14 | 2022-04-19 | President And Fellows Of Harvard College | AAV delivery of nucleobase editors |
WO2018089664A1 (en) * | 2016-11-11 | 2018-05-17 | The Regents Of The University Of California | Variant rna-guided polypeptides and methods of use |
EP3538561A4 (en) * | 2016-11-11 | 2020-10-21 | The Regents of The University of California | Variant rna-guided polypeptides and methods of use |
US11180778B2 (en) | 2016-11-11 | 2021-11-23 | The Regents Of The University Of California | Variant RNA-guided polypeptides and methods of use |
US10745677B2 (en) | 2016-12-23 | 2020-08-18 | President And Fellows Of Harvard College | Editing of CCR5 receptor gene to protect against HIV infection |
US11820969B2 (en) | 2016-12-23 | 2023-11-21 | President And Fellows Of Harvard College | Editing of CCR2 receptor gene to protect against HIV infection |
US12110545B2 (en) | 2017-01-06 | 2024-10-08 | Editas Medicine, Inc. | Methods of assessing nuclease cleavage |
US20200224221A1 (en) * | 2017-02-20 | 2020-07-16 | Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences | Genome editing method |
US12129479B2 (en) | 2017-02-20 | 2024-10-29 | Suzhou Qi Biodesign Biotechnology Company Limited | Genome editing method |
US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
US11542496B2 (en) | 2017-03-10 | 2023-01-03 | President And Fellows Of Harvard College | Cytosine to guanine base editor |
US11268082B2 (en) | 2017-03-23 | 2022-03-08 | President And Fellows Of Harvard College | Nucleobase editors comprising nucleic acid programmable DNA binding proteins |
US11499151B2 (en) | 2017-04-28 | 2022-11-15 | Editas Medicine, Inc. | Methods and systems for analyzing guide RNA molecules |
US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
US10428319B2 (en) | 2017-06-09 | 2019-10-01 | Editas Medicine, Inc. | Engineered Cas9 nucleases |
US11098297B2 (en) | 2017-06-09 | 2021-08-24 | Editas Medicine, Inc. | Engineered Cas9 nucleases |
US11866726B2 (en) | 2017-07-14 | 2024-01-09 | Editas Medicine, Inc. | Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites |
US11732274B2 (en) | 2017-07-28 | 2023-08-22 | President And Fellows Of Harvard College | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
US11319532B2 (en) | 2017-08-30 | 2022-05-03 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11932884B2 (en) | 2017-08-30 | 2024-03-19 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
US11795443B2 (en) | 2017-10-16 | 2023-10-24 | The Broad Institute, Inc. | Uses of adenosine base editors |
US11643652B2 (en) | 2019-03-19 | 2023-05-09 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11795452B2 (en) | 2019-03-19 | 2023-10-24 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
US12031126B2 (en) | 2020-05-08 | 2024-07-09 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
WO2023010135A1 (en) | 2021-07-30 | 2023-02-02 | Tune Therapeutics, Inc. | Compositions and methods for modulating expression of methyl-cpg binding protein 2 (mecp2) |
WO2023010133A2 (en) | 2021-07-30 | 2023-02-02 | Tune Therapeutics, Inc. | Compositions and methods for modulating expression of frataxin (fxn) |
US12098399B2 (en) | 2022-06-24 | 2024-09-24 | Tune Therapeutics, Inc. | Compositions, systems, and methods for epigenetic regulation of proprotein convertase subtilisin/kexin type 9 (PCSK9) gene expression |
WO2023250511A2 (en) | 2022-06-24 | 2023-12-28 | Tune Therapeutics, Inc. | Compositions, systems, and methods for reducing low-density lipoprotein through targeted gene repression |
WO2024015881A2 (en) | 2022-07-12 | 2024-01-18 | Tune Therapeutics, Inc. | Compositions, systems, and methods for targeted transcriptional activation |
WO2024040254A2 (en) | 2022-08-19 | 2024-02-22 | Tune Therapeutics, Inc. | Compositions, systems, and methods for regulation of hepatitis b virus through targeted gene repression |
WO2024064642A2 (en) | 2022-09-19 | 2024-03-28 | Tune Therapeutics, Inc. | Compositions, systems, and methods for modulating t cell function |
WO2024163683A2 (en) | 2023-02-01 | 2024-08-08 | Tune Therapeutics, Inc. | Systems, compositions, and methods for modulating expression of methyl-cpg binding protein 2 (mecp2) and x-inactive specific transcript (xist) |
WO2024163678A2 (en) | 2023-02-01 | 2024-08-08 | Tune Therapeutics, Inc. | Fusion proteins and systems for targeted activation of frataxin (fxn) and related methods |
Also Published As
Publication number | Publication date |
---|---|
EP3245232A1 (en) | 2017-11-22 |
US11208638B2 (en) | 2021-12-28 |
EP3245232A4 (en) | 2018-07-04 |
EP3245232B1 (en) | 2021-04-21 |
US20180002682A1 (en) | 2018-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3245232B1 (en) | | Heterodimeric cas9 and methods of use thereof |
US10793842B2 (en) | | Cas9 variants and methods of use thereof |
EP3307762B1 (en) | | Reporter cas9 variants and methods of use thereof |
US20230227859A1 (en) | | Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription |
US11180792B2 (en) | | Methods and compositions for labeling a single-stranded target nucleic acid |
EP3724332A1 (en) | | Novel rna-programmable endonuclease systems and their use in genome editing and other applications |
US20210054353A1 (en) | | Novel rna-programmable endonuclease systems and uses thereof |
WO2017048969A1 (en) | | Variant cas9 polypeptides comprising internal insertions |
EP3080266A1 (en) | | Methods and compositions for modifying a single stranded target nucleic acid |
AU2022290382A9 (en) | | Type v rna programmable endonuclease systems |
EP4101928A1 (en) | | Type v rna programmable endonuclease systems |
WO2023118068A1 (en) | | Novel small type v rna programmable endonuclease systems |
WO2023237587A1 (en) | | Novel small type v rna programmable endonuclease systems |
CN118103502A (en) | | V-type RNA programmable endonuclease system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16737638; Country of ref document: EP; Kind code of ref document: A1 |
 | REEP | Request for entry into the European phase | Ref document number: 2016737638; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: DE |