WO2024156084A1 - Variants of cpf1 (cas12a) with improved activity - Google Patents
- Publication number
- WO2024156084A1
- Application number
- PCT/CN2023/073486
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- protein
- nucleic acid
- cas12a
- sequence
- cell
- VGONTNSXDCQUGY-RRKCRQDMSA-N 2'-deoxyinosine Chemical group C1[C@H](O)[C@@H](CO)O[C@H]1N1C(N=CNC2=O)=C2N=C1 VGONTNSXDCQUGY-RRKCRQDMSA-N 0.000 description 1
- ZENKESXKWBIZCV-UHFFFAOYSA-N 2,2,4,4-tetrafluoro-1,3-benzodioxin-6-amine Chemical group O1C(F)(F)OC(F)(F)C2=CC(N)=CC=C21 ZENKESXKWBIZCV-UHFFFAOYSA-N 0.000 description 1
- VEPOHXYIFQMVHW-XOZOLZJESA-N 2,3-dihydroxybutanedioic acid (2S,3S)-3,4-dimethyl-2-phenylmorpholine Chemical compound OC(C(O)C(O)=O)C(O)=O.C[C@H]1[C@@H](OCCN1C)c1ccccc1 VEPOHXYIFQMVHW-XOZOLZJESA-N 0.000 description 1
- YMHOBZXQZVXHBM-UHFFFAOYSA-N 2,5-dimethoxy-4-bromophenethylamine Chemical compound COC1=CC(CCN)=C(OC)C=C1Br YMHOBZXQZVXHBM-UHFFFAOYSA-N 0.000 description 1
- QSHACTSJHMKXTE-UHFFFAOYSA-N 2-(2-aminopropyl)-7h-purin-6-amine Chemical compound CC(N)CC1=NC(N)=C2NC=NC2=N1 QSHACTSJHMKXTE-UHFFFAOYSA-N 0.000 description 1
- JRYMOPZHXMVHTA-DAGMQNCNSA-N 2-amino-7-[(2r,3r,4s,5r)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-1h-pyrrolo[2,3-d]pyrimidin-4-one Chemical compound C1=CC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O JRYMOPZHXMVHTA-DAGMQNCNSA-N 0.000 description 1
- BFSVOASYOCHEOV-UHFFFAOYSA-N 2-diethylaminoethanol Chemical compound CCN(CC)CCO BFSVOASYOCHEOV-UHFFFAOYSA-N 0.000 description 1
- WKMPTBDYDNUJLF-UHFFFAOYSA-N 2-fluoroadenine Chemical compound NC1=NC(F)=NC2=C1N=CN2 WKMPTBDYDNUJLF-UHFFFAOYSA-N 0.000 description 1
- 125000004200 2-methoxyethyl group Chemical group [H]C([H])([H])OC([H])([H])C([H])([H])* 0.000 description 1
- OALHHIHQOFIMEF-UHFFFAOYSA-N 3',6'-dihydroxy-2',4',5',7'-tetraiodo-3h-spiro[2-benzofuran-1,9'-xanthene]-3-one Chemical compound O1C(=O)C2=CC=CC=C2C21C1=CC(I)=C(O)C(I)=C1OC1=C(I)C(O)=C(I)C=C21 OALHHIHQOFIMEF-UHFFFAOYSA-N 0.000 description 1
- PDBUTMYDZLUVCP-UHFFFAOYSA-N 3,4-dihydro-1,4-benzoxazin-2-one Chemical compound C1=CC=C2OC(=O)CNC2=C1 PDBUTMYDZLUVCP-UHFFFAOYSA-N 0.000 description 1
- VXGRJERITKFWPL-UHFFFAOYSA-N 4',5'-Dihydropsoralen Natural products C1=C2OC(=O)C=CC2=CC2=C1OCC2 VXGRJERITKFWPL-UHFFFAOYSA-N 0.000 description 1
- HQQTZCPKNZVLFF-UHFFFAOYSA-N 4h-1,2-benzoxazin-3-one Chemical compound C1=CC=C2ONC(=O)CC2=C1 HQQTZCPKNZVLFF-UHFFFAOYSA-N 0.000 description 1
- 101710169336 5'-deoxyadenosine deaminase Proteins 0.000 description 1
- ZLAQATDNGLKIEV-UHFFFAOYSA-N 5-methyl-2-sulfanylidene-1h-pyrimidin-4-one Chemical compound CC1=CNC(=S)NC1=O ZLAQATDNGLKIEV-UHFFFAOYSA-N 0.000 description 1
- UJBCLAXPPIDQEE-UHFFFAOYSA-N 5-prop-1-ynyl-1h-pyrimidine-2,4-dione Chemical compound CC#CC1=CNC(=O)NC1=O UJBCLAXPPIDQEE-UHFFFAOYSA-N 0.000 description 1
- KXBCLNRMQPRVTP-UHFFFAOYSA-N 6-amino-1,5-dihydroimidazo[4,5-c]pyridin-4-one Chemical compound O=C1NC(N)=CC2=C1N=CN2 KXBCLNRMQPRVTP-UHFFFAOYSA-N 0.000 description 1
- DCPSTSVLRXOYGS-UHFFFAOYSA-N 6-amino-1h-pyrimidine-2-thione Chemical compound NC1=CC=NC(S)=N1 DCPSTSVLRXOYGS-UHFFFAOYSA-N 0.000 description 1
- QNNARSZPGNJZIX-UHFFFAOYSA-N 6-amino-5-prop-1-ynyl-1h-pyrimidin-2-one Chemical compound CC#CC1=CNC(=O)N=C1N QNNARSZPGNJZIX-UHFFFAOYSA-N 0.000 description 1
- NJBMMMJOXRZENQ-UHFFFAOYSA-N 6H-pyrrolo[2,3-f]quinoline Chemical compound c1cc2ccc3[nH]cccc3c2n1 NJBMMMJOXRZENQ-UHFFFAOYSA-N 0.000 description 1
- LOSIULRWFAEMFL-UHFFFAOYSA-N 7-deazaguanine Chemical compound O=C1NC(N)=NC2=C1CC=N2 LOSIULRWFAEMFL-UHFFFAOYSA-N 0.000 description 1
- HRYKDUPGBWLLHO-UHFFFAOYSA-N 8-azaadenine Chemical compound NC1=NC=NC2=NNN=C12 HRYKDUPGBWLLHO-UHFFFAOYSA-N 0.000 description 1
- LPXQRXLUHJKZIE-UHFFFAOYSA-N 8-azaguanine Chemical compound NC1=NC(O)=C2NN=NC2=N1 LPXQRXLUHJKZIE-UHFFFAOYSA-N 0.000 description 1
- 229960005508 8-azaguanine Drugs 0.000 description 1
- 102000007469 Actins Human genes 0.000 description 1
- 108010085238 Actins Proteins 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 102100036664 Adenosine deaminase Human genes 0.000 description 1
- 102100027211 Albumin Human genes 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- GNRLUBOJIGSVNT-UHFFFAOYSA-N Aminoethoxyacetic acid Chemical compound NCCOCC(O)=O GNRLUBOJIGSVNT-UHFFFAOYSA-N 0.000 description 1
- 108020005544 Antisense RNA Proteins 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- BHELIUBJHYAEDK-OAIUPTLZSA-N Aspoxicillin Chemical compound C1([C@H](C(=O)N[C@@H]2C(N3[C@H](C(C)(C)S[C@@H]32)C(O)=O)=O)NC(=O)[C@H](N)CC(=O)NC)=CC=C(O)C=C1 BHELIUBJHYAEDK-OAIUPTLZSA-N 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 244000063299 Bacillus subtilis Species 0.000 description 1
- 235000014469 Bacillus subtilis Nutrition 0.000 description 1
- 108091032955 Bacterial small RNA Proteins 0.000 description 1
- 206010061692 Benign muscle neoplasm Diseases 0.000 description 1
- DWRXFEITVBNRMK-UHFFFAOYSA-N Beta-D-1-Arabinofuranosylthymine Natural products O=C1NC(=O)C(C)=CN1C1C(O)C(O)C(CO)O1 DWRXFEITVBNRMK-UHFFFAOYSA-N 0.000 description 1
- 241000701822 Bovine papillomavirus Species 0.000 description 1
- 240000002791 Brassica napus Species 0.000 description 1
- 235000006008 Brassica napus var napus Nutrition 0.000 description 1
- 101150005393 CBF1 gene Proteins 0.000 description 1
- 125000006519 CCH3 Chemical group 0.000 description 1
- QCMYYKRYFNMIEC-UHFFFAOYSA-N COP(O)=O Chemical class COP(O)=O QCMYYKRYFNMIEC-UHFFFAOYSA-N 0.000 description 1
- 241000909983 Candidatus Methanomethylophilus alvus Species 0.000 description 1
- 235000002566 Capsicum Nutrition 0.000 description 1
- 101710132601 Capsid protein Proteins 0.000 description 1
- 108010078791 Carrier Proteins Proteins 0.000 description 1
- 241000701459 Caulimovirus Species 0.000 description 1
- 102000020313 Cell-Penetrating Peptides Human genes 0.000 description 1
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 1
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 description 1
- 239000004380 Cholic acid Substances 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 101710094648 Coat protein Proteins 0.000 description 1
- 241000218631 Coniferophyta Species 0.000 description 1
- 229920000742 Cotton Polymers 0.000 description 1
- 241000699802 Cricetulus griseus Species 0.000 description 1
- 230000004544 DNA amplification Effects 0.000 description 1
- 230000004568 DNA-binding Effects 0.000 description 1
- 102000016911 Deoxyribonucleases Human genes 0.000 description 1
- 108010053770 Deoxyribonucleases Proteins 0.000 description 1
- 229920002307 Dextran Polymers 0.000 description 1
- 108090000204 Dipeptidase 1 Proteins 0.000 description 1
- 208000035240 Disease Resistance Diseases 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 241000588921 Enterobacteriaceae Species 0.000 description 1
- 241000701959 Escherichia virus Lambda Species 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 108091029865 Exogenous DNA Proteins 0.000 description 1
- XZWYTXMRWQJBGX-VXBMVYAYSA-N FLAG peptide Chemical compound NCCCC[C@@H](C(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](NC(=O)[C@@H](N)CC(O)=O)CC1=CC=C(O)C=C1 XZWYTXMRWQJBGX-VXBMVYAYSA-N 0.000 description 1
- 108010046276 FLP recombinase Proteins 0.000 description 1
- 241000588088 Francisella tularensis subsp. novicida U112 Species 0.000 description 1
- 108010001515 Galectin 4 Proteins 0.000 description 1
- 102100039556 Galectin-4 Human genes 0.000 description 1
- 229930182566 Gentamicin Natural products 0.000 description 1
- CEAZRRDELHUEMR-URQXQFDESA-N Gentamicin Chemical compound O1[C@H](C(C)NC)CC[C@@H](N)[C@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](NC)[C@@](C)(O)CO2)O)[C@H](N)C[C@@H]1N CEAZRRDELHUEMR-URQXQFDESA-N 0.000 description 1
- 239000004471 Glycine Substances 0.000 description 1
- 102100021181 Golgi phosphoprotein 3 Human genes 0.000 description 1
- 235000003222 Helianthus annuus Nutrition 0.000 description 1
- 102000008157 Histone Demethylases Human genes 0.000 description 1
- 108010074870 Histone Demethylases Proteins 0.000 description 1
- 102000003893 Histone acetyltransferases Human genes 0.000 description 1
- 108090000246 Histone acetyltransferases Proteins 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 240000005979 Hordeum vulgare Species 0.000 description 1
- 235000007340 Hordeum vulgare Nutrition 0.000 description 1
- 108090000144 Human Proteins Proteins 0.000 description 1
- 102000003839 Human Proteins Human genes 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 108700005091 Immunoglobulin Genes Proteins 0.000 description 1
- 108010015268 Integration Host Factors Proteins 0.000 description 1
- 241000235058 Komagataella pastoris Species 0.000 description 1
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 1
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- FBOZXECLQNJBKD-ZDUSSCGKSA-N L-methotrexate Chemical compound C=1N=C2N=C(N)N=C(N)C2=NC=1CN(C)C1=CC=C(C(=O)N[C@@H](CCC(O)=O)C(O)=O)C=C1 FBOZXECLQNJBKD-ZDUSSCGKSA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 1
- GUBGYTABKSRVRQ-QKKXKWKRSA-N Lactose Natural products OC[C@H]1O[C@@H](O[C@H]2[C@H](O)[C@@H](O)C(O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@H]1O GUBGYTABKSRVRQ-QKKXKWKRSA-N 0.000 description 1
- 101710128836 Large T antigen Proteins 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 240000006240 Linum usitatissimum Species 0.000 description 1
- 235000004431 Linum usitatissimum Nutrition 0.000 description 1
- 108020005198 Long Noncoding RNA Proteins 0.000 description 1
- 241000218922 Magnoliophyta Species 0.000 description 1
- 101710125418 Major capsid protein Proteins 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 102000003792 Metallothionein Human genes 0.000 description 1
- 108090000157 Metallothionein Proteins 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- 241000588629 Moraxella lacunata Species 0.000 description 1
- 108010085220 Multiprotein Complexes Proteins 0.000 description 1
- 102000007474 Multiprotein Complexes Human genes 0.000 description 1
- 101100078999 Mus musculus Mx1 gene Proteins 0.000 description 1
- 101710135898 Myc proto-oncogene protein Proteins 0.000 description 1
- 201000004458 Myoma Diseases 0.000 description 1
- 229930193140 Neomycin Natural products 0.000 description 1
- 244000061176 Nicotiana tabacum Species 0.000 description 1
- 235000002637 Nicotiana tabacum Nutrition 0.000 description 1
- 101710141454 Nucleoprotein Proteins 0.000 description 1
- 229910004679 ONO2 Inorganic materials 0.000 description 1
- REYJJPSVUYRZGE-UHFFFAOYSA-N Octadecylamine Chemical compound CCCCCCCCCCCCCCCCCCN REYJJPSVUYRZGE-UHFFFAOYSA-N 0.000 description 1
- 239000006002 Pepper Substances 0.000 description 1
- 102000010292 Peptide Elongation Factor 1 Human genes 0.000 description 1
- 108010077524 Peptide Elongation Factor 1 Proteins 0.000 description 1
- 108010085186 Peroxisomal Targeting Signals Proteins 0.000 description 1
- PCNDJXKNXGMECE-UHFFFAOYSA-N Phenazine Natural products C1=CC=CC2=NC3=CC=CC=C3N=C21 PCNDJXKNXGMECE-UHFFFAOYSA-N 0.000 description 1
- ABLZXFCXXLZCGV-UHFFFAOYSA-N Phosphorous acid Chemical class OP(O)=O ABLZXFCXXLZCGV-UHFFFAOYSA-N 0.000 description 1
- 241000235648 Pichia Species 0.000 description 1
- 235000016761 Piper aduncum Nutrition 0.000 description 1
- 240000003889 Piper guineense Species 0.000 description 1
- 235000017804 Piper guineense Nutrition 0.000 description 1
- 235000008184 Piper nigrum Nutrition 0.000 description 1
- 108700001094 Plant Genes Proteins 0.000 description 1
- 102000012338 Poly(ADP-ribose) Polymerases Human genes 0.000 description 1
- 108010061844 Poly(ADP-ribose) Polymerases Proteins 0.000 description 1
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 description 1
- 108091036407 Polyadenylation Proteins 0.000 description 1
- 239000004952 Polyamide Substances 0.000 description 1
- 101710083689 Probable capsid protein Proteins 0.000 description 1
- 108010076504 Protein Sorting Signals Proteins 0.000 description 1
- 241000589516 Pseudomonas Species 0.000 description 1
- 101150090155 R gene Proteins 0.000 description 1
- 108020005067 RNA Splice Sites Proteins 0.000 description 1
- 238000011529 RT qPCR Methods 0.000 description 1
- 102000003661 Ribonuclease III Human genes 0.000 description 1
- 108010057163 Ribonuclease III Proteins 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 108020004422 Riboswitch Proteins 0.000 description 1
- 108010003581 Ribulose-bisphosphate carboxylase Proteins 0.000 description 1
- 241000607142 Salmonella Species 0.000 description 1
- 108020004688 Small Nuclear RNA Proteins 0.000 description 1
- 102000039471 Small Nuclear RNA Human genes 0.000 description 1
- 244000061458 Solanum melongena Species 0.000 description 1
- 235000002597 Solanum melongena Nutrition 0.000 description 1
- 244000061456 Solanum tuberosum Species 0.000 description 1
- 235000002595 Solanum tuberosum Nutrition 0.000 description 1
- 240000006394 Sorghum bicolor Species 0.000 description 1
- 235000011684 Sorghum saccharatum Nutrition 0.000 description 1
- 241000592344 Spermatophyta Species 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- 241000193996 Streptococcus pyogenes Species 0.000 description 1
- 101000910035 Streptococcus pyogenes serotype M1 CRISPR-associated endonuclease Cas9/Csn1 Proteins 0.000 description 1
- 241000194020 Streptococcus thermophilus Species 0.000 description 1
- 108091027544 Subgenomic mRNA Proteins 0.000 description 1
- 108010043934 Sucrose synthase Proteins 0.000 description 1
- UCKMPCXJQFINFW-UHFFFAOYSA-N Sulphide Chemical compound [S-2] UCKMPCXJQFINFW-UHFFFAOYSA-N 0.000 description 1
- 239000004098 Tetracycline Substances 0.000 description 1
- 241000605257 Thiomicrospira sp. Species 0.000 description 1
- 108091036066 Three prime untranslated region Proteins 0.000 description 1
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 1
- 101710150448 Transcriptional regulator Myc Proteins 0.000 description 1
- 235000021307 Triticum Nutrition 0.000 description 1
- 102000004243 Tubulin Human genes 0.000 description 1
- 108090000704 Tubulin Proteins 0.000 description 1
- 108020004417 Untranslated RNA Proteins 0.000 description 1
- 102000039634 Untranslated RNA Human genes 0.000 description 1
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 1
- 241000545067 Venus Species 0.000 description 1
- 208000027418 Wounds and injury Diseases 0.000 description 1
- 241000607479 Yersinia pestis Species 0.000 description 1
- RLXCFCYWFYXTON-JTTSDREOSA-N [(3S,8S,9S,10R,13S,14S,17R)-3-hydroxy-10,13-dimethyl-17-[(2R)-6-methylheptan-2-yl]-2,3,4,7,8,9,11,12,14,15,16,17-dodecahydro-1H-cyclopenta[a]phenanthren-16-yl] N-hexylcarbamate Chemical group C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC(OC(=O)NCCCCCC)[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 RLXCFCYWFYXTON-JTTSDREOSA-N 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- XVIYCJDWYLJQBG-UHFFFAOYSA-N acetic acid;adamantane Chemical compound CC(O)=O.C1C(C2)CC3CC1CC2C3 XVIYCJDWYLJQBG-UHFFFAOYSA-N 0.000 description 1
- 230000003213 activating effect Effects 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 210000005006 adaptive immune system Anatomy 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 125000001931 aliphatic group Chemical group 0.000 description 1
- 150000001336 alkenes Chemical class 0.000 description 1
- 125000005083 alkoxyalkoxy group Chemical group 0.000 description 1
- 125000002877 alkyl aryl group Chemical group 0.000 description 1
- 125000005600 alkyl phosphonate group Chemical group 0.000 description 1
- 239000002168 alkylating agent Substances 0.000 description 1
- 239000013566 allergen Substances 0.000 description 1
- 102000009899 alpha Karyopherins Human genes 0.000 description 1
- 108010077099 alpha Karyopherins Proteins 0.000 description 1
- 125000003277 amino group Chemical group 0.000 description 1
- 125000005122 aminoalkylamino group Chemical group 0.000 description 1
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 description 1
- 229960000723 ampicillin Drugs 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- PYKYMHQGRFAEBM-UHFFFAOYSA-N anthraquinone Natural products CCC(=O)c1c(O)c2C(=O)C3C(C=CC=C3O)C(=O)c2cc1CC(=O)OC PYKYMHQGRFAEBM-UHFFFAOYSA-N 0.000 description 1
- 150000004056 anthraquinones Chemical class 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 125000003710 aryl alkyl group Chemical group 0.000 description 1
- 238000003556 assay Methods 0.000 description 1
- 230000008970 bacterial immunity Effects 0.000 description 1
- 244000052616 bacterial pathogen Species 0.000 description 1
- 230000037429 base substitution Effects 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- ZYGHJZDHTFUPRJ-UHFFFAOYSA-N benzo-alpha-pyrone Natural products C1=CC=C2OC(=O)C=CC2=C1 ZYGHJZDHTFUPRJ-UHFFFAOYSA-N 0.000 description 1
- IQFYYKKMVGJFEH-UHFFFAOYSA-N beta-L-thymidine Natural products O=C1NC(=O)C(C)=CN1C1OC(CO)C(O)C1 IQFYYKKMVGJFEH-UHFFFAOYSA-N 0.000 description 1
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 1
- 229940000635 beta-alanine Drugs 0.000 description 1
- 102000006635 beta-lactamase Human genes 0.000 description 1
- 125000002619 bicyclic group Chemical group 0.000 description 1
- 230000004071 biological effect Effects 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 125000001369 canonical nucleoside group Chemical group 0.000 description 1
- 150000004657 carbamic acid derivatives Chemical class 0.000 description 1
- 150000001720 carbohydrates Chemical group 0.000 description 1
- 150000001732 carboxylic acid derivatives Chemical class 0.000 description 1
- 230000004700 cellular uptake Effects 0.000 description 1
- 239000002738 chelating agent Substances 0.000 description 1
- 239000012707 chemical precursor Substances 0.000 description 1
- 210000004978 chinese hamster ovary cell Anatomy 0.000 description 1
- 229930002868 chlorophyll a Natural products 0.000 description 1
- ATNHDLDRLWWWCB-AENOIHSZSA-M chlorophyll a Chemical compound C1([C@@H](C(=O)OC)C(=O)C2=C3C)=C2N2C3=CC(C(CC)=C3C)=[N+]4C3=CC3=C(C=C)C(C)=C5N3[Mg-2]42[N+]2=C1[C@@H](CCC(=O)OC\C=C(/C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)[C@H](C)C2=C5 ATNHDLDRLWWWCB-AENOIHSZSA-M 0.000 description 1
- 229930002869 chlorophyll b Natural products 0.000 description 1
- NSMUHPMZFPKNMZ-VBYMZDBQSA-M chlorophyll b Chemical compound C1([C@@H](C(=O)OC)C(=O)C2=C3C)=C2N2C3=CC(C(CC)=C3C=O)=[N+]4C3=CC3=C(C=C)C(C)=C5N3[Mg-2]42[N+]2=C1[C@@H](CCC(=O)OC\C=C(/C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)[C@H](C)C2=C5 NSMUHPMZFPKNMZ-VBYMZDBQSA-M 0.000 description 1
- 150000001841 cholesterols Chemical class 0.000 description 1
- BHQCQFFYRZLCQQ-OELDTZBJSA-N cholic acid Chemical compound C([C@H]1C[C@H]2O)[C@H](O)CC[C@]1(C)[C@@H]1[C@@H]2[C@@H]2CC[C@H]([C@@H](CCC(O)=O)C)[C@@]2(C)[C@@H](O)C1 BHQCQFFYRZLCQQ-OELDTZBJSA-N 0.000 description 1
- 229960002471 cholic acid Drugs 0.000 description 1
- 235000019416 cholic acid Nutrition 0.000 description 1
- 210000003483 chromatin Anatomy 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 239000011248 coating agent Substances 0.000 description 1
- 238000000576 coating method Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 239000003184 complementary RNA Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 235000001671 coumarin Nutrition 0.000 description 1
- 125000000332 coumarinyl group Chemical class O1C(=O)C(=CC2=CC=CC=C12)* 0.000 description 1
- 210000004748 cultured cell Anatomy 0.000 description 1
- 125000001995 cyclobutyl group Chemical group [H]C1([H])C([H])([H])C([H])(*)C1([H])[H] 0.000 description 1
- 125000000596 cyclohexenyl group Chemical group C1(=CCCCC1)* 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- 210000000172 cytosol Anatomy 0.000 description 1
- 230000007123 defense Effects 0.000 description 1
- KXGVEGMKQFWNSR-UHFFFAOYSA-N deoxycholic acid Natural products C1CC2CC(O)CCC2(C)C2C1C1CCC(C(CCC(O)=O)C)C1(C)C(O)C2 KXGVEGMKQFWNSR-UHFFFAOYSA-N 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- ANCLJVISBRWUTR-UHFFFAOYSA-N diaminophosphinic acid Chemical compound NP(N)(O)=O ANCLJVISBRWUTR-UHFFFAOYSA-N 0.000 description 1
- 238000006471 dimerization reaction Methods 0.000 description 1
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 1
- 239000000975 dye Substances 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 230000029142 excretion Effects 0.000 description 1
- 210000002950 fibroblast Anatomy 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 125000001153 fluoro group Chemical group F* 0.000 description 1
- 229940014144 folate Drugs 0.000 description 1
- OVBPIULPVIDEAO-LBPRGKRZSA-N folic acid Chemical compound C=1N=C2NC(N)=NC(=O)C2=NC=1CNC1=CC=C(C(=O)N[C@@H](CCC(O)=O)C(O)=O)C=C1 OVBPIULPVIDEAO-LBPRGKRZSA-N 0.000 description 1
- 235000019152 folic acid Nutrition 0.000 description 1
- 239000011724 folic acid Substances 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 229960002518 gentamicin Drugs 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 125000003827 glycol group Chemical group 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 229910052737 gold Inorganic materials 0.000 description 1
- 239000010931 gold Substances 0.000 description 1
- 229910001385 heavy metal Inorganic materials 0.000 description 1
- 208000006454 hepatitis Diseases 0.000 description 1
- 231100000283 hepatitis Toxicity 0.000 description 1
- 125000000592 heterocycloalkyl group Chemical group 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 230000036039 immunity Effects 0.000 description 1
- 230000005847 immunogenicity Effects 0.000 description 1
- 230000001771 impaired effect Effects 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 108700032552 influenza virus INS1 Proteins 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 208000014674 injury Diseases 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000000138 intercalating agent Substances 0.000 description 1
- 230000009878 intermolecular interaction Effects 0.000 description 1
- 230000008863 intramolecular interaction Effects 0.000 description 1
- 230000002427 irreversible effect Effects 0.000 description 1
- 229960000310 isoleucine Drugs 0.000 description 1
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 1
- 125000001449 isopropyl group Chemical group [H]C([H])([H])C([H])(*)C([H])([H])[H] 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 239000008101 lactose Substances 0.000 description 1
- 238000010859 live-cell imaging Methods 0.000 description 1
- 230000033001 locomotion Effects 0.000 description 1
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 210000001161 mammalian embryo Anatomy 0.000 description 1
- 230000013011 mating Effects 0.000 description 1
- 230000035800 maturation Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 230000021121 meiosis Effects 0.000 description 1
- 201000001441 melanoma Diseases 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000008018 melting Effects 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 239000002923 metal particle Substances 0.000 description 1
- 229960000485 methotrexate Drugs 0.000 description 1
- 125000000956 methoxy group Chemical group [H]C([H])([H])O* 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 239000011859 microparticle Substances 0.000 description 1
- 230000002438 mitochondrial effect Effects 0.000 description 1
- 238000002887 multiple sequence alignment Methods 0.000 description 1
- 239000002071 nanotube Substances 0.000 description 1
- 239000002070 nanowire Substances 0.000 description 1
- 229960004927 neomycin Drugs 0.000 description 1
- 125000001893 nitrooxy group Chemical group [O-][N+](=O)O* 0.000 description 1
- 235000021231 nutrient uptake Nutrition 0.000 description 1
- 235000016709 nutrition Nutrition 0.000 description 1
- 125000001181 organosilyl group Chemical group [SiH3]* 0.000 description 1
- 210000001672 ovary Anatomy 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 125000004430 oxygen atom Chemical group O* 0.000 description 1
- 125000000913 palmityl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H] 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- ONTNXMBMXUNDBF-UHFFFAOYSA-N pentatriacontane-17,18,19-triol Chemical compound CCCCCCCCCCCCCCCCC(O)C(O)C(O)CCCCCCCCCCCCCCCC ONTNXMBMXUNDBF-UHFFFAOYSA-N 0.000 description 1
- 229950000688 phenothiazine Drugs 0.000 description 1
- 150000002991 phenoxazines Chemical class 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 150000008299 phosphorodiamidates Chemical class 0.000 description 1
- 230000035479 physiological effects, processes and functions Effects 0.000 description 1
- 229920002647 polyamide Polymers 0.000 description 1
- 229920000570 polyether Polymers 0.000 description 1
- 229920002704 polyhistidine Polymers 0.000 description 1
- 230000008092 positive effect Effects 0.000 description 1
- 239000001103 potassium chloride Substances 0.000 description 1
- 235000011164 potassium chloride Nutrition 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 108020001580 protein domains Proteins 0.000 description 1
- 238000001742 protein purification Methods 0.000 description 1
- IGFXRKMLLMBKSA-UHFFFAOYSA-N purine Chemical compound N1=C[N]C2=NC=NC2=C1 IGFXRKMLLMBKSA-UHFFFAOYSA-N 0.000 description 1
- UBQKCCHYAOITMY-UHFFFAOYSA-N pyridin-2-ol Chemical compound OC1=CC=CC=N1 UBQKCCHYAOITMY-UHFFFAOYSA-N 0.000 description 1
- RXTQGIIIYVEHBN-UHFFFAOYSA-N pyrimido[4,5-b]indol-2-one Chemical compound C1=CC=CC2=NC3=NC(=O)N=CC3=C21 RXTQGIIIYVEHBN-UHFFFAOYSA-N 0.000 description 1
- SRBUGYKMBLUTIS-UHFFFAOYSA-N pyrrolo[2,3-d]pyrimidin-2-one Chemical compound O=C1N=CC2=CC=NC2=N1 SRBUGYKMBLUTIS-UHFFFAOYSA-N 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000022532 regulation of transcription, DNA-dependent Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 125000006853 reporter group Chemical group 0.000 description 1
- 230000001850 reproductive effect Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 125000000548 ribosyl group Chemical group C1([C@H](O)[C@H](O)[C@H](O1)CO)* 0.000 description 1
- 235000009566 rice Nutrition 0.000 description 1
- 239000000523 sample Substances 0.000 description 1
- 230000003248 secreting effect Effects 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000019491 signal transduction Effects 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- HBMJWWWQQXIZIP-UHFFFAOYSA-N silicon carbide Chemical compound [Si+]#[C-] HBMJWWWQQXIZIP-UHFFFAOYSA-N 0.000 description 1
- 229910010271 silicon carbide Inorganic materials 0.000 description 1
- 230000003007 single stranded DNA break Effects 0.000 description 1
- 230000005783 single-strand break Effects 0.000 description 1
- 239000011780 sodium chloride Substances 0.000 description 1
- 230000010473 stable expression Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 102000005969 steroid hormone receptors Human genes 0.000 description 1
- 108020003113 steroid hormone receptors Proteins 0.000 description 1
- IIACRCGMVDHOTQ-UHFFFAOYSA-N sulfamic acid Chemical group NS(O)(=O)=O IIACRCGMVDHOTQ-UHFFFAOYSA-N 0.000 description 1
- 150000003456 sulfonamides Chemical group 0.000 description 1
- BDHFUVZGWQCTTF-UHFFFAOYSA-M sulfonate Chemical compound [O-]S(=O)=O BDHFUVZGWQCTTF-UHFFFAOYSA-M 0.000 description 1
- 150000003457 sulfones Chemical group 0.000 description 1
- 150000003462 sulfoxides Chemical class 0.000 description 1
- 229910052717 sulfur Inorganic materials 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 229920002994 synthetic fiber Polymers 0.000 description 1
- 238000002626 targeted therapy Methods 0.000 description 1
- 229960002180 tetracycline Drugs 0.000 description 1
- 229930101283 tetracycline Natural products 0.000 description 1
- 235000019364 tetracycline Nutrition 0.000 description 1
- 150000003522 tetracyclines Chemical class 0.000 description 1
- 150000003568 thioethers Chemical class 0.000 description 1
- 150000003573 thiols Chemical class 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 1
- 229940104230 thymidine Drugs 0.000 description 1
- 238000012876 topography Methods 0.000 description 1
- 231100000419 toxicity Toxicity 0.000 description 1
- 230000001988 toxicity Effects 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 238000011426 transformation method Methods 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
- 230000010474 transient expression Effects 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- ZMANZCXQSJIPKH-UHFFFAOYSA-O triethylammonium ion Chemical compound CC[NH+](CC)CC ZMANZCXQSJIPKH-UHFFFAOYSA-O 0.000 description 1
- 125000000876 trifluoromethoxy group Chemical group FC(F)(F)O* 0.000 description 1
- 125000002023 trifluoromethyl group Chemical group FC(F)(F)* 0.000 description 1
- 210000004881 tumor cell Anatomy 0.000 description 1
- WFKWXMTUELFFGS-UHFFFAOYSA-N tungsten Chemical compound [W] WFKWXMTUELFFGS-UHFFFAOYSA-N 0.000 description 1
- 229910052721 tungsten Inorganic materials 0.000 description 1
- 239000010937 tungsten Substances 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 125000002948 undecyl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H] 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 241000701447 unidentified baculovirus Species 0.000 description 1
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 1
- 229940045145 uridine Drugs 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
- 108700026215 vpr Genes Proteins 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
- 229940075420 xanthine Drugs 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/09—Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
Definitions
- This disclosure relates to methods to increase site-directed nuclease editing.
- SDNs Site directed nucleases
- SDNs include, e.g., zinc finger nucleases, transcription activator-like effector nucleases, and CRISPR-associated nucleases.
- SDNs act as endonucleases and generally create double-stranded breaks (DSBs) in specific DNA sequences, activating the cell's intrinsic repair mechanisms (e.g., homologous recombination).
- DSBs double-stranded breaks
- By harnessing these repair mechanisms, site-directed modification of the specific DNA sequence can be achieved.
- The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system evolved in bacteria and archaea as an adaptive immune system to defend against viral attack.
- The CRISPR/Cas system has attracted particular interest as a tool for genome editing.
- CRISPR/Cas systems that generate site-specific double stranded breaks can be used to edit DNA in eukaryotic cells, e.g., by producing deletions, insertions, and/or changes in nucleotide sequence.
- a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 1 and a human-induced mutation at position C965.
- the human-induced mutation is a cysteine to serine substitution.
- the Cas12a protein further comprises a human-induced mutation at position D156.
- the human-induced mutation at position D156 is an aspartic acid to arginine substitution.
- the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 5-11.
- a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 2 and a human-induced mutation at position C70, C1116, and/or C1190.
- the human-induced mutation is a cysteine to serine substitution.
- the Cas12a protein further comprises a human-induced mutation at position E184.
- the human-induced mutation at position E184 is a glutamic acid to arginine substitution.
- a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 3 and a human-induced mutation at position C334, C379, and/or C674.
- the human-induced mutation is a cysteine to serine substitution.
- the Cas12a protein further comprises a human-induced mutation at position E174.
- the human-induced mutation at position E174 is a glutamic acid to arginine substitution.
- a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 4 and a human-induced mutation at position C270, C583, C1068, C1099, and/or C1149.
- the human-induced mutation is a cysteine to serine substitution.
- the Cas12a protein further comprises a human-induced mutation at position D172.
- the human-induced mutation at position D172 is an aspartic acid to arginine substitution.
- the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 12-19.
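The substitutions listed above can be written in the common shorthand C965S (cysteine at position 965 replaced by serine) or D156R (aspartic acid 156 replaced by arginine). The following is a minimal sketch of applying such substitutions to a protein string, assuming 1-based positions numbered on the sequence being edited. The SEQ ID sequences are not reproduced in this section, so the toy sequence is hypothetical, and `apply_substitutions` is an illustrative name, not part of the disclosure. For a variant that is only at least 80% identical to the reference, the reference position would first have to be mapped through an alignment.

```python
import re

# One substitution in shorthand form: original residue, 1-based position, new residue (e.g. "C965S").
_SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Return a copy of `protein` with each substitution applied.

    Positions are 1-based and refer to `protein` itself; the residue found at
    each position must match the original residue named in the substitution.
    """
    residues = list(protein)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub!r}")
        original, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise IndexError(f"{sub}: position outside a sequence of length {len(residues)}")
        if residues[position - 1] != original:
            raise ValueError(f"{sub}: expected {original} at {position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical 5-residue toy sequence, not a Cas12a sequence.
print(apply_substitutions("MDCKC", ["C3S", "D2R"]))  # MRSKC
```

The check against the original residue is deliberate: it catches numbering mistakes, such as applying a position from one ortholog (e.g., LbCas12a) to another (e.g., FnCas12a) without alignment.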
- the Cas12a protein is a catalytically dead Cas12a (dCas12a) protein or a nickase Cas12a (nCas12a) protein.
- dCas12a catalytically dead Cas12a
- nCas12a nickase Cas12a
- the Cas12a protein further comprises a nuclear localization signal.
- a fusion protein comprising any of the Cas12a proteins described above and a heterologous domain.
- the heterologous domain is a deaminase domain, a transcription factor domain, a nuclease domain, a reverse-transcriptase domain, a transposase domain, an integrase domain, a uracil DNA glycosylase inhibitor domain, a recombinase domain, a nickase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain, or a transcriptional repressor domain.
- the Cas12a protein is linked to the heterologous domain by a linker sequence.
- a nucleic acid encoding any of the Cas12a proteins or any of the fusion proteins described above.
- the nucleic acid sequence is any one of SEQ ID NOs: 20-34.
- a DNA construct comprising a promoter operably linked to the nucleic acid encoding any of the Cas12a proteins or any of the fusion proteins described above.
- a vector comprising the nucleic acid or the DNA construct described above.
- a cell comprising the nucleic acid, the DNA construct, or the vector described above.
- the cell is a plant cell.
- the cell is a maize plant cell, a wheat plant cell, a rice plant cell, a soybean plant cell, a sunflower plant cell, or a tomato plant cell.
- a method of editing a nucleic acid comprising contacting the nucleic acid with (i) any one of the Cas12a proteins described above or any one of the fusion proteins described above, and (ii) a guide RNA having a region complementary to a selected portion of the nucleic acid, thereby resulting in an edit to the nucleic acid.
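The editing method depends on the guide RNA having a region complementary to the selected portion of the target nucleic acid. The following is a small sketch of locating where a given guide region would base-pair on one DNA strand. It assumes exact Watson-Crick complementarity only and ignores every other requirement for Cas12a binding and cleavage; `find_guide_sites` is a hypothetical helper, not part of the disclosure.

```python
# Base that each guide-RNA base pairs with on DNA (Watson-Crick rules).
# "T" is accepted as a stand-in for "U" so spacers written as DNA also work.
_RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def find_guide_sites(target_dna: str, guide_region: str) -> list[int]:
    """Return 0-based starts, on `target_dna` read 5'->3', of every stretch
    that is exactly complementary to `guide_region` (given 5'->3')."""
    # Antiparallel pairing: the partner strand, read 5'->3', is the reverse complement.
    site = "".join(_RNA_TO_DNA_PARTNER[base] for base in reversed(guide_region.upper()))
    target = target_dna.upper()
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)  # allow overlapping matches
    return hits
```

For example, `find_guide_sites("TTGCATAA", "AUGC")` returns `[2]`, because the RNA 5'-AUGC-3' pairs with the DNA stretch 5'-GCAT-3'. Real guides are longer, and their mismatch tolerance is not modeled here.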
- the present application includes the following figures.
- the figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description (s) of the compositions and methods.
- the figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.
- FIG. 1 shows that cysteine residues in LbCas12a may form inter- or intra-molecular interactions.
- Left: the PyMOL surface model of the LbCas12a-crRNA-DNA ternary complex (PDB entry 5XUS).
- The highlighted areas indicated by arrows are the thiol groups of C965 and C1090, which are potentially exposed on the surface.
- FIG. 2 shows two of the cysteine residues in FnCas12a that were selected for substitution, according to aspects of this disclosure.
- The PyMOL stick models of C1190 and C1116 suggest that their thiol groups (in black) are close to each other in the FnCas12a 3D structure (PDB entry 5NFV) and may form an intramolecular disulfide bond.
- This description of the compositions and methods recites various aspects and embodiments of the present compositions and methods. No particular embodiment is intended to define the scope of the compositions and methods. Rather, the embodiments merely provide non-limiting examples of various compositions and methods that are at least included within the scope of the disclosed compositions and methods. The description is to be read from the perspective of one of ordinary skill in the art; therefore, information well known to the skilled artisan is not necessarily included.
- When used in reference to a nucleic acid or amino acid sequence, the term “comprising” refers to a nucleic acid sequence or an amino acid sequence that includes the subject sequence as a part or as its entire sequence.
- the transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified materials or steps recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed matter. Thus, the term “consisting essentially of” when used in a claim of this disclosure is not intended to be interpreted to be equivalent to “comprising.”
- a “plurality” refers to more than one entity.
- a “plurality of individuals” refers to at least two individuals.
- Alternatively, the term “plurality” may refer to more than half of the whole.
- a “plurality of a population” refers to more than half the members of that population.
- plant refers to any plant at any stage of development, particularly a seed plant.
- plant cell refers to a structural and physiological unit of a plant, comprising a protoplast and a cell wall.
- the plant cell may be in form of an isolated single cell or a cultured cell, or as a part of higher organized unit such as, for example, plant tissue, a plant organ, or a whole plant.
- the plant cell may be derived from or part of an angiosperm or gymnosperm.
- the plant cell may be a monocotyledonous plant cell (e.g., a maize cell, a rice cell, a sorghum cell, a sugarcane cell, a barley cell, a wheat cell, an oat cell, a turf grass cell, or an ornamental grass cell) or a dicotyledonous plant cell (e.g., a tobacco cell, a pepper cell, an eggplant cell, a sunflower cell, a crucifer cell, a flax cell, a potato cell, a cotton cell, a soybean cell, a sugar beet cell, or an oilseed rape cell).
- plant cell culture refers to cultures of plant units such as, for example, protoplasts, cell culture cells, cells in plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes and embryos at various stages of development.
- plant tissue refers to a group of plant cells organized into a structural and functional unit. Any tissue of a plant in planta or in culture is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue culture and any group of plant cells organized into structural and/or functional units. The use of this term in conjunction with, or in the absence of, any specific type of plant tissue as listed above or otherwise embraced by this definition is not intended to be exclusive of any other type of plant tissue.
- plant part refers to a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps and tissue cultures from which plants can be regenerated.
- plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
- “Polypeptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. As used herein, the terms encompass amino acid chains of any length, including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds.
- “Nucleic acid” and “polynucleotide” are used interchangeably and, as used herein, refer to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form, as well as to both sense and anti-sense strands of RNA, cDNA, genomic DNA, mitochondrial DNA, and synthetic forms and mixed polymers of the above.
- DNA is the genetic material while RNA is involved in the transfer of information contained within DNA into proteins.
- a “genome” is the entire body of genetic material contained in each cell of an organism. It is understood that when an RNA is described, its corresponding cDNA is also described, wherein uridine is represented as thymidine.
- a nucleotide refers to a ribonucleotide, deoxynucleotide or a modified form of either type of nucleotide, and combinations thereof.
- a polynucleotide disclosed herein may include either or both naturally occurring and modified nucleotides linked together by naturally occurring and/or non-naturally occurring nucleotide linkages.
- the nucleic acid molecules may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art.
- Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analogue, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, and the like), charged linkages (e.g., phosphorothioates, phosphorodithioates, and the like), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, and the like), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, and the like).
- A nucleic acid sequence encompasses its complement unless otherwise specified. Thus, a reference to a nucleic acid molecule having a particular sequence should be understood to encompass its complementary strand, with its complementary sequence. Nucleotide sequences are “complementary” when they specifically hybridize in solution (e.g., according to Watson-Crick base-pairing rules). The term also includes codon-optimized nucleic acids that encode the same polypeptide sequence. It is also understood that nucleic acids can be unpurified, purified, or attached, for example, to a synthetic material such as a bead or column matrix.
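Because a stated sequence also covers its complementary strand, tools that search for a sequence usually check both strands. A minimal sketch of computing the complementary strand (written 5'->3', i.e., the reverse complement) under Watson-Crick rules; the function names are illustrative, and characters other than A, C, G, and T are passed through unchanged.

```python
# Watson-Crick complements; lowercase is preserved, other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand of `dna`, written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands (both written 5'->3') pair base-for-base."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


print(reverse_complement("ATGCC"))  # GGCAT
```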
- When nucleic acid sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention.
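The "corresponds to" rule can be illustrated with a small sketch. Given the two rows of a pairwise alignment (gaps written as `-`), it maps a 1-based position in the reference, such as a numbered residue in SEQ ID NO: 1, to the aligned position in the test sequence. The function name and gap character are assumptions for illustration. Producing the alignment itself is left to a tool such as BLAST or Clustal Omega.

```python
from typing import Optional


def map_position(ref_aln: str, test_aln: str, ref_pos: int, gap: str = "-") -> Optional[int]:
    """Map a 1-based position of the ungapped reference to the corresponding
    1-based position of the ungapped test sequence.

    Both arguments are rows of the same pairwise alignment. Returns None when
    the reference residue is aligned to a gap in the test sequence.
    """
    if len(ref_aln) != len(test_aln):
        raise ValueError("alignment rows must have equal length")
    ref_count = test_count = 0  # residues consumed so far in each sequence
    for ref_char, test_char in zip(ref_aln, test_aln):
        if ref_char != gap:
            ref_count += 1
        if test_char != gap:
            test_count += 1
        if ref_char != gap and ref_count == ref_pos:
            return test_count if test_char != gap else None
    raise IndexError(f"reference position {ref_pos} lies beyond the alignment")
```

With the toy alignment `MK-CDE` / `MKACD-`, reference position 3 (C) corresponds to test position 4. Reference position 5 (E) has no counterpart, so the function returns None.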
- Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI).
- BLAST Basic Local Alignment Search Tool
- A nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences, as well as the sequence explicitly indicated.
- Degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. See Batzer et al., Nucleic Acids Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994).
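The third-position substitution described above can be sketched as a string transformation on an in-frame coding sequence. This sketch assumes the IUPAC mixed-base code `N` by default and uses `I` as a placeholder symbol for deoxyinosine; both the symbol choice and the function name are illustrative.

```python
from typing import Optional, Set


def degenerate_third_positions(cds: str, codons: Optional[Set[int]] = None, symbol: str = "N") -> str:
    """Replace the third base of selected codons (0-based codon indices) with `symbol`.

    `codons=None` selects every codon. `cds` must be in frame, starting at its
    first base.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    bases = list(cds)
    n_codons = len(cds) // 3
    for index in (range(n_codons) if codons is None else sorted(codons)):
        if not 0 <= index < n_codons:
            raise IndexError(f"codon index {index} out of range")
        bases[3 * index + 2] = symbol  # third position of codon `index`
    return "".join(bases)


print(degenerate_third_positions("ATGGCTAAA"))  # ATNGCNAAN
```

A fully mixed third position is synonymous only for fourfold-degenerate codons (e.g., GCN, alanine). For codons such as ATG (methionine), it changes the encoded amino acid, so in practice the codons to modify would be chosen selectively.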
- identity refers to a sequence that has at least 60% sequence identity to a reference sequence.
- The percent identity can be any integer from 60% to 100%.
- Exemplary embodiments include at least: 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, as compared to a reference sequence using the programs described herein; preferably BLAST using standard parameters, as described below.
- For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
- test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
- sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
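The percent identity calculation can be sketched for a pair of already-aligned sequences. The text does not fix the denominator. This sketch assumes one common convention: identical residues divided by the number of alignment columns that are not gaps in both rows. Dividing by the reference length instead can give a different value. The function name is illustrative.

```python
def percent_identity(ref_aln: str, test_aln: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding identical (non-gap) residues."""
    if len(ref_aln) != len(test_aln):
        raise ValueError("alignment rows must have equal length")
    # Columns that are gaps in both rows carry no information and are dropped.
    columns = [(r, t) for r, t in zip(ref_aln, test_aln) if not (r == gap and t == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for r, t in columns if r == t and r != gap)
    return 100.0 * identical / len(columns)
```

For `MKC-DE` against `MKCADQ`, four of six columns are identical (about 66.7%). That pair would satisfy a 60% threshold but not an 80% threshold.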
- A “comparison window,” as used herein, refers to a segment of contiguous positions, numbering from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
- Methods of alignment of sequences for comparison are well-known in the art.
- Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by computerized implementations of these algorithms (e.g., BLAST), or by manual alignment and visual inspection.
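The global (Needleman-Wunsch) alignment named above can be sketched with dynamic programming. This sketch assumes a simple match/mismatch score and a linear gap penalty with toy values. It does not use BLOSUM62 or affine gaps, and it is not how BLAST (a heuristic local search) works.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of `a` and `b`; returns (score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("MKCDE", "MKACDE")` returns `(3, "MK-CDE", "MKACDE")`: five matches and one gap.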
- HSPs high scoring sequence pairs
- T is referred to as the neighborhood word score threshold (Altschul et al., supra).
- These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them.
- the word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased.
- Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
- M reward score for a pair of matching residues
- N penalty score for mismatching residues
- Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
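The extension step and its three stopping rules can be sketched as an ungapped, one-directional extension of a seed hit. This sketch assumes the nucleotide scoring described above (M per match, N per mismatch), treats "falls off by X" as `best - score >= X`, and uses illustrative parameter values rather than any program's defaults. The function name is hypothetical. Leftward extension can reuse it on the reversed prefixes of both sequences.

```python
def xdrop_extend(a: str, b: str, i: int, j: int, seed_score: int,
                 M: int = 1, N: int = -3, X: int = 5) -> tuple[int, int]:
    """Extend a seed rightward from a[i] / b[j].

    Returns (best cumulative score, number of residues added to reach it).
    """
    score = best = seed_score
    best_len = 0
    k = 0
    while i + k < len(a) and j + k < len(b):        # rule 3: end of either sequence
        score += M if a[i + k] == b[j + k] else N
        k += 1
        if score > best:
            best, best_len = score, k
        elif score <= 0 or best - score >= X:       # rules 2 and 1
            break
    return best, best_len
```

For example, a seed `AC` (score 2) shared by `ACGTAAAA` and `ACGTCCCC` extends through `GT` to a best score of 4. It then stops once two mismatches drive the cumulative score to zero or below.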
- the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
- the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1992).
- the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90: 5873-5877 (1993).
- A nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.01, more preferably less than about 10^-5, and most preferably less than about 10^-20.
- Recombination is the exchange of DNA strands to produce new nucleotide sequence arrangements.
- the term may refer to the process of homologous recombination that occurs in double-strand DNA break repair, where a polynucleotide is used as a template to repair a homologous polynucleotide.
- the term may also refer to exchange of information between two homologous chromosomes during meiosis.
- the frequency of double recombination is the product of the frequencies of the single recombinants.
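The product rule can be shown with a one-line calculation. It holds when crossovers in the two intervals occur independently. In practice, crossover interference usually makes observed double recombinants rarer than this expectation. The function name and the range check are illustrative.

```python
def expected_double_recombinant_frequency(r1: float, r2: float) -> float:
    """Expected double-recombinant frequency for two adjacent intervals,
    assuming crossovers in the intervals are independent (no interference)."""
    for r in (r1, r2):
        if not 0.0 <= r <= 0.5:  # recombination frequency between two loci lies in [0, 0.5]
            raise ValueError("recombination frequencies must lie in [0, 0.5]")
    return r1 * r2
```

For intervals with single-recombinant frequencies of 0.1 and 0.2, the expected double-recombinant frequency is 0.1 × 0.2 = 0.02.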
- a “gene” is a defined region that is located within a genome and that, besides the aforementioned coding nucleic acid sequence, comprises other, primarily regulatory, nucleic acid sequences responsible for the control of the expression, that is to say the transcription and translation, of the coding portion. Genes can include both coding and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences, and 5' and 3' untranslated regions). A gene typically expresses mRNA, functional RNA, or specific protein, including regulatory sequences. Genes may or may not be capable of being used to produce a functional protein. In some embodiments, a gene refers to only the coding region.
- a “native gene” refers to a gene as found in nature.
- the term “chimeric gene” refers to any gene that contains 1) DNA sequences, including regulatory and coding sequences that are not found together in nature, or 2) sequences encoding parts of proteins not naturally adjoined, or 3) parts of promoters that are not naturally adjoined. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or comprise regulatory sequences and coding sequences derived from the same source, but arranged in a manner different from that found in nature.
- a gene may be “isolated” by which is meant a nucleic acid molecule that is substantially or essentially free from components normally found in association with the nucleic acid molecule in its natural state. Such components include other cellular material, culture medium from recombinant production, and/or various chemicals used in chemically synthesizing the nucleic acid molecule.
- a “gene of interest” or “nucleotide sequence of interest” refers to any gene which, when transferred to a plant, confers upon the plant a desired characteristic such as antibiotic resistance, virus resistance, insect resistance, disease resistance, or resistance to other pests, herbicide tolerance, improved nutritional value, improved performance in an industrial process or altered reproductive capability.
- the “gene of interest” may also be one that is transferred to plants for the production of commercially valuable enzymes or metabolites in the plant.
- An “isolated” nucleic acid molecule or nucleotide sequence, or an “isolated” polypeptide, is a nucleic acid molecule, nucleotide sequence, or polypeptide that, by the hand of man, exists apart from its native environment and/or has a function that is different, modified, modulated and/or altered as compared to its function in its native environment and is therefore not a product of nature.
- An isolated nucleic acid molecule or isolated polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, a recombinant host cell.
- the term isolated means that it is separated from the chromosome and/or cell in which it naturally occurs.
- a polynucleotide is also isolated if it is separated from the chromosome and/or cell in which it naturally occurs and is then inserted into a genetic context, a chromosome, a chromosome location, and/or a cell in which it does not naturally occur.
- the recombinant nucleic acid molecules and nucleotide sequences of the invention can be considered to be “isolated” as defined above.
- an “isolated nucleic acid molecule” or “isolated nucleotide sequence” is a nucleic acid molecule or nucleotide sequence that is not immediately contiguous with nucleotide sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. Accordingly, in one embodiment, an isolated nucleic acid includes some or all of the 5' non-coding (e.g., promoter) sequences that are immediately contiguous to a coding sequence.
- the term therefore includes, for example, a recombinant nucleic acid that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment), independent of other sequences. It also includes a recombinant nucleic acid that is part of a hybrid nucleic acid molecule encoding an additional polypeptide or peptide sequence.
- “Isolated” can further refer to a nucleic acid molecule, nucleotide sequence, polypeptide, peptide or fragment that is substantially free of cellular material, viral material, and/or culture medium (e.g., when produced by recombinant DNA techniques), or chemical precursors or other chemicals (e.g., when chemically synthesized).
- an “isolated fragment” is a fragment of a nucleic acid molecule, nucleotide sequence or polypeptide that is not naturally occurring as a fragment and would not be found as such in the natural state. “Isolated” does not necessarily mean that the preparation is technically pure (homogeneous), but it is sufficiently pure to provide the polypeptide or nucleic acid in a form in which it can be used for the intended purpose.
- “Homology dependent repair” or “homology directed repair” or “HDR” refers to a mechanism for repairing single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) damage in cells. This repair mechanism can be used by the cell when there is an HDR template with a sequence with significant homology to the injury site.
- the term “perfect HDR” refers to a situation in which genomic-homology junctions in the replaced allele underwent complete HDR and “imperfect HDR” refers to a situation in which genomic-homology junctions in the replaced allele underwent partial or incomplete HDR.
- a donor DNA molecule with homology to the cleaved target DNA sequence is used as a template for repair of the cleaved target DNA sequence, resulting in the transfer of genetic information from the donor polynucleotide to the target DNA.
- new nucleic acid material may be inserted/copied into the site.
- a target DNA is contacted with a donor molecule, for example a donor DNA molecule.
- a donor DNA molecule is introduced into a cell.
- at least a segment of a donor DNA molecule integrates into the genome of the cell.
- MMEJ Microhomology-mediated end joining
- Alt-NHEJ alternative nonhomologous end-joining
- heterologous refers to a nucleic acid molecule, nucleotide sequence, polypeptide, or amino acid sequence not naturally associated with a host cell into which it is introduced, that either originates from another species or is from the same species or organism but is modified from either its original form or the form primarily expressed in the cell, including non-naturally occurring multiple copies of a naturally occurring sequence.
- an amino acid sequence derived from an organism or species different from that of the cell into which the amino acid sequence is introduced is heterologous with respect to that cell and the cell's descendants.
- a heterologous sequence includes a sequence derived from and inserted into the same natural, original cell type, but which is present in a non-natural state, e.g., present in a different copy number, and/or under the control of different regulatory sequences than that found in the native state of the polypeptide.
- a sequence can also be heterologous to other sequences with which it may be associated, for example in a nucleic acid construct such as an expression vector.
- a promoter may be present in a nucleic acid construct in combination with one or more regulatory elements and/or coding sequences that do not naturally occur in association with that particular promoter, i.e., they are heterologous to the promoter.
- variant Cas12a proteins having increased site-directed nuclease (SDN) genome editing activity.
- SDN: site-directed nuclease
- Site-directed nuclease technology has dramatically increased the speed and precision with which one can make genome edits in various organisms, including plants.
- the desired outcomes in SDN-mediated genome editing are 1) to target SDNs to cleave DNA at a specific genomic site in a host (e.g., a plant cell) and 2) to use the host’s natural repair mechanisms to introduce specific genomic changes at the cleavage site.
- the changes can include small deletions, substitutions, or the addition of a number of nucleotides.
- SDN applications have generally been divided into three categories: SDN-1, SDN-2, and SDN-3.
- SDN-1 produces a double-strand break in a genome without the addition of foreign DNA.
- the break is repaired by the host, e.g., via non-homologous end joining (NHEJ).
- mutations or deletions can be introduced. If these mutations or deletions are in a gene, the gene can be silenced or knocked out.
- SDN-2 uses template DNA to introduce a predicted modification at the target cleavage site (e.g., via HDR), but does not result in insertion of recombinant DNA.
- SDN-3 also uses template DNA, in this case to introduce recombinant or exogenous DNA (e.g., a transgene) at the target cleavage site.
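The three categories above differ in two features: whether a repair template is supplied, and whether recombinant or exogenous DNA ends up inserted at the cut site. As a minimal sketch (the enum and function names are hypothetical, and real regulatory classification may weigh further details not modeled here), that decision can be written as:

```python
from enum import Enum


class SDNCategory(Enum):
    SDN1 = "SDN-1"  # break repaired by the host without a template (e.g., via NHEJ)
    SDN2 = "SDN-2"  # template-guided, predicted edit; no recombinant DNA inserted
    SDN3 = "SDN-3"  # template used to insert recombinant/exogenous DNA (e.g., a transgene)


def classify_sdn(uses_template: bool, inserts_recombinant_dna: bool) -> SDNCategory:
    """Map the two distinguishing features onto an SDN category (simplified illustration)."""
    if not uses_template:
        if inserts_recombinant_dna:
            # Not one of the three categories as defined in the text.
            raise ValueError("recombinant insertion without a template is undefined here")
        return SDNCategory.SDN1
    return SDNCategory.SDN3 if inserts_recombinant_dna else SDNCategory.SDN2


print(classify_sdn(uses_template=True, inserts_recombinant_dna=False).value)  # SDN-2
```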
- Cas12a is a CRISPR-associated (Cas) SDN that functions in a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas system.
- CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
- this system can provide adaptive immunity against foreign DNA (Barrangou, R., et al., “CRISPR provides acquired resistance against viruses in prokaryotes,” Science (2007) 315: 1709-1712; Makarova, K.S., et al., “Evolution and classification of the CRISPR-Cas systems,” Nat Rev Microbiol (2011) 9: 467-477; Garneau, J.E., et al., “The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA,” Nature (2010) 468: 67-71; Sapranauskas, R., et al., “The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli,” Nucleic Acids Res (2011) 39: 9275-9282).
- a CRISPR/Cas system (e.g., modified and/or unmodified) can comprise a guide nucleic acid, such as a guide RNA (gRNA), complexed with a Cas protein for targeted regulation of gene expression and/or activity or for nucleic acid editing.
- an RNA-guided Cas protein (e.g., a Cas nuclease such as a Cas9 nuclease)
- the Cas protein, if it possesses nuclease activity, can cleave the DNA (Gasiunas, G., et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria,” Proc Natl Acad Sci USA (2012) 109: E2579-E2586; Jinek, M., et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science (2012) 337: 816-821; Sternberg, S.
- DNA cleavage (e.g., double-strand breaks)
- DNA break repair allows for the introduction of gene modification(s) (e.g., nucleic acid editing).
- Cysteine residues are highly reactive residues that are subject to posttranslational modifications. Formation of undesired disulfide bonds and/or modifications could affect the proper folding, localization, and/or enzymatic activity of a protein. Most Cas12a orthologs contain 8-9 cysteine residues; in comparison, Cas9 from Streptococcus pyogenes (SpCas9) contains only 2. Most cysteine residues in Cas12a orthologs are not conserved, and those exposed on the surface are more likely to be involved in intermolecular disulfide bond formation and/or posttranslational modifications. Conserved cysteine residues in the LbCas12a, FnCas12a, AsCas12a, and Mb2Cas12a proteins are shown in Tables 1-4.
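Listing every cysteine in a primary sequence is a natural first step before deciding which ones to substitute. The hypothetical helper below returns 1-based cysteine positions in a one-letter protein sequence; it uses a toy sequence rather than SEQ ID NOs: 1-4, and sequence alone cannot tell whether a cysteine is surface-exposed or conserved.

```python
def cysteine_positions(protein: str) -> list[int]:
    """Return the 1-based positions of all cysteine (C) residues in a protein sequence."""
    return [i for i, residue in enumerate(protein.upper(), start=1) if residue == "C"]


toy = "MACDCKC"  # hypothetical toy sequence, not a Cas12a ortholog
print(cysteine_positions(toy))  # [3, 5, 7]
```

Positions are reported 1-based so they line up with the residue numbering used in the text (e.g., C965 of SEQ ID NO: 1).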
- variant Cas12a proteins comprising at least one human-induced mutation.
- fusion proteins comprising the variant Cas12a proteins and one or more heterologous domains.
- associated nucleic acids, DNA constructs, vectors, cells, and methods of editing nucleic acids using the variant Cas12a proteins and/or fusion proteins result in an increased frequency of desired nucleic acid edits.
- the edits are SDN-1 edits.
- the increased frequency of desired nucleic acid edits is seen at genomic sites that are difficult to edit.
- variant Cas12a proteins comprising at least one human-induced mutation that have enhanced function (i.e., relative to unmodified Cas12a proteins).
- fusion proteins comprising said variant Cas12a proteins and at least one heterologous domain.
- the enhanced function of Cas12a is increased SDN-1 genome editing activity.
- the variant Cas12a proteins comprise substitutions of one or more surface-exposed cysteine residues.
- the variant Cas12a proteins comprise cysteine to serine substitutions at one or more surface-exposed cysteine residues.
- the variant Cas12a proteins provided herein further comprise a substitution of an aspartic acid residue and/or a glutamic acid residue to an arginine residue.
- Cas12a (also referred to as Cpf1) is a Class II, Type V CRISPR/Cas effector protein.
- a variant Cas12a protein provided herein can be a modified form of Cas12a from any of a number of prokaryotic species including, but not limited to, Lachnospiraceae bacterium, Acidaminococcus sp., Moraxella bovoculi, Thiomicrospira sp., Moraxella lacunata, Methanomethylophilus alvus, Butyrivibrio sp., or Bacteroidetes oral sp.
- Unmodified Cas12a protein sequences include Lachnospiraceae bacterium Cas12a (LbCas12a; SEQ ID NO: 1), Francisella novicida U112 Cas12a (FnCas12a; SEQ ID NO: 2), Acidaminococcus sp. Cas12a (AsCas12a; SEQ ID NO: 3), and Moraxella bovoculi strain 57922 Cas12a (Mb2Cas12a; SEQ ID NO: 4).
- the variant Cas12a protein is a modified form of LbCas12a.
- the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 1 and at least one human-induced mutation.
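Percent-identity thresholds such as “at least 60% identical” depend on how identity is computed. A minimal sketch, assuming the two sequences have already been aligned (gaps written as "-") and that identity is normalized by the number of aligned columns; other conventions, such as dividing by the length of the reference sequence, give different numbers, and the normalization behind the claimed ranges is not stated in this excerpt.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical residue columns / columns where at least one
    sequence has a residue (columns that are gaps in both are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("MKCLV", "MKSLV"))  # 80.0 (toy sequences)
```

A variant would then meet a threshold such as 60% if `percent_identity(...) >= 60.0` under whichever convention is adopted; producing the alignment itself would be done with a dedicated alignment tool.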
- the human-induced mutation is a substitution of a surface-exposed cysteine residue.
- Surface-exposed cysteine residues can be identified using methods known in the art, e.g., by the methods described in the Examples herein.
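The Examples are not reproduced in this section, so the following is only one common approach and not necessarily the one used there: obtain a relative solvent accessibility (RSA, from 0 to 1) value per residue from a structure or structural model, then flag cysteines above a cutoff. The `rsa` input and the 0.25 cutoff are illustrative assumptions.

```python
def surface_exposed_cysteines(
    protein: str, rsa: dict[int, float], threshold: float = 0.25
) -> list[int]:
    """Return 1-based cysteine positions whose RSA exceeds `threshold`.

    `rsa` maps 1-based residue positions to RSA values computed elsewhere
    (hypothetical input); positions missing from `rsa` are treated as buried.
    """
    exposed = []
    for position, residue in enumerate(protein.upper(), start=1):
        if residue == "C" and rsa.get(position, 0.0) > threshold:
            exposed.append(position)
    return exposed


toy_rsa = {3: 0.05, 5: 0.40, 7: 0.30}  # hypothetical accessibility values
print(surface_exposed_cysteines("MACDCKC", toy_rsa))  # [5, 7]
```

The cutoff is a design choice: raising it keeps only the most exposed cysteines as substitution candidates.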
- one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue) .
- the human-induced mutation is at position C965 (i.e., the cysteine residue at position 965 of SEQ ID NO: 1).
- the human-induced mutation is a substitution of the cysteine residue.
- the human-induced mutation is a cysteine to serine substitution.
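Substitutions such as C965S or D156R name the wild-type residue, its 1-based position in the reference sequence, and the replacement residue. A hypothetical helper that applies such a substitution, and refuses to proceed when the stated wild-type residue does not match (a cheap guard against numbering against the wrong reference), might look like this:

```python
import re

# <wild-type residue><1-based position><replacement residue>, e.g. "C965S"
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein: str, mutation: str) -> str:
    """Return a copy of `protein` with a single substitution such as "C965S" applied."""
    match = _SUBSTITUTION.match(mutation.upper())
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    found = protein[position - 1].upper()
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {found}")
    return protein[: position - 1] + new + protein[position:]


variant = "MACDCKC"  # hypothetical toy sequence
for mutation in ["C3S", "D4R"]:
    variant = apply_substitution(variant, mutation)
print(variant)  # MASRCKC
```

Applied to a real reference, the same call would take SEQ ID NO: 1 (not reproduced here) and "C965S"; a mismatch error would signal that the residue numbering and the sequence disagree.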
- the Cas12a protein further comprises a human-induced mutation at position D156 (i.e., the aspartic acid residue at position 156 of SEQ ID NO: 1), as described for example in WO2018195545 and WO2017184768, which are incorporated herein by reference in their entirety.
- the human-induced mutation is a substitution of the aspartic acid residue.
- the human-induced mutation is an aspartic acid to arginine substitution.
- the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 5-11.
- the variant Cas12a protein is a modified form of FnCas12a.
- the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 2 and at least one human-induced mutation.
- the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C70, C1116, and/or C1190. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution.
- the Cas12a protein further comprises a human-induced mutation at position E184 (i.e., the glutamic acid residue at position 184 of SEQ ID NO: 2), as described for example in WO2018195545, which is incorporated herein by reference in its entirety.
- the human-induced mutation is a substitution of the glutamic acid residue.
- the human-induced mutation is a glutamic acid to arginine substitution.
- the variant Cas12a protein is a modified form of AsCas12a.
- the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 3 and at least one human-induced mutation.
- the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C334, C379, and/or C674. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position E174, as described for example in WO2018195545, which is incorporated herein by reference in its entirety. In some embodiments, the human-induced mutation is a substitution of the glutamic acid residue. In some embodiments, the human-induced mutation is a glutamic acid to arginine substitution.
- the variant Cas12a protein is a modified form of Mb2Cas12a.
- the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 4 and at least one human-induced mutation.
- the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C270, C583, C1068, C1099, and/or C1149. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position D172. In some embodiments, the human-induced mutation is a substitution of the aspartic acid residue. In some embodiments, the human-induced mutation is an aspartic acid to arginine substitution. In some embodiments, the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 12-19.
- a Cas protein (e.g., a Cas12a protein) can comprise one or more domains.
- domains include guide nucleic acid recognition and/or binding domains, nuclease domains (e.g., DNase or RNase domains, RuvC, HNH), DNA binding domains, RNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains.
- a guide nucleic acid recognition and/or binding domain can interact with a guide nucleic acid.
- a nuclease domain can comprise catalytic activity for nucleic acid cleavage.
- a nuclease domain can lack catalytic activity to prevent nucleic acid cleavage.
- a Cas protein can be a chimeric Cas protein that is fused to other proteins or polypeptides.
- a Cas protein can be a chimera of various Cas proteins, for example, comprising domains from different Cas proteins.
- a Cas protein (e.g., a Cas12a protein) used herein can be an active variant, inactive variant, or fragment of a wild-type or modified Cas protein.
- a Cas protein can comprise an amino acid change such as a deletion, insertion, substitution, variant, mutation, fusion, chimera, or any combination thereof relative to a wild-type version of the Cas protein.
- a Cas protein can be a polypeptide with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to a wild-type exemplary Cas protein.
- a Cas protein can be a polypeptide with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild-type exemplary Cas protein.
- Variants or fragments can comprise at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to a wild-type or modified Cas protein or a portion thereof.
- Variants or fragments can be targeted to a nucleic acid locus in complex with a guide nucleic acid while lacking nucleic acid cleavage activity.
- a modified Cas protein has decreased function relative to the unmodified form.
- a modified Cas protein is deficient in a function of the unmodified form.
- a nuclease deficient Cas protein retains the ability to bind DNA but lacks or has reduced nucleic acid cleavage activity.
- a Cas nuclease (e.g., retaining wild-type nuclease activity, having reduced nuclease activity, and/or lacking nuclease activity)
- the Cas protein can bind to a target polynucleotide and prevent transcription by physical obstruction or edit a nucleic acid sequence to yield non-functional gene products.
- the modified Cas protein has no more than 90%, no more than 80%, no more than 70%, no more than 60%, no more than 50%, no more than 40%, no more than 30%, no more than 20%, no more than 10%, no more than 5%, or no more than 1% of the function (e.g., nuclease activity) of the wild-type Cas protein (e.g., Cas12a).
- the modified Cas protein has no substantial function of the wild-type Cas protein.
- when a Cas protein is a modified form that has no substantial nucleic acid-cleaving activity, it can be referred to as enzymatically inactive and/or “dead” (abbreviated by “d”).
- a dead Cas protein (e.g., dCas, dCas12a)
- a Cas12a protein provided herein is a dCas12a protein.
- a modified Cas protein can be a modified Cas “base editor” .
- Base editing enables direct, irreversible conversion of one target DNA base into another in a programmable manner, without requiring DNA cleavage or a donor DNA molecule.
- Komor et al. (2016, Nature 533: 420-424) teach a Cas9-cytidine deaminase fusion in which the Cas9 has also been engineered to be inactivated so that it does not induce double-stranded DNA breaks.
- a Cas12a protein provided herein is a modified Cas12a base editor.
- a Cas protein can be modified to optimize regulation of gene expression.
- a Cas protein can be modified to increase or decrease nucleic acid binding affinity, nucleic acid binding specificity, and/or enzymatic activity.
- Cas proteins can also be modified to change any other activity or property of the protein, such as stability. For example, one or more nuclease domains of the Cas protein can be modified, deleted, or inactivated, or a Cas protein can be truncated to remove domains that are not essential for the function of the protein or to optimize (e.g., enhance or reduce) the activity of the Cas protein for regulating gene expression.
- One or a plurality of the nuclease domains (e.g., RuvC, HNH) of a Cas protein can be deleted or mutated so that they are no longer functional or comprise reduced nuclease activity.
- when one of the nuclease domains of a Cas protein comprising at least two nuclease domains (e.g., Cas12a) is deleted or mutated, the resulting Cas protein can generate a single-strand break at a CRISPR RNA (crRNA) recognition sequence within a double-stranded DNA, but not a double-strand break.
- crRNA: CRISPR RNA
- such a nickase can cleave the complementary strand or the non-complementary strand, but not both.
- double-strand break targeting specificity can be improved by targeting a nickase to opposite strands at two nearby loci. If a nickase cleaves the single strand at both loci, a double-strand break is formed and can be repaired as described herein. If all of the nuclease domains of a Cas protein (e.g., RuvC nuclease domains in a Cas12a protein) are deleted or mutated, the resulting Cas protein can have a reduced or no ability to cleave both strands of a double-stranded DNA.
- a Cas12a protein provided herein is a Cas12a nickase protein.
- fusion proteins comprising any of the proteins described above and a heterologous domain.
- a “fusion protein” is a protein comprising two different polypeptide sequences, i.e., a Cas12a protein sequence as described above and a heterologous polypeptide sequence, that are joined or linked to form a single polypeptide.
- the two amino acid sequences are encoded by separate nucleic acid sequences that have been joined so that they are transcribed and translated to produce a single polypeptide.
- the Cas12a protein and the heterologous domain can be linked in any order and orientation relative to each other.
- the C-terminal end of the Cas12a protein may be linked to the N-terminal end or the C-terminal end of the heterologous domain.
- the Cas12a protein and the heterologous domain may also be separated by one or more additional fusion protein domains, as described below.
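For a genetically encoded fusion, the ordering described above amounts to concatenating domain sequences from N- to C-terminus with optional linkers between them. A minimal sketch with placeholder sequences (hypothetical stand-ins, not real Cas12a or heterologous-domain sequences); it does not model chemically conjugated, non-peptide linkers or C-terminus-to-C-terminus linkage, neither of which can be encoded as a single translated chain.

```python
def assemble_fusion(domains: list[str], linkers: list[str]) -> str:
    """Concatenate domains N- to C-terminus, with linkers[i] between domains[i] and domains[i + 1].

    An empty string in `linkers` represents direct fusion without a linker.
    """
    if not domains or len(linkers) != len(domains) - 1:
        raise ValueError("need at least one domain and exactly one linker per junction")
    parts = [domains[0]]
    for linker, domain in zip(linkers, domains[1:]):
        parts.extend((linker, domain))
    return "".join(parts)


# Hypothetical placeholders; PKKKRKV is the SV40 NLS quoted later in the text.
NLS, CAS12A, DEAMINASE = "PKKKRKV", "MCAS", "MDEAM"
print(assemble_fusion([CAS12A, DEAMINASE], ["SGGS"]))  # MCASSGGSMDEAM
print(assemble_fusion([DEAMINASE, CAS12A], ["SGGS"]))  # MDEAMSGGSMCAS
```

Swapping the order of `domains` is all it takes to switch which domain sits at the N-terminus, mirroring the statement that the two can be linked in either orientation.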
- heterologous domains include deaminase domains, transcription factor domains, nuclease domains, reverse-transcriptase domains, transposase domains, integrase domains, uracil DNA glycosylase inhibitor domains, recombinase domains, nickase domains, methyltransferase domains, methylase domains, acetylase domains, acetyltransferase domains, transcriptional activator domains, and transcriptional repressor domains. See, e.g., WO2021/061507, incorporated herein by reference in its entirety.
- the fusion proteins provided herein comprise one or more linkers.
- Linkers, also referred to as spacers, as used herein are flexible molecules or a flexible stretch of molecules that join or connect two portions (e.g., domains) of a fusion protein or a variant Cas12a protein as provided herein.
- the linker is a polypeptide. Proteins with domains joined by polypeptide linkers are referred to as fusion proteins. In some embodiments, the linker is a non-peptide linker. Proteins with domains joined by non-peptide linkers are referred to as modified proteins. It will be understood that, where fusion proteins are discussed throughout the present disclosure, modified proteins are generally also contemplated, where feasible.
- the linker may increase the range of orientations that may be adopted by the domains of the fusion protein or variant protein.
- the linker may be optimized to produce desired effects in the fusion protein or variant protein. Aspects of linker design and considerations are described, for example, in Chen, X. et al., Adv Drug Deliv Rev. 2013 Oct 15; 65 (10) : 1357-1369, and Klein, J.S. et al. 2014 Protein Eng. Des. Sel. 27 (10) : 325-330.
- the proteins provided herein comprise a peptide linker.
- the proteins provided herein comprise a non-peptide linker.
- the proteins provided herein comprise a peptide linker and a non-peptide linker.
- the proteins provided herein may also comprise a plurality of linkers, including at least one peptide linker, at least one non-peptide linker, or at least one peptide linker and at least one non-peptide linker.
- Linkers may be short or long, flexible or rigid. See, e.g., WO2021/061507, WO 2020/168102, and US 2021/0017506, each of which is incorporated herein by reference in its entirety.
- the length of a linker may affect one or more functions of the fusion protein. Selection of linkers to achieve the desired length is within the ability of one skilled in the art.
- a peptide linker may be, for example, 4 to 100 or more amino acids in length (e.g., 4 aa, 5 aa, 8 aa, 10 aa, 15 aa, 18 aa, 20 aa, 25 aa, 30 aa, 35 aa, 40 aa, 45 aa, 50 aa, 55 aa, 60 aa, 65 aa, 70 aa, 75 aa, 80 aa, 85 aa, 90 aa, 95 aa, or 100 aa).
- the linker is about 30 amino acids in length. In some embodiments, the linker is about 8 amino acids in length.
- a linker sequence may have various secondary-structure conformations, such as helical, β-strand, coil/bend, and turns.
- a linker sequence may have an extended conformation and function as an independent domain that does not interact with the adjacent protein domains.
- Linker sequences may be flexible or rigid. Flexible linkers provide a certain degree of movement or interaction between the polypeptide domains and are generally rich in small or polar amino acids such as Gly and Ser (e.g., at least 90%, at least 95%, at least 98%, at least 99%, or all of the amino acid residues of the linker are either Gly or Ser) .
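The composition criterion above (for example, at least 90% of residues being Gly or Ser) can be checked directly. A small sketch with hypothetical function names; the 0.9 default mirrors the lowest example threshold in the text, and judging flexibility from composition alone is a simplification.

```python
def gly_ser_fraction(linker: str) -> float:
    """Fraction of residues in a peptide linker that are glycine (G) or serine (S)."""
    if not linker:
        raise ValueError("linker sequence is empty")
    sequence = linker.upper()
    return sum(residue in ("G", "S") for residue in sequence) / len(sequence)


def is_gly_ser_rich(linker: str, min_fraction: float = 0.9) -> bool:
    """Composition-only proxy for a flexible Gly/Ser linker."""
    return gly_ser_fraction(linker) >= min_fraction


print(is_gly_ser_rich("GGGGSGGGGSGGGGS"))    # True
print(is_gly_ser_rich("AEAAAKEAAAKEAAAKA"))  # False
```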
- a rigid linker can be used to keep a fixed distance between the domains and to help maintain their independent functions. Linker attachment can be through an amide linkage (e.g., a peptide bond) or other functionalities as discussed further below.
- a peptide linker described herein comprises one or more repeats (e.g., 2 repeats, 3 repeats, 4 repeats, 5 repeats, 6 repeats, or more) of GSSSS (SEQ ID NO: 43) and/or one or more repeats of GGGGS (SEQ ID NO: 44) and/or one or more repeats of GSSGSS (SEQ ID NO: 45) and/or one or more repeats of SGGS (SEQ ID NO: 77).
- the linker comprises an amino acid sequence with at least 90% sequence identity to (GSSSS)6 (SEQ ID NO: 46) or (SGGS)2 (SEQ ID NO: 78).
- Additional exemplary peptide linkers include, but are not limited to, peptide linkers comprising SGSETPGTSESATPE (SEQ ID NO: 47), SGSETPGTSESATPES (SEQ ID NO: 48), (GGGGS)3 (SEQ ID NO: 49), (GGGGS)5 (SEQ ID NO: 50), (GGGGS)10 (SEQ ID NO: 51), GGGGGGGG (SEQ ID NO: 52), GSAGSAAGSGEF (SEQ ID NO: 53), A(EAAAK)3A (SEQ ID NO: 54), or A(EAAAK)10A (SEQ ID NO: 55).
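Repeat shorthand such as (GGGGS)3 or A(EAAAK)3A expands to a full linker sequence. The hypothetical parser below handles one level of parenthesized repeats (no nesting) and tolerates spacing such as "(GGGGS) 3". Expanding (GSSSS)6 and (SGGS)2 gives 30 and 8 residues, which coincide with the linker lengths of about 30 and about 8 amino acids mentioned above.

```python
import re

# One level of repeats written as "(UNIT)N"; whitespace before N is allowed.
_REPEAT = re.compile(r"\(([A-Z]+)\)\s*(\d+)")


def expand_linker(notation: str) -> str:
    """Expand repeat shorthand such as "(GGGGS)3" or "A(EAAAK)3A" into a full sequence."""
    expanded = _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation.upper())
    expanded = re.sub(r"\s+", "", expanded)
    if not expanded.isalpha():
        raise ValueError(f"could not fully expand {notation!r}")
    return expanded


print(expand_linker("(GGGGS)3"))         # GGGGSGGGGSGGGGS
print(expand_linker("A(EAAAK)3A"))       # AEAAAKEAAAKEAAAKA
print(len(expand_linker("(GSSSS)6")))    # 30
```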
- linkers that can be used include those disclosed in PCT/US2020/051383, Chen et al., Adv. Drug Deliv. Rev. 65(10): 1357-1369 (2013), and van Rosmalen et al., Biochemistry 2017, 56(50), 6565-6574, the entire contents of each of which are herein incorporated by reference.
- a non-peptide linker can comprise any of a number of known chemical linkers.
- exemplary chemical linkers can include one or more units of beta-alanine, 4-aminobutyric acid (GABA), (2-aminoethoxy)acetic acid (AEA), 6-aminohexanoic acid (Ahx), PEG multimers, and trioxatridecan-succinamic acid (Ttds).
- the non-peptide linker comprises one or more units of polyethylene glycol (PEG), which is commonly used as a linker for conjugation of polypeptide domains due to its water solubility, lack of toxicity, low immunogenicity, and well-defined chain lengths. See, e.g., Ramirez-Paz, J., et al., PLoS One 13(7): e0197643 (2018).
- the number of PEG linkage units may be selected based on the desired length of the linker.
- Modified proteins comprising a non-peptide linker can be produced in a variety of ways.
- a Cas12a protein and a heterologous domain may be produced separately (e.g., in vitro or by expression in and purification from host cells) and chemically linked in vitro.
- a Cas12a protein, a heterologous domain, and a linker can each be produced separately and chemically linked in vitro.
- Various chemical linkers may be used to cross-link two amino acid residues.
- a site-directed nuclease of the present disclosure may comprise an MS2 RNA aptamer, which would facilitate interaction with a nonspecific end-processing enzyme comprising an MS2 coat protein.
- the fusion protein provided herein comprises an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 1-4.
- the fusion protein provided herein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 5-19.
- any of the proteins and fusion proteins described herein can further comprise a targeting sequence which mediates the localization (or retention) of the protein to a sub-cellular location, e.g., plasma membrane or membrane of a given organelle, nucleus, cytosol, mitochondria, endoplasmic reticulum (ER), Golgi, chloroplast, apoplast, peroxisome or other organelle.
- a targeting sequence can direct a protein (e.g., a nuclease) to a nucleus utilizing a nuclear localization signal (NLS); outside of a nucleus of a cell, for example to the cytoplasm, utilizing a nuclear export signal (NES); mitochondria utilizing a mitochondrial targeting signal; the endoplasmic reticulum (ER) utilizing an ER-retention signal; a peroxisome utilizing a peroxisomal targeting signal; plasma membrane utilizing a membrane localization signal; or combinations thereof.
- the protein comprises a nuclear localization signal.
- Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 56); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 57)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 58) or RQRRNELKRSP (SEQ ID NO: 59); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 60); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 61) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 62) and PPKKARED (SEQ ID NO: 63) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 64) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO:
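Checking that a designed fusion actually carries a given NLS can be done by exact motif search. A minimal sketch with a hypothetical function name, using the SV40 large T-antigen NLS PKKKRKV (SEQ ID NO: 56) as the default motif; exact matching will miss bipartite or degenerate signals, and the presence of a motif does not by itself guarantee nuclear import.

```python
def find_motif(protein: str, motif: str = "PKKKRKV") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) exact match of `motif`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    protein, motif = protein.upper(), motif.upper()
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = protein.find(motif, start + 1)  # step by one to allow overlaps
    return hits


construct = "MPKKKRKVGGSPKKKRKV"  # hypothetical construct carrying two SV40 NLS copies
print(find_motif(construct))  # [2, 12]
```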
- any of the proteins and fusion proteins described herein can further comprise a detectable moiety, for example, a fluorescent protein or fragment thereof.
- fluorescent proteins include, but are not limited to, yellow fluorescent protein (YFP, for example, Venus), green fluorescent protein (GFP), and red fluorescent protein (RFP), as well as derivatives (for example, mutant derivatives) of these proteins. See, for example, Chudakov et al., “Fluorescent Proteins and Their Applications in Imaging Living Cells and Tissues,” Physiological Reviews 90(3): 1103-1163 (2010); and Specht et al., “A Critical and Comparative Review of Fluorescent Tools for Live-Cell Imaging,” Annual Review of Physiology 79: 93-117 (2017).
- any of the proteins and fusion proteins described herein can further comprise an affinity tag, for example, a polyhistidine tag (e.g., (His)6 (SEQ ID NO: 73)), an HA tag (e.g., YPYDVPDYA (SEQ ID NO: 74)), albumin-binding protein, alkaline phosphatase, an AU1 epitope, an AU5 epitope, a biotin-carboxy carrier protein (BCCP), a FLAG epitope (e.g., DYKDDDDK (SEQ ID NO: 75)), or a MYC epitope (e.g., EQKLISEEDL (SEQ ID NO: 76)), to name a few.
- variants of the polypeptides (e.g., proteins and fusion proteins)
- Polypeptide variants retain their respective biological activity, unless explicitly noted otherwise.
- variants of a Cas12a polypeptide retain the biological function of the full-length, native-sequence site-directed Cas12a protein.
- variants of the heterologous domain retain the biological function of the full-length, native-sequence heterologous domain.
- Modifications to any of the polypeptides or proteins provided herein are made by known methods.
- modifications are made by site specific mutagenesis of nucleotides in a nucleic acid encoding the polypeptide, thereby producing a DNA encoding the modification, and thereafter expressing the DNA in recombinant cell culture to produce the encoded polypeptide.
- Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known. For example, M13 primer mutagenesis and PCR-based mutagenesis methods can be used to make one or more substitution mutations.
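At the DNA level, a cysteine-to-serine change such as those described above can be made with a single base change, because the cysteine codons TGT/TGC differ from the serine codons TCT/TCC only at the second position. The sketch below (hypothetical helper names, toy sequence) edits one codon and translates the result with the standard genetic code to confirm the intended amino acid change; it is an in-silico check only and does not model primer design for M13- or PCR-based mutagenesis.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered by first, second, third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; stop codons appear as '*'."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[n:n + 3]] for n in range(0, len(cds), 3))


def mutate_codon(cds: str, residue_position: int, new_codon: str) -> str:
    """Replace the codon for 1-based residue `residue_position` with `new_codon`."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases")
    start = 3 * (residue_position - 1)
    if start < 0 or start + 3 > len(cds):
        raise ValueError("residue position lies outside the coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


toy_cds = "ATGGCTTGTGAT"                   # hypothetical: encodes M-A-C-D
mutant = mutate_codon(toy_cds, 3, "TCT")   # TGT (Cys) -> TCT (Ser)
print(translate(toy_cds), translate(mutant))  # MACD MASD
```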
- Any of the nucleic acid sequences provided herein can be codon-optimized to alter, for example, maximize expression, in a host cell or organism.
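The simplest form of codon optimization replaces each amino acid with one preferred codon for the host. In the sketch below, the function name and preference table are hypothetical, and the table covers only the residues in the toy example; practical tools draw on measured codon usage for the target organism and typically also weigh GC content, repeats, restriction sites, and mRNA structure.

```python
# Hypothetical "preferred codon" table for an imagined host organism.
PREFERRED_CODON = {
    "M": "ATG",
    "A": "GCC",
    "C": "TGC",
    "S": "AGC",
    "K": "AAG",
    "D": "GAC",
}


def back_translate(protein: str, preferred: dict[str, str]) -> str:
    """Encode each residue with the single preferred codon for that amino acid."""
    protein = protein.upper()
    missing = sorted(set(protein) - set(preferred))
    if missing:
        raise KeyError(f"no preferred codon for: {''.join(missing)}")
    return "".join(preferred[residue] for residue in protein)


print(back_translate("MASK", PREFERRED_CODON))  # ATGGCCAGCAAG
```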
- the amino acids in the polypeptides described herein can be any of the 20 naturally occurring amino acids, D-stereoisomers of the naturally occurring amino acids, unnatural amino acids and chemically modified amino acids.
- unnatural amino acids (that is, those that are not naturally found in proteins)
- Zhang et al., “Protein engineering with unnatural amino acids,” Curr. Opin. Struct. Biol. 23(4): 581-587 (2013); Xie et al., “Adding amino acids to the genetic repertoire,” 9(6): 548-54 (2005); and all references cited therein.
- β- and γ-amino acids are known in the art and are also contemplated herein as unnatural amino acids.
- a chemically modified amino acid refers to an amino acid whose side chain has been chemically modified.
- a side chain can be modified to comprise a signaling moiety, such as a fluorophore or a radiolabel.
- a side chain can also be modified to comprise a new functional group, such as a thiol, carboxylic acid, or amino group.
- Post-translationally modified amino acids are also included in the definition of chemically modified amino acids.
- conservative amino acid substitutions can be made in one or more of the amino acid residues, for example, in one or more lysine residues of any of the polypeptides provided herein.
- One of skill in the art would know that a conservative substitution is the replacement of one amino acid residue with another that is biologically and/or chemically similar.
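Whether a substitution is conservative reduces to whether both residues fall in a shared similarity group. The hypothetical function below takes the groups as an argument; the three example groups are illustrative only and are not the eight groups referred to in the text.

```python
def is_conservative(original: str, replacement: str, groups: list[set[str]]) -> bool:
    """True if the two residues are identical or share at least one similarity group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return True
    return any(original in group and replacement in group for group in groups)


# Illustrative groups only (hypothetical), not the document's list.
EXAMPLE_GROUPS = [{"D", "E"}, {"K", "R"}, {"S", "T"}]

print(is_conservative("D", "E", EXAMPLE_GROUPS))  # True
print(is_conservative("D", "R", EXAMPLE_GROUPS))  # False
```

Under these example groups, an aspartic acid to arginine change of the kind described earlier is not classed as conservative, since it swaps a negatively charged side chain for a positively charged one.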
- the following eight groups each contain amino acids that are conservative substitutions for one another:
- recombinant nucleic acids encoding any of the variant Cas12a proteins or fusion proteins described herein.
- a recombinant nucleic acid encoding a polypeptide that has at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any of SEQ ID NOs: 20-34 is also provided.
- recombinant nucleic acids having at least 70% identity to any of SEQ ID NOs: 20-34 are also provided.
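- Percent identity thresholds such as "at least 70%" are usually computed over a pairwise alignment. A minimal sketch, assuming the two sequences are already aligned (equal length, `-` for gaps) and that identity is defined as identical positions divided by aligned columns. The disclosure does not specify the alignment algorithm or the denominator, and other conventions (for example, dividing by the length of the reference sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; every other column
    counts toward the denominator, and only identical residues count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b and a != gap:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the aligned pair meets or exceeds the identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("MKTLLVAGDE", "MKSLLVAGDQ"))  # 80.0
```

Because the result depends on how gaps and the denominator are handled, any stated identity value should be read together with the alignment method used to obtain it.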
- a DNA construct comprising a promoter operably linked to a recombinant nucleic acid encoding a fusion protein or domains thereof as described herein.
- a nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence.
- Numerous promoters can be used in the constructs described herein.
- a promoter is a region or a sequence located upstream and/or downstream from the start of transcription that is involved in recognition and binding of RNA polymerase and other proteins to initiate transcription.
- promoter refers to a nucleotide sequence, usually upstream (5′) to its coding sequence, which controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription.
- Promoter regulatory sequences consist of proximal and more distal upstream elements. Promoter regulatory sequences influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, untranslated leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences.
- An enhancer is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped) and of functioning even when moved either upstream or downstream from the promoter.
- The term “promoter” includes “promoter regulatory sequences.”
- The choice of promoters to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferential expression. It is a routine matter for one of skill in the art to modulate the expression of a sequence by appropriately selecting and positioning promoters and other regulatory regions relative to that sequence.
- Promoters that direct RNA synthesis preferentially in particular tissues are called tissue-specific promoters or tissue-preferred promoters; RNA synthesis may occur in other tissues at reduced levels.
- Certain promoters are able to direct RNA synthesis at relatively similar levels across all tissues of a plant. These are called “constitutive promoters” or “tissue-independent” promoters. Constitutive promoters can be divided into strong, moderate, and weak categories according to their effectiveness in directing RNA synthesis. Because it is often necessary to express a chimeric gene (or genes) simultaneously in different tissues of a plant to obtain the desired functions of the gene (or genes), constitutive promoters are especially useful in this regard.
- nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl. Acad. Sci. USA 84: 5745-5749 (1987)); the octopine synthase (OCS) promoter; caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9: 315-324 (1987)); the light-inducible promoter from the small subunit of rubisco (Pellegrineschi et al., Biochem. Soc. Trans.
- promoters combining elements from more than one promoter may be useful.
- U.S. Pat. No. 5,491,288 discloses combining a Cauliflower Mosaic Virus promoter with a histone promoter.
- the elements from the promoters disclosed herein may be combined with elements from other promoters.
- Promoters that are useful for plant transgene expression include those that are inducible, viral, synthetic, constitutive (Odell et al., Nature 313: 810–812 (1985)), temporally regulated, spatially regulated, tissue-specific, and spatiotemporally regulated.
- numerous agronomic genes can be expressed in transformed plants. More particularly, plants can be genetically engineered to express various phenotypes of agronomic interest.
- the promoter can be a eukaryotic or a prokaryotic promoter.
- the promoter is an inducible promoter, a native inducible promoter (e.g., drought-inducible Rab17), a synthetic inducible promoter (e.g., auxin-inducible DR5, estradiol-inducible XVE/pLex, dexamethasone-inducible GVG/Gal4), a constitutive promoter (e.g., ZmUbq1, OsAct1, OsTub3, EF, EF1α), an egg cell-specific promoter (e.g., EC1, EC2, EC3, EC4, EC5), a pollen-specific promoter, an apical meristem tissue-specific promoter, or a promoter with enriched expression in the zygote.
- the promoter is a floral mosaic promoter (e.g., ZmBde1, OsAP1).
- the promoter is a ubiquitin 4 promoter (e.g., a sugarcane ubiquitin 4 promoter), an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter.
- Suitable promoters are disclosed, e.g., in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference, and PCT/US2022/020690, incorporated herein by reference.
- the recombinant nucleic acids provided herein can be included in expression cassettes for expression in a host cell or an organism of interest.
- the cassette will include 5′ and 3′ regulatory sequences operably linked to a recombinant nucleic acid provided herein that allow for expression of a fusion protein.
- the cassette may additionally contain at least one additional gene or genetic element to be cotransformed into the cell or organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene (s) or element (s) can be provided on multiple expression cassettes.
- Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory regions.
- the expression cassette may additionally contain a selectable marker gene.
- the expression cassette will include in the 5′ to 3′ direction of transcription: a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the cell or organism of interest.
- the promoters of the invention are capable of directing or driving expression of a coding sequence (i.e., a nucleic acid sequence that is transcribed into RNA such as mRNA, rRNA, tRNA, snRNA, ncRNA, lncRNA, sense RNA, or antisense RNA, regardless of whether the RNA is then translated to produce a protein) in a host cell.
- the regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions)
- heterologous in reference to a sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition
- Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Davis et al., eds. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory Press), Cold Spring Harbor, N.Y., and the references cited therein.
- the expression cassette can also comprise a selectable marker gene for the selection of transformed cells.
- Marker genes include genes conferring antibiotic resistance, such as those conferring hygromycin resistance, ampicillin resistance, gentamicin resistance, neomycin resistance, to name a few. Additional selectable markers are known and any can be used.
- the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
- adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
- in vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions (e.g., transitions and transversions) may be involved.
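- The 5′-to-3′ cassette layout (promoter, coding polynucleotide, termination region), together with the requirement that joined DNA fragments keep the coding sequence in the proper reading frame, can be sketched as a simple in-silico assembly check. This is a hypothetical illustration: the `ExpressionCassette` class and the short placeholder sequences are invented for the example, and real construct design would also track restriction or recombination sites, the orientation of each part, and host-specific signals.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_orf(cds: str) -> bool:
    """True if cds starts with ATG, ends with a stop codon, has a length that is
    a multiple of 3, and contains no premature in-frame stop codon."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


@dataclass
class ExpressionCassette:
    """Hypothetical model of a cassette read 5' -> 3' on the coding strand."""
    promoter: str
    coding_sequence: str
    terminator: str

    def assemble(self) -> str:
        """Concatenate parts in transcriptional order after checking the reading frame."""
        if not in_frame_orf(self.coding_sequence):
            raise ValueError("coding sequence is not an intact, in-frame ORF")
        return self.promoter + self.coding_sequence + self.terminator


cassette = ExpressionCassette("GGGG", "ATGAAATGA", "CCCC")
print(cassette.assemble())  # GGGGATGAAATGACCCC
```

Rejecting a coding sequence with a premature stop or a length that is not a multiple of three mirrors, in a very reduced form, the manual check that fragments have been joined in frame.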
- a vector comprising a recombinant nucleic acid or DNA construct set forth herein.
- the vector is contemplated to have the necessary functional elements that direct and regulate transcription of the inserted nucleic acid.
- These functional elements include, but are not limited to, a promoter, regions upstream or downstream of the promoter, such as enhancers and terminators, that may regulate the transcriptional activity of the promoter, an origin of replication, appropriate restriction sites to facilitate cloning of inserts adjacent to the promoter, antibiotic resistance genes or other markers which can serve to select for cells containing the vector or the vector containing the insert, RNA splice junctions, a transcription termination region, or any other region which may serve to facilitate the expression of the inserted gene or hybrid gene.
- the constructs and vectors comprise a nopaline synthase gene terminator sequence (e.g., an Agrobacterium tumefaciens nopaline synthase gene terminator sequence).
- There are numerous E. coli expression vectors known to one of ordinary skill in the art that are useful for the expression of a nucleic acid.
- Other microbial hosts suitable for use include bacilli, such as Bacillus subtilis, and other Enterobacteriaceae, such as Salmonella, Serratia, and various Pseudomonas species.
- In these prokaryotic hosts, one can also make expression vectors, which will typically contain expression control sequences compatible with the host cell (e.g., an origin of replication).
- any number of a variety of well-known promoters will be present, such as the lactose promoter system, a tryptophan (Trp) promoter system, a beta-lactamase promoter system, or a promoter system from phage lambda.
- yeast expression can be used.
- a nucleic acid encoding a polypeptide of the present invention wherein the nucleic acid can be expressed by a yeast cell. More specifically, the nucleic acid can be expressed by Pichia pastoris or S. cerevisiae.
- Mammalian cells also permit the expression of proteins in an environment that favors important post-translational modifications such as folding and cysteine pairing, addition of complex carbohydrate structures, and secretion of active protein.
- Vectors useful for the expression of active proteins in mammalian cells are known in the art and can contain genes conferring hygromycin resistance, geneticin or G418 resistance, or other genes or phenotypes suitable for use as selectable markers, or methotrexate resistance for gene amplification.
- a number of suitable host cell lines capable of secreting intact human proteins have been developed in the art, and include CHO cells, HeLa cells, HEK-293 cells, HEK-293T cells, U2OS cells, or any other primary or transformed cell line.
- suitable host cell lines include COS-7 cells, myeloma cell lines, Jurkat cells, etc.
- Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter, an enhancer, and necessary information processing sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites, and transcriptional terminator sequences.
- Preferred expression control sequences are promoters derived from immunoglobulin genes, SV40, Adenovirus, Bovine Papilloma Virus, etc.
- the expression vectors described herein can also include the nucleic acids as described herein under the control of an inducible promoter such as the tetracycline inducible promoter or a glucocorticoid inducible promoter.
- the nucleic acids of the present invention can also be under the control of a tissue-specific promoter to promote expression of the nucleic acid in specific cells, tissues or organs.
- Any regulatable promoter such as a metallothionein promoter, a heat-shock promoter, and other regulatable promoters, of which many examples are well known in the art are also contemplated.
- a Cre-loxP inducible system can also be used, as well as a Flp recombinase inducible promoter system, both of which are known in the art.
- Insect cells also permit the expression of the polypeptides.
- Recombinant proteins produced in insect cells with baculovirus vectors undergo post-translational modifications similar to those of wild-type mammalian proteins.
- the cell is a plant cell.
- the plant cell is a maize plant cell, a wheat plant cell, a rice plant cell, a soybean plant cell, a sunflower plant cell, or a tomato plant cell.
- a host cell comprising a nucleic acid or a vector described herein is provided.
- the host cell can be an in vitro, ex vivo, or in vivo host cell.
- Host cells as provided herein are capable of expressing the fusion protein.
- Cell populations of any of the host cells described herein are also provided.
- the cell population comprises a plurality of cells, wherein the plurality of cells comprise a recombinant nucleic acid encoding the fusion protein as described herein.
- the cell population comprises a plurality of cells, wherein the plurality of cells comprises a DNA construct encoding the protein and/or fusion protein as described herein.
- the cell population comprises a plurality of cells, wherein the plurality of cells comprises a vector comprising a recombinant nucleic acid or a DNA construct encoding the protein and/or fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprise a plurality of any of the host cells described herein. In some embodiments, a plurality of cells of any of the cell populations described herein express a protein and/or fusion protein as described herein.
- the provided cells express the protein and/or fusion protein stably or transiently.
- Stable expression of the protein and/or fusion protein in a cell refers to integration of any of the nucleic acids, DNA constructs, or vectors described herein into the genome of the cell, thereby allowing the cell to express the protein and/or fusion protein.
- Transient expression refers to expression of the protein and/or fusion protein directly from any of the nucleic acids, DNA constructs, and/or vectors following introduction into the cell (i.e., the gene encoding the protein and/or fusion protein is not integrated into the genome of the cell).
- the provided cells express the protein and/or fusion protein constitutively or inducibly.
- Constitutive expression refers to ongoing, continuous expression of a gene (i.e., of a protein)
- inducible expression refers to gene (protein) expression that is responsive to a stimulus.
- Inducible expression is generally regulated via an inducible promoter, a description of which is included above.
- a cell culture comprising one or more host cells described herein is also provided.
- Methods for the culture and production of many cells, including cells of bacterial (for example, E. coli and other bacterial strains), animal (especially mammalian), and archaebacterial origin, are available in the art. See, e.g., Sambrook, supra; Ausubel, ed.
- the host cell can be a prokaryotic cell, including, for example, a bacterial cell.
- the cell can be a eukaryotic cell, for example, a mammalian cell.
- the cell can be a HEK-293T cell, a HEK-293 cell, a Chinese hamster ovary (CHO) cell, a U2OS cell, or any other primary or transformed cell.
- the cell can be a COS-7 cell, a HeLa cell, an avian cell, a myeloma cell, a Pichia cell, an insect cell, or a plant cell.
- a number of other suitable host cell lines have been developed and include myeloma cell lines, fibroblast cell lines, and a variety of tumor cell lines such as melanoma cell lines.
- the vectors containing the nucleic acid segments of interest can be transferred or introduced into the host cell by well-known methods, which vary depending on the type of cellular host.
- introducing in the context of introducing a nucleic acid into a cell (e.g., a prokaryotic cell, a bacterial cell, a eukaryotic cell, a plant cell) refers to the translocation of the nucleic acid sequence from outside a cell to inside the cell. In some cases, introducing refers to translocation of the nucleic acid from outside the cell to inside the nucleus of the cell.
- these nucleic acid molecules can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotide or nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Accordingly, such polynucleotides can be introduced into cells (e.g., plant cells) in a single transformation event, in separate transformation events, or, e.g., as part of a breeding protocol.
- Methods for introducing a nucleic acid into a cell include, but are not limited to, electroporation, nanoparticle delivery, biolistic transformation, viral delivery, contact with nanowires or nanotubes, receptor-mediated internalization, translocation via cell-penetrating peptides, liposome-mediated translocation, DEAE-dextran, lipofectamine, calcium phosphate, or any method now known or identified in the future for introduction of nucleic acids into prokaryotic or eukaryotic cellular hosts.
- a targeted nuclease system (e.g., an RNA-guided nuclease, a transcription activator-like effector nuclease (TALEN), a zinc finger nuclease (ZFN), or a megaTAL (MT)) can also be used to introduce a nucleic acid, for example, a nucleic acid encoding a protein and/or fusion protein described herein, into a host cell. See Li et al., Signal Transduction and Targeted Therapy 5, Article No. 1 (2020).
- Transformation of a cell may be stable or transient.
- a transgenic cell, plant cell, plant and/or plant part of the invention can be stably transformed or transiently transformed. Transformation can refer to the transfer of a nucleic acid molecule into the genome of a host cell, resulting in genetically stable inheritance.
- the introduction into a plant, plant part and/or plant cell is via bacterial-mediated transformation, particle bombardment transformation, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, liposome-mediated transformation, nanoparticle-mediated transformation, polymer-mediated transformation, virus-mediated nucleic acid delivery, whisker-mediated nucleic acid delivery, microinjection, sonication, infiltration, polyethylene glycol-mediated transformation, protoplast transformation, or any other electrical, chemical, physical and/or biological mechanism that results in the introduction of nucleic acid into the plant, plant part and/or cell thereof, or any combination thereof.
- Procedures for transforming plants are well known and routine in the art and are described throughout the literature.
- Non-limiting examples of methods for transformation of plants include transformation via bacterial-mediated nucleic acid delivery (e.g., via bacteria from the genus Agrobacterium), viral-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, microparticle bombardment, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, as well as any other electrical, chemical, physical (mechanical) and/or biological mechanism that results in the introduction of nucleic acid into the plant cell, including any combination thereof.
- Agrobacterium-mediated transformation is a commonly used method for transforming plants because of its high efficiency of transformation and because of its broad utility with many different species.
- Agrobacterium-mediated transformation typically involves transfer of the binary vector carrying the foreign DNA of interest to an appropriate Agrobacterium strain that may depend on the complement of vir genes carried by the host Agrobacterium strain either on a co-resident Ti plasmid or chromosomally (Uknes et al. 1993, Plant Cell 5: 159-169).
- the transfer of the recombinant binary vector to Agrobacterium can be accomplished by a tri-parental mating procedure using Escherichia coli carrying the recombinant binary vector, a helper E.
- the recombinant binary vector can be transferred to Agrobacterium by nucleic acid transformation ( and Willmitzer 1988, Nucleic Acids Res 16: 9877) .
- Transformation of a plant by recombinant Agrobacterium usually involves co-cultivation of the Agrobacterium with explants from the plant and follows methods well known in the art. Transformed tissue is typically regenerated on selection medium carrying an antibiotic or herbicide resistance marker between the binary plasmid T-DNA borders.
- Another method for transforming plants, plant parts and plant cells involves propelling inert or biologically active particles at plant tissues and cells. See, e.g., US Patent Nos. 4,945,050; 5,036,006 and 5,100,792. Generally, this method involves propelling inert or biologically active particles at the plant cells under conditions effective to penetrate the outer surface of the cell and afford incorporation within the interior thereof.
- the vector can be introduced into the cell by coating the particles with the vector containing the nucleic acid of interest.
- a cell or cells can be surrounded by the vector so that the vector is carried into the cell by the wake of the particle.
- Biolistic transformation refers to a method of introducing RNA or DNA into cells (e.g., plant cells) directly, in which RNA or DNA is mixed with heavy metal particles (e.g., tungsten or gold) and released into the cell (e.g., the plant cell) using high-speed pressure to allow the RNA or DNA to penetrate the cell (e.g., to penetrate the plant cell wall).
- the CRISPR/Cas system can also be used to edit the genome of a host cell or organism.
- the “CRISPR/Cas” system refers to a widespread class of bacterial systems for defense against foreign nucleic acid. Any of the CRISPR/Cas system components described herein may be used to introduce proteins, fusion proteins, recombinant nucleic acids, or systems into the genome of a host cell or organism. Methods for CRISPR/Cas system mediated genome editing are known in the art. It will be understood that use of a CRISPR/Cas system for introduction of proteins, fusion proteins, recombinant nucleic acids, or systems described herein into the genome of a host cell or organism is different from the particular methods and systems provided herein.
- any of the proteins and/or fusion proteins described herein can be purified or isolated from a host cell or population of host cells.
- a recombinant nucleic acid encoding any of the proteins and/or fusion proteins described herein can be introduced into a host cell under conditions that allow expression of the protein and/or fusion protein.
- the recombinant nucleic acid is codon-optimized for expression.
- the protein and/or fusion protein can be isolated or purified using purification methods known in the art.
- systems useful for editing one or more nucleic acids comprise one or more of the Cas12a proteins and/or fusion proteins (or recombinant nucleic acids, constructs, vectors, or host cells) described above.
- the systems further comprise one or more additional elements that are useful for editing one or more nucleic acids.
- a system comprising a fusion protein comprising a Cas nuclease may further comprise one or more guide nucleic acids, which are detailed below.
- the systems provided herein are useful for performing the methods described in Section VI of this disclosure.
- the systems and methods described herein comprise at least one guide nucleic acid polynucleotide. In some cases, the systems and methods described herein comprise a plurality of guide nucleic acids.
- the polynucleotide can be deoxyribonucleic acid (DNA). In some cases, the DNA sequence can be single-stranded or double-stranded.
- the at least one guide nucleic acid polynucleotide can be ribonucleic acid (guide RNA) .
- the Cas12a protein can be complexed with the at least one guide RNA polynucleotide.
- the at least one guide RNA polynucleotide can comprise a nucleic-acid targeting region that comprises a complementary sequence to a nucleic acid sequence on the targeted polynucleotide such as the targeted genomic loci or genes to confer sequence specificity of nuclease targeting.
- the at least one guide RNA polynucleotide can comprise two separate nucleic acid molecules, which can be referred to as a double guide nucleic acid, or a single nucleic acid molecule, which can be referred to as a single guide nucleic acid (e.g., single guide RNA or sgRNA).
- the Cas protein-binding segment of a guide nucleic acid can comprise two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another.
- the two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another can be covalently linked by intervening nucleotides (e.g., a linker in the case of a single guide nucleic acid) .
- the two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another can hybridize to form a double stranded RNA duplex or hairpin of the Cas protein-binding segment, thus resulting in a stem-loop structure.
- the crRNA and the tracrRNA can be covalently linked via the 3′ end of the crRNA and the 5′ end of the tracrRNA.
- tracrRNA and crRNA can be covalently linked via the 5′ end of the tracrRNA and the 3′ end of the crRNA.
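- The single-guide arrangement described above (the 3′ end of the crRNA joined through intervening linker nucleotides to the 5′ end of the tracrRNA, with two complementary stretches pairing into a stem) can be sketched with plain string operations. All sequences, including the `GAAA` linker used in the example, are hypothetical placeholders, and the duplex check only tests perfect Watson-Crick complementarity (no G·U wobble pairs or bulges); it is not a secondary-structure predictor.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairing only)."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def can_form_duplex(stretch_a: str, stretch_b: str) -> bool:
    """True if the two stretches are perfectly complementary when antiparallel."""
    return stretch_a.upper() == reverse_complement_rna(stretch_b)


def fuse_single_guide(crrna: str, linker: str, tracrrna: str) -> str:
    """Join the 3' end of the crRNA to the 5' end of the tracrRNA via a linker."""
    return crrna.upper() + linker.upper() + tracrrna.upper()


# Hypothetical example: the crRNA ends in GCAUC, and the tracrRNA begins with
# GAUGC, its antiparallel complement, so the two stretches can close a stem.
guide = fuse_single_guide("aaagcauc", "gaaa", "gaugcuuu")
print(guide, can_form_duplex("GCAUC", "GAUGC"))  # AAAGCAUCGAAAGAUGCUUU True
```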
- a crRNA can comprise the nucleic acid-targeting segment (e.g., spacer region) of the guide nucleic acid and a stretch of nucleotides that can form one half of a double-stranded duplex of the Cas protein-binding segment of the guide nucleic acid.
- the crRNA can also provide a single-stranded nucleic acid targeting segment (e.g., a spacer region) that hybridizes to a target nucleic acid recognition sequence (e.g., protospacer) .
- the nucleic acid-targeting region of a guide nucleic acid can be between 18 and 72 nucleotides in length.
- the nucleic acid-targeting region of a guide nucleic acid (e.g., spacer region) can have a length of from about 12 nucleotides to about 100 nucleotides.
- the nucleic acid-targeting region of a guide nucleic acid can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 12 nt to about 18 nt, from about 12 nt to about 17 nt, from about 12 nt to about 16 nt, or from about 12 nt to about 15 nt.
- the DNA-targeting segment can have a length of from about 18 nt to about 20 nt, from about 18 nt to about 25 nt, from about 18 nt to about 30 nt, from about 18 nt to about 35 nt, from about 18 nt to about 40 nt, from about 18 nt to about 45 nt, from about 18 nt to about 50 nt, from about 18 nt to about 60 nt, from about 18 nt to about 70 nt, from about 18 nt to about 80 nt, from about 18 nt to about 90 nt, from about 18 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 ntt,
- the length of the nucleic acid-targeting region can be at least 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30 or more nucleotides.
- the length of the nucleic acid-targeting region (e.g., spacer sequence) can be at most 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 30 nucleotides.
- the nucleic acid-targeting region of a guide nucleic acid (e.g., spacer) is 20 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 19 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 18 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 17 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 16 nucleotides in length.
- the nucleic acid-targeting region of a guide nucleic acid is 21 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 22 nucleotides in length.
- the nucleotide sequence of the guide nucleic acid that is complementary to a nucleotide sequence (target sequence) of the target nucleic acid can have a length of, for example, at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt or at least about 40 nt.
- the nucleotide sequence of the guide nucleic acid that is complementary to a nucleotide sequence (target sequence) of the target nucleic acid can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50
- a protospacer sequence of a targeted polynucleotide can be identified by identifying a protospacer-adjacent motif (PAM) within a region of interest and selecting a region of a desired size upstream or downstream of the PAM as the protospacer.
- a corresponding spacer sequence can be designed by determining the complementary sequence of the protospacer region.
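- Locating protospacers next to a PAM and deriving the complementary spacer can be sketched as follows. This assumes a Cas12a-style 5′-TTTV PAM (V = A, C, or G) located immediately 5′ of a 20-nt protospacer on the PAM-containing strand, and it scans only that one strand. PAM requirements differ between Cas12a orthologs and engineered variants, so the pattern is an assumption of the sketch rather than something stated here. In this sketch the protospacer is read on the PAM-containing strand, the target-strand site is its reverse complement, and the spacer is designed as the complement of that site, which comes out identical to the protospacer with U in place of T.

```python
import re

# Assumed Cas12a-style PAM: 5'-TTTV-3' (V = A, C or G) on the PAM-containing strand.
# The lookahead lets overlapping PAMs (e.g., TTTTA) all be reported.
PAM_REGEX = re.compile(r"(?=TTT[ACG])")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def find_protospacers(dna: str, pam_len: int = 4,
                      length: int = 20) -> list[tuple[int, str]]:
    """Return (PAM start, protospacer) for each PAM whose 3' flank is long enough."""
    dna = dna.upper()
    hits = []
    for match in PAM_REGEX.finditer(dna):
        start = match.start() + pam_len  # protospacer begins right after the PAM
        protospacer = dna[start:start + length]
        if len(protospacer) == length:
            hits.append((match.start(), protospacer))
    return hits


def spacer_for(target_strand_site: str) -> str:
    """Spacer RNA (5'->3') complementary to the given target-strand site (5'->3')."""
    return reverse_complement(target_strand_site.upper()).replace("T", "U")


seq = "CCTTTA" + "GACTGCATCGATCGGATCCA" + "GG"
for pam_start, proto in find_protospacers(seq):
    target_site = reverse_complement(proto)  # strand the guide base-pairs with
    print(pam_start, proto, spacer_for(target_site))
```

A production tool would also scan the reverse strand and score each candidate; the point here is only the PAM-anchored selection and the complement step.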
- a spacer sequence can be identified using a computer program (e.g., machine readable code) .
- the computer program can use variables such as predicted melting temperature, secondary structure formation, predicted annealing temperature, sequence identity, genomic context, chromatin accessibility, %GC, frequency of genomic occurrence, methylation status, presence of SNPs, and the like.
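- A few of the variables listed above can be computed directly from sequence. A minimal sketch, assuming the genome is available as a plain string, that computes only %GC, the number of (possibly overlapping) exact occurrences of a candidate on both strands, and the longest single-base run. Melting temperature, secondary structure, chromatin accessibility, and methylation would need dedicated tools or data not modeled here. The 40-60% GC window and `max_run=4` in the hypothetical `passes_basic_filters` helper are illustrative assumptions, not values from this disclosure.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def count_occurrences(genome: str, site: str) -> int:
    """Count overlapping exact matches of site on both strands of genome."""
    genome, site = genome.upper(), site.upper()
    rc_site = site.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    queries = {site, rc_site}  # a palindromic site is counted once per position
    return sum(
        genome.startswith(query, i)
        for query in queries
        for i in range(len(genome) - len(site) + 1)
    )


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = current = 0
    previous = ""
    for base in seq.upper():
        current = current + 1 if base == previous else 1
        previous = base
        longest = max(longest, current)
    return longest


def passes_basic_filters(spacer: str, genome: str,
                         gc_range: tuple[float, float] = (40.0, 60.0),
                         max_run: int = 4) -> bool:
    """Illustrative filter: GC in range, no long homopolymer, unique in the genome."""
    low, high = gc_range
    return (low <= gc_percent(spacer) <= high
            and longest_homopolymer(spacer) <= max_run
            and count_occurrences(genome, spacer) == 1)
```

Requiring exactly one exact occurrence is a crude stand-in for "frequency of genomic occurrence"; real off-target screening also considers near matches with mismatches.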
- the percent complementarity between the nucleic acid-targeting sequence (e.g., a spacer sequence of the at least one guide polynucleotide as disclosed herein) and the target nucleic acid (e.g., a protospacer sequence of the one or more target loci as disclosed herein) can be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%.
- the percent complementarity between the nucleic acid-targeting sequence and the target nucleic acid can be at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% over about 20 contiguous nucleotides.
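Percent complementarity, as used in the two preceding paragraphs, can be computed as in the following sketch. It assumes both sequences are the same length and written 5′→3′, with `target` being the strand the guide base-pairs with, so position i of the guide faces position −1−i of the target. Only Watson-Crick pairs are counted (G·U wobble is excluded by choice). The phrase "over about 20 contiguous nucleotides" is modeled, as an assumption, as the best-scoring 20-nt window.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
                ("A", "U"), ("U", "A")}


def _paired(guide: str, target: str) -> list:
    """Pairing flags for an antiparallel duplex; both inputs written 5'->3'."""
    guide, target = guide.upper(), target.upper()
    if len(guide) != len(target):
        raise ValueError("guide and target must be the same length")
    # guide[i] faces target[-1 - i] because the strands are antiparallel
    return [(g, t) in WATSON_CRICK for g, t in zip(guide, reversed(target))]


def percent_complementarity(guide: str, target: str) -> float:
    flags = _paired(guide, target)
    return 100.0 * sum(flags) / len(flags)


def best_window_complementarity(guide: str, target: str,
                                window: int = 20) -> float:
    """Highest percent complementarity over any `window` contiguous positions."""
    flags = _paired(guide, target)
    if len(flags) < window:
        raise ValueError("sequences are shorter than the window")
    return max(100.0 * sum(flags[i:i + window]) / window
               for i in range(len(flags) - window + 1))
```

For example, a 25-nt guide with three mismatches clustered at one end scores 88% overall but 100% over its best 20-nt window.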
- the Cas protein-binding segment of a guide nucleic acid can have a length of from about 10 nucleotides to about 100 nucleotides, e.g., from about 10 nucleotides (nt) to about 20 nt, from about 20 nt to about 30 nt, from about 30 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt.
- the Cas protein-binding segment of a guide nucleic acid can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.
- the dsRNA duplex of the Cas protein-binding segment of the guide nucleic acid can have a length from about 6 base pairs (bp) to about 50 bp.
- the dsRNA duplex of the protein-binding segment can have a length from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp or from about 8 bp to about 15 bp.
- the dsRNA duplex of the Cas protein-binding segment can have a length from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp.
- the dsRNA duplex of the Cas protein-binding segment can have a length of 36 base pairs.
- the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 60%.
- the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
- the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment is 100%.
- Guide nucleic acids of the systems of the disclosure can include modifications or sequences that provide for additional desirable features (e.g., modified or regulated stability; subcellular targeting; tracking with a fluorescent label; a binding site for a protein or protein complex; and the like).
- modifications include, for example, a 5′ cap (a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, and so forth); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA
- a guide nucleic acid can comprise one or more modifications (e.g., a base modification, a backbone modification) to provide the nucleic acid with a new or enhanced feature (e.g., improved stability).
- a guide nucleic acid can comprise a nucleic acid affinity tag.
- a nucleoside can be a base-sugar combination. The base portion of the nucleoside can be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines.
- Nucleotides can be nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside.
- the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar.
- the phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound.
- the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds can be suitable.
- linear compounds can have internal nucleotide base complementarity and can therefore fold so as to produce a fully or partially double-stranded compound.
- the phosphate groups are commonly referred to as forming the internucleoside backbone of the guide nucleic acid.
- the linkage or backbone of the guide nucleic acid can be a 3′ to 5′ phosphodiester linkage.
- a guide nucleic acid can comprise a modified backbone and/or modified internucleoside linkages.
- Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.
- Suitable modified guide nucleic acid backbones containing a phosphorus atom therein can include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates, 5′-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, a 5′ to 5′
- Suitable guide nucleic acids having inverted polarity can comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage (such as a single inverted nucleoside residue in which the nucleobase is missing or has a hydroxyl group in place thereof).
- Various salts (e.g., potassium chloride or sodium chloride), mixed salts, and free acid forms can also be included.
- a guide nucleic acid can comprise a morpholino backbone structure.
- a nucleic acid can comprise a 6-membered morpholino ring in place of a ribose ring.
- a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.
- a guide nucleic acid can comprise polynucleotide backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages.
- These can include those having morpholino linkages (formed in part from the sugar portion of a nucleoside) ; siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
- a guide nucleic acid can comprise a nucleic acid mimetic.
- the term “mimetic” can be intended to include polynucleotides wherein only the furanose ring, or both the furanose ring and the internucleotide linkage, are replaced with non-furanose groups; replacement of only the furanose ring can also be referred to as a sugar surrogate.
- the heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid.
- One such nucleic acid can be a peptide nucleic acid (PNA).
- the sugar-backbone of a polynucleotide can be replaced with an amide containing backbone, in particular an aminoethylglycine backbone.
- the nucleobases can be retained and bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
- the backbone in PNA compounds can comprise two or more linked aminoethylglycine units, which give PNA an amide-containing backbone.
- the heterocyclic base moieties can be bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
- a guide nucleic acid can comprise linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring.
- Linking groups can link the morpholino monomeric units in a morpholino nucleic acid.
- Non-ionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins.
- Morpholino-based polynucleotides can be non-ionic mimics of guide nucleic acids.
- a variety of compounds within the morpholino class can be joined using different linking groups.
- a further class of polynucleotide mimetic can be referred to as cyclohexenyl nucleic acids (CeNA).
- the furanose ring normally present in a nucleic acid molecule can be replaced with a cyclohexenyl ring.
- CeNA DMT protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry.
- the incorporation of CeNA monomers into a nucleic acid chain can increase the stability of a DNA/RNA hybrid.
- CeNA oligoadenylates can form complexes with nucleic acid complements with similar stability to the native complexes.
- a further modification can include Locked Nucleic Acids (LNAs), in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and thus a bicyclic sugar moiety.
- the linkage can be a methylene (-CH2-)n group bridging the 2′ oxygen atom and the 4′ carbon atom, wherein n is 1 or 2.
- a guide nucleic acid can comprise one or more substituted sugar moieties.
- Suitable polynucleotides can comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl can be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl.
- suitable sugar substituent groups can also include O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10.
- a sugar substituent group can be selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of a guide nucleic acid, or a group for improving the pharmacodynamic properties of a guide nucleic acid, and other substituents having similar properties.
- a suitable modification can include 2′-methoxyethoxy (2′-O-CH2CH2OCH3, also known as 2′-O-(2-methoxyethyl) or 2′-MOE, an alkoxyalkoxy group).
- a further suitable modification can include 2′-dimethylaminooxyethoxy (an O(CH2)2ON(CH3)2 group, also known as 2′-DMAOE), 2′-dimethylaminoethoxyethoxy (also known as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), or 2′-O-CH2-O-CH2-N(CH3)2.
- 2′-sugar substituent groups can be in the arabino (up) position or ribo (down) position.
- a suitable 2′-arabino modification is 2′-F.
- Similar modifications can also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked nucleotides and the 5′ position of 5′ terminal nucleotide.
- Oligomeric compounds can also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.
- a guide nucleic acid can also include nucleobase (or “base”) modifications or substitutions.
- nucleobases can include the purine bases (e.g., adenine (A) and guanine (G)) and the pyrimidine bases (e.g., thymine (T), cytosine (C) and uracil (U)).
- Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g.
- Heterocyclic base moieties can include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone.
- Certain nucleobases can be useful for increasing the binding affinity of a polynucleotide compound. These can include 5-substituted pyrimidines, 6-azapyrimidines, and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions can increase nucleic acid duplex stability by 0.6–1.2 °C and can be suitable base substitutions (e.g., when combined with 2′-O-methoxyethyl sugar modifications).
- a modification of a guide nucleic acid can comprise chemically linking to the guide nucleic acid one or more moieties or conjugates that can enhance the activity, cellular distribution or cellular uptake of the guide nucleic acid.
- These moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups.
- Conjugate groups can include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that can enhance the pharmacokinetic properties of oligomers.
- Conjugate groups can include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes.
- Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid.
- Groups that can enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a nucleic acid.
- Conjugate moieties can include, but are not limited to, lipid moieties such as a cholesterol moiety, cholic acid, a thioether (e.g., hexyl-S-tritylthiol), a thiocholesterol, an aliphatic chain (e.g., dodecanediol or undecyl residues), a phospholipid (e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate), a polyamine or a polyethylene glycol chain, adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety.
- the at least one guide RNA polynucleotide of a system or method provided herein can bind to at least a portion of a genome (e.g., a plant genome) or a gene (e.g., a plant gene) .
- the at least one guide RNA polynucleotide is capable of forming a complex with a Cas12a protein to direct the protein to a portion of a target nucleic acid (e.g., a site in a genome or a gene).
- the systems described herein comprise at least one guide RNA polynucleotide that is able to form a complex with a Cas12a protein or fusion protein of the system. In some embodiments, the systems described herein comprise at least two (e.g., at least three, at least four, at least five, or at least six) different guide RNA polynucleotides that are able to form a complex with a site-directed nuclease portion of a fusion protein of the system.
- the guide nucleic acid comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 27-34 as set forth in Table 5.
- kits that include the components of the systems described in this disclosure.
- the kits include one or more of the fusion proteins and/or polynucleotides described herein.
- the methods comprise contacting a nucleic acid (i.e., the nucleic acid to be edited) with at least one Cas12a protein and/or fusion protein as described herein.
- the methods further comprise contacting the nucleic acid with a guide RNA (e.g., as described in Section V above) having a region complementary to a selected portion of the nucleic acid.
- the nucleic acid (i.e., the nucleic acid to be edited) can be any suitable nucleic acid.
- the nucleic acid is a portion of a chromosome.
- the nucleic acid is a portion of a genome (e.g., a plant genome) .
- the methods provided herein can result in increased frequency of one or more desired nucleic acid editing outcomes (e.g., SDN-1 editing) .
- SDN-1 editing efficiency can be measured by dividing the number of plants with an insertion or deletion (“indel”) by the total number of transgenic plants.
- use of a Cas12a protein or fusion protein provided herein results in an increase in SDN-1 editing efficiency relative to use of an unmodified (i.e., wild-type) Cas12a protein.
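The SDN-1 efficiency ratio and its comparison with the wild-type control can be written as a short Python sketch. The `RegeneratedPlant` record and its field names are hypothetical; the sketch only encodes the ratio stated above (plants with an indel over all transgenic plants) and a plain fold change relative to the unmodified Cas12a control.

```python
from dataclasses import dataclass


@dataclass
class RegeneratedPlant:
    """Hypothetical per-plant record; field names are illustrative only."""
    plant_id: str
    transgenic: bool   # e.g., called by a TaqMan assay
    has_indel: bool    # e.g., called from Sanger reads across the target site


def sdn1_efficiency(plants) -> float:
    """Plants with an indel divided by all transgenic plants."""
    transgenic = [p for p in plants if p.transgenic]
    if not transgenic:
        raise ValueError("no transgenic plants to score")
    return sum(p.has_indel for p in transgenic) / len(transgenic)


def fold_change(variant_efficiency: float, wildtype_efficiency: float) -> float:
    """Efficiency of a variant relative to the wild-type control."""
    if wildtype_efficiency == 0:
        raise ValueError("wild-type efficiency is zero; fold change undefined")
    return variant_efficiency / wildtype_efficiency
```

Non-transgenic plants are excluded from the denominator, matching the definition above.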
- indel events can be further analyzed for the occurrence of homozygous edits (i.e., the same indel is present at both alleles of the target nucleic acid) and biallelic edits (i.e., different indels are present at each allele of the target nucleic acid).
- the rate of homozygous/biallelic edits can be measured by dividing the number of plants with homozygous/biallelic edits by the total number of plants with indels.
- use of a Cas12a protein or fusion protein provided herein results in an increase in the rate of homozygous/biallelic edits.
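The homozygous/biallelic definitions and the rate above can be sketched as follows, assuming a diploid target locus with two called alleles per plant. The allele notation (`"WT"`, `"-3"`, `"+1"`) is hypothetical. A plant with one edited allele and one wild-type allele is labeled "heterozygous" here. It counts toward plants with indels but not toward the homozygous/biallelic numerator.

```python
from collections import Counter

WT = "WT"  # hypothetical label for an unedited allele


def classify(allele_1: str, allele_2: str) -> str:
    """Call a diploid genotype at one target site from its two alleles.

    Alleles are indel labels (hypothetical notation such as '-3' or '+1')
    or WT.
    """
    edited = [a for a in (allele_1, allele_2) if a != WT]
    if not edited:
        return "wild-type"
    if len(edited) == 1:
        return "heterozygous"
    # both alleles edited: same indel -> homozygous, different -> biallelic
    return "homozygous" if allele_1 == allele_2 else "biallelic"


def homozygous_biallelic_rate(genotypes) -> float:
    """(homozygous + biallelic) plants divided by all plants with an indel."""
    calls = Counter(classify(a, b) for a, b in genotypes)
    with_indel = calls["heterozygous"] + calls["homozygous"] + calls["biallelic"]
    if with_indel == 0:
        raise ValueError("no plants with indels")
    return (calls["homozygous"] + calls["biallelic"]) / with_indel
```

For four plants genotyped as (WT, WT), (-3, WT), (-3, -3) and (-3, +1), three carry an indel and two of those are homozygous or biallelic, giving a rate of 2/3.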
- the methods herein comprise providing a Cas12a protein and/or fusion protein and a nucleic acid to be edited and can also comprise providing at least one guide RNA.
- providing a Cas12a protein or fusion protein can comprise introducing the Cas12a protein or fusion protein into a cell or introducing a recombinant nucleic acid, construct, or vector encoding the Cas12a protein or fusion protein into a cell.
- a gRNA can be provided by introducing the gRNA itself or a nucleic acid sequence encoding the gRNA.
- a Cas12a protein and/or fusion protein and a gRNA can be encoded by the same DNA construct or vector.
- Example 1 C965S acts synergistically with D156R to improve the efficiency of SDN1 editing at difficult target sites in maize
- A total of five LbCas12a variants were generated: in the first variant, the Cys965 residue was mutated to a serine residue, named LbCas12a-C965S; in the second variant, both the Cys10 and Cys965 residues were mutated to serine residues, named LbCas12a-C10S-C965S; in the third variant, both the Cys965 and Cys1090 residues were mutated to serine residues, named LbCas12a-C965S-C1090S; in the fourth variant, the Cys965 residue was mutated to a serine residue and the Asp156 residue was mutated to an arginine residue, named LbCas12a-D156R-C965S; and in the fifth variant, only the Asp156 residue was mutated to an arginine residue, named LbCas12a-D156R.
- The coding sequence of LbCas12a was optimized based on maize-preferred codon usage; the TGC codon of each selected cysteine residue was mutated to the serine codon TCC by introducing the mutation in overlapping PCR primers.
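The codon swap itself can be checked in silico before primers are designed. The following sketch is hypothetical and covers only that check, not the overlapping-PCR procedure. It assumes an in-frame coding sequence whose first codon is residue 1 of the native Cas12a numbering, before any NLS or linker is fused. It verifies that the target codon encodes cysteine and that the replacement encodes serine: TCC, used here, and AGC, used for some FnCas12a sites in Example 3, are both serine codons.

```python
CYSTEINE_CODONS = {"TGT", "TGC"}
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def cys_to_ser(cds: str, residue: int, new_codon: str = "TCC") -> str:
    """Return `cds` with the codon of 1-based `residue` replaced by `new_codon`.

    Assumes an in-frame coding sequence whose first codon is residue 1.
    """
    cds, new_codon = cds.upper(), new_codon.upper()
    if new_codon not in SERINE_CODONS:
        raise ValueError(f"{new_codon} does not encode serine")
    start = (residue - 1) * 3
    if residue < 1 or start + 3 > len(cds):
        raise ValueError(f"residue {residue} is outside the coding sequence")
    old_codon = cds[start:start + 3]
    if old_codon not in CYSTEINE_CODONS:
        raise ValueError(f"residue {residue} is {old_codon}, not a cysteine codon")
    return cds[:start] + new_codon + cds[start + 3:]
```

Failing loudly when the codon at the given position is not TGT/TGC guards against off-by-one numbering errors between the protein and the codon-optimized DNA.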
- an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker, while two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were also fused to the C-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
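The fusion layout described above can be assembled as a protein string to confirm the linker lengths (30 aa for (GSSSS)6, 8 aa for (SGGS)2). This is a sketch under assumptions. `PKKKRKV` is the canonical SV40 large T-antigen NLS and is used only as a stand-in for SEQ ID NO: 56. Any initiator methionine or other sequence outside the stated elements is not modeled.

```python
# Assumption: the canonical SV40 large T-antigen NLS; substitute the exact
# sequence of SEQ ID NO: 56 if it differs.
SV40_NLS = "PKKKRKV"
LONG_LINKER = "GSSSS" * 6   # (GSSSS)6, SEQ ID NO: 46
SHORT_LINKER = "SGGS" * 2   # (SGGS)2, SEQ ID NO: 78

assert len(LONG_LINKER) == 30 and len(SHORT_LINKER) == 8


def build_fusion(cas12a: str) -> str:
    """NLS-(GSSSS)6-Cas12a-(GSSSS)6-NLS-(SGGS)2-NLS, N- to C-terminus."""
    parts = [SV40_NLS, LONG_LINKER, cas12a,
             LONG_LINKER, SV40_NLS, SHORT_LINKER, SV40_NLS]
    return "".join(parts)
```

With a 7-aa NLS, the elements flanking the Cas12a sequence add 89 residues in total.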
- a binary vector was constructed to express one variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generation performance of the variant.
- the coding sequence of one variant, fused with the NLSs, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells.
- the same gRNA array, driven by an Oryza sativa U6 promoter, was designed to express a gRNA targeting the maize gene Starch Branching Enzyme IIb (ZmSBEIIb).
- the gRNA was based on the mature crRNA scaffold of LbCas12a.
- Transgenic maize plants were generated by infecting calli derived from immature maize embryos with Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
- the leaf sheaths of regenerated plantlets were sampled for DNA extraction, and the transgenic plants were identified by TaqMan qPCR assays.
- the sequence spanning the target site was PCR-amplified and Sanger-sequenced, in order to determine the genotype and the SDN1 efficiency at the target site.
- the SDN1 efficiencies at the SBEIIb target site and the Wx1 target site were compared.
- the SBEIIb target site is difficult to edit with Cas12a.
- C965S alone slightly improved the overall SDN1 efficiency of SBEIIb but not the rate of homozygous/biallelic edits.
- in combination with D156R, C965S increased the overall SDN1 efficiency at the ZmSBEIIb target site 4-fold over the wild type, with more than half of the edits being homozygous or biallelic; in comparison, D156R alone increased the SDN1 efficiency at the ZmSBEIIb target site 3-fold, with about half being homozygous or biallelic.
- SDN1 editing efficiencies were similar to, or only modestly improved over, the wild type.
- the LbCas12a variants LbCas12a-D156R and LbCas12a-D156R-C965S will be compared for their SDN1-generating performance in soybean.
- the two variants are identical to those tested in maize as described in Example 1, except that the coding sequences were optimized based on Arabidopsis-preferred codon usage.
- the coding sequence of one variant, fused with the NLSs, is operably linked to a promoter, such as an Arabidopsis elongation factor 1 alpha (EF1α) promoter, and a terminator, such as an Agrobacterium tumefaciens nopaline synthase gene terminator, for strong constitutive expression in soybean cells.
- a gRNA or a gRNA array driven by a soybean ubiquitin 1 promoter is designed to express gRNA(s) targeting a site in the soybean genome, such as FAD2 (SEQ ID NO: 38 provides the LbCas12a gRNA targeting the soybean FAD2-1A gene).
- the gRNA(s) are based on the mature crRNA scaffold of LbCas12a and are processed by self-cleaving ribozymes on the flanks.
- Transgenic soybean plants are generated by infecting mature soybean seeds with Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
- the leaves of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays.
- the sequences spanning each of the three target sites will be PCR-amplified and Sanger-sequenced, in order to determine the genotypes and the SDN1 efficiencies at the target sites.
- Example 3 Generation and identification of FnCas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
- a total of 9 cysteine residues (Cys70, Cys473, Cys568, Cys717, Cys882, Cys1086, Cys1116, Cys1190 and Cys1196) exist in the FnCas12a primary sequence (SEQ ID NO: 2) .
- the crystal structure of FnCas12a (PDB entries 5NFV and 6I1K) suggested four cysteine residues (Cys70, Cys473, Cys1116, and Cys1190) are most likely surface-exposed, and thus might be prone to undesired interactions and/or modifications.
- the surface topology around Cys473 suggests that it is difficult for an interacting protein or a modification enzyme to access, while our PyMOL analysis suggests that Cys1116 and Cys1190 are likely to form an intramolecular disulfide bond (FIG. 2). Therefore, Cys70, Cys1116, and Cys1190 were selected for substitution.
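A simple geometric screen for possible disulfide partners, of the kind the PyMOL observation above points to, is sketched below. It is a heuristic, not the analysis reported in FIG. 2. It assumes the SG (sulfur) coordinates of each cysteine have already been extracted from a structure such as PDB 5NFV; the parsing step is not shown. An S–S bond is about 2.05 Å long, and the 2.5 Å cutoff is an assumed tolerance.

```python
import math
from itertools import combinations


def disulfide_candidates(sg_coords: dict, cutoff: float = 2.5):
    """Cysteine pairs whose SG atoms lie within `cutoff` angstroms.

    sg_coords maps a residue label to the (x, y, z) coordinates of that
    cysteine's SG atom. Returns (label_a, label_b, distance) tuples.
    """
    pairs = []
    for (a, xyz_a), (b, xyz_b) in combinations(sorted(sg_coords.items()), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            pairs.append((a, b, distance))
    return pairs
```

A pair that passes this distance filter is only a candidate. Bond geometry (for example the C-beta–S–S–C-beta dihedral) and the redox environment also determine whether a disulfide actually forms.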
- All cysteine-substituted variants were generated on the basis of FnCas12a-E184R variant.
- Three variants carrying single Cys-to-Ser substitution (FnCas12a-E184R-C70S, FnCas12a-E184R-C1116S, FnCas12a-E184R-C1190S) were generated, as well as three variants carrying double Cys-to-Ser substitutions (FnCas12a-E184R-C70S-C1116S, FnCas12a-E184R-C70S-C1190S, FnCas12a-E184R-C1116S-C1190S) .
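The single and double Cys-to-Ser series listed above follows a simple combinatorial pattern: every one- and two-residue subset of the selected cysteines, applied on top of the base variant. The helper below is a hypothetical naming sketch. It reproduces the FnCas12a list above and the AsCas12a list in Example 4 when the residues are given in ascending order.

```python
from itertools import combinations


def cys_to_ser_variants(base: str, residues, max_stack: int = 2):
    """Names for every single, double, ... Cys-to-Ser combination on `base`."""
    names = []
    for k in range(1, max_stack + 1):
        for combo in combinations(residues, k):
            names.append("-".join([base] + [f"C{r}S" for r in combo]))
    return names
```

Raising `max_stack` to 3 would add the triple substitution, which is not among the variants described here.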
- The coding sequence of FnCas12a was optimized based on maize-preferred codon usage; the TGC codon of each selected cysteine residue was mutated to a serine codon (TCC for C1116; AGC for C70 and C1190) by introducing the mutation in overlapping PCR primers.
- an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker, while two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were also fused to the C-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
- a binary vector was constructed to express one variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generation performance of the variant.
- the coding sequence of one variant, fused with the NLSs, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells.
- a gRNA array, driven by an Oryza sativa U6 promoter, was designed to express three gRNAs targeting three different maize genes: Waxy1 (ZmWx1), Glossy2 (ZmGL2), and Starch Branching Enzyme IIb (ZmSBEIIb).
- the gRNAs were based on the mature crRNA scaffold of FnCas12a.
- Transgenic maize plants will be generated by infecting calli derived from immature maize embryos with Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
- the leaf sheath of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays.
- the sequences spanning each of the three target sites will be PCR-amplified and Sanger-sequenced in order to determine the genotypes and the SDN1 efficiencies at the target sites. Both the overall SDN1 editing efficiency and the rate of homozygous/biallelic mutants of each variant will be compared to those of the FnCas12a-E184R control to assess the efficacy of the cysteine substitutions.
- Example 4 Generation and identification of AsCas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
- a total of 8 cysteine residues (Cys65, Cys205, Cys334, Cys379, Cys608, Cys674, Cys1025, and Cys1248) exist in the AsCas12a primary sequence (SEQ ID NO: 3) .
- the crystal structure of AsCas12a suggests three cysteine residues: Cys334, Cys379, and Cys674 are most likely surface-exposed and thus prone to undesired interactions and/or modifications. These three residues were selected for substitution.
- All cysteine-substituted variants were generated on the basis of AsCas12a-E174R variant.
- Three variants carrying single Cys-to-Ser substitution (AsCas12a-E174R-C334S, AsCas12a-E174R-C379S, AsCas12a-E174R-C674S) were generated, as well as three variants carrying double Cys-to-Ser substitutions (AsCas12a-E174R-C334S-C379S, AsCas12a-E174R-C334S-C674S, AsCas12a-E174R-C379S-C674S) .
- the coding sequence of AsCas12a was optimized based on maize-preferred codon usage; the codon triplet of selected cysteine residues, TGC, was mutated to serine-coding TCC by introducing the mutation in overlapping PCR primers.
- an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker, while two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were also fused to the C-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
- a binary vector was constructed to express one variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generation performance of the variant.
- the coding sequence of one variant, fused with the NLSs, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells.
- a gRNA array, driven by an Oryza sativa U6 promoter, was designed to express three gRNAs targeting three different maize genes: Waxy1 (ZmWx1), Glossy2 (ZmGL2), and Starch Branching Enzyme IIb (ZmSBEIIb).
- the gRNAs were based on the mature crRNA scaffold of AsCas12a.
- Transgenic maize plants will be generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
- the leaf sheaths of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays.
- the sequences spanning each of the three target sites will be PCR-amplified and Sanger-sequenced in order to determine the genotypes and the SDN1 efficiencies at the target sites. Both the overall SDN1 editing efficiency and the rate of homozygous/biallelic mutants of each variant will be compared to those of the AsCas12a-E174R control to assess the efficacy of the cysteine substitutions.
- Example 5 Generation and identification of Mb2Cas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
- the crystal structure of MbCas12a (PDB entry 6IV6) from M. bovoculi strain 22581, the closest ortholog of Mb2Cas12a (94.7% amino acid identity), was used as a reference structure to estimate the location of the cysteine residues in Mb2Cas12a.
- eight cysteine residues (Cys270, Cys307, Cys583, Cys662, Cys1068, Cys1099, Cys1149, and Cys1162) exist in the primary sequence of Mb2Cas12a from strain 57922 (SEQ ID NO: 4); these correspond to Cys283, Cys320, Cys593, Cys672, Cys1078, Cys1109, Cys1159, and Tyr1172, respectively, in MbCas12a from strain 22581.
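Residue correspondences such as Cys270 → Cys283 or Cys1162 → Tyr1172 come from walking a pairwise alignment column by column. The sketch below assumes the gapped alignment strings (with `-` for gaps) have already been produced by an aligner, which is not shown. The example sequences are toy inputs, not Mb2Cas12a or MbCas12a.

```python
def map_residues(aligned_query: str, aligned_ref: str, positions):
    """Map 1-based query residue numbers onto a reference via a gapped alignment.

    Returns {query_position: (reference_position, reference_residue)}, or None
    where the query residue is aligned to a gap.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_pos = r_pos = 0
    for q_res, r_res in zip(aligned_query, aligned_ref):
        if r_res != "-":
            r_pos += 1  # count reference residues, skipping gaps
        if q_res != "-":
            q_pos += 1  # count query residues, skipping gaps
            mapping[q_pos] = (r_pos, r_res) if r_res != "-" else None
    return {p: mapping.get(p) for p in positions}
```

In the toy alignment `MC-KC` / `MCAKY`, query Cys4 maps to reference position 5, which is a Tyr. This parallels the Cys1162/Tyr1172 case, where no cysteine is present at the corresponding position of the reference structure.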
- This estimation suggests Cys270, Cys307, Cys583, Cys1068, Cys1099, Cys1149 and Cys1162 are likely exposed on the surface of Mb2Cas12a.
- because Cys1162 of Mb2Cas12a aligns to Tyr1172 in MbCas12a, Tyr1172 was mutated in 6IV6 and the structure was remodeled with PyMOL.
- the resulting structure model suggests Cys1162 is also likely surface-exposed in Mb2Cas12a.
- the surface topology suggested Cys1162 is difficult for an interacting protein or a modification enzyme to access. Therefore Cys270, Cys583, Cys1068, Cys1099, Cys1149 were selected for site directed mutagenesis.
- Mb2Cas12a The coding sequence of Mb2Cas12a was optimized based on maize-preferred codon usage; the codon triplet of selected cysteine residues, TGC, was mutated to serine-coding TCC by introducing the mutation in overlapping PCR primers for five single mutation variants.
- Mb2Cas12a variants were also synthesized by introducing serine-coding TCC or alanine-coding GCC in place of TGC.
- An SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino-acid (GSSSS)6 (SEQ ID NO: 46) peptide linker. Two SV40 NLSs, separated by an 8-amino-acid (SGGS)2 (SEQ ID NO: 78) peptide linker, were fused to the C-terminus via another flexible 30-amino-acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
- For each variant (and for the control), a binary vector was constructed to express it in stable transgenic maize plants in order to assess its SDN1-generation performance.
- The coding sequence of each variant, fused with the NLSs, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells.
- A gRNA array, driven by a sugarcane ubiquitin 4 gene promoter and terminated by an Agrobacterium tumefaciens nopaline synthase gene terminator, was designed to express four gRNAs targeting four different maize genes: Waxy1 (ZmWx1), Benzoxazinone synthesis 9 (ZmBx9), Glossy2 (ZmGL2), and ZmBINa.
- The gRNAs were based on the mature crRNA scaffold of LbCas12a and were processed by flanking self-cleaving ribozymes.
- Transgenic maize plants were generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
- The leaf sheaths of regenerated plantlets were sampled for DNA extraction, and the transgenic plants were identified by TaqMan assays.
- The sequences spanning each of the three target sites were PCR-amplified and Sanger-sequenced to determine the genotypes and the SDN1 editing efficiencies at the target sites.
- As shown in Table 8, all variants with a single Cys-to-Ser mutation increased the rate of homozygous/biallelic mutants compared with the Mb2Cas12a-D172R control. The efficacy of stacking the five cysteine mutations will be determined in the same way.
- SEQ ID NO: 6 amino acid sequence of LbCas12a D156R:
- SEQ ID NO: 7 amino acid sequence of LbCas12a + D156R + C965S:
- SEQ ID NO: 8 amino acid sequence of LbCas12a + C10S + C965S:
- SEQ ID NO: 9 amino acid sequence of LbCas12a + C965S + C1090S:
- SEQ ID NO: 10 amino acid sequence of LbCas12a + linker + D156R:
- SEQ ID NO: 11 amino acid sequence of LbCas12a + linker + D156R + C965S:
- SEQ ID NO: 12 amino acid sequence of Mb2Cas12a + linker + D172R:
- SEQ ID NO: 13 amino acid sequence of Mb2Cas12a + linker + D172R + C270S:
- SEQ ID NO: 14 amino acid sequence of Mb2Cas12a + linker + D172R + C583S:
- SEQ ID NO: 15 amino acid sequence of Mb2Cas12a + linker + D172R + C1068S:
- SEQ ID NO: 16 amino acid sequence of Mb2Cas12a + linker + D172R + C1099S:
- SEQ ID NO: 17 amino acid sequence of Mb2Cas12a + linker + D172R + C1149S:
- SEQ ID NO: 18 amino acid sequence of Mb2Cas12a + linker + D172R + C270S + C583S + C1068S + C1099S + C1149S:
- SEQ ID NO: 19 amino acid sequence of Mb2Cas12a + linker + D172R + C270A + C583A + C1068A + C1099A + C1149A:
- SEQ ID NO: 20 nucleic acid sequence encoding LbCas12a + linker, maize codon-optimized:
- SEQ ID NO: 22 nucleic acid sequence encoding LbCas12a + linker + D156R + C965S, maize codon-optimized:
- SEQ ID NO: 23 nucleic acid sequence encoding LbCas12a + linker + C10S + C965S, maize codon-optimized:
- SEQ ID NO: 24 nucleic acid sequence encoding LbCas12a + linker + C965S + C1090S, maize codon-optimized:
- SEQ ID NO: 25 nucleic acid sequence encoding LbCas12a + linker + D156R, Arabidopsis codon-optimized:
- SEQ ID NO: 26 nucleic acid sequence encoding LbCas12a + linker + D156R + C965S, Arabidopsis codon-optimized:
- SEQ ID NO: 27 nucleic acid sequence encoding Mb2Cas12a + linker + D172R, maize codon-optimized:
- SEQ ID NO: 28 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270S, maize codon-optimized:
- SEQ ID NO: 29 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C583S, maize codon-optimized:
- SEQ ID NO: 30 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1068S, maize codon-optimized:
- SEQ ID NO: 31 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1099S, maize codon-optimized:
- SEQ ID NO: 32 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1149S, maize codon-optimized:
- SEQ ID NO: 33 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270S + C583S + C1068S + C1099S + C1149S, maize codon-optimized:
- SEQ ID NO: 34 nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270A + C583A + C1068A + C1099A + C1149A, maize codon-optimized:
- A single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.
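Example 5 and the preceding example compare two metrics: the overall SDN1 editing efficiency and the rate of homozygous/biallelic mutants. Both can be tallied from per-plant Sanger genotype calls at each target site. The Python sketch below shows one way to do this. It assumes that each plant's genotype at one site is recorded as two allele calls ("WT" or a label for the edit). It also assumes that efficiency is the fraction of plants with at least one edited allele. These definitions and the example calls are illustrative assumptions, not the scoring actually used for Table 8. Chimeric plants with more than two alleles are not handled.

```python
from collections import Counter


def classify_genotype(alleles: tuple[str, str]) -> str:
    """Classify a diploid genotype from two allele calls ('WT' = unedited)."""
    first, second = alleles
    edited = [allele != "WT" for allele in (first, second)]
    if not any(edited):
        return "wild_type"
    if not all(edited):
        return "heterozygous"
    # Both alleles edited: same edit -> homozygous, different edits -> biallelic.
    return "homozygous" if first == second else "biallelic"


def summarize_site(genotypes: list[tuple[str, str]]) -> dict:
    """Editing efficiency and homozygous/biallelic rate for one target site."""
    if not genotypes:
        raise ValueError("no plants genotyped")
    counts = Counter(classify_genotype(g) for g in genotypes)
    total = len(genotypes)
    return {
        # Plants carrying at least one edited allele.
        "sdn1_efficiency": (total - counts["wild_type"]) / total,
        # Plants with both alleles edited (same or different edits).
        "homozygous_biallelic_rate": (counts["homozygous"] + counts["biallelic"]) / total,
        "counts": dict(counts),
    }
```

For example, the hypothetical calls `[("WT", "WT"), ("WT", "-1A"), ("-1A", "-1A"), ("-1A", "+1T")]` give an efficiency of 0.75 and a homozygous/biallelic rate of 0.5. Each variant's values would then be compared with those of its control at the same site.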
Abstract
Provided herein are variant Cas12a proteins comprising at least one human-induced mutation. Also provided are fusion proteins comprising the variant Cas12a proteins and one or more heterologous domains. Also provided are associated nucleic acids, DNA constructs, vectors, cells, and methods of editing nucleic acids using the variant Cas12a proteins and/or fusion proteins. Use of the provided proteins can increase the frequency of desired nucleic acid edits (e.g., SDN-1 edits in plant genomes).
Description
This disclosure relates to methods to increase site-directed nuclease editing.
REFERENCE TO A SEQUENCE LISTING SUBMITTED AS AN XML FILE
This application is accompanied by a sequence listing entitled 82447-SL.xml, created January 19, 2023, which is approximately 149 kilobytes in size. This sequence listing is incorporated herein by reference in its entirety.
Site-directed nucleases (SDNs) (e.g., zinc finger nucleases, transcription activator-like effector nucleases, and CRISPR-associated nucleases) have gained increasing popularity in the gene editing space. These SDNs act as endonucleases and generally create double-stranded breaks (DSBs) in specific DNA sequences, activating the intrinsic repair mechanisms of the cell (e.g., non-homologous end joining or homologous recombination). During the repair process, site-directed modification of the specific DNA sequence can be achieved. The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system evolved in bacteria and archaea as an adaptive immune system to defend against viral attack. In recent years, the CRISPR/Cas system has attracted particular interest as a tool for genome editing. CRISPR/Cas systems that generate site-specific DSBs can be used to edit DNA in eukaryotic cells, e.g., by producing deletions, insertions, and/or changes in nucleotide sequence.
BRIEF SUMMARY
The Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
In one aspect, provided is a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 1 and a human-induced mutation at position C965. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position D156. In some embodiments, the human-induced mutation at position D156 is an aspartic acid to arginine substitution. In some embodiments, the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 5-11.
In another aspect, provided is a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 2 and a human-induced mutation at position C70, C1116, and/or C1190. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position E184. In some embodiments, the human-induced mutation at position E184 is a glutamic acid to arginine substitution.
In another aspect, provided is a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 3 and a human-induced mutation at position C334, C379, and/or C674. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position E174. In some embodiments, the human-induced mutation at position E174 is a glutamic acid to arginine substitution.
In another aspect, provided is a Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 4 and a human-induced mutation at position C270, C583, C1068, C1099, and/or C1149. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position D172. In some embodiments, the human-induced mutation at position D172 is an aspartic acid to arginine substitution. In some embodiments, the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 12-19.
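The aspects above name mutations by the reference residue, its position, and (in the examples) the new residue, as in C270S or D172R. When a variant has the same length as its reference and contains no insertions or deletions, these labels can be read off a position-by-position comparison. The Python sketch below does this. The function names and toy sequences are hypothetical. For sequences that are only about 80% identical and contain indels, positions must first be mapped through an alignment.

```python
def list_substitutions(reference: str, variant: str) -> list[str]:
    """Return substitutions such as 'C270S', using 1-based reference numbering."""
    if len(reference) != len(variant):
        raise ValueError("sequences differ in length; align them first")
    return [
        f"{ref}{position}{alt}"
        for position, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


def carries_listed_mutation(reference: str, variant: str, sites: set[str]) -> bool:
    """True if any substituted position matches a listed site such as 'C270'."""
    # Drop the trailing new-residue letter: 'C270S' -> 'C270'.
    mutated_sites = {label[:-1] for label in list_substitutions(reference, variant)}
    return bool(mutated_sites & sites)
```

With the toy pair `"MDCKC"` and `"MRCKS"`, `list_substitutions` reports `["D2R", "C5S"]`.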
In some embodiments of any of the Cas12a proteins described above, the Cas12a protein is a catalytically dead Cas12a (dCas12a) protein or a nickase Cas12a (nCas12a) protein.
In some embodiments of any of the Cas12a proteins described above, the Cas12a protein further comprises a nuclear localization signal.
In another aspect, provided is a fusion protein comprising any of the Cas12a proteins described above and a heterologous domain.
In some embodiments, the heterologous domain is a deaminase domain, a transcription factor domain, a nuclease domain, a reverse-transcriptase domain, a transposase domain, an integrase domain, a uracil DNA glycosylase inhibitor domain, a recombinase domain, a nickase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain, or a transcriptional repressor domain.
In some embodiments of the fusion protein, the Cas12a protein is linked to the heterologous domain by a linker sequence.
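To illustrate how a Cas12a protein is joined to other domains through linker sequences, the Python sketch below assembles the NLS-linker arrangement described for the Mb2Cas12a constructs in Example 5 as a one-letter amino acid string. It makes two assumptions that should be checked against the sequence listing. First, the SV40 NLS (SEQ ID NO: 56) is taken to be the canonical PKKKRKV motif. Second, no further residues, such as an initiator methionine, are assumed to flank the fusion. The function name and the Cas12a placeholder are hypothetical.

```python
SV40_NLS = "PKKKRKV"       # assumption: canonical SV40 NLS; verify against SEQ ID NO: 56
LONG_LINKER = "GSSSS" * 6  # (GSSSS)6 flexible linker, 30 aa (SEQ ID NO: 46)
SHORT_LINKER = "SGGS" * 2  # (SGGS)2 linker, 8 aa (SEQ ID NO: 78)


def build_nls_fusion(cas12a: str) -> str:
    """Return NLS-(GSSSS)6-Cas12a-(GSSSS)6-NLS-(SGGS)2-NLS as one string.

    Mirrors the one N-terminal and two C-terminal NLSs of Example 5; any start
    methionine or tags present in the real constructs are not modeled.
    """
    if not cas12a.isalpha():
        raise ValueError("expected a one-letter amino acid sequence")
    n_terminal = SV40_NLS + LONG_LINKER
    c_terminal = LONG_LINKER + SV40_NLS + SHORT_LINKER + SV40_NLS
    return n_terminal + cas12a + c_terminal


if __name__ == "__main__":
    placeholder = "M" + "A" * 9  # hypothetical 10-residue stand-in for Cas12a
    fusion = build_nls_fusion(placeholder)
    print(len(fusion))  # 99: 10 placeholder residues plus 89 NLS/linker residues
```

Keeping the NLS and linker strings as named constants makes it easy to swap in a different heterologous domain or linker while preserving the overall layout.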
In another aspect, provided is a nucleic acid encoding any of the Cas12a proteins or any of the fusion proteins described above. In some embodiments, the nucleic acid sequence is any one of SEQ ID NOs: 20-34.
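Each variant differs from its parental protein only at the substituted residues. A variant-encoding sequence can therefore be derived from the parental coding sequence by swapping individual codons, for example the TGC-to-TCC (Cys to Ser) or TGC-to-GCC (Cys to Ala) replacements described in Example 5. The Python sketch below shows this bookkeeping in silico. It assumes that codon 1 of the input is the first codon of the Cas12a open reading frame; with an N-terminal NLS and linker in front, the positions shift. The codon table is deliberately minimal, and the toy sequence and function names are hypothetical. The sketch does not stand in for the overlapping-PCR or gene-synthesis steps used in the examples.

```python
# Minimal codon subset for the checks below; not a full genetic code.
CODONS = {"ATG": "M", "TGC": "C", "TGT": "C", "TCC": "S", "GCC": "A", "AAA": "K"}


def substitute_codons(cds: str, replacements: dict[int, tuple[str, str]]) -> str:
    """Replace codons at 1-based amino acid positions.

    replacements maps position -> (expected_codon, new_codon). The expected
    codon is checked first, so an off-by-one position fails loudly.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    for position, (expected, new) in replacements.items():
        index = position - 1
        if not 0 <= index < len(codons):
            raise IndexError(f"position {position} is outside the reading frame")
        if codons[index] != expected:
            raise ValueError(f"codon {position} is {codons[index]}, expected {expected}")
        codons[index] = new
    return "".join(codons)


def translate(cds: str) -> str:
    """Translate with the minimal CODONS table (KeyError on any other codon)."""
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds), 3))
```

For the hypothetical toy sequence `"ATGTGCAAATGC"` (Met-Cys-Lys-Cys), replacing codons 2 and 4 with TCC gives `"ATGTCCAAATCC"`, which translates to `"MSKS"`.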
In another aspect, provided is a DNA construct comprising a promoter operably linked to the nucleic acid encoding any of the Cas12a proteins or any of the fusion proteins described above.
In another aspect, provided is a vector comprising the nucleic acid or the DNA construct described above.
In another aspect, provided is a cell comprising the nucleic acid, the DNA construct, or the vector described above. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a maize plant cell, a wheat plant cell, a rice plant cell, a soybean plant cell, a sunflower plant cell, or a tomato plant cell.
In another aspect, provided is a method of editing a nucleic acid, the method comprising contacting the nucleic acid with (i) any one of the Cas12a proteins described above or any one of the fusion proteins described above, and (ii) a guide RNA having a region complementary to a selected portion of the nucleic acid, thereby resulting in an edit to the nucleic acid.
The present application includes the following figures. The figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description(s) of the compositions and methods. The figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.
FIG. 1 shows that the cysteine residues in LbCas12a may form inter- or intramolecular interactions. Left: PyMOL surface model of the LbCas12a-crRNA-DNA ternary complex (PDB entry 5XUS). The highlighted areas indicated by arrows are the thiol groups of C965 and C1090, which are potentially exposed on the surface. Right: four cysteine residues (C10, C805, C912, and C965) that are scattered in the linear amino acid sequence form a cluster inside the 3D structure of LbCas12a.
FIG. 2 shows two of the cysteine residues in FnCas12a that were selected for substitution, according to aspects of this disclosure. The PyMOL stick models of C1190 and C1116 suggest that their thiol groups (in black) are close to each other in the FnCas12a 3D structure (PDB entry 5NFV) and may form an intramolecular disulfide bond.
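The proximity argument in FIG. 2 can be checked by measuring distances between cysteine SG (sulfur) atoms in a structure file, for example a locally downloaded PDB-format copy of entry 5NFV. The Python sketch below reads ATOM records using the fixed-column PDB format and reports SG-SG pairs within a cutoff.

- The 4.5 Å default is an assumption chosen to flag candidate pairs. A formed disulfide bond has an S-S distance of about 2.0 Å.
- Alternate locations are not resolved: the last record read wins.
- HETATM records are ignored.
- The function names are hypothetical.

```python
import math
from itertools import combinations


def cysteine_sulfur_coordinates(pdb_text: str) -> dict[tuple[str, int], tuple[float, float, float]]:
    """Map (chain, residue number) -> SG coordinates from PDB ATOM records."""
    atoms = {}
    for line in pdb_text.splitlines():
        # Fixed columns as 0-based slices: atom name [12:16], residue name [17:20],
        # chain [21], residue number [22:26], x/y/z [30:38]/[38:46]/[46:54].
        if line.startswith("ATOM") and line[17:20] == "CYS" and line[12:16].strip() == "SG":
            key = (line[21], int(line[22:26]))
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def close_sulfur_pairs(atoms: dict, cutoff: float = 4.5) -> list:
    """Return ((chain, resnum), (chain, resnum), distance) for SG pairs within cutoff (Å)."""
    pairs = []
    for (site_a, xyz_a), (site_b, xyz_b) in combinations(sorted(atoms.items()), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            pairs.append((site_a, site_b, round(distance, 2)))
    return pairs
```

A typical call would read the file text and pass `cysteine_sulfur_coordinates(text)` to `close_sulfur_pairs`. Restricting the output to the selected residues (e.g., C1116 and C1190) then shows whether the pair falls within the chosen cutoff.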
The following description recites various aspects and embodiments of the present compositions and methods. No particular embodiment is intended to define the scope of the compositions and methods. Rather, the embodiments merely provide non-limiting examples of various compositions and methods that are at least included within the scope of the disclosed compositions and methods. The description is to be read from the perspective of one of ordinary skill in the art; therefore, information well known to the skilled artisan is not necessarily included.
I. Terminology
All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques and/or substitutions of equivalent techniques that would be apparent to one of skill in the art. While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.
As used herein, the singular forms “a”, “an”, and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “an enzyme” optionally includes a combination of two or more such molecules, and the like.
As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field; for example, values within ±20%, ±10%, or ±5% of the recited value are within its intended meaning.
As used herein, the term “comprising” or “comprise” is open-ended. When used in connection with a subject nucleic acid (or amino acid sequence), it refers to a nucleic acid sequence (or an amino acid sequence) that includes the subject sequence as a part or as its entire sequence.
As used herein, the transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified materials or steps recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed matter. Thus, the term “consisting essentially of” when used in a claim of this disclosure is not intended to be interpreted to be equivalent to “comprising.”
The term “plurality” refers to more than one entity. Thus, a “plurality of individuals” refers to at least two individuals. In some embodiments, the term plurality refers to more than half of the whole. For example, in some embodiments a “plurality of a population” refers to more than half the members of that population.
The term “plant” as used herein refers to any plant at any stage of development, particularly a seed plant. The term “plant cell” as used herein refers to a structural and physiological unit of a plant, comprising a protoplast and a cell wall. The plant cell may be in the form of an isolated single cell or a cultured cell, or part of a more highly organized unit such as, for example, plant tissue, a plant organ, or a whole plant. The plant cell may be derived from or part of an angiosperm or gymnosperm. The plant cell may be a monocotyledonous plant cell (e.g., a maize cell, a rice cell, a sorghum cell, a sugarcane cell, a barley cell, a wheat cell, an oat cell, a turf grass cell, or an ornamental grass cell) or a dicotyledonous plant cell (e.g., a tobacco cell, a pepper cell, an eggplant cell, a sunflower cell, a crucifer cell, a flax cell, a potato cell, a cotton cell, a soybean cell, a sugar beet cell, or an oilseed rape cell). The term “plant cell culture” as used herein refers to cultures of plant units such as, for example, protoplasts, cell culture cells, cells in plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes, and embryos at various stages of development. The term “plant tissue” as used herein refers to a group of plant cells organized into a structural and functional unit. Any tissue of a plant in planta or in culture is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue culture, and any group of plant cells organized into structural and/or functional units. The use of this term in conjunction with, or in the absence of, any specific type of plant tissue as listed above or otherwise embraced by this definition is not intended to be exclusive of any other type of plant tissue. The term “plant part” as used herein refers to a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps, and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. As used herein, the terms encompass amino acid chains of any length, including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds.
The terms “nucleic acid” and “polynucleotide” are used interchangeably and as used herein refer to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form, as well as to both sense and anti-sense strands of RNA, cDNA, genomic DNA, mitochondrial DNA, and synthetic forms and mixed polymers of the above. In higher plants, DNA is the genetic material, while RNA is involved in the transfer of information contained within DNA into proteins. A “genome” is the entire body of genetic material contained in each cell of an organism. It is understood that when an RNA is described, its corresponding cDNA is also described, wherein uridine is represented as thymidine. In particular embodiments, a nucleotide refers to a ribonucleotide, a deoxynucleotide, or a modified form of either type of nucleotide, and combinations thereof. In addition, a polynucleotide disclosed herein may include either or both naturally occurring and modified nucleotides linked together by naturally occurring and/or non-naturally occurring nucleotide linkages. The nucleic acid molecules may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analogue, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, and the like), charged linkages (e.g., phosphorothioates, phosphorodithioates, and the like), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, and the like), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, and the like). The above term is also intended to include any topological conformation, including single-stranded, double-stranded, partially duplexed, triplex, hairpinned, circular, and padlocked conformations. A reference to a nucleic acid sequence encompasses its complement unless otherwise specified. Thus, a reference to a nucleic acid molecule having a particular sequence should be understood to encompass its complementary strand, with its complementary sequence. Nucleotide sequences are “complementary” when they specifically hybridize in solution (e.g., according to Watson-Crick base pairing rules). The term also includes codon-optimized nucleic acids that encode the same polypeptide sequence. It is also understood that nucleic acids can be unpurified, purified, or attached, for example, to a synthetic material such as a bead or column matrix.
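Two conventions in this definition are simple string transformations. An RNA sequence also describes its DNA form, in which uridine (U) is written as thymidine (T). A sequence also encompasses its complementary strand. The Python sketch below implements both. It assumes plain A/C/G/T sequences plus N. Other IUPAC ambiguity codes and modified nucleotides are outside its scope, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_VALID = set("ACGTNacgtn")


def rna_as_dna(rna: str) -> str:
    """Write an RNA sequence with thymidine (T) in place of uridine (U)."""
    return rna.replace("U", "T").replace("u", "t")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand read 5'->3' (Watson-Crick pairing; N stays N)."""
    invalid = set(dna) - _VALID
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return dna.translate(_COMPLEMENT)[::-1]
```

For instance, `rna_as_dna("AUGUGC")` gives `"ATGTGC"`, whose reverse complement is `"GCACAT"`. Applying `reverse_complement` twice returns the original sequence.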
The term “corresponding to” in the context of nucleic acid sequences means that, when certain nucleic acid sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention. Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG Package available from Accelrys, Inc. of San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.
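Once two sequences are aligned, finding the "corresponding" position is a matter of counting residues column by column. This is how, for example, Cys1162 of Mb2Cas12a is paired with Tyr1172 of MbCas12a in Example 5. The Python sketch below assumes that a gapped pairwise alignment (gaps written as "-") has already been produced, e.g., by BLAST or Clustal Omega. The toy alignment and the function name are hypothetical and are not Cas12a sequences.

```python
from typing import Optional


def corresponding_position(aligned_query: str, aligned_reference: str,
                           query_position: int) -> Optional[int]:
    """Return the 1-based reference position aligned to query_position.

    Returns None when the query residue sits opposite a gap in the reference,
    i.e. it has no corresponding reference position.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    query_count = reference_count = 0
    for query_char, reference_char in zip(aligned_query, aligned_reference):
        # Count only real residues; gap columns do not advance the numbering.
        if query_char != "-":
            query_count += 1
        if reference_char != "-":
            reference_count += 1
        if query_char != "-" and query_count == query_position:
            return reference_count if reference_char != "-" else None
    raise ValueError("query_position exceeds the query length")
```

In the toy alignment `"MC-KLCA"` over `"MCPKLYA"`, Cys5 of the query corresponds to Tyr6 of the reference. The one-residue gap shifts the numbering, which mirrors the kind of offset seen between orthologs.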
Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. See Batzer et al., Nucleic Acids Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994).
The terms “identity” or “substantial identity,” as used in the context of a polynucleotide or polypeptide sequence described herein, refer to a sequence that has at least 60% sequence identity to a reference sequence. Alternatively, percent identity can be any integer from 60% to 100%. Exemplary embodiments include at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity, as compared to a reference sequence using the programs described herein, preferably BLAST using standard parameters, as described below. One of skill will recognize that these values can be appropriately adjusted to determine the corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like.
For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
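Given an alignment, the percent identity of a test sequence relative to a reference is a ratio of identical columns to compared columns. The denominator differs between programs. The Python sketch below uses one common convention: alignment columns, excluding any column that is a gap in both sequences. This choice is an assumption. Other conventions, such as dividing by the length of the shorter sequence, give different values for the same alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned columns / alignment columns, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)
```

For the toy alignment `"MC-KLCA"` over `"MCPKLYA"`, 5 of 7 columns are identical (about 71.4%), which would fall below an "at least 80% identical" threshold.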
A “comparison window,” as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by computerized implementations of these algorithms (e.g., BLAST), or by manual alignment and visual inspection.
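The Needleman and Wunsch homology alignment cited above is a dynamic-programming algorithm. The Python sketch below implements the global version with a linear gap penalty. The scores (+1 match, -1 mismatch, -2 per gap) are illustrative assumptions. Practical protein comparisons use substitution matrices such as BLOSUM62 and affine gap penalties. The Smith and Waterman local variant differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Aligning the toy sequences `"MCKLCA"` and `"MCPKLYA"` under these scores yields `"MC-KLCA"` over `"MCPKLYA"` with a score of 2.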
Algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (NCBI) website. The algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, when the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or when the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word size (W) of 28, an expectation (E) of 10, M=1, N=-2, and a comparison of both strands.
For amino acid sequences, the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1992).
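The seed-and-extend idea described above can be illustrated for nucleotide sequences with a simplified Python sketch. Exact word matches of length W seed the search. Each seed is then extended without gaps in both directions until the running score drops X below the best score seen, or a sequence end is reached. This is not the BLAST implementation:

- it uses exact-match words rather than a neighborhood threshold T;
- it does not merge seeds on the same diagonal;
- it omits gapped extension and the statistics;
- it uses the M=1, N=-2 scores quoted above with an arbitrary X of 3.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class HSP(NamedTuple):
    query_start: int    # 0-based
    subject_start: int  # 0-based
    length: int
    score: int


def find_seeds(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return (query_offset, subject_offset) pairs where length-W words match exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)
    return [(i, j)
            for i in range(len(query) - word_size + 1)
            for j in index.get(query[i:i + word_size], [])]


def _best_extension(query_chars, subject_chars, match, mismatch, x_drop):
    """Extend in one direction; return (best score gain, residues included at best)."""
    best = running = best_length = 0
    for steps, (q, s) in enumerate(zip(query_chars, subject_chars), start=1):
        running += match if q == s else mismatch
        if running > best:
            best, best_length = running, steps
        elif best - running >= x_drop:
            break  # X-drop: the running score fell X below its maximum
    return best, best_length


def extend_seed(query: str, subject: str, i: int, j: int, word_size: int,
                match: int = 1, mismatch: int = -2, x_drop: int = 3) -> HSP:
    """Ungapped extension of the seed at (i, j) to the left and right."""
    seed_score = sum(match if q == s else mismatch
                     for q, s in zip(query[i:i + word_size], subject[j:j + word_size]))
    right, right_len = _best_extension(query[i + word_size:], subject[j + word_size:],
                                       match, mismatch, x_drop)
    # Walk leftward by reversing the prefixes that precede the seed.
    left, left_len = _best_extension(query[:i][::-1], subject[:j][::-1],
                                     match, mismatch, x_drop)
    return HSP(i - left_len, j - left_len, word_size + left_len + right_len,
               seed_score + left + right)
```

With the toy sequences `"GGACGTACGTTT"` and `"CCACGTACGTAA"` and W=4, the seed at offset (2, 2) extends to an 8-nt ungapped HSP with a score of 8.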
The BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90: 5873-5877 (1993). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.01, more preferably less than about 10⁻⁵, and most preferably less than about 10⁻²⁰.
“Recombination” is the exchange of DNA strands to produce new nucleotide sequence arrangements. The term may refer to the process of homologous recombination that occurs in double-strand DNA break repair, where a polynucleotide is used as a template to repair a homologous polynucleotide. The term may also refer to exchange of information between two homologous chromosomes during meiosis. The frequency of double recombination is the product of the frequencies of the single recombinants. For instance, a recombinant in a 10 cM area can be found with a frequency of 10%, and double recombinants are found with a frequency of 10% x 10% = 1% (1 centimorgan is defined as 1% recombinant progeny in a testcross).
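The product rule in the example above can be written as a small calculation. The Python sketch below makes two assumptions. First, crossovers in adjacent intervals are independent (no interference, i.e., a coefficient of coincidence of 1). Second, map distance converts directly to a fraction at 1 cM = 1%, which holds only approximately for short intervals. The optional coincidence parameter and the function names are illustrative additions, not part of the text.

```python
def expected_double_recombinant_frequency(r1: float, r2: float,
                                          coincidence: float = 1.0) -> float:
    """Expected double-crossover frequency for two adjacent intervals.

    r1, r2 are single-interval recombination frequencies as fractions
    (0.10 = 10%). coincidence = 1.0 means no interference, the case in the text.
    """
    for r in (r1, r2):
        if not 0.0 <= r <= 0.5:
            raise ValueError("recombination frequencies lie between 0 and 0.5")
    return coincidence * r1 * r2


def centimorgans_to_fraction(cm: float) -> float:
    """1 cM = 1% recombinant progeny; adequate only for short map distances."""
    return cm / 100.0
```

Two adjacent 10 cM intervals give `expected_double_recombinant_frequency(0.10, 0.10)`, about 0.01, matching the 1% in the example.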
A “gene” is a defined region that is located within a genome and that, besides the aforementioned coding nucleic acid sequence, comprises other, primarily regulatory, nucleic acid sequences responsible for the control of the expression, that is to say the transcription and translation, of the coding portion. Genes can include both coding and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences, and 5' and 3' untranslated regions). A gene typically expresses mRNA, functional RNA, or a specific protein, including regulatory sequences. Genes may or may not be capable of being used to produce a functional protein. In some embodiments, a gene refers to only the coding region. The term “native gene” refers to a gene as found in nature. The term “chimeric gene” refers to any gene that contains 1) DNA sequences, including regulatory and coding sequences, that are not found together in nature, or 2) sequences encoding parts of proteins not naturally adjoined, or 3) parts of promoters that are not naturally adjoined. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or comprise regulatory sequences and coding sequences derived from the same source, but arranged in a manner different from that found in nature. A gene may be “isolated,” by which is meant a nucleic acid molecule that is substantially or essentially free from components normally found in association with the nucleic acid molecule in its natural state. Such components include other cellular material, culture medium from recombinant production, and/or various chemicals used in chemically synthesizing the nucleic acid molecule.
A “gene of interest” or “nucleotide sequence of interest” refers to any gene which, when transferred to a plant, confers upon the plant a desired characteristic such as antibiotic resistance, virus resistance, insect resistance, disease resistance, or resistance to other pests, herbicide tolerance, improved nutritional value, improved performance in an industrial
process or altered reproductive capability. The “gene of interest” may also be one that is transferred to plants for the production of commercially valuable enzymes or metabolites in the plant.
An “isolated” nucleic acid molecule or nucleotide sequence or an “isolated” polypeptide is a nucleic acid molecule, nucleotide sequence, or polypeptide that, by the hand of man, exists apart from its native environment and/or has a function that is different, modified, modulated and/or altered as compared to its function in its native environment and is therefore not a product of nature. An isolated nucleic acid molecule or isolated polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, a recombinant host cell. Thus, for example, with respect to polynucleotides, the term isolated means that it is separated from the chromosome and/or cell in which it naturally occurs. A polynucleotide is also isolated if it is separated from the chromosome and/or cell in which it naturally occurs and is then inserted into a genetic context, a chromosome, a chromosome location, and/or a cell in which it does not naturally occur. The recombinant nucleic acid molecules and nucleotide sequences of the invention can be considered to be “isolated” as defined above.
Thus, an “isolated nucleic acid molecule” or “isolated nucleotide sequence” is a nucleic acid molecule or nucleotide sequence that is not immediately contiguous with the nucleotide sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. Accordingly, in one embodiment, an isolated nucleic acid includes some or all of the 5' non-coding (e.g., promoter) sequences that are immediately contiguous to a coding sequence. The term therefore includes, for example, a recombinant nucleic acid that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment), independent of other sequences. It also includes a recombinant nucleic acid that is part of a hybrid nucleic acid molecule encoding an additional polypeptide or peptide sequence. An “isolated nucleic acid molecule” or “isolated nucleotide sequence” can also include a nucleotide sequence derived from and inserted into the same natural, original cell type, but which is present in a non-natural state, e.g., present in a different copy number, and/or under the control of different regulatory sequences than that found in the native state of the nucleic acid molecule.
The term “isolated” can further refer to a nucleic acid molecule, nucleotide sequence, polypeptide, peptide or fragment that is substantially free of cellular material, viral material, and/or culture medium (e.g., when produced by recombinant DNA techniques) , or chemical precursors or other chemicals (e.g., when chemically synthesized) . Moreover, an “isolated fragment” is a fragment of a nucleic acid molecule, nucleotide sequence or polypeptide that is not naturally occurring as a fragment and would not be found as such in the natural state. “Isolated” does not necessarily mean that the preparation is technically pure (homogeneous) , but it is sufficiently pure to provide the polypeptide or nucleic acid in a form in which it can be used for the intended purpose.
“Homology dependent repair” or “homology directed repair” or “HDR” refers to a mechanism for repairing single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) damage in cells. The cell can use this repair mechanism when an HDR template with significant sequence homology to the damaged site is present. The term “perfect HDR” refers to a situation in which the genomic-homology junctions in the replaced allele underwent complete HDR, and “imperfect HDR” refers to a situation in which the genomic-homology junctions in the replaced allele underwent partial or incomplete HDR. In HDR, a donor DNA molecule with homology to the cleaved target DNA sequence is used as a template for repair of the cleaved target DNA sequence, resulting in the transfer of genetic information from the donor polynucleotide to the target DNA. As such, new nucleic acid material may be inserted or copied into the site. In some cases, a target DNA is contacted with a donor molecule, for example a donor DNA molecule. In some cases, a donor DNA molecule is introduced into a cell. In some cases, at least a segment of a donor DNA molecule integrates into the genome of the cell.
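As a rough illustration of how a donor supplies homology flanking the cleavage site, the Python sketch below builds a donor template (left homology arm + insert + right homology arm) from a reference sequence and a break position, and models the idealized “perfect HDR” outcome in which the insert is copied into the break. The function names, the `arm_len` default, and the toy sequences are hypothetical; practical donor design involves further considerations that are not modeled here.

```python
def design_donor(genome: str, cut: int, insert: str, arm_len: int = 50) -> str:
    """Build a donor template: left homology arm + insert + right homology arm.

    `cut` is the break position: the break lies between genome[cut - 1] and
    genome[cut]. The default `arm_len` is an illustrative assumption.
    """
    if not 0 < cut < len(genome):
        raise ValueError("cut must fall strictly inside the sequence")
    if cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = genome[cut - arm_len:cut]    # homologous to sequence 5' of the break
    right_arm = genome[cut:cut + arm_len]   # homologous to sequence 3' of the break
    return left_arm + insert + right_arm


def perfect_hdr_outcome(genome: str, cut: int, insert: str) -> str:
    """Idealized outcome: the insert is copied into the break with no other change."""
    return genome[:cut] + insert + genome[cut:]


genome = "AAAACCCCGGGGTTTT"  # toy sequence
print(design_donor(genome, cut=8, insert="NNN", arm_len=4))  # CCCCNNNGGGG
print(perfect_hdr_outcome(genome, cut=8, insert="NNN"))      # AAAACCCCNNNGGGGTTTT
```

Here "NNN" is only a placeholder for whatever sequence the donor is meant to introduce.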
“Microhomology-mediated end joining” or “MMEJ” or “alternative nonhomologous end-joining” (Alt-NHEJ) refers to a form of repairing double-stranded breaks in DNA. This repair mechanism utilizes microhomologous sequences to align the broken strands. “Non-homologous end joining” or “NHEJ” refers to a form of repairing double-stranded breaks in DNA. The double-strand breaks are repaired by direct ligation of the break ends to one another. Generally, no new nucleic acid material is inserted into the site, although some nucleic acid material may be lost or added, resulting in a small deletion or a small insertion.
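To make the role of microhomology concrete, the sketch below enumerates deletion products under a deliberately simplified MMEJ model: a short k-mer that occurs once entirely 5' of the break and once entirely 3' of it can anneal, keeping one copy and deleting the sequence between the two copies. This is an assumption-laden toy rather than a repair-outcome predictor; the function name and the `min_mh`/`max_mh` bounds are hypothetical, and NHEJ (direct ligation of the ends) is not modeled.

```python
def mmej_products(seq: str, cut: int, min_mh: int = 2, max_mh: int = 5):
    """Enumerate (microhomology, product) pairs under a simplified MMEJ model.

    A microhomology is a k-mer found at seq[i:i+k] (entirely 5' of the break)
    and at seq[j:j+k] (entirely 3' of the break). Annealing the two copies
    keeps one copy and deletes seq[i:j].
    """
    results = []
    for k in range(min_mh, max_mh + 1):
        for i in range(0, cut - k + 1):                # left copy ends at or before the cut
            mh = seq[i:i + k]
            for j in range(cut, len(seq) - k + 1):     # right copy starts at or after the cut
                if seq[j:j + k] == mh:
                    results.append((mh, seq[:i] + seq[j:]))
    return results


for mh, product in mmej_products("AAGCTTCCAGCTAA", cut=7, min_mh=3, max_mh=3):
    print(mh, product)
# AGC AAGCTAA
# GCT AAGCTAA
```

Both 3-mers yield the same product because they lie within the same longer repeat (AGCT), which is one reason real predictors collapse overlapping microhomologies.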
As used herein, “heterologous” refers to a nucleic acid molecule, nucleotide sequence, polypeptide, or amino acid sequence not naturally associated with a host cell into which it is introduced, that either originates from another species or is from the same species or organism but is modified from either its original form or the form primarily expressed in the cell, including non-naturally occurring multiple copies of a naturally occurring sequence. Thus, an amino acid sequence derived from an organism or species different from that of the cell into which the amino acid sequence is introduced is heterologous with respect to that cell and the cell's descendants. In addition, a heterologous sequence includes a sequence derived from and inserted into the same natural, original cell type, but which is present in a non-natural state, e.g., present in a different copy number, and/or under the control of different regulatory sequences than that found in the native state of the polypeptide. A sequence can also be heterologous to other sequences with which it may be associated, for example in a nucleic acid construct such as an expression vector. As one non-limiting example, a promoter may be present in a nucleic acid construct in combination with one or more regulatory elements and/or coding sequences that do not naturally occur in association with that particular promoter, i.e., they are heterologous to the promoter.
II. Introduction
In some aspects, provided herein are variant Cas12a proteins having increased site-directed nuclease (SDN) genome editing activity. Site-directed nuclease technology has dramatically increased the speed and precision with which one can make genome edits in various organisms, including plants. Generally, the desired outcomes in SDN-mediated genome editing are 1) to target SDNs to cleave DNA at a specific genomic site in a host (e.g., a plant cell) and 2) to use the host’s natural repair mechanisms to introduce specific genomic changes at the cleavage site. The changes can include small deletions, substitutions, or the addition of a number of nucleotides. Such targeted edits can result in a new and desired characteristic (e.g., enhanced nutrient uptake, decreased allergen production) and/or a reduction in an undesirable characteristic (e.g., herbicide susceptibility). SDN applications have generally been divided into three categories: SDN-1, SDN-2, and SDN-3. SDN-1 produces a double-stranded break in a genome without the addition of foreign DNA. When such a break is repaired by the host (e.g., via NHEJ), mutations or deletions can be introduced; if these mutations or deletions are in a gene, the gene can be silenced or knocked out. SDN-2 uses template DNA to introduce a predicted modification at the target cleavage site (e.g., via HDR) but does not result in insertion of recombinant DNA. SDN-3 also uses template DNA, but to introduce recombinant or exogenous DNA (e.g., a transgene) at the target cleavage site.
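The three SDN categories reduce to two questions: is template DNA supplied, and does the edit insert recombinant or exogenous DNA? A minimal Python sketch of that decision rule follows; the enum and function names are hypothetical, and the rule treats "recombinant DNA without a template" as outside the scheme.

```python
from enum import Enum


class SDNCategory(Enum):
    SDN1 = "SDN-1"  # break repaired without a template (e.g., via NHEJ)
    SDN2 = "SDN-2"  # template-guided predicted modification, no recombinant DNA inserted
    SDN3 = "SDN-3"  # template used to insert recombinant/exogenous DNA (e.g., a transgene)


def classify_sdn(uses_template: bool, inserts_recombinant_dna: bool) -> SDNCategory:
    """Map the two distinguishing features of an SDN application to its category."""
    if not uses_template:
        if inserts_recombinant_dna:
            raise ValueError("inserting recombinant DNA requires a template in this scheme")
        return SDNCategory.SDN1
    return SDNCategory.SDN3 if inserts_recombinant_dna else SDNCategory.SDN2


print(classify_sdn(uses_template=True, inserts_recombinant_dna=False).value)  # SDN-2
```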
Cas12a is a CRISPR-associated (Cas) SDN that functions in a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas system. In bacteria, this system can provide adaptive immunity against foreign DNA (Barrangou, R., et al., “CRISPR provides acquired resistance against viruses in prokaryotes,” Science (2007) 315: 1709-1712; Makarova, K.S., et al., “Evolution and classification of the CRISPR-Cas systems,” Nat Rev Microbiol (2011) 9: 467-477; Garneau, J.E., et al., “The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA,” Nature (2010) 468: 67-71; Sapranauskas, R., et al., “The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli,” Nucleic Acids Res (2011) 39: 9275-9282). In a wide variety of organisms, including diverse mammals, animals, plants, microbes, and yeast, a CRISPR/Cas system (e.g., modified and/or unmodified) can be used as a genome engineering tool. A CRISPR/Cas system can comprise a guide nucleic acid, such as a guide RNA (gRNA), complexed with a Cas protein for targeted regulation of gene expression and/or activity or for nucleic acid editing. An RNA-guided Cas protein (e.g., a Cas nuclease such as a Cas9 nuclease) can specifically bind a target polynucleotide (e.g., DNA) in a sequence-dependent manner. The Cas protein, if it possesses nuclease activity, can cleave the DNA (Gasiunas, G., et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria,” Proc Natl Acad Sci USA (2012) 109: E2579-E2586; Jinek, M., et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science (2012) 337: 816-821; Sternberg, S.H., et al., “DNA interrogation by the CRISPR RNA-guided endonuclease Cas9,” Nature (2014) 507: 62; Deltcheva, E., et al., “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III,” Nature (2011) 471: 602-607).
DNA cleavage (e.g., double-strand breaks) can result in DNA break repair, which allows for the introduction of gene modification(s) (e.g., nucleic acid editing).
Cysteine residues are highly reactive and are subject to posttranslational modifications. Formation of undesired disulfide bonds and/or other modifications could affect the proper folding, localization, and/or enzymatic activity of a protein. Most Cas12a orthologs contain 8-9 cysteine residues; by comparison, Cas9 from Streptococcus pyogenes (SpCas9) contains only 2. Most cysteine residues in Cas12a orthologs are not conserved, and those exposed on the protein surface are the most likely to be involved in intermolecular disulfide bond formation and/or posttranslational modifications. The cysteine residues of LbCas12a, FnCas12a, AsCas12a, and Mb2Cas12a, together with the residues aligned to them in the other orthologs, are shown in Tables 1-4.
Table 1. The cysteine residues in LbCas12a and their aligned residues in a four-ortholog pairwise alignment.
Table 2. The cysteine residues in FnCas12a and their aligned residues in a four-ortholog pairwise alignment.
Table 3. The cysteine residues in AsCas12a and their aligned residues in a four-ortholog pairwise alignment.
Table 4. The cysteine residues in Mb2Cas12a and their aligned residues in a four-ortholog pairwise alignment.
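The table contents are not reproduced in this text. As a sketch of how such a table can be derived, the hypothetical function below walks a gapped alignment (rows of equal length, "-" for gaps), finds each cysteine in a chosen reference row, and reports the residue and 1-based ungapped position aligned to it in every row. The toy sequences are invented for illustration and are not the Cas12a sequences or the alignment used in this disclosure.

```python
def aligned_cysteines(alignment: dict[str, str], reference: str) -> list[dict[str, str]]:
    """For each Cys in the reference row, return the aligned residue (with its
    1-based ungapped position) in every row, or "-" where a row has a gap."""
    if len({len(row) for row in alignment.values()}) != 1:
        raise ValueError("all aligned rows must have the same length")
    positions = {name: 0 for name in alignment}  # ungapped residue counters per row
    hits = []
    for col in range(len(alignment[reference])):
        column = {}
        for name, row in alignment.items():
            aa = row[col]
            if aa == "-":
                column[name] = "-"  # gap: no residue aligned at this column
            else:
                positions[name] += 1
                column[name] = f"{aa}{positions[name]}"
        if alignment[reference][col] == "C":
            hits.append(column)
    return hits


toy = {"Lb": "MC-KCA", "Fn": "MCSK-A", "As": "MS-KCG"}  # hypothetical toy alignment
for row in aligned_cysteines(toy, "Lb"):
    print(row)
# {'Lb': 'C2', 'Fn': 'C2', 'As': 'S2'}
# {'Lb': 'C4', 'Fn': '-', 'As': 'C4'}
```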
The present disclosure is based in part on the discovery by the inventors that mutating surface-exposed cysteine residues of Cas12a can improve the bioavailability of Cas12a proteins. Without being bound by any particular theory, it is likely that such mutations avoid the undesired modifications described above. Provided herein are variant Cas12a proteins comprising at least one human-induced mutation. Also provided are fusion proteins comprising the variant Cas12a proteins and one or more heterologous domains. Also provided are associated nucleic acids, DNA constructs, vectors, cells, and methods of editing nucleic acids using the variant Cas12a proteins and/or fusion proteins. In some embodiments, as demonstrated in the Examples herein, the provided methods result in an increased frequency of desired nucleic acid edits. In some embodiments, the edits are SDN-1 edits. In some embodiments, the increased frequency of desired nucleic acid edits is seen at genomic sites that are difficult to edit.
III. Variant Cas12a proteins and fusion proteins
In one aspect, provided herein are variant Cas12a proteins comprising at least one human-induced mutation that have enhanced function (i.e., when compared to unmodified Cas12a proteins) . Also provided are fusion proteins comprising said variant Cas12a proteins and at least one heterologous domain. In some embodiments, the enhanced function of Cas12a is increased SDN-1 genome editing activity. In some embodiments, the variant Cas12a proteins comprise substitutions of one or more surface-exposed cysteine residues. In some embodiments, the variant Cas12a proteins comprise cysteine to serine substitutions at one or more surface-exposed cysteine residues. In some embodiments, the variant Cas12a proteins provided herein further comprise a substitution of an aspartic acid residue and/or a glutamic acid residue to an arginine residue.
Cas12a (which is also referred to as Cpf1) is a Class II, Type V CRISPR/Cas protein. A variant Cas12a protein provided herein can be a modified form of Cas12a from any of a number of bacterial species including, but not limited to, Lachnospiraceae bacterium, Acidaminococcus sp., Moraxella bovoculi, Thiomicrospira sp., Moraxella lacunata, Methanomethylophilus alvus, Butyrivibrio sp., or Bacteroidetes oral sp. Unmodified Cas12a protein sequences include Lachnospiraceae bacterium Cas12a (LbCas12a; SEQ ID NO: 1), Francisella novicida U112 Cas12a (FnCas12a; SEQ ID NO: 2), Acidaminococcus sp. Cas12a (AsCas12a; SEQ ID NO: 3), and Moraxella bovoculi strain 57922 Cas12a (Mb2Cas12a; SEQ ID NO: 4).
In some embodiments, the variant Cas12a protein is a modified form of LbCas12a. In some embodiments, the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 1 and at least one human-induced mutation. In some embodiments, the human-induced mutation is a substitution of a surface-exposed cysteine residue. Surface-exposed cysteine residues can be identified using methods known in the art, e.g., by the methods described in the Examples herein. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C965 (i.e., the cysteine residue at position 965 of SEQ ID NO: 1). In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position D156 (i.e., the aspartic acid residue at position 156 of SEQ ID NO: 1), as described for example in WO2018195545 and WO2017184768, which are incorporated herein by reference in their entirety. In some embodiments, the human-induced mutation is a substitution of the aspartic acid residue. In some embodiments, the human-induced mutation is an aspartic acid to arginine substitution. In some embodiments, the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 5-11.
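Substitutions such as C965S or D156R follow the convention wild-type residue, 1-based position, replacement residue. The hypothetical helper below applies such substitutions to a protein sequence and stops with an error if the stated wild-type residue is not found at that position, which catches numbering mistakes. A short toy sequence stands in for SEQ ID NO: 1, which is not reproduced here.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g., "C965S").
_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Return a copy of `sequence` with each point substitution applied."""
    residues = list(sequence)
    for mutation in mutations:
        match = _SUBSTITUTION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized substitution: {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


toy = "MDKCE"  # placeholder; not SEQ ID NO: 1
print(apply_substitutions(toy, ["D2R", "C4S"]))  # MRKSE
```

With the full LbCas12a sequence, the same call with ["C965S", "D156R"] would express the combined substitutions described above.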
In some embodiments, the variant Cas12a protein is a modified form of FnCas12a. In some embodiments, the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 2 and at least one human-induced mutation. In some embodiments, the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C70, C1116, and/or C1190. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position E184 (i.e., the glutamic acid residue at position 184 of SEQ ID NO: 2), as described for example in WO2018195545, which is incorporated herein by reference in its entirety. In some embodiments, the human-induced mutation is a substitution of the glutamic acid residue. In some embodiments, the human-induced mutation is a glutamic acid to arginine substitution.
In some embodiments, the variant Cas12a protein is a modified form of AsCas12a. In some embodiments, the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 3 and at least one human-induced mutation. In some embodiments, the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C334, C379, and/or C674. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position E174, as described for example in WO2018195545, which is incorporated herein by reference in its entirety. In some embodiments, the human-induced mutation is a substitution of the glutamic acid residue. In some embodiments, the human-induced mutation is a glutamic acid to arginine substitution.
In some embodiments, the variant Cas12a protein is a modified form of Mb2Cas12a. In some embodiments, the Cas12a protein comprises a sequence that is at least 60% identical (e.g., at least 65%, at least 70%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) to the amino acid sequence of SEQ ID NO: 4 and at least one human-induced mutation. In some embodiments, the human-induced mutation is a substitution of a surface-exposed cysteine residue. In some embodiments, one or more surface-exposed cysteine residues are substituted with another residue (e.g., a serine residue). In some embodiments, the human-induced mutation is at position C270, C583, C1068, C1099, and/or C1149. In some embodiments, the human-induced mutation is a substitution of the cysteine residue. In some embodiments, the human-induced mutation is a cysteine to serine substitution. In some embodiments, the Cas12a protein further comprises a human-induced mutation at position D172. In some embodiments, the human-induced mutation is a substitution of the aspartic acid residue. In some embodiments, the human-induced mutation is an aspartic acid to arginine substitution. In some embodiments, the sequence of the Cas12a protein comprises any one of SEQ ID NOs: 12-19.
A Cas protein (e.g., a Cas12a protein) can comprise one or more domains. Non-limiting examples of domains include guide nucleic acid recognition and/or binding domains, nuclease domains (e.g., DNase or RNase domains, RuvC, HNH) , DNA binding domains, RNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains. A guide nucleic acid recognition and/or binding domain can interact with a guide nucleic acid. A nuclease domain can comprise catalytic activity for nucleic acid cleavage. A nuclease domain can lack catalytic activity to prevent nucleic acid cleavage. A Cas protein can be a chimeric Cas protein that is fused to other proteins or polypeptides. A Cas protein can be a chimera of various Cas proteins, for example, comprising domains from different Cas proteins.
A Cas protein (e.g., a Cas12a protein) used herein can be an active variant, inactive variant, or fragment of a wild-type or modified Cas protein. A Cas protein can comprise an amino acid change such as a deletion, insertion, substitution, variant, mutation, fusion, chimera, or any combination thereof relative to a wild-type version of the Cas protein. A Cas protein can be a polypeptide with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to a wild-type exemplary Cas protein. A Cas protein can be a polypeptide with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild-type exemplary Cas protein. Variants or fragments can comprise at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity or sequence similarity to a wild-type or modified Cas protein or a portion thereof. Variants or fragments can be targeted to a nucleic acid locus in complex with a guide nucleic acid while lacking nucleic acid cleavage activity.
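Percent identity can be computed under more than one convention. The sketch below assumes one common choice: the number of identical aligned positions divided by the ungapped length of the reference sequence, given an existing pairwise alignment. Producing the alignment itself (e.g., with a global aligner) is outside the sketch, and the function name and toy rows are hypothetical.

```python
def percent_identity(query_row: str, ref_row: str) -> float:
    """Identical aligned positions as a percentage of the reference's ungapped length.

    Both arguments are rows of the same pairwise alignment, with "-" for gaps.
    """
    if len(query_row) != len(ref_row):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, r in zip(query_row, ref_row) if q == r and r != "-")
    ref_length = sum(1 for r in ref_row if r != "-")
    return 100.0 * identical / ref_length


pid = percent_identity("MKC-A", "MRCSA")  # toy alignment: 3 of 5 reference residues match
print(pid, pid >= 60.0)  # 60.0 True
```

Other conventions divide by the alignment length or by the shorter sequence, so reported values can differ for the same pair of sequences.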
In some embodiments, a modified Cas protein has decreased function relative to the unmodified form. In some embodiments, a modified Cas protein is deficient in a function of the unmodified form. For example, a nuclease-deficient Cas protein retains the ability to bind DNA but lacks or has reduced nucleic acid cleavage activity. A Cas nuclease (e.g., retaining wild-type nuclease activity, having reduced nuclease activity, and/or lacking nuclease activity) can function in a CRISPR/Cas system to regulate the level and/or activity of a target gene or protein (e.g., to decrease, increase, or eliminate it). The Cas protein can bind to a target polynucleotide and prevent transcription by physical obstruction, or it can edit a nucleic acid sequence to yield non-functional gene products. In some embodiments, the modified Cas protein has no more than 90%, no more than 80%, no more than 70%, no more than 60%, no more than 50%, no more than 40%, no more than 30%, no more than 20%, no more than 10%, no more than 5%, or no more than 1% of the function (e.g., nuclease activity) of the wild-type Cas protein (e.g., Cas12a). In some embodiments, the modified Cas protein has no substantial function of the wild-type Cas protein. When a Cas protein is a modified form that has no substantial nucleic acid-cleaving activity, it can be referred to as enzymatically inactive and/or “dead” (abbreviated “d”). A dead Cas protein (e.g., dCas, dCas12a) can bind to a target polynucleotide but may not cleave the target polynucleotide. In some embodiments, a Cas12a protein provided herein is a dCas12a protein.
In some embodiments, a modified Cas protein can be a modified Cas “base editor”. Base editing enables direct, irreversible conversion of one target DNA base into another in a programmable manner, without requiring DNA cleavage or a donor DNA molecule. For example, Komor et al. (2016, Nature, 533: 420-424) teach a Cas9-cytidine deaminase fusion in which the Cas9 has also been engineered to be inactivated so that it does not induce double-stranded DNA breaks. Additionally, Gaudelli et al. (2017, Nature, doi: 10.1038/nature24644) teach a catalytically impaired Cas9 fused to a tRNA adenosine deaminase, which can mediate conversion of an A•T base pair to a G•C base pair in a target DNA sequence. In some embodiments, a Cas12a protein provided herein is a modified Cas12a base editor.
A Cas protein can be modified to optimize regulation of gene expression. A Cas protein can be modified to increase or decrease nucleic acid binding affinity, nucleic acid binding specificity, and/or enzymatic activity. Cas proteins can also be modified to change any other activity or property of the protein, such as stability. For example, one or more nuclease domains of the Cas protein can be modified, deleted, or inactivated, or a Cas protein can be truncated to remove domains that are not essential for the function of the protein or to optimize (e.g., enhance or reduce) the activity of the Cas protein for regulating gene expression.
One or a plurality of the nuclease domains (e.g., RuvC, HNH) of a Cas protein can be deleted or mutated so that they are no longer functional or comprise reduced nuclease activity. For example, in a Cas protein comprising at least two nuclease domains (e.g., Cas12a) , if one of the nuclease domains is deleted or mutated, the resulting Cas protein, known as a nickase, can generate a single-strand break at a CRISPR RNA (crRNA) recognition sequence within a double-stranded DNA but not a double-strand break. Such a nickase can cleave the complementary strand or the non-complementary strand, but may not cleave both. In some embodiments, double strand break targeting specificity is improved by targeting a nickase to opposite strands at two nearby loci. If a nickase cleaves the single strand at both loci, a double strand break is formed and can be repaired as described herein. If all of the nuclease domains of a Cas protein (e.g., RuvC nuclease domains in a Cas12a protein) are deleted or mutated, the resulting Cas protein can have a reduced or no ability to cleave both strands of a double-stranded DNA. In some embodiments, a Cas12a protein provided herein is a Cas12a nickase protein.
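The paired-nickase strategy can be sketched as a simple predicate: two nicks yield a double-strand break only if they are on opposite strands and close enough together. The `max_offset` cutoff below is an illustrative assumption rather than a value from this disclosure, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nick:
    position: int  # coordinate of the nick on the reference sequence
    strand: str    # "+" or "-"

    def __post_init__(self):
        if self.strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")


def paired_nicks_form_dsb(a: Nick, b: Nick, max_offset: int = 100) -> bool:
    """Treat two nicks as a double-strand break if they lie on opposite strands
    and no more than `max_offset` bases apart (an assumed, illustrative cutoff)."""
    return a.strand != b.strand and abs(a.position - b.position) <= max_offset


print(paired_nicks_form_dsb(Nick(100, "+"), Nick(130, "-")))  # True
print(paired_nicks_form_dsb(Nick(100, "+"), Nick(130, "+")))  # False: same strand
```

Requiring two independent targeting events at nearby loci is what underlies the improved specificity described above.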
Also provided herein are fusion proteins comprising any of the proteins described above and a heterologous domain. As used throughout, a “fusion protein” is a protein comprising two different polypeptide sequences, i.e., a Cas12a protein sequence as described above and a heterologous polypeptide sequence, that are joined or linked to form a single polypeptide. In some embodiments, the two amino acid sequences are encoded by separate nucleic acid sequences that have been joined so that they are transcribed and translated to produce a single polypeptide. The Cas12a protein and the heterologous domain can be linked in any order and orientation relative to each other. For example, the C-terminal end of the Cas12a protein may be linked to the N-terminal end or the C-terminal end of the heterologous domain. The Cas12a protein and the heterologous domain may also be separated by one or more additional fusion protein domains, as described below.
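For the genetically encoded case, in which both parts are translated as one chain, orientation simply determines which part forms the N-terminal portion. The hypothetical helper below concatenates the sequences N-terminus to C-terminus with an optional linker. A C-terminus-to-C-terminus arrangement cannot be produced this way and would require a chemical (non-peptide) linkage. The sequences shown are placeholders.

```python
def assemble_fusion(cas12a: str, domain: str, linker: str = "", cas12a_first: bool = True) -> str:
    """Return a single polypeptide sequence, written N-terminus to C-terminus.

    cas12a_first=True places Cas12a at the N-terminus (its C-terminus joined,
    via the linker, to the N-terminus of the heterologous domain).
    """
    n_part, c_part = (cas12a, domain) if cas12a_first else (domain, cas12a)
    return n_part + linker + c_part


print(assemble_fusion("MKRIL", "MDEAM", "GGGGS"))                      # MKRILGGGGSMDEAM
print(assemble_fusion("MKRIL", "MDEAM", "GGGGS", cas12a_first=False))  # MDEAMGGGGSMKRIL
```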
Exemplary heterologous domains include deaminase domains, transcription factor domains, nuclease domains, reverse-transcriptase domains, transposase domains, integrase domains, uracil DNA glycosylase inhibitor domains, recombinase domains, nickase domains, methyltransferase domains, methylase domains, acetylase domains, acetyltransferase domains, transcriptional activator domains, and transcriptional repressor domains. See, e.g., WO2021/061507, incorporated herein by reference in its entirety.
In some embodiments, the fusion proteins provided herein comprise one or more linkers. Linkers, also referred to as spacers, as used herein are flexible molecules or a flexible stretch of molecules that join or connect two portions (e.g., domains) of a fusion protein or a variant Cas12a protein as provided herein. In some embodiments, the linker is a polypeptide. Proteins with domains joined by polypeptide linkers are referred to as fusion proteins. In some embodiments, the linker is a non-peptide linker. Proteins with domains joined by non-peptide linkers are referred to as modified proteins. It will be understood that, where fusion proteins are discussed throughout the present disclosure, modified proteins are generally also contemplated, where feasible.
The linker may increase the range of orientations that may be adopted by the domains of the fusion protein or variant protein. The linker may be optimized to produce desired effects in the fusion protein or variant protein. Aspects of linker design and considerations are described, for example, in Chen, X. et al., Adv Drug Deliv Rev. 2013 Oct 15; 65 (10) : 1357-1369, and Klein, J.S. et al. 2014 Protein Eng. Des. Sel. 27 (10) : 325-330. In some embodiments, the proteins provided herein comprise a peptide linker. In some embodiments, the proteins provided herein comprise a non-peptide linker. In some embodiments, the proteins provided herein comprise a peptide linker and a non-peptide linker. The proteins provided herein may also comprise a plurality of linkers, including at least one peptide linker, at least one non-peptide linker, or at least one peptide linker and at least one non-peptide linker.
Linkers may be short or long, flexible or rigid. See, e.g., WO2021/061507, WO 2020/168102, and US 2021/0017506, each of which is incorporated herein by reference in its entirety.
In some embodiments, the length of a linker may affect one or more functions of the fusion protein. Selection of linkers to achieve the desired length is within the ability of one skilled in the art. In some embodiments, a peptide linker may be, for example, 4 to 100 or more amino acids in length (e.g., 4 aa, 5 aa, 8 aa, 10 aa, 15 aa, 18 aa, 20 aa, 25 aa, 30 aa, 35 aa, 40 aa, 45 aa, 50 aa, 55 aa, 60 aa, 65 aa, 70 aa, 75 aa, 80 aa, 85 aa, 90 aa, 95 aa, or 100 aa). In some embodiments, the linker is about 30 amino acids in length. In some embodiments, the linker is about 8 amino acids in length.
Depending on its length, a linker sequence may adopt various secondary-structure conformations, such as helices, β-strands, coils/bends, and turns. In some instances, a linker sequence may have an extended conformation and function as an independent domain that does not interact with the adjacent protein domains. Linker sequences may be flexible or rigid. Flexible linkers provide a certain degree of movement or interaction between the polypeptide domains and are generally rich in small or polar amino acids such as Gly and Ser (e.g., at least 90%, at least 95%, at least 98%, at least 99%, or all of the amino acid residues of the linker are either Gly or Ser). A rigid linker can be used to keep a fixed distance between the domains and to help maintain their independent functions. Linker attachment can be through an amide linkage (e.g., a peptide bond) or other functionalities as discussed further below.
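The Gly/Ser criterion for flexible linkers can be checked directly from a linker sequence. This sketch computes the fraction of residues that are Gly or Ser and compares it with a threshold; the 0.9 default mirrors the “at least 90%” example in the paragraph above, and the function names are hypothetical.

```python
def gly_ser_fraction(linker: str) -> float:
    """Fraction of residues in the linker that are Gly (G) or Ser (S)."""
    if not linker:
        raise ValueError("linker sequence is empty")
    return sum(1 for aa in linker.upper() if aa in ("G", "S")) / len(linker)


def is_gs_flexible(linker: str, threshold: float = 0.9) -> bool:
    """True if at least `threshold` of the linker residues are Gly or Ser."""
    return gly_ser_fraction(linker) >= threshold


print(is_gs_flexible("GGGGSGGGGS"))         # True: all residues are G or S
print(is_gs_flexible("AEAAAKEAAAKEAAAKA"))  # False: a rigid, helix-forming design
```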
In some embodiments, a peptide linker described herein comprises one or more repeats (e.g., 2 repeats, 3 repeats, 4 repeats, 5 repeats, 6 repeats, or more) of GSSSS (SEQ ID NO: 43) and/or one or more repeats of GGGGS (SEQ ID NO: 44) and/or one or more repeats of GSSGSS (SEQ ID NO: 45) and/or one or more repeats of SGGS (SEQ ID NO: 77). In some embodiments, the linker comprises an amino acid sequence with at least 90% sequence identity to (GSSSS)6 (SEQ ID NO: 46) or (SGGS)2 (SEQ ID NO: 78). Additional exemplary peptide linkers include, but are not limited to, peptide linkers comprising SGSETPGTSESATPE (SEQ ID NO: 47), SGSETPGTSESATPES (SEQ ID NO: 48), (GGGGS)3 (SEQ ID NO: 49), (GGGGS)5 (SEQ ID NO: 50), (GGGGS)10 (SEQ ID NO: 51), GGGGGGGG (SEQ ID NO: 52), GSAGSAAGSGEF (SEQ ID NO: 53), A(EAAAK)3A (SEQ ID NO: 54), or A(EAAAK)10A (SEQ ID NO: 55). Additional non-limiting exemplary linkers that can be used include those disclosed in PCT/US2020/051383; Chen et al., Adv. Drug Deliv. Rev. 65(10): 1357-1369 (2013); and van Rosmalen et al., Biochemistry 2017, 56(50): 6565-6574, the entire contents of each of which are herein incorporated by reference.
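The repeat notation used for these linkers, such as (GGGGS)3 or A(EAAAK)3A, can be expanded mechanically. The hypothetical parser below handles only the simple "(unit)n" form used in this paragraph, with or without spaces around the parentheses. Expanding (GSSSS)6 gives 30 residues and (SGGS)2 gives 8, consistent with the linker lengths of about 30 and about 8 amino acids mentioned earlier.

```python
import re

# "(UNIT)n": a parenthesized run of residues followed by a repeat count.
_REPEAT = re.compile(r"\(([A-Z]+)\)(\d+)")


def expand_linker(notation: str) -> str:
    """Expand repeat notation such as "A (EAAAK) 3A" into a plain sequence."""
    compact = notation.replace(" ", "")
    expanded = _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), compact)
    if not re.fullmatch(r"[A-Z]+", expanded):
        raise ValueError(f"could not fully expand {notation!r}")
    return expanded


print(expand_linker("A (EAAAK) 3A"))    # AEAAAKEAAAKEAAAKA
print(len(expand_linker("(GSSSS)6")))   # 30
print(expand_linker("(SGGS) 2"))        # SGGSSGGS
```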
In some embodiments, a non-peptide linker can comprise any of a number of known chemical linkers. Exemplary chemical linkers can include one or more units of beta-alanine, 4-aminobutyric acid (GABA), (2-aminoethoxy)acetic acid (AEA), 6-aminohexanoic acid (Ahx), PEG multimers, and trioxatridecan-succinamic acid (Ttds). In some embodiments, the non-peptide linker comprises one or more units of polyethylene glycol (PEG), which is commonly used as a linker for conjugation of polypeptide domains due to its water solubility, lack of toxicity, low immunogenicity, and well-defined chain lengths. See, e.g., Ramirez-Paz, J., et al., PLoS One 13(7): e0197643 (2018). The number of PEG linkage units may be selected based on the desired length of the linker.
Modified proteins comprising a non-peptide linker can be produced in a variety of ways. For example, a Cas12a protein and a heterologous domain may be produced separately (e.g., in vitro or by expression in and purification from host cells) and chemically linked in vitro. In some embodiments, a Cas12a protein, a heterologous domain, and a linker can each be produced separately and chemically linked in vitro. Various chemical linkers may be used to cross-link two amino acid residues.
Also contemplated herein are embodiments in which the Cas12a protein and the heterologous domain described above are used separately (e.g., introduced into cells separately or applied to target nucleic acids separately) and brought into proximity to form a complex without the linkers described above. Various methods of forming complexes between two or more polypeptides are known in the art and include, but are not limited to, protein-protein interaction strategies (e.g., SunTag, coiled-coil), RNA aptamers and their associated binding proteins (e.g., MS2, N22), and Tag/Catcher strategies. For example, a site-directed nuclease of the present disclosure may comprise an MS2 RNA aptamer, which would facilitate interaction with a nonspecific end-processing enzyme comprising an MS2 coat protein.
In some embodiments, the fusion protein provided herein comprises an amino acid sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 1-4. In some embodiments, the fusion protein provided herein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 5-19.
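Percent identity values such as those recited above depend on how the sequences are aligned and how the count of identical residues is normalized. As a hedged illustration only, the Python sketch below computes percent identity from a simple Needleman-Wunsch global alignment (match +1, mismatch -1, linear gap penalty -2) and divides identical positions by the reference length. These scoring parameters and this normalization are assumptions, not the definition used in this disclosure; in practice an established tool with a substitution matrix and affine gap penalties (e.g., EMBOSS Needle or BLAST) would normally be used.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; '-' marks gaps."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned residues divided by the reference length, times 100 (assumed convention)."""
    aligned_q, aligned_r = global_align(query.upper(), reference.upper())
    identical = sum(1 for x, y in zip(aligned_q, aligned_r) if x == y and x != "-")
    return 100.0 * identical / len(reference)


print(percent_identity("MKGL", "MKAL"))  # 75.0
print(percent_identity("MKL", "MKAL"))   # 75.0 (one gap)
```

With toy sequences like these, a 70% threshold would be met; real comparisons would of course use the full-length sequences.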
Any of the proteins and fusion proteins described herein can further comprise a targeting sequence that mediates the localization (or retention) of the protein to a subcellular location, e.g., the plasma membrane or the membrane of a given organelle, nucleus, cytosol, mitochondria, endoplasmic reticulum (ER), Golgi, chloroplast, apoplast, peroxisome, or other organelle. For example, a targeting sequence can direct a protein (e.g., a nuclease) to the nucleus utilizing a nuclear localization signal (NLS); outside of the nucleus of a cell, for example to the cytoplasm, utilizing a nuclear export signal (NES); to mitochondria utilizing a mitochondrial targeting signal; to the ER utilizing an ER-retention signal; to a peroxisome utilizing a peroxisomal targeting signal; to the plasma membrane utilizing a membrane localization signal; or combinations thereof. In some embodiments, the protein comprises a nuclear localization signal. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 56); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 57)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 58) or RQRRNELKRSP (SEQ ID NO: 59); the hnRNP A1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 60); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 61) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 62) and PPKKARED (SEQ ID NO: 63) of the polyoma T protein; the sequence PQPKKKPL (SEQ ID NO: 64) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 65) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 66) and PKQKKRK (SEQ ID NO: 67) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 68) of the hepatitis delta virus antigen; the sequence REKKKFLKRR (SEQ ID NO: 69) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 70) of the human poly(ADP-ribose) polymerase; the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 71) of the human glucocorticoid receptor (a steroid hormone receptor); and the sequence KRPRDRHDGELGGRKRAR (SEQ ID NO: 72) of the Agrobacterium VirD2 protein.
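To illustrate how the NLS sequences listed above might be located in a candidate fusion protein, the following Python sketch reports exact matches to three of them (SEQ ID NOs: 56-58). It is a simplified sketch: it performs exact string matching only and does not predict novel or degenerate NLSs, and the toy fusion sequence is hypothetical.

```python
# A subset of the NLS sequences listed above (SEQ ID NOs: 56-58).
KNOWN_NLS = {
    "SV40 large T-antigen": "PKKKRKV",
    "nucleoplasmin bipartite": "KRPAATKKAGQAKKKK",
    "c-myc": "PAAKRVKLD",
}


def find_nls(protein: str, motifs: dict = KNOWN_NLS) -> list:
    """Return sorted (start, name) pairs for every exact, 0-based occurrence of a listed NLS."""
    protein = protein.upper()
    hits = []
    for name, motif in motifs.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((start, name))
            start = protein.find(motif, start + 1)  # keep scanning for further copies
    return sorted(hits)


# Hypothetical toy fusion: Met, SV40 NLS, a placeholder cargo, a GS spacer, and the c-myc NLS.
toy_fusion = "M" + "PKKKRKV" + "ACDEFGHIKL" + "GS" + "PAAKRVKLD"
print(find_nls(toy_fusion))  # [(1, 'SV40 large T-antigen'), (20, 'c-myc')]
```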
Any of the proteins and fusion proteins described herein can further comprise a detectable moiety, for example, a fluorescent protein or fragment thereof. Examples of fluorescent proteins include, but are not limited to, yellow fluorescent protein (YFP, for example, Venus), green fluorescent protein (GFP), and red fluorescent protein (RFP), as well as derivatives, for example, mutant derivatives, of these proteins. See, for example, Chudakov et al., “Fluorescent Proteins and Their Applications in Imaging Living Cells and Tissues,” Physiological Reviews 90 (3): 1103-1163 (2010); and Specht et al., “A Critical and Comparative Review of Fluorescent Tools for Live-Cell Imaging,” Annual Review of Physiology 79: 93-117 (2017).
Any of the proteins and fusion proteins described herein can further comprise an affinity tag, for example, a polyhistidine tag (e.g., (His)6 (SEQ ID NO: 73)), an HA tag (e.g., YPYDVPDYA (SEQ ID NO: 74)), albumin-binding protein, alkaline phosphatase, an AU1 epitope, an AU5 epitope, a biotin carboxyl carrier protein (BCCP), a FLAG epitope (e.g., DYKDDDDK (SEQ ID NO: 75)), or a MYC epitope (e.g., EQKLISEEDL (SEQ ID NO: 76)), to name a few. See Kimple et al., “Overview of Affinity Tags for Protein Purification,” Curr. Protoc. Protein Sci. 73: Unit 9.9 (2013).
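A fusion protein carrying an NLS, a linker, and affinity tags can be thought of as an ordered list of parts. The Python sketch below, offered as an illustration under stated assumptions rather than a construct from this disclosure, concatenates parts from the N- to the C-terminus and records 1-based coordinates for each part. The part order, the part names, and the ten-residue cargo are hypothetical; in an actual construct the cargo would be a Cas12a variant sequence.

```python
from typing import List, Tuple

# Tag and NLS sequences listed in this section.
HA_TAG = "YPYDVPDYA"    # SEQ ID NO: 74
SV40_NLS = "PKKKRKV"    # SEQ ID NO: 56
HIS6_TAG = "HHHHHH"     # SEQ ID NO: 73


def assemble(parts: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Concatenate (name, sequence) parts from N- to C-terminus.

    Returns the full sequence and (name, start, end) coordinates, 1-based and inclusive."""
    pieces, layout, position = [], [], 1
    for name, seq in parts:
        if not seq:
            raise ValueError(f"part {name!r} is empty")
        pieces.append(seq)
        layout.append((name, position, position + len(seq) - 1))
        position += len(seq)
    return "".join(pieces), layout


protein, layout = assemble([
    ("initiator Met", "M"),
    ("HA tag", HA_TAG),
    ("SV40 NLS", SV40_NLS),
    ("cargo placeholder", "ACDEFGHIKL"),  # hypothetical stand-in for a Cas12a variant
    ("(SGGS)2 linker", "SGGS" * 2),
    ("His6 tag", HIS6_TAG),
])
for name, start, end in layout:
    print(f"{name}: residues {start}-{end}")
print(len(protein))  # 41
```

Keeping coordinates alongside the sequence makes it straightforward to check, for example, that a C-terminal tag actually ends the protein.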
Also provided herein are variants of the polypeptides (e.g., proteins and fusion proteins) of this disclosure. Polypeptide variants retain their respective biological activity unless explicitly noted otherwise. For example, variants of a Cas12a polypeptide retain the biological function of the full-length, native-sequence, site-directed Cas12a protein. In another example, variants of the heterologous domain retain the biological function of the full-length, native-sequence heterologous domain.
Modifications to any of the polypeptides or proteins provided herein are made by known methods. By way of example, modifications are made by site-specific mutagenesis of nucleotides in a nucleic acid encoding the polypeptide, thereby producing a DNA encoding the modification, and thereafter expressing the DNA in recombinant cell culture to produce the encoded polypeptide. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known. For example, M13 primer mutagenesis and PCR-based mutagenesis methods can be used to make one or more substitution mutations. Any of the nucleic acid sequences provided herein can be codon-optimized to alter (e.g., maximize) expression in a host cell or organism.
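The effect of a codon-level substitution on the encoded protein can be checked in silico before mutagenesis is performed. The Python sketch below is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a substitution written in the conventional wild-type/position/mutant form (e.g., K2A). The function names and the toy coding sequence are hypothetical. The sketch does not design M13 or PCR mutagenesis primers, and in practice the replacement codon would be chosen with the host's codon usage in mind.

```python
import re

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def mutate(cds: str, substitution: str, new_codon: str) -> str:
    """Apply a substitution such as 'K2A' (1-based residue) by swapping in new_codon.

    Verifies the wild-type residue and that new_codon encodes the mutant residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    cds = cds.upper()
    protein = translate(cds)
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the coding sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {protein[position - 1]}")
    new_codon = new_codon.upper()
    if CODON_TABLE.get(new_codon) != mutant:
        raise ValueError(f"{new_codon} does not encode {mutant}")
    start = 3 * (position - 1)
    return cds[:start] + new_codon + cds[start + 3:]


wild_type_cds = "ATGAAAGCTTAA"  # hypothetical toy ORF: Met-Lys-Ala-stop
mutant_cds = mutate(wild_type_cds, "K2A", "GCC")
print(translate(wild_type_cds), "->", translate(mutant_cds))  # MKA* -> MAA*
```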
The amino acids in the polypeptides described herein can be any of the 20 naturally occurring amino acids, D-stereoisomers of the naturally occurring amino acids, unnatural amino acids, and chemically modified amino acids. Unnatural amino acids (that is, those that are not naturally found in proteins) are also known in the art, as set forth in, for example, Zhang et al., “Protein engineering with unnatural amino acids,” Curr. Opin. Struct. Biol. 23 (4): 581-587 (2013); Xie et al., “Adding amino acids to the genetic repertoire,” Curr. Opin. Chem. Biol. 9 (6): 548-554 (2005); and all references cited therein. β- and γ-amino acids are known in the art and are also contemplated herein as unnatural amino acids.
As used herein, a chemically modified amino acid refers to an amino acid whose side chain has been chemically modified. For example, a side chain can be modified to comprise a signaling moiety, such as a fluorophore or a radiolabel. A side chain can also be modified to comprise a new functional group, such as a thiol, carboxylic acid, or amino group. Post-translationally modified amino acids are also included in the definition of chemically modified amino acids.
Also contemplated are conservative amino acid substitutions. By way of example, conservative amino acid substitutions can be made in one or more of the amino acid residues, for example, in one or more lysine residues of any of the polypeptides provided herein. One of skill in the art would know that a conservative substitution is the replacement of one amino acid residue with another that is biologically and/or chemically similar. The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M).
By way of example, when an arginine-to-serine substitution is mentioned, a conservative substitution for the serine (e.g., threonine) is also contemplated. Nonconservative substitutions, for example, substituting a lysine with an asparagine, are also contemplated.
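The eight conservative-substitution groups listed above can be encoded directly as sets. The following Python sketch is illustrative only, and its function names are hypothetical. Because methionine belongs to two groups, its conservative alternatives are the union of groups 5 and 8.

```python
# The eight groups listed above, as one-letter codes.
CONSERVATIVE_GROUPS = [
    {"A", "G"},
    {"D", "E"},
    {"N", "Q"},
    {"R", "K"},
    {"I", "L", "M", "V"},
    {"F", "Y", "W"},
    {"S", "T"},
    {"C", "M"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues share at least one group."""
    return any(original in group and replacement in group for group in CONSERVATIVE_GROUPS)


def conservative_alternatives(residue: str) -> set:
    """Residues that are conservative substitutes for `residue`, excluding itself."""
    alternatives = set()
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            alternatives |= group
    alternatives.discard(residue)
    return alternatives


print(is_conservative("R", "K"), is_conservative("K", "N"))  # True False
print(sorted(conservative_alternatives("M")))               # ['C', 'I', 'L', 'V']
```

For instance, `conservative_alternatives("S")` returns `{"T"}`, matching the serine-to-threonine example in the text.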
IV. Recombinant nucleic acids, constructs, vectors, and host cells
Also provided herein are recombinant nucleic acids encoding any of the variant Cas12a proteins or fusion proteins described herein. For example, a recombinant nucleic acid encoding a polypeptide that has at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any of SEQ ID NOs: 20-34 is also provided. Also provided are recombinant nucleic acids having at least 70% identity to any of SEQ ID NOs: 20-34.
Also provided is a DNA construct comprising a promoter operably linked to a recombinant nucleic acid encoding a fusion protein or domains thereof as described herein. A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. Numerous promoters can be used in the constructs described herein. A promoter is a region or a sequence located upstream and/or downstream from the start of transcription that is involved in recognition and binding of RNA polymerase and other proteins to initiate transcription.
The term “promoter” as used herein refers to a nucleotide sequence, usually upstream (5′) to its coding sequence, that controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription. “Promoter regulatory sequences” consist of proximal and more distal upstream elements. Promoter regulatory sequences influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, untranslated leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. An “enhancer” is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped) and is capable of functioning even when moved either upstream or downstream from the promoter. The meaning of the term “promoter” includes “promoter regulatory sequences.”
The choice of promoters to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferential expression. It is a routine matter for one of skill in the art to modulate the expression of a sequence by appropriately selecting and positioning promoters and other regulatory regions relative to that sequence.
It has been shown that certain promoters are able to direct RNA synthesis at a higher rate than others. These are called "strong promoters". Certain other promoters have been shown to direct RNA synthesis at higher levels only in particular types of cells or tissues and are often referred to as "tissue specific promoters", or "tissue-preferred promoters", if the promoters direct RNA synthesis preferentially in certain tissues (RNA synthesis may occur in other tissues at reduced levels) . Since patterns of expression of a chimeric gene (or genes) introduced into a plant are controlled using promoters, there is an ongoing interest in the isolation of novel promoters that are capable of controlling the expression of a chimeric gene (or genes) at certain levels in specific tissue types or at specific plant developmental stages.
Certain promoters are able to direct RNA synthesis at relatively similar levels across all tissues of a plant. These are called "constitutive promoters" or "tissue-independent" promoters. Constitutive promoters can be divided into strong, moderate, and weak categories according to their effectiveness in directing RNA synthesis. Since it is necessary in many cases to simultaneously express a chimeric gene (or genes) in different tissues of a plant to obtain the desired functions of the gene (or genes), constitutive promoters are especially useful in this regard. Though many constitutive promoters have been discovered from plants and plant viruses and characterized, there is still an ongoing interest in the isolation of more novel constitutive promoters, synthetic or native, which are capable of controlling the expression of a chimeric gene (or genes) at different levels and the expression of multiple genes in the same transgenic plant for gene stacking.
Among the most commonly used promoters are the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl. Acad. Sci. USA 84: 5745-5749 (1987)); the octopine synthase (OCS) promoter; caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9: 315-324 (1987)); the light-inducible promoter from the small subunit of rubisco (Pellegrineschi et al., Biochem. Soc. Trans. 23 (2): 247-250 (1995)); the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. USA 84: 6624-6628 (1987)); the sucrose synthase promoter (Yang et al., Proc. Natl. Acad. Sci. USA 87: 4144-4148 (1990)); the R gene complex promoter (Chandler et al., Plant Cell 1: 1175-1183 (1989)); the chlorophyll a/b binding protein gene promoter; and the like.
Furthermore, it is contemplated that promoters combining elements from more than one promoter may be useful. For example, U.S. Pat. No. 5,491,288 discloses combining a Cauliflower Mosaic Virus promoter with a histone promoter. Thus, the elements from the promoters disclosed herein may be combined with elements from other promoters. Promoters that are useful for plant transgene expression include those that are inducible, viral, synthetic, constitutive (Odell et al., Nature 313: 810-812 (1985)), temporally regulated, spatially regulated, tissue-specific, and spatiotemporally regulated. Using the regulatory elements described herein, numerous agronomic genes can be expressed in transformed plants. More particularly, plants can be genetically engineered to express various phenotypes of agronomic interest.
In some embodiments of the DNA constructs provided herein, the promoter can be a eukaryotic or a prokaryotic promoter. In some embodiments, the promoter is an inducible promoter, a native inducible promoter (e.g., drought-inducible Rab17), a synthetic inducible promoter (e.g., auxin-inducible DR5, estradiol-inducible XVE/pLex, dexamethasone-inducible GVG/Gal4), a constitutive promoter (e.g., ZmUbq1, OsAct1, OsTub3, EF, EF1α), an egg cell-specific promoter (e.g., EC1, EC2, EC3, EC4, EC5), a pollen-specific promoter, an apical meristem tissue-specific promoter, or a promoter with enriched expression in the zygote. In some embodiments, the promoter is a floral mosaic promoter (e.g., ZmBde1, OsAP1). In some embodiments, the promoter is a ubiquitin 4 promoter (e.g., a sugarcane ubiquitin 4 promoter), an actin promoter, a tubulin promoter, a MADS box promoter, or a plant virus promoter. Suitable promoters are disclosed, e.g., in U.S. Pat. No. 10,519,456 and PCT/US2022/020690, the entire contents of each of which are herein incorporated by reference.
The recombinant nucleic acids provided herein can be included in expression cassettes for expression in a host cell or an organism of interest. The cassette will include 5′ and 3′ regulatory sequences, operably linked to a recombinant nucleic acid provided herein, that allow for expression of a fusion protein. The cassette may additionally contain at least one additional gene or genetic element to be cotransformed into the cell or organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene(s) or element(s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain a selectable marker gene. The expression cassette will include, in the 5′ to 3′ direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the cell or organism of interest. The promoters of the invention are capable of directing or driving expression of a coding sequence (i.e., a nucleic acid sequence that is transcribed into RNA such as mRNA, rRNA, tRNA, snRNA, ncRNA, lncRNA, sense RNA, or antisense RNA, regardless of whether the RNA is then translated to produce a protein) in a host cell. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other.
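The 5′-to-3′ arrangement of an expression cassette described above (promoter, coding sequence, termination region) can be sketched in Python as a simple concatenation with sanity checks on the coding sequence. This is a hypothetical illustration. The promoter and terminator strings are placeholders rather than real regulatory sequences, and the checks (ATG start, length divisible by 3, a single terminal stop codon) assume an intronless coding sequence read with the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_orf(cds: str) -> None:
    """Raise ValueError unless cds is an intact, intronless open reading frame."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        raise ValueError("length must be a multiple of 3 and include start and stop codons")
    if not cds.startswith("ATG"):
        raise ValueError("missing ATG start codon")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[-1] not in STOP_CODONS:
        raise ValueError("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature in-frame stop codon")


def build_cassette(promoter: str, cds: str, terminator: str) -> str:
    """Join cassette elements in the 5' to 3' direction: promoter, coding sequence, terminator."""
    if not promoter or not terminator:
        raise ValueError("promoter and terminator must be non-empty")
    check_orf(cds)
    return promoter.upper() + cds.upper() + terminator.upper()


# Placeholder elements (hypothetical, not real regulatory sequences).
PROMOTER = "CCCCCCCCCC"
TERMINATOR = "GGGGGGGGGG"
cassette = build_cassette(PROMOTER, "ATGAAAGCTTAA", TERMINATOR)
print(cassette)
```

Frameshifts and premature stops are rejected before the elements are joined, which mirrors the requirement that fragments be kept in the proper reading frame.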
As used herein, “heterologous” in reference to a sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Davis et al., eds. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory Press), Cold Spring Harbor, N.Y., and the references cited therein.
The expression cassette can also comprise a selectable marker gene for the selection of transformed cells. Marker genes include genes conferring antibiotic resistance, such as those conferring hygromycin resistance, ampicillin resistance, gentamicin resistance, neomycin resistance, to name a few. Additional selectable markers are known and any can be used.
In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
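Providing or removing convenient restriction sites starts with knowing where the sites are. The Python sketch below scans both strands of a DNA sequence for a few well-known recognition sequences (EcoRI, GAATTC; BamHI, GGATCC; BsaI, GGTCTC). The enzyme selection and the toy sequence are assumptions for illustration; a real design would draw its recognition sequences from a curated source such as REBASE. Because the BsaI site is not palindromic, its reverse complement is also searched, and reported positions are 0-based on the top strand.

```python
# Recognition sequences written 5'->3' on the top strand.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "BsaI": "GGTCTC",  # type IIS enzyme; site is not palindromic
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_sites(dna: str, sites: dict = RESTRICTION_SITES) -> dict:
    """Map each enzyme to the 0-based top-strand positions of its site on either strand."""
    dna = dna.upper()
    found = {}
    for enzyme, site in sites.items():
        patterns = {site, reverse_complement(site)}  # a single pattern if the site is palindromic
        width = len(site)
        positions = [i for i in range(len(dna) - width + 1) if dna[i:i + width] in patterns]
        if positions:
            found[enzyme] = positions
    return found


toy = "AAGAATTCGGATCCAAGAGACC"  # hypothetical sequence
print(find_sites(toy))  # {'EcoRI': [2], 'BamHI': [8], 'BsaI': [16]}
```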
Further provided is a vector comprising a recombinant nucleic acid or DNA construct set forth herein. The vector is contemplated to have the necessary functional elements that direct and regulate transcription of the inserted nucleic acid. These functional elements include, but are not limited to, a promoter; regions upstream or downstream of the promoter, such as enhancers and terminators, that may regulate the transcriptional activity of the promoter; an origin of replication; appropriate restriction sites to facilitate cloning of inserts adjacent to the promoter; antibiotic resistance genes or other markers that can serve to select for cells containing the vector or for vectors containing the insert; RNA splice junctions; a transcription termination region; or any other region that may serve to facilitate the expression of the inserted gene or hybrid gene. See generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2012. The vector, for example, can be a plasmid. In some embodiments of the DNA constructs and vectors provided herein, the constructs and vectors comprise a nopaline synthase gene terminator sequence (e.g., an Agrobacterium tumefaciens nopaline synthase gene terminator sequence).
There are numerous E. coli expression vectors known to one of ordinary skill in the art that are useful for the expression of a nucleic acid. Other microbial hosts suitable for use include bacilli, such as Bacillus subtilis, and other Enterobacteriaceae, such as Salmonella, Serratia, and various Pseudomonas species. In these prokaryotic hosts, one can also make expression vectors, which will typically contain expression control sequences compatible with the host cell (e.g., an origin of replication). In addition, any of a variety of well-known promoters will be present, such as the lactose promoter system, a tryptophan (Trp) promoter system, a beta-lactamase promoter system, or a promoter system from phage lambda. Additionally, yeast expression can be used. Provided herein is a nucleic acid encoding a polypeptide of the present invention, wherein the nucleic acid can be expressed by a yeast cell. More specifically, the nucleic acid can be expressed by Pichia pastoris or Saccharomyces cerevisiae.
Mammalian cells also permit the expression of proteins in an environment that favors important post-translational modifications such as folding and cysteine pairing, addition of complex carbohydrate structures, and secretion of active protein. Vectors useful for the expression of active proteins in mammalian cells are known in the art and can contain genes conferring hygromycin resistance, geneticin (G418) resistance, or other genes or phenotypes suitable for use as selectable markers, or methotrexate resistance for gene amplification. A number of suitable host cell lines capable of secreting intact human proteins have been developed in the art and include CHO cells, HeLa cells, HEK-293 cells, HEK-293T cells, U2OS cells, or any other primary or transformed cell line. Other suitable host cell lines include COS-7 cells, myeloma cell lines, Jurkat cells, etc. Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter, an enhancer, and necessary information processing sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites, and transcriptional terminator sequences. Preferred expression control sequences are promoters derived from immunoglobulin genes, SV40, adenovirus, bovine papillomavirus, etc.
The expression vectors described herein can also include the nucleic acids described herein under the control of an inducible promoter, such as a tetracycline-inducible promoter or a glucocorticoid-inducible promoter. The nucleic acids of the present invention can also be under the control of a tissue-specific promoter to promote expression of the nucleic acid in specific cells, tissues, or organs. Any regulatable promoter, such as a metallothionein promoter, a heat-shock promoter, or any of the many other regulatable promoters well known in the art, is also contemplated. Furthermore, a Cre-loxP inducible system can also be used, as well as a Flp recombinase inducible promoter system, both of which are known in the art.
Insect cells also permit the expression of the polypeptides. Recombinant proteins produced in insect cells with baculovirus vectors undergo post-translational modifications similar to those of wild-type mammalian proteins.
Also provided herein are host cells comprising the recombinant nucleic acids, DNA constructs, and/or vectors described herein as well as methods of making such cells. In some embodiments, the cell is a plant cell. In some embodiments, the plant cell is a maize plant cell, a wheat plant cell, a rice plant cell, a soybean plant cell, a sunflower plant cell, or a tomato plant cell.
A host cell comprising a nucleic acid or a vector described herein is provided. The host cell can be an in vitro, ex vivo, or in vivo host cell. Host cells as provided herein are capable of expressing the fusion protein. Cell populations of any of the host cells described herein are also provided. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a recombinant nucleic acid encoding the fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a DNA construct encoding the protein and/or fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a vector comprising a recombinant nucleic acid or a DNA construct encoding the protein and/or fusion protein as described herein. In some embodiments, the cell population comprises a plurality of cells, wherein the plurality of cells comprises a plurality of any of the host cells described herein. In some embodiments, a plurality of cells of any of the cell populations described herein express a protein and/or fusion protein as described herein.
In some embodiments, the provided cells express the protein and/or fusion protein stably or transiently. Stable expression of the protein and/or fusion protein in a cell refers to integration of any of the nucleic acids, DNA constructs, or vectors described herein into the genome of the cell, thereby allowing the cell to express the protein and/or fusion protein. Transient expression refers to expression of the protein and/or fusion protein directly from any of the nucleic acids, DNA constructs, and/or vectors following introduction into the cell (i.e., the gene encoding the protein and/or fusion protein is not integrated into the genome of the cell) .
In some embodiments, the provided cells express the protein and/or fusion protein constitutively or inducibly. Constitutive expression refers to ongoing, continuous expression of a gene (i.e., of a protein) , whereas inducible expression refers to gene (protein) expression that is responsive to a stimulus. Inducible expression is generally regulated via an inducible promoter, a description of which is included above.
A cell culture comprising one or more host cells described herein is also provided. Methods for the culture and production of many cells, including cells of bacterial (for example, E. coli and other bacterial strains), animal (especially mammalian), and archaebacterial origin, are available in the art. See, e.g., Sambrook, supra; Ausubel, ed. (1995) Current Protocols in Molecular Biology, John Wiley & Sons; Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, 3rd Ed., Wiley-Liss, New York, and the references cited therein; Doyle and Griffiths (1997) Mammalian Cell Culture: Essential Techniques, John Wiley and Sons, NY; Humason (1979) Animal Tissue Techniques, 4th Ed., W.H. Freeman and Company; and Ricciardelli et al. (1989) In Vitro Cell Dev. Biol. 25: 1016-1024.
The host cell can be a prokaryotic cell, including, for example, a bacterial cell. Alternatively, the cell can be a eukaryotic cell, for example, a mammalian cell. In some embodiments, the cell can be a HEK-293T cell, a HEK-293 cell, a Chinese hamster ovary (CHO) cell, a U2OS cell, or any other primary or transformed cell. In some embodiments, the cell can be a COS-7 cell, a HeLa cell, an avian cell, a myeloma cell, a Pichia cell, an insect cell, or a plant cell. A number of other suitable host cell lines have been developed and include myeloma cell lines, fibroblast cell lines, and a variety of tumor cell lines such as melanoma cell lines. The vectors containing the nucleic acid segments of interest can be transferred or introduced into the host cell by well-known methods, which vary depending on the type of cellular host.
As used herein, the term “introducing” in the context of introducing a nucleic acid into a cell (e.g., a prokaryotic cell, a bacterial cell, a eukaryotic cell, a plant cell) refers to the translocation of the nucleic acid sequence from outside a cell to inside the cell. In some cases, introducing refers to translocation of the nucleic acid from outside the cell to inside the nucleus of the cell. Where more than one nucleic acid molecule is to be introduced, these nucleic acid molecules can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotide or nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Accordingly, such polynucleotides can be introduced into cells (e.g., plant cells) in a single transformation event, in separate transformation events, or, e.g., as part of a breeding protocol. Various methods of introducing a nucleic acid into a cell are contemplated, including, but not limited to, electroporation, nanoparticle delivery, biolistic transformation, viral delivery, contact with nanowires or nanotubes, receptor-mediated internalization, translocation via cell-penetrating peptides, liposome-mediated translocation, DEAE-dextran, lipofectamine, calcium phosphate, or any method now known or identified in the future for introduction of nucleic acids into prokaryotic or eukaryotic cellular hosts. A targeted nuclease system (e.g., an RNA-guided nuclease, a transcription activator-like effector nuclease (TALEN), a zinc finger nuclease (ZFN), or a megaTAL (MT)) can also be used to introduce a nucleic acid, for example, a nucleic acid encoding a protein and/or fusion protein described herein, into a host cell. See Li et al., Signal Transduction and Targeted Therapy 5, Article No. 1 (2020).
Transformation of a cell may be stable or transient. Thus, a transgenic cell, plant cell, plant, and/or plant part of the invention can be stably transformed or transiently transformed. Transformation can refer to the transfer of a nucleic acid molecule into the genome of a host cell, resulting in genetically stable inheritance. In some embodiments, the introduction into a plant, plant part, and/or plant cell is via bacterial-mediated transformation, particle bombardment transformation, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, liposome-mediated transformation, nanoparticle-mediated transformation, polymer-mediated transformation, virus-mediated nucleic acid delivery, whisker-mediated nucleic acid delivery, microinjection, sonication, infiltration, polyethylene glycol-mediated transformation, protoplast transformation, or any other electrical, chemical, physical, and/or biological mechanism that results in the introduction of nucleic acid into the plant, plant part, and/or cell thereof, or any combination thereof.
Procedures for transforming plants are well known and routine in the art and are described throughout the literature. Non-limiting examples of methods for transformation of plants include transformation via bacterial-mediated nucleic acid delivery (e.g., via bacteria from the genus Agrobacterium), viral-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, microparticle bombardment, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, as well as any other electrical, chemical, physical (mechanical) and/or biological mechanism that results in the introduction of nucleic acid into the plant cell, including any combination thereof. General guides to various plant transformation methods known in the art include Miki et al. (“Procedures for Introducing Foreign DNA into Plants” in Methods in Plant Molecular Biology and Biotechnology, Glick, B.R. and Thompson, J.E., Eds. (CRC Press, Inc., Boca Raton, 1993), pages 67-88) and Rakoczy-Trojanowska (Cell Mol Biol Lett 7: 849-858 (2002)).
Agrobacterium-mediated transformation is a commonly used method for transforming plants because of its high efficiency of transformation and because of its broad utility with many different species. Agrobacterium-mediated transformation typically involves transfer of the binary vector carrying the foreign DNA of interest to an appropriate Agrobacterium strain; the choice of strain may depend on the complement of vir genes carried by the host Agrobacterium strain, either on a co-resident Ti plasmid or chromosomally (Uknes et al. 1993, Plant Cell 5: 159-169). The transfer of the recombinant binary vector to Agrobacterium can be accomplished by a tri-parental mating procedure using an Escherichia coli strain carrying the recombinant binary vector and a helper E. coli strain that carries a plasmid able to mobilize the recombinant binary vector to the target Agrobacterium strain. Alternatively, the recombinant binary vector can be transferred to Agrobacterium by nucleic acid transformation (and Willmitzer 1988, Nucleic Acids Res 16: 9877).
Transformation of a plant by recombinant Agrobacterium usually involves co-cultivation of the Agrobacterium with explants from the plant and follows methods well known in the art. Transformed tissue is typically regenerated on selection medium containing an antibiotic or herbicide, with the corresponding resistance marker carried between the T-DNA borders of the binary plasmid.
Another method for transforming plants, plant parts and plant cells involves propelling inert or biologically active particles at plant tissues and cells. See, e.g., US Patent Nos. 4,945,050; 5,036,006 and 5,100,792. Generally, this method involves propelling inert or biologically active particles at the plant cells under conditions effective to penetrate the outer surface of the cell and afford incorporation within the interior thereof. When inert particles are utilized, the vector can be introduced into the cell by coating the particles with the vector containing the nucleic acid of interest. Alternatively, a cell or cells can be surrounded by the vector so that the vector is carried into the cell by the wake of the particle. Biologically active particles (e.g., dried yeast cells, dried bacteria or a bacteriophage, each containing one or more nucleic acids sought to be introduced) also can be propelled into plant tissue. As used herein, the phrase “biolistic transformation” refers to a method of introducing RNA or DNA into cells (e.g., plant cells) directly, in which RNA or DNA is mixed with heavy metal particles (e.g., tungsten or gold) and released into the cell (e.g., the plant cell) using high speed pressure to allow the RNA or DNA to penetrate the cell (e.g., to penetrate the plant cell wall).
The CRISPR/Cas system can also be used to edit the genome of a host cell or organism. As detailed above, the “CRISPR/Cas” system refers to a widespread class of bacterial systems for defense against foreign nucleic acid. Any of the CRISPR/Cas system components described herein may be used to introduce proteins, fusion proteins, recombinant nucleic acids, or systems into the genome of a host cell or organism. Methods for CRISPR/Cas system mediated genome editing are known in the art. It will be understood that use of a CRISPR/Cas system for introduction of proteins, fusion proteins, recombinant nucleic acids, or systems described herein into the genome of a host cell or organism is different from the particular methods and systems provided herein.
Any of the proteins and/or fusion proteins described herein can be purified or isolated from a host cell or population of host cells. For example, a recombinant nucleic acid encoding any of the proteins and/or fusion proteins described herein can be introduced into a host cell under conditions that allow expression of the protein and/or fusion protein. In some embodiments, the recombinant nucleic acid is codon-optimized for expression. After expression in the host cell, the protein and/or fusion protein can be isolated or purified using purification methods known in the art.
V. Systems
In another aspect, provided herein are systems useful for editing one or more nucleic acids. The systems comprise one or more of the Cas12a proteins and/or fusion proteins (or recombinant nucleic acids, constructs, vectors, or host cells) described above. In some embodiments, the systems further comprise one or more additional elements that are useful for editing one or more nucleic acids. For example, a system comprising a fusion protein comprising a Cas nuclease may further comprise one or more guide nucleic acids, which are detailed below. The systems provided herein are useful for performing the methods described in Section VI of this disclosure.
In some cases, the systems and methods described herein comprise at least one guide nucleic acid polynucleotide. In some cases, the systems and methods described herein comprise a plurality of guide nucleic acids. In some embodiments, the polynucleotide can be deoxyribonucleic acid (DNA). In some cases, the DNA sequence can be single-stranded or double-stranded. In some embodiments, the at least one guide nucleic acid polynucleotide can be ribonucleic acid (guide RNA).
In some embodiments, the Cas12a protein can be complexed with the at least one guide RNA polynucleotide. The at least one guide RNA polynucleotide can comprise a nucleic acid-targeting region that comprises a sequence complementary to a nucleic acid sequence on the targeted polynucleotide, such as the targeted genomic loci or genes, to confer sequence specificity of nuclease targeting. In some embodiments, the at least one guide RNA polynucleotide can comprise either two separate nucleic acid molecules, which can be referred to as a double guide nucleic acid, or a single nucleic acid molecule, which can be referred to as a single guide nucleic acid (e.g., single guide RNA or sgRNA).
The Cas protein-binding segment of a guide nucleic acid can comprise two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another. The two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another can be covalently linked by intervening nucleotides (e.g., a linker in the case of a single guide nucleic acid). The two stretches of nucleotides (e.g., crRNA and tracrRNA) that are complementary to one another can hybridize to form a double-stranded RNA duplex or hairpin of the Cas protein-binding segment, thus resulting in a stem-loop structure. The crRNA and the tracrRNA can be covalently linked via the 3′ end of the crRNA and the 5′ end of the tracrRNA. Alternatively, tracrRNA and crRNA can be covalently linked via the 5′ end of the tracrRNA and the 3′ end of the crRNA. A crRNA can comprise the nucleic acid-targeting segment (e.g., spacer region) of the guide nucleic acid and a stretch of nucleotides that can form one half of a double-stranded duplex of the Cas protein-binding segment of the guide nucleic acid. The crRNA can also provide a single-stranded nucleic acid-targeting segment (e.g., a spacer region) that hybridizes to a target nucleic acid recognition sequence (e.g., protospacer). Whether a nuclease requires a crRNA molecule only or both a crRNA molecule and a tracrRNA molecule (whether covalently linked or not) depends on the CRISPR-associated nuclease used. Cas12a proteins, for example, do not require a tracrRNA.
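The crRNA-only and crRNA-plus-tracrRNA layouts described above can be illustrated with a minimal Python sketch. The function name `build_guide`, its parameters, and any sequences passed to it are hypothetical and are not sequences of this disclosure. The only layout facts assumed are those stated above (crRNA 3′ end joined to tracrRNA 5′ end through a linker) plus the well-established Cas12a arrangement in which the repeat-derived portion of the crRNA lies 5′ of the spacer.

```python
from typing import Optional


def to_rna(seq: str) -> str:
    """Uppercase a DNA or RNA string and replace T with U."""
    return seq.upper().replace("T", "U")


def build_guide(spacer: str, repeat: str, tracr: Optional[str] = None,
                linker: str = "", repeat_5prime: bool = True) -> str:
    """Assemble a guide RNA (5'->3') from hypothetical parts.

    repeat_5prime=True places the repeat-derived stretch 5' of the spacer
    (the Cas12a crRNA layout); False places the spacer first. If a tracrRNA
    is supplied, the crRNA 3' end is joined to the tracrRNA 5' end through
    `linker`, giving a single-guide layout.
    """
    spacer_rna, repeat_rna = to_rna(spacer), to_rna(repeat)
    if repeat_5prime:
        crrna = repeat_rna + spacer_rna
    else:
        crrna = spacer_rna + repeat_rna
    if tracr is None:
        return crrna  # crRNA-only guide, as used by Cas12a
    return crrna + to_rna(linker) + to_rna(tracr)
```

For a Cas12a-style guide only `spacer` and `repeat` are needed; the `tracr` and `linker` arguments exist solely to show the single-guide alternative for nucleases that require a tracrRNA.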
In some embodiments, the nucleic acid-targeting region of a guide nucleic acid can be between 18 and 72 nucleotides in length. The nucleic acid-targeting region of a guide nucleic acid (e.g., spacer region) can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the nucleic acid-targeting region of a guide nucleic acid (e.g., spacer region) can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 12 nt to about 18 nt, from about 12 nt to about 17 nt, from about 12 nt to about 16 nt, or from about 12 nt to about 15 nt. Alternatively, the nucleic acid-targeting region can have a length of from about 18 nt to about 20 nt, from about 18 nt to about 25 nt, from about 18 nt to about 30 nt, from about 18 nt to about 35 nt, from about 18 nt to about 40 nt, from about 18 nt to about 45 nt, from about 18 nt to about 50 nt, from about 18 nt to about 60 nt, from about 18 nt to about 70 nt, from about 18 nt to about 80 nt, from about 18 nt to about 90 nt, from about 18 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt. The length of the nucleic acid-targeting region can be at least 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30 or more nucleotides. The length of the nucleic acid-targeting region (e.g., spacer sequence) can be at most 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 30 nucleotides.
In some embodiments, the nucleic acid-targeting region of a guide nucleic acid (e.g., spacer) is 20 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 19 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 18 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 17 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 16 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 21 nucleotides in length. In some embodiments, the nucleic acid-targeting region of a guide nucleic acid is 22 nucleotides in length.
The nucleotide sequence of the guide nucleic acid that is complementary to a nucleotide sequence (target sequence) of the target nucleic acid can have a length of, for example, at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt or at least about 40 nt. The nucleotide sequence of the guide nucleic acid that is complementary to a nucleotide sequence (target sequence) of the target nucleic acid can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt.
A protospacer sequence of a targeted polynucleotide can be identified by identifying a protospacer-adjacent motif (PAM) within a region of interest and selecting a region of a desired size upstream or downstream of the PAM as the protospacer. A corresponding spacer sequence can be designed by determining the complementary sequence of the protospacer region.
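The PAM-based procedure above can be sketched in Python for a Cas12a-style PAM located 5′ of the protospacer. This is a minimal sketch under stated assumptions: the default pattern TTTV (regular expression `TTT[ACG]`, V meaning A, C or G) is the canonical PAM of commonly used Cas12a orthologs such as AsCas12a and LbCas12a, but the variants described herein may recognize other PAMs, so the pattern is a parameter; the 20-nt default length is taken from the embodiments above; `find_sites` is a hypothetical helper name. The protospacer is reported 5′→3′ on the PAM-bearing (non-target) strand, so the spacer is the same sequence in RNA form and is complementary to the opposite (target) strand.

```python
import re
from typing import Iterator, Tuple

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_sites(seq: str, pam: str = "TTT[ACG]", length: int = 20
               ) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (strand, start, protospacer, spacer_rna) for 5'-PAM sites.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    of the protospacer. The protospacer is the `length` nt immediately 3'
    of the PAM on the PAM-bearing strand.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping matches are not skipped.
        for m in re.finditer(f"(?=({pam}))", s):
            a = m.start() + len(m.group(1))  # first base after the PAM
            b = a + length
            if b > n:
                continue  # not enough sequence 3' of the PAM
            proto = s[a:b]
            # Convert minus-strand coordinates back to the forward strand.
            start = a if strand == "+" else n - b
            yield strand, start, proto, proto.replace("T", "U")
```

Scanning both the input and its reverse complement reflects that a PAM can lie on either strand of the region of interest; sites too close to the end of the input to hold a full-length protospacer are skipped.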
A spacer sequence can be identified using a computer program (e.g., machine-readable code). The computer program can use variables such as predicted melting temperature, secondary structure formation, predicted annealing temperature, sequence identity, genomic context, chromatin accessibility, %GC, frequency of genomic occurrence, methylation status, presence of SNPs, and the like.
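Two of the listed variables, %GC and frequency of genomic occurrence, are simple enough to sketch directly. The Python below is a hypothetical filter, not the computer program referred to above: the 30-70% GC window, the requirement of exactly one exact genomic match, and the tie-break toward 50% GC are illustrative assumptions, and the remaining variables (melting temperature, secondary structure, chromatin accessibility, methylation, SNPs) are omitted.

```python
from dataclasses import dataclass
from typing import List

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in `seq`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def _count_overlapping(text: str, sub: str) -> int:
    count, i = 0, text.find(sub)
    while i != -1:
        count += 1
        i = text.find(sub, i + 1)
    return count


def genome_occurrences(spacer: str, genome: str) -> int:
    """Exact (possibly overlapping) matches of `spacer` on both strands."""
    s, g = spacer.upper(), genome.upper()
    hits = _count_overlapping(g, s)
    rc = revcomp(s)
    if rc != s:  # a palindromic site would otherwise be counted twice
        hits += _count_overlapping(g, rc)
    return hits


@dataclass
class SpacerScore:
    spacer: str
    gc: float
    hits: int
    passes: bool


def rank_spacers(spacers: List[str], genome: str,
                 gc_min: float = 30.0, gc_max: float = 70.0) -> List[SpacerScore]:
    """Score and sort candidate spacers using hypothetical thresholds."""
    scores = []
    for sp in spacers:
        gc = gc_percent(sp)
        hits = genome_occurrences(sp, genome)
        # Keep spacers with moderate GC content that occur exactly once.
        passes = gc_min <= gc <= gc_max and hits == 1
        scores.append(SpacerScore(sp, gc, hits, passes))
    # Passing spacers first, then those with GC content closest to 50%.
    scores.sort(key=lambda r: (not r.passes, abs(r.gc - 50.0)))
    return scores
```

Exact string matching only detects perfect off-target copies; a practical tool would also search for near-matches, which this sketch deliberately leaves out.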
The percent complementarity between the nucleic acid-targeting sequence (e.g., a spacer sequence of the at least one guide polynucleotide as disclosed herein) and the target nucleic acid (e.g., a protospacer sequence of the one or more target loci as disclosed herein) can be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%. The percent complementarity between the nucleic acid-targeting sequence and the target nucleic acid can be at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% over about 20 contiguous nucleotides.
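Percent complementarity as used above can be computed as the fraction of positions that form Watson-Crick pairs when the two strands are aligned antiparallel. The Python below is a minimal sketch assuming an ungapped, equal-length alignment and counting only A·T/U and G·C pairs (no G·U wobble); `percent_complementarity` is a hypothetical helper name.

```python
_WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer: str, target_strand: str) -> float:
    """Percent Watson-Crick complementarity of two equal-length strands.

    Both sequences are given 5'->3'. The target strand is read in reverse
    so each spacer base is compared with the base it would pair with in an
    antiparallel, ungapped duplex. U is treated as T.
    """
    s = spacer.upper().replace("U", "T")
    t = target_strand.upper().replace("U", "T")
    if not s or len(s) != len(t):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum((a, b) in _WC_PAIRS for a, b in zip(s, reversed(t)))
    return 100.0 * matches / len(s)
```

Under this definition a 20-nt spacer with a single mismatch scores 19/20 = 95%, so the embodiment of at least 95% complementarity over about 20 contiguous nucleotides tolerates at most one mismatch in a 20-nt window.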
The Cas protein binding segment of a guide nucleic acid can have a length of from about 10 nucleotides to about 100 nucleotides, e.g., from about 10 nucleotides (nt) to about 20 nt, from about 20 nt to about 30 nt, from about 30 nt to about 40 nt, from about 40 nt to about 50 nt, from about 50 nt to about 60 nt, from about 60 nt to about 70 nt, from about 70 nt to about 80 nt, from about 80 nt to about 90 nt, or from about 90 nt to about 100 nt. For example, the Cas protein-binding segment of a guide nucleic acid can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.
The dsRNA duplex of the Cas protein-binding segment of the guide nucleic acid can have a length from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the protein-binding segment can have a length from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the Cas protein-binding segment can have a length from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp.
In some embodiments, the dsRNA duplex of the Cas protein-binding segment can have a length of 36 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some cases, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the protein-binding segment is 100%.
Guide nucleic acids of the systems of the disclosure can include modifications or sequences that provide for additional desirable features (e.g., modified or regulated stability; subcellular targeting; tracking with a fluorescent label; a binding site for a protein or protein complex; and the like). Examples of such modifications include, for example, a 5′ cap (a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, and so forth); and a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and combinations thereof).
A guide nucleic acid can comprise one or more modifications (e.g., a base modification, a backbone modification) to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A guide nucleic acid can comprise a nucleic acid affinity tag. A nucleoside can be a base-sugar combination. The base portion of the nucleoside can be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides can be nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming guide nucleic acids, the phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds can be suitable. In addition, linear compounds can have internal nucleotide base complementarity and can therefore fold in a manner as to produce a fully or partially double-stranded compound. Further, within guide nucleic acids, the phosphate groups can commonly be referred to as forming the internucleoside backbone of the guide nucleic acid. The linkage or backbone of the guide nucleic acid can be a 3′ to 5′ phosphodiester linkage.
A guide nucleic acid can comprise a modified backbone and/or modified internucleoside linkages. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.
Suitable modified guide nucleic acid backbones containing a phosphorus atom therein can include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates, 5′-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, a 5′ to 5′ or a 2′ to 2′ linkage. Suitable guide nucleic acids having inverted polarity can comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage (such as a single inverted nucleoside residue in which the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (e.g., potassium chloride or sodium chloride), mixed salts, and free acid forms can also be included.
A guide nucleic acid can comprise one or more phosphorothioate and/or heteroatom internucleoside linkages, in particular -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (a methylene (methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and -O-N(CH3)-CH2-CH2- (wherein the native phosphodiester internucleotide linkage is represented as -O-P(=O)(OH)-O-CH2-).
A guide nucleic acid can comprise a morpholino backbone structure. For example, a nucleic acid can comprise a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.
A guide nucleic acid can comprise polynucleotide backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These can include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.
A guide nucleic acid can comprise a nucleic acid mimetic. The term “mimetic” can be intended to include polynucleotides wherein only the furanose ring, or both the furanose ring and the internucleotide linkage, are replaced with non-furanose groups; replacement of only the furanose ring can also be referred to as a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid can be a peptide nucleic acid (PNA). In a PNA, the sugar-backbone of a polynucleotide can be replaced with an amide-containing backbone, in particular an aminoethylglycine backbone. The backbone in PNA compounds can comprise two or more linked aminoethylglycine units, which gives PNA an amide-containing backbone. The nucleobases can be retained and bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.
A guide nucleic acid can comprise linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. Linking groups can link the morpholino monomeric units in a morpholino nucleic acid. Non-ionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be non-ionic mimics of guide nucleic acids. A variety of compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimetic can be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a nucleic acid molecule can be replaced with a cyclohexenyl ring. CeNA DMT-protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry. The incorporation of CeNA monomers into a nucleic acid chain can increase the stability of a DNA/RNA hybrid. CeNA oligoadenylates can form complexes with nucleic acid complements with similar stability to the native complexes. A further modification can include Locked Nucleic Acids (LNAs), in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and thus a bicyclic sugar moiety. The linkage can be a methylene (-CH2-)n group bridging the 2′ oxygen atom and the 4′ carbon atom, wherein n is 1 or 2. LNA and LNA analogs can display very high duplex thermal stabilities with complementary nucleic acid (Tm = +3 to +10 ℃), stability towards 3′-exonucleolytic degradation, and good solubility properties.
A guide nucleic acid can comprise one or more substituted sugar moieties. Suitable polynucleotides can comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S-, or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl can be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. Particularly suitable are O((CH2)nO)mCH3, O(CH2)nOCH3, O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON((CH2)nCH3)2, where n and m are from 1 to about 10. A sugar substituent group can be selected from: C1 to C10 lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3, OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of a guide nucleic acid, or a group for improving the pharmacodynamic properties of a guide nucleic acid, and other substituents having similar properties. A suitable modification can include 2′-methoxyethoxy (2′-O-CH2CH2OCH3, also known as 2′-O-(2-methoxyethyl) or 2′-MOE, an alkoxyalkoxy group). A further suitable modification can include 2′-dimethylaminooxyethoxy (an O(CH2)2ON(CH3)2 group, also known as 2′-DMAOE), 2′-dimethylaminoethoxyethoxy (also known as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), or 2′-O-CH2-O-CH2-N(CH3)2.
Other suitable sugar substituent groups can include methoxy (-O-CH3), aminopropoxy (-O-CH2CH2CH2NH2), allyl (-CH2-CH=CH2), -O-allyl (-O-CH2-CH=CH2) and fluoro (F). 2′-Sugar substituent groups can be in the arabino (up) position or ribo (down) position. A suitable 2′-arabino modification is 2′-F. Similar modifications can also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked nucleotides and the 5′ position of the 5′ terminal nucleotide. Oligomeric compounds can also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.
A guide nucleic acid can also include nucleobase (or “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases can include the purine bases (e.g., adenine (A) and guanine (G)) and the pyrimidine bases (e.g., thymine (T), cytosine (C) and uracil (U)). Modified nucleobases can include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (-C≡C-CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo (particularly 5-bromo), 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine, and 3-deazaguanine and 3-deazaadenine. Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido[5,4-b][1,4]benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido[5,4-b][1,4]benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido[5,4-b][1,4]benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido[4,5-b]indol-2-one), and pyridoindole cytidine (H-pyrido[3′,2′:4,5]pyrrolo[2,3-d]pyrimidin-2-one).
Heterocyclic base moieties can include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Nucleobases can be useful for increasing the binding affinity of a polynucleotide compound. These can include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-Methylcytosine substitutions can increase nucleic acid duplex stability by 0.6-1.2 ℃ and can be suitable base substitutions (e.g., when combined with 2′-O-methoxyethyl sugar modifications).
A modification of a guide nucleic acid can comprise chemically linking to the guide nucleic acid one or more moieties or conjugates that can enhance the activity, cellular distribution or cellular uptake of the guide nucleic acid. These moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups. Conjugate groups can include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that can enhance the pharmacokinetic properties of oligomers. Conjugate groups can include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes. Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid. Groups that can enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a nucleic acid. Conjugate moieties can include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether (e.g., hexyl-S-tritylthiol), a thiocholesterol, an aliphatic chain (e.g., dodecanediol or undecyl residues), a phospholipid (e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate), a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety.
In some embodiments, the at least one guide RNA polynucleotide of a system or method provided herein can bind to at least a portion of a genome (e.g., a plant genome) or a gene (e.g., a plant gene). In some cases, the at least one guide RNA polynucleotide is capable of forming a complex with a Cas12a protein to direct the protein to target the portion of a target nucleic acid (e.g., a site in a genome or a gene).
In some embodiments, the systems described herein comprise at least one guide RNA polynucleotide that is able to form a complex with a Cas12a protein or fusion protein of the system. In some embodiments, the systems described herein comprise at least two (e.g., at least three, at least four, at least five, or at least six) different guide RNA polynucleotides that are able to form a complex with a site-directed nuclease portion of a fusion protein of the system.
In some embodiments, the guide nucleic acid comprises a nucleotide sequence having at least 70% (e.g., at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 27-34 as set forth in Table 5.
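A percent-identity threshold such as the one above depends on how identity is counted, and the disclosure's own definition is not reproduced in this section. The following Python sketch uses one common convention and assumes the two sequences have already been aligned: they are of equal length and use `-` for gaps. The function name `percent_identity` and the example strings are hypothetical. They are not the gRNA sequences of Table 5.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Convention assumed here: identical residues divided by the number of
    alignment columns in which at least one sequence has a residue
    (columns that are gaps in both sequences are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no residue from either sequence
        columns += 1
        if a == b:  # a gap never equals a residue, so gap columns count as mismatches
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residue columns")
    return 100.0 * matches / columns


# Hypothetical 20-nt spacer compared with a copy carrying two substitutions.
query = "GACUGAUCGAUCGAUCGAUU"
variant = "GACUGAUCGAACGAUCGAUA"
print(percent_identity(query, variant))  # 18 of 20 columns match
print(percent_identity(query, variant) >= 70.0)
```

Tools such as BLAST may normalize identity differently, for example over the shorter sequence or over aligned columns only. The choice of denominator should therefore follow whatever definition the disclosure adopts.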
Table 5. Exemplary gRNA sequences
Also provided herein are kits that include the components of the systems described in this disclosure. In some embodiments, the kits include one or more of the fusion proteins and/or polynucleotides described herein.
VI. Methods
In another aspect, provided herein are methods for editing one or more nucleic acids using the Cas12a proteins, fusion proteins and/or systems described herein. In some embodiments, the methods comprise contacting a nucleic acid (i.e., the nucleic acid to be edited) with at least one Cas12a protein and/or fusion protein as described herein. In some embodiments, the methods further comprise contacting the nucleic acid with a guide RNA (e.g., as described in Section V above) having a region complementary to a selected portion of the nucleic acid. In some embodiments, contacting the nucleic acid with the Cas12a protein and/or fusion protein and the guide RNA results in an edit to the nucleic acid. The nucleic acid (i.e., the nucleic acid to be edited) can be any suitable nucleic acid. In some embodiments, the nucleic acid is a portion of a chromosome. In some embodiments, the nucleic acid is a portion of a genome (e.g., a plant genome).
As described herein and demonstrated in the Examples below, the methods provided herein can result in an increased frequency of one or more desired nucleic acid editing outcomes (e.g., SDN-1 editing). In some embodiments, SDN-1 editing efficiency can be measured by dividing the number of plants with an insertion or deletion (“indel”) by the total number of transgenic plants. In some embodiments, use of a Cas12a protein or fusion protein provided herein results in an increase in SDN-1 editing efficiency relative to use of an unmodified (i.e., wild-type) Cas12a protein. In some embodiments, indel events can be further analyzed for the occurrence of homozygous edits (i.e., the same indel is present at both alleles of the target nucleic acid) and biallelic edits (i.e., different indels are present at each allele of the target nucleic acid). In some embodiments, the rate of homozygous/biallelic edits can be measured by dividing the number of plants with homozygous/biallelic edits by the total number of plants with indels. In some embodiments, use of a Cas12a protein or fusion protein provided herein results in an increase in the rate of homozygous/biallelic edits.
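The two ratios defined above can be written as a short Python sketch. The sketch assumes that each transgenic plant is genotyped at a single diploid target locus. Each plant is summarized as two allele calls: an indel label, or `None` for an unedited allele. Chimeric or multi-allelic calls are not modeled. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PlantGenotype:
    transgenic: bool          # e.g., confirmed by a TaqMan assay
    allele1: Optional[str]    # indel label at allele 1, or None if unedited
    allele2: Optional[str]    # indel label at allele 2, or None if unedited


def edit_class(p: PlantGenotype) -> str:
    """Classify one plant at the target locus."""
    edited = [a for a in (p.allele1, p.allele2) if a is not None]
    if not edited:
        return "unedited"
    if len(edited) == 1:
        return "heterozygous"   # one edited allele, one wild-type allele
    if p.allele1 == p.allele2:
        return "homozygous"     # same indel at both alleles
    return "biallelic"          # different indels at the two alleles


def sdn1_metrics(plants: List[PlantGenotype]) -> Tuple[float, float]:
    """Return (SDN-1 editing efficiency, homozygous/biallelic rate)."""
    classes = [edit_class(p) for p in plants if p.transgenic]
    indel = [c for c in classes if c != "unedited"]
    # Efficiency: plants with an indel / all transgenic plants.
    efficiency = len(indel) / len(classes) if classes else 0.0
    # Rate: homozygous or biallelic plants / plants with an indel.
    hb = sum(1 for c in indel if c in ("homozygous", "biallelic"))
    hb_rate = hb / len(indel) if indel else 0.0
    return efficiency, hb_rate


# Hypothetical genotypes; the last plant is not transgenic and is excluded.
plants = [
    PlantGenotype(True, None, None),
    PlantGenotype(True, "-5bp", None),
    PlantGenotype(True, "-5bp", "-5bp"),
    PlantGenotype(True, "-5bp", "+1bp"),
    PlantGenotype(False, "-5bp", "-5bp"),
]
print(sdn1_metrics(plants))  # efficiency 3/4, homozygous/biallelic rate 2/3
```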
The methods herein comprise providing a Cas12a protein and/or fusion protein and a nucleic acid to be edited and can also comprise providing at least one guide RNA. These various components can be provided using any suitable technique. For example, providing a Cas12a protein or fusion protein can comprise introducing the Cas12a protein or fusion protein into a cell or introducing a recombinant nucleic acid, construct, or vector encoding the Cas12a protein or fusion protein into a cell. Similarly, a gRNA can be provided by introducing the gRNA itself or a nucleic acid sequence encoding the gRNA. In some embodiments, a Cas12a protein and/or fusion protein and a gRNA can be encoded by the same DNA construct or vector.
EXAMPLES
Example 1. C965S acts synergistically with D156R to improve the efficiency of SDN1 editing at difficult target sites in maize
By analyzing the crystal structure of LbCas12a (PDB entry 5XUS), two surface-exposed cysteine residues (Cys965 and Cys1090) and a third cysteine residue close to the N-terminus (Cys10) of the native LbCas12a protein (SEQ ID NO: 1) were selected for site-directed mutagenesis (FIG. 1). A total of five LbCas12a variants were generated. In the first variant, named LbCas12a-C965S, the Cys965 residue was mutated to a serine residue. In the second variant, named LbCas12a-C10S-C965S, both the Cys10 and Cys965 residues were mutated to serine residues. In the third variant, named LbCas12a-C965S-C1090S, both the Cys965 and Cys1090 residues were mutated to serine residues. In the fourth variant, named LbCas12a-D156R-C965S, the Cys965 residue was mutated to a serine residue and the Asp156 residue was mutated to an arginine residue. In the fifth variant, named LbCas12a-D156R, only the Asp156 residue was mutated to an arginine residue. The coding sequence of LbCas12a was optimized based on maize-preferred codon usage; the codon triplet of the selected cysteine residues, TGC, was mutated to the serine-coding TCC by introducing the mutation in overlapping PCR primers. For all five variants, plus the wildtype LbCas12a as a control, an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker. Two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were fused to the C-terminus via a second flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
For each of the five variants plus the wildtype control, a binary vector was constructed to express that variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generating performance of the variant. In each construct, the coding sequence of one variant, fused with an NLS, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells. In all constructs, the same gRNA array, driven by an Oryza sativa U6 promoter, was designed to express a gRNA targeting the maize gene Starch Branching Enzyme IIb (ZmSBEIIb). The gRNA was based on the mature crRNA scaffold of LbCas12a. Transgenic maize plants were generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
The leaf sheaths of regenerated plantlets were sampled for DNA extraction, and the transgenic plants were identified by TaqMan qPCR assays. The sequence spanning the target site was PCR-amplified and Sanger-sequenced to determine the genotype and the SDN1 efficiency at the target site. Tables 6 and 7 compare the SDN1 efficiencies at the SBEIIb target site and the Wx1 target site. The SBEIIb target site is difficult to edit with Cas12a. In comparison with the wildtype, C965S alone slightly improved the overall SDN1 efficiency at SBEIIb but did not improve the rate of homozygous/biallelic edits. Neither C10S nor C1090S had a positive effect on top of C965S. In contrast, when paired with D156R, C965S increased the overall SDN1 efficiency 4-fold over the wildtype, and more than half of the edits were homozygous or biallelic. By comparison, D156R alone increased the SDN1 efficiency at the ZmSBEIIb target site 3-fold, with about half of the edits being homozygous or biallelic. At Wx1, the SDN1 editing efficiencies of the variants were similar to, or only modestly higher than, that of the wildtype.
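A statement such as “4-fold over the wildtype” compares efficiencies as a ratio, not as a difference in percentage points. The Python sketch below shows both calculations using exact fractions. The plant counts are hypothetical and are not taken from Tables 6 and 7. The function names are also hypothetical.

```python
from fractions import Fraction


def efficiency(edited: int, total: int) -> Fraction:
    """Editing efficiency as an exact fraction (edited plants / transgenic plants)."""
    if total <= 0:
        raise ValueError("total must be a positive number of plants")
    if not 0 <= edited <= total:
        raise ValueError("edited must lie between 0 and total")
    return Fraction(edited, total)


def fold_over_control(variant: Fraction, control: Fraction) -> Fraction:
    """Ratio of a variant's efficiency to the control's efficiency."""
    if control == 0:
        raise ZeroDivisionError("control efficiency is zero; fold change is undefined")
    return variant / control


# Hypothetical counts: control 5/50 plants edited, variant 20/50 plants edited.
wt = efficiency(5, 50)
var = efficiency(20, 50)
print(fold_over_control(var, wt))   # ratio of the two efficiencies
print(float((var - wt) * 100))      # difference in percentage points
```

Exact fractions avoid floating-point rounding when small plant counts are divided. They also make the ratio easy to compare against reported fold values.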
Table 6. SDN1 editing efficiencies of LbCas12a variants in maize.
Table 7. SDN1 editing efficiencies of LbCas12a variants in maize.
Example 2. C965S acts synergistically with D156R to improve the efficiency of SDN1 editing in soybean
In order to assess the efficacy of C965S in improving SDN1-inducing efficiency of LbCas12a in soybean, two LbCas12a variants, LbCas12a-D156R and LbCas12a-D156R-C965S, will be compared for their SDN1-generating performance. The two variants are identical to those tested in maize as described in Example 1, except that the coding sequences were optimized based on Arabidopsis-preferred codon usage.
For each variant, two binary vectors were constructed to test the SDN1 efficiency at different target loci. In each construct, the coding sequence of one variant, fused with an NLS, is operably linked to a promoter and a terminator for strong constitutive expression in soybean cells. An example promoter is an Arabidopsis elongation factor 1 alpha (EF1α) promoter, and an example terminator is an Agrobacterium tumefaciens nopaline synthase gene terminator. In all constructs, a gRNA or a gRNA array, driven by a soybean ubiquitin 1 promoter, is designed to express gRNA(s) targeting a site in the soybean genome, such as FAD2 (SEQ ID NO: 38 provides the LbCas12a gRNA targeting the soybean FAD2-1A gene). The gRNA(s) are based on the mature crRNA scaffold of LbCas12a and are processed by self-cleaving ribozymes on the flanks. Transgenic soybean plants are generated by infecting mature soybean seeds with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
The leaves of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays. The sequence spanning each target site will be PCR-amplified and Sanger-sequenced to determine the genotypes and the SDN1 efficiencies at the target sites.
Example 3. Generation and identification of FnCas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
A total of 9 cysteine residues (Cys70, Cys473, Cys568, Cys717, Cys882, Cys1086, Cys1116, Cys1190, and Cys1196) exist in the FnCas12a primary sequence (SEQ ID NO: 2). The crystal structures of FnCas12a (PDB entries 5NFV and 6I1K) suggested that four cysteine residues (Cys70, Cys473, Cys1116, and Cys1190) are most likely surface-exposed and thus might be prone to undesired interactions and/or modifications. The surface topography around Cys473 suggests that it is difficult for an interacting protein or a modifying enzyme to access. Our PyMOL analysis suggests that Cys1116 and Cys1190 are likely to form an intramolecular disulfide bond (FIG. 2). Therefore, Cys70, Cys1116, and Cys1190 were selected for substitution.
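The residue selection above combines two inputs. The first is the positions of all cysteines in the primary sequence. The second is a structure-based judgment of which of them are exposed and accessible. The Python sketch below shows the bookkeeping step only. The exposure and accessibility calls are supplied by hand (here, taken from the text), not computed from PDB entries 5NFV or 6I1K. A synthetic sequence with cysteines at the nine stated positions stands in for SEQ ID NO: 2. The function names are hypothetical.

```python
from typing import Iterable, List


def residue_positions(protein: str, residue: str = "C") -> List[int]:
    """1-based positions of a given residue in a protein sequence."""
    return [i for i, aa in enumerate(protein.upper(), start=1) if aa == residue]


def select_for_substitution(protein: str, exposed: Iterable[int],
                            inaccessible: Iterable[int]) -> List[int]:
    """Exposed cysteines minus those judged inaccessible, in sequence order."""
    cys = set(residue_positions(protein, "C"))
    exposed, inaccessible = set(exposed), set(inaccessible)
    not_cys = (exposed | inaccessible) - cys
    if not_cys:
        raise ValueError(f"not cysteine positions: {sorted(not_cys)}")
    return sorted(exposed - inaccessible)


# Synthetic stand-in: poly-Ala with Cys at the nine positions listed for FnCas12a.
fn_cys = [70, 473, 568, 717, 882, 1086, 1116, 1190, 1196]
seq = ["A"] * 1300
for pos in fn_cys:
    seq[pos - 1] = "C"
stand_in = "".join(seq)

print(residue_positions(stand_in))
print(select_for_substitution(stand_in, exposed=[70, 473, 1116, 1190], inaccessible=[473]))
```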
All cysteine-substituted variants were generated on the basis of the FnCas12a-E184R variant. Three variants carrying a single Cys-to-Ser substitution (FnCas12a-E184R-C70S, FnCas12a-E184R-C1116S, FnCas12a-E184R-C1190S) were generated, as well as three variants carrying double Cys-to-Ser substitutions (FnCas12a-E184R-C70S-C1116S, FnCas12a-E184R-C70S-C1190S, FnCas12a-E184R-C1116S-C1190S). The coding sequence of FnCas12a was optimized based on maize-preferred codon usage. The codon triplet of the selected cysteine residues, TGC, was mutated to the serine-coding TCC for C1116 and to AGC for C70 and C1190 by introducing the mutation in overlapping PCR primers. For all six variants, plus the FnCas12a-E184R control, an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker. Two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were fused to the C-terminus via a second flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
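The Cys-to-Ser codon changes described above (TGC to TCC, or TGC to AGC) can be sketched as an in-frame codon replacement. The replacement checks that the original codon actually encodes cysteine. This Python illustration assumes that residue numbers count from the first codon of the Cas12a coding sequence itself, excluding NLS and linker codons. It models the intended sequence change only, not the overlapping-PCR primer design. The toy sequence and the function names are hypothetical.

```python
CYS_CODONS = {"TGT", "TGC"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def substitute_codon(cds: str, residue: int, new_codon: str,
                     expected=CYS_CODONS, allowed=SER_CODONS) -> str:
    """Replace the codon of a 1-based residue number in an in-frame CDS."""
    cds, new_codon = cds.upper(), new_codon.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    if new_codon not in allowed:
        raise ValueError(f"{new_codon} is not an allowed replacement codon")
    start = (residue - 1) * 3
    if residue < 1 or start + 3 > len(cds):
        raise IndexError(f"residue {residue} is outside the CDS")
    old = cds[start:start + 3]
    if old not in expected:
        raise ValueError(f"residue {residue} is encoded by {old}, not an expected codon")
    return cds[:start] + new_codon + cds[start + 3:]


def n_changes(a: str, b: str) -> int:
    """Number of differing nucleotides between two codons."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy CDS: M-A-C-K-C-stop, with Cys at residues 3 and 5.
toy = "ATGGCTTGCAAATGCTAA"
edited = substitute_codon(substitute_codon(toy, 3, "TCC"), 5, "AGC")
print(edited)
print(n_changes("TGC", "TCC"), n_changes("TGC", "AGC"))  # each is a single-base change
```

For Cys-to-Ala designs such as the quintuple Mb2Cas12a variant, the same function can be called with `allowed={"GCT", "GCC", "GCA", "GCG"}`.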
For each of the six variants plus the FnCas12a-E184R control, a binary vector was constructed to express that variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generating performance of the variant. In each construct, the coding sequence of one variant, fused with an NLS, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells. In all constructs, the same gRNA array, driven by an Oryza sativa U6 promoter, was designed to express three gRNAs targeting three different maize genes: Waxy1 (ZmWx1), Glossy2 (ZmGL2), and Starch Branching Enzyme IIb (ZmSBEIIb). The gRNAs were based on the mature crRNA scaffold of FnCas12a. Transgenic maize plants will be generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
The leaf sheaths of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays. The sequences spanning each of the three target sites will be PCR-amplified and Sanger-sequenced to determine the genotypes and the SDN1 efficiencies at the target sites. To assess the efficacy of the cysteine substitutions, both the overall SDN1 editing efficiency and the rate of homozygous/biallelic mutants of each variant will be compared to those of the FnCas12a-E184R control.
Example 4. Generation and identification of AsCas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
A total of 8 cysteine residues (Cys65, Cys205, Cys334, Cys379, Cys608, Cys674, Cys1025, and Cys1248) exist in the AsCas12a primary sequence (SEQ ID NO: 3). The crystal structure of AsCas12a (PDB entry 5KK5) suggests that three cysteine residues, Cys334, Cys379, and Cys674, are most likely surface-exposed and thus prone to undesired interactions and/or modifications. These three residues were selected for substitution.
All cysteine-substituted variants were generated on the basis of the AsCas12a-E174R variant. Three variants carrying a single Cys-to-Ser substitution (AsCas12a-E174R-C334S, AsCas12a-E174R-C379S, AsCas12a-E174R-C674S) were generated, as well as three variants carrying double Cys-to-Ser substitutions (AsCas12a-E174R-C334S-C379S, AsCas12a-E174R-C334S-C674S, AsCas12a-E174R-C379S-C674S). The coding sequence of AsCas12a was optimized based on maize-preferred codon usage. The codon triplet of the selected cysteine residues, TGC, was mutated to the serine-coding TCC by introducing the mutation in overlapping PCR primers. For all six variants, plus the AsCas12a-E174R control, an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker. Two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were fused to the C-terminus via a second flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
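The variant set above follows a simple combinatorial design on a fixed base variant: every single substitution of the selected cysteines, plus every pairwise combination. The Python sketch below enumerates such designs with `itertools.combinations` and reproduces the six AsCas12a names listed above. The function name and the naming convention (base name followed by `-` and each substitution) are assumptions for illustration. The same call with `sizes=(1, 5)` yields five single mutants plus one quintuple mutant, matching the Mb2Cas12a design in Example 5.

```python
from itertools import combinations
from typing import Iterable, List, Sequence


def variant_names(base: str, sites: Sequence[str], sizes: Iterable[int],
                  new_aa: str = "S") -> List[str]:
    """Names for all k-site substitution combinations, for each k in sizes.

    Each site is a wild-type residue plus position (e.g. "C334"); the
    substituted residue letter is appended, and sites keep input order.
    """
    names = []
    for k in sizes:
        for combo in combinations(sites, k):
            names.append(base + "".join(f"-{site}{new_aa}" for site in combo))
    return names


as_sites = ["C334", "C379", "C674"]
for name in variant_names("AsCas12a-E174R", as_sites, sizes=(1, 2)):
    print(name)
```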
For each of the six variants plus the AsCas12a-E174R control, a binary vector was constructed to express that variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generating performance of the variant. In each construct, the coding sequence of one variant, fused with an NLS, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells. In all constructs, the same gRNA array, driven by an Oryza sativa U6 promoter, was designed to express three gRNAs targeting three different maize genes: Waxy1 (ZmWx1), Glossy2 (ZmGL2), and Starch Branching Enzyme IIb (ZmSBEIIb). The gRNAs were based on the mature crRNA scaffold of AsCas12a. Transgenic maize plants will be generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
The leaf sheaths of regenerated plantlets will be sampled for DNA extraction, and the transgenic plants will be identified by TaqMan assays. The sequences spanning each of the three target sites will be PCR-amplified and Sanger-sequenced to determine the genotypes and the SDN1 efficiencies at the target sites. To assess the efficacy of the cysteine substitutions, both the overall SDN1 editing efficiency and the rate of homozygous/biallelic mutants of each variant will be compared to those of the AsCas12a-E174R control.
Example 5. Generation and identification of Mb2Cas12a cysteine-substituted variants with enhanced in planta SDN1 editing efficiency.
No crystal structure of Mb2Cas12a (from Moraxella bovoculi strain 57922) has been published to date. The crystal structure of MbCas12a (PDB entry 6IV6) from M. bovoculi strain 22581 was therefore used as a reference structure to estimate the locations of the cysteine residues in Mb2Cas12a. MbCas12a is the closest ortholog and shares 94.7% amino acid identity with Mb2Cas12a. A total of 8 cysteine residues (Cys270, Cys307, Cys583, Cys662, Cys1068, Cys1099, Cys1149, and Cys1162) exist in the primary sequence of Mb2Cas12a from strain 57922 (SEQ ID NO: 4). They correspond, respectively, to Cys283, Cys320, Cys593, Cys672, Cys1078, Cys1109, Cys1159, and Tyr1172 in MbCas12a from strain 22581. This estimation suggests that Cys270, Cys307, Cys583, Cys1068, Cys1099, Cys1149, and Cys1162 are likely exposed on the surface of Mb2Cas12a. Because Cys1162 of Mb2Cas12a aligns to Tyr1172 in MbCas12a, Tyr1172 was mutated in silico in the 6IV6 structure, and the structure was remodeled with PyMOL. The resulting structural model suggests that Cys1162 is also likely surface-exposed in Mb2Cas12a. However, the surface topology suggested that Cys1162 is difficult for an interacting protein or a modifying enzyme to access. Therefore, Cys270, Cys583, Cys1068, Cys1099, and Cys1149 were selected for site-directed mutagenesis.
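Transferring residue numbers from Mb2Cas12a to the MbCas12a structure, as described above, amounts to walking a pairwise alignment and counting non-gap positions in each sequence. The Python sketch below shows that bookkeeping on a hypothetical toy alignment, not the actual Mb2Cas12a/MbCas12a alignment. It reproduces the kind of result described above, in which a query cysteine lines up with a tyrosine in the reference. The function name is hypothetical.

```python
from typing import Dict, Tuple


def map_positions(aligned_query: str, aligned_ref: str) -> Dict[int, Tuple[int, str]]:
    """Map 1-based query residue numbers to (reference number, reference residue).

    Query residues aligned to a gap in the reference are left unmapped.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    mapping: Dict[int, Tuple[int, str]] = {}
    qpos = rpos = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q != "-":
            qpos += 1  # advance query numbering only on residues
        if r != "-":
            rpos += 1  # advance reference numbering only on residues
        if q != "-" and r != "-":
            mapping[qpos] = (rpos, r)
    return mapping


# Hypothetical toy alignment: a one-residue insertion in the reference shifts numbering.
query_aln = "MK-CLC"
ref_aln = "MRACLY"
m = map_positions(query_aln, ref_aln)
print(m[3], m[5])  # query Cys3 -> reference Cys4; query Cys5 -> reference Tyr6
```

In this toy case, the constant numbering shift after the insertion mirrors the offsets listed above, such as Cys583 corresponding to Cys593.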
All cysteine-substituted variants were generated on the basis of the Mb2Cas12a-D172R variant, which served as the control for the new variants. Five variants carrying a single Cys-to-Ser substitution were generated: Mb2Cas12a-D172R-C270S, Mb2Cas12a-D172R-C583S, Mb2Cas12a-D172R-C1068S, Mb2Cas12a-D172R-C1099S, and Mb2Cas12a-D172R-C1149S. Two further variants were also generated: one carrying quintuple Cys-to-Ser substitutions (Mb2Cas12a-D172R-C270S-C583S-C1068S-C1099S-C1149S) and one carrying quintuple Cys-to-Ala substitutions (Mb2Cas12a-D172R-C270A-C583A-C1068A-C1099A-C1149A). The coding sequence of Mb2Cas12a was optimized based on maize-preferred codon usage. For the five single-mutation variants, the codon triplet of the selected cysteine residue, TGC, was mutated to the serine-coding TCC by introducing the mutation in overlapping PCR primers. For the two quintuple-mutation variants, the Mb2Cas12a coding sequence was synthesized with each targeted TGC replaced by the serine-coding TCC or the alanine-coding GCC, respectively. For all seven variants, plus the Mb2Cas12a-D172R control, an SV40 NLS (SEQ ID NO: 56) was fused to the N-terminus via a flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker. Two SV40 NLSs separated by an 8-amino acid (SGGS)2 (SEQ ID NO: 78) peptide linker were fused to the C-terminus via a second flexible 30-amino acid (GSSSS)6 (SEQ ID NO: 46) peptide linker.
For each of the seven variants plus the Mb2Cas12a-D172R control, a binary vector was constructed to express that variant (or the control) in stable transgenic maize plants, in order to assess the SDN1-generating performance of the variant. In each construct, the coding sequence of one variant, fused with an NLS, was operably linked to a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator for strong constitutive expression in maize cells. In all constructs, the same gRNA array, driven by a sugarcane ubiquitin 4 gene promoter and an Agrobacterium tumefaciens nopaline synthase gene terminator, was designed to express four gRNAs targeting four different maize genes: Waxy1 (ZmWx1), Benzoxazinone synthesis 9 (ZmBx9), Glossy2 (ZmGL2), and ZmBINa. The gRNAs were based on the mature crRNA scaffold of LbCas12a and were processed by self-cleaving ribozymes on the flanks. Transgenic maize plants were generated by infecting calli derived from immature maize embryos with an Agrobacterium tumefaciens strain harboring one of the binary vectors described above, followed by tissue culture procedures.
The leaf sheaths of regenerated plantlets were sampled for DNA extraction, and the transgenic plants were identified by TaqMan assays. The sequence spanning each target site was PCR-amplified and Sanger-sequenced to determine the genotypes and the SDN1 efficiencies at the target sites. As summarized in Table 8, every variant with a single Cys-to-Ser mutation increased the rate of homozygous/biallelic mutants relative to the Mb2Cas12a-D172R control. The efficacy of stacking the five cysteine mutations will be determined similarly.
Table 8. SDN1 editing efficiencies of Mb2Cas12a variants in maize.
LIST OF REFERENCED SEQUENCES
SEQ ID NO: 1 -- Lachnospiraceae bacterium Cas12a protein (LbCas12a)
SEQ ID NO: 2 -- Francisella novicida U112 Cas12a protein (FnCas12a)
SEQ ID NO: 3 -- Acidaminococcus sp. Cas12a protein (AsCas12a)
SEQ ID NO: 4 -- Moraxella bovoculi strain 57922 Cas12a protein (Mb2Cas12a)
SEQ ID NO: 5 -- amino acid sequence of LbCas12a + linker:
SEQ ID NO: 6 -- amino acid sequence of LbCas12a D156R:
SEQ ID NO: 7 -- amino acid sequence of LbCas12a + D156R + C965S:
SEQ ID NO: 8 -- amino acid sequence of LbCas12a + C10S + C965S:
SEQ ID NO: 9 -- amino acid sequence of LbCas12a + C965S + C1090S:
SEQ ID NO: 10 -- amino acid sequence of LbCas12a + linker + D156R:
SEQ ID NO: 11 -- amino acid sequence of LbCas12a + linker + D156R + C965S:
SEQ ID NO: 12 -- amino acid sequence of Mb2Cas12a + linker + D172R:
SEQ ID NO: 13 -- amino acid sequence of Mb2Cas12a + linker + D172R + C270S:
SEQ ID NO: 14 -- amino acid sequence of Mb2Cas12a + linker + D172R + C583S:
SEQ ID NO: 15 -- amino acid sequence of Mb2Cas12a + linker + D172R + C1068S:
SEQ ID NO: 16 -- amino acid sequence of Mb2Cas12a + linker + D172R + C1099S:
SEQ ID NO: 17 -- amino acid sequence of Mb2Cas12a + linker + D172R + C1149S:
SEQ ID NO: 18 -- amino acid sequence of Mb2Cas12a + linker + D172R + C270S + C583S + C1068S + C1099S + C1149S:
SEQ ID NO: 19 -- amino acid sequence of Mb2Cas12a + linker + D172R + C270A + C583A + C1068A + C1099A + C1149A:
SEQ ID NO: 20 -- nucleic acid sequence encoding LbCas12a + linker, maize codon-optimized:
SEQ ID NO: 21 -- nucleic acid sequence encoding LbCas12a + linker + D156R, maize codon-optimized:
SEQ ID NO: 22 -- nucleic acid sequence encoding LbCas12a + linker + D156R + C965S, maize codon-optimized:
SEQ ID NO: 23 -- nucleic acid sequence encoding LbCas12a + linker + C10S + C965S, maize codon-optimized:
SEQ ID NO: 24 -- nucleic acid sequence encoding LbCas12a + linker + C965S + C1090S, maize codon-optimized:
SEQ ID NO: 25 -- nucleic acid sequence encoding LbCas12a + linker + D156R, Arabidopsis codon-optimized:
SEQ ID NO: 26 -- nucleic acid sequence encoding LbCas12a + linker + D156R + C965S, Arabidopsis codon-optimized:
SEQ ID NO: 27 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R, maize codon-optimized:
SEQ ID NO: 28 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270S, maize codon-optimized:
SEQ ID NO: 29 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C583S, maize codon-optimized:
SEQ ID NO: 30 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1068S, maize codon-optimized:
SEQ ID NO: 31 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1099S, maize codon-optimized:
SEQ ID NO: 32 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C1149S, maize codon-optimized:
SEQ ID NO: 33 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270S + C583S + C1068S + C1099S + C1149S, maize codon-optimized:
SEQ ID NO: 34 -- nucleic acid sequence encoding Mb2Cas12a + linker + D172R + C270A + C583A + C1068A + C1099A + C1149A, maize codon-optimized:
All patents, patent publications, patent applications, journal articles, books, technical references, and the like discussed in the instant disclosure are incorporated herein by reference in their entirety for all purposes.
It is to be understood that the figures and descriptions of the disclosure have been simplified to illustrate elements that are relevant for a clear understanding of the disclosure. It should be appreciated that the figures are presented for illustrative purposes and not as construction drawings. Omitted details and modifications or alternative embodiments are within the purview of persons of ordinary skill in the art.
It can be appreciated that, in certain aspects of the disclosure, a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.
The examples presented herein are intended to illustrate potential and specific implementations of the disclosure. It can be appreciated that the examples are intended primarily for purposes of illustration of the disclosure for those skilled in the art. There may be variations to these diagrams or the operations described herein without departing from the spirit of the disclosure. For instance, in certain cases, method steps or operations may be performed or executed in differing order, or operations may be added, deleted or modified.
Where a range of values is provided, it is understood that each intervening value, to the smallest fraction of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Any narrower range between any stated values or unstated intervening values in a stated range and any other stated or intervening value in that stated range is encompassed. The upper and lower limits of those smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the technology, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included.
In the foregoing description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the invention described in this disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention. Embodiments of the disclosure have been described for illustrative and not restrictive purposes. Although the present invention is described primarily with reference to specific embodiments, it is also envisioned that other embodiments will become apparent to those skilled in the art upon reading the present disclosure, and it is intended that such embodiments be contained within the present inventive methods. Accordingly, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.
Claims (31)
- A Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 1 and a human-induced mutation at position C965.
- The Cas12a protein of claim 1, wherein the human-induced mutation is a cysteine to serine substitution.
- The Cas12a protein of claim 1 or 2, further comprising a human-induced mutation at position D156.
- The Cas12a protein of claim 3, wherein the human-induced mutation at position D156 is an aspartic acid to arginine substitution.
- The Cas12a protein of any one of claims 1 to 4, wherein the sequence comprises any one of SEQ ID NOs: 5-11.
- A Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 2 and a human-induced mutation at position C70, C1116, and/or C1190.
- The Cas12a protein of claim 6, wherein the human-induced mutation is a cysteine to serine substitution.
- The Cas12a protein of claim 6 or 7, further comprising a human-induced mutation at position E184.
- The Cas12a protein of claim 8, wherein the human-induced mutation at position E184 is a glutamic acid to arginine substitution.
- A Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 3 and a human-induced mutation at position C334, C379, and/or C674.
- The Cas12a protein of claim 10, wherein the human-induced mutation is a cysteine to serine substitution.
- The Cas12a protein of claim 10 or 11, further comprising a human-induced mutation at position E174.
- The Cas12a protein of claim 12, wherein the human-induced mutation at position E174 is a glutamic acid to arginine substitution.
- A Cas12a protein comprising a sequence that is at least 80% identical to the amino acid sequence of SEQ ID NO: 4 and a human-induced mutation at position C270, C583, C1068, C1099, and/or C1149.
- The Cas12a protein of claim 14, wherein the human-induced mutation is a cysteine to serine substitution.
- The Cas12a protein of claim 14 or 15, further comprising a human-induced mutation at position D172.
- The Cas12a protein of claim 16, wherein the human-induced mutation at position D172 is an aspartic acid to arginine substitution.
- The Cas12a protein of any one of claims 14 to 17, wherein the sequence comprises any one of SEQ ID NOs: 12-19.
- The Cas12a protein of any one of claims 1 to 18, wherein the Cas12a protein is a catalytically dead Cas12a (dCas12a) protein or a nickase Cas12a (nCas12a) protein.
- The Cas12a protein of any one of claims 1 to 19, further comprising a nuclear localization signal.
- A fusion protein comprising the Cas12a protein of any one of claims 1 to 20 and a heterologous domain.
- The fusion protein of claim 21, wherein the heterologous domain is a deaminase domain, a transcription factor domain, a nuclease domain, a reverse-transcriptase domain, a transposase domain, an integrase domain, a uracil DNA glycosylase inhibitor domain, a recombinase domain, a nickase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain, or a transcriptional repressor domain.
- The fusion protein of claim 21 or 22, wherein the Cas12a protein is linked to the heterologous domain by a linker sequence.
- A nucleic acid encoding the Cas12a protein of any one of claims 1 to 20 or the fusion protein of any one of claims 21 to 23.
- The nucleic acid of claim 24, wherein the nucleic acid sequence is any one of SEQ ID NOs: 20-34.
- A DNA construct comprising a promoter operably linked to the nucleic acid of claim 24 or 25.
- A vector comprising the nucleic acid of claim 24 or 25 or the DNA construct of claim 26.
- A cell comprising the nucleic acid of claim 24, the DNA construct of claim 26, or the vector of claim 27.
- The cell of claim 28, wherein the cell is a plant cell.
- The cell of claim 29, wherein the cell is a maize plant cell, a wheat plant cell, a rice plant cell, a soybean plant cell, a sunflower plant cell, or a tomato plant cell.
- A method of editing a nucleic acid, the method comprising: contacting the nucleic acid with (i) the Cas12a protein of any one of claims 1 to 20 or the fusion protein of any one of claims 21 to 23 and (ii) a guide RNA having a region complementary to a selected portion of the nucleic acid, thereby resulting in an edit to the nucleic acid.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2023/073486 WO2024156084A1 (en) | 2023-01-27 | 2023-01-27 | Variants of cpf1 (cas12a) with improved activity |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024156084A1 true WO2024156084A1 (en) | 2024-08-02 |
Family ID: 91969831
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180187176A1 (en) * | 2016-11-22 | 2018-07-05 | Integrated Dna Technologies, Inc. | Crispr/cpf1 systems and methods |
US20190010481A1 (en) * | 2017-04-21 | 2019-01-10 | The General Hospital Corporation | Variants of CPF1 (CAS12a) With Altered PAM Specificity |
CN111417727A (en) * | 2017-05-18 | 2020-07-14 | 博德研究所 | Systems, methods, and compositions for targeted nucleic acid editing |
Non-Patent Citations (3)
Title |
---|
DATABASE Protein 13 October 2019 (2019-10-13), ANONYMOUS: "type V CRISPR-associated protein Cas12a/Cpf1 [Moraxella bovoculi]", XP093193995, retrieved from NCBI Database accession no. WP_046697655.1 * |
DATABASE Protein 25 January 2022 (2022-01-25), ANONYMOUS: "type V CRISPR-associated protein Cas12a/Cpf1 [Paracoccus salsus]", XP093193992, retrieved from NCBI Database accession no. WP_235757406.1 * |
ZHANG YINGXIAO, REN QIURONG, TANG XU, LIU SHISHI, MALZAHN AIMEE A., ZHOU JIANPING, WANG JIAHENG, YIN DESUO, PAN CHANGTIAN, YUAN MI: "Expanding the scope of plant genome engineering with Cas12a orthologs and highly multiplexable editing systems", NATURE COMMUNICATIONS, NATURE PUBLISHING GROUP, UK, vol. 12, no. 1, 29 March 2021 (2021-03-29), UK, pages 1944, XP093193990, ISSN: 2041-1723, DOI: 10.1038/s41467-021-22330-w * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
(not stated) | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23918091; Country of ref document: EP; Kind code of ref document: A1 |