CN116694603A - Novel Cas protein, CRISPR-Cas system and use thereof in the field of gene editing
- Publication number
- CN116694603A (application CN202310742030.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A50/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
- Y02A50/30—Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change
Abstract
The invention relates to the field of gene editing, and in particular to a novel Cas protein, a CRISPR-Cas system, and their use in gene editing. The novel Cas protein is selected from at least one of the following: a protein shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4; and a protein having a sequence similarity of 85% or more, preferably 90% or more, to any one of SEQ ID NO: 1 to SEQ ID NO: 4. The novel Cas protein provided by the invention can be used in a CRISPR-Cas system to edit genes. It can edit more target sites, is easier to deliver into cells for editing, and is less prone to off-target effects.
Description
Technical Field
The invention relates to the field of gene editing, and in particular to a novel Cas protein, a CRISPR-Cas system, and their use in the field of gene editing.
Background
CRISPR (clustered regularly interspaced short palindromic repeats) is a natural immune mechanism found in many bacteria and archaea, and it has been adapted as a gene-editing tool. Analysis of the sequences flanking CRISPR clusters revealed a polymorphic gene family located nearby that acts together with the CRISPR region; these genes were therefore named CRISPR-associated genes, abbreviated Cas. Most CRISPR-Cas systems contain a Cas1 protein, which is among the more conserved proteins of the Cas family. Based on the structure of the effector module, the CRISPR-Cas systems discovered so far fall into two main classes: Class 1 systems use a complex of multiple Cas effector proteins acting together and mainly include Type I, Type III, and Type IV; Class 2 systems use a single large effector protein and include Type II, Type V, and Type VI. Among Class 2 systems, the Cas9 (Type II) and Cpf1 (Type V) systems are currently widely used in gene editing.
However, CRISPR-Cas systems still have several drawbacks, such as possible off-target editing and a limited range of applications, and further improvement is needed.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, an object of the present invention is to provide a novel Cas protein, a CRISPR-Cas system, and their use in the field of gene editing.
The CRISPR/Cas system is a commonly used gene-editing system that has been successfully applied to the precise editing of animal and plant genomes. In this system, RNA guides the targeted recognition of a specific site in double-stranded DNA, which is then cleaved by a nuclease; Cas9 and Cpf1 nucleases are the most widely used. Cleavage produces a DNA double-strand break, which the cell repairs through NHEJ (non-homologous end joining) or HR (homologous recombination), thereby achieving site-specific modification of the target gene. A widely used commercial Cas9 nuclease is SpCas9, which recognizes the PAM sequence NGG located at the 3' end of the targeting sequence and cleaves 3 bp from the PAM to form blunt ends. LbCpf1 is a widely used commercial Cpf1 nuclease that recognizes the PAM sequence TTTN located at the 5' end of the targeting sequence and cleaves distal to the PAM to form cohesive ends.
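The PAM geometry described above can be illustrated with a short Python sketch. It is not part of the disclosed method: the function name, the demo sequences, and the 23-nt target length used for LbCpf1 are assumptions chosen only for illustration, and only the given strand is scanned.

```python
import re

def find_pam_sites(seq, pam_regex, target_len, pam_side):
    """List (target_start, target, pam) for every PAM-adjacent target on the given strand.

    pam_side is "3prime" when the PAM follows the target (SpCas9-like, NGG)
    and "5prime" when the PAM precedes the target (LbCpf1-like, TTTN).
    """
    seq = seq.upper()
    # A lookahead lets overlapping PAM occurrences be reported as well.
    lookahead = re.compile(f"(?=({pam_regex}))")
    sites = []
    for match in lookahead.finditer(seq):
        pam = match.group(1)
        pam_start = match.start()
        if pam_side == "3prime":
            start = pam_start - target_len
        elif pam_side == "5prime":
            start = pam_start + len(pam)
        else:
            raise ValueError("pam_side must be '3prime' or '5prime'")
        end = start + target_len
        if start < 0 or end > len(seq):
            continue  # not enough sequence next to this PAM
        sites.append((start, seq[start:end], pam))
    return sites

# Hypothetical demo sequences, not taken from the patent.
spcas9_demo = "A" * 20 + "TGG"
lbcpf1_demo = "TTTA" + "C" * 23

spcas9_sites = find_pam_sites(spcas9_demo, "[ACGT]GG", 20, "3prime")
lbcpf1_sites = find_pam_sites(lbcpf1_demo, "TTT[ACGT]", 23, "5prime")
print(spcas9_sites)
print(lbcpf1_sites)
```

A real target search would also scan the reverse complement and apply further design filters; this sketch shows only where the PAM sits relative to the targeting sequence.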
In the course of our research we found that both SpCas9 and LbCpf1 require relatively stringent PAM sequences, which limits the design of target sites. Furthermore, SpCas9 and LbCpf1 consist of 1368 and 1228 amino acids, respectively, and are too large to be readily packaged and delivered by AAV, which limits their application in animal cells to some extent. In addition, the targeting sequence of SpCas9 is 20 bp, and similar sequences readily occur elsewhere in the genome, causing off-target effects.
It is therefore of great importance to find novel, useful Cas proteins that are shorter, so that they can be conveniently packaged and delivered, further expanding their application in animal cells, and whose CRISPR-Cas systems are less prone to off-target effects.
To this end, through our research we have identified several novel Cas proteins that are shorter in length, can therefore be delivered into cells for editing more easily when used in a CRISPR-Cas system, and are less prone to off-target effects. Take as an example the BES1 protein obtained from the human intestinal bacterium Veillonella sp AF-2 (abbreviated as AF 13-2): when this protein is used in a CRISPR-Cas system, the PAM sequence it recognizes is less stringent than those of the commercial SpCas9 and LbCpf1, so more potential target sites are available for editing. Furthermore, the BES1 protein consists of only 1064 amino acids and is more easily delivered into cells for editing. The targeting sequence of SpCas9 is 20 bp, whereas that of BES1 is 23 bp, which makes BES1 potentially less likely than SpCas9 to cause off-target effects.
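As a rough illustration of the size argument, the following Python sketch converts the amino acid counts given above into coding-sequence lengths. The AAV packaging capacity of about 4.7 kb is an assumption supplied here as a commonly cited approximate figure; it does not come from this document, and the function name is hypothetical.

```python
# Assumption: approximate AAV packaging capacity in base pairs (not stated in the patent).
AAV_CAPACITY_BP = 4700

def coding_length_bp(n_amino_acids):
    """Length of the coding sequence: three bases per residue plus one stop codon."""
    return 3 * (n_amino_acids + 1)

# Amino acid counts as given in the text.
proteins = {"SpCas9": 1368, "LbCpf1": 1228}

for name, n_aa in proteins.items():
    cds = coding_length_bp(n_aa)
    remaining = AAV_CAPACITY_BP - cds
    print(f"{name}: {cds} bp coding sequence, {remaining} bp left for other elements")
```

Under this assumed capacity, whatever remains must still hold the promoter, the guide RNA expression cassette, and other regulatory elements, which is why a shorter protein leaves more room in the vector.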
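The effect of targeting-sequence length can be made concrete with a simplified estimate. The Python sketch below assumes an idealized genome of independent, equally frequent bases and counts only exact matches; the genome size of 3.1e9 bp is an assumed round figure for the human genome, and the function name is hypothetical. Real genomes contain repeats, and off-target cleavage can tolerate mismatches, so the numbers are illustrative only.

```python
def expected_exact_matches(target_len, genome_bp, strands=2):
    """Expected number of exact matches of one fixed sequence in a random genome.

    Idealized model: each position matches with probability (1/4) ** target_len.
    """
    return strands * genome_bp / 4 ** target_len

GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size

e20 = expected_exact_matches(20, GENOME_BP)  # SpCas9-length targeting sequence
e23 = expected_exact_matches(23, GENOME_BP)  # BES1-length targeting sequence
print(f"20 bp: {e20:.2e} expected chance matches")
print(f"23 bp: {e23:.2e} expected chance matches")
print(f"ratio: {e20 / e23:.0f}")
```

Under this model, each additional base reduces the expected number of chance matches fourfold, so three extra bases give a 64-fold reduction regardless of the genome size assumed.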
Specifically, the invention provides the following technical scheme:
According to a first aspect of the present invention, there is provided a Cas protein selected from at least one of the following: a protein shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4; and a protein having a sequence similarity of 85% or more, preferably 90% or more, to any one of SEQ ID NO: 1 to SEQ ID NO: 4. The novel Cas proteins of SEQ ID NO: 1 to SEQ ID NO: 4 were obtained by bioinformatic screening and verified by molecular biology techniques; each of them is easily delivered into cells for gene editing. In addition, the PAM sequences they recognize have suitable specificity, so more target sites can be edited, and the targeting sequence is of suitable length, so off-target effects are less likely. A protein whose sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 is 85% or more (for example 86%, 87%, 88%, or 89% or more), preferably 90% or more (for example 91%, 92%, 93%, or 94% or more), has the same or similar activity and function as the Cas proteins of SEQ ID NO: 1 to SEQ ID NO: 4; it is likewise easily delivered into cells for gene editing, has more editable target sites, has a targeting sequence of suitable length, and is less prone to off-target effects.
According to an embodiment of the present invention, the Cas protein described above may further include the following technical features:
In some embodiments of the invention, the sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 is 95% or more, preferably 96% or more, more preferably 97% or more, still more preferably 98% or more, and most preferably 99% or more. A protein whose sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 is 95% or more, preferably 96%, 97%, 98%, 99%, or 99.5% or more, has the same or similar activity as the Cas protein; it is easily delivered into cells for gene editing, has more editable target sites, has a targeting sequence of more suitable length, and is less prone to off-target effects.
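The similarity thresholds above can be checked against a pairwise alignment. The document does not specify how sequence similarity is computed, so the Python sketch below makes an assumption: it uses percent identity over the columns of an already-aligned pair, with "-" marking gaps. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns; '-' marks a gap.

    Assumption: both inputs are rows of one pairwise alignment (equal length).
    Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

def highest_tier(identity, tiers=(85, 90, 95, 96, 97, 98, 99)):
    """Return the highest threshold from the text that the identity meets, or None."""
    met = [t for t in tiers if identity >= t]
    return max(met) if met else None

# Toy aligned fragments, not taken from SEQ ID NO: 1 to SEQ ID NO: 4.
reference = "MKTLV" * 4
variant = "MKTLV" * 3 + "MKTLA"
identity = percent_identity(reference, variant)
print(identity, highest_tier(identity))
```

Other definitions (for example, similarity scored with a substitution matrix, or identity over the shorter sequence) would give different values, so the measure should be fixed before comparing against the thresholds.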
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which one or more amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 8 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 6 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 5 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 4 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 3 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which up to 2 amino acids are substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
In some embodiments of the invention, the Cas protein is a Cas protein having nuclease activity in which 1 amino acid is substituted, deleted, or added as compared to any one of SEQ ID NO: 1 to SEQ ID NO: 4. These proteins have the same or similar nuclease activity as the proteins shown in SEQ ID NO: 1 to SEQ ID NO: 4, are also easily delivered into cells for gene editing, have many target sites, and are not prone to off-target effects.
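The variant embodiments above (a bounded number of amino acids substituted, deleted, or added relative to a reference sequence) can be read as a bound on the edit distance between two protein sequences. The following is a minimal sketch under that reading; the function names and the short toy peptides are hypothetical and are not taken from the sequence listing, and the specification itself does not prescribe any particular algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-residue
    substitutions, deletions, or additions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, res_a in enumerate(a, start=1):
        curr = [i]
        for j, res_b in enumerate(b, start=1):
            cost = 0 if res_a == res_b else 1
            curr.append(min(prev[j] + 1,          # delete a residue of a
                            curr[j - 1] + 1,      # add a residue to a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_variant_limit(candidate: str, reference: str, max_changes: int) -> bool:
    """True if candidate differs from reference by at most max_changes residues."""
    return edit_distance(candidate, reference) <= max_changes
```

For example, `within_variant_limit("MKSAYIAK", "MKTAYIAK", 1)` is true because the two toy peptides differ by a single substitution. Whether such a variant retains nuclease activity still has to be shown experimentally.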
In some embodiments of the invention, the Cas protein is set forth in SEQ ID NO: 1. This Cas protein consists of 1064 amino acids; with this smaller number of amino acids it is easier to deliver into cells for editing. The PAM sequence it recognizes is NNNV (where V represents base A, G or C), so more target sites can be edited, and its targeting sequence is 23 bp, so it is not prone to off-target effects. The Cas protein has in vitro DNA double-strand cleavage activity; no editing activity was detected in human cells.
In some embodiments of the invention, the Cas protein is set forth in SEQ ID NO: 2. This Cas protein consists of 1368 amino acids; with this smaller number of amino acids it is easier to deliver into cells for editing, and the PAM sequence it recognizes is NNMTA. The Cas protein has in vitro DNA double-strand cleavage activity; no editing activity was detected in human cells.
In some embodiments of the invention, the Cas protein is set forth in SEQ ID NO: 3. This Cas protein consists of 1245 amino acids; with this smaller number of amino acids it is easier to deliver into cells for editing, and the PAM sequence it recognizes is TTTN. The Cas protein has both in vitro DNA double-strand cleavage activity and editing activity in human cells.
In some embodiments of the invention, the Cas protein is set forth in SEQ ID NO: 4. This Cas protein consists of 1306 amino acids; with this smaller number of amino acids it is easier to deliver into cells for editing. The PAM sequence it recognizes is YYN, which greatly relaxes the limitation that LbCpf1 recognizes only TTTN. The Cas protein has both in vitro DNA double-strand cleavage activity and editing activity in human cells.
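The link between a less stringent PAM and a larger number of editable sites can be illustrated with the standard IUPAC nucleotide codes. The sketch below is illustrative only: the function names are hypothetical, and the coverage figure assumes that the four bases are equally likely at every position, which real genomes do not satisfy exactly.

```python
from math import prod

# Standard IUPAC codes for the PAM motifs mentioned in this document.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "V": "ACG",   # not T
    "M": "AC",
    "Y": "CT",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if the DNA string site satisfies the IUPAC motif pam."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pam.upper())
    )


def pam_coverage(pam: str) -> float:
    """Fraction of all sequences of length len(pam) that satisfy the motif,
    assuming equal base frequencies (an idealization)."""
    return prod(len(IUPAC[code]) / 4 for code in pam.upper())
```

Under this idealization `pam_coverage("NNNV")` is 0.75 and `pam_coverage("YYN")` is 0.25, compared with 0.0625 for NGG and 0.015625 for TTTN.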
According to a second aspect of the present invention there is provided a nucleic acid sequence selected from at least one of the following: a nucleic acid sequence encoding a Cas protein according to any one of the embodiments of the first aspect of the invention; a nucleic acid sequence that is reverse-complementary to a nucleic acid sequence encoding a Cas protein according to any one of the embodiments of the first aspect of the invention.
In some embodiments of the invention, the nucleic acid sequence is DNA or RNA.
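As a small illustration of the reverse-complementary nucleic acid sequence referred to in the second aspect, the sketch below computes the reverse complement of a DNA or RNA string. The function name and the `rna` flag are hypothetical, and ambiguity codes are not handled.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Complement every base, then reverse the strand (5'->3' orientation)."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]
```

Applying the function twice returns the original sequence, which is a convenient sanity check.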
According to a third aspect of the present invention there is provided an expression vector comprising a nucleic acid sequence according to the second aspect of the present invention. The nucleic acid sequence is combined with a vector to obtain an expression vector that can express the corresponding Cas protein in target cells, so that the corresponding gene editing is performed in those cells. Commonly used vectors include plasmids, lentiviruses and the like, for example the pET 28a vector or the pMD19 vector.
According to a fourth aspect of the present invention there is provided a recombinant cell comprising an expression vector according to the third aspect of the present invention. The expression vector is introduced into cells to form recombinant cells, and the corresponding Cas protein is expressed from the expression vector, so that gene editing of the recombinant cells can be achieved. These recombinant cells may be eukaryotic cells, such as plant cells or animal cells. In particular, compared with the commonly used SpCas9 and LbCpf1 proteins, the Cas protein provided by the invention has fewer amino acids and is easier to deliver into cells for editing. When a viral vector is used for animal cells, it is more convenient to package and deliver, which broadens application in the field of animal cells.
According to a fifth aspect of the present invention, there is provided a Crispr-Cas system comprising a Cas protein according to the first aspect of the present invention. The Cas protein provided by the invention can be used in a Crispr-Cas system and applied in the field of gene editing; it expands the editable range, is not prone to off-target effects, and improves editing accuracy. The system can be used in many fields, such as basic bioscience, medicine and agriculture.
According to an embodiment of the present invention, the Crispr-Cas system described above may further include the following technical features:
in some embodiments of the invention, the Crispr-Cas system further comprises at least one of the following: crRNA, tracrRNA, or a chimeric RNA formed from crRNA and tracrRNA. These RNAs help the Crispr-Cas system carry out its gene editing function. In addition, the Crispr-Cas system may further include a crispr_repeat sequence as needed; the crispr_repeat sequence corresponding to each Cas protein is shown in the attached Table I and Table II.
In some embodiments of the invention, the crRNA and tracrRNA are as shown in the attached Table I and Table II, which list the crRNA and tracrRNA sequences used by the Cas proteins in gene editing. These sequences help the Cas proteins locate their target sequences precisely, enabling precise gene editing.
According to a sixth aspect of the present invention, there is provided the use of the Cas protein, the nucleic acid sequence, the expression vector, the recombinant cell or the Crispr-Cas system according to the first aspect of the present invention in the field of gene editing, wherein the Cas protein is the Cas protein according to the first aspect of the present invention, the nucleic acid sequence is the nucleic acid sequence according to the second aspect of the present invention, the expression vector is the expression vector according to the third aspect of the present invention, the recombinant cell is the recombinant cell according to the fourth aspect of the present invention, and the Crispr-Cas system is the Crispr-Cas system according to the fifth aspect of the present invention.
Drawings
FIG. 1 is a PAM preference chart of BES1 provided according to an embodiment of the present invention.
FIG. 2 shows the BES1 purification results provided according to an embodiment of the present invention.
FIG. 3 shows the base sequences and structures of crRNA+tracrRNA-L, sgRNA-1, sgRNA-2, and sgRNA-3 of BES1 provided according to an embodiment of the present invention.
FIG. 4 is a PAM preference chart of BES1 obtained by on-chip detection with crRNA+tracrRNA-L, sgRNA-1, and sgRNA-3, respectively, according to an embodiment of the present invention.
FIG. 5 is a sequence diagram of a spacer provided according to an embodiment of the present invention.
FIG. 6 shows the constructed PAM library sequence provided according to an embodiment of the present invention.
FIG. 7 is a schematic representation of the cleavage substrate sequences provided according to an embodiment of the present invention.
FIG. 8 shows the bands of the in vitro cleavage products of BES1 with crRNA+tracrRNA-L, sgRNA-1, sgRNA-2 and sgRNA-3 at 20 ℃, 25 ℃ and 37 ℃ provided according to an embodiment of the present invention.
FIG. 9 is a schematic flow chart for obtaining a novel Cas protein provided according to an embodiment of the present invention.
FIG. 10 is a PAM preference chart of the BES2, BES4 and BES6 systems obtained by on-chip detection according to an embodiment of the present invention.
FIG. 11 shows the in vitro cleavage experiments for the BES2, BES4 and BES6 systems according to embodiments of the present invention.
FIG. 12 is an electrophoresis image of the human cell editing activity assay of the BES6 system according to an embodiment of the present invention.
FIG. 13 is an electrophoresis image of the human cell editing activity assay of the BES4 system according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the present invention, and should not be construed as limiting it. Likewise, certain terms and expressions used herein were chosen to help those of ordinary skill in the art understand the invention and are not to be construed as limiting its scope.
Herein, the terms "CRISPR", "Crispr" and "crispr" all refer to clustered regularly interspaced short palindromic repeats, for which the term is the acronym; all are expressions commonly used in the art, whether written in upper case, in lower case or with an initial capital. Accordingly, the Crispr-Cas system may also be written with different letter cases. In addition, when bases are represented, unless otherwise specified, the letters N and V have their usual meaning in the art: N represents any of the bases A, T, C or G, and V represents any of the bases A, C or G.
A Cas9 enzyme cleaves DNA at a target site that is typically determined as follows: an RNA molecule called Crispr RNA (crRNA) binds, through base pairing of one part of its sequence, to another RNA molecule called tracrRNA, forming a chimeric RNA (tracrRNA/crRNA); another part of the crRNA sequence then base pairs with the target DNA site, so that the chimeric RNA directs the Cas protein to bind to and cleave that site. Such a chimeric RNA is also called a guide RNA. Unlike in the Crispr-Cas9 system, the Cpf1 enzyme can process crRNA precursors by itself and then use the processed crRNAs to specifically target and cleave DNA, without requiring a host-cell ribonuclease or tracrRNA.
The targeting specificity of Crispr is determined by two parts: one is base pairing between the RNA chimera and the target DNA, and the other depends on the Cas protein and a short DNA sequence at the 3' end of the target DNA, called the PAM (protospacer adjacent motif).
If the PAM sequence is stringent (for example, only a few specific bases are allowed), the Cas protein can edit fewer target sites, which limits the application of the Crispr-Cas system. Both SpCas9 and LbCpf1 have relatively stringent PAM sequences, which limits the design of targeting sites. For example, the PAM sequence recognized by the SpCas9 nuclease is NGG, located at the 3' end of the targeting sequence; the nuclease cleaves 3 bp from the PAM sequence to form a blunt end, and because the PAM can only be NGG, the application of this editing system is limited.
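To make the NGG constraint concrete, the sketch below scans one DNA strand for SpCas9 target sites: a protospacer immediately followed by an NGG PAM, with the blunt cut placed 3 bp on the 5' side of the PAM as described above. The 20-nt protospacer length and the function name are assumptions made for illustration; only the given strand is scanned.

```python
import re


def find_spcas9_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam, cut_index) for every site on this strand.

    cut_index is the string index of the first base 3' of the cut, i.e. the
    cut falls between dna[cut_index - 1] and dna[cut_index].
    """
    dna = dna.upper()
    # A lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(dna):
        start = match.start()
        pam_start = start + protospacer_len
        sites.append((start, match.group(1), match.group(2), pam_start - 3))
    return sites
```

Replacing the `[ACGT]GG` part of the pattern with a looser motif such as NNNV makes many more positions qualify, which is the point made in the paragraph above.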
Using bioinformatics and molecular experimental techniques, we have found a variety of novel Cas9 systems and Cpf1 systems with gene editing potential in the human intestinal flora, as shown in Table I and Table II. The Cpf1 enzyme of the Cpf1 system, also known as the Cas12a protein, edits genes differently from the Cas9 protein; the Cpf1 enzyme is smaller than the SpCas9 protein and is more easily transported into cells and tissues. Moreover, a Crispr-Cpf1 system needs only one crRNA and can edit multiple sites simultaneously. The Cas proteins provided in the present application include both Cas9 proteins and Cpf1 proteins. That is, the application provides a Cas protein which is at least one of SEQ ID NO: 1 to SEQ ID NO: 4. These Cas proteins have nuclease activity and can be used to cleave target nucleic acids, so that they can be applied in a Crispr-Cas system to edit genes effectively, with more editable target sites and a wider range of application.
The novel Cas9 and Cpf1 systems provided recognize PAMs with lower specificity, thus expanding the application of gene editing systems. Taking as an example the BES1 protein obtained from the human intestinal bacterium Veillonella sp. AF13-2 (AF13-2 for short), its PAM specificity is lower and the protein is smaller than the existing commercial SpCas9 and LbCpf1. The BES1 protein has fewer amino acids and is easier to deliver into cells to perform gene editing. The PAM sequence preference of BES1 is shown in FIG. 1, in which the abscissa represents the 7 positions immediately adjacent to the 3' end of the target sequence and the ordinate represents the proportion of each base among all cleaved (positive) sequences. In FIG. 1, each of the bases A, C, T and G occurs with high probability at the first position immediately 3' of the target sequence, so this position can be denoted N; the results for the remaining positions are read in the same way. As can be seen from FIG. 1, the probability of cleavage is very low (less than 0.05) only when the fourth position is T, and thus the PAM sequence of BES1 is NNNV (where V represents base A, G or C).
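The way a PAM such as NNNV is read off a preference chart can be sketched as follows: at each position, every base whose proportion reaches a cutoff is kept, and the surviving set is written as one IUPAC letter. The 0.05 cutoff mirrors the value quoted above; the function name and the proportions in the usage note are hypothetical and are not values read from FIG. 1.

```python
# Standard IUPAC codes keyed by the set of bases each one allows.
IUPAC_BY_SET = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_pam(proportions, cutoff=0.05):
    """proportions: one dict per PAM position mapping base -> proportion
    among cleaved sequences. Bases below the cutoff are treated as excluded."""
    return "".join(
        IUPAC_BY_SET[frozenset(b for b, p in pos.items() if p >= cutoff)]
        for pos in proportions
    )
```

With three positions at 0.25 for every base and a fourth position of A 0.35, C 0.30, G 0.32, T 0.03, the function returns "NNNV".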
The novel Cas9 systems are detailed in Table I, which lists the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of the organism in which each Cas protein is found, together with the crRNA, tracrRNA, crispr repeat sequence, effector protein length, effector amino acid sequence, etc.
The novel Cpf1 systems are detailed in Table II, which lists the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of the organism in which each Cas protein is found, together with the crRNA, tracrRNA, crispr repeat sequence, effector protein length, effector amino acid sequence, etc.
The scheme of the present invention will be explained below with reference to examples. It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they were carried out according to the techniques or conditions described in the literature in this field or according to the product specifications. Reagents or apparatus for which no manufacturer is indicated are conventional, commercially available products.
Example 1
Microorganisms in the human intestinal flora were analyzed on the basis of a microbial genome database to predict Cas protein sequences and Crispr sequences, and all protein sequences within 20 kb upstream and downstream of each Crispr sequence were determined. These proteins were then compared against the NCBI protein database to obtain homologs of known Type II or Type V proteins. The homologous proteins were analyzed to determine the conserved sites of their key domains and the integrity of the proteins, yielding the Cas protein sequences and nearby Crispr sequences in Tables I and II. The analysis method is shown in FIG. 9. These novel Crispr-Cas systems belong to the Type II and Type V Crispr-Cas systems and differ in gene editing capacity from the existing SpCas9 protein. They enrich the existing Crispr-Cas systems and can be used, as needed, in different cells such as animal cells and plant cells to perform gene editing.
Taking as an example BES1 obtained from the human intestinal bacterium Veillonella sp. AF13-2 (abbreviated as AF13-2), its PAM specificity is low and the protein is smaller than the existing commercial SpCas9 and LbCpf1. As shown in FIG. 1, the PAM sequence preference of BES1 is such that the probability of cleavage is extremely low (less than 0.05) only when the fourth position is T, so the PAM sequence of BES1 is NNNV (where V represents base A, G or C).
The attached Table I is a table of the novel Cas9 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum information, as well as the crRNA, tracrRNA, crispr repeat sequence (Crispr repeat), effector protein length (effector length) and effector amino acid sequence (effector amino acid sequence) of each Crispr-Cas system or Cas protein. For the Cas proteins in Table I whose corresponding crRNA, tracrRNA and/or crispr repeat sequence is shown, a person skilled in the art can use the sequences shown directly.
The attached Table II is a table of the novel Cpf1 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum information, as well as the crRNA, tracrRNA, crispr repeat sequence, effector protein length and effector amino acid sequence of each Crispr-Cas system or Cas protein. For the Cas proteins in Table II whose corresponding crRNA, tracrRNA and/or crispr repeat sequence is not shown, a crRNA, tracrRNA and/or crispr repeat sequence that helps these Cas proteins perform their editing function can be found from the information on the corresponding Cas protein.
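The step of collecting the protein sequences within 20 kb of a predicted Crispr sequence can be sketched as a simple interval filter. This is an illustration under stated assumptions: the `Feature` type, the coordinates and the rule that any overlap with the window counts are hypothetical, and the prediction tools used in the workflow of FIG. 9 are not modeled.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def proteins_near_array(array: Feature, genes: list[Feature],
                        flank: int = 20_000) -> list[Feature]:
    """Return genes overlapping the window that extends `flank` bp
    upstream and downstream of the Crispr array."""
    window_start = array.start - flank
    window_end = array.end + flank
    return [g for g in genes if g.end > window_start and g.start < window_end]
```

The proteins returned by such a filter would then be compared with known Type II or Type V effector proteins, as described above.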
Example 2 Expression and purification of BES1 protein
1. Construction of BES1 expression vectors
An expression vector was constructed by the In-Fusion method: the pET 28a vector was digested at the NdeI and EcoRI sites, and the BES1 coding sequence was inserted into the cloning region of pET 28a. Six His residues at the N-terminus of the recombinant BES1 protein serve as the purification tag, kanamycin is used as the selection marker, and the constructed vector is named pET 28a-BES1.
2. Cultivation and Induction of BES1 Strain
LB liquid medium: 10g/L tryptone, 5g/L yeast extract and 10g/L NaCl.
The recombinant expression vector pET 28a-BES1 was transformed into the E. coli expression strain BL21 (DE3), and the bacterial suspension was spread evenly on LB solid medium plates containing 50 μg/mL kanamycin and cultured overnight at 37 ℃. Single colonies were picked and cultured overnight in 5 mL of LB medium (containing 50 μg/mL kanamycin) at 37 ℃ and 200 rpm. The resulting culture was inoculated at 1:100 into 50 mL of LB medium (containing 50 μg/mL kanamycin) and cultured at 37 ℃ and 200 rpm for 4 hours. This expanded culture was inoculated at 1:100 into 2 L of LB liquid medium (containing 50 μg/mL kanamycin) and cultured at 37 ℃ and 200 rpm. When the OD600 reached about 0.6 to 0.8, IPTG was added to a final concentration of 0.4 mM, and culture was continued at 16 ℃ and 200 rpm for about 16 to 18 hours. The induced culture was centrifuged at 10000 g to collect the cells, which were frozen at -20 ℃ until use.
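The volumes implied by the culture and induction steps above follow from C1 x V1 = C2 x V2. The sketch below works them out; the stock concentrations (1 M IPTG, 50 mg/mL kanamycin) are assumptions chosen for illustration, since the text gives only final concentrations, and the small volume added by the stock itself is neglected.

```python
def stock_volume_ml(final_conc: float, final_volume_ml: float,
                    stock_conc: float) -> float:
    """Volume of stock to add so that final_volume_ml reaches final_conc.
    final_conc and stock_conc must be in the same units."""
    return final_conc * final_volume_ml / stock_conc


def inoculum_volume_ml(culture_volume_ml: float, ratio: int = 100) -> float:
    """Seed-culture volume for a 1:ratio inoculation."""
    return culture_volume_ml / ratio


# Assumed stocks: 1 M (1000 mM) IPTG and 50 mg/mL (50000 ug/mL) kanamycin.
iptg_ml = stock_volume_ml(0.4, 2000, 1000)   # 0.4 mM final in 2 L
kan_ml = stock_volume_ml(50, 2000, 50_000)   # 50 ug/mL final in 2 L
seed_ml = inoculum_volume_ml(2000)           # 1:100 into 2 L
```

With these assumed stocks the 2 L culture needs 0.8 mL of IPTG, 2 mL of kanamycin and a 20 mL inoculum.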
3. BES1 protein extraction and purification
Purification buffer preparation:
(1) Ni column affinity chromatography
Buffer A (equilibration buffer): 50 mM Tris-HCl + 500 mM NaCl + 20 mM imidazole, pH 7.5.
Buffer B (elution buffer): 50 mM Tris-HCl + 500 mM NaCl + 500 mM imidazole, pH 7.5.
(2) Ion exchange chromatography
Buffer C (equilibration buffer): 50 mM Tris-HCl + 100 mM NaCl, pH 7.0.
Buffer D (elution buffer): 50 mM Tris-HCl + 1 M NaCl, pH 7.0.
(3) Protein sample diluent
Buffer E (diluent): 50 mM Tris-HCl, pH 7.0.
(4) Protein sample 2× storage solution
Buffer F (2× storage buffer): 50 mM Tris-HCl + 300 mM NaCl, pH 7.0.
The cells were resuspended at a ratio of 1 g of cells to 15 mL of Buffer A, PMSF was added to a final concentration of 1 mM, and the cells were disrupted by sonication until the suspension became clear. The lysate was centrifuged at 12000 rpm and 4 ℃ for 30 min, and the supernatant was filtered through a 0.22 μm membrane and stored at 4 ℃.
The Ni affinity chromatography column was washed with 5 CV of water and 5 CV of Buffer B and equilibrated with 10 CV of Buffer A, and the sample was then loaded. After loading, the column was equilibrated for a further 15 CV, contaminating proteins were washed off with 15% Buffer B, and the protein was eluted with a linear gradient (15-100% Buffer B over 10 CV) and collected when the UV reading exceeded 100 mAU.
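The percentages of Buffer B in the wash and gradient steps correspond to imidazole concentrations that follow from the compositions of Buffer A (20 mM imidazole) and Buffer B (500 mM imidazole). The sketch below is a simple linear-mixing calculation; the function names are hypothetical, and ideal mixing by the chromatography system is assumed.

```python
def imidazole_mM(percent_b: float, conc_a: float = 20.0,
                 conc_b: float = 500.0) -> float:
    """Imidazole concentration when percent_b % of the flow is Buffer B."""
    fraction_b = percent_b / 100.0
    return (1.0 - fraction_b) * conc_a + fraction_b * conc_b


def gradient_percent_b(at_cv: float, start_pct: float = 15.0,
                       end_pct: float = 100.0, total_cv: float = 10.0) -> float:
    """Percent Buffer B after at_cv column volumes of a linear gradient."""
    return start_pct + (end_pct - start_pct) * at_cv / total_cv
```

The 15% Buffer B wash therefore corresponds to about 92 mM imidazole, and the midpoint of the 15-100% gradient (5 CV, 57.5% Buffer B) to about 296 mM.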
The protein collected from the Ni column was diluted 5-fold with Buffer E. The Q anion exchange column was washed with 5 CV of water and equilibrated with 5 CV of Buffer C, the protein sample was loaded, and the flow-through was collected once the UV reading began to rise. The SP cation exchange column was equilibrated with 5 CV of Buffer C and loaded with the protein sample obtained in the previous step; after loading, the column was equilibrated with 15 CV of Buffer C and then eluted with a linear gradient of elution Buffer D (0-100% Buffer D over 10 CV), and the protein was collected. The collected protein was dialyzed overnight against 2× storage buffer. The final protein concentration was 1 mg/mL and the glycerol concentration was 50%. As shown in FIG. 2, the SDS-PAGE results show that the fusion protein was well purified and of acceptable purity.
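One way to see why the Ni-column eluate is diluted 5-fold with Buffer E before ion exchange is to compute the resulting salt concentration: Buffers A and B both contain 500 mM NaCl, Buffer E contains none, and a 1:4 mix brings NaCl to the 100 mM of Buffer C. The sketch below is a generic volume-weighted mixing calculation; the function name is hypothetical, and imidazole and the pH difference between the buffers are not modeled.

```python
def mix(parts):
    """parts: list of (volume, {component: concentration}) pairs.
    Returns the volume-weighted concentration of every component."""
    total_volume = sum(volume for volume, _ in parts)
    components = {name for _, comp in parts for name in comp}
    return {
        name: sum(volume * comp.get(name, 0.0) for volume, comp in parts) / total_volume
        for name in components
    }


eluate = {"Tris-HCl": 50.0, "NaCl": 500.0}   # mM, as in Buffers A and B
buffer_e = {"Tris-HCl": 50.0}                # mM, no NaCl
diluted = mix([(1.0, eluate), (4.0, buffer_e)])  # 5-fold dilution
```

Here `diluted` holds 50 mM Tris-HCl and 100 mM NaCl, matching the NaCl concentration of the Buffer C used to equilibrate the ion exchange columns.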
In the following Examples 3 and 4, the Cas9 protein BES1 (SEQ ID NO: 1) found in the human intestinal bacterium Veillonella sp. AF13-2 is taken as an example, and the PAM sequence recognized by the protein and its in vitro cleavage of a target substrate are investigated.
Example 3 Experiment to obtain the BES1 PAM sequence
1. Preparation of guide RNA
First, double-stranded DNA transcription templates for crRNA and tracrRNA-L were designed from the predicted crRNA and tracrRNA sequences of BES1 in strain AF13-2 (see Table I). On this basis, we also tried shortening the paired region of crRNA and tracrRNA-L and joining the two with a GAAA linker sequence so that they form a single strand, namely sgRNA-1; the transcription template sequence of sgRNA-1 is shown in Table 1 below. In addition, sgRNA-3 was designed to retain the activity of the original RNAs as far as possible; its transcription template sequence is also shown in Table 1 below. The deoxynucleotide sequences used in Table 1 were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank. All sequences shown in Table 1 are DNA template sequences for transcription of the respective RNAs. The sequences and secondary structures of crRNA+tracrRNA-L, sgRNA-1, and sgRNA-3 are shown in FIG. 3.
Table 1 Template sequences for RNA transcription in the BES1 on-chip cleavage experiment
The double-stranded DNA templates described above were prepared by polymerase chain reaction using KAPA HiFi HotStart ReadyMix (Roche). After the reaction, the double-stranded DNA templates were purified with a phenol-chloroform-isoamyl alcohol mixture (Allatin); their purity was determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and their concentration was measured with the Qubit™ double-stranded DNA high-sensitivity quantification kit (Thermo Fisher Scientific) and a Qubit™ 3.0 fluorometer.
Transcription was then performed with the above double-stranded DNA templates according to the instructions of the MEGAscript™ T7 Transcription Kit: 2 picomoles of double-stranded DNA template were added and incubated at 37 ℃ for 12 hours in a Bio-Rad S1000™ PCR instrument. The RNA was purified with a phenol-chloroform-isoamyl alcohol mixture (Allatin), and the purity and concentration of the purified RNA were determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific).
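The single guide RNA design described in step 1 (crRNA and tracrRNA joined by a GAAA linker) and the corresponding T7 transcription template can be sketched as simple string assembly. This is an illustration only: the function names and the short fragments in the usage note are hypothetical placeholders, not the sequences of Table 1, and the 18-nt promoter shown is the commonly used minimal T7 promoter, which the source does not state was the one used.

```python
# Commonly used minimal T7 promoter; transcription starts at its final G,
# so a transcript made from the template below carries one extra 5' G.
T7_PROMOTER = "TAATACGACTCACTATAG"


def build_sgrna(spacer: str, crrna_repeat: str, tracrrna: str,
                linker: str = "GAAA") -> str:
    """Join spacer + crRNA repeat part to the tracrRNA through a linker
    and return the result as an RNA string."""
    return (spacer + crrna_repeat + linker + tracrrna).upper().replace("T", "U")


def transcription_template(rna: str) -> str:
    """Sense strand of a double-stranded DNA template for T7 transcription."""
    return T7_PROMOTER + rna.upper().replace("U", "T")
```

For instance, `build_sgrna("ACGT", "GTTT", "AAAC")` gives "ACGUGUUUGAAAAAAC", and `transcription_template` prefixes the DNA form of that string with the promoter. Shortening the paired region, as done for sgRNA-1, corresponds to passing truncated `crrna_repeat` and `tracrrna` fragments.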
2. Preparation of the single-stranded circular cleavage substrate
Cleavage substrates that can be used for the above BES1 protein were prepared; the deoxynucleotide sequences used to prepare the cleavage substrates are shown in Table 2 below. The deoxynucleotide sequences in Table 2 were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank.
The double-stranded substrate to be cleaved was prepared by polymerase chain reaction using a hot-start ready mix (Roche). The two nucleotide sequences PAM_AF13-2_2/1 and PAM_AF13-2_2/2 in Table 2 were denatured at 95 ℃ and then renatured to serve as the template, and the two nucleotide sequences PAM_AF13-2_1 and PAM_AF13-2_3 were used as primers for polymerase chain reaction amplification to obtain the double-stranded substrate.
The polymerase chain reaction product was recovered with an E.Z.N.A.™ gel extraction kit; the purity of the recovered product was then determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and its concentration was measured with the Qubit™ double-stranded DNA high-sensitivity quantification kit (Thermo Fisher Scientific) and a Qubit™ 3.0 fluorometer.
Table 2 Deoxynucleotide sequences used for cleavage substrate preparation
Then, single-strand circularization was performed on the double-stranded substrate obtained above to give a single-stranded circular product, as follows:
1 picomole of the double-stranded DNA substrate prepared above was combined with 1 XPTA buffer (Epicentre), 120 U of T4 DNA ligase (Epicentre) and ATP (NEB) at a final concentration of 10 mM in a reaction volume of 60 μL, and incubated at 37 ℃ for 1 hour in a Bio-Rad S1000™ PCR instrument.
EXO III (10 U/μL) (from BGI) and EXO I (3 U/μL) (from BGI) were then added, and the mixture was incubated at 37 ℃ for 30 minutes in a Bio-Rad S1000™ PCR instrument to digest the unqualified products. The product was purified with 2.5 volumes of AMPure XP (Beckman™), and its concentration was measured with the Qubit™ single-stranded DNA high-sensitivity quantification kit (Thermo Fisher Scientific) and a Qubit™ 3.0 fluorometer (Thermo Fisher Scientific).
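Amounts in this protocol are given both in moles (for example 1 picomole of double-stranded substrate) and in mass (nanograms measured by Qubit). The sketch below converts between the two using the common approximations of 660 g/mol per base pair and 330 g/mol per nucleotide; the function names are hypothetical, and the 200-bp length in the usage note is an assumed example, since the substrate length is not stated here.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair, approximation for dsDNA
AVG_MASS_PER_NT = 330.0  # g/mol per nucleotide, approximation for ssDNA


def pmol_to_ng(pmol: float, length: int, double_stranded: bool = True) -> float:
    """Mass in ng of pmol picomoles of DNA of the given length."""
    per_unit = AVG_MASS_PER_BP if double_stranded else AVG_MASS_PER_NT
    return pmol * length * per_unit / 1000.0  # pmol x g/mol = pg; /1000 -> ng


def ng_to_pmol(ng: float, length: int, double_stranded: bool = True) -> float:
    """Picomoles of DNA of the given length contained in ng nanograms."""
    per_unit = AVG_MASS_PER_BP if double_stranded else AVG_MASS_PER_NT
    return ng * 1000.0 / (length * per_unit)
```

Under these approximations, 1 picomole of an assumed 200-bp double-stranded substrate weighs about 132 ng.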
3. SE51 sequencing
(1) DNA nanoballs (DnB) for loading on the sequencer were prepared from the above single-stranded circles: 6 nanograms of the single-stranded circular product were made up to 20 μL with nuclease-free water (Ambion™), 20 μL of Make DnB Buffer (BGI) was added, and the mixture was mixed, centrifuged, and incubated at 95 ℃ for 1 minute, 65 ℃ for 1 minute, 40 ℃ for 1 minute, and 4 ℃ for 1 minute in a Bio-Rad S1000™ PCR instrument.
After the reaction, 40 μL of Make DnB Enzyme Mix V2.0.0 (BGI) and 2 μL of Make DnB Enzyme Mix II V2.0.0 (BGI) were added to the product, mixed, and incubated at 30 ℃ for 20 minutes in a Bio-Rad S1000™ PCR instrument. After the reaction, DnB Stop Buffer (BGI) was mixed in by gentle pipetting with a wide-bore tip (Axygen), 30 μL of Load DnB Buffer (BGI) was added and mixed in the same way, and the library was loaded onto a BGISEQ-500 V3.1 chip (BGI) with a BGISEQ-500 DnB loader (BGI) to give the chip to be sequenced.
(2) SE51 sequencing was performed on the chip with a BGISEQ-500 sequencer (BGI) using the BGISEQ-500 SE100 sequencing cartridge kit (BGI), yielding the sequence information and ID number of each nucleic acid sequence.
4. BES1-PAM original strand synthesis and cleavage
Because sequencing yields single-stranded DNA, the complementary strand (i.e., the original strand) was synthesized on the single-stranded DNA, and the resulting double-stranded DNA was used for the protein cleavage experiment. The procedure comprises the following steps:
(1) After chip sequencing was completed, the new strands generated by the first sequencing run were eluted on the BGISEQ-500 DnB loader (BGI) using 100% formamide (Sigma).
(2) After elution from the chip, the original strand was synthesized on the BGISEQ-500 sequencer (BGI) using dNTP mix 2 (BGI) to obtain double-stranded DNA. The synthesized length was 50 nucleotides, and the 51st base was synthesized with dNTP mix 1 (BGI); this step adds a fluorescent dNTP to the end of the synthesized strand.
(3) After the above steps were completed, the chip was imaged on the BGISEQ-500 sequencer (BGI), and the image was stored on the sequencer as original image I.
(4) BES1 on-chip cleavage reaction. The double-stranded DNA obtained in step (2) was cleaved using each of the different RNAs. The reaction contained SpCas9 1× reaction buffer (NEB), 30 μg of the RNA prepared in step 1 (crRNA+tracrRNA-L, sgRNA-1 or sgRNA-3), 0.1 μmol of BES1 protein and RNase inhibitor (Epicentre) in a final volume of 300 μL. The mixture was pumped into the chip with the BGISEQ-500 DnB loader (BGI) and incubated at 37 ℃ for 5 hours.
(5) The chip was washed 3 times with 300 μL of washing buffer 2 (BGI).
(6) After the above steps were completed, the chip was imaged on the BGISEQ-500 sequencer (BGI), and the image was stored on the sequencer as original image II.
(7) The fluorescence signals in the stored original images I and II, taken before and after cleavage, were compared using the manual basecall software (BGI) of the BGISEQ-500 sequencer (BGI). The PAM sequence of BES1 was analyzed with SpCas9 as a control, and the results are shown in FIG. 4.
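The before-and-after comparison in step (7) amounts to deciding, for each sequenced spot, whether the fluorescent end label was lost after the cleavage reaction. The sketch below shows one hypothetical decision rule: a spot counts as cleaved if it retains at most half of its original signal. The 0.5 cutoff, the function names and the data layout are assumptions for illustration and do not describe the BGI basecall software.

```python
def is_cleaved(signal_before: float, signal_after: float,
               max_retained: float = 0.5) -> bool:
    """A spot counts as cleaved if it keeps at most max_retained of its signal.
    Spots with no signal before cleavage are never counted."""
    return signal_before > 0 and signal_after / signal_before <= max_retained


def cleaved_sequences(spots, max_retained: float = 0.5):
    """spots: iterable of (sequence, signal_before, signal_after) tuples.
    Returns the sequences of the spots classified as cleaved."""
    return [seq for seq, before, after in spots
            if is_cleaved(before, after, max_retained)]
```

The sequences returned by `cleaved_sequences` are the positive sequences whose PAM positions are then tallied.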
In the results shown in FIG. 4, the abscissa shows the 7 positions immediately adjacent to the 3' end of the target sequence, and the ordinate shows the proportion of each base among all cleaved (positive) sequences. That is, the number of cleaved sequences is taken as the denominator, the base present at each position of each cleaved sequence is determined, and the proportion of each of the four bases at each position is calculated. As the results in FIG. 4 show, with SpCas9 as the control, the PAM preference of BES1 differs little among guide RNAs of slightly different structure.
Example 4 In vitro cleavage experiments of BES1
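The ordinate described above can be computed directly from the list of cleaved sequences: for each PAM position, count how often each base occurs and divide by the number of cleaved sequences. The sketch below assumes that the PAM region of every cleaved sequence has already been extracted as a string of equal length; the function name and the toy input are hypothetical.

```python
from collections import Counter


def position_base_proportions(cleaved_pams, bases="ACGT"):
    """Return one dict per position mapping base -> proportion among
    all cleaved sequences (the denominator is the number of cleaved sequences)."""
    if not cleaved_pams:
        return []
    total = len(cleaved_pams)
    proportions = []
    for i in range(len(cleaved_pams[0])):
        counts = Counter(pam[i] for pam in cleaved_pams)
        proportions.append({base: counts[base] / total for base in bases})
    return proportions
```

For the toy input `["AAAA", "CAAC", "GAAG", "TAAA"]`, the first position gives 0.25 for every base, and the fourth position gives A 0.5, C 0.25, G 0.25 and T 0.0.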
1. Preparation of guide RNA
According to the method of Example 3, a double-stranded DNA transcription template for crRNA, a double-stranded DNA transcription template for tracrRNA-L, and double-stranded DNA transcription templates for sgRNA-1 and sgRNA-3 were obtained. In addition, a shorter tracrRNA-S was designed, and sgRNA-2 was designed from the complete crRNA and tracrRNA-S; its transcription template sequence is shown in Table 3 below. The transcription template DNA was synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank.
TABLE 3 double-stranded DNA transcription templates for sgRNA-2
The functional RNAs shown in FIG. 4, namely crRNA+tracrRNA-L, sgRNA-1, sgRNA-2 and sgRNA-3 (the target sequence is replaced with N in FIG. 4), can be transcribed from the DNA templates described above.
Specifically, following the method of Example 3:
Double-stranded DNA templates were prepared by polymerase chain reaction using KAPA HiFi™ HotStart ReadyMix (Roche). After the reaction, the double-stranded DNA template was purified using a phenol-chloroform-isoamyl alcohol mixture (Allatin). The purity of the purified template was determined using a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and its concentration was measured using the Qubit™ double-stranded DNA high-sensitivity quantification kit (Thermo Fisher Scientific) and a Qubit™ 3.0 fluorometer.
Transcription was then performed from the double-stranded DNA template. Following the instructions of the MEGAscript™ T7 Transcription Kit, 2 pmol of double-stranded DNA template was added and the reaction was incubated at 37°C for 12 hours in a Bio-Rad S1000™ PCR instrument. The RNA was purified using a phenol-chloroform-isoamyl alcohol mixture (Allatin), and the purity and concentration of the purified RNA were determined using a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific).
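As an aid to setting up the transcription reaction, the molar amount of template can be converted to a mass and a pipetting volume. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the 120 bp template length and the 50 ng/μL stock concentration are hypothetical values chosen for illustration, not figures from this example.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair

def dsdna_pmol_to_ng(pmol, length_bp):
    """Convert an amount of dsDNA in pmol to ng (pmol * g/mol = pg; /1000 = ng)."""
    if pmol < 0 or length_bp <= 0:
        raise ValueError("pmol must be >= 0 and length_bp must be > 0")
    return pmol * length_bp * AVG_BP_MASS_G_PER_MOL / 1000.0

def template_volume_ul(pmol, length_bp, conc_ng_per_ul):
    """Volume of template stock (in microliters) that contains the requested pmol."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return dsdna_pmol_to_ng(pmol, length_bp) / conc_ng_per_ul

# Hypothetical 120 bp template at 50 ng/uL (e.g. as measured by fluorometry).
mass_ng = dsdna_pmol_to_ng(2, 120)
volume_ul = template_volume_ul(2, 120, 50.0)
```

With these assumed values, 2 pmol corresponds to roughly 158 ng of template.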
2. Cleavage substrate preparation
Target site design: a CRISPR array typically consists of a leader, multiple repeats and multiple spacers. The leader generally acts as the promoter of the array, the repeats can form hairpin structures, and the spacers generally consist of captured foreign DNA. Accordingly, an original protospacer sequence (selected-spacer in FIG. 5) from the genome sequence of Veillonella sp. AF13-2 (NCBI genome ID: QTMT00000000) was used as the target site sequence.
PAM sequence design: a 7N PAM library (see the spacer and PAM sequences in FIG. 6) was constructed for cleavage by the BES1 protein.
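For orientation, a 7N library in which every position may be any of the four bases contains 4^7 = 16384 distinct PAM sequences. A minimal sketch that enumerates them (the function name `pam_library` is illustrative only):

```python
from itertools import product

def pam_library(n=7, alphabet="ACGT"):
    """Yield every possible PAM of length n over the given alphabet."""
    return ("".join(p) for p in product(alphabet, repeat=n))

library = list(pam_library(7))
library_size = len(library)  # 4 ** 7 distinct sequences
```

Enumerating the library this way is useful as a reference list when checking how completely a sequenced library covers the possible PAMs.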
Cleavage substrate design: the synthesized PAM library sequences were cloned into the pMD19 vector to give the pMD19-AF13-2-3' PAM library. From this library we amplified an 842 bp cleavage substrate (see FIG. 7; the cleavage substrate sequence is shown in SEQ ID NO: 23). The target site lies at positions 402-431 (see FIG. 7) and the PAM at positions 432-438 (see FIG. 7, i.e., the 7 underlined random bases at positions 432 to 438 of SEQ ID NO: 23), so that both cleavage products are about 400 bp long. This design was chosen because, on a low-resolution gel, the two cleavage products migrate together as one broad band, which makes it easy to detect whether the substrate has been cleaved.
The cleavage substrate sequence of 842bp is as follows (N stands for any base) (SEQ ID NO: 23):
CTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGTTTGCACGCCTGCCGTTCGACGATTGTAGTAGCTCAAAAGGGAACTGCTACCGAANNNNNNNAATCTCTGGAAGATCCGCGCGTACCGAGTTCTAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGG(SEQ ID NO:23)。
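The expected product sizes follow directly from the coordinates given above. The sketch below assumes a Cas9-like blunt cut 3 bp upstream of the PAM; this cut position is an assumption made for illustration and is not stated in the text for BES1.

```python
def fragment_sizes(substrate_len, pam_start, cut_offset=3):
    """Return (left, right) fragment lengths for a blunt cut upstream of the PAM.

    pam_start is the 1-based position of the first PAM base; cut_offset is the
    number of target bases between the cut and the PAM (assumed 3, Cas9-like).
    """
    cut_after = pam_start - 1 - cut_offset  # last 1-based position of the left fragment
    if not 0 < cut_after < substrate_len:
        raise ValueError("cut position falls outside the substrate")
    return cut_after, substrate_len - cut_after

# Coordinates from the text: 842 bp substrate, PAM at positions 432-438.
left_bp, right_bp = fragment_sizes(842, 432)
```

Under this assumption the two products are 428 bp and 414 bp, consistent with the statement that both are about 400 bp.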
3. Cleavage experiment and results
The cleavage reaction contained functional RNA (the four RNAs shown in FIG. 4), cleavage substrate and BES1, added at a final concentration of 100 nM. Reactions were incubated at 20°C, 25°C or 37°C for 1 hour, and the cleavage products were analyzed on a 2% agarose gel. The results are shown in FIG. 8.
As FIG. 8 shows, BES1 cleaved the target substrate with each of the four functional RNAs shown in FIG. 4 at incubation temperatures of 20°C, 25°C and 37°C.
Example 5: PAM preference identification of the BES2, BES4 and BES6 systems
The PAM identification method and steps for the BES2, BES4 and BES6 systems are the same as in the above examples. The main steps are as follows:
(1) Preparation of guide RNA
The tracrRNA and crRNA sequences of the BES2 system in the strain Collinsella sp. Marseille-P2666 were obtained by bioinformatic prediction (see the table below), and a double-stranded DNA transcription template was designed for the sgRNA formed by joining the crRNA to the tracrRNA; the specific deoxynucleotide sequences are shown in Table 4 below. BES4 and BES6 are Cpf1-homologous systems, in which a crRNA alone guides the effector protein to achieve targeted genome cleavage, without the participation of a tracrRNA. The crRNA sequences of these two proteins were predicted bioinformatically, and double-stranded DNA transcription templates were designed and synthesized; the specific deoxynucleotide sequences are shown in Table 4 below. The deoxynucleotide sequences in Table 4 were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank.
Table 4:
Guide RNAs for the BES2, BES4 and BES6 systems were prepared from the double-stranded DNA transcription templates shown in Table 4 by the same method as in Example 3.
(2) PAM identification
Rapid DNB-chip-based detection of the PAM sequences of the BES2, BES4 and BES6 systems was performed as in Example 3. The PAM preferences of the three systems are shown in FIG. 10.
Example 6: In vitro cleavage activity assay of the BES2, BES4 and BES6 systems
First, the guide RNAs of the BES2, BES4 and BES6 systems were transcribed in vitro as described in Example 3. Second, the effector proteins of the BES2, BES4 and BES6 systems were expressed and purified according to the method of Example 2. Finally, substrates were prepared and in vitro cleavage was performed according to the procedure of Example 4. As shown in FIG. 11, all three systems cleave double-stranded DNA in vitro.
Example 7: Identification of the editing activity of the BES6 system in human cells
(1) Human cell culture
The inventors selected human HEK293T cells as the cells for testing editing activity in vivo. HEK293T cells were cultured in DMEM medium supplemented with fetal bovine serum (FBS).
(2) RNP preparation
For editing in HEK293T cells, we selected the endogenous gene AAVS1 to verify targeted cleavage.
The targeting region nucleotide sequence of AAVS1 is as follows:
CCCTTGCTCTCTGCTGTGTTGCTGCCCAAGGATGCTCTTTCCGGAGCACTTCCTTC
TCGGCGCTGCACCACGTGATGTCCTCTGAGCGGATCCTCCCCGTGTCTGGGTCCTCTC
CGGGCATCTCTCCTCCCTCACCCAACCCCATGCCGTcTTCACTCGCTGGGTTCCCTTTT
CCTTCTCCTTCTGGGGCCTGTGCCATCTCTCGTTTCTTAGGATGGCCTTCTCCGACGGA
TGTCTCCCTTGCGTCCCGCCTCCCCTTCTTGTAGGCCTGCATCATCACCGTTTTTCTGG
ACAACCCCAAAGTACCCCGTCTCCCTGGCTTtAGcCACCTCTCCATCCTCTTGCTTTCTT
TGCCTGGACACCCCGTTCTCCTGTGGATTCGGGTCACCTCTCACTCCTTTCATTTGGGC
AGCTCCCCTACCCCCCTTACCTCTCTAGTCTGTGCTAGCTCTTCCAGCCCCCTGTCATG
GCATCTTCCAGGGGTCCGAGAGCTCAGCTAGTCTTCTTCCTCCAACCCGGGCCCcTAT
GTCCACTTCAGGACAGCATGTTTGCTGCCTCCAGGGATCCTGTGTCCCCGAGCTGGGA
CCACCTTATATTCCCAGGGCCGGTTAATGTGGCTCTGGTTCTGGGTACTTTTATCTGTCC
CCTCCACCCCACAGTGGGGCCACTAGGGACAGGATTGGTGACAGAAAAGCCCCATCC
TTAGGCCTCCTCCTTCCTAGTCTCCTGATATTGGGTCTAACCCCCACCTCCTGTTAGGC
AGATTCCTTATCTGGTGACACACCCCCATTTCCTGGAGCCATCTCTCTCCTTGCCAGAA
CCTCTAAGGTTTGCTTACGATGGAGCCAGAGAGGATCCTGGGAGGGAGAGCTTGGCA
GGGGGTGGGAGGGAAGGGGGGGATGCGTGACCTGCCCGGTTCTCAGTGGCCACCCT
GCGCTACCCTCTCCCAGAACCTGAGCTGCTCTGACGCGGCTGTCTGGTGCGTTTCACT
GATCCTGGTGCTGCAGCTTCCTTACACTTCCCAAGAGGAGAAGCAGTTTGGAAAAAC
AAAATCAGAATAAGTTGGTCCTGAGTTCTAACTTTGGCTCTTCACCTTTCTAGTCCCCA
ATTTATATTGTTCCTCCGTGCGTCAGTTTTACCTGTGAGATAAGGCCAGTAGCCACCCC
CGTCCTGGCAGGGCTGTGGTGAGGAGGGGGGTGTCCGTGTGGAAAACTCCCTTTGTG
AGAATGGTGCGTCCTAGGTGTTCACCAGGTCGTGGCCGCCTCTACTCCCTTTCTCTTTC
TCCATCCTTCTTTCCTTAAAGAGCCCCCAGTGCTATCTGGACATATTCCTCCGCCCAGA
GCAGGGTCCGCTTCCCTAAGGCCCTGCTCTGGGCTTCTGGGTTTGAGTCCTTGCAAGC
CCAGGAGAGCGCTAGCTTCCCTGTCCCCCTTCCTCGTCCACCATCTCATGCCCTGGCT
CTCCTGCCCCTTCCTACA(SEQ ID NO:27).
For this gene, one targeting site was designed, and its double-stranded DNA transcription template was designed and synthesized; the specific deoxynucleotide sequences are shown in Table 5 below. The deoxynucleotide sequences in Table 5 were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank.
Table 5:
The BES4 and BES6 sequences targeting the AAVS1 site shown in Table 5 were transcribed in vitro to generate guide RNA, using the ordered oligonucleotides and the MEGAshortscript™ T7 Transcription Kit (Invitrogen) according to the manufacturer's recommended method. The BES4 and BES6 effector proteins were expressed in vitro as in Example 2.
(3) RNP transfer into human cells
In a 12-well plate, 10 pmol of purified effector protein and 0.5 μL of gRNA were added to each well. RNPs were assembled and transfected into HEK293T cells using the Neon™ Transfection System kit and transfection device (Invitrogen) according to the manufacturer's protocol.
(4) Editing activity identification
Cells were harvested 2-3 days after RNP transfection and tested for activity by T7E1 enzyme assay as follows:
(a) Collecting cells: 200 microliters of 0.5 molar EDTA (pH 8.0) was added to each well of the 12-well plate to resuspend the cells;
(b) Genomic DNA extraction: genomic DNA was extracted using a genomic DNA extraction kit (Tiangen), and gDNA concentration was measured using Nanodrop;
(c) Targeting region PCR: the target site region was amplified from gDNA using GXL Prime. The amplification primers are shown in Table 6 below, and the deoxynucleotide sequences used were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank. The product was purified using a PCR purification and gel extraction kit (MN). The purity of the PCR product was analyzed by agarose gel electrophoresis, and its concentration was measured using a NanoDrop.
(d) Denaturation and annealing: the purified product of step (c) was denatured and re-annealed using a Bio-Rad PCR instrument. An equal amount of substrate DNA (about 200-300 ng per reaction, in a 10 μL reaction volume) was used for each T7E1 cleavage reaction.
(e) T7E1 enzyme digestion: 0.2 μL of T7E1 nuclease was added to the 10 μL sample from step (d). The cleavage reaction was performed at 37°C for 20 minutes.
(f) Activity detection: after the T7E1 cleavage reaction was complete, loading buffer was added and the products were analyzed by agarose gel electrophoresis.
Table 6: PCR amplification primer list
As shown in FIG. 12, BES6 has human cell editing activity.
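The result above is reported qualitatively from the gel. Where band intensities are available, editing efficiency in a T7E1 assay is commonly estimated as indel % = 100 × (1 − sqrt(1 − f)), where f is the cleaved fraction of the total signal. The sketch below illustrates that widely used estimate; it is not a quantification performed in this example, and the band intensities are hypothetical.

```python
from math import sqrt

def t7e1_indel_percent(parent, cleaved_bands):
    """Estimate indel frequency (%) from gel band intensities of a T7E1 digest.

    parent: intensity of the uncleaved PCR product band.
    cleaved_bands: intensities of the cleavage product bands.
    """
    cut = sum(cleaved_bands)
    total = parent + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cut / total
    # Accounts for random re-annealing of edited and unedited strands.
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))

# Hypothetical intensities (arbitrary units): parent band 75, cleaved bands 15 and 10.
estimate = t7e1_indel_percent(75.0, [15.0, 10.0])
```

With these assumed intensities the cleaved fraction is 0.25 and the estimate is about 13.4%.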
Example 8: Identification of the editing activity of the BES4 system in human cells
(1) Human cell culture
The inventors selected human HEK293T cells as the cells for testing editing activity in vivo. HEK293T cells were cultured in DMEM medium supplemented with fetal bovine serum (FBS).
(2) Plasmid preparation
For editing in HEK293T cells, we selected the endogenous gene HBG to verify targeted cleavage.
The nucleotide sequence of the targeting region of HBG is as follows:
CCCTGCTGTGCTCAGATCAATACTCCGTTGTCTAAGTTGCCTCGAGACTAAAGGC
AACAGGGCTGAAACATCTCCTGGACTCACCTTGAAGTTCTCAGGATCCACATGCAGCT
TGTCACAGTGCAGTTCACTCAGCTGGGCAAAGGTGCCCTTGAGATCATCCAGGTGCTT
TGTGGCATCTCCCAAGGAAGTCAGCACCTTCTTGCCATGTGCCTTGACTTTGGGGTTG
CCCATGATGGCAGAGGCAGAGGACAGGTTGCCAAAGCTGTCAAAGAACCTCTGGGTC
CATGGGTAGACAACCAGGAGCCTGTGAGATTGACAAGAACAGTTTGACAGTCAGAAG
GTGCCACAAATCCTGAGAAGCGACCTGGACTTTTGCCAGGCACAGGGTCCTTCCTTC
CCTCCCTTGTCCTGGTCACCAGAGCCTACCTTCCCAGGGTTTCTCCTCCAGCATCTTCC
ACATTCACCTTGCCCCACAGGCTTGTGATAGTAGCCTTGTCCTCCTCTGTGAAATGACC
CATGGCGTCTGGACTAGGAGCTTATTGATAACCTCAGACGTTCCAGAAGCGAGTGTGT
GGAACTGCTGAAGGGTGCTTCCTTTTATTCTTCATCCCTAGCCAGCCGCCGGCCCCTG
GCCTCACTGGATACTCTAAGACTATTGGTCAAGTTTGCCTTGTCAAGGCTATTGGTCAA
GGCAAGGCTGGCCAACCCATGGGTGGAGTTTAGCCAGGGACCGTTTCAGACAGATAT
TTGCATTGAGATAGTGTGGGGAAGGGGCCCCCAAGAGGATACTGCTAATTTTTTTTATA
GCCTTTGCCTTGTTCCGATTCAGTCATTCCAGTTTTTCTCTAATTTATTCTTCCCTTTAGC
TAGTTTCCTTCTCCCATCATAGAGGATACCAGGACTTCTTTTGTCAGCCGTTTTTTACCT
TCTTGTCTCTAGCTCCAGTGAGGCCTGTAGTTTAAAGCTAAAGCATGTACCAATTTTTG
AAAAGTTCAGGGATTGTGAAATGTGTTTTAGGCATAGGTCCAGGATTTTTGACGGGAC
AAATCTTAGTCTCTTTCAGTTAGCAGTGGTTTCTAAGGA(SEQ ID NO:32).
For this region, the inventors designed three targets and synthesized the corresponding plasmid sequences:
BES4-HBG-sg01:
GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGA
GAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACG
TAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTA
TCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAA
AGGACGAAACACCGAATTTCTACTATTGTAGATGCCAGCCTTGCCTTGACCAATAGTTT
TTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGC
GCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGG
CCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATA
GGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAG
TACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG
CCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACAT
CTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTC
TCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTG
TGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGG
GGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCG
GCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAA
AGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCT
CCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGG
TGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAG
GGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCT
GAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGG
AGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAG
AAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGAA
GATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTGA
ACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGACG
AGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAGA
AACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACGC
CGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCGC
CAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAACA
AGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAGG
ACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTCT
TCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTACA
GATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAAG
GCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGGCC
TGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTGCC
CCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGACGG
CACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGCAA
GAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGCGA
GAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCCGT
GAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCCATC
GACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATCTAC
ATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGGAG
CGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAGAA
TCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAGAA
GAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGAGAT
CAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGACAA
CCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAACGA
GAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGGACG
CCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCGGCG
AGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCAGAA
TCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCACCG
ACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGAAAC
AAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTGGCC
ATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGCAAC
GAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGCAAC
CTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGATGAG
ATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTTGAC
GACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAGTGG
CTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAGTTCT
ACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACCAGCT
TCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAACAAGG
ACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAGATGC
TGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCCGAG
ATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAACACCC
CCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTACGACC
TGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATCACCAT
GAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTGCTGA
AGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTGCTGT
ACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACATCAT
CGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTGGCCA
CCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGAGCAT
CAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGCTGGT
GGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGAGAGG
CAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCGACA
AGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACTGCTG
CATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCAGAGC
GGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACCGGAT
TTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATGATCA
GCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACATCG
ACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCTGCA
CCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGGAGC
TACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGCATC
AACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGGCAA
GTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAGCAA
CCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCAACT
TCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATGCTA
ACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGAAGG
CCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGTTCG
TGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCCGG
CCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCTAA
CATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGTTC
ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTC
AGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCT
ACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCA
AGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACG
GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC
ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGC
ATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC
GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT
ACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAG
CCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCA
CTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCT
ATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGAATAG
CAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTC
TCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGG
CTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGGCGCCT
GATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCAAAGCA
ACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGC
AGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTC
CTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTA
GGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGGGTGATG
GTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTC
CACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCTATCTCG
GGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAAAATGAG
CTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAATTTTATGG
TGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGC
CAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGAC
AAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGA
AACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATA
ATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTA
TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGAT
AAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCC
CTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGT
GAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA
TCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATG
AGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGA
GCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTC
ACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAA
CCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGG
AGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGA
ACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGC
AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGG
CAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGG
CCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGAAGCCG
CGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACA
CGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTG
CCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTG
ATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCA
TGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA
GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAA
AAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTT
TCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAG
CCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGC
TAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA
CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTG
CACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGA
GCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAG
CGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGT
ATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGC
TCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:33).
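The guide portion of such a plasmid can be checked against the target region programmatically. The sketch below assumes that the 19-nt stretch AATTTCTACTATTGTAGAT near the start of the plasmid is the crRNA repeat and that the 23 nt following it form the spacer; the source does not annotate the plasmid, so both are assumptions. The two excerpts are copied from the BES4-HBG-sg01 sequence and from the HBG targeting region (SEQ ID NO: 32) above; the function names are illustrative.

```python
REPEAT = "AATTTCTACTATTGTAGAT"  # assumed crRNA direct repeat (not annotated in the source)

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def extract_spacer(cassette, repeat=REPEAT, spacer_len=23):
    """Return the spacer_len bases that follow the first occurrence of the repeat."""
    start = cassette.find(repeat)
    if start < 0:
        raise ValueError("repeat not found in cassette")
    begin = start + len(repeat)
    spacer = cassette[begin:begin + spacer_len]
    if len(spacer) < spacer_len:
        raise ValueError("cassette too short to contain a full spacer")
    return spacer

def locate(spacer, region):
    """Return (strand, 0-based start) of the spacer in region, or None if absent."""
    idx = region.find(spacer)
    if idx >= 0:
        return "+", idx
    idx = region.find(reverse_complement(spacer))
    if idx >= 0:
        return "-", idx
    return None

# Excerpt of BES4-HBG-sg01 around the guide cassette.
sg01_excerpt = "GGACGAAACACCGAATTTCTACTATTGTAGATGCCAGCCTTGCCTTGACCAATAGTTTTTTGTTTTAGAGC"
# Excerpt of the HBG targeting region.
hbg_excerpt = "GACTATTGGTCAAGTTTGCCTTGTCAAGGCTATTGGTCAAGGCAAGGCTGGCCAACCCATGG"

spacer = extract_spacer(sg01_excerpt)
hit = locate(spacer, hbg_excerpt)
```

Under these assumptions the sg01 spacer is found in the HBG region as its reverse complement, i.e. on the strand opposite to the one listed.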
BES4-HBG-sg02:
GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGAATTTCTACTATTGTAGATACCAATAGCCTTGACAAGGCAAATT
TTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTG
CGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATG
GCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAAT
AGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCA
GTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATG
GCCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACA
TCTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACT
CTCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTT
GTGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCG
GGGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGC
GGCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAA
AAGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCG
CTCCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAG
GTGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAA
GGGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCC
TGAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACG
GAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAA
GAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGA
AGATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTG
AACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGAC
GAGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAG
AAACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACG
CCGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCG
CCAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAAC
AAGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAG
GACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTC
TTCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTAC
AGATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAA
GGCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGG
CCTGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTG
CCCCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGAC
GGCACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGC
AAGAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGC
GAGAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCC
GTGAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCC
ATCGACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATC
TACATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGG
AGCGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAG
AATCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAG
AAGAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGA
GATCAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGA
CAACCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAA
CGAGAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGG
ACGCCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCG
GCGAGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCA
GAATCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCA
CCGACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGA
AACAAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTG
GCCATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGC
AACGAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGC
AACCTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGAT
GAGATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTT
GACGACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAG
TGGCTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAG
TTCTACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACC
AGCTTCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAAC
AAGGACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAG
ATGCTGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCC
GAGATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAAC
ACCCCCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTAC
GACCTGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATC
ACCATGAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTG
CTGAAGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTG
CTGTACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACA
TCATCGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTG
GCCACCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGA
GCATCAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGC
TGGTGGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGA
GAGGCAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATC
GACAAGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACT
GCTGCATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCA
GAGCGGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACC
GGATTTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATG
ATCAGCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACA
TCGACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCT
GCACCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGG
AGCTACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGC
ATCAACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGG
CAAGTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAG
CAACCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCA
ACTTCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATG
CTAACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGA
AGGCCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGT
TCGTGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCC
GGCCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCT
AACATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGT
TCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGT
TCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGA
CCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTT
CAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGA
CGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACC
GCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAG
CTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC
GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTC
GCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC
AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG
CTGTACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTG
CCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACT
CCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCA
TTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGA
ATAGCAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTC
CCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCC
CGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGG
CGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCA
AAGCAACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTT
ACGCGCAGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCT
TCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTC
CCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGG
GTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTT
GGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCT
ATCTCGGGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAA
AATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAAT
TTTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGAC
ACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTA
CAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATC
ACCGAAACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTC
ATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAA
CCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACC
CTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTG
TCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACG
CTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAA
CTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAA
TGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGG
CAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCAC
CAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGC
CATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCG
AAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTT
GGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTG
TAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTC
CCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCG
CTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGA
AGCCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTAT
CTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGAT
AGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTT
AGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATA
ATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGT
AGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGC
AAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAA
CTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCT
AGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC
GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCG
GGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGG
GTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTAC
AGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTAT
CCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAA
ACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTT
TTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:34)
BES4-HBG-sg03:
GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGAATTTCTACTATTGTAGATCCTTGTCAAGGCTATTGGTCAAGTTTTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGCGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAG
TACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG
CCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACAT
CTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTC
TCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTG
TGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGG
GGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCG
GCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAA
AGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCT
CCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGG
TGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAG
GGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCT
GAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGG
AGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAG
AAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGAA
GATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTGA
ACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGACG
AGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAGA
AACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACGC
CGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCGC
CAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAACA
AGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAGG
ACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTCT
TCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTACA
GATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAAG
GCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGGCC
TGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTGCC
CCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGACGG
CACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGCAA
GAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGCGA
GAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCCGT
GAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCCATC
GACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATCTAC
ATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGGAG
CGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAGAA
TCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAGAA
GAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGAGAT
CAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGACAA
CCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAACGA
GAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGGACG
CCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCGGCG
AGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCAGAA
TCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCACCG
ACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGAAAC
AAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTGGCC
ATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGCAAC
GAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGCAAC
CTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGATGAG
ATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTTGAC
GACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAGTGG
CTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAGTTCT
ACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACCAGCT
TCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAACAAGG
ACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAGATGC
TGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCCGAG
ATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAACACCC
CCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTACGACC
TGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATCACCAT
GAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTGCTGA
AGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTGCTGT
ACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACATCAT
CGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTGGCCA
CCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGAGCAT
CAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGCTGGT
GGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGAGAGG
CAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCGACA
AGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACTGCTG
CATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCAGAGC
GGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACCGGAT
TTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATGATCA
GCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACATCG
ACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCTGCA
CCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGGAGC
TACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGCATC
AACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGGCAA
GTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAGCAA
CCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCAACT
TCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATGCTA
ACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGAAGG
CCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGTTCG
TGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCCGG
CCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCTAA
CATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGTTC
ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTC
AGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCT
ACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCA
AGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACG
GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC
ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGC
ATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC
GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT
ACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAG
CCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCA
CTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCT
ATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGAATAG
CAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTC
TCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGG
CTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGGCGCCT
GATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCAAAGCA
ACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGC
AGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTC
CTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTA
GGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGGGTGATG
GTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTC
CACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCTATCTCG
GGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAAAATGAG
CTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAATTTTATGG
TGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGC
CAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGAC
AAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGA
AACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATA
ATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTA
TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGAT
AAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCC
CTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGT
GAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA
TCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATG
AGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGA
GCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTC
ACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAA
CCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGG
AGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGA
ACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGC
AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGG
CAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGG
CCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGAAGCCG
CGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACA
CGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTG
CCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTG
ATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCA
TGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA
GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAA
AAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTT
TCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAG
CCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGC
TAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA
CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTG
CACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGA
GCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAG
CGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGT
ATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGC
TCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:35).
PX458-HBG-SG01:
GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCCTTGTCAAGGCTATTGGTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGCGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTCTCC
CCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTGTGC
AGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGGGGC
GAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCGGCG
CGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAAAGC
GAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCTCCG
CCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGGTGA
GCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAGGGT
TTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCTGAA
ATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGGAGA
CTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAG
AAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGG
CCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGT
GCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGA
ACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGA
AGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAA
GAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAA
GAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAAC
ATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAG
AAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCC
CACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAAC
AGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAG
GAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTG
AGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAA
TGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAG
CAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGA
CGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGC
CGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGA
GATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCA
GGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGA
GATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAG
CCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGA
GGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCG
ACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGC
GGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCC
TGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGC
CTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGG
TGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGA
ACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCG
TGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCT
TCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGG
AAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGA
CTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCAC
GATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGAC
ATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGG
AACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGC
GGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGG
GACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAAC
AGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAG
AAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCC
GGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTC
GTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGA
GAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATC
GAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAA
CACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATAT
GTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATC
GTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGC
GACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGAT
GAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGA
CAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCAT
CAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGG
ACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAG
TGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAA
AGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGT
GGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGA
CTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCA
AGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGAT
TACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAAC
CGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAG
CATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAA
AGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTG
GGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTG
GTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCT
GGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGA
AGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTC
CCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCA
GAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAG
CCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGT
GGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAA
GAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCA
CCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACC
AATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGG
TACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGC
CTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGC
CACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGC
AGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAG
GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA
AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAA
GCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCT
CGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT
CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCG
ACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGA
CAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACG
GCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCG
TGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCA
ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACT
GTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCT
GGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGT
CTGAGTAGGTGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAG
GATTGGGAAGAGAATAGCAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATG
GAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAG
GTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG
CTGCCTGCAGGGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCAC
ACCGCATACGTCAAAGCAACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGC
GGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGC
TCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTC
TAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAA
AAAACTTGATTTGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTT
CGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAA
CAACACTCAACTCTATCTCGGGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGG
TCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATA
TTAACGTTTACAATTTTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTA
AGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTC
CCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGG
TTTTCACCGTCATCACCGAAACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTT
TATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGA
AATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTC
ATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTAT
TCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGC
TCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGT
GGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAA
GAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCG
TATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTG
GTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAA
TTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAA
CGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAA
CTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTG
ACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACT
ACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCA
GGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAG
CCGGTGAGCGTGGAAGCCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCT
CCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAG
ACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTT
TACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTG
AAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTG
AGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGC
GTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCG
GATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC
CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGC
ACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATA
AGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTC
GGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCG
AACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAA
AGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAG
CTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACT
TGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:36).
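When checking a synthesized plasmid against listings such as those above, it can help to pull out the guide (spacer) region programmatically. The following is a minimal sketch, not part of the patent: the helper name `extract_spacer` is hypothetical, and the two flanking motifs are assumptions (the bases immediately upstream of the guide, and the first bases of the sgRNA scaffold) read off the PX458-HBG-SG01 listing. The excerpt is copied from that listing.

```python
def extract_spacer(plasmid: str, left_flank: str, right_flank: str) -> str:
    """Return the bases between the first left_flank and the next right_flank."""
    # Remove line breaks and spaces so a wrapped listing can be pasted directly.
    seq = "".join(plasmid.split()).upper()
    start = seq.find(left_flank)
    if start == -1:
        raise ValueError("left flank not found")
    start += len(left_flank)
    end = seq.find(right_flank, start)
    if end == -1:
        raise ValueError("right flank not found")
    return seq[start:end]


# Excerpt copied from the PX458-HBG-SG01 listing (wrapped on purpose).
excerpt = """GGAAAGGACGAAACACCCTTGTCAAGGCTATTGGTCAGTTTTAGAGC
TAGAAATAGCAAGTT"""

# Assumed flanks: end of the promoter region and start of the sgRNA scaffold.
PROMOTER_END = "GGAAAGGACGAAACACC"
SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGC"

spacer = extract_spacer(excerpt, PROMOTER_END, SCAFFOLD_START)
print(spacer, len(spacer))  # CTTGTCAAGGCTATTGGTCA 20
```

The same function can be applied to the full plasmid text. For the BES4 constructs the right-hand flank would differ, since the guide there is followed by a run of T bases rather than by this scaffold.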
After the plasmid sequences required for the BES4 activity detection experiments had been obtained as commercially synthesized plasmids and bacterial strains, the plasmids were amplified by direct inoculation of the strains:
(a) 15 mL of antibiotic-free LB liquid medium was taken and 15 μL of 1000× Amp antibiotic was added; a white pipette tip was then used to pick the stored strain carrying the target plasmid, the tip was placed in the medium, and the culture was grown overnight at 37 °C and 200 rpm;
(b) The overnight culture was centrifuged at 8000 rpm for 3 min to pellet the cells, and the culture medium was poured off;
(c) The plasmid was extracted using a Tiangen plasmid mini kit or a Tiangen endotoxin-free plasmid mini/midi kit;
(d) After extraction, the plasmid concentration was quantified using a Nanodrop, and the plasmid was stored at -20 °C.
(3) Plasmid transfection into human cells
(a) Plasmid transfection was performed with the Lipo3000 kit (1.5 μg of plasmid input per well);
(b) The cells were cultured for 2-3 days after transfection to allow gene editing to proceed fully, and were then harvested;
(c) After cell culture was complete, the culture medium was aspirated with a pipette tip and 200 μL of 0.5 M EDTA solution was added to each well of the 12-well plate; after standing for ten minutes, the cells were resuspended by pipetting, transferred to an EP tube, and centrifuged at 12000 rpm for 1 min, and the supernatant was removed to recover the cells;
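The plasmid input of 1.5 μg per well has to be converted into a pipetting volume from the Nanodrop reading. The sketch below shows that arithmetic only; the function name and the example concentration of 500 ng/μL are hypothetical and are not values from the patent.

```python
def volume_for_mass_ul(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (in microliters) that contains mass_ng at the given concentration."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul


PLASMID_PER_WELL_NG = 1500.0  # 1.5 ug per well, as stated in step (3)(a)
example_conc = 500.0          # ng/uL, hypothetical Nanodrop reading

vol = volume_for_mass_ul(PLASMID_PER_WELL_NG, example_conc)
print(f"{vol:.1f} uL of plasmid per well")  # 3.0 uL of plasmid per well
```

The same helper applies to the 200-300 ng of substrate DNA used per T7E1 reaction in step (4).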
(4) Editing activity identification
After the cells were harvested, genomic DNA extraction and a T7E1 enzyme assay were performed to detect editing activity, as follows:
(a) Genomic DNA extraction: genomic DNA (gDNA) was extracted using a genomic DNA extraction kit (Tiangen), and the gDNA concentration was measured using a Nanodrop;
(b) Target region PCR: the target site region was amplified from the gDNA using GXL Prime; the amplification primers are shown in Table 7 below, and the oligodeoxynucleotides used were synthesized on the synthesis and editing platform of the Shenzhen National Gene Bank. The PCR products were purified using a PCR purification and gel extraction kit (MN). The purity of the PCR products was checked by agarose gel electrophoresis, and their concentration was measured using a Nanodrop.
(c) Denaturation and annealing: the purified product of step (b) was denatured and re-annealed on a Bio-Rad PCR instrument. An equal amount of substrate DNA (about 200-300 ng per reaction, in a 10 μL reaction system) was used for each T7E1 cleavage reaction.
(d) T7E1 digestion: 0.2 μL of T7E1 nuclease was added to the 10 μL sample from step (c), and the cleavage reaction was carried out at 37 °C for 20 minutes.
(e) Activity detection: after the cleavage reaction was complete, loading buffer was added to the T7E1 reaction products, which were then analyzed by agarose gel electrophoresis.
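The text describes only gel detection of the T7E1 products and does not say whether band intensities were quantified. If quantification is wanted, one commonly used estimate takes the fraction of cleaved product and converts it to an indel percentage. The sketch below assumes that estimate, and the band intensities are made-up numbers for illustration only.

```python
import math
from typing import Sequence


def indel_percent(uncut: float, cut_bands: Sequence[float]) -> float:
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    uncut     -- intensity of the full-length (uncleaved) PCR band
    cut_bands -- intensities of the cleavage product bands
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = cut / total
    # Assumed model: heteroduplexes form at random on re-annealing, so the
    # cleaved fraction f_cut relates to indel frequency p by f_cut = 1 - (1 - p)^2.
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


# Hypothetical intensities: 64 uncut, 20 and 16 for the two cleavage bands.
estimate = indel_percent(64.0, [20.0, 16.0])
print(round(estimate, 1))  # 20.0
```

A lane with no cleavage bands gives 0 %, which is the expected reading for an untransfected control.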
Table 7: PCR amplification primer list
As shown in FIG. 13, the sg03 plasmid of BES4 has human cell editing activity.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two, three, etc., unless specifically defined otherwise.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Table I: novel Type II Crispr-Cas system
Table II: novel Cpf1 (Type V) system
Claims (10)
1. A Cas protein, comprising:
the amino acid sequence shown in SEQ ID NO. 4.
2. A nucleic acid sequence encoding the Cas protein of claim 1.
3. The nucleic acid sequence of claim 2, wherein the nucleic acid sequence is DNA or RNA.
4. An expression vector comprising the nucleic acid sequence of claim 2 or 3.
5. A recombinant cell comprising the expression vector of claim 4, wherein the recombinant cell is a non-plant cell.
6. The recombinant cell of claim 5, wherein the recombinant cell is a eukaryotic cell.
7. The recombinant cell of claim 6, wherein the recombinant cell is an animal cell.
8. A Crispr-Cas system comprising the Cas protein of claim 1.
9. The system of claim 8, further comprising at least one of: crRNA, tracrRNA, or a chimeric RNA formed from crRNA and tracrRNA.
10. Use of the Cas protein of claim 1, the nucleic acid sequence of claim 2 or 3, the expression vector of claim 4, the recombinant cell of any one of claims 5-7, or the Crispr-Cas system of claim 8 or 9 in the field of gene editing for purposes other than disease diagnosis or treatment.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910399082 | 2019-05-14 | ||
CN2019103990824 | 2019-05-14 | ||
CN202010401622.0A CN112301018B (en) | 2019-05-14 | 2020-05-13 | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010401622.0A Division CN112301018B (en) | 2019-05-14 | 2020-05-13 | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116694603A (en) | 2023-09-05 |
Family
ID=74336498
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010401622.0A Active CN112301018B (en) | 2019-05-14 | 2020-05-13 | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing |
CN202310742030.9A Pending CN116694603A (en) | 2019-05-14 | 2020-05-13 | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010401622.0A Active CN112301018B (en) | 2019-05-14 | 2020-05-13 | Novel Cas protein, crispr-Cas system and use thereof in the field of gene editing |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112301018B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114921439B (en) * | 2022-06-16 | 2024-04-26 | 尧唐(上海)生物科技有限公司 | CRISPR-Cas effector protein, gene editing system and application thereof |
WO2024098383A1 (en) * | 2022-11-11 | 2024-05-16 | 深圳华大生命科学研究院 | Protein mutant and use thereof in treatment of disease related to hbb gene mutation |
CN116410955B (en) * | 2023-03-10 | 2023-12-19 | 华中农业大学 | Two novel endonucleases and application thereof in nucleic acid detection |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784200B (en) * | 2016-08-26 | 2020-11-06 | 深圳华大生命科学研究院 | Method and device for screening novel CRISPR-Cas system |
KR20190104342A (en) * | 2016-12-14 | 2019-09-09 | 바게닝겐 유니버시테이트 | Thermostable CAS9 nuclease |
CN108690845B (en) * | 2017-04-10 | 2021-04-27 | 中国科学院动物研究所 | Genome editing system and method |
Also Published As
Publication number | Publication date |
---|---|
CN112301018B (en) | 2023-07-25 |
CN112301018A (en) | 2021-02-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||