CN116694603A

CN116694603A - Novel Cas protein, CRISPR-Cas system and use thereof in the field of gene editing

Info

Publication number: CN116694603A
Application number: CN202310742030.9A
Authority: CN
Inventors: 江媛; 王丹; 章登位; 戴雪辰; 汪晓珏; 纪泽阳; 王�琦; 赵静; 李卓坤; 顾颖; 欧阳文杰; 沈玥; 陈奥; 章文蔚; 肖亮
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2019-05-14
Filing date: 2020-05-13
Publication date: 2023-09-05
Also published as: CN112301018B; CN112301018A

Abstract

The invention relates to the field of gene editing, and in particular to a novel Cas protein, a CRISPR-Cas system and their use in gene editing. The novel Cas protein is selected from at least one of SEQ ID NO: 1 to SEQ ID NO: 4, or is a protein having 85% or more, preferably 90% or more, sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4. The novel Cas protein can be used in a CRISPR-Cas system to edit genes: it can edit more target sites, is easier to deliver into cells, and is less prone to off-target editing.

Description

Novel Cas protein, CRISPR-Cas system and use thereof in the field of gene editing

Technical Field

The invention relates to the field of gene editing, in particular to a novel Cas protein, a Crispr-Cas system and application thereof in the field of gene editing.

Background

CRISPR (clustered regularly interspaced short palindromic repeats) is a natural adaptive immune mechanism found in many bacteria and most archaea, and it functions in effect as a gene editor. Analysis of the sequences flanking CRISPR arrays revealed a polymorphic family of genes nearby that act together with the CRISPR region; these were therefore named CRISPR-associated genes, abbreviated Cas. Most CRISPR-Cas systems contain a Cas1 protein, one of the most conserved proteins in the Cas family. According to the structure of the effector module, the CRISPR-Cas systems discovered so far fall into two classes: Class 1 systems use a complex of multiple Cas effector proteins acting together and mainly include Type I, Type III and Type IV; Class 2 systems use a single large effector protein and include Type II, Type V and Type VI. Class 2 includes the Cas9 (Type II) and Cpf1 (Type V) systems, which are widely used in gene editing.

However, CRISPR-Cas systems still have several drawbacks, such as possible off-target editing and a limited range of application, and further improvement is needed.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems in the related art to some extent. To this end, an object of the present invention is to propose a novel Cas protein, a Crispr-Cas system and its use in the field of gene editing.

The CRISPR/Cas system is commonly used for gene editing and has been applied successfully to the precise editing of animal and plant genomes. In this system, an RNA guides a nuclease to a specific site on double-stranded DNA, which the nuclease then cleaves; Cas9 and Cpf1 are the most widely used nucleases. After RNA-guided recognition of the target site, Cas9 or Cpf1 cleaves the DNA and creates a double-strand break, which the cell repairs by NHEJ (non-homologous end joining) or HR (homologous recombination), thereby achieving site-specific modification of the target gene. SpCas9, a widely used commercial Cas9 nuclease, recognizes the PAM sequence NGG located at the 3' end of the targeting sequence and cleaves 3 bp upstream of the PAM, producing a blunt end. LbCpf1, a widely used commercial Cpf1 nuclease, recognizes the PAM sequence TTTN located 5' of the targeting sequence and cleaves distal to the PAM, producing a cohesive (staggered) end.
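The SpCas9 PAM rule described above can be illustrated with a short scan. The Python sketch below assumes a 20-nt protospacer immediately followed by an NGG PAM, searches only the strand it is given (not the reverse complement), and reports where the blunt cut would fall, 3 bp upstream of the PAM; the function name is illustrative, not from any library.

```python
import re
from typing import List, Tuple

PROTOSPACER_LEN = 20  # SpCas9 targeting sequence length stated in the text

def find_spcas9_sites(seq: str) -> List[Tuple[int, str, str, int]]:
    """Return (start, protospacer, pam, cut_index) for each site on this strand.

    cut_index is the 0-based position of the first base 3' of the break,
    i.e. the blunt break lies between cut_index - 1 and cut_index.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % PROTOSPACER_LEN)
    sites = []
    for m in pattern.finditer(seq):
        start = m.start()
        pam_start = start + PROTOSPACER_LEN
        cut_index = pam_start - 3  # blunt cut 3 bp 5' of the PAM
        sites.append((start, m.group(1), m.group(2), cut_index))
    return sites

if __name__ == "__main__":
    print(find_spcas9_sites("A" * 20 + "TGG" + "CC"))
```

Scanning the reverse strand would require running the same search on the reverse complement and mapping coordinates back.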

During our research we found that both SpCas9 and LbCpf1 have relatively stringent PAM requirements, which limits the choice of target sites. In addition, SpCas9 and LbCpf1 consist of 1368 and 1228 amino acids, respectively, which makes them difficult to package and deliver with AAV vectors and limits their application in animal cells to some extent. Finally, the targeting sequence of SpCas9 is only 20 bp, so similar sequences readily occur elsewhere in the genome and can lead to off-target editing.
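To put these protein sizes in context, the coding length can be estimated at 3 nt per amino acid plus a stop codon. The sketch below compares it with an assumed AAV packaging limit of about 4.7 kb (a commonly cited approximate figure, not stated in this document); whatever remains must also hold the promoter, guide RNA cassette and other regulatory elements.

```python
AAV_CAPACITY_BP = 4700  # approximate AAV packaging limit (assumed, not from the source)

def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Open reading frame length: 3 nt per codon, plus an optional stop codon."""
    return 3 * n_amino_acids + (3 if include_stop else 0)

def remaining_capacity_bp(n_amino_acids: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Space left for promoter, guide RNA cassette, polyA, etc.; negative means overflow."""
    return capacity_bp - cds_length_bp(n_amino_acids)

if __name__ == "__main__":
    for name, length_aa in [("SpCas9", 1368), ("LbCpf1", 1228)]:
        print(name, cds_length_bp(length_aa), remaining_capacity_bp(length_aa))
```

Under these assumptions the SpCas9 open reading frame leaves roughly 0.6 kb, and LbCpf1 about 1 kb, for all other elements.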

It is therefore important to find novel, useful Cas proteins that are smaller, so that they are easier to package and deliver and their application in animal cells is further expanded, and that are less prone to off-target editing when used in a CRISPR-Cas system.

To this end, we searched for and identified several novel Cas proteins that are shorter and can therefore be delivered into cells for editing more easily when used in a CRISPR-Cas system, and that are less prone to off-target editing. Take as an example the BES1 protein obtained from the human gut bacterium Veillonella sp. AF13-2 (abbreviated as AF13-2): when it is used in a CRISPR-Cas system, its identified PAM sequence is less stringent than those of the commercial SpCas9 and LbCpf1, so more sites are potentially editable. Furthermore, BES1 consists of only 1064 amino acids and is easier to deliver into cells for editing. The targeting sequence of SpCas9 is 20 bp, whereas that of BES1 is 23 bp, so BES1 is potentially less likely than SpCas9 to cause off-target editing.
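The effect of targeting-sequence length can be made concrete with a simple random-sequence model. This is an assumption (uniform, independent base composition and exact matches only); real off-target sites often tolerate mismatches, so it is only a rough comparison. The expected number of chance exact occurrences of one protospacer plus PAM on both strands of a genome of size G is 2G · p_PAM / 4^k, where k is the protospacer length and p_PAM is the fraction of random sequences accepted by the PAM.

```python
def expected_chance_sites(genome_bp: float, protospacer_len: int, pam_fraction: float) -> float:
    """Expected exact chance occurrences of one protospacer+PAM on both strands
    of a random genome with equal, independent base frequencies (assumption)."""
    return 2 * genome_bp * pam_fraction / 4 ** protospacer_len

GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)

spcas9 = expected_chance_sites(GENOME_BP, 20, 1 / 16)  # NGG: two fixed positions
bes1 = expected_chance_sites(GENOME_BP, 23, 3 / 4)     # NNNV: one position excludes T

if __name__ == "__main__":
    print(f"SpCas9: {spcas9:.2e}  BES1: {bes1:.2e}  ratio: {bes1 / spcas9:.4f}")
```

In this model the three extra protospacer bases cut chance matches 64-fold, while the looser NNNV PAM accepts 12 times as many sequences as NGG, so the net reduction is roughly 5-fold (ratio 0.1875).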

Specifically, the invention provides the following technical scheme:

According to a first aspect of the present invention, there is provided a Cas protein selected from at least one of the following: SEQ ID NO: 1 to SEQ ID NO: 4; or a protein having 85% or more, preferably 90% or more, sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4. The novel Cas proteins of SEQ ID NO: 1 to SEQ ID NO: 4 were obtained by bioinformatic screening and verified by molecular biology techniques, and each of them is easy to deliver into cells for gene editing. The PAM sequences they recognize have suitable specificity, so more target sites can be edited, and their targeting sequences have a suitable length, so off-target editing is less likely. Proteins having more than 85% (e.g., more than 86%, 87%, 88% or 89%), preferably more than 90% (e.g., more than 91%, 92%, 93% or 94%), sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 have the same or similar activity and function as the Cas proteins of SEQ ID NO: 1 to SEQ ID NO: 4; they are likewise easy to deliver into cells for gene editing, have more editable target sites, have a more suitable targeting-sequence length, and are less prone to off-target editing.
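The text does not specify how "sequence similarity" is calculated. As one common convention (an assumption here), identity can be computed from a pairwise alignment as matching positions divided by aligned columns, with gap columns counted in the denominator. The sketch below takes an already-computed alignment and checks it against a threshold such as 85%; similarity scores based on substitution matrices, or a different denominator, would give other values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if percent identity reaches the threshold (e.g. 85, 90, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold

if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAYIAR"))
```

The alignment itself would come from a standard tool; only the scoring step is sketched here.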

According to an embodiment of the present invention, the Cas protein described above may further include the following technical features:

In some embodiments of the invention, the sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 is 95% or more, preferably 96% or more, more preferably 97% or more, still more preferably 98% or more, and most preferably 99% or more. Proteins with more than 95%, preferably more than 96%, 97%, 98%, 99% or 99.5%, sequence similarity to any one of SEQ ID NO: 1 to SEQ ID NO: 4 have the same or similar activity as these Cas proteins, are easy to deliver into cells for gene editing, have more editable target sites, have a more suitable targeting-sequence length, and are less prone to off-target editing.

In some embodiments of the invention, the Cas protein has nuclease activity and differs from any one of SEQ ID NO: 1 to SEQ ID NO: 4 by substitution, deletion or addition of one or more amino acids. These proteins have the same or similar nuclease activity as the proteins of SEQ ID NO: 1 to SEQ ID NO: 4, are likewise easy to deliver into cells for gene editing, have many target sites, and are less prone to off-target editing.

In some embodiments of the invention, the Cas protein has nuclease activity and differs from any one of SEQ ID NO: 1 to SEQ ID NO: 4 by substitution, deletion or addition of at most 8 amino acids; in further embodiments, by at most 6, at most 5, at most 4, at most 3 or at most 2 amino acids; and in still further embodiments, by 1 amino acid. These proteins have the same or similar nuclease activity as the proteins of SEQ ID NO: 1 to SEQ ID NO: 4, are likewise easy to deliver into cells for gene editing, have many target sites, and are less prone to off-target editing.
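Counting how many amino acids have been substituted, deleted or added relative to a reference corresponds to an edit (Levenshtein) distance. The sketch below computes it by dynamic programming and checks a variant against a maximum number of changes, such as the limits of 8 down to 1 given above; the helper names and the short example peptides are illustrative only and are not taken from SEQ ID NO: 1 to 4.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at no cost)
            ))
        prev = curr
    return prev[-1]

def within_variant_limit(reference: str, variant: str, max_changes: int) -> bool:
    """True if the variant differs from the reference by at most max_changes edits."""
    return edit_distance(reference, variant) <= max_changes

if __name__ == "__main__":
    print(edit_distance("MKTAYIAK", "MATAYLAK"))
```

For full-length proteins of 1000+ residues this quadratic algorithm is still fast enough for one-off checks.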

In some embodiments of the invention, the Cas protein is shown in SEQ ID NO: 1. This Cas protein consists of 1064 amino acids; its relatively small size makes it easier to deliver into cells for editing. Its identified PAM sequence is NNNV (where V represents A, G or C), so more target sites can be edited, and its targeting sequence is 23 bp, so off-target editing is less likely. This Cas protein has in vitro DNA double-strand cleavage activity, but no editing activity was detected in human cells.

In some embodiments of the invention, the Cas protein is shown in SEQ ID NO: 2. This Cas protein consists of 1368 amino acids, can be delivered into cells for editing, and recognizes the PAM sequence NNMTA (where M represents A or C). It has in vitro DNA double-strand cleavage activity, but no editing activity was detected in human cells.

In some embodiments of the invention, the Cas protein is shown in SEQ ID NO: 3. This Cas protein consists of 1245 amino acids; its relatively small size makes it easier to deliver into cells for editing. It recognizes the PAM sequence TTTN and has both in vitro DNA double-strand cleavage activity and editing activity in human cells.

In some embodiments of the invention, the Cas protein is shown in SEQ ID NO: 4. This Cas protein consists of 1306 amino acids; its relatively small size makes it easier to deliver into cells for editing. Its identified PAM sequence is YYN (where Y represents C or T), which greatly relaxes the restriction that LbCpf1 recognizes only TTTN. It has both in vitro DNA double-strand cleavage activity and editing activity in human cells.
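The PAM sequences above use IUPAC degenerate base codes (N = any base, V = A/C/G, M = A/C, Y = C/T). The sketch below checks whether a concrete PAM-length DNA string fits such a pattern and estimates what fraction of random sequences each pattern accepts, as a rough measure of how permissive it is. It does not model where the PAM lies relative to the protospacer (3' for Cas9-type, 5' for Cpf1-type systems).

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def pam_matches(pattern: str, dna: str) -> bool:
    """True if a PAM-length DNA string fits an IUPAC PAM pattern."""
    if len(pattern) != len(dna):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), dna.upper()))

def pam_fraction(pattern: str) -> float:
    """Fraction of all random sequences of this length accepted by the pattern."""
    frac = 1.0
    for code in pattern.upper():
        frac *= len(IUPAC[code]) / 4
    return frac

if __name__ == "__main__":
    for pam in ("NNNV", "NNMTA", "TTTN", "YYN", "NGG"):
        print(pam, pam_fraction(pam))
```

By this measure NNNV accepts 75% of random 4-mers, whereas TTTN accepts about 1.6%.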

According to a second aspect of the present invention there is provided a nucleic acid sequence selected from at least one of the following: a nucleic acid sequence encoding a Cas protein according to any one of the embodiments of the first aspect of the invention; a nucleic acid sequence that is reverse-complementary to a nucleic acid sequence encoding a Cas protein according to any one of the embodiments of the first aspect of the invention.
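The second aspect covers both a coding sequence and its reverse complement. A minimal Python sketch of the reverse-complement operation for DNA or RNA (with N kept as N) is shown below; the function name is illustrative.

```python
DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
RNA_COMPLEMENT = str.maketrans("ACGUN", "UGCAN")

def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA (default) or RNA sequence."""
    seq = seq.upper()
    allowed = set("ACGUN" if rna else "ACGTN")
    if not set(seq) <= allowed:
        raise ValueError("unexpected character in sequence")
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]

if __name__ == "__main__":
    print(reverse_complement("ATGAAACCC"))
```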

In some embodiments of the invention, the nucleic acid sequence is DNA or RNA.

According to a third aspect of the present invention, there is provided an expression vector comprising the nucleic acid sequence according to the second aspect of the present invention. The expression vector, obtained by inserting the nucleic acid sequence into a vector, can express the corresponding Cas protein in target cells, so that gene editing can be performed in those cells. Commonly used vectors include plasmids and lentiviral vectors, for example the pET-28a and pMD19 vectors.

According to a fourth aspect of the present invention, there is provided a recombinant cell comprising the expression vector according to the third aspect of the present invention. The expression vector is introduced into cells to form recombinant cells, in which it expresses the corresponding Cas protein so that gene editing of the recombinant cells can be achieved. The recombinant cells may be eukaryotic cells, such as plant cells or animal cells. In particular, compared with the commonly used SpCas9 and LbCpf1 proteins, the Cas proteins provided by the invention have fewer amino acids and are easier to deliver into cells for editing; when a viral vector is used for animal cells, they are easier to package and deliver, which expands their application in animal cells.

According to a fifth aspect of the present invention, there is provided a CRISPR-Cas system comprising the Cas protein according to the first aspect of the present invention. Used in a CRISPR-Cas system for gene editing, the Cas protein expands the range of editable sites, is less prone to off-target editing, and improves editing accuracy. The system can be used in many fields, such as basic bioscience, medicine and agriculture.

According to an embodiment of the present invention, the Crispr-Cas system described above may further include the following technical features:

In some embodiments of the invention, the CRISPR-Cas system further comprises at least one of the following: a crRNA, a tracrRNA, or a chimeric RNA formed from a crRNA and a tracrRNA. These RNAs enable the CRISPR-Cas system to function as a gene editor. In addition, the CRISPR-Cas system may further include a CRISPR repeat sequence as needed; the CRISPR repeat sequence corresponding to each Cas protein is shown in the accompanying Tables I and II.

In some embodiments of the invention, the crRNA and tracrRNA are as shown in the accompanying Tables I and II, which list the crRNA and tracrRNA sequences used by each Cas protein in gene editing. These sequences guide the Cas protein precisely to the target sequence, enabling precise gene editing.

According to a sixth aspect of the present invention, there is provided the use, in the field of gene editing, of the Cas protein according to the first aspect, the nucleic acid sequence according to the second aspect, the expression vector according to the third aspect, the recombinant cell according to the fourth aspect, or the CRISPR-Cas system according to the fifth aspect of the present invention.

Drawings

FIG. 1 is a PAM bias chart of BES1 provided in accordance with an embodiment of the present invention.

FIG. 2 shows the BES1 purification results provided in accordance with an embodiment of the present invention.

FIG. 3 shows the base sequences and secondary structures of crRNA+tracrRNA-L, sgRNA-1, sgRNA-2 and sgRNA-3 of BES1 provided in accordance with an embodiment of the present invention.

FIG. 4 shows PAM bias charts of BES1 detected on chip with crRNA+tracrRNA-L, sgRNA-1 and sgRNA-3, respectively, in accordance with an embodiment of the present invention.

FIG. 5 is a sequence diagram of a spacer provided in accordance with an embodiment of the present invention.

FIG. 6 shows the sequence of the PAM library constructed in accordance with an embodiment of the present invention.

FIG. 7 is a schematic representation of cleavage substrate sequences provided in accordance with an embodiment of the present invention.

FIG. 8 shows gel bands of the in vitro cleavage products of BES1 with crRNA+tracrRNA-L, sgRNA-1, sgRNA-2 and sgRNA-3 at 20 ℃, 25 ℃ and 37 ℃ in accordance with an embodiment of the present invention.

FIG. 9 is a schematic flow chart for obtaining a novel Cas protein in accordance with an embodiment of the present invention.

FIG. 10 shows PAM bias charts of the BES2, BES4 and BES6 systems detected on chip in accordance with an embodiment of the present invention.

FIG. 11 shows the results of in vitro cleavage experiments for the BES2, BES4 and BES6 systems in accordance with an embodiment of the present invention.

FIG. 12 is an electrophoresis image of the human-cell editing activity assay of the BES6 system in accordance with an embodiment of the present invention.

FIG. 13 is an electrophoresis image of the human-cell editing activity assay of the BES4 system in accordance with an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements having identical or similar functions, throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the present invention, and should not be construed as limiting it. In addition, certain terms and expressions used herein were chosen to help those of ordinary skill in the art understand the invention and are not to be construed as limiting its scope.

Herein, the terms "CRISPR", "Crispr" and "crispr" all refer to clustered regularly interspaced short palindromic repeats; these variants are commonly used in the art regardless of capitalization, and the letter case in the term "CRISPR-Cas system" varies accordingly. In addition, when bases are represented, unless otherwise specified, the letters N and V have their usual meanings in the art: N represents any base (A, T, C or G), and V represents any of the bases A, C or G.

Cas9 enzymes cleave target DNA at a site that is typically determined as follows: an RNA molecule called CRISPR RNA (crRNA) binds, through part of its sequence, to an RNA molecule called tracrRNA by base pairing, forming a chimeric RNA (tracrRNA/crRNA); this chimeric RNA then base-pairs with the target DNA site through another part of the crRNA sequence and thereby directs the Cas protein to bind and cleave that site. Such a chimeric RNA is also called a guide RNA. Unlike the CRISPR-Cas9 system, the Cpf1 enzyme can process crRNA precursors on its own and then use the processed crRNAs to specifically target and cleave DNA, without the need for host-cell ribonucleases or a tracrRNA.

The targeting specificity of CRISPR is determined by two factors: base pairing between the guide RNA and the target DNA, and recognition by the Cas protein of a short DNA sequence adjacent to the target, called the PAM (protospacer adjacent motif), which for Cas9 lies at the 3' end of the target DNA.

If the PAM requirement is stringent (e.g., a few specific bases), the Cas protein can edit fewer target sites, which limits the application of the CRISPR-Cas system. Both SpCas9 and LbCpf1 have relatively stringent PAM requirements, which limits the choice of target sites. For example, SpCas9 recognizes the PAM sequence NGG at the 3' end of the targeting sequence and cleaves 3 bp upstream of the PAM to form a blunt end; because only NGG is accepted as the PAM, the application of this editing system is limited.

Using bioinformatics and molecular experimental techniques, we found in the human gut microbiota a variety of novel Cas9 and Cpf1 systems with gene-editing potential, as shown in Tables I and II. The Cpf1 enzyme of a Cpf1 system, also known as Cas12a, edits genes differently from Cas9; Cpf1 is smaller than SpCas9 and is more easily transported into cells and tissues. In a CRISPR-Cpf1 system only a single crRNA is required, and simultaneous editing at multiple sites can be achieved. The Cas proteins provided in the present application include both Cas9 and Cpf1 proteins; that is, the application provides a Cas protein that is at least one of SEQ ID NO: 1 to SEQ ID NO: 4. These Cas proteins have nuclease activity and can cleave target nucleic acids, so that when applied in a CRISPR-Cas system they enable effective gene editing with more editable target sites and a wider range of application.

The PAM requirements identified for the novel Cas9 and Cpf1 systems are less stringent, which expands the application of gene-editing systems. Taking the BES1 protein obtained from the human gut bacterium Veillonella sp. AF13-2 (AF13-2 for short) as an example, its PAM specificity is lower, and the protein is smaller, than those of the existing commercial SpCas9 and LbCpf1. With fewer amino acids, BES1 is easier to deliver into cells to perform gene editing. The PAM preference of BES1 is shown in FIG. 1, where the abscissa represents the 7 positions immediately 3' of the target sequence and the ordinate represents the proportion of each base among all cleaved (positive) sequences. At the first position immediately 3' of the target sequence, each of the bases A, C, T and G occurs with high probability, so this position can be denoted N; the remaining positions are read in the same way. As FIG. 1 shows, the proportion is very low (less than 0.05) only when the fourth position is T, so the PAM sequence of BES1 is NNNV (where V represents A, G or C).
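The way the PAM is read from FIG. 1 (a base counts as allowed at a position unless its proportion among cleaved sequences falls below 0.05) can be written as a small routine. In the sketch below, each allowed-base set is mapped to its IUPAC code and trailing fully degenerate (N) positions are trimmed; the frequencies in the usage example are made-up placeholders, not the measured FIG. 1 data.

```python
from typing import Dict, List

# IUPAC code for every non-empty set of allowed bases.
IUPAC_FROM_SET = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def consensus_pam(position_freqs: List[Dict[str, float]], cutoff: float = 0.05,
                  trim_trailing_n: bool = True) -> str:
    """Build an IUPAC PAM: a base is allowed where its proportion is >= cutoff."""
    codes = []
    for freqs in position_freqs:
        allowed = frozenset(base for base, p in freqs.items() if p >= cutoff)
        if not allowed:
            raise ValueError("no base reaches the cutoff at this position")
        codes.append(IUPAC_FROM_SET[allowed])
    pam = "".join(codes)
    return pam.rstrip("N") if trim_trailing_n else pam

if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    fourth = {"A": 0.33, "C": 0.33, "G": 0.31, "T": 0.03}  # placeholder values
    print(consensus_pam([uniform] * 3 + [fourth] + [uniform] * 3))
```

With these placeholder inputs the routine prints NNNV, matching the reading described in the text.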

Details of the novel Cas9 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of the source of each Cas protein, together with its crRNA, tracrRNA, CRISPR repeat sequence, effector protein length and effector amino acid sequence, are given in Table I.

Details of the novel Cpf1 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of the source of each Cas protein, together with its crRNA, tracrRNA, CRISPR repeat sequence, effector protein length and effector amino acid sequence, are given in Table II.

The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples are merely illustrative and should not be construed as limiting the scope of the invention. Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature of the field or the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.

Example 1

Based on a microbial genome database, the genomes of microorganisms in the human gut microbiota were analyzed to predict Cas protein sequences and CRISPR sequences, and all protein sequences within 20 kb upstream and downstream of each CRISPR array were identified. These proteins were then compared against the NCBI protein database to obtain homologs of known Type II or Type V proteins. The homologs were analyzed for the conserved sites of their key domains and for protein integrity, yielding the Cas protein sequences and nearby CRISPR sequences listed in Tables I and II. The analysis workflow is shown in FIG. 9. These novel CRISPR-Cas systems belong to new Type II and Type V CRISPR-Cas systems and have gene-editing capabilities different from that of the existing SpCas9 protein. They enrich the existing repertoire of CRISPR-Cas systems and can be used as needed in different cells, e.g., animal cells and plant cells, to perform gene editing.
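The first filtering step, collecting all genes within 20 kb on either side of a CRISPR array, amounts to an interval-overlap query. The sketch below assumes 1-based inclusive coordinates on a single linear contig and ignores strand and circular genomes; the class, function and gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def genes_near_array(array: Feature, genes: List[Feature], flank_bp: int = 20_000) -> List[Feature]:
    """Genes overlapping the window that extends flank_bp on each side of the array."""
    win_start = max(1, array.start - flank_bp)
    win_end = array.end + flank_bp
    return [g for g in genes if g.end >= win_start and g.start <= win_end]

if __name__ == "__main__":
    crispr = Feature("CRISPR1", 50_000, 51_000)
    candidates = [Feature("cas9_like", 45_000, 48_200), Feature("far_gene", 10_000, 11_000)]
    print([g.name for g in genes_near_array(crispr, candidates)])
```

Partially overlapping genes at the window edges are kept, so that a Cas gene straddling the 20 kb boundary is not lost.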

Taking BES1, obtained from the human gut bacterium Veillonella sp. AF13-2 (abbreviated as AF13-2), as an example, its PAM specificity is low and the protein is smaller than the existing commercial SpCas9 and LbCpf1. As shown in FIG. 1, the proportion of cleaved sequences is extremely low (less than 0.05) only when the fourth PAM position is T, so the PAM sequence of BES1 is NNNV (where V represents A, G or C).

The accompanying Table I lists the novel Cas9 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of each CRISPR-Cas system or Cas protein, together with its crRNA, tracrRNA, CRISPR repeat sequence (Crispr repeat), effector protein length (effector length) and effector amino acid sequence (effector amino acid sequence). For the Cas proteins in Table I, whose corresponding crRNA, tracrRNA and/or CRISPR repeat sequences are given, those skilled in the art can apply the sequences directly as shown.

The accompanying Table II lists the novel Cpf1 systems, including the strain name, genome ID (NCBI database), taxID (NCBI database), species, genus and phylum of each CRISPR-Cas system or Cas protein, together with its crRNA, tracrRNA, CRISPR repeat sequence, effector protein length and effector amino acid sequence. For Cas proteins in Table II for which no corresponding crRNA, tracrRNA and/or CRISPR repeat sequence is shown, suitable crRNA, tracrRNA and/or CRISPR repeat sequences that enable these Cas proteins to perform editing can be identified from the information on the corresponding Cas proteins.

Example 2 Expression and purification of BES1 protein

1. Construction of BES1 expression vectors

The expression vector was constructed by the In-Fusion method: the pET-28a vector was digested with NdeI and EcoRI, and the BES1 coding sequence was inserted into its cloning region. Six histidine residues (6×His) at the N-terminus of the recombinant BES1 protein serve as the purification tag, kanamycin resistance serves as the selection marker, and the resulting vector was named pET-28a-BES1.
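In-Fusion cloning joins a PCR insert to a linearized vector through 15-bp extensions on the primers that are homologous to the vector ends. The sketch below assembles such primers; the vector-end and insert sequences in the example are placeholders rather than the actual pET-28a or BES1 sequences, the 20-nt annealing length is an assumption, and how overhang bases at the NdeI and EcoRI ends are counted should follow the kit manual.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def infusion_primers(vector_upstream_end: str, vector_downstream_end: str, insert: str,
                     anneal_len: int = 20, homology_len: int = 15):
    """Return (forward, reverse) primers, both written 5'->3'.

    vector_upstream_end: top-strand sequence ending at the upstream vector end.
    vector_downstream_end: top-strand sequence starting at the downstream vector end.
    """
    if min(len(vector_upstream_end), len(vector_downstream_end)) < homology_len:
        raise ValueError("vector end sequences are shorter than the homology length")
    # 15-nt vector homology + start of the insert
    forward = (vector_upstream_end[-homology_len:] + insert[:anneal_len]).upper()
    # end of the insert + 15-nt vector homology, written on the bottom strand
    reverse = reverse_complement(insert[-anneal_len:] + vector_downstream_end[:homology_len])
    return forward, reverse

if __name__ == "__main__":
    # Hypothetical placeholder sequences
    print(infusion_primers("AAAAACCCCCGGGGGTTTTT", "ACGTACGTACGTACGTACGT",
                           "ATG" + "GCT" * 10 + "TAA"))
```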

2. Cultivation and Induction of BES1 Strain

LB liquid medium: 10g/L tryptone, 5g/L yeast extract and 10g/L NaCl.

The recombinant expression vector pET-28a-BES1 was transformed into the E. coli expression strain BL21(DE3), and the transformation mixture was spread evenly on LB agar plates containing 50 μg/mL kanamycin and incubated overnight at 37 ℃. Single colonies were picked into 5 mL of LB medium (containing 50 μg/mL kanamycin) and cultured overnight at 37 ℃ and 200 rpm. This culture was inoculated at 1:100 into 50 mL of LB medium (containing 50 μg/mL kanamycin) and cultured at 37 ℃ and 200 rpm for 4 hours. The expanded culture was then inoculated at 1:100 into 2 L of LB medium (containing 50 μg/mL kanamycin) and cultured at 37 ℃ and 200 rpm; when the OD600 reached about 0.6-0.8, IPTG was added to a final concentration of 0.4 mM, and the culture was continued at 16 ℃ and 200 rpm for about 16-18 hours. The induced cells were harvested by centrifugation at 10,000 × g and frozen at -20 ℃ until use.
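The culture additions above follow the dilution relation C1V1 = C2V2. The sketch below computes the stock volume needed for a target final concentration; the stock concentrations in the examples (1 M IPTG, 50 mg/mL kanamycin) are common laboratory choices assumed for illustration, not values given in the text, and the small volume contributed by the stock itself is neglected.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """V1 = C2 * V2 / C1; concentration and volume units only need to be consistent."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume / stock_conc

if __name__ == "__main__":
    # IPTG to 0.4 mM in 2000 mL from an assumed 1000 mM (1 M) stock; result in mL
    print(stock_volume(0.4, 2000, 1000))
    # Kanamycin to 50 ug/mL in 2000 mL from an assumed 50 mg/mL (50000 ug/mL) stock; result in mL
    print(stock_volume(50, 2000, 50_000))
    # A 1:100 inoculum into 50 mL is simply 50 / 100 = 0.5 mL of culture
```

Under these assumptions about 0.8 mL of IPTG stock and 2 mL of kanamycin stock are needed for the 2 L culture.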

3. BES1 protein extraction and purification

Preparation of purification buffers:

(1) Ni column affinity chromatography

Buffer A (equilibration buffer): 50 mM Tris-HCl + 500 mM NaCl + 20 mM imidazole, pH 7.5.

Buffer B (elution buffer): 50 mM Tris-HCl + 500 mM NaCl + 500 mM imidazole, pH 7.5.

(2) Ion exchange chromatography

Buffer C (equilibration buffer): 50 mM Tris-HCl + 100 mM NaCl, pH 7.0.

Buffer D (elution buffer): 50 mM Tris-HCl + 1 M NaCl, pH 7.0.

(3) Protein sample diluent

Buffer E (dilution buffer): 50 mM Tris-HCl, pH 7.0.

(4) 2× protein storage buffer

Buffer F (2× storage buffer): 50 mM Tris-HCl + 300 mM NaCl, pH 7.0.
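As a worked example of preparing these buffers from solids, the mass of each component is concentration × volume × molar mass. The sketch assumes Tris-HCl (157.60 g/mol), NaCl (58.44 g/mol) and imidazole (68.08 g/mol) are weighed directly; in practice Tris is often weighed as the free base instead, and the pH must still be adjusted after dissolving.

```python
MOLAR_MASS_G_PER_MOL = {"Tris-HCl": 157.60, "NaCl": 58.44, "imidazole": 68.08}

# Buffer recipes from the text, in mM
BUFFERS_MM = {
    "A": {"Tris-HCl": 50, "NaCl": 500, "imidazole": 20},
    "B": {"Tris-HCl": 50, "NaCl": 500, "imidazole": 500},
    "C": {"Tris-HCl": 50, "NaCl": 100},
    "D": {"Tris-HCl": 50, "NaCl": 1000},
    "E": {"Tris-HCl": 50},
    "F": {"Tris-HCl": 50, "NaCl": 300},
}

def grams_needed(recipe_mM: dict, volume_L: float) -> dict:
    """Mass (g) of each solid: (mM / 1000) mol/L * volume in L * g/mol."""
    return {name: mM / 1000 * volume_L * MOLAR_MASS_G_PER_MOL[name]
            for name, mM in recipe_mM.items()}

if __name__ == "__main__":
    for label, recipe in BUFFERS_MM.items():
        masses = grams_needed(recipe, 1.0)
        print(label, {k: round(v, 3) for k, v in masses.items()})
```

For 1 L of Buffer A this gives about 7.88 g Tris-HCl, 29.22 g NaCl and 1.36 g imidazole.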

The cells were resuspended in Buffer A at 15 mL per 1 g of cells, PMSF was added to a final concentration of 1 mM, and the cells were sonicated until the suspension became clear. The lysate was centrifuged at 12,000 rpm and 4 ℃ for 30 min, and the supernatant was filtered through a 0.22 μm membrane and stored at 4 ℃.

The Ni affinity column was washed with 5 column volumes (CV) of water and 5 CV of Buffer B and equilibrated with 10 CV of Buffer A before the sample was loaded. After loading, the column was re-equilibrated with 15 CV of Buffer A, contaminating proteins were washed off with 15% Buffer B, and the target protein was eluted with a linear gradient (15-100% Buffer B, 10 CV); fractions were collected when the UV absorbance exceeded 100 mAU.
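Because Buffers A and B differ only in imidazole, the percentage of Buffer B sets the imidazole concentration on the column. The sketch below uses linear mixing to show that the 15% Buffer B wash corresponds to about 92 mM imidazole and that the 10 CV gradient then raises it by about 40.8 mM per column volume; the 5 mL column volume is an assumed value for illustration.

```python
def mixed_conc(fraction_b: float, conc_a: float, conc_b: float) -> float:
    """Concentration in a linear A/B mixture containing fraction_b of Buffer B."""
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be between 0 and 1")
    return conc_a + fraction_b * (conc_b - conc_a)

IMIDAZOLE_A_MM, IMIDAZOLE_B_MM = 20.0, 500.0  # Buffer A and Buffer B

wash_mM = mixed_conc(0.15, IMIDAZOLE_A_MM, IMIDAZOLE_B_MM)  # 15% B wash
end_mM = mixed_conc(1.0, IMIDAZOLE_A_MM, IMIDAZOLE_B_MM)    # 100% B
slope_mM_per_cv = (end_mM - wash_mM) / 10                   # 15->100% B over 10 CV

COLUMN_ML = 5.0  # assumed column volume for illustration
gradient_ml = 10 * COLUMN_ML

if __name__ == "__main__":
    print(wash_mM, slope_mM_per_cv, gradient_ml)
```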

The protein collected from the Ni column was diluted 5-fold with Buffer E. A Q anion-exchange column was washed with 5 CV of water and equilibrated with 5 CV of Buffer C, the protein sample was loaded, and the flow-through was collected once the UV signal began to rise. An SP cation-exchange column was equilibrated with 5 CV of Buffer C, the flow-through from the previous step was loaded, and after loading the column was washed with 15 CV of Buffer C; the protein was then eluted with a linear gradient of Buffer D (0-100% Buffer D, 10 CV) and collected. The collected protein was dialyzed overnight against 2× storage buffer (Buffer F). The final protein concentration was 1 mg/mL and the glycerol concentration was 50%. As shown in FIG. 2, SDS-PAGE showed that the fusion protein was well purified and of acceptable purity.
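The 5-fold dilution with Buffer E is what makes the Ni eluate compatible with the ion-exchange steps: it brings NaCl from 500 mM to 100 mM, the same as Buffer C, while keeping Tris-HCl at 50 mM. The sketch below computes component concentrations after mixing; the imidazole value of the eluate is a placeholder, since the actual elution point is not reported. Assuming, as the "2×" label suggests, that the dialyzed protein is finally mixed 1:1 with glycerol, the same function gives 25 mM Tris-HCl and 150 mM NaCl in 50% glycerol.

```python
from typing import Dict, Optional

def dilute(sample_mM: Dict[str, float], fold: float,
           diluent_mM: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Concentrations after mixing 1 part sample with (fold - 1) parts diluent."""
    if fold < 1:
        raise ValueError("fold must be >= 1")
    diluent_mM = diluent_mM or {}
    names = set(sample_mM) | set(diluent_mM)
    return {n: (sample_mM.get(n, 0.0) + (fold - 1) * diluent_mM.get(n, 0.0)) / fold
            for n in names}

NI_ELUATE = {"Tris-HCl": 50, "NaCl": 500, "imidazole": 250}  # imidazole is a placeholder
BUFFER_E = {"Tris-HCl": 50}

ion_exchange_load = dilute(NI_ELUATE, 5, BUFFER_E)
storage = dilute({"Tris-HCl": 50, "NaCl": 300}, 2)  # 2x Buffer F + equal volume of glycerol (assumed)

if __name__ == "__main__":
    print(ion_exchange_load, storage)
```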

In Examples 3 and 4 below, the Cas9 protein BES1 (SEQ ID NO: 1) found in the human gut bacterium Veillonella sp. AF13-2 is taken as an example to investigate the PAM sequence recognized by the protein and its in vitro cleavage of target substrates.

Example III: Identification of the BES1 PAM sequence

1. Preparation of guide RNA

First, we designed double-stranded DNA transcription templates for crRNA and tracrRNA-L from the predicted crRNA and tracrRNA sequences of BES1 in strain AF13-2 (see Table 1 below). On this basis, we shortened the pairing region between crRNA and tracrRNA-L and joined the two with a GAAA linker to form a single RNA strand, sgRNA-1, whose transcription template sequence is shown in Table 1. In addition, sgRNA-3 was designed to preserve the activity of the native RNAs as far as possible; its transcription template sequence is also shown in Table 1. The deoxynucleotide sequences in Table 1 were synthesized on the Shenzhen National GeneBank synthesis and editing platform, and all sequences in Table 1 are DNA templates for transcribing the respective RNAs. The sequences and secondary structures of crRNA+tracrRNA-L, sgRNA-1 and sgRNA-3 are shown in FIG. 3.

TABLE 1 Template sequences for RNA transcription used in the BES1 on-chip cleavage experiment
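The sgRNA-1 design above (a shortened crRNA repeat joined to the tracrRNA through a GAAA loop) and the use of DNA templates for T7 transcription can be sketched as follows. The sequences are SpCas9-style stand-ins, not the BES1 sequences of Table 1, and the assumption that each template begins with a minimal T7 promoter (TAATACGACTCACTATA, with transcription starting at the next base) is ours; the promoter is not shown in this section.

```python
T7_PROMOTER = "TAATACGACTCACTATA"  # minimal T7 promoter; transcription starts at the next base


def fuse_sgrna(spacer: str, crrna_repeat: str, tracr_part: str, loop: str = "GAAA") -> str:
    """Join a truncated crRNA and tracrRNA into one sgRNA (RNA alphabet, 5'->3').

    Cas9-type order: spacer, crRNA repeat segment, loop, tracrRNA from the anti-repeat on.
    """
    return spacer + crrna_repeat + loop + tracr_part


def dna_template_for(rna: str) -> str:
    """Top-strand DNA for T7 transcription: promoter followed by the RNA written as DNA."""
    return T7_PROMOTER + rna.replace("U", "T")


def transcribe_t7(template_top: str) -> str:
    """RNA expected from a T7 template given as its top (non-template) strand."""
    seq = template_top.upper()
    idx = seq.find(T7_PROMOTER)
    if idx < 0:
        raise ValueError("no T7 promoter in template")
    return seq[idx + len(T7_PROMOTER):].replace("T", "U")


# SpCas9-style stand-ins, NOT the BES1 sequences of Table 1.
spacer = "GGAUCCAGUCAGUCAGUCAG"  # hypothetical 20-nt spacer starting with GG
sgrna = fuse_sgrna(spacer, "GUUUUAGAGCUA", "UAGCAAGUUAAAAU")
template = dna_template_for(sgrna)
assert transcribe_t7(template) == sgrna
print(template)
```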

The double-stranded DNA templates described above were prepared by PCR using KAPA HiFi™ HotStart ReadyMix (Roche). After the reaction, the templates were purified with a phenol:chloroform:isoamyl alcohol mixture (Allatin); their purity was determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and their concentration was measured with the Qubit™ dsDNA high-sensitivity assay kit (Thermo Fisher Scientific) on a Qubit™ 3.0 fluorometer.

Transcription was then carried out from these double-stranded DNA templates following the instructions of the MEGAscript™ T7 Transcription Kit: 2 pmol of template was added and the reaction was incubated at 37 ℃ for 12 hours in a Bio-Rad S1000™ PCR instrument. The RNA was purified with a phenol:chloroform:isoamyl alcohol mixture (Allatin), and its purity and concentration were determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific).
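The transcription input is specified in picomoles while the templates are quantified by mass. The sketch converts between the two using the common approximation of about 660 g/mol per base pair of dsDNA. The 130 bp template length and the 40 ng/μl Qubit reading are hypothetical.

```python
DS_DNA_G_PER_MOL_PER_BP = 660.0  # common approximation for one base pair (sodium salt)


def pmol_to_ng_dsdna(pmol: float, length_bp: int) -> float:
    """Mass (ng) of `pmol` picomoles of double-stranded DNA of length_bp base pairs."""
    # pmol * (g/mol) gives pg; divide by 1000 for ng
    return pmol * length_bp * DS_DNA_G_PER_MOL_PER_BP / 1000.0


def volume_ul(ng_needed: float, conc_ng_per_ul: float) -> float:
    """Volume of a quantified stock that delivers ng_needed."""
    return ng_needed / conc_ng_per_ul


# Hypothetical 130-bp template measured at 40 ng/uL; the text asks for 2 pmol.
ng = pmol_to_ng_dsdna(2, 130)
print(round(ng, 1), "ng;", round(volume_ul(ng, 40.0), 2), "uL")
```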

2. Preparation of the single-stranded circular cleavage substrate

Cleavage substrates for the BES1 protein were prepared; the deoxynucleotide sequences used are shown in Table 2 below and were synthesized on the Shenzhen National GeneBank synthesis and editing platform.

The double-stranded substrate to be cleaved was prepared by PCR using KAPA HiFi™ HotStart ReadyMix (Roche). The two oligonucleotides PAM_AF13-2_2/1 and PAM_AF13-2_2/2 in Table 2 were denatured at 95 ℃ and re-annealed to serve as the template, and PAM_AF13-2_1 and PAM_AF13-2_3 were used as primers for PCR amplification to obtain the double-stranded substrate.

The PCR product was recovered with an E.Z.N.A.™ gel extraction kit; its purity was measured with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and its concentration with the Qubit™ dsDNA high-sensitivity assay kit (Thermo Fisher Scientific) on a Qubit™ 3.0 fluorometer.

TABLE 2 deoxynucleotide sequences used for cleavage substrate preparation

The double-stranded substrate obtained above was then circularized into single strands to give the single-stranded circular product, as follows:

1 pmol of the double-stranded DNA substrate prepared above was combined with 1× PTA buffer (Epicentre), 120 U of T4 DNA ligase (Epicentre) and ATP (NEB) at a final concentration of 10 mM in a total reaction volume of 60 μl, and incubated at 37 ℃ for 1 hour in a Bio-Rad S1000™ PCR instrument.

Uncircularized DNA was then digested with Exo III (10 U/μl, BGI) and Exo I (3 U/μl, BGI) by incubation at 37 ℃ for 30 minutes in a Bio-Rad S1000™ PCR instrument. The product was purified with 2.5 volumes of AMPure XP beads (Beckman™), and its concentration was measured with the Qubit™ single-stranded DNA high-sensitivity assay kit (Thermo Fisher Scientific) on a Qubit™ 3.0 fluorometer (Thermo Fisher Scientific).
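A 60 μl circularization reaction like the one above can be planned with C1V1 = C2V2. The 60 μl volume, 1× buffer, 120 U of ligase (2 U/μl final), 10 mM ATP and 1 pmol of DNA come from the text. All stock concentrations in this sketch (10× buffer, 100 mM ATP, 120 U/μl ligase, DNA at 0.1 pmol/μl) are hypothetical.

```python
from typing import Optional


def reaction_volumes(final_ul: float,
                     by_concentration: dict,
                     fixed_ul: Optional[dict] = None) -> dict:
    """Pipetting volumes (uL) from C1*V1 = C2*V2; water makes up the remainder.

    by_concentration maps name -> (stock concentration, final concentration), same units.
    fixed_ul maps name -> a volume added as-is (e.g. a fixed amount of DNA).
    """
    volumes = {name: final_ul * final / stock
               for name, (stock, final) in by_concentration.items()}
    volumes.update(fixed_ul or {})
    water = final_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the final volume")
    volumes["nuclease-free water"] = water
    return volumes


setup = reaction_volumes(
    60.0,
    {
        "PTA buffer (x)": (10.0, 1.0),         # hypothetical 10x stock -> 1x
        "ATP (mM)": (100.0, 10.0),             # hypothetical 100 mM stock -> 10 mM
        "T4 DNA ligase (U/uL)": (120.0, 2.0),  # 120 U in 60 uL = 2 U/uL final
    },
    fixed_ul={"dsDNA substrate, 1 pmol at 0.1 pmol/uL": 10.0},  # hypothetical stock
)
for name, vol in setup.items():
    print(f"{name:40s} {vol:5.1f} uL")
```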

3. SE51 sequencing

(1) DNA nanoballs (DNBs) for loading were prepared from the single-stranded circles above: 6 ng of the single-stranded circular product was brought to 20 μl with nuclease-free water (Ambion™), 20 μl of Make DNB Buffer (BGI) was added, and after mixing and brief centrifugation the mixture was incubated at 95 ℃ for 1 minute, 65 ℃ for 1 minute, 40 ℃ for 1 minute and 4 ℃ for 1 minute in a Bio-Rad S1000™ PCR instrument.

After the reaction, 40 μl of Make DNB Enzyme Mix V2.0.0 (BGI) and 2 μl of Make DNB Enzyme Mix II V2.0.0 (BGI) were added, mixed, and incubated at 30 ℃ for 20 minutes in a Bio-Rad S1000™ PCR instrument. DNB Stop Buffer (BGI) was then added and mixed by gentle pipetting with a wide-bore tip (Axygen), 30 μl of Load DNB Buffer (BGI) was added and mixed in the same way, and the library was loaded onto a BGISEQ-500 V3.1 chip (BGI) with the BGISEQ-500 DNB loader (BGI) to give the chip to be sequenced.

(2) SE51 sequencing was performed on the chip with a BGISEQ-500 sequencer (BGI) using the BGISEQ-500 SE100 sequencing cartridge kit (BGI), yielding the sequence information and ID number of each nucleic acid sequence.
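The 6 ng of single-stranded circle used in step (1) corresponds to a different molar input depending on the circle length, which is not stated in this section. The sketch converts mass to femtomoles with the common approximation of about 330 g/mol per nucleotide of ssDNA and computes the water needed to bring the sample to 20 μl. The 200-nt length and the 1.5 ng/μl stock concentration are hypothetical.

```python
SS_DNA_G_PER_MOL_PER_NT = 330.0  # approximate mass of one ssDNA nucleotide


def ng_to_fmol_ssdna(ng: float, length_nt: int) -> float:
    """fmol of single-stranded DNA from its mass (ng) and length (nt)."""
    # ng / (g/mol) gives nmol; multiply by 1e6 for fmol
    return ng * 1e6 / (length_nt * SS_DNA_G_PER_MOL_PER_NT)


def water_to_volume(ng: float, conc_ng_per_ul: float, target_ul: float = 20.0) -> float:
    """Water (uL) to add after taking `ng` from a stock at conc_ng_per_ul, to reach target_ul."""
    sample_ul = ng / conc_ng_per_ul
    if sample_ul > target_ul:
        raise ValueError("stock too dilute for the target volume")
    return target_ul - sample_ul


print(round(ng_to_fmol_ssdna(6.0, 200), 1), "fmol")  # hypothetical 200-nt circle
print(water_to_volume(6.0, 1.5), "uL water")         # hypothetical 1.5 ng/uL stock
```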

4. BES1 PAM: original-strand synthesis and on-chip cleavage

Because sequencing leaves single-stranded DNA on the chip, the complementary strand (i.e., the original strand) is synthesized from it, and the resulting double-stranded DNA is used in the protein cleavage experiment. The steps are as follows:

(1) After sequencing was completed, the new strands generated in the first sequencing run were stripped off with 100% formamide (Sigma) on the BGISEQ-500 DNB loader (BGI).

(2) After stripping, the original strand was synthesized on the BGISEQ-500 sequencer (BGI) with dNTP mix 2 (BGI) to obtain double-stranded DNA. The synthesis covered 50 nucleotides, and the 51st base was incorporated with dNTP mix 1 (BGI); this step adds a fluorescent dNTP at the end of the synthesized strand.

(3) The chip was then imaged on the BGISEQ-500 sequencer (BGI), and the image was saved on the sequencer as original image I.

(4) On-chip BES1 digestion. The double-stranded DNA obtained in step (2) was digested using each of the different RNAs. The reaction contained 1× SpCas9 reaction buffer (NEB), 30 μg of the RNA prepared in step 1 (crRNA+tracrRNA-L, sgRNA-1 or sgRNA-3), 0.1 μmol of BES1 protein and RNase inhibitor (Epicentre) in a final volume of 300 μl; it was pumped into the chip with the BGISEQ-500 DNB loader (BGI) and incubated at 37 ℃ for 5 hours.

(5) The chip was washed 3 times with 300 μl of washing buffer 2 (BGI).

(6) The chip was then imaged on the BGISEQ-500 sequencer (BGI), and the image was saved on the sequencer as original image II.

(7) The fluorescence signals before and after digestion were compared between the saved original images I and II using manual basecall software (BGI) on the BGISEQ-500 sequencer (BGI). The PAM sequence of BES1 was analyzed with SpCas9 as a control; the results are shown in FIG. 4.

In FIG. 4, the abscissa shows the 7 positions immediately adjacent to the 3' end of the target sequence, and the ordinate shows the proportion of each base among all cleaved (positive) sequences. That is, taking the number of cleaved sequences as the denominator, the base present at each position is tallied and the proportions of the four bases at each position are calculated. As FIG. 4 shows, the PAM preference of BES1 differs little among the guide RNAs of slightly different structure, with SpCas9 as the reference.
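The analysis described above can be sketched in two steps: first call a DNB "cleaved" when its fluorescence is lost between original images I and II, then tally the base at each PAM position over the cleaved sequences only. The intensity-ratio cutoff of 0.3 and the toy reads are hypothetical; the text does not describe the criteria used by the manual basecall software.

```python
from collections import Counter

BASES = "ACGT"


def cleaved_pams(reads: list, max_ratio: float = 0.3) -> list:
    """PAMs of DNBs whose fluorescence dropped after digestion.

    reads holds (pam, intensity_before, intensity_after) tuples; a DNB counts as
    cleaved when after/before falls below max_ratio (hypothetical cutoff).
    """
    return [pam for pam, before, after in reads if before > 0 and after / before < max_ratio]


def position_frequencies(pams: list) -> list:
    """Proportion of A/C/G/T at each PAM position, using the cleaved count as denominator."""
    if not pams:
        return []
    freqs = []
    for i in range(len(pams[0])):  # assumes all PAMs have the same length (7N here)
        counts = Counter(p[i] for p in pams)
        freqs.append({b: counts[b] / len(pams) for b in BASES})
    return freqs


# Toy data: (7-nt PAM, intensity in image I, intensity in image II).
reads = [("GGGTGAA", 1000, 80), ("AGGCTTA", 950, 900),
         ("TGGAGCC", 1200, 100), ("CGGTAAT", 800, 50)]
for pos, f in enumerate(position_frequencies(cleaved_pams(reads)), start=1):
    print(pos, " ".join(f"{b}:{f[b]:.2f}" for b in BASES))
```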

Example IV: In vitro cleavage experiments with BES1

1. Preparation of guide RNA

Following the method of Example III, the double-stranded DNA transcription templates for crRNA, tracrRNA-L, sgRNA-1 and sgRNA-3 were obtained. In addition, a shorter tracrRNA-S was designed, and sgRNA-2 was designed from the complete crRNA and tracrRNA-S; their transcription template sequences are shown in Table 3 below. The template DNA was synthesized on the Shenzhen National GeneBank synthesis and editing platform.

TABLE 3 double-stranded DNA transcription templates for sgRNA-2

The functional RNAs shown in FIG. 4, namely crRNA+tracrRNA-L, sgRNA-1, sgRNA-2 and sgRNA-3 (with the target sequence represented by N in FIG. 4), can be transcribed from the DNA templates described above.

Specifically, following the method of Example III:

Double-stranded DNA templates were prepared by PCR using KAPA HiFi™ HotStart ReadyMix (Roche). After the reaction, the templates were purified with a phenol:chloroform:isoamyl alcohol mixture (Allatin); their purity was determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific), and their concentration was measured with the Qubit™ dsDNA high-sensitivity assay kit (Thermo Fisher Scientific) on a Qubit™ 3.0 fluorometer.

Transcription was then carried out from these double-stranded DNA templates following the instructions of the MEGAscript™ T7 Transcription Kit: 2 pmol of template was added and the reaction was incubated at 37 ℃ for 12 hours in a Bio-Rad S1000™ PCR instrument. The RNA was purified with a phenol:chloroform:isoamyl alcohol mixture (Allatin), and its purity and concentration were determined with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific).

2. Cleavage substrate preparation

Target site design: a CRISPR array typically consists of a leader, which usually acts as the promoter of the array, multiple repeats, which can form hairpin structures, and multiple spacers, which usually derive from captured foreign DNA. The protospacer sequence (selected-spacer in FIG. 5) in the genome of the Veillonella sp. AF13-2 strain (NCBI genome ID: QTMT00000000) was therefore used as the target site sequence.

PAM sequence design: a 7N PAM library (the spacer and PAM sequences are shown in FIG. 6) was constructed to determine which PAMs allow cleavage by the BES1 protein.
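A 7N library randomizes seven positions, so it contains 4^7 = 16,384 possible PAMs. The sketch enumerates them and estimates, under a simple uniform Poisson sampling assumption, the fraction of variants expected to be observed at least once for a given number of sequenced molecules. The 100,000-read figure is hypothetical.

```python
import math
from itertools import product


def pam_library(n: int = 7) -> list:
    """Every sequence of a fully randomized nN PAM library (4**n variants)."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def expected_coverage(reads: int, variants: int) -> float:
    """Expected fraction of variants seen at least once, assuming uniform Poisson sampling."""
    return 1.0 - math.exp(-reads / variants)


library = pam_library(7)
print(len(library), "variants")                             # 4**7
print(f"{expected_coverage(100_000, len(library)):.4f}")    # hypothetical 100k DNBs
```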

Cleavage substrate design: the synthesized PAM library sequences were cloned into the pMD19 vector to give the pMD19-AF13-2-3' PAM library. We amplified an 842 bp cleavage substrate from this library (see FIG. 7; the substrate sequence is given in SEQ ID NO: 23), with the target site at positions 402-431 and the PAM at positions 432-438 (the 7 random bases at positions 432-438 of SEQ ID NO: 23, underlined in FIG. 7), so that both cleavage products are about 400 bp. This design was chosen because, on a low-resolution agarose gel, the two cleavage products run together as a single broad band at about 400 bp, which still makes it easy to tell whether cleavage has occurred.

The 842 bp cleavage substrate sequence is as follows (N denotes any base) (SEQ ID NO: 23):

CTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGTTTGCACGCCTGCCGTTCGACGATTGTAGTAGCTCAAAAGGGAACTGCTACCGAANNNNNNNAATCTCTGGAAGATCCGCGCGTACCGAGTTCTAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGG(SEQ ID NO:23)。
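The "about 400 bp" product sizes follow from the substrate layout given above (842 bp, target at 402-431, PAM at 432-438). The sketch assumes an SpCas9-like blunt cut 3 bp upstream of a 3' PAM; the exact BES1 cut site is not stated in the text.

```python
def cleavage_products(substrate_bp: int, pam_start: int, cut_offset: int = 3) -> tuple:
    """Sizes of the two fragments for a blunt cut `cut_offset` bp 5' of a 3' PAM.

    pam_start is the 1-based position of the first PAM base in the substrate.
    Returns (PAM-distal fragment, PAM-proximal fragment).
    """
    cut_after = pam_start - 1 - cut_offset  # last base kept on the PAM-distal side
    if not 0 < cut_after < substrate_bp:
        raise ValueError("cut site falls outside the substrate")
    return cut_after, substrate_bp - cut_after


# Substrate of SEQ ID NO: 23: 842 bp with the PAM starting at position 432.
print(cleavage_products(842, 432))  # (428, 414): both near 400 bp
```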

3. Cleavage experiments and results

The cleavage system consisted of functional RNA (each of the four RNAs shown in FIG. 4), cleavage substrate and BES1, added at a final concentration of 100 nM. Reactions were incubated for 1 hour at 20 ℃, 25 ℃ and 37 ℃, and the cleavage products were analyzed on a 2% agarose gel. The results are shown in FIG. 8.

As FIG. 8 shows, BES1 cleaves the target substrate with each of the four functional RNAs shown in FIG. 4 at incubation temperatures of 20 ℃, 25 ℃ and 37 ℃.
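The reactions above are specified by a final concentration of 100 nM. If that figure applies to the 842 bp substrate, the Qubit mass concentration has to be converted to a molar one before pipetting. The sketch uses the common approximation of about 660 g/mol per base pair; the 200 ng/μl stock and the 20 μl reaction volume are hypothetical.

```python
DS_DNA_G_PER_MOL_PER_BP = 660.0  # approximate mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a dsDNA stock from its mass concentration (ng/uL)."""
    # ng/uL equals mg/L; dividing by g/mol gives mM, and 1 mM = 1e6 nM
    return conc_ng_per_ul * 1e6 / (length_bp * DS_DNA_G_PER_MOL_PER_BP)


def stock_volume_ul(stock_nM: float, final_nM: float, reaction_ul: float) -> float:
    """Volume of stock (uL) that gives final_nM in reaction_ul (C1*V1 = C2*V2)."""
    if stock_nM < final_nM:
        raise ValueError("stock is more dilute than the target")
    return final_nM * reaction_ul / stock_nM


stock = ng_per_ul_to_nM(200.0, 842)  # hypothetical 200 ng/uL substrate stock
print(f"{stock:.1f} nM stock; {stock_volume_ul(stock, 100.0, 20.0):.2f} uL per 20 uL reaction")
```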

Example V: PAM preference identification of the BES2, BES4 and BES6 systems

The PAM identification method for the three systems BES2, BES4 and BES6 is the same as in the examples above; the main steps are as follows:

(1) Preparation of guide RNA

The tracrRNA and crRNA sequences of the BES2 system in strain Collinella sp. Marseille-P2666 were obtained by bioinformatic prediction (see Table I below), and double-stranded DNA transcription templates were designed for sgRNAs in which the crRNA is joined to the tracrRNA; the deoxynucleotide sequences are shown in Table 4 below. BES4 and BES6 belong to Cpf1-homologous systems, in which the effector protein is guided to cleave the genomic target by crRNA alone, without tracrRNA. The crRNA sequences of these two proteins were predicted bioinformatically, and double-stranded DNA transcription templates were designed and synthesized; the deoxynucleotide sequences are shown in Table 4 below. The deoxynucleotide sequences in Table 4 were synthesized on the Shenzhen National GeneBank synthesis and editing platform.

Table 4:

Guide RNAs were prepared from the double-stranded DNA transcription templates for the BES2, BES4 and BES6 systems shown in Table 4 in the same way as in Example III.
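The distinction drawn above, a tracrRNA-dependent Cas9-type guide for BES2 versus a crRNA-only Cpf1-type guide for BES4 and BES6, also changes the layout of the crRNA. In Cas9 systems the spacer sits 5' of the repeat-derived segment that pairs with the tracrRNA, whereas Cpf1 (Cas12a) crRNAs carry the repeat-derived handle 5' of the spacer. The sketch only illustrates that ordering. The usage lines borrow the repeat-like segment and the 23-nt sequence that follow the U6-driven region in the BES4-HBG-sg01 construct given later in this document (SEQ ID NO: 33); reading them as the BES4 repeat and spacer is our interpretation.

```python
def build_crrna(system: str, spacer: str, repeat: str) -> str:
    """Order a processed repeat and a spacer into a crRNA (RNA alphabet, 5'->3').

    system="cas9":   spacer + repeat-derived segment (pairs with tracrRNA)
    system="cas12a": repeat-derived 5' handle + spacer (no tracrRNA needed)
    """
    if system == "cas9":
        return spacer + repeat
    if system == "cas12a":
        return repeat + spacer
    raise ValueError(f"unknown system type: {system!r}")


def dna_to_rna(seq: str) -> str:
    """Write a DNA sequence in the RNA alphabet."""
    return seq.upper().replace("T", "U")


# Illustration only: segments read from the BES4-HBG-sg01 construct.
handle = dna_to_rna("AATTTCTACTATTGTAGAT")
spacer = dna_to_rna("GCCAGCCTTGCCTTGACCAATAG")
print(build_crrna("cas12a", spacer, handle))
```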

(2) PAM identification

PAM sequences of the BES2, BES4 and BES6 systems were rapidly identified on DNB chips as in Example III. The PAM preferences of the three systems are shown in FIG. 10.

Example VI: In vitro cleavage activity assays of the BES2, BES4 and BES6 systems

First, the guide RNAs of the BES2, BES4 and BES6 systems were transcribed in vitro as described in Example III; second, the effector proteins of the BES2, BES4 and BES6 systems were expressed and purified following the method of Example II; finally, substrate preparation and in vitro cleavage were performed as in Example IV. As shown in FIG. 11, all three systems cleave double-stranded DNA in vitro.

Example VII: Identification of the editing activity of the BES6 system in human cells

(1) Human cell culture

The inventors chose human HEK293T cells to test editing activity in living cells. HEK293T cells were cultured in DMEM medium supplemented with fetal bovine serum (FBS).

(2) RNP preparation

To edit HEK293T cells, we selected the endogenous gene AAVS1 for validation of targeted cleavage.

The nucleotide sequence of the AAVS1 targeting region is as follows:

CCCTTGCTCTCTGCTGTGTTGCTGCCCAAGGATGCTCTTTCCGGAGCACTTCCTTC

TCGGCGCTGCACCACGTGATGTCCTCTGAGCGGATCCTCCCCGTGTCTGGGTCCTCTC

CGGGCATCTCTCCTCCCTCACCCAACCCCATGCCGTcTTCACTCGCTGGGTTCCCTTTT

CCTTCTCCTTCTGGGGCCTGTGCCATCTCTCGTTTCTTAGGATGGCCTTCTCCGACGGA

TGTCTCCCTTGCGTCCCGCCTCCCCTTCTTGTAGGCCTGCATCATCACCGTTTTTCTGG

ACAACCCCAAAGTACCCCGTCTCCCTGGCTTtAGcCACCTCTCCATCCTCTTGCTTTCTT

TGCCTGGACACCCCGTTCTCCTGTGGATTCGGGTCACCTCTCACTCCTTTCATTTGGGC

AGCTCCCCTACCCCCCTTACCTCTCTAGTCTGTGCTAGCTCTTCCAGCCCCCTGTCATG

GCATCTTCCAGGGGTCCGAGAGCTCAGCTAGTCTTCTTCCTCCAACCCGGGCCCcTAT

GTCCACTTCAGGACAGCATGTTTGCTGCCTCCAGGGATCCTGTGTCCCCGAGCTGGGA

CCACCTTATATTCCCAGGGCCGGTTAATGTGGCTCTGGTTCTGGGTACTTTTATCTGTCC

CCTCCACCCCACAGTGGGGCCACTAGGGACAGGATTGGTGACAGAAAAGCCCCATCC

TTAGGCCTCCTCCTTCCTAGTCTCCTGATATTGGGTCTAACCCCCACCTCCTGTTAGGC

AGATTCCTTATCTGGTGACACACCCCCATTTCCTGGAGCCATCTCTCTCCTTGCCAGAA

CCTCTAAGGTTTGCTTACGATGGAGCCAGAGAGGATCCTGGGAGGGAGAGCTTGGCA

GGGGGTGGGAGGGAAGGGGGGGATGCGTGACCTGCCCGGTTCTCAGTGGCCACCCT

GCGCTACCCTCTCCCAGAACCTGAGCTGCTCTGACGCGGCTGTCTGGTGCGTTTCACT

GATCCTGGTGCTGCAGCTTCCTTACACTTCCCAAGAGGAGAAGCAGTTTGGAAAAAC

AAAATCAGAATAAGTTGGTCCTGAGTTCTAACTTTGGCTCTTCACCTTTCTAGTCCCCA

ATTTATATTGTTCCTCCGTGCGTCAGTTTTACCTGTGAGATAAGGCCAGTAGCCACCCC

CGTCCTGGCAGGGCTGTGGTGAGGAGGGGGGTGTCCGTGTGGAAAACTCCCTTTGTG

AGAATGGTGCGTCCTAGGTGTTCACCAGGTCGTGGCCGCCTCTACTCCCTTTCTCTTTC

TCCATCCTTCTTTCCTTAAAGAGCCCCCAGTGCTATCTGGACATATTCCTCCGCCCAGA

GCAGGGTCCGCTTCCCTAAGGCCCTGCTCTGGGCTTCTGGGTTTGAGTCCTTGCAAGC

CCAGGAGAGCGCTAGCTTCCCTGTCCCCCTTCCTCGTCCACCATCTCATGCCCTGGCT

CTCCTGCCCCTTCCTACA(SEQ ID NO:27).

For this gene, one targeting site was designed, and its double-stranded DNA transcription template was designed and synthesized; the deoxynucleotide sequences are shown in Table 5 below and were synthesized on the Shenzhen National GeneBank synthesis and editing platform.

Table 5:

Guide RNAs for the BES4 and BES6 AAVS1 target sites shown in Table 5 were transcribed in vitro from the ordered oligonucleotides with the MEGAshortscript™ T7 Transcription Kit (Invitrogen), following the manufacturer's recommended protocol. The BES4 and BES6 effector proteins were expressed as in Example II.

(3) RNP delivery into human cells

In a 12-well plate, 10 pmol of purified effector protein and 0.5 μl of gRNA were added per well. RNPs were assembled and transfected into HEK293T cells with the Neon™ Transfection System kit and electroporation device (Invitrogen) according to the manufacturer's protocol.
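The protein amount is given in picomoles (10 pmol per well) but the gRNA only as a volume (0.5 μl), so the RNA:protein molar ratio depends on the gRNA stock. The sketch estimates it with the common ssRNA molecular-weight approximation (320.5 g/mol per nucleotide + 159 g/mol). The 42-nt guide length and the 1 μg/μl stock are hypothetical.

```python
def rna_pmol(mass_ug: float, length_nt: int) -> float:
    """pmol of single-stranded RNA from its mass (ug), using MW ~ 320.5*nt + 159 g/mol."""
    mw = length_nt * 320.5 + 159.0
    return mass_ug * 1e6 / mw  # ug / (g/mol) gives umol; x1e6 gives pmol


def rnp_ratio(grna_ul: float, grna_ug_per_ul: float, length_nt: int,
              protein_pmol: float) -> float:
    """Molar ratio gRNA:protein for one well."""
    return rna_pmol(grna_ul * grna_ug_per_ul, length_nt) / protein_pmol


# 0.5 uL gRNA and 10 pmol protein per well (from the text); 1 ug/uL and 42 nt are hypothetical.
print(f"gRNA:protein = {rnp_ratio(0.5, 1.0, 42, 10.0):.2f} : 1")
```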

(4) Editing activity identification

Cells were harvested 2-3 days after RNP transfection, and editing activity was tested by the T7E1 assay as follows:

(a) Cell collection: the cells in each well of the 12-well plate were resuspended by adding 200 μl of 0.5 M EDTA (pH 8.0);

(b) Genomic DNA extraction: genomic DNA was extracted with a genomic DNA extraction kit (Tiangen), and the gDNA concentration was measured with a NanoDrop;

(c) Target-region PCR: the target site region was amplified from gDNA with PrimeSTAR GXL DNA polymerase; the amplification primers are shown in Table 6 below and were synthesized on the Shenzhen National GeneBank synthesis and editing platform. The products were purified with a PCR purification and gel extraction kit (MN), their purity was checked by agarose gel electrophoresis, and their concentration was measured with a NanoDrop;

(d) Denaturation and annealing: the purified product of step (c) was denatured and re-annealed in a Bio-Rad PCR instrument. Equal amounts of substrate DNA (about 200-300 ng per reaction, in a 10 μl reaction) were used for the T7E1 cleavage reaction;

(e) T7E1 digestion: 0.2 μl of T7E1 nuclease was added to the 10 μl sample from step (d), and digestion was carried out at 37 ℃ for 20 minutes;

(f) Activity detection: after digestion, loading buffer was added to the T7E1 reactions and the products were analyzed by agarose gel electrophoresis.

Table 6: PCR amplification primer list

As shown in FIG. 12, BES6 has human cell editing activity.
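The T7E1 result above is read qualitatively from the gel. A common convention for turning band intensities into an approximate indel frequency, which this document does not describe, assumes that edited and unedited strands re-anneal at random and that every mismatch is cut: with f the fraction of DNA in the cleaved bands, indel (%) = 100 × (1 − √(1 − f)). The band intensities below are hypothetical.

```python
import math


def t7e1_indel_percent(parental: float, *cleaved: float) -> float:
    """Approximate indel frequency (%) from T7E1 gel band intensities.

    f = cleaved / (parental + cleaved); indel% = 100 * (1 - sqrt(1 - f)).
    Assumes random re-annealing of edited and unedited strands and complete mismatch cutting.
    """
    cut = sum(cleaved)
    total = parental + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f))


# Hypothetical densitometry values: one uncut band and two cleavage-product bands.
print(f"{t7e1_indel_percent(50.0, 30.0, 20.0):.1f}% indels")
```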

Example VIII: Identification of the editing activity of the BES4 system in human cells

(1) Human cell culture

(2) Plasmid preparation

To edit HEK293T cells, we selected the endogenous gene HBG for validation of targeted cleavage.

The nucleotide sequence of the targeting region of HBG is as follows:

CCCTGCTGTGCTCAGATCAATACTCCGTTGTCTAAGTTGCCTCGAGACTAAAGGC

AACAGGGCTGAAACATCTCCTGGACTCACCTTGAAGTTCTCAGGATCCACATGCAGCT

TGTCACAGTGCAGTTCACTCAGCTGGGCAAAGGTGCCCTTGAGATCATCCAGGTGCTT

TGTGGCATCTCCCAAGGAAGTCAGCACCTTCTTGCCATGTGCCTTGACTTTGGGGTTG

CCCATGATGGCAGAGGCAGAGGACAGGTTGCCAAAGCTGTCAAAGAACCTCTGGGTC

CATGGGTAGACAACCAGGAGCCTGTGAGATTGACAAGAACAGTTTGACAGTCAGAAG

GTGCCACAAATCCTGAGAAGCGACCTGGACTTTTGCCAGGCACAGGGTCCTTCCTTC

CCTCCCTTGTCCTGGTCACCAGAGCCTACCTTCCCAGGGTTTCTCCTCCAGCATCTTCC

ACATTCACCTTGCCCCACAGGCTTGTGATAGTAGCCTTGTCCTCCTCTGTGAAATGACC

CATGGCGTCTGGACTAGGAGCTTATTGATAACCTCAGACGTTCCAGAAGCGAGTGTGT

GGAACTGCTGAAGGGTGCTTCCTTTTATTCTTCATCCCTAGCCAGCCGCCGGCCCCTG

GCCTCACTGGATACTCTAAGACTATTGGTCAAGTTTGCCTTGTCAAGGCTATTGGTCAA

GGCAAGGCTGGCCAACCCATGGGTGGAGTTTAGCCAGGGACCGTTTCAGACAGATAT

TTGCATTGAGATAGTGTGGGGAAGGGGCCCCCAAGAGGATACTGCTAATTTTTTTTATA

GCCTTTGCCTTGTTCCGATTCAGTCATTCCAGTTTTTCTCTAATTTATTCTTCCCTTTAGC

TAGTTTCCTTCTCCCATCATAGAGGATACCAGGACTTCTTTTGTCAGCCGTTTTTTACCT

TCTTGTCTCTAGCTCCAGTGAGGCCTGTAGTTTAAAGCTAAAGCATGTACCAATTTTTG

AAAAGTTCAGGGATTGTGAAATGTGTTTTAGGCATAGGTCCAGGATTTTTGACGGGAC

AAATCTTAGTCTCTTTCAGTTAGCAGTGGTTTCTAAGGA(SEQ ID NO:32).
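Each target listed below has to match the HBG region above on one of its two strands. The sketch locates a spacer on either strand and reports the four bases 5' of the protospacer, which is where a Cpf1/Cas12a-type PAM would sit (BES4 is described above as a Cpf1 homolog; its PAM is not stated in this section). The spacer used is the 23-nt sequence that follows the repeat-like segment AATTTCTACTATTGTAGAT in the BES4-HBG-sg01 construct; treating it as the spacer is our reading. The target is a short excerpt of SEQ ID NO: 32.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_spacer(target: str, spacer: str, flank: int = 4):
    """Find a spacer on either strand of target.

    Returns (strand, start, five_prime_flank) or None. start is the 0-based position of the
    match in target as written; five_prime_flank is the `flank` bases 5' of the protospacer
    on the strand that carries the spacer sequence.
    """
    target, spacer = target.upper(), spacer.upper()
    i = target.find(spacer)
    if i >= 0:
        return "+", i, target[max(0, i - flank):i]
    rc = revcomp(spacer)
    i = target.find(rc)
    if i >= 0:
        end = i + len(rc)
        # bases 3' of the match on this strand lie 5' of the protospacer on the other strand
        return "-", i, revcomp(target[end:end + flank])
    return None


# Excerpt of the HBG targeting region (SEQ ID NO: 32) and the sg01 spacer as read above.
hbg_excerpt = ("GCCTCACTGGATACTCTAAGACTATTGGTCAAGTTTGCCTTGTCAAGGCTATTGGTCAA"
               "GGCAAGGCTGGCCAACCCATGGG")
print(locate_spacer(hbg_excerpt, "GCCAGCCTTGCCTTGACCAATAG"))  # ('-', 48, 'GTTG')
```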

For this region, the inventors designed three targets and synthesized the corresponding plasmid sequences:

BES4-HBG-sg01:

GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGA

GAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACG

TAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTA

TCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAA

AGGACGAAACACCGAATTTCTACTATTGTAGATGCCAGCCTTGCCTTGACCAATAGTTT

TTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGC

GCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGG

CCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATA

GGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAG

TACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG

CCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACAT

CTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTC

TCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTG

TGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGG

GGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCG

GCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAA

AGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCT

CCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGG

TGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAG

GGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCT

GAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGG

AGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAG

AAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGAA

GATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTGA

ACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGACG

AGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAGA

AACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACGC

CGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCGC

CAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAACA

AGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAGG

ACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTCT

TCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTACA

GATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAAG

GCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGGCC

TGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTGCC

CCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGACGG

CACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGCAA

GAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGCGA

GAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCCGT

GAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCCATC

GACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATCTAC

ATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGGAG

CGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAGAA

TCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAGAA

GAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGAGAT

CAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGACAA

CCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAACGA

GAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGGACG

CCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCGGCG

AGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCAGAA

TCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCACCG

ACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGAAAC

AAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTGGCC

ATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGCAAC

GAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGCAAC

CTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGATGAG

ATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTTGAC

GACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAGTGG

CTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAGTTCT

ACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACCAGCT

TCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAACAAGG

ACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAGATGC

TGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCCGAG

ATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAACACCC

CCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTACGACC

TGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATCACCAT

GAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTGCTGA

AGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTGCTGT

ACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACATCAT

CGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTGGCCA

CCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGAGCAT

CAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGCTGGT

GGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGAGAGG

CAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCGACA

AGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACTGCTG

CATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCAGAGC

GGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACCGGAT

TTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATGATCA

GCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACATCG

ACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCTGCA

CCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGGAGC

TACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGCATC

AACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGGCAA

GTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAGCAA

CCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCAACT

TCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATGCTA

ACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGAAGG

CCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGTTCG

TGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCCGG

CCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCTAA

CATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGTTC

ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTC

AGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC

ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCT

ACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCA

AGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACG

GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC

ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTG

GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGC

ATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC

GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC

CACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC

ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT

ACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAG

CCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCA

CTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCT

ATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGAATAG

CAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTC

TCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGG

CTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGGCGCCT

GATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCAAAGCA

ACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGC

AGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTC

CTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTA

GGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGGGTGATG

GTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTC

CACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCTATCTCG

GGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAAAATGAG

CTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAATTTTATGG

TGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGC

CAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGAC

AAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGA

AACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATA

ATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTA

TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGAT

AAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCC

CTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGT

GAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA

TCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATG

AGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGA

GCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTC

ACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAA

CCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGG

AGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGA

ACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGC

AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGG

CAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGG

CCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGAAGCCG

CGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACA

CGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTG

CCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTG

ATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCA

TGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA

GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAA

AAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTT

TCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAG

CCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGC

TAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA

CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTG

CACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGA

GCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAG

CGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGT

ATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGC

TCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:33).

BES4-HBG-sg02:

GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGAATTTCTACTATTGTAGATACCAATAGCCTTGACAAGGCAAATT

TTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTG

CGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATG

GCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAAT

AGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCA

GTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATG

GCCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACA

TCTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACT

CTCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTT

GTGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCG

GGGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGC

GGCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAA

AAGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCG

CTCCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAG

GTGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAA

GGGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCC

TGAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACG

GAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAA

GAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGA

AGATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTG

AACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGAC

GAGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAG

AAACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACG

CCGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCG

CCAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAAC

AAGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAG

GACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTC

TTCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTAC

AGATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAA

GGCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGG

CCTGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTG

CCCCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGAC

GGCACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGC

AAGAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGC

GAGAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCC

GTGAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCC

ATCGACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATC

TACATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGG

AGCGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAG

AATCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAG

AAGAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGA

GATCAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGA

CAACCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAA

CGAGAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGG

ACGCCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCG

GCGAGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCA

GAATCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCA

CCGACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGA

AACAAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTG

GCCATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGC

AACGAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGC

AACCTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGAT

GAGATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTT

GACGACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAG

TGGCTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAG

TTCTACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACC

AGCTTCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAAC

AAGGACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAG

ATGCTGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCC

GAGATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAAC

ACCCCCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTAC

GACCTGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATC

ACCATGAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTG

CTGAAGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTG

CTGTACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACA

TCATCGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTG

GCCACCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGA

GCATCAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGC

TGGTGGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGA

GAGGCAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATC

GACAAGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACT

GCTGCATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCA

GAGCGGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACC

GGATTTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATG

ATCAGCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACA

TCGACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCT

GCACCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGG

AGCTACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGC

ATCAACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGG

CAAGTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAG

CAACCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCA

ACTTCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATG

CTAACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGA

AGGCCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGT

TCGTGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCC

GGCCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCT

AACATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGT

TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGT

TCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGT

TCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGA

CCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTT

CAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGA

CGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACC

GCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAG

CTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC

GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTC

GCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC

AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGAT

CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG

CTGTACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTG

CCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACT

CCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCA

TTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGA

ATAGCAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTC

CCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCC

CGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGG

CGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCA

AAGCAACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTT

ACGCGCAGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCT

TCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTC

CCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGG

GTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTT

GGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCT

ATCTCGGGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAA

AATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAAT

TTTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGAC

ACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTA

CAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATC

ACCGAAACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTC

ATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAA

CCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACC

CTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTG

TCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACG

CTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAA

CTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAA

TGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGG

CAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCAC

CAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGC

CATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCG

AAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTT

GGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTG

TAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTC

CCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCG

CTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGA

AGCCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTAT

CTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGAT

AGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTT

AGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATA

ATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGT

AGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGC

AAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAA

CTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCT

AGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC

GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCG

GGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGG

GTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTAC

AGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTAT

CCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAA

ACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTT

TTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:34)

BES4-HBG-SG03:

GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCGAATTTCTACTATTGTAGATCCTTGTCAAGGCTATTGGTCAAGTTTTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGCGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAG

TACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG

CCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACAT

CTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTC

TCCCCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTG

TGCAGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGG

GGCGAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCG

GCGCGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAA

AGCGAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCT

CCGCCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGG

TGAGCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAG

GGTTTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCT

GAAATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGG

AGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAG

AAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCATGCAGGAGAGAAAGAA

GATCAGCCACCTGACCCACAGAAACAGCGTGAAGAAAACCATCAGAATGCAGCTGA

ACCCCGTGGGAAAGACCATGGACTACTTCCAGGCCAAGCAGATCCTGGAGAACGACG

AGAAGCTGAAGGAGGACTACCAGAAGATCAAGGAGATCGCCGACAGATTCTACAGA

AACCTGAACGAGGACGTGCTGAGCAAAACCGGACTGGACAAGCTGAAGGACTACGC

CGAGATCTACTACCATTGCAACACCGACGCCGACAGAAAGAGACTGAACGAGTGCGC

CAGCGAGCTGAGAAAGGAGATCGTGAAGAACTTCAAGAACAGAGATGAGTACAACA

AGCTGTTCAACAAGAAGATGATCGAGATCGTGCTGCCCAAGCACCTGAAGAACGAGG

ACGAGAAGGAAGTGGTGGCCAGCTTCAAGAACTTCACCACCTACTTCACCGGCTTCT

TCACCAACAGAAAGAACATGTACAGCGACGGCGAAGAGTCTACCGCTATTGCCTACA

GATGCATCAACGAGAACCTGCCCAAGCACCTGGACAACGTGAAGGTGTTCGAGAAG

GCCATCAGCAAGCTGAGCAAGAACGCCATCGACGACCTGGATGCCACATATTCTGGCC

TGTGCGGCACAAATCTGTACGACGTGTTCACCGTGGACTACTTCAACTTCCTGCTGCC

CCAAAGCGGAATCACCGAGTACAACAAGATCATCGGCGGCTACACAACAAGCGACGG

CACCAAAGTGAAGGGCATCAACGAGTACATCAACCTGTACAACCAGCAGGTGAGCAA

GAGAGACAAGATCCCCAACCTGAAGATCCTGTACAAGCAGATCCTGAGCGAGAGCGA

GAAGGTGTCTTTCATCCCCCCCAAGTTCGAGGACGACAACGAACTGCTGTCTGCCGT

GAGCGAGTTCTATGCCAACGACGAGACATTTGATGGCATGCCCCTGAAGAAAGCCATC

GACGAAACCAAACTGCTGTTCGGCAACCTGGACAACAGCAGCCTGAACGGCATCTAC

ATCCAGAACGACAGAAGCGTGACCAACCTGAGCAACAGCATGTTCGGCAGCTGGAG

CGTGATTGAGGACCTGTGGAACAAGAACTACGACAGCGTGAACAGCAACAGCAGAA

TCAAGGACATCCAGAAGAGAGAGGACAAGAGAAAGAAGGCCTACAAGGCCGAGAA

GAAGCTGAGCCTGAGCTTCCTGCAGGTGCTGATCAGCAACAGCGAGAACGACGAGAT

CAGAAAGAAGAGCATCGTGGACTACTACAAGACCAGCCTGATGCAGCTGACCGACAA

CCTGAGCGACAAGTACAAAGAAGCCGCCCCCCTGTTTTCTGAGAACTACGACAACGA

GAAGGGCCTGAAGAACGACGACAAGAGCATCAGCCTGATCAAGAACTTCCTGGACG

CCATCAAGGAGATCGAGAAGTTCATCAAGCCCCTGAGCGAGACAAATATCACCGGCG

AGAAGAACGACCTGTTCTACAGCCAGTTCACCCCCCTGCTGGACAACATCAGCAGAA

TCGACAGACTGTACGACAAGGTGAGAAACTACGTGACCCAGAAGCCCTTCAGCACCG

ACAAGATCAAGCTGAACTTCGGCAACAGCCAGCTTCTGAACGGCTGGGACAGAAAC

AAGGAGAAGGACTGTGGCGCTGTGCTGCTGTGTAAGGACGAGAAGTACTACCTGGCC

ATCATCGACAAGAGCAACAACAGCATCCTGGAGAACATCGACTTCCAGGACTGCAAC

GAGAGCGACTACTACGAGAAGATCGTGTACAAGCTGCTGACCAAGATCTCTGGCAAC

CTGCCCAGAGTGTTCTTCAGCGAGAAGCACAAGAAGCTGCTGAGCCCCAGCGATGAG

ATCCTGAAGATCTACAAGAGCGGCACCTTCAAGAAGGGCGACAAGTTCAGCCTTGAC

GACTGCCACAAGCTGATCGACTTCTACAAGGAGAGCTTCAAGAAGTACCCCAAGTGG

CTGATCTACAACTTCAAGTTCAAGAACACCAACGAGTACAACGACATCAGCGAGTTCT

ACAACGACGTGGCCAGCCAGGGATACAACATCAGCAAGATGAAGATCCCCACCAGCT

TCATCGACAAGCTGGTGGACGAGGGCAAGATCTACCTGTTCCAGCTGTACAACAAGG

ACTTCAGCCCCCACAGCAAGGGAACACCTAACCTGCACACCCTGTACTTCAAGATGC

TGTTCGACGAGAGAAACCTGGAGGACGTGGTGTACAAGCTGAATGGCGAGGCCGAG

ATGTTTTACAGACCCGCCAGCATCAAGTATGACAAGCCCACCCACCCTAAGAACACCC

CCATCAAGAACAAGAACACCCTGAACGACAAGAAGGCCAGCACCTTCCCCTACGACC

TGATCAAGGACAAGAGATACACCAAGTGGCAGTTCAGCCTGCACTTCCCCATCACCAT

GAACTTCAAGGCCCCCGACAGAGCCATGATCAACGACGACGTGAGAAACCTGCTGA

AGAGCTGCAACAACAACTTCATCATCGGCATCGACAGAGGCGAGAGAAACCTGCTGT

ACGTGAGCGTGATCGATAGCAACGGCGCCATCATCTACCAGCACAGCCTGAACATCAT

CGGCAACAAGTTCAAGGGCAAGACCTACGAAACCAACTACAGAGAGAAGCTGGCCA

CCAGAGAGAAGGAGAGAACCGAGCAGAGAAGAAACTGGAAGGCCATCGAGAGCAT

CAAGGAGCTGAAGGAGGGCTACATCAGCCAAACCGTGCACGTGATTTGCCAGCTGGT

GGTGAAGTACGACGCCATCATCGTGATGGAGAAGCTGACCGACGGCTTCAAGAGAGG

CAGAACCAAGTTCGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCGACA

AGCTGAACTACTACGTGGACAAGAAGCTGGACCCCAATGAGGAAGGCGGACTGCTG

CATGCTTATCAGCTGACCAACAAGCTGGACAGCTTCGACAAGCTGGGAATGCAGAGC

GGCTTCATCTTCTACGTCAGACCCGACTTCACCAGCAAAATCGACCCCGTGACCGGAT

TTGTGAACCTGCTGTACCCCAGATACGAGAACATCGACAAGGCCAAGGACATGATCA

GCAGATTCGACGACATCAGATACAACGCCGGCGAGGACTTCTTCGAGTTCGACATCG

ACTACGACAAGTTCCCCAAGACCGCCAGCGACTACAGAAAGAAGTGGACCATCTGCA

CCAACGGCGAGAGAATCGAGGCCTTCAGAAACCCCGCCAACAACAACGAGTGGAGC

TACAGAACCATCATCCTGGCCGAGAAGTTCAAGGAGCTGTTCGACAACAACAGCATC

AACTACAGAGACAGCGACGACCTGAAAGCCGAGATCCTGAGCCAAACCAAGGGCAA

GTTCTTCGAGGACTTCTTCAAGCTGCTGAGACTGACCCTGCAGATGAGAAACAGCAA

CCCCGAAACCGGAGAGGACAGGATTCTGAGCCCCGTGAAGGACAAGAACGGCAACT

TCTACGACAGCAGCAAGTACGACGAGAAGAGCAAGCTGCCCTGTGACGCTGATGCTA

ACGGCGCTTACAACATCGCCAGAAAGGGCCTGTGGATCGTGGAGCAGTTCAAGAAGG

CCGACAACGTGTCTGCTGTGGAACCCGTGATCCACAACGACAAGTGGCTGAAGTTCG

TGCAGGAGAACGACATGGCCAACAACAAAAGGCCGGCGGCCACGAAAAAGGCCGG

CCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGCAGAGGAAGTCTGCTAA

CATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAGGGCGAGGAGCTGTTC

ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTC

AGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC

ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCT

ACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCA

AGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACG

GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC

ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTG

GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGC

ATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC

GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC

CACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC

ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT

ACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAG

CCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCA

CTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCT

ATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAGAATAG

CAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTC

TCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGG

CTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGGGGCGCCT

GATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATACGTCAAAGCA

ACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGC

AGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTC

CTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTA

GGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTTGGGTGATG

GTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTC

CACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACTCTATCTCG

GGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGTCTATTGGTTAAAAAATGAG

CTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTACAATTTTATGG

TGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCGC

CAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGAC

AAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGA

AACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATA

ATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTA

TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGAT

AAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCC

CTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGT

GAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA

TCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATG

AGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGA

GCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTC

ACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAA

CCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGG

AGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGA

ACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGC

AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGG

CAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGG

CCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGAAGCCG

CGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACA

CGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTG

CCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTG

ATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCA

TGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA

GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAA

AAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTT

TCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAG

CCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGC

TAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA

CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTG

CACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGA

GCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAG

CGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGT

ATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGC

TCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:35).

PX458-HBG-SG01:

GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTAGAGAGATAATTGGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATACGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTTAAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCCTTGTCAAGGCTATTGGTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTTTTAGCGCGTGCGCCAATTCTGCAGACAAATGGCTCTAGAGGTACCCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTCGAGGTGAGCCCCACGTTCTGCTTCACTCTCC

CCATCTCCCCCCCCTCCCCACCCCCAATTTTGTATTTATTTATTTTTTAATTATTTTGTGC

AGCGATGGGGGCGGGGGGGGGGGGGGGGCGCGCGCCAGGCGGGGCGGGGCGGGGC

GAGGGGCGGGGCGGGGCGAGGCGGAGAGGTGCGGCGGCAGCCAATCAGAGCGGCG

CGCTCCGAAAGTTTCCTTTTATGGCGAGGCGGCGGCGGCGGCGGCCCTATAAAAAGC

GAAGCGCGCGGCGGGCGGGAGTCGCTGCGCGCTGCCTTCGCCCCGTGCCCCGCTCCG

CCGCCGCCTCGCGCCGCCCGCCCCGGCTCTGACTGACCGCGTTACTCCCACAGGTGA

GCGGGCGGGACGGCCCTTCTCCTCCGGGCTGTAATTAGCTGAGCAAGAGGTAAGGGT

TTAAGGGATGGTTGGTTGGTGGGGTATTAATGTTTAATTACCTGGAGCACCTGCCTGAA

ATCACTTTTTTTCAGGTTGGACCGGTGCCACCATGGACTATAAGGACCACGACGGAGA

CTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAG

AAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGG

CCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGT

GCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGA

ACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGA

AGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAA

GAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAA

GAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAAC

ATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAG

AAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCC

CACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAAC

AGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAG

GAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTG

AGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAA

TGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAG

CAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGA

CGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGC

CGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGA

GATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCA

GGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGA

GATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAG

CCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGA

GGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCG

ACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGC

GGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCC

TGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGC

CTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGG

TGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGA

ACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCG

TGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCT

TCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGG

AAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGA

CTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCAC

GATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGAC

ATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGG

AACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGC

GGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGG

GACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAAC

AGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAG

AAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCC

GGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTC

GTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGA

GAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATC

GAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAA

CACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATAT

GTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATC

GTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGC

GACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGAT

GAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGA

CAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCAT

CAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGG

ACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAG

TGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAA

AGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGT

GGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGA

CTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCA

AGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGAT

TACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAAC

CGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAG

CATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAA

AGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTG

GGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTG

GTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCT

GGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGA

AGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTC

CCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCA

GAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAG

CCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGT

GGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAA

GAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCA

CCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACC

AATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGG

TACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGC

CTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGC

CACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGGAATTCGGCAGTGGAGAGGGC

AGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCAGTGAGCAAG

GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA

AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAA

GCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCT

CGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA

GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT

CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCG

ACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA

TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGA

CAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACG

GCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCG

TGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCA

ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC

TCGGCATGGACGAGCTGTACAAGGAATTCTAACTAGAGCTCGCTGATCAGCCTCGACT

GTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCT

GGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGT

CTGAGTAGGTGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAG

GATTGGGAAGAGAATAGCAGGCATGCTGGGGAGCGGCCGCAGGAACCCCTAGTGATG

GAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAG

GTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG

CTGCCTGCAGGGGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATTTCAC

ACCGCATACGTCAAAGCAACCATAGTACGCGCCCTGTAGCGGCGCATTAAGCGCGGC

GGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCTTAGCGCCCGC

TCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTC

TAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAA

AAAACTTGATTTGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTT

CGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAA

CAACACTCAACTCTATCTCGGGCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGG

TCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATA

TTAACGTTTACAATTTTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTA

AGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTC

CCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGG

TTTTCACCGTCATCACCGAAACGCGCGAGACGAAAGGGCCTCGTGATACGCCTATTTT

TATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGA

AATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTC

ATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGTAT

TCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGC

TCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGT

GGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAA

GAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCG

TATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTG

GTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAA

TTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAA

CGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAA

CTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTG

ACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACT

ACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCA

GGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAG

CCGGTGAGCGTGGAAGCCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCT

CCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAG

ACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTT

TACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTG

AAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTG

AGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGC

GTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCG

GATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATAC

CAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGC

ACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATA

AGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTC

GGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCG

AACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAA

AGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAG

CTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACT

TGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGT(SEQ ID NO:36).

The plasmids required for the BES4 activity detection experiments were obtained by commercial synthesis as plasmid-bearing strains. The plasmids were then amplified by direct inoculation, as follows:

(a) Take 15 mL of antibiotic-free LB liquid medium and add 15 μL of 1000× ampicillin (Amp) stock. Pick the strain carrying the target plasmid with a white pipette tip, place the tip in the medium, and culture overnight at 37 ℃ and 200 rpm;

(b) Centrifuge the overnight culture at 8000 rpm for 3 min to pellet the cells, and pour off the medium;

(c) Extract the plasmid using a Tiangen plasmid miniprep kit or a Tiangen endotoxin-free mini/midi plasmid kit;

(d) After extraction, quantify the plasmid concentration with a NanoDrop and store the plasmid at -20 ℃.
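The arithmetic behind steps (a) and (d) can be sketched in Python. This is a minimal illustration and not part of the original protocol: it assumes that a "1000×" stock is diluted 1:1000 to its working concentration, and that a NanoDrop dsDNA reading follows the conventional factor of 50 ng/μL per A260 unit at a 10 mm path length. The function names are hypothetical.

```python
def stock_volume_ul(final_volume_ml: float, fold: float) -> float:
    """Volume (uL) of an N-fold stock that gives 1x in final_volume_ml of medium.

    C1*V1 = C2*V2 with C1/C2 = fold, so V1 = V2 / fold (mL converted to uL).
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return final_volume_ml * 1000.0 / fold


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/uL) from a 10 mm-path A260 reading.

    Assumption: the conventional factor of 50 ng/uL per absorbance unit.
    """
    return a260 * 50.0 * dilution_factor


if __name__ == "__main__":
    print(stock_volume_ul(15, 1000))  # 1000x Amp needed for 15 mL of LB
    print(dsdna_ng_per_ul(0.8))       # hypothetical A260 reading
```

With the values in step (a), `stock_volume_ul(15, 1000)` returns 15.0, matching the 15 μL of ampicillin stock added to 15 mL of medium.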

(3) Plasmid transfer into human cells

(a) Transfect the plasmids using the Lipo3000 (Lipofectamine 3000) kit, with 1.5 μg of plasmid input per well;

(b) Culture the cells for 2-3 days after transfection so that gene editing can proceed fully, then harvest the cells;

(c) After culture, aspirate the medium with a pipette tip and add 200 μL of 0.5 M EDTA solution to each well of the 12-well plate. Let the plate stand for ten minutes, resuspend the cells by pipetting, transfer them to an EP tube, centrifuge at 12000 rpm for 1 min, and remove the supernatant to recover the cells;
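The per-well DNA input in step (a) and the 8000 rpm and 12000 rpm spins described above translate into simple calculations. The sketch below is hypothetical: the plasmid concentration, the 10% master-mix overage and the 8.4 cm rotor radius are assumed values that do not come from the source, and the centrifugal force uses the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm².

```python
def plasmid_volume_ul(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of plasmid stock (uL) that delivers mass_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul


def master_mix_dna_ul(n_wells: int, conc_ng_per_ul: float,
                      ng_per_well: float = 1500.0, overage: float = 0.1) -> float:
    """Plasmid volume for n_wells at 1.5 ug/well plus an assumed 10% overage."""
    total_ng = n_wells * ng_per_well * (1.0 + overage)
    return plasmid_volume_ul(total_ng, conc_ng_per_ul)


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) = 1.118e-5 * r(cm) * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    print(plasmid_volume_ul(1500, 500))     # uL per well at an assumed 500 ng/uL
    print(master_mix_dna_ul(3, 500))        # three wells, with overage
    print(round(rcf_from_rpm(12000, 8.4)))  # assumed 8.4 cm rotor radius
```

Because rpm alone does not fix the force, stating × g for a given rotor makes the spins reproducible across centrifuges; for the assumed 8.4 cm radius, 12000 rpm corresponds to roughly 13,500 × g.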

(4) Editing activity identification

After the cells were harvested, genomic DNA extraction and a T7E1 enzyme assay were performed to detect editing activity, as follows:

(a) Genomic DNA extraction: genomic DNA was extracted using a genomic DNA extraction kit (Tiangen), and gDNA concentration was measured using Nanodrop;

(b) Target-region PCR: the target-site region was amplified from gDNA using PrimeSTAR GXL polymerase ("GXL Prime") with the amplification primers listed in Table 7 below; the oligonucleotides were synthesized by the synthesis and editing platform of the China National GeneBank in Shenzhen. The PCR products were purified using a PCR purification and gel extraction kit (MN); their purity was checked by agarose gel electrophoresis and their concentration was measured using a NanoDrop.

(c) Denaturation and annealing: the purified product from step (b) was denatured and reannealed in a Bio-Rad PCR instrument. An equal amount of substrate DNA (about 200-300 ng per reaction, in a 10 μL reaction volume) was used for each T7E1 cleavage reaction.

(d) T7E1 digestion: 0.2 μL of T7E1 nuclease was added to the 10 μL sample from step (c), and the cleavage reaction was carried out at 37 ℃ for 20 minutes.

(e) Activity detection: after the cleavage reaction was complete, loading buffer was added to the T7E1 reaction product and the sample was analyzed by agarose gel electrophoresis.
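Assembling each 10 μL reaction for steps (c) and (d) amounts to normalizing the purified PCR product to a fixed mass. The following is a minimal sketch, assuming a 10× reaction buffer (1 μL per 10 μL) and a default target of 250 ng within the 200-300 ng range given above. The buffer strength and the helper names are illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass


@dataclass
class T7E1Reaction:
    """Volumes (uL) for one T7E1 reaction; names are illustrative."""
    dna_ul: float
    buffer_ul: float
    water_ul: float
    enzyme_ul: float  # added after denaturation/annealing, on top of the 10 uL


def plan_t7e1(conc_ng_per_ul: float,
              target_ng: float = 250.0,
              total_ul: float = 10.0,
              buffer_fold: float = 10.0,  # assumption: 10x reaction buffer
              enzyme_ul: float = 0.2) -> T7E1Reaction:
    """Normalize purified PCR product to target_ng in a total_ul reaction."""
    if not 200.0 <= target_ng <= 300.0:
        raise ValueError("source specifies about 200-300 ng per reaction")
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    dna_ul = target_ng / conc_ng_per_ul
    buffer_ul = total_ul / buffer_fold
    water_ul = total_ul - dna_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("PCR product too dilute for this reaction volume")
    return T7E1Reaction(dna_ul, buffer_ul, water_ul, enzyme_ul)


if __name__ == "__main__":
    print(plan_t7e1(50.0))  # hypothetical NanoDrop reading of 50 ng/uL
```

For a purified product at 50 ng/μL, `plan_t7e1(50.0)` gives 5.0 μL DNA, 1.0 μL buffer and 4.0 μL water, with 0.2 μL T7E1 added after annealing. A product too dilute to fit 250 ng into the volume raises an error instead of silently exceeding 10 μL.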

Table 7: PCR amplification primer list

As shown in FIG. 13, the BES4 sg03 plasmid exhibits editing activity in human cells.
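The source reports activity qualitatively from the gel in FIG. 13. If band intensities were measured (for example by densitometry), one commonly used estimate of indel frequency from a T7E1 assay is indel (%) = 100 × (1 − √(1 − f_cut)), where f_cut is the fraction of signal in the cleaved bands. The sketch below applies that formula; it is an illustrative addition, not the analysis performed in the source, and the intensities in the usage example are hypothetical.

```python
import math


def t7e1_indel_percent(parent: float, *cleaved: float) -> float:
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    parent:  background-corrected intensity of the uncut band
    cleaved: intensities of the cleavage-product bands
    Assumes random reannealing, so the uncut fraction is (1 - indel)**2,
    giving indel = 1 - sqrt(1 - f_cut).
    """
    if parent < 0 or any(c < 0 for c in cleaved):
        raise ValueError("intensities must be non-negative")
    cut = sum(cleaved)
    total = parent + cut
    if total <= 0:
        raise ValueError("total intensity must be positive")
    f_cut = cut / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


if __name__ == "__main__":
    print(t7e1_indel_percent(75, 15, 10))  # hypothetical band intensities
```

For hypothetical intensities of 75 (parent) and 15 and 10 (cleaved products), f_cut is 0.25 and the estimate is about 13.4%.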

Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless specifically defined otherwise.

In the description of this specification, references to the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to them by one of ordinary skill in the art within the scope of the invention.

Table I: novel Type II Crispr-Cas system

Table II: novel Cpf1 (Type V) system

Claims

1. A Cas protein, comprising:

the amino acid sequence shown in SEQ ID NO. 4.

2. A nucleic acid sequence encoding the Cas protein of claim 1.

3. The nucleic acid sequence of claim 2, wherein the nucleic acid sequence is DNA or RNA.

4. An expression vector comprising the nucleic acid sequence of claim 2 or 3.

5. A recombinant cell comprising the expression vector of claim 4, wherein the recombinant cell is a non-plant cell.

6. The recombinant cell of claim 5, wherein the recombinant cell is a eukaryotic cell.

7. The recombinant cell of claim 6, wherein the recombinant cell is an animal cell.

8. A Crispr-Cas system comprising the Cas protein of claim 1.

9. The system of claim 8, further comprising at least one of: a crRNA, a tracrRNA, or a chimeric RNA formed from a crRNA and a tracrRNA.

10. Use of the Cas protein of claim 1, the nucleic acid sequence of claim 2 or 3, the expression vector of claim 4, the recombinant cell of any one of claims 5-7, or the Crispr-Cas system of claim 8 or 9 in the field of gene editing for purposes other than the diagnosis or treatment of disease.