CN113025597B

CN113025597B - Improved genome editing system

Info

Publication number: CN113025597B
Application number: CN201911351725.4A
Authority: CN
Inventors: Qiu Jinlong (邱金龙); Zhang Qianwei (张倩伟); Yin Kangquan (尹康权)
Original assignee: Institute of Microbiology of CAS
Current assignee: Institute of Microbiology of CAS
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2024-07-05
Anticipated expiration: 2039-12-24
Also published as: CN113025597A

Abstract

The present invention relates to the field of genome editing. In particular, the present invention relates to an improved genome editing system and applications thereof. More specifically, the present invention provides a genome editing fusion polypeptide comprising a CRISPR nuclease and a 5'→3' exonuclease. The invention also provides polynucleotides or expression constructs encoding the polypeptides, and genome editing systems comprising the polypeptides, polynucleotides and/or constructs. The invention also provides a method for editing the genome of a cell by using the genome editing system.

Description

Improved genome editing system

Technical Field

Background

CRISPR/Cas systems, which provide bacteria and archaea with immunity against viral infection (Wiedenheft et al., 2012), have greatly simplified genome editing and are revolutionizing biology and genetic engineering. The CRISPR/Cas9 system is the most widely used system for genome editing in a variety of organisms, including plants (Hsu et al., 2014; Yin et al., 2017). The CRISPR/Cas9 system consists of two components: the Cas9 nuclease and a single guide RNA (sgRNA). Cas9 binds to the scaffold of the sgRNA, and target specificity is determined by a spacer sequence of about 20 nucleotides (nt) at the 5' end of the sgRNA (Jinek et al., 2012). Cas9 typically cleaves the target DNA about 3 bp upstream of the protospacer adjacent motif (PAM). The mutations induced by CRISPR/Cas9 in plants are mainly deletions of less than 10 bp (typically 1-3 bp) and single-base-pair (bp) insertions, in particular of A/T (Paul et al., 2016; Bortesi et al., 2016). Likewise, most mutations induced by Cas9 in mammalian cells are small insertions/deletions (indels) (Kim et al., 2015; Kosicki et al., 2018). Although large genomic deletions of up to 250 bp have been detected after Cas9 editing (Heckl et al., 2014; Liang et al., 2015), they occur at very low frequency. CRISPR/Cas9 has therefore been widely used to edit genomic coding regions, because small indels in coding genes often cause frameshift mutations that result in loss of function. However, editing regulatory and non-coding genomic sequences with Cas9 remains challenging, because the small indels induced by a single sgRNA are unlikely to cause loss of function in such sequences.
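The frameshift argument above can be made concrete. The following is a minimal Python sketch that assumes only the net length change of an indel inside a coding sequence matters. The function name `indel_effect` is hypothetical and is not part of the patent.

```python
def indel_effect(net_length_change: int) -> str:
    """Classify the effect of an indel inside a coding sequence on the reading frame.

    net_length_change: negative for deletions, positive for insertions (in bp).
    Simplified model: ignores splice sites, premature stop codons, etc.
    """
    if net_length_change == 0:
        return "no length change"
    # Python's % returns a non-negative result, so -3 % 3 == 0 and -1 % 3 == 2.
    if net_length_change % 3 == 0:
        return "in-frame"
    return "frameshift"


if __name__ == "__main__":
    # Typical small Cas9 outcomes: 1-bp insertions and 1-3 bp deletions.
    for change in (+1, -1, -2, -3):
        print(change, indel_effect(change))
    # Fraction of deletion sizes 1-10 bp that shift the frame.
    sizes = range(1, 11)
    shifted = sum(indel_effect(-s) == "frameshift" for s in sizes)
    print(f"{shifted}/{len(sizes)} deletion sizes cause a frameshift")  # 7/10
```

The same arithmetic says nothing about promoters or other non-coding elements. In those sequences a 1-3 bp change need not disrupt function, which is the limitation described above.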

Paired guide RNAs (pgRNAs), i.e., two guide RNAs targeting the borders of the fragment to be deleted, have been used to generate larger deletions of non-coding DNA (Han et al., 2014; Yin et al., 2015; Zhu et al., 2016) and of regulatory elements (Diao et al., 2017). However, the requirement for two sgRNAs limits this approach. First, PAM availability is a limiting factor for the broad application of this strategy. Second, pgRNAs still tend to produce single editing events, especially when the two sgRNA target sites are far apart (Zhu et al., 2016). In addition, introducing two sgRNAs is more laborious and may increase the frequency of off-target editing.

Cas12a (formerly Cpf1) has also been used as a genome editing tool (Zetsche et al., 2015; Koonin et al., 2017). Like Cas9, Cas12a is an RNA-guided class 2 Cas enzyme. However, the guide RNA used by Cas12a is shorter than the sgRNA of Cas9 (Li et al., 2017; Dang et al., 2015). Cas12a also recognizes a T-rich PAM (Zetsche et al., 2015), whereas Cas9 recognizes a G-rich PAM (Jinek et al., 2012). Furthermore, unlike Cas9, Cas12a produces a double-strand break (DSB) with staggered ends bearing 4-5 nt overhangs at a PAM-distal position (Zetsche et al., 2015). Accordingly, the mutations induced by Cas12a in plants are mainly deletions of up to 44 bp (typically 6-13 bp), and insertions are rare (Tang et al., 2017).
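The following sketch computes the cut positions on both strands for a single target, which shows how a staggered Cas12a break differs from the blunt Cas9 break. It assumes, as commonly reported for AsCas12a and LbCas12a (Zetsche et al., 2015), that the non-target strand is cut after the 18th protospacer nucleotide and the target strand after the 23rd, counting from the PAM. These offsets are parameters because the overhang length varies (4-5 nt according to the text above). The names `StaggeredCut` and `cas12a_cut` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StaggeredCut:
    nontarget_cut: int  # break lies between indices nontarget_cut - 1 and nontarget_cut
    target_cut: int     # same top-strand coordinates, projected from the target strand
    overhang: int       # length of the single-stranded 5' overhang, in nt


def cas12a_cut(protospacer_start: int, nontarget_offset: int = 18,
               target_offset: int = 23) -> StaggeredCut:
    """Cut positions for a Cas12a target read on the top (non-target) strand.

    protospacer_start: 0-based index of the first protospacer nucleotide,
    immediately 3' of a 5'-TTTN PAM on the top strand.
    The default offsets are assumptions (cut after protospacer nt 18 / 23).
    """
    nt_cut = protospacer_start + nontarget_offset
    t_cut = protospacer_start + target_offset
    return StaggeredCut(nt_cut, t_cut, t_cut - nt_cut)


if __name__ == "__main__":
    top = "TTTA" + "GATTACAGATTACAGATTACAGA" + "CCCC"  # hypothetical sequence
    cut = cas12a_cut(protospacer_start=4)
    print(cut)  # StaggeredCut(nontarget_cut=22, target_cut=27, overhang=5)
    print("single-stranded region (top-strand letters):",
          top[cut.nontarget_cut:cut.target_cut])
```

Because the top strand is cut closer to the PAM than the bottom strand, both resulting ends carry 5' overhangs. By contrast, Cas9's cuts on the two strands coincide and leave blunt ends.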

There is a need in the art for further methods that can produce larger genomic deletions at a particular target site with a single guide RNA, without the need for paired guide RNAs.

Brief description of the invention

In one aspect, the invention provides an isolated fusion polypeptide comprising a CRISPR nuclease and a 5'→3' exonuclease.

In another aspect, the present invention also provides a genome editing system comprising at least one of the following i) to v):

i) Fusion polypeptides and guide RNAs of the invention;

ii) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and a guide RNA;

iii) Fusion polypeptides of the invention, and expression constructs comprising a nucleotide sequence encoding a guide RNA;

iv) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

v) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention and a nucleotide sequence encoding a guide RNA.

In another aspect, the invention also provides a method of genetically modifying a cell, comprising introducing into the cell, preferably a plant cell, a genome editing system of the invention.

Brief Description of Drawings

Fig. 1: Fusion of T5 exonuclease to Cas9 alters the indel profile of genome editing. a) Schematic representation of the Cas9 and T5exo-Cas9 constructs. b) Proportions of deletions and insertions induced by Cas9 or T5exo-Cas9 at the OsMKK5 target site in rice protoplasts. Mutations induced by Cas9 and T5exo-Cas9 were first enriched by PCR amplification of protoplast genomic DNA pre-digested with HindIII, and the PCR products were cloned for Sanger sequencing. c) Size distribution of deletions produced by Cas9 and T5exo-Cas9. d) Representative deletions induced by T5exo-Cas9 at the OsMKK5 locus. Black lines, deleted genomic regions; rectangle, protospacer sequence; triangle, Cas9 cleavage site; double-headed arrows, forward and reverse primers for PCR amplification.
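Panel b relies on restriction-enzyme enrichment. Genomic DNA is digested with HindIII before PCR. If the HindIII site overlaps the Cas9 cleavage site at OsMKK5, then only edited alleles that have lost the site stay intact and can be amplified. The caption implies this overlap but does not state it, so it is an assumption here. The following is a minimal Python sketch of that selection logic, using hypothetical sequences and hypothetical helpers `survives_hindiii` and `enrich`.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition site (A^AGCTT); palindromic


def survives_hindiii(template: str) -> bool:
    """True if the template lacks a HindIII site and would therefore not be cut
    during pre-digestion (simplified: assumes complete digestion)."""
    # The site is its own reverse complement, so checking one strand suffices.
    return HINDIII_SITE not in template.upper()


def enrich(templates: list[str]) -> list[str]:
    """Keep only templates expected to remain amplifiable after HindIII pre-digestion."""
    return [t for t in templates if survives_hindiii(t)]


if __name__ == "__main__":
    wild_type = "CCTGAAGCTTGGA"  # hypothetical WT allele with an intact site
    small_del = "CCTGAAGTTGGA"   # hypothetical 1-bp deletion destroying the site
    large_del = "CCTGGA"         # hypothetical larger deletion
    print(enrich([wild_type, small_del, large_del]))  # ['CCTGAAGTTGGA', 'CCTGGA']
```

The sketch does not model primer binding. In practice, a deletion large enough to remove a primer site would also escape detection even though it lacks the HindIII site.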

Fig. 2: Fusing T5 exonuclease to Cas9 increases the frequency and size of genomic deletions at the guide RNA target locus. a) Indel patterns induced by Cas9 and T5exo-Cas9 at the OsMPK16, oscc 48, OsALS and OsXa target sites in rice protoplasts. All experiments were repeated three times with similar results. b) Size distribution of deletions created by Cas9 and T5exo-Cas9 at the four target sites in rice protoplasts. "D" indicates the deletion length. All experiments were repeated three times with similar results. c) Genome editing efficiency of Cas9 and T5exo-Cas9 at the four target sites in rice protoplasts. Untreated protoplast samples were used as controls. Data are mean ± s.e.m. (n = 3). P-values were calculated by two-way ANOVA. *P < 0.01; **P < 0.001.

Fig. 3: The T5exo-Cas9 fusion promotes genomic deletions in transgenic rice plants. a) Summary of genotyping results for T0 transgenic rice lines obtained by transformation with sgRNA OsXa-T1 together with Cas9 or T5exo-Cas9. b) Indel patterns generated by Cas9 and T5exo-Cas9 at the OsXa promoter in transgenic rice lines. "D" indicates the deletion length. c) Resistance of the indicated rice mutants to Xanthomonas oryzae pv. oryzae (Xoo) strain PXO99. Leaves were inoculated, and lesion length was measured 12 days after inoculation. Data were analyzed by one-way ANOVA (mean ± s.d.). Significant differences between means were determined by Fisher's protected LSD test (P ≤ 0.05), and significantly different groups are indicated by different lowercase letters. d) Sequences of the insertion and deletion mutations shown in panel c. The upper panel shows the structure of the OsXa gene, and the lower panel shows the sequences at the OsXa target site. The UPT_PthXo1 sequence is shown in grey, the sgRNA target sequence is underlined, the PAM is boxed, dashed lines indicate deleted nucleotides, and triangles indicate inserted nucleotides. WT, wild type.

Fig. 4: Fusing T5 exonuclease to Cas12a increases the frequency and size of genomic deletions at the guide RNA target locus. a) Schematic representation of the Cas12a and T5exo-Cas12a constructs. b) Indel patterns induced by Cas12a and T5exo-Cas12a at the OsBADH, OsEPSPS and OsPDS target sites in rice protoplasts. All experiments were repeated three times with similar results. c) Size distribution of deletions generated by Cas12a and T5exo-Cas12a at the three target sites in rice protoplasts. "D" indicates the deletion length. All experiments were repeated three times with similar results. d) Genome editing efficiency of Cas12a and T5exo-Cas12a at the three target sites in rice protoplasts. Untreated protoplast samples were used as controls. Data are mean ± s.e.m. (n = 3). P-values were calculated by two-way ANOVA. *P < 0.01.

Fig. 5: The T5exo-Cas12a fusion produces larger genomic deletions in transgenic rice plants. a) Summary of genotyping results for T0 transgenic rice lines obtained by transformation with guide RNA OsPDS-T1 together with Cas12a or T5exo-Cas12a. b) Indel patterns generated by Cas12a and T5exo-Cas12a at the OsPDS gene in transgenic rice lines. "D" indicates the deletion length.

Detailed Description

1. Definitions

In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by those of ordinary skill in the art. In addition, terms relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology, as well as laboratory procedures, as used herein are terms and routine procedures widely used in the corresponding fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in Sambrook, J., Fritsch, E.F., and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the present invention, definitions and explanations of related terms are provided below.

As used herein, the term "and/or" encompasses all combinations of the items it connects, and each combination should be regarded as having been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". Likewise, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
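The "and/or" convention amounts to taking every non-empty combination of the listed items, which gives 2^n - 1 readings for n items. The following short Python illustration uses a hypothetical function name.

```python
from itertools import combinations


def and_or_readings(items: list[str]) -> list[tuple[str, ...]]:
    """Every non-empty combination of items covered by 'X, Y and/or Z'."""
    return [combo
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


if __name__ == "__main__":
    for reading in and_or_readings(["A", "B", "C"]):
        print(" and ".join(reading))
    # A, B, C, A and B, A and C, B and C, A and B and C (one per line)
```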

When the term "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described herein. Furthermore, those skilled in the art will appreciate that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in some practical situations (e.g., when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Thus, a particular polypeptide amino acid sequence described in the present specification and claims may not include the N-terminal methionine encoded by the start codon. In that case, a sequence that includes this methionine is also contemplated, and the corresponding coding nucleotide sequence may likewise include the start codon; and vice versa.

As used herein, the term "CRISPR nuclease" generally refers to nucleases found in naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, catalytically active fragments thereof, and the like. The term encompasses any effector protein based on a CRISPR system that is capable of achieving gene targeting (e.g., gene editing, gene targeting regulation, etc.) within a cell.

Examples of "CRISPR nucleases" include Cas9 nucleases and variants thereof. The Cas9 nuclease may come from any of various species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA-binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated proteins) genome editing system. Under the direction of a guide RNA, it can target and cleave a DNA target sequence to form a DNA double-strand break (DSB).

Examples of "CRISPR nucleases" also include Cas12a nucleases and variants thereof, e.g., high-specificity variants. The Cas12a nuclease may come from any of various species, e.g., the Cas12a nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.

As used herein, the term "5'→3' exonuclease" refers to an exonuclease that degrades DNA from the 5' end, i.e., in the 5' to 3' direction. The 5'→3' exonuclease of interest can remove nucleotides from the 5' end of a dsDNA strand at blunt ends and, in certain embodiments, at ends with 3' and/or 5' overhangs.
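The following toy model makes the strand polarity concrete. It tracks a single DSB end as two strand lengths, both measured from the same interior point of the duplex. It then applies n nucleotides of 5'→3' degradation to the strand whose 5' terminus lies at the break. The model shows how 5'→3' degradation turns a blunt end (as left by Cas9) or a 5'-overhang end (as left by Cas12a) into a 3' single-stranded overhang. It is a minimal sketch that ignores processivity, reaction rate and sequence effects, and the function name is hypothetical.

```python
def overhang_after_resection(len_3prime_strand: int, len_5prime_strand: int,
                             n_removed: int) -> int:
    """Signed single-stranded overhang at a DSB end after 5'->3' resection.

    len_3prime_strand: length of the strand whose 3' end is at the break.
    len_5prime_strand: length of the strand whose 5' end is at the break;
                       only this strand is degraded by a 5'->3' exonuclease.
    Returns > 0 for a 3' overhang, < 0 for a 5' overhang, 0 for a blunt end.
    """
    remaining_5prime = max(len_5prime_strand - n_removed, 0)
    return len_3prime_strand - remaining_5prime


if __name__ == "__main__":
    # Blunt end, 20 bp of duplex: 8 nt of resection leaves an 8-nt 3' overhang.
    print(overhang_after_resection(20, 20, 8))   # 8
    # End with a 5-nt 5' overhang: resection first removes the overhang ...
    print(overhang_after_resection(20, 25, 0))   # -5
    print(overhang_after_resection(20, 25, 5))   # 0
    # ... and then exposes a 3' overhang.
    print(overhang_after_resection(20, 25, 12))  # 7
```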

As used herein, "gRNA" and "guide RNA" are used interchangeably to refer to an RNA molecule that can form a complex with a CRISPR nuclease and, owing to its partial complementarity to a target sequence, direct the complex to that sequence. For example, in Cas9-based gene editing systems, the gRNA typically consists of a crRNA molecule and a tracrRNA molecule that are partially complementary and form a complex. The crRNA comprises a sequence with sufficient identity to the target sequence to direct the CRISPR complex (Cas9 + crRNA + tracrRNA) to bind specifically to the target sequence. However, it is known in the art that single guide RNAs (sgRNAs) combining the features of both crRNA and tracrRNA can be designed. In Cas12a-based genome editing systems, by contrast, the gRNA typically consists only of a mature crRNA molecule. This crRNA contains a sequence with sufficient identity to the target sequence to direct the complex (Cas12a + crRNA) to bind specifically to the target sequence. Designing a suitable gRNA sequence based on the CRISPR nuclease used and the target sequence to be edited is within the ability of those skilled in the art.
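The two guide RNA architectures differ in the order in which their parts are joined. In the following Python sketch, the Cas9 layout follows the background section, with the spacer at the 5' end of the sgRNA followed by the scaffold. The Cas12a order (a 5' direct-repeat handle followed by the spacer) is the commonly described crRNA layout. The text above does not state it, so it is an assumption here. The scaffold and repeat are placeholder strings, not real sequences, and all function names are hypothetical.

```python
def dna_to_rna(seq: str) -> str:
    """Write a DNA spacer as the corresponding RNA sequence (T -> U)."""
    return seq.upper().replace("T", "U")


def cas9_sgrna(spacer_dna: str, scaffold_rna: str) -> str:
    """5'-[~20-nt spacer][scaffold]-3': crRNA and tracrRNA features in one molecule."""
    return dna_to_rna(spacer_dna) + scaffold_rna


def cas12a_crrna(spacer_dna: str, direct_repeat_rna: str) -> str:
    """5'-[direct repeat][spacer]-3': a single mature crRNA, no tracrRNA (assumed layout)."""
    return direct_repeat_rna + dna_to_rna(spacer_dna)


if __name__ == "__main__":
    SCAFFOLD = "<scaffold>"  # placeholder, not a real scaffold sequence
    REPEAT = "<repeat>"      # placeholder, not a real direct repeat
    spacer = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt spacer (DNA)
    print(cas9_sgrna(spacer, SCAFFOLD))   # GAUUACAGAUUACAGAUUAC<scaffold>
    print(cas12a_crrna(spacer, REPEAT))   # <repeat>GAUUACAGAUUACAGAUUAC
```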

As used herein, "genome" encompasses not only chromosomal DNA present in the nucleus of a cell, but also organelle DNA present in subcellular components of the cell (e.g., mitochondria, plastids).

As used herein, "cell" includes cells of any organism suitable for genome editing. Examples of organisms include, but are not limited to, mammals, such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows and cats; poultry, such as chickens, ducks and geese; and plants, including monocots and dicots such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like.

By "genetically modified organism" or "genetically modified cell" is meant an organism or cell comprising within its genome an exogenous polynucleotide or a modified gene or expression control sequence. For example, an exogenous polynucleotide can be stably integrated into the genome of an organism or cell and inherited by successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. Modified genes or expression control sequences are those whose sequence comprises single or multiple deoxynucleotide substitutions, deletions and additions in the genome of the organism or cell.

"Exogenous" with respect to a sequence means a sequence that originates from a foreign species or, if from the same species, a sequence whose composition and/or locus has been substantially altered from its native form by deliberate human intervention.

"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" or "nucleic acid fragment" are used interchangeably and are single- or double-stranded RNA or DNA polymers, optionally containing synthetic, unnatural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents purine (A or G), "Y" represents pyrimidine (C or T), "K" represents G or T, "H" represents A or C or T, "I" represents inosine, and "N" represents any nucleotide.

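The single-letter designations above are IUPAC nucleotide codes, which is how degenerate motifs such as the PAMs 5'-NGG and 5'-TTTN are written. The following minimal Python sketch turns such a motif into a regular expression for searching DNA. In addition to the codes listed above, it includes the other standard IUPAC codes (M, S, W, B, D, V). It omits inosine (I), which is a base rather than a degeneracy code. The helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the DNA bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",  # U read as T for DNA search
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate a degenerate IUPAC motif (e.g. 'NGG') into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def matches_motif(motif: str, seq: str) -> bool:
    """True if seq (same length as motif) is an instance of the motif."""
    return re.fullmatch(iupac_to_regex(motif), seq.upper()) is not None


if __name__ == "__main__":
    print(iupac_to_regex("NGG"))          # [ACGT]GG
    print(matches_motif("NGG", "TGG"))    # True
    print(matches_motif("TTTN", "TTTC"))  # True
    print(matches_motif("NGG", "AGC"))    # False
```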
"Polypeptide", "peptide", and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms including, but not limited to, glycosylation, lipid attachment, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.

As used herein, an "expression construct" refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., transcription into mRNA or functional RNA) and/or translation of RNA into a precursor or mature protein.

The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or in some embodiments, may be an RNA (e.g., mRNA) capable of translation.

The "expression construct" of the invention may comprise regulatory sequences and nucleotide sequences of interest of different origins, or regulatory sequences and nucleotide sequences of interest of the same origin but arranged in a manner different from that normally found in nature.

"Regulatory sequence" and "regulatory element" are used interchangeably and refer to a nucleotide sequence that is located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence and that affects transcription, RNA processing or stability, or translation of the relevant coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.

"Promoter" refers to a nucleic acid fragment capable of controlling transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from that cell. The promoter may be a constitutive, tissue-specific, developmentally regulated or inducible promoter.

"Constitutive promoter" refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to promoters that direct expression primarily, but not necessarily exclusively, in one tissue or organ, or in a particular cell or cell type. "Developmentally regulated promoter" refers to a promoter whose activity is determined by developmental events. An "inducible promoter" selectively drives expression of an operably linked DNA sequence in response to an endogenous or exogenous stimulus (an environmental, hormonal or chemical signal, etc.).

As used herein, the term "operably linked" refers to a regulatory element (e.g., without limitation, a promoter sequence, a transcription termination sequence, etc.) linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcription regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.

"Introducing" a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism refers to transforming a cell of the organism with the nucleic acid or protein such that the nucleic acid or protein is capable of functioning in the cell. "transformation" as used herein includes both stable transformation and transient transformation. "Stable transformation" refers to the introduction of an exogenous nucleotide sequence into the genome, resulting in stable inheritance of an exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generation thereof. "transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell to perform a function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.

2. Genome editing fusion polypeptides

The present invention provides an isolated fusion polypeptide, wherein the fusion polypeptide comprises a CRISPR nuclease and a 5'→3' exonuclease.

The CRISPR nuclease of the present invention can be any CRISPR nuclease capable of genome editing. In some embodiments, the CRISPR nuclease is Cas9 or an active fragment thereof, such as Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9), Cas9 from Francisella novicida (FnCas9), Cas9 from Campylobacter jejuni (CjCas9), or Cas9 from Neisseria cinerea (NcCas9). In some embodiments, the CRISPR nuclease is Cas12a or an active fragment thereof, e.g., Cas12a from Francisella novicida U112 (FnCas12a), Cas12a from Acidaminococcus sp. BV3L6, or Cas12a from Lachnospiraceae bacterium ND2006 (LbCas12a). In one embodiment, the amino acid sequence of the CRISPR nuclease is selected from SEQ ID NOs: 8 and 15. In one embodiment, the nucleotide sequence encoding the CRISPR nuclease is selected from SEQ ID NOs: 9 and 16.

The 5'→3' exonuclease according to the present invention may be any exonuclease that degrades DNA from the 5' end, i.e., in the 5' to 3' direction. In one embodiment, the exonuclease can digest double-stranded DNA (dsDNA). In some embodiments, the exonuclease can digest single-stranded DNA (ssDNA). In some embodiments, the exonuclease can digest both dsDNA and ssDNA. In some embodiments, the exonuclease is not a 3'→5' exonuclease. In some embodiments, the 5'→3' exonuclease is a T5 exonuclease, e.g., the product of bacteriophage T5 gene D15. In some embodiments, the T5 exonuclease comprises an amino acid sequence having at least 80%, at least 90%, at least 95%, at least 99% or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the T5 exonuclease is encoded by a nucleotide sequence having at least 80%, at least 90%, at least 95%, at least 99% or 100% sequence identity to SEQ ID NO: 4. In some preferred embodiments, the T5 exonuclease comprises the amino acid sequence of SEQ ID NO: 3. In a preferred embodiment, the T5 exonuclease is encoded by the nucleotide sequence of SEQ ID NO: 4.
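The identity thresholds above (at least 80%, 90%, 95% or 99%) presuppose some way of scoring two sequences against each other. The patent does not specify one, so the following is only an illustration. It is a Python sketch that scores two sequences that have already been aligned (for example, by a standard global aligner). It counts identical residues over all aligned columns, gaps included. Other conventions, such as dividing by the length of the reference sequence, give different numbers. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Denominator: aligned columns, excluding columns that are gaps in both.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment, not SEQ ID NO: 3.
    pid = percent_identity("MKTAYIAK-", "MKTAYIAKQ")
    print(f"{pid:.1f}%")  # 88.9%
    for threshold in (80, 90, 95, 99):
        print(threshold, pid >= threshold)
```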

In the polypeptide of the invention, the 5'→3' exonuclease and the CRISPR nuclease may be fused directly or indirectly. In some embodiments, the 5'→3' exonuclease is fused directly to the CRISPR nuclease. In some embodiments, the 5'→3' exonuclease and the CRISPR nuclease are fused indirectly, e.g., via a linker. The linker may be a non-functional amino acid sequence without secondary or higher-order structure that is 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25 or 25-50) or more amino acids long. For example, the linker may be a flexible linker. In some embodiments, the amino acid sequence of the linker is selected from SEQ ID NOs: 5 and 14.

In the polypeptide of the invention, the 5'→3' exonuclease is located at the N-terminus and/or the C-terminus of the CRISPR nuclease. In some embodiments, the 5'→3' exonuclease is located at the N-terminus of the CRISPR nuclease. In some embodiments, the 5'→3' exonuclease is located at the C-terminus of the CRISPR nuclease.

In some embodiments, the isolated fusion polypeptide further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the polypeptide should be strong enough to drive accumulation of the polypeptide in the cell nucleus in an amount sufficient for its genome editing function. The strength of nuclear localization activity is generally determined by the number and position of the NLSs in the polypeptide, the specific NLS(s) used, or a combination of these factors.

In some embodiments of the invention, the NLS of the polypeptide of the invention may be located at the N-terminus and/or the C-terminus. In some embodiments of the invention, the NLS may be located between the 5'→3' exonuclease and the CRISPR nuclease. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the polypeptide comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When there is more than one NLS, each may be selected independently of the others.

Generally, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, but other types of NLS are also known. In some embodiments, the amino acid sequence of the NLS of the invention is selected from SEQ ID NOs: 6, 7, 12 and 13.
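The following Python sketch gives a rough illustration of what short stretches of positively charged lysines or arginines look like in practice. It scans a protein for the classical monopartite NLS consensus K-(K/R)-X-(K/R). The well-known SV40 large T antigen NLS, PKKKRKV, satisfies this consensus. The scan is a heuristic only. It misses bipartite and non-classical NLSs, a match does not prove nuclear import, and it says nothing about the specific NLSs of SEQ ID NOs: 6, 7, 12 and 13. The function name is hypothetical.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); lookahead finds overlaps.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_candidates(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every consensus match in protein."""
    return [(m.start(), m.group(1))
            for m in MONOPARTITE_NLS.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_nls_candidates("PKKKRKV"))   # [(1, 'KKKR'), (2, 'KKRK')]
    print(find_nls_candidates("MAGESTLL"))  # []
```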

In addition, depending on the location of the DNA to be edited, the isolated fusion polypeptides of the invention may also include other targeting sequences, such as cytoplasmic, chloroplast or mitochondrial targeting sequences.

In some embodiments, the isolated fusion polypeptide comprises, from N-terminus to C-terminus: the 5'→3' exonuclease, an NLS, the CRISPR nuclease, and another NLS. In some embodiments, the isolated fusion polypeptide comprises, from N-terminus to C-terminus: an NLS, the 5'→3' exonuclease, the CRISPR nuclease, and another NLS.
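The two domain orders above can be expressed as an ordered list of parts. The following Python sketch concatenates the parts from N-terminus to C-terminus and records 1-based coordinates for each part. The part sequences are placeholders. The only real sequence is the SV40 NLS PKKKRKV, a commonly used example, and it is not asserted to be any SEQ ID NO of this patent. Whether to include a linker between parts is left to the caller, and all names are hypothetical.

```python
def assemble_fusion(parts: list[tuple[str, str]]) -> tuple[str, list[tuple[str, int, int]]]:
    """Join (name, sequence) parts N->C; return the fusion and 1-based inclusive spans."""
    fusion = ""
    spans = []
    for name, seq in parts:
        start = len(fusion) + 1
        fusion += seq
        spans.append((name, start, len(fusion)))
    return fusion, spans


if __name__ == "__main__":
    T5EXO = "A" * 10  # placeholder for the T5 exonuclease (not SEQ ID NO: 3)
    CAS = "G" * 20    # placeholder for the CRISPR nuclease
    NLS = "PKKKRKV"   # SV40 NLS, used here only as an example

    layout_1 = [("T5exo", T5EXO), ("NLS", NLS), ("Cas", CAS), ("NLS", NLS)]
    layout_2 = [("NLS", NLS), ("T5exo", T5EXO), ("Cas", CAS), ("NLS", NLS)]
    for layout in (layout_1, layout_2):
        fusion, spans = assemble_fusion(layout)
        print(len(fusion), spans)
```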

In some preferred embodiments, the isolated fusion polypeptide of the invention comprises the amino acid sequence of SEQ ID NO:1 or 10.

The invention also provides isolated polynucleotides encoding the fusion polypeptides of the invention. In some embodiments, the polynucleotide comprises SEQ ID NO:2 or 11 or a degenerate variant thereof.

In order to obtain efficient expression, in some embodiments, the polynucleotide is codon optimized for the organism being edited, e.g., a plant.

Codon optimization refers to modifying a nucleic acid sequence to enhance its expression in a host cell of interest while maintaining the native amino acid sequence. At least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) is replaced with a codon that is used more frequently, or most frequently, in the genes of that host cell. Different species exhibit particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which is in turn believed to depend on the properties of the codons being translated and on the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism by codon optimization. Codon usage tables are readily available, for example, from the "Codon Usage Database" at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y., et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28: 292 (2000).
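The simplest form of the strategy above replaces each codon with the host's most frequently used synonymous codon, and can be sketched in Python as follows. The usage table is a hypothetical fragment with made-up frequencies. Real tables, e.g. for rice, would come from a codon usage database. Practical optimization tools usually also weigh GC content, repeats and unwanted motifs, which this sketch ignores. Function and table names are hypothetical.

```python
# Hypothetical relative codon frequencies per amino acid (not real data).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTG": 0.30, "CTC": 0.35, "CTT": 0.12,
          "CTA": 0.08, "TTG": 0.10, "TTA": 0.05},
    "*": {"TAA": 0.30, "TAG": 0.30, "TGA": 0.40},
}


def most_frequent_codon_design(protein: str,
                               usage: dict[str, dict[str, float]] = CODON_USAGE) -> str:
    """Back-translate protein using the most frequent codon for each amino acid."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def translate(dna: str, usage: dict[str, dict[str, float]] = CODON_USAGE) -> str:
    """Translate dna with the codon table implied by usage (check helper)."""
    codon_to_aa = {codon: aa for aa, table in usage.items() for codon in table}
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    protein = "MKL*"
    dna = most_frequent_codon_design(protein)
    print(dna)                        # ATGAAGCTCTGA
    assert translate(dna) == protein  # amino acid sequence is preserved
```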

In some embodiments of the isolated fusion polypeptide of the invention, the coding sequence of the 5'→3' exonuclease and/or the coding sequence of the CRISPR nuclease is codon-optimized for the organism being edited. In some embodiments of the isolated fusion polypeptide of the invention, the coding sequence of the 5'→3' exonuclease and/or the coding sequence of the CRISPR nuclease is codon-optimized for rice (Oryza sativa).

3. Improved genome editing system

The inventors surprisingly found that fusing T5 exonuclease to a CRISPR nuclease such as Cas9 or Cas12a produces larger deletions at a specific target site using only one gRNA and greatly improves editing efficiency. More unexpectedly, no cytotoxicity was observed when cells were transformed with the fusion polypeptide of T5 exonuclease and CRISPR nuclease, which makes the fusion polypeptide particularly suitable for genome editing of cells.

Thus, in a further aspect the invention provides the use of an isolated fusion polypeptide of the invention for genome editing of a cell.

In another aspect, the present invention provides a genome editing system comprising at least one of the following i) to v):

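i) an isolated fusion polypeptide of the invention, and a guide RNA;

ii) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and a guide RNA;

iii) an isolated fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

iv) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

v) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention and a nucleotide sequence encoding a guide RNA.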

As used herein, a "genome editing system" refers to the combination of components required to edit the genome within a cell. The individual components of the system, e.g., the fusion polypeptide and the guide RNA, may each be present independently or may be present together as a composition in any combination.

In some embodiments, the guide RNA is an sgRNA. In some embodiments, the guide RNA is an sgRNA and the sgRNAs are not paired. Methods for constructing suitable sgRNAs for a given target sequence are known in the art. See, for example, Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68 (2014).

The design of target sequences that can be recognized and targeted by a complex of a CRISPR nuclease and a guide RNA is within the ordinary skill in the art. In general, for Cas9, the target sequence is complementary to a guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to a protospacer adjacent motif (PAM), e.g., 5'-NGG. For Cas12a, by contrast, the PAM is typically located at the 5' end of the target sequence and may be, for example, 5'-TTTN.
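The placement rules above can be turned into a simple site finder. The following Python sketch scans both strands for SpCas9-style sites of the form 5'-[20-nt protospacer][NGG]-3'. It reports the predicted blunt cut 3 bp upstream of the PAM, as stated in the background section. It also scans the top strand for Cas12a-style sites of the form 5'-[TTTN][protospacer]-3'. The 23-nt Cas12a protospacer length is an assumption, and all function names and test sequences are hypothetical. A real design tool would also score off-targets, which this sketch does not do.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, str, str, int]]:
    """Find 5'-[protospacer][NGG]-3' sites on both strands.

    Returns (strand, protospacer, pam, cut), where cut is the 0-based top-strand
    boundary of the blunt break, assumed to lie 3 bp upstream (5') of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            cut_on_s = m.start() + spacer_len - 3
            # Boundary k on the reverse complement is boundary n - k on the top strand.
            cut = cut_on_s if strand == "+" else n - cut_on_s
            hits.append((strand, m.group(1), m.group(2), cut))
    return hits


def cas12a_sites(seq: str, spacer_len: int = 23) -> list[tuple[int, str]]:
    """Find 5'-[TTTN][protospacer]-3' sites on the top strand only.

    Returns (protospacer_start, protospacer) with a 0-based start.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    return [(m.start() + 4, m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    target = "GATTACAGATTACAGATTAC" + "TGG" + "TTTT"  # hypothetical locus
    print(cas9_sites(target))
    # [('+', 'GATTACAGATTACAGATTAC', 'TGG', 17)]
    print(cas12a_sites("GC" + "TTTA" + "CCGGACCGGACCGGACCGGACCG"))
    # [(6, 'CCGGACCGGACCGGACCGGACCG')]
```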

In some embodiments, the CRISPR system of the present invention comprises at least one of ii) to v) above. In some embodiments, the nucleotide sequence encoding the fusion polypeptide of the invention and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression control sequence, preferably a plant expression control sequence, such as a promoter.

Examples of promoters that can be used in the present invention include, but are not limited to: the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313:810-812), the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter, the rice actin promoter, the TrpPro promoter (U.S. patent application Ser. No. 10/377,318; 16. 2005), the pEMU promoter (Last et al. (1991) Theor. Appl. Genet. 81:581-588), the MAS promoter (Velten et al. (1984) EMBO J. 3:2723-2730), the maize H3 histone promoter (Lepetit et al. (1992) Mol. Gen. Genet. 231:276-285 and Atanassova et al. (1992) Plant J. 2(3):300), and the Brassica napus ALS3 promoter (PCT application WO 97/41228). Promoters useful in the present invention also include those described in Moore et al. (2006) Plant J. 45(4):651-683.

In an exemplary embodiment, the construct of the invention comprises the maize Ubi-1 promoter.

4. Method for modifying a target sequence in the genome of a cell

In another aspect, the invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing into the cell a genome editing system of the invention.

In some embodiments, the modification results in the deletion of one or more nucleotides, preferably a plurality of consecutive nucleotides, in the target sequence. In some embodiments, the deletion comprises 1-500 or even more consecutive nucleotides.

In some embodiments, the deletion is within the target sequence. In some embodiments, the modification does not include an insertion and/or substitution mutation.

In another aspect, the invention also provides a method of producing a genetically modified cell, comprising introducing into the cell a genome editing system of the invention.

In the present invention, the target sequence to be modified may be located anywhere in the genome: for example, within a functional gene such as a protein-encoding gene, or within a gene expression regulatory region such as a promoter or enhancer region, thereby modifying gene function or gene expression. Modifications of the target sequence in the cell may be detected by the T7 endonuclease I (T7EI) assay, the PCR/restriction enzyme (PCR/RE) assay, or sequencing. The genome editing system of the present invention is particularly suitable for modifying regulatory sequences, such as promoters, and non-coding sequences.

In the methods of the invention, the genome editing system may be introduced into cells by various methods well known to those skilled in the art.

Methods useful for introducing the genome editing system of the invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, and other viruses), the gene gun method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.

Cells that can be genome edited by the methods of the invention can be from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; or plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like. Preferably, the cell is a plant cell, such as a rice cell.

In some embodiments, the methods of the invention are performed in vitro. For example, the cell is an isolated cell. In other embodiments, the methods of the invention may also be performed in vivo. For example, the cell is a cell in an organism, into which the system of the invention can be introduced in vivo, for example by a virus-mediated method. In some embodiments, the cell is a germ cell. In some embodiments, the cell is a somatic cell.

In another aspect, the invention also provides a genetically modified organism comprising a genetically modified cell produced by the method of the invention.

Such organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like. Preferably, the organism is a plant, preferably rice.

5. Kit

In yet another aspect, the scope of the invention also includes a kit for use in the methods of the invention, the kit comprising the genome editing system of the invention and instructions for use. Kits generally include a label indicating the intended use and/or method of use of the kit contents. The term "label" includes any written or recorded material supplied on or with the kit, or otherwise provided with the kit.

Examples

In order that the invention may be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Methods

Plasmid construction

The T5 exonuclease coding sequence was codon-optimized for rice (Oryza sativa) and commercially synthesized (GenScript, Nanjing, China). The T5 coding sequence was fused in frame to the 5' end of Cas9 or Cas12a by Gibson assembly to produce p163-T5exo-Cas9 or p163-T5exo-Cas12a, respectively. To construct the pH-T5exo-Cas9-sgRNA binary vector, the T5exo-Cas9 expression cassette was cloned into the pHUE411 backbone (Xing et al., 2014). To construct the pCambia-T5exo-Cas12a-crRNA binary vector, the T5exo-Cas12a and crRNA expression cassettes were cloned into the pCambia2300 backbone.
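Whether a fusion such as T5exo-Linker-Cas9 stays in frame can be checked computationally before assembly. The sketch below assumes that each upstream coding part has had its stop codon removed; `check_in_frame_fusion` is a hypothetical helper, and the short sequences are toy fragments (the first nine nucleotides echo the start of SEQ ID NO: 2 and the linker encodes one Gly-Gly-Gly-Gly-Ser unit, while the Cas9 fragment is invented), not the actual constructs.

```python
from typing import List, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> List[str]:
    """Split a sequence into consecutive triplets, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def check_in_frame_fusion(parts: Sequence[str]) -> bool:
    """Return True if joining the coding parts 5'->3' gives one uninterrupted ORF.

    Assumes upstream parts carry no stop codon; only the last part may end in one.
    """
    parts = [p.upper() for p in parts]
    # An upstream part whose length is not a multiple of 3 shifts the downstream frame.
    if any(len(p) % 3 for p in parts[:-1]):
        return False
    fused = "".join(parts)
    if len(fused) % 3 or not fused.startswith("ATG"):
        return False
    # Stop codons are allowed only as the final codon.
    return not any(c in STOP_CODONS for c in codons(fused)[:-1])


if __name__ == "__main__":
    t5_fragment = "ATGTCAAAG"          # toy fragment: Met-Ser-Lys
    linker = "GGTGGAGGCGGAAGT"         # Gly-Gly-Gly-Gly-Ser
    cas9_fragment = "ATGGACAAGAAGTAA"  # invented toy ORF ending in a stop codon
    print(check_in_frame_fusion([t5_fragment, linker, cas9_fragment]))  # True
    print(check_in_frame_fusion(["ATGTCAAA", linker, cas9_fragment]))   # False: frameshift
```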

Protoplast transfection

Etiolated seedlings of japonica rice grown on medium were used to prepare protoplasts. Protoplasts were isolated and transformed as described previously (Shan et al., 2013; Wang et al., 2014). Plasmid DNA (10 μg of each construct) was delivered into the protoplasts by PEG-mediated transfection, and the transfected protoplasts were incubated at 28 ℃. At 48 hours after transfection, protoplasts were collected and their genomic DNA was extracted for restriction enzyme assays or amplicon deep sequencing.

Agrobacterium-mediated transformation of rice

The binary vectors were introduced into Agrobacterium tumefaciens strain AGL1 by electroporation. Agrobacterium-mediated transformation of the rice cultivar Nipponbare and regeneration of rice plants were performed as previously reported (Hiei et al., 1994). Rice calli transformed with Cas9 or T5exo-Cas9 constructs were selected on medium containing hygromycin (50 μg/ml), and calli transformed with Cas12a or T5exo-Cas12a constructs were selected on medium containing G418 (60 μg/ml).

Plant genomic DNA extraction and targeted amplicon next-generation sequencing

Genomic DNA was extracted from protoplasts and seedlings using the CTAB method (Murray et al., 1980) and used as the template for PCR amplification. In the first round of PCR, the target region was amplified using site-specific primers. In the second round of PCR, forward and reverse barcodes were added to the ends of the PCR products for library construction. Equal amounts of the PCR products were pooled, and the samples were commercially sequenced with paired-end reads on the Illumina NextSeq 500 platform (GENEWIZ, Suzhou, China). The sequencing reads were analyzed for indels at the sgRNA target sites. Amplicon sequencing was performed in triplicate for each target site, using genomic DNA extracted from three independent protoplast samples.
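Classifying amplicon reads as insertions or deletions at the target site can be illustrated with Python's standard `difflib`. This is a simplified sketch under stated assumptions, not the pipeline used for the reported data: it assumes each read already spans the same window as the reference amplicon, ignores sequencing errors and base qualities, and defines the deletion share (hypothetical helper `deletion_share`) as the percentage of indel-containing reads that carry only deletions.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable


def classify_read(ref: str, read: str) -> str:
    """Label a read relative to the reference amplicon window."""
    if read == ref:
        return "unedited"
    matcher = SequenceMatcher(None, ref, read, autojunk=False)
    # Collect the kinds of non-matching blocks ('delete', 'insert', 'replace').
    edits = {tag for tag, *_ in matcher.get_opcodes() if tag != "equal"}
    if edits == {"delete"}:
        return "deletion"
    if edits == {"insert"}:
        return "insertion"
    return "other"  # substitutions or mixed edits


def deletion_share(ref: str, reads: Iterable[str]) -> float:
    """Percentage of pure-indel reads (insertion or deletion) that are deletions."""
    counts = Counter(classify_read(ref, r) for r in reads)
    indels = counts["deletion"] + counts["insertion"]
    return 100.0 * counts["deletion"] / indels if indels else 0.0


if __name__ == "__main__":
    REF = "GATTACAGCTTGCAATCG"   # toy reference window
    DEL = "GATTACAGCAATCG"       # 4-bp deletion
    INS = "GATTACATGCTTGCAATCG"  # 1-bp insertion
    print([classify_read(REF, r) for r in (REF, DEL, INS)])
    print(deletion_share(REF, [REF, DEL, DEL, INS]))
```

A real analysis would align reads to the reference first; `SequenceMatcher` is used here only because it ships with Python and exposes insert/delete operations directly.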

Off-target detection

The inventors examined the potential off-target effects of Cas9 or T5exo-Cas9 in OsXa13 rice mutants, and of Cas12a or T5exo-Cas12a in OsPDS rice mutants. Potential off-target sites in the Nipponbare genome were predicted with the online tool Cas-OFFinder (Bae et al., 2014). Four potential off-target sites with 3-4 nucleotide mismatches were selected for the OsXa13 mutants, and five potential off-target sites with 4-5 nucleotide mismatches were selected for the OsPDS mutants. Locus-specific primers flanking these off-target sites were designed, and the amplicons of the potential off-target sites (about 700 to 1000 bp) were analyzed by Sanger sequencing.
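Mismatch-based prediction of candidate off-target sites can be approximated by a brute-force scan. The sketch below is not Cas-OFFinder and does not reproduce its algorithm or options: `find_off_targets` is a hypothetical helper that assumes a Cas9-style 3' NGG PAM, counts mismatches but not DNA/RNA bulges, and reports reverse-strand hits in reverse-complement coordinates. A Cas12a scan would instead require TTTN on the 5' side.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_off_targets(genome: str, guide: str,
                     max_mismatches: int = 4) -> List[Tuple[str, int, str, int]]:
    """Return (strand, start, site, mismatches) for guide-length sites followed by NGG.

    Start positions on the '-' strand refer to the reverse-complemented sequence.
    """
    genome, guide = genome.upper(), guide.upper()
    n = len(guide)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 2):  # leave room for the 3-nt PAM
            site, pam = seq[i:i + n], seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            mismatches = sum(a != b for a, b in zip(site, guide))
            if mismatches <= max_mismatches:
                hits.append((strand, i, site, mismatches))
    return hits


if __name__ == "__main__":
    guide = "GATTACAGATTACAGATTAC"   # toy guide, not OsXa13-T1
    decoy = "CTATACAGATTACAGATTAC"   # same site with 3 mismatches at the PAM-distal end
    toy_genome = guide + "AGG" + "AAAA" + decoy + "TGG"
    for hit in find_off_targets(toy_genome, guide):
        print(hit)
```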

Pathogen inoculation and virulence determination

As previously described (Yang et al., 2013), the Xoo strain PXO99 was inoculated onto the two most recently fully expanded leaves of rice plants at the six-leaf stage by the leaf-clipping method. Disease symptoms were scored by measuring lesion length.

Example 1: fusion of T5 exonuclease to Cas9 alters the indel characteristics of genome editing

To generate larger genomic deletions with the CRISPR/Cas system in higher plants, the inventors selected T5 exonuclease, a well-studied exonuclease that degrades DNA in the 5'→3' direction (Kaliman et al., 1986), and fused it in frame to the N-terminus of Cas9 (Fig. 1a). To test whether the T5exo-Cas9 fusion protein alters indel characteristics, the inventors co-transfected rice protoplasts with the T5exo-Cas9 or Cas9 plasmid together with the sgRNA OsMKK5-T1 plasmid. Genomic DNA extracted from the protoplasts 48 h after transfection was first digested with HindIII to deplete unedited DNA and then used as the template for targeted PCR amplification. To identify indels, the purified PCR products were cloned and analyzed by Sanger sequencing. The resulting indel profile shows that, relative to Cas9, the T5exo-Cas9 fusion greatly increases the proportion of deletions among indels and reduces the proportion of insertions (Fig. 1b). The inventors further found that Cas9-induced deletions were predominantly 1-2 bp in size (Fig. 1d), consistent with previous reports (Zhang et al., 2014). In contrast, deletions induced by the T5exo-Cas9 fusion varied in size and were larger, up to 446 bp (Fig. 1c). These results indicate that fusing T5 exonuclease to Cas9 promotes deletions during genome editing.
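HindIII pre-digestion enriches edited DNA only if the sgRNA target overlaps a HindIII site (A^AGCTT): unedited molecules are cut and fail to amplify across the site, whereas edited alleles that destroy the site remain intact. The target sequence is not shown here, so that overlap is an assumption. The Python sketch below (hypothetical helper names, toy sequences) shows the logic; because AAGCTT is palindromic, scanning one strand is sufficient.

```python
from typing import List

HINDIII_SITE = "AAGCTT"  # HindIII cuts between A and AGCTT (A^AGCTT); the site is palindromic


def hindiii_fragments(amplicon: str) -> List[int]:
    """Return top-strand fragment lengths after complete HindIII digestion."""
    s = amplicon.upper()
    cuts, start = [], 0
    while (pos := s.find(HINDIII_SITE, start)) != -1:
        cuts.append(pos + 1)  # cut after the first A of each site
        start = pos + 1
    bounds = [0] + cuts + [len(s)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def survives_hindiii(amplicon: str) -> bool:
    """True if the sequence carries no HindIII site and therefore remains uncut."""
    return HINDIII_SITE not in amplicon.upper()


if __name__ == "__main__":
    wild_type = "GGCAAGCTTGCA"  # toy allele with an intact site
    edited = "GGCAATTGCA"       # toy allele with a 2-bp deletion inside the site
    print(hindiii_fragments(wild_type), survives_hindiii(wild_type))
    print(hindiii_fragments(edited), survives_hindiii(edited))
```

The walrus operator requires Python 3.8 or later.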

Example 2: T5exo-Cas9 fusion induces deletions at a higher frequency and increases their size

To examine the indels generated by the T5exo-Cas9 fusion more thoroughly, the inventors designed four sgRNAs targeting different genomic loci of rice (OsMPK16-T1, OsCDC48-T1, OsALS-T1 and OsXa13-T1) (Table 1). Each sgRNA was transfected into rice protoplasts together with T5exo-Cas9 or Cas9, and the indels produced at the four target sites were analyzed by targeted deep sequencing. Consistently, the T5exo-Cas9 fusion induced significantly more deletions than Cas9: for OsMPK16-T1, the deletion rate increased from 20.1% to 86.5% (4.3-fold); for OsCDC48-T1, from 71.8% to 95.6% (1.3-fold); for OsALS-T1, from 76.4% to 97.4% (1.3-fold); and for OsXa13-T1, from 22.8% to 90.6% (4.0-fold) (Fig. 2a). These results demonstrate that T5exo-Cas9 induces deletions more frequently than Cas9 during genome editing.
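The fold changes quoted above are the T5exo-Cas9 deletion rate divided by the Cas9 deletion rate at each site. The short Python check below uses only the percentages reported in this example; the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# Deletion rates (%) reported in Example 2, as (Cas9, T5exo-Cas9) per target site.
DELETION_RATES: Dict[str, Tuple[float, float]] = {
    "OsMPK16-T1": (20.1, 86.5),
    "OsCDC48-T1": (71.8, 95.6),
    "OsALS-T1": (76.4, 97.4),
    "OsXa13-T1": (22.8, 90.6),
}


def fold_changes(rates: Dict[str, Tuple[float, float]], ndigits: int = 1) -> Dict[str, float]:
    """Fold change of the fusion relative to the unfused nuclease, per target site."""
    return {site: round(fused / base, ndigits) for site, (base, fused) in rates.items()}


if __name__ == "__main__":
    print(fold_changes(DELETION_RATES))
```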

Table 1: summary of sgRNA target sites and corresponding oligonucleotides for vector construction.

* PAM motifs in each target sequence are shown in bold

Next, the inventors analyzed the sizes of all deletions made by the T5exo-Cas9 fusion and Cas9 at the four target sites. For Cas9, most deletions (96.5%-100%) were smaller than 10 bp, with only a small fraction (0-3.5%) larger than 10 bp; at OsXa13-T1 and OsALS-T1, deletions were predominantly 1-3 bp (Fig. 2b). The T5exo-Cas9 fusion produced larger deletions: about 16.1%-35.8% of deletions were larger than 10 bp, and the average deletion size was 33-44 bp (Fig. 2b). Interestingly, fusion with T5 exonuclease also appears to enhance the genome editing efficiency of Cas9 (Fig. 2c).
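The two summary statistics used above, the share of deletions larger than 10 bp and the mean deletion size, can be computed from a list of per-read deletion lengths as sketched below. The input list is invented toy data, not the inventors' measurements, the helper name is hypothetical, and treating "larger than 10 bp" as a strict inequality is an assumption.

```python
from typing import Sequence, Tuple


def summarize_deletions(sizes_bp: Sequence[int], threshold_bp: int = 10) -> Tuple[float, float]:
    """Return (% of deletions longer than threshold_bp, mean deletion size in bp)."""
    if not sizes_bp:
        raise ValueError("no deletions to summarize")
    large = sum(1 for s in sizes_bp if s > threshold_bp)  # strictly larger than the threshold
    return 100.0 * large / len(sizes_bp), sum(sizes_bp) / len(sizes_bp)


if __name__ == "__main__":
    toy_sizes = [1, 2, 2, 3, 15, 40]  # invented example, not data from Fig. 2b
    pct_large, mean_size = summarize_deletions(toy_sizes)
    print(f"{pct_large:.1f}% > 10 bp, mean {mean_size:.1f} bp")
```

Passing `threshold_bp=15` gives the ">15 bp" statistic used for Cas12a in Examples 4 and 5.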

Example 3: T5exo-Cas9 fusion produces larger genomic deletions in transgenic rice plants

To test whether T5exo-Cas9 can be used to produce rice mutant plants, the inventors performed Agrobacterium-mediated rice transformation with binary vectors expressing T5exo-Cas9 or Cas9 together with sgRNA OsXa13-T1, which targets the UPT_PthXo1 box in the promoter of the OsXa13 gene (a gene upregulated by the transcription activator-like effector PthXo1). For Cas9, the inventors obtained 42 T0 transformants, 36 of which were edited, giving an editing efficiency of 85.7%. Of the edited lines, 12 were homozygous mutants and 24 were biallelic mutants (Fig. 3a), and 82% of the indel patterns were 1-bp insertions (Fig. 3b). For T5exo-Cas9, 46 T0 transformants were obtained, 42 of which were edited, giving an editing efficiency of 91.3%. Of the edited lines, 3 were homozygous mutants and 35 were biallelic mutants (Fig. 3a). The frequency of deletions was 72%, and 35% of the deletion mutants carried deletions of more than 3 bp at the target site (Fig. 3b). Overall, in transgenic rice plants, the T5exo-Cas9 fusion induced deletions at a higher frequency and of larger size than Cas9, consistent with the results observed in rice protoplasts (Fig. 2). In addition, fusion of T5 exonuclease to Cas9 appears to enhance genome editing efficiency in transgenic rice plants, similar to what was seen in protoplasts.
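The editing efficiencies in this example are the number of edited T0 lines divided by the number of transformants. The sketch below reproduces them from the reported counts; the names are hypothetical, and the `other_edited` field (edited lines not reported as homozygous or biallelic, e.g. 4 of the 42 edited T5exo-Cas9 lines) is derived here by subtraction, since the text does not state the genotype of those lines.

```python
from typing import NamedTuple


class T0Summary(NamedTuple):
    efficiency_pct: float  # edited / total transformants, as a percentage
    other_edited: int      # edited lines not counted as homozygous or biallelic


def summarize_t0(total: int, edited: int, homozygous: int, biallelic: int) -> T0Summary:
    """Summarize T0 transformant genotyping counts."""
    if not 0 <= homozygous + biallelic <= edited <= total or total == 0:
        raise ValueError("inconsistent counts")
    return T0Summary(round(100.0 * edited / total, 1), edited - homozygous - biallelic)


if __name__ == "__main__":
    print(summarize_t0(42, 36, 12, 24))  # Cas9
    print(summarize_t0(46, 42, 3, 35))   # T5exo-Cas9
```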

The UPT_PthXo1 box (25 bp) in the OsXa13 promoter is the only Xoo-responsive cis-acting element (Yuan et al., 2011). Naturally occurring deletions in the UPT_PthXo1 box of OsXa13 confer recessive resistance to Xanthomonas oryzae pv. oryzae (Xoo), including strain PXO99 (Chu et al., 2006). The inventors therefore examined the PXO99 resistance phenotypes of various homozygous deletion mutants produced by T5exo-Cas9 and Cas9 using the leaf-clipping method. The average lesion length on wild-type leaves was about 13 cm. On mutants carrying a 1-bp insertion or a deletion of no more than 2 bp, lesions were about 7-8 cm long. The -4/-12 bp biallelic mutant exhibited the strongest resistance, with lesions only about 3 cm long (Figs. 3c and 3d). This result suggests that the T5exo-Cas9 fusion may promote loss-of-function mutations in cis-regulatory elements.

The inventors further examined the effect of T5 fusion on Cas9 off-target activity by measuring the indel frequency at putative off-target sites. For OsXa13-T1, four potential off-target sites with three to four mismatches were identified using the online tool Cas-OFFinder (Bae et al., 2014) (Table 2). The inventors amplified DNA fragments covering these potential off-target sites from the mutants generated by T5exo-Cas9 and Cas9. Sequencing of the targeted amplicons detected no mutations at these potential off-target sites in mutants generated by either Cas9 or T5exo-Cas9, indicating that fusion of T5 exonuclease did not alter the off-target activity of Cas9 at these sites.

Table 2: Potential off-target sites in rice

* PAM sequences in each target sequence are shown in bold. Positions that do not match the preselected target are shown underlined

Example 4: fusion of T5 exonuclease with Cas12a increases deletion frequency and enlarges deletion size

To test whether T5 fusion is applicable to other Cas nucleases, the inventors also fused T5 exonuclease in frame to the N-terminus of Cas12a using an XTEN linker (Schellenberger et al., 2009). The fusion gene was driven by the maize Ubiquitin-1 (Ubi-1) promoter (Fig. 4a). Three guide RNAs (OsBADH-T1, OsEPSPs-T1 and OsPDS-T1) were designed to target different genomic loci in rice (Table 1). Each guide RNA was transfected into rice protoplasts together with T5exo-Cas12a or Cas12a, and editing of each gene was assessed by targeted amplicon deep sequencing. As observed with T5exo-Cas9, T5exo-Cas12a induced a higher deletion frequency than Cas12a, with a corresponding drop in insertions: for OsBADH-T1, the insertion rate fell sharply from 6.2% to 0.2% (31.0-fold); for OsEPSPs-T1, from 11.2% to 1.1% (10.2-fold); and for OsPDS-T1, from 7.7% to 1.4% (5.5-fold) (Fig. 4b). The inventors then analyzed the sizes of all deletions made by the T5exo-Cas12a fusion and Cas12a at the three target sites. For Cas12a, deletions at all three target sites were mostly smaller than 15 bp and concentrated around 6-10 bp. As expected, the T5exo-Cas12a fusion induced larger deletions at each site, and the proportion of deletions larger than 15 bp was on average 8.6 times higher than with Cas12a (Fig. 4c). Taken together, these results support that fusing T5 exonuclease to Cas12a increases the frequency and size of genomic deletions at the guide RNA target locus. In addition, the genome editing efficiency of the T5exo-Cas12a fusion was higher than that of Cas12a (1.34-1.47-fold) at all three target sites (Fig. 4d).

Example 5: T5exo-Cas12a fusion produces larger genomic deletions in transgenic rice plants

The inventors also performed Agrobacterium-mediated rice transformation with binary vectors expressing T5exo-Cas12a or Cas12a together with a guide RNA targeting the OsPDS gene (Table 1). For Cas12a, 128 T0 transformants were obtained, 21 of which were edited, giving an editing efficiency of 16.4%. At the OsPDS locus, all edits were deletions, most of which were 1-15 bp, with only 11.5% larger than 15 bp (Fig. 5a). For T5exo-Cas12a, the inventors obtained 150 T0 transformants with a mutation frequency of 28.7%, about 1.8 times that of Cas12a. Of the deletions generated by T5exo-Cas12a, 46.8% were larger than 15 bp (Fig. 5a), including 11.3% larger than 30 bp (Fig. 5b). These results support that T5 fusion enhances both the genomic deletions and the genome editing efficiency of Cas12a in transgenic rice plants.

The inventors also examined the off-target effects of T5exo-Cas12a with the guide RNA targeting the OsPDS gene. Five potential off-target sites with 5 to 6 mismatches were identified using the online tool Cas-OFFinder (Table 2). DNA fragments covering the potential off-target sites were amplified from the mutants produced by T5exo-Cas12a and Cas12a. Sequencing of the targeted amplicons detected no mutations at these potential off-target sites in mutants produced by either T5exo-Cas12a or Cas12a, suggesting that fusion of T5 exonuclease did not alter the off-target activity of Cas12a.

Provided herein is a novel method in which Cas9 or Cas12a is fused to T5 exonuclease to create larger genomic deletions at a given target site with a single guide RNA. As shown by experiments in rice protoplasts and transgenic plants, T5exo-Cas fusions increased both the frequency and the size of deletions at the target genomic site. Furthermore, fusion of T5 exonuclease improved the genome editing efficiency of both Cas9 and Cas12a. T5exo-Cas fusions expand the CRISPR toolkit and facilitate knockout of regulatory and non-coding DNA. More broadly, these results suggest a general strategy for generating larger deletions with other Cas nucleases.

Without being bound by any theory, it is speculated that T5 exonuclease degrades the 5' ends of the double-strand breaks (DSBs) generated by Cas9 or Cas12a, increasing the frequency and size of deletions when the DSBs are repaired by non-homologous end joining (NHEJ). The different deletion sizes at a single genomic site may reflect differences in how long the T5exo-Cas9 or T5exo-Cas12a fusion protein remains bound to the DNA, which determines how long T5 exonuclease acts on the DNA ends. Interestingly, the T5exo-Cas12a fusion produced larger deletions than T5exo-Cas9. This may be because Cas12a generates staggered (sticky) ends, whereas Cas9 generates blunt ends: T5 exonuclease has been reported to bind DNA duplexes with 5' overhangs more strongly than blunt-ended duplexes (Garforth et al., 1997).
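The blunt-versus-staggered distinction invoked above can be made concrete by comparing where each strand is cut. The sketch assumes commonly reported cleavage positions that are not stated in this document: Cas9 cutting both strands 3 bp upstream of the PAM (after nucleotide 17 of a 20-nt protospacer), and Cas12a cutting after nucleotide 18 of the protospacer on the non-target strand and after nucleotide 23 on the target strand. The helper `end_structure` is hypothetical.

```python
def end_structure(top_cut: int, bottom_cut: int) -> str:
    """Describe the ends left by a double-strand break.

    Both arguments count nucleotides from the same left-hand origin on the top
    (non-target) strand, read 5'->3', up to the cut on that strand or on the
    complementary bottom strand.
    """
    offset = bottom_cut - top_cut
    if offset == 0:
        return "blunt"
    # Bottom-strand cut further right: each fragment keeps a single-stranded 5' end.
    polarity = "5'" if offset > 0 else "3'"
    return f"{abs(offset)}-nt {polarity} overhang"


if __name__ == "__main__":
    # Assumed cut positions within the protospacer (see lead-in), counted from its 5' end.
    print("Cas9:", end_structure(top_cut=17, bottom_cut=17))
    print("Cas12a:", end_structure(top_cut=18, bottom_cut=23))
```

Under these assumptions the Cas12a break leaves 5-nt 5' overhangs, the kind of end reported to be bound more strongly by T5 exonuclease.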

The larger genomic deletions generated by T5exo-Cas fusions will greatly facilitate functional analysis of regulatory and non-coding sequences (such as lncRNAs, miRNAs and cis-elements), because small indels in these regions are unlikely to produce a loss-of-function phenotype. In this study, the inventors observed that small indels (+1/-2) in the UPT_PthXo1 box of the OsXa13 promoter failed to knock out its function, whereas the larger deletions (-4/-12) induced by the T5exo-Cas9 fusion disrupted the function of the UPT_PthXo1 box. Recently, rice mutants carrying a 149-bp deletion covering the UPT_PthXo1 box were obtained using paired sgRNAs; these mutants showed strong Xoo resistance without affected fertility (Li et al., 2019). However, the 149-bp deletion is much larger than the UPT_PthXo1 box (25 bp) and may affect other regulatory sequences in this region. In contrast, most deletions made by the T5exo-Cas fusion lie within the UPT_PthXo1 box, suggesting that T5exo-Cas fusions provide a more precise strategy for knocking out short regulatory and non-coding sequences. Furthermore, it is not easy to design two sgRNAs targeting such short sequences, so a tool that requires only one sgRNA, as provided by the present invention, is useful.

One concern with T5 fusion is the potential toxicity of T5 exonuclease when expressed in heterologous cells, a concern originally raised in bacteria (Kaliman et al., 1986). However, no visible phenotype or growth defect was observed in transgenic rice expressing T5exo-Cas9 or T5exo-Cas12a, indicating that the T5 fusions did not affect plant growth and development.

In summary, the inventors developed a new and efficient strategy, based on fusing T5 exonuclease to Cas9 or Cas12a, that creates larger deletions with a single guide RNA.

Sequence listing

> SEQ ID NO: 1 T5exo-Cas9

> SEQ ID NO: 2 T5exo-Cas9

T5exo-Linker-NLS1-Cas9-NLS2

The NLS, Cas9, T5 exonuclease and Linker are highlighted in gray, purple, blue and orange, respectively.

> SEQ ID NO: 3 T5 exonuclease

> SEQ ID NO: 4 T5 exonuclease

> SEQ ID NO: 5 Linker

> SEQ ID NO: 6 NLS1

> SEQ ID NO: 7 NLS2

> SEQ ID NO: 8 Cas9

> SEQ ID NO: 9 Cas9

> SEQ ID NO: 10 T5exo-Cas12a

> SEQ ID NO: 11 T5exo-Cas12a

NLS3-T5exo-Linker-Cas12a-NLS4

The NLS, Cas12a, T5 exonuclease and XTEN linker are highlighted in gray, green, blue and yellow, respectively.

> SEQ ID NO: 12 NLS3

> SEQ ID NO: 13 NLS3

> SEQ ID NO: 14 XTEN linker

> SEQ ID NO: 15 Cas12a

> SEQ ID NO: 16 Cas12a

Sequence listing

<110> Institute of Microbiology, Chinese Academy of Sciences

<120> Improved genome editing system

<130> I2019TC3889CB

<160> 16

<170> PatentIn version 3.5

<210> 1

<211> 1709

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 1

Met Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met

1 5 10 15

Ala Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe

20 25 30

Arg Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val

35 40 45

Ser Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile

50 55 60

Val Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro

65 70 75 80

Glu Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu

85 90 95

Glu Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe

100 105 110

Glu Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu

115 120 125

Ala Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr

130 135 140

Asp His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu

145 150 155 160

Thr Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His

165 170 175

Leu Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe

180 185 190

Ile Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly

195 200 205

Val Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe

210 215 220

Gly Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln

225 230 235 240

Lys Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn

245 250 255

Leu Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala

260 265 270

Val Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile

275 280 285

Ala Glu Gln Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly

290 295 300

Gly Ser Gly Ser Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His

305 310 315 320

Gly Val Pro Ala Ala Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile

325 330 335

Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val

340 345 350

Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile

355 360 365

Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala

370 375 380

Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg

385 390 395 400

Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala

405 410 415

Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val

420 425 430

Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val

435 440 445

Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg

450 455 460

Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr

465 470 475 480

Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu

485 490 495

Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln

500 505 510

Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala

515 520 525

Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser

530 535 540

Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn

545 550 555 560

Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn

565 570 575

Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser

580 585 590

Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly

595 600 605

Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala

610 615 620

Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala

625 630 635 640

Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp

645 650 655

Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr

660 665 670

Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile

675 680 685

Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile

690 695 700

Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg

705 710 715 720

Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro

725 730 735

His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu

740 745 750

Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile

755 760 765

Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn

770 775 780

Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro

785 790 795 800

Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe

805 810 815

Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val

820 825 830

Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu

835 840 845

Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe

850 855 860

Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr

865 870 875 880

Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys

885 890 895

Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe

900 905 910

Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp

915 920 925

Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile

930 935 940

Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg

945 950 955 960

Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu

965 970 975

Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile

980 985 990

Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu

995 1000 1005

Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His

1010 1015 1020

Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val

1025 1030 1035

Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

1040 1045 1050

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val

1055 1060 1065

Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn

1070 1075 1080

Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly

1085 1090 1095

Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile

1100 1105 1110

Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn

1115 1120 1125

Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn

1130 1135 1140

Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu

1145 1150 1155

Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

1160 1165 1170

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn

1175 1180 1185

Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys

1190 1195 1200

Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr

1205 1210 1215

Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu

1220 1225 1230

Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu

1235 1240 1245

Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg

1250 1255 1260

Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val

1265 1270 1275

Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys

1280 1285 1290

Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His

1295 1300 1305

Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile

1310 1315 1320

Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr

1325 1330 1335

Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu

1340 1345 1350

Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met

1355 1360 1365

Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg

1370 1375 1380

Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val

1385 1390 1395

Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser

1400 1405 1410

Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly

1415 1420 1425

Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys

1430 1435 1440

Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly

1445 1450 1455

Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys

1460 1465 1470

Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu

1475 1480 1485

Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro

1490 1495 1500

Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp

1505 1510 1515

Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn

1520 1525 1530

Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly

1535 1540 1545

Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu

1550 1555 1560

Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu

1565 1570 1575

Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu

1580 1585 1590

Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala

1595 1600 1605

Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg

1610 1615 1620

Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe

1625 1630 1635

Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp

1640 1645 1650

Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu

1655 1660 1665

Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr

1670 1675 1680

Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala

1685 1690 1695

Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys

1700 1705

<210> 2

<211> 5130

<212> DNA

<213> ARTIFICIAL SEQUENCE

<220>

<223> T5exo-Linker-NLS1-Cas9-NLS2

<400> 2

atgtcaaagt cttggggcaa gttcatcgag gaggaggagg ccgagatggc gtcaaggcgc 60

aacctcatga ttgtcgacgg caccaatctg ggcttccggt tcaagcacaa caattctaag 120

aagcctttcg cctccagcta cgtgtccaca atccagagcc tcgccaagtc ctacagcgcg 180

cgcaccacaa ttgtgctggg cgacaagggc aagtcagtct tccggctgga gcatctgccg 240

gagtacaagg gcaacaggga tgagaagtac gcacagagga ccgaggagga gaaggcactc 300

gatgagcagt tcttcgagta cctcaaggac gccttcgagc tgtgcaagac cacattccca 360

accttcacaa tcaggggagt ggaggcagac gatatggcag cgtacatcgt caagctcatt 420

ggccacctgt acgatcatgt gtggctcatt tccacagacg gcgattggga caccctcctg 480

acagacaagg tctcacggtt ctctttcacc acacggaggg agtaccacct gagggatatg 540

tacgagcacc ataacgtgga cgatgtcgag cagttcatca gcctcaaggc cattatgggc 600

gatctgggcg acaatatcag gggagtcgag ggaattggag caaagagggg ctacaacatc 660

attcgggagt tcggcaatgt gctcgatatc attgaccagc tcccgctgcc aggcaagcag 720

aagtacatcc agaacctcaa tgcgtccgag gagctcctgt tccgcaatct catcctggtg 780

gatctgccga cctactgcgt cgacgcaatt gcagcagtgg gacaggatgt cctcgacaag 840

ttcacaaagg atatcctgga gattgcggag cagggtggag gcggaagtgg aggtggcggg 900

tcagggggtg gcggatctgg atccatggcc cctaagaaga agagaaaggt cggtattcac 960

ggcgttcctg cggcgatgga caagaagtat agtattggtc tggacattgg gacgaattcc 1020

gttggctggg ccgtgatcac cgatgagtac aaggtccctt ccaagaagtt taaggttctg 1080

gggaacaccg atcggcacag catcaagaag aatctcattg gagccctcct gttcgactca 1140

ggcgagaccg ccgaagcaac aaggctcaag agaaccgcaa ggagacggta tacaagaagg 1200

aagaatagga tctgctacct gcaggagatt ttcagcaacg aaatggcgaa ggtggacgat 1260

tcgttctttc atagattgga ggagagtttc ctcgtcgagg aagataagaa gcacgagagg 1320

catcctatct ttggcaacat tgtcgacgag gttgcctatc acgaaaagta ccccacaatc 1380

tatcatctgc ggaagaagct tgtggactcg actgataagg cggaccttag attgatctac 1440

ctcgctctgg cacacatgat taagttcagg ggccattttc tgatcgaggg ggatcttaac 1500

ccggacaata gcgatgtgga caagttgttc atccagctcg tccaaaccta caatcagctc 1560

tttgaggaaa acccaattaa tgcttcaggc gtcgacgcca aggcgatcct gtctgcacgc 1620

ctttcaaagt ctcgccggct tgagaacttg atcgctcaac tcccgggcga aaagaagaac 1680

ggcttgttcg ggaatctcat tgcactttcg ttggggctca caccaaactt caagagtaat 1740

tttgatctcg ctgaggacgc aaagctgcag ctttccaagg acacttatga cgatgacctg 1800

gataaccttt tggcccaaat cggcgatcag tacgcggact tgttcctcgc cgcgaagaat 1860

ttgtcggacg cgatcctcct gagtgatatt ctccgcgtga acaccgagat tacaaaggcc 1920

ccgctctcgg cgagtatgat caagcgctat gacgagcacc atcaggatct gacccttttg 1980

aaggctttgg tccggcagca actcccagag aagtacaagg aaatcttctt tgatcaatcc 2040

aagaacggct acgctggtta tattgacggc ggggcatcgc aggaggaatt ctacaagttt 2100

atcaagccaa ttctggagaa gatggatggc acagaggaac tcctggtgaa gctcaatagg 2160

gaggaccttt tgcggaagca aagaactttc gataacggca gcatccctca ccagattcat 2220

ctcggggagc tgcacgccat cctgagaagg caggaagact tctacccctt tcttaaggat 2280

aaccgggaga agatcgaaaa gattctgacg ttcagaattc cgtactatgt cggaccactc 2340

gcccggggta attccagatt tgcgtggatg accagaaaga gcgaggaaac catcacacct 2400

tggaacttcg aggaagtggt cgataagggc gcttccgcac agagcttcat tgagcgcatg 2460

acaaattttg acaagaacct gcctaatgag aaggtccttc ccaagcattc cctcctgtac 2520

gagtatttca ctgtttataa cgaactcacg aaggtgaagt atgtgaccga gggaatgcgc 2580

aagcccgcct tcctgagcgg cgagcaaaag aaggcgatcg tggacctttt gtttaagacc 2640

aatcggaagg tcacagttaa gcagctcaag gaggactact tcaagaagat tgaatgcttc 2700

gattccgttg agatcagcgg cgtggaagac aggtttaacg cgtcactggg gacttaccac 2760

gatctcctga agatcattaa ggataaggac ttcttggaca acgaggaaaa tgaggatatc 2820

ctcgaagaca ttgtcctgac tcttacgttg tttgaggata gggaaatgat cgaggaacgc 2880

ttgaagacgt atgcccatct cttcgatgac aaggttatga agcagctcaa gagaagaaga 2940

tacaccggat ggggaaggct gtcccgcaag cttatcaatg gcattagaga caagcaatca 3000

gggaagacaa tccttgactt tttgaagtct gatggcttcg cgaacaggaa ttttatgcag 3060

ctgattcacg atgactcact tactttcaag gaggatatcc agaaggctca agtgtcggga 3120

caaggtgaca gtctgcacga gcatatcgcc aaccttgcgg gatctcctgc aatcaagaag 3180

ggtattctgc agacagtcaa ggttgtggat gagcttgtga aggtcatggg acggcataag 3240

cccgagaaca tcgttattga gatggccaga gaaaatcaga ccacacaaaa gggtcagaag 3300

aactcgaggg agcgcatgaa gcgcatcgag gaaggcatta aggagctggg gagtcagatc 3360

cttaaggagc acccggtgga aaacacgcag ttgcaaaatg agaagctcta tctgtactat 3420

ctgcaaaatg gcagggatat gtatgtggac caggagttgg atattaaccg cctctcggat 3480

tacgacgtcg atcatatcgt tcctcagtcc ttccttaagg atgacagcat tgacaataag 3540

gttctcacca ggtccgacaa gaaccgcggg aagtccgata atgtgcccag cgaggaagtc 3600

gttaagaaga tgaagaacta ctggaggcaa cttttgaatg ccaagttgat cacacagagg 3660

aagtttgata acctcactaa ggccgagcgc ggaggtctca gcgaactgga caaggcgggc 3720

ttcattaagc ggcaactggt tgagactaga cagatcacga agcacgtggc gcagattctc 3780

gattcacgca tgaacacgaa gtacgatgag aatgacaagc tgatccggga agtgaaggtc 3840

atcaccttga agtcaaagct cgtttctgac ttcaggaagg atttccaatt ttataaggtg 3900

cgcgagatca acaattatca ccatgctcat gacgcatacc tcaacgctgt ggtcggaaca 3960

gcattgatta agaagtaccc gaagctcgag tccgaattcg tgtacggtga ctataaggtt 4020

tacgatgtgc gcaagatgat cgccaagtca gagcaggaaa ttggcaaggc cactgcgaag 4080

tatttctttt actctaacat tatgaatttc tttaagactg agatcacgct ggctaatggc 4140

gaaatccgga agagaccact tattgagacc aacggcgaga caggggaaat cgtgtgggac 4200

aaggggaggg atttcgccac agtccgcaag gttctctcta tgcctcaagt gaatattgtc 4260

aagaagactg aagtccagac gggcgggttc tcaaaggaat ctattctgcc caagcggaac 4320

tcggataagc ttatcgccag aaagaaggac tgggacccga agaagtatgg aggtttcgac 4380

tcaccaacgg tggcttactc tgtcctggtt gtggcaaagg tggagaaggg aaagtcaaag 4440

aagctcaagt ctgtcaagga gctcctgggt atcaccatta tggagaggtc cagcttcgaa 4500

aagaatccga tcgattttct cgaggcgaag ggatataagg aagtgaagaa ggacctgatc 4560

attaagcttc caaagtacag tcttttcgag ttggaaaacg gcaggaagcg catgttggct 4620

tccgcaggag agctccagaa gggtaacgag cttgctttgc cgtccaagta tgtgaacttc 4680

ctctatctgg catcccacta cgagaagctc aagggcagcc cagaggataa cgaacagaag 4740

caactgtttg tggagcaaca caagcattat cttgacgaga tcattgaaca gatttcggag 4800

ttcagtaagc gcgtcatcct cgccgacgcg aatttggata aggttctctc agcctacaac 4860

aagcaccggg acaagcctat cagagagcag gcggaaaata tcattcatct cttcaccctg 4920

acaaaccttg gggctcccgc tgcattcaag tattttgaca ctacgattga tcggaagaga 4980

tacacttcta cgaaggaggt gctggatgca acccttatcc accaatcgat tactggcctc 5040

tacgagacgc ggatcgactt gagtcagctc gggggggata agagaccagc ggcaaccaag 5100

aaggcaggac aagcgaagaa gaagaagtag 5130
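As an illustrative cross-check, the 5130 bp of SEQ ID NO: 2 can be accounted for by the segments named in its <223> field. The sketch below assumes segment boundaries read directly off the sequence above: an ATG start codon, the 870 bp T5exo sequence of SEQ ID NO: 4, the (Gly4Ser)3 linker of SEQ ID NO: 5, a Gly-Ser dipeptide encoded by ggatcc, NLS1 (SEQ ID NO: 6), the 4104 bp Cas9 sequence of SEQ ID NO: 9, NLS2 (SEQ ID NO: 7), and a TAG stop codon. The dictionary keys are descriptive labels chosen for illustration and do not appear in the listing.

```python
# Sketch: account for the length of the SEQ ID NO: 2 open reading frame
# (T5exo-Linker-NLS1-Cas9-NLS2) from the lengths given in this listing.
# Segment boundaries are assumed from reading the DNA; labels are illustrative.

CODON = 3  # nucleotides per codon

segments_bp = {
    "start codon (ATG)": 1 * CODON,
    "T5exo (SEQ ID NO: 4)": 870,
    "linker (SEQ ID NO: 5, 15 aa)": 15 * CODON,
    "Gly-Ser spacer (ggatcc)": 2 * CODON,
    "NLS1 (SEQ ID NO: 6, 17 aa)": 17 * CODON,
    "Cas9 (SEQ ID NO: 9)": 4104,
    "NLS2 (SEQ ID NO: 7, 16 aa)": 16 * CODON,
    "stop codon (TAG)": 1 * CODON,
}

total_bp = sum(segments_bp.values())
# All codons except the stop codon encode an amino acid of the fusion protein.
protein_aa = (total_bp - CODON) // CODON

if __name__ == "__main__":
    start = 1
    for name, length in segments_bp.items():
        # Print the 1-based nucleotide range occupied by each segment.
        print(f"{start:>5}-{start + length - 1:<5} {name}")
        start += length
    print("total:", total_bp, "bp;", protein_aa, "aa")
```

Under these assumptions the segments sum to 5130 bp and encode a 1709-residue fusion protein, with Cas9 occupying nucleotides 976 to 5079.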

<210> 3

<211> 290

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 3

Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met Ala

1 5 10 15

Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe Arg

20 25 30

Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val Ser

35 40 45

Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile Val

50 55 60

Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro Glu

65 70 75 80

Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu Glu

85 90 95

Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe Glu

100 105 110

Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu Ala

115 120 125

Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr Asp

130 135 140

His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu Thr

145 150 155 160

Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His Leu

165 170 175

Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe Ile

180 185 190

Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly Val

195 200 205

Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe Gly

210 215 220

Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln Lys

225 230 235 240

Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn Leu

245 250 255

Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala Val

260 265 270

Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile Ala

275 280 285

Glu Gln

290

<210> 4

<211> 870

<212> DNA

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 4

tcaaagtctt ggggcaagtt catcgaggag gaggaggccg agatggcgtc aaggcgcaac 60

ctcatgattg tcgacggcac caatctgggc ttccggttca agcacaacaa ttctaagaag 120

cctttcgcct ccagctacgt gtccacaatc cagagcctcg ccaagtccta cagcgcgcgc 180

accacaattg tgctgggcga caagggcaag tcagtcttcc ggctggagca tctgccggag 240

tacaagggca acagggatga gaagtacgca cagaggaccg aggaggagaa ggcactcgat 300

gagcagttct tcgagtacct caaggacgcc ttcgagctgt gcaagaccac attcccaacc 360

ttcacaatca ggggagtgga ggcagacgat atggcagcgt acatcgtcaa gctcattggc 420

cacctgtacg atcatgtgtg gctcatttcc acagacggcg attgggacac cctcctgaca 480

gacaaggtct cacggttctc tttcaccaca cggagggagt accacctgag ggatatgtac 540

gagcaccata acgtggacga tgtcgagcag ttcatcagcc tcaaggccat tatgggcgat 600

ctgggcgaca atatcagggg agtcgaggga attggagcaa agaggggcta caacatcatt 660

cgggagttcg gcaatgtgct cgatatcatt gaccagctcc cgctgccagg caagcagaag 720

tacatccaga acctcaatgc gtccgaggag ctcctgttcc gcaatctcat cctggtggat 780

ctgccgacct actgcgtcga cgcaattgca gcagtgggac aggatgtcct cgacaagttc 840

acaaaggata tcctggagat tgcggagcag 870

<210> 5

<211> 15

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 5

Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser

1 5 10 15

<210> 6

<211> 17

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 6

Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala

1 5 10 15

Ala

<210> 7

<211> 16

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 7

Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys

1 5 10 15

<210> 8

<211> 1368

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 8

Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val

1 5 10 15

Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe

20 25 30

Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile

35 40 45

Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu

50 55 60

Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys

65 70 75 80

Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser

85 90 95

Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys

100 105 110

His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr

115 120 125

His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp

130 135 140

Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His

145 150 155 160

Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro

165 170 175

Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr

180 185 190

Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala

195 200 205

Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn

210 215 220

Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn

225 230 235 240

Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe

245 250 255

Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp

260 265 270

Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp

275 280 285

Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp

290 295 300

Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser

305 310 315 320

Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys

325 330 335

Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe

340 345 350

Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser

355 360 365

Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp

370 375 380

Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg

385 390 395 400

Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu

405 410 415

Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe

420 425 430

Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile

435 440 445

Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp

450 455 460

Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu

465 470 475 480

Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr

485 490 495

Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser

500 505 510

Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys

515 520 525

Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln

530 535 540

Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr

545 550 555 560

Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp

565 570 575

Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly

580 585 590

Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp

595 600 605

Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr

610 615 620

Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala

625 630 635 640

His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr

645 650 655

Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp

660 665 670

Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe

675 680 685

Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe

690 695 700

Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu

705 710 715 720

His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly

725 730 735

Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly

740 745 750

Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln

755 760 765

Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile

770 775 780

Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro

785 790 795 800

Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu

805 810 815

Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg

820 825 830

Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys

835 840 845

Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg

850 855 860

Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys

865 870 875 880

Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys

885 890 895

Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

900 905 910

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr

915 920 925

Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp

930 935 940

Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser

945 950 955 960

Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg

965 970 975

Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val

980 985 990

Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe

995 1000 1005

Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala

1010 1015 1020

Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe

1025 1030 1035

Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala

1040 1045 1050

Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu

1055 1060 1065

Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val

1070 1075 1080

Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr

1085 1090 1095

Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1100 1105 1110

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro

1115 1120 1125

Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val

1130 1135 1140

Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys

1145 1150 1155

Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser

1160 1165 1170

Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys

1175 1180 1185

Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu

1190 1195 1200

Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly

1205 1210 1215

Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val

1220 1225 1230

Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser

1235 1240 1245

Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys

1250 1255 1260

His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys

1265 1270 1275

Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala

1280 1285 1290

Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn

1295 1300 1305

Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala

1310 1315 1320

Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser

1325 1330 1335

Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1340 1345 1350

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> 9

<211> 4104

<212> DNA

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 9

atggacaaga agtatagtat tggtctggac attgggacga attccgttgg ctgggccgtg 60

atcaccgatg agtacaaggt cccttccaag aagtttaagg ttctggggaa caccgatcgg 120

cacagcatca agaagaatct cattggagcc ctcctgttcg actcaggcga gaccgccgaa 180

gcaacaaggc tcaagagaac cgcaaggaga cggtatacaa gaaggaagaa taggatctgc 240

tacctgcagg agattttcag caacgaaatg gcgaaggtgg acgattcgtt ctttcataga 300

ttggaggaga gtttcctcgt cgaggaagat aagaagcacg agaggcatcc tatctttggc 360

aacattgtcg acgaggttgc ctatcacgaa aagtacccca caatctatca tctgcggaag 420

aagcttgtgg actcgactga taaggcggac cttagattga tctacctcgc tctggcacac 480

atgattaagt tcaggggcca ttttctgatc gagggggatc ttaacccgga caatagcgat 540

gtggacaagt tgttcatcca gctcgtccaa acctacaatc agctctttga ggaaaaccca 600

attaatgctt caggcgtcga cgccaaggcg atcctgtctg cacgcctttc aaagtctcgc 660

cggcttgaga acttgatcgc tcaactcccg ggcgaaaaga agaacggctt gttcgggaat 720

ctcattgcac tttcgttggg gctcacacca aacttcaaga gtaattttga tctcgctgag 780

gacgcaaagc tgcagctttc caaggacact tatgacgatg acctggataa ccttttggcc 840

caaatcggcg atcagtacgc ggacttgttc ctcgccgcga agaatttgtc ggacgcgatc 900

ctcctgagtg atattctccg cgtgaacacc gagattacaa aggccccgct ctcggcgagt 960

atgatcaagc gctatgacga gcaccatcag gatctgaccc ttttgaaggc tttggtccgg 1020

cagcaactcc cagagaagta caaggaaatc ttctttgatc aatccaagaa cggctacgct 1080

ggttatattg acggcggggc atcgcaggag gaattctaca agtttatcaa gccaattctg 1140

gagaagatgg atggcacaga ggaactcctg gtgaagctca atagggagga ccttttgcgg 1200

aagcaaagaa ctttcgataa cggcagcatc cctcaccaga ttcatctcgg ggagctgcac 1260

gccatcctga gaaggcagga agacttctac ccctttctta aggataaccg ggagaagatc 1320

gaaaagattc tgacgttcag aattccgtac tatgtcggac cactcgcccg gggtaattcc 1380

agatttgcgt ggatgaccag aaagagcgag gaaaccatca caccttggaa cttcgaggaa 1440

gtggtcgata agggcgcttc cgcacagagc ttcattgagc gcatgacaaa ttttgacaag 1500

aacctgccta atgagaaggt ccttcccaag cattccctcc tgtacgagta tttcactgtt 1560

tataacgaac tcacgaaggt gaagtatgtg accgagggaa tgcgcaagcc cgccttcctg 1620

agcggcgagc aaaagaaggc gatcgtggac cttttgttta agaccaatcg gaaggtcaca 1680

gttaagcagc tcaaggagga ctacttcaag aagattgaat gcttcgattc cgttgagatc 1740

agcggcgtgg aagacaggtt taacgcgtca ctggggactt accacgatct cctgaagatc 1800

attaaggata aggacttctt ggacaacgag gaaaatgagg atatcctcga agacattgtc 1860

ctgactctta cgttgtttga ggatagggaa atgatcgagg aacgcttgaa gacgtatgcc 1920

catctcttcg atgacaaggt tatgaagcag ctcaagagaa gaagatacac cggatgggga 1980

aggctgtccc gcaagcttat caatggcatt agagacaagc aatcagggaa gacaatcctt 2040

gactttttga agtctgatgg cttcgcgaac aggaatttta tgcagctgat tcacgatgac 2100

tcacttactt tcaaggagga tatccagaag gctcaagtgt cgggacaagg tgacagtctg 2160

cacgagcata tcgccaacct tgcgggatct cctgcaatca agaagggtat tctgcagaca 2220

gtcaaggttg tggatgagct tgtgaaggtc atgggacggc ataagcccga gaacatcgtt 2280

attgagatgg ccagagaaaa tcagaccaca caaaagggtc agaagaactc gagggagcgc 2340

atgaagcgca tcgaggaagg cattaaggag ctggggagtc agatccttaa ggagcacccg 2400

gtggaaaaca cgcagttgca aaatgagaag ctctatctgt actatctgca aaatggcagg 2460

gatatgtatg tggaccagga gttggatatt aaccgcctct cggattacga cgtcgatcat 2520

atcgttcctc agtccttcct taaggatgac agcattgaca ataaggttct caccaggtcc 2580

gacaagaacc gcgggaagtc cgataatgtg cccagcgagg aagtcgttaa gaagatgaag 2640

aactactgga ggcaactttt gaatgccaag ttgatcacac agaggaagtt tgataacctc 2700

actaaggccg agcgcggagg tctcagcgaa ctggacaagg cgggcttcat taagcggcaa 2760

ctggttgaga ctagacagat cacgaagcac gtggcgcaga ttctcgattc acgcatgaac 2820

acgaagtacg atgagaatga caagctgatc cgggaagtga aggtcatcac cttgaagtca 2880

aagctcgttt ctgacttcag gaaggatttc caattttata aggtgcgcga gatcaacaat 2940

tatcaccatg ctcatgacgc atacctcaac gctgtggtcg gaacagcatt gattaagaag 3000

tacccgaagc tcgagtccga attcgtgtac ggtgactata aggtttacga tgtgcgcaag 3060

atgatcgcca agtcagagca ggaaattggc aaggccactg cgaagtattt cttttactct 3120

aacattatga atttctttaa gactgagatc acgctggcta atggcgaaat ccggaagaga 3180

ccacttattg agaccaacgg cgagacaggg gaaatcgtgt gggacaaggg gagggatttc 3240

gccacagtcc gcaaggttct ctctatgcct caagtgaata ttgtcaagaa gactgaagtc 3300

cagacgggcg ggttctcaaa ggaatctatt ctgcccaagc ggaactcgga taagcttatc 3360

gccagaaaga aggactggga cccgaagaag tatggaggtt tcgactcacc aacggtggct 3420

tactctgtcc tggttgtggc aaaggtggag aagggaaagt caaagaagct caagtctgtc 3480

aaggagctcc tgggtatcac cattatggag aggtccagct tcgaaaagaa tccgatcgat 3540

tttctcgagg cgaagggata taaggaagtg aagaaggacc tgatcattaa gcttccaaag 3600

tacagtcttt tcgagttgga aaacggcagg aagcgcatgt tggcttccgc aggagagctc 3660

cagaagggta acgagcttgc tttgccgtcc aagtatgtga acttcctcta tctggcatcc 3720

cactacgaga agctcaaggg cagcccagag gataacgaac agaagcaact gtttgtggag 3780

caacacaagc attatcttga cgagatcatt gaacagattt cggagttcag taagcgcgtc 3840

atcctcgccg acgcgaattt ggataaggtt ctctcagcct acaacaagca ccgggacaag 3900

cctatcagag agcaggcgga aaatatcatt catctcttca ccctgacaaa ccttggggct 3960

cccgctgcat tcaagtattt tgacactacg attgatcgga agagatacac ttctacgaag 4020

gaggtgctgg atgcaaccct tatccaccaa tcgattactg gcctctacga gacgcggatc 4080

gacttgagtc agctcggggg ggat 4104

<210> 10

<211> 1566

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 10

Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala

1 5 10 15

Ala Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met

20 25 30

Ala Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe

35 40 45

Arg Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val

50 55 60

Ser Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile

65 70 75 80

Val Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro

85 90 95

Glu Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu

100 105 110

Glu Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe

115 120 125

Glu Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu

130 135 140

Ala Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr

145 150 155 160

Asp His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu

165 170 175

Thr Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His

180 185 190

Leu Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe

195 200 205

Ile Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly

210 215 220

Val Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe

225 230 235 240

Gly Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln

245 250 255

Lys Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn

260 265 270

Leu Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala

275 280 285

Val Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile

290 295 300

Ala Glu Gln Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr

305 310 315 320

Pro Glu Ser Ser Lys Leu Glu Lys Phe Thr Asn Cys Tyr Ser Leu Ser

325 330 335

Lys Thr Leu Arg Phe Lys Ala Ile Pro Val Gly Lys Thr Gln Glu Asn

340 345 350

Ile Asp Asn Lys Arg Leu Leu Val Glu Asp Glu Lys Arg Ala Glu Asp

355 360 365

Tyr Lys Gly Val Lys Lys Leu Leu Asp Arg Tyr Tyr Leu Ser Phe Ile

370 375 380

Asn Asp Val Leu His Ser Ile Lys Leu Lys Asn Leu Asn Asn Tyr Ile

385 390 395 400

Ser Leu Phe Arg Lys Lys Thr Arg Thr Glu Lys Glu Asn Lys Glu Leu

405 410 415

Glu Asn Leu Glu Ile Asn Leu Arg Lys Glu Ile Ala Lys Ala Phe Lys

420 425 430

Gly Asn Glu Gly Tyr Lys Ser Leu Phe Lys Lys Asp Ile Ile Glu Thr

435 440 445

Ile Leu Pro Glu Phe Leu Asp Asp Lys Asp Glu Ile Ala Leu Val Asn

450 455 460

Ser Phe Asn Gly Phe Thr Thr Ala Phe Thr Gly Phe Phe Asp Asn Arg

465 470 475 480

Glu Asn Met Phe Ser Glu Glu Ala Lys Ser Thr Ser Ile Ala Phe Arg

485 490 495

Cys Ile Asn Glu Asn Leu Thr Arg Tyr Ile Ser Asn Met Asp Ile Phe

500 505 510

Glu Lys Val Asp Ala Ile Phe Asp Lys His Glu Val Gln Glu Ile Lys

515 520 525

Glu Lys Ile Leu Asn Ser Asp Tyr Asp Val Glu Asp Phe Phe Glu Gly

530 535 540

Glu Phe Phe Asn Phe Val Leu Thr Gln Glu Gly Ile Asp Val Tyr Asn

545 550 555 560

Ala Ile Ile Gly Gly Phe Val Thr Glu Ser Gly Glu Lys Ile Lys Gly

565 570 575

Leu Asn Glu Tyr Ile Asn Leu Tyr Asn Gln Lys Thr Lys Gln Lys Leu

580 585 590

Pro Lys Phe Lys Pro Leu Tyr Lys Gln Val Leu Ser Asp Arg Glu Ser

595 600 605

Leu Ser Phe Tyr Gly Glu Gly Tyr Thr Ser Asp Glu Glu Val Leu Glu

610 615 620

Val Phe Arg Asn Thr Leu Asn Lys Asn Ser Glu Ile Phe Ser Ser Ile

625 630 635 640

Lys Lys Leu Glu Lys Leu Phe Lys Asn Phe Asp Glu Tyr Ser Ser Ala

645 650 655

Gly Ile Phe Val Lys Asn Gly Pro Ala Ile Ser Thr Ile Ser Lys Asp

660 665 670

Ile Phe Gly Glu Trp Asn Val Ile Arg Asp Lys Trp Asn Ala Glu Tyr

675 680 685

Asp Asp Ile His Leu Lys Lys Lys Ala Val Val Thr Glu Lys Tyr Glu

690 695 700

Asp Asp Arg Arg Lys Ser Phe Lys Lys Ile Gly Ser Phe Ser Leu Glu

705 710 715 720

Gln Leu Gln Glu Tyr Ala Asp Ala Asp Leu Ser Val Val Glu Lys Leu

725 730 735

Lys Glu Ile Ile Ile Gln Lys Val Asp Glu Ile Tyr Lys Val Tyr Gly

740 745 750

Ser Ser Glu Lys Leu Phe Asp Ala Asp Phe Val Leu Glu Lys Ser Leu

755 760 765

Lys Lys Asn Asp Ala Val Val Ala Ile Met Lys Asp Leu Leu Asp Ser

770 775 780

Val Lys Ser Phe Glu Asn Tyr Ile Lys Ala Phe Phe Gly Glu Gly Lys

785 790 795 800

Glu Thr Asn Arg Asp Glu Ser Phe Tyr Gly Asp Phe Val Leu Ala Tyr

805 810 815

Asp Ile Leu Leu Lys Val Asp His Ile Tyr Asp Ala Ile Arg Asn Tyr

820 825 830

Val Thr Gln Lys Pro Tyr Ser Lys Asp Lys Phe Lys Leu Tyr Phe Gln

835 840 845

Asn Pro Gln Phe Met Gly Gly Trp Asp Lys Asp Lys Glu Thr Asp Tyr

850 855 860

Arg Ala Thr Ile Leu Arg Tyr Gly Ser Lys Tyr Tyr Leu Ala Ile Met

865 870 875 880

Asp Lys Lys Tyr Ala Lys Cys Leu Gln Lys Ile Asp Lys Asp Asp Val

885 890 895

Asn Gly Asn Tyr Glu Lys Ile Asn Tyr Lys Leu Leu Pro Gly Pro Asn

900 905 910

Lys Met Leu Pro Lys Val Phe Phe Ser Lys Lys Trp Met Ala Tyr Tyr

915 920 925

Asn Pro Ser Glu Asp Ile Gln Lys Ile Tyr Lys Asn Gly Thr Phe Lys

930 935 940

Lys Gly Asp Met Phe Asn Leu Asn Asp Cys His Lys Leu Ile Asp Phe

945 950 955 960

Phe Lys Asp Ser Ile Ser Arg Tyr Pro Lys Trp Ser Asn Ala Tyr Asp

965 970 975

Phe Asn Phe Ser Glu Thr Glu Lys Tyr Lys Asp Ile Ala Gly Phe Tyr

980 985 990

Arg Glu Val Glu Glu Gln Gly Tyr Lys Val Ser Phe Glu Ser Ala Ser

995 1000 1005

Lys Lys Glu Val Asp Lys Leu Val Glu Glu Gly Lys Leu Tyr Met

1010 1015 1020

Phe Gln Ile Tyr Asn Lys Asp Phe Ser Asp Lys Ser His Gly Thr

1025 1030 1035

Pro Asn Leu His Thr Met Tyr Phe Lys Leu Leu Phe Asp Glu Asn

1040 1045 1050

Asn His Gly Gln Ile Arg Leu Ser Gly Gly Ala Glu Leu Phe Met

1055 1060 1065

Arg Arg Ala Ser Leu Lys Lys Glu Glu Leu Val Val His Pro Ala

1070 1075 1080

Asn Ser Pro Ile Ala Asn Lys Asn Pro Asp Asn Pro Lys Lys Thr

1085 1090 1095

Thr Thr Leu Ser Tyr Asp Val Tyr Lys Asp Lys Arg Phe Ser Glu

1100 1105 1110

Asp Gln Tyr Glu Leu His Ile Pro Ile Ala Ile Asn Lys Cys Pro

1115 1120 1125

Lys Asn Ile Phe Lys Ile Asn Thr Glu Val Arg Val Leu Leu Lys

1130 1135 1140

His Asp Asp Asn Pro Tyr Val Ile Gly Ile Asp Arg Gly Glu Arg

1145 1150 1155

Asn Leu Leu Tyr Ile Val Val Val Asp Gly Lys Gly Asn Ile Val

1160 1165 1170

Glu Gln Tyr Ser Leu Asn Glu Ile Ile Asn Asn Phe Asn Gly Ile

1175 1180 1185

Arg Ile Lys Thr Asp Tyr His Ser Leu Leu Asp Lys Lys Glu Lys

1190 1195 1200

Glu Arg Phe Glu Ala Arg Gln Asn Trp Thr Ser Ile Glu Asn Ile

1205 1210 1215

Lys Glu Leu Lys Ala Gly Tyr Ile Ser Gln Val Val His Lys Ile

1220 1225 1230

Cys Glu Leu Val Glu Lys Tyr Asp Ala Val Ile Ala Leu Glu Asp

1235 1240 1245

Leu Asn Ser Gly Phe Lys Asn Ser Arg Val Lys Val Glu Lys Gln

1250 1255 1260

Val Tyr Gln Lys Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr

1265 1270 1275

Met Val Asp Lys Lys Ser Asn Pro Cys Ala Thr Gly Gly Ala Leu

1280 1285 1290

Lys Gly Tyr Gln Ile Thr Asn Lys Phe Glu Ser Phe Lys Ser Met

1295 1300 1305

Ser Thr Gln Asn Gly Phe Ile Phe Tyr Ile Pro Ala Trp Leu Thr

1310 1315 1320

Ser Lys Ile Asp Pro Ser Thr Gly Phe Val Asn Leu Leu Lys Thr

1325 1330 1335

Lys Tyr Thr Ser Ile Ala Asp Ser Lys Lys Phe Ile Ser Ser Phe

1340 1345 1350

Asp Arg Ile Met Tyr Val Pro Glu Glu Asp Leu Phe Glu Phe Ala

1355 1360 1365

Leu Asp Tyr Lys Asn Phe Ser Arg Thr Asp Ala Asp Tyr Ile Lys

1370 1375 1380

Lys Trp Lys Leu Tyr Ser Tyr Gly Asn Arg Ile Arg Ile Phe Arg

1385 1390 1395

Asn Pro Lys Lys Asn Asn Val Phe Asp Trp Glu Glu Val Cys Leu

1400 1405 1410

Thr Ser Ala Tyr Lys Glu Leu Phe Asn Lys Tyr Gly Ile Asn Tyr

1415 1420 1425

Gln Gln Gly Asp Ile Arg Ala Leu Leu Cys Glu Gln Ser Asp Lys

1430 1435 1440

Ala Phe Tyr Ser Ser Phe Met Ala Leu Met Ser Leu Met Leu Gln

1445 1450 1455

Met Arg Asn Ser Ile Thr Gly Arg Thr Asp Val Asp Phe Leu Ile

1460 1465 1470

Ser Pro Val Lys Asn Ser Asp Gly Ile Phe Tyr Asp Ser Arg Asn

1475 1480 1485

Tyr Glu Ala Gln Glu Asn Ala Ile Leu Pro Lys Asn Ala Asp Ala

1490 1495 1500

Asn Gly Ala Tyr Asn Ile Ala Arg Lys Val Leu Trp Ala Ile Gly

1505 1510 1515

Gln Phe Lys Lys Ala Glu Asp Glu Lys Leu Asp Lys Val Lys Ile

1520 1525 1530

Ala Ile Ser Asn Lys Glu Trp Leu Glu Tyr Ala Gln Thr Ser Val

1535 1540 1545

Lys His Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys

1550 1555 1560

Lys Lys Lys

1565

<210> 11

<211> 4701

<212> DNA

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 11

atggctccta agaagaagcg gaaggttggt attcacgggg tgcctgcggc ttcaaagtct 60

tggggcaagt tcatcgagga ggaggaggcc gagatggcgt caaggcgcaa cctcatgatt 120

gtcgacggca ccaatctggg cttccggttc aagcacaaca attctaagaa gcctttcgcc 180

tccagctacg tgtccacaat ccagagcctc gccaagtcct acagcgcgcg caccacaatt 240

gtgctgggcg acaagggcaa gtcagtcttc cggctggagc atctgccgga gtacaagggc 300

aacagggatg agaagtacgc acagaggacc gaggaggaga aggcactcga tgagcagttc 360

ttcgagtacc tcaaggacgc cttcgagctg tgcaagacca cattcccaac cttcacaatc 420

aggggagtgg aggcagacga tatggcagcg tacatcgtca agctcattgg ccacctgtac 480

gatcatgtgt ggctcatttc cacagacggc gattgggaca ccctcctgac agacaaggtc 540

tcacggttct ctttcaccac acggagggag taccacctga gggatatgta cgagcaccat 600

aacgtggacg atgtcgagca gttcatcagc ctcaaggcca ttatgggcga tctgggcgac 660

aatatcaggg gagtcgaggg aattggagca aagaggggct acaacatcat tcgggagttc 720

ggcaatgtgc tcgatatcat tgaccagctc ccgctgccag gcaagcagaa gtacatccag 780

aacctcaatg cgtccgagga gctcctgttc cgcaatctca tcctggtgga tctgccgacc 840

tactgcgtcg acgcaattgc agcagtggga caggatgtcc tcgacaagtt cacaaaggat 900

atcctggaga ttgcggagca gtccggcagc gagacgccag gcacctccga gagcgctacg 960

cctgaatcgt caaagctcga gaaattcacc aactgttatt cgttgagcaa aacactgcgg 1020

tttaaagcga ttccagtcgg caagactcaa gagaatatag acaataagcg gctgttggtg 1080

gaagatgaaa agcgcgcgga agactacaaa ggggtgaaga agttgttgga cagatactac 1140

ctctctttta tcaatgatgt cttgcactca atcaaattga agaatctgaa caactacatc 1200

tccctcttca gaaagaaaac aaggacagaa aaggagaata aggaacttga aaatttggag 1260

atcaatctga ggaaagagat cgcgaaagcc tttaaaggca acgaaggata caaaagtctg 1320

ttcaagaagg atataattga gacaattttg ccagagttcc tcgatgacaa ggacgagatt 1380

gcgctggtca attcgttcaa cggattcaca acagcattca caggcttctt tgataatcgg 1440

gaaaatatgt tctctgagga ggcaaagtcc acttctattg cgttcaggtg tatcaatgag 1500

aatctcacta ggtacatttc caacatggat atctttgaga aggttgacgc aatttttgac 1560

aagcacgaag ttcaggagat taaggagaag atcctcaatt ccgattatga cgttgaggac 1620

ttcttcgaag gtgagttttt taatttcgtg ctcactcaag agggtatcga cgtgtataat 1680

gcgatcatcg gtgggttcgt gactgagtcc ggtgaaaaga ttaagggatt gaacgagtat 1740

atcaaccttt acaaccaaaa gacgaaacag aagctgccaa agttcaagcc tctttacaaa 1800

caggttcttt cagaccgcga gtcactctcg ttctatgggg agggctacac ttcggatgag 1860

gaagtcctgg aggtgttcag gaatactctc aataagaatt cggagatttt ctcttctata 1920

aaaaaactgg aaaagttgtt taagaatttt gacgaatact ctagcgccgg catatttgtg 1980

aaaaacggcc cggccatatc aacgataagt aaagatatct tcggcgaatg gaacgtgatc 2040

agagacaaat ggaacgcgga gtatgacgat attcacctga agaagaaggc tgtcgtaacg 2100

gagaagtacg aggatgatcg caggaaaagc ttcaaaaaga tcggaagttt cagcctggaa 2160

cagttgcagg agtatgctga cgccgatctt agcgtcgtcg agaagttgaa ggagataatc 2220

atccaaaagg tcgacgagat atataaagtc tatggatcaa gtgaaaaact gttcgacgcc 2280

gacttcgttt tggagaagtc cctgaagaag aacgacgctg ttgttgccat tatgaaggat 2340

ctgctcgaca gcgtgaagag tttcgagaac tatattaagg cttttttcgg ggaggggaag 2400

gagactaaca gagatgagtc cttctacgga gacttcgtcc tcgcgtacga tatactcctt 2460

aaggtagacc acatctacga cgcaatcaga aattacgtga cacaaaagcc gtacagcaag 2520

gacaagttca aactctactt ccagaacccc cagttcatgg gcggctggga caaggacaag 2580

gaaacggatt acagggctac gatcctgagg tatggttcaa aatactactt ggcgattatg 2640

gacaagaagt acgccaagtg tctccagaag attgacaaag acgatgtcaa tggcaattat 2700

gagaagatca actacaagct gcttccgggt ccgaacaaga tgctcccaaa ggttttcttc 2760

agcaagaaat ggatggccta ctataaccca agcgaggaca tccagaagat ttataagaac 2820

ggtacgttca agaagggcga catgttcaat cttaacgact gtcacaagct gatcgacttc 2880

ttcaaagact caattagccg gtacccaaag tggtctaacg cctatgactt caacttttcg 2940

gaaaccgaga agtacaagga tatagccgga ttttatagag aggtggaaga gcagggctac 3000

aaggtgtcat tcgagtccgc cagcaagaag gaagtggaca agctcgtgga agagggtaag 3060

ctctacatgt tccagattta taataaagac tttagcgata agagccacgg gacacctaat 3120

ctccacacaa tgtatttcaa gctgctcttc gacgagaata accacggcca aatcaggttg 3180

tcaggagggg ctgaactctt catgcggcgc gctagcctta agaaggagga gcttgtagtc 3240

caccctgcga atagtccaat tgcgaataag aacccggaca atcctaaaaa gactacaaca 3300

ttgagctacg acgtgtacaa ggataagagg ttttccgagg atcagtacga gctccacatc 3360

ccgattgcga tcaacaagtg cccaaagaat attttcaaga taaacacaga ggtgcgtgta 3420

ctcctgaagc atgacgacaa tccttacgtc attgggattg atcggggcga gaggaacctc 3480

ctctatattg tggtggtgga cgggaagggg aacatagtcg aacagtactc ccttaacgaa 3540

ataattaaca atttcaacgg catccgtatc aagaccgact accattcgtt gctggacaag 3600

aaggagaagg agagatttga ggcgcggcaa aattggacaa gtatcgagaa catcaaggaa 3660

ctcaaagcag gttatatctc tcaagttgtg cataagatat gcgagctggt tgagaagtat 3720

gacgcagtga tcgctcttga ggacctcaac tcgggcttta agaattctag agttaaagtg 3780

gagaagcagg tctatcaaaa gttcgagaag atgcttatag ataagctcaa ctacatggtc 3840

gataagaaat cgaacccatg tgccaccggc ggcgcactca aaggttacca aataacaaac 3900

aaattcgagt ccttcaaatc gatgagtact cagaatgggt tcatatttta tataccggcg 3960

tggcttacgt ctaagatcga cccgtcaact ggttttgtca acctgttgaa gacgaaatac 4020

acgtccattg ccgattcgaa aaagttcata tctagttttg atcgtattat gtacgtccca 4080

gaggaagatc ttttcgagtt tgctctcgac tacaaaaact tttcgcggac cgatgcggat 4140

tacattaaaa aatggaaact ctattcgtac ggcaacagaa tcaggatttt tcgcaaccct 4200

aagaagaata acgtctttga ttgggaggaa gtttgcttga ctagcgcgta caaggagctc 4260

tttaataagt atggcattaa ctaccaacag ggtgatatca gagcactgct ttgcgaacaa 4320

tctgacaagg ctttctactc atccttcatg gctttgatga gcctgatgct ccagatgaga 4380

aattcaatta caggcagaac cgacgtggat ttcttgatct ccccggttaa aaattctgat 4440

ggcatctttt acgatagcag gaactatgaa gcgcaagaga atgcgattct gccaaaaaat 4500

gcagacgcca acggtgccta taacatcgcc aggaaagtcc tgtgggcgat cggccagttc 4560

aaaaaggccg aagacgaaaa attggacaag gtcaaaatcg ctatcagcaa caaagagtgg 4620

ctggagtatg ctcagacatc cgtaaagcat aagcgtcctg ctgccaccaa aaaggccgga 4680

caggctaaga aaaagaagtg a 4701

<210> 12

<211> 7

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 12

Pro Lys Lys Lys Arg Lys Val

1 5

<210> 13

<211> 16

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 13

Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys

1 5 10 15

<210> 14

<211> 16

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 14

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser

1 5 10 15
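In each entry, the `<211>` field gives the sequence length and `<400>` lists the residues in three-letter code. The short Python sketch below is a reading aid only, not part of the listing. It converts SEQ ID NOs: 12 to 14 into one-letter code and checks each against its declared `<211>` length. The names `THREE_TO_ONE`, `to_one_letter` and `LISTING` are hypothetical, and the residues were transcribed by hand from the entries above.

```python
# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated run of three-letter residues to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


# (<211> length, <400> residues) transcribed from SEQ ID NOs: 12-14.
LISTING = {
    12: (7, "Pro Lys Lys Lys Arg Lys Val"),
    13: (16, "Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys"),
    14: (16, "Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser"),
}

for seq_id, (declared_len, residues) in LISTING.items():
    seq = to_one_letter(residues)
    # The number of residues must match the length declared in <211>.
    assert len(seq) == declared_len, f"SEQ ID NO: {seq_id} length mismatch"
    print(seq_id, seq)
```

In one-letter code the three entries read PKKKRKV, KRPAATKKAGQAKKKK and SGSETPGTSESATPES.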

<210> 15

<211> 1227

<212> PRT

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 15

Ser Lys Leu Glu Lys Phe Thr Asn Cys Tyr Ser Leu Ser Lys Thr Leu

1 5 10 15

Arg Phe Lys Ala Ile Pro Val Gly Lys Thr Gln Glu Asn Ile Asp Asn

20 25 30

Lys Arg Leu Leu Val Glu Asp Glu Lys Arg Ala Glu Asp Tyr Lys Gly

35 40 45

Val Lys Lys Leu Leu Asp Arg Tyr Tyr Leu Ser Phe Ile Asn Asp Val

50 55 60

Leu His Ser Ile Lys Leu Lys Asn Leu Asn Asn Tyr Ile Ser Leu Phe

65 70 75 80

Arg Lys Lys Thr Arg Thr Glu Lys Glu Asn Lys Glu Leu Glu Asn Leu

85 90 95

Glu Ile Asn Leu Arg Lys Glu Ile Ala Lys Ala Phe Lys Gly Asn Glu

100 105 110

Gly Tyr Lys Ser Leu Phe Lys Lys Asp Ile Ile Glu Thr Ile Leu Pro

115 120 125

Glu Phe Leu Asp Asp Lys Asp Glu Ile Ala Leu Val Asn Ser Phe Asn

130 135 140

Gly Phe Thr Thr Ala Phe Thr Gly Phe Phe Asp Asn Arg Glu Asn Met

145 150 155 160

Phe Ser Glu Glu Ala Lys Ser Thr Ser Ile Ala Phe Arg Cys Ile Asn

165 170 175

Glu Asn Leu Thr Arg Tyr Ile Ser Asn Met Asp Ile Phe Glu Lys Val

180 185 190

Asp Ala Ile Phe Asp Lys His Glu Val Gln Glu Ile Lys Glu Lys Ile

195 200 205

Leu Asn Ser Asp Tyr Asp Val Glu Asp Phe Phe Glu Gly Glu Phe Phe

210 215 220

Asn Phe Val Leu Thr Gln Glu Gly Ile Asp Val Tyr Asn Ala Ile Ile

225 230 235 240

Gly Gly Phe Val Thr Glu Ser Gly Glu Lys Ile Lys Gly Leu Asn Glu

245 250 255

Tyr Ile Asn Leu Tyr Asn Gln Lys Thr Lys Gln Lys Leu Pro Lys Phe

260 265 270

Lys Pro Leu Tyr Lys Gln Val Leu Ser Asp Arg Glu Ser Leu Ser Phe

275 280 285

Tyr Gly Glu Gly Tyr Thr Ser Asp Glu Glu Val Leu Glu Val Phe Arg

290 295 300

Asn Thr Leu Asn Lys Asn Ser Glu Ile Phe Ser Ser Ile Lys Lys Leu

305 310 315 320

Glu Lys Leu Phe Lys Asn Phe Asp Glu Tyr Ser Ser Ala Gly Ile Phe

325 330 335

Val Lys Asn Gly Pro Ala Ile Ser Thr Ile Ser Lys Asp Ile Phe Gly

340 345 350

Glu Trp Asn Val Ile Arg Asp Lys Trp Asn Ala Glu Tyr Asp Asp Ile

355 360 365

His Leu Lys Lys Lys Ala Val Val Thr Glu Lys Tyr Glu Asp Asp Arg

370 375 380

Arg Lys Ser Phe Lys Lys Ile Gly Ser Phe Ser Leu Glu Gln Leu Gln

385 390 395 400

Glu Tyr Ala Asp Ala Asp Leu Ser Val Val Glu Lys Leu Lys Glu Ile

405 410 415

Ile Ile Gln Lys Val Asp Glu Ile Tyr Lys Val Tyr Gly Ser Ser Glu

420 425 430

Lys Leu Phe Asp Ala Asp Phe Val Leu Glu Lys Ser Leu Lys Lys Asn

435 440 445

Asp Ala Val Val Ala Ile Met Lys Asp Leu Leu Asp Ser Val Lys Ser

450 455 460

Phe Glu Asn Tyr Ile Lys Ala Phe Phe Gly Glu Gly Lys Glu Thr Asn

465 470 475 480

Arg Asp Glu Ser Phe Tyr Gly Asp Phe Val Leu Ala Tyr Asp Ile Leu

485 490 495

Leu Lys Val Asp His Ile Tyr Asp Ala Ile Arg Asn Tyr Val Thr Gln

500 505 510

Lys Pro Tyr Ser Lys Asp Lys Phe Lys Leu Tyr Phe Gln Asn Pro Gln

515 520 525

Phe Met Gly Gly Trp Asp Lys Asp Lys Glu Thr Asp Tyr Arg Ala Thr

530 535 540

Ile Leu Arg Tyr Gly Ser Lys Tyr Tyr Leu Ala Ile Met Asp Lys Lys

545 550 555 560

Tyr Ala Lys Cys Leu Gln Lys Ile Asp Lys Asp Asp Val Asn Gly Asn

565 570 575

Tyr Glu Lys Ile Asn Tyr Lys Leu Leu Pro Gly Pro Asn Lys Met Leu

580 585 590

Pro Lys Val Phe Phe Ser Lys Lys Trp Met Ala Tyr Tyr Asn Pro Ser

595 600 605

Glu Asp Ile Gln Lys Ile Tyr Lys Asn Gly Thr Phe Lys Lys Gly Asp

610 615 620

Met Phe Asn Leu Asn Asp Cys His Lys Leu Ile Asp Phe Phe Lys Asp

625 630 635 640

Ser Ile Ser Arg Tyr Pro Lys Trp Ser Asn Ala Tyr Asp Phe Asn Phe

645 650 655

Ser Glu Thr Glu Lys Tyr Lys Asp Ile Ala Gly Phe Tyr Arg Glu Val

660 665 670

Glu Glu Gln Gly Tyr Lys Val Ser Phe Glu Ser Ala Ser Lys Lys Glu

675 680 685

Val Asp Lys Leu Val Glu Glu Gly Lys Leu Tyr Met Phe Gln Ile Tyr

690 695 700

Asn Lys Asp Phe Ser Asp Lys Ser His Gly Thr Pro Asn Leu His Thr

705 710 715 720

Met Tyr Phe Lys Leu Leu Phe Asp Glu Asn Asn His Gly Gln Ile Arg

725 730 735

Leu Ser Gly Gly Ala Glu Leu Phe Met Arg Arg Ala Ser Leu Lys Lys

740 745 750

Glu Glu Leu Val Val His Pro Ala Asn Ser Pro Ile Ala Asn Lys Asn

755 760 765

Pro Asp Asn Pro Lys Lys Thr Thr Thr Leu Ser Tyr Asp Val Tyr Lys

770 775 780

Asp Lys Arg Phe Ser Glu Asp Gln Tyr Glu Leu His Ile Pro Ile Ala

785 790 795 800

Ile Asn Lys Cys Pro Lys Asn Ile Phe Lys Ile Asn Thr Glu Val Arg

805 810 815

Val Leu Leu Lys His Asp Asp Asn Pro Tyr Val Ile Gly Ile Asp Arg

820 825 830

Gly Glu Arg Asn Leu Leu Tyr Ile Val Val Val Asp Gly Lys Gly Asn

835 840 845

Ile Val Glu Gln Tyr Ser Leu Asn Glu Ile Ile Asn Asn Phe Asn Gly

850 855 860

Ile Arg Ile Lys Thr Asp Tyr His Ser Leu Leu Asp Lys Lys Glu Lys

865 870 875 880

Glu Arg Phe Glu Ala Arg Gln Asn Trp Thr Ser Ile Glu Asn Ile Lys

885 890 895

Glu Leu Lys Ala Gly Tyr Ile Ser Gln Val Val His Lys Ile Cys Glu

900 905 910

Leu Val Glu Lys Tyr Asp Ala Val Ile Ala Leu Glu Asp Leu Asn Ser

915 920 925

Gly Phe Lys Asn Ser Arg Val Lys Val Glu Lys Gln Val Tyr Gln Lys

930 935 940

Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr Met Val Asp Lys Lys

945 950 955 960

Ser Asn Pro Cys Ala Thr Gly Gly Ala Leu Lys Gly Tyr Gln Ile Thr

965 970 975

Asn Lys Phe Glu Ser Phe Lys Ser Met Ser Thr Gln Asn Gly Phe Ile

980 985 990

Phe Tyr Ile Pro Ala Trp Leu Thr Ser Lys Ile Asp Pro Ser Thr Gly

995 1000 1005

Phe Val Asn Leu Leu Lys Thr Lys Tyr Thr Ser Ile Ala Asp Ser

1010 1015 1020

Lys Lys Phe Ile Ser Ser Phe Asp Arg Ile Met Tyr Val Pro Glu

1025 1030 1035

Glu Asp Leu Phe Glu Phe Ala Leu Asp Tyr Lys Asn Phe Ser Arg

1040 1045 1050

Thr Asp Ala Asp Tyr Ile Lys Lys Trp Lys Leu Tyr Ser Tyr Gly

1055 1060 1065

Asn Arg Ile Arg Ile Phe Arg Asn Pro Lys Lys Asn Asn Val Phe

1070 1075 1080

Asp Trp Glu Glu Val Cys Leu Thr Ser Ala Tyr Lys Glu Leu Phe

1085 1090 1095

Asn Lys Tyr Gly Ile Asn Tyr Gln Gln Gly Asp Ile Arg Ala Leu

1100 1105 1110

Leu Cys Glu Gln Ser Asp Lys Ala Phe Tyr Ser Ser Phe Met Ala

1115 1120 1125

Leu Met Ser Leu Met Leu Gln Met Arg Asn Ser Ile Thr Gly Arg

1130 1135 1140

Thr Asp Val Asp Phe Leu Ile Ser Pro Val Lys Asn Ser Asp Gly

1145 1150 1155

Ile Phe Tyr Asp Ser Arg Asn Tyr Glu Ala Gln Glu Asn Ala Ile

1160 1165 1170

Leu Pro Lys Asn Ala Asp Ala Asn Gly Ala Tyr Asn Ile Ala Arg

1175 1180 1185

Lys Val Leu Trp Ala Ile Gly Gln Phe Lys Lys Ala Glu Asp Glu

1190 1195 1200

Lys Leu Asp Lys Val Lys Ile Ala Ile Ser Asn Lys Glu Trp Leu

1205 1210 1215

Glu Tyr Ala Gln Thr Ser Val Lys His

1220 1225

<210> 16

<211> 3681

<212> DNA

<213> ARTIFICIAL SEQUENCE

<220>

<223> ARTIFICIAL SEQUENCE

<400> 16

tcaaagctcg agaaattcac caactgttat tcgttgagca aaacactgcg gtttaaagcg 60

attccagtcg gcaagactca agagaatata gacaataagc ggctgttggt ggaagatgaa 120

aagcgcgcgg aagactacaa aggggtgaag aagttgttgg acagatacta cctctctttt 180

atcaatgatg tcttgcactc aatcaaattg aagaatctga acaactacat ctccctcttc 240

agaaagaaaa caaggacaga aaaggagaat aaggaacttg aaaatttgga gatcaatctg 300

aggaaagaga tcgcgaaagc ctttaaaggc aacgaaggat acaaaagtct gttcaagaag 360

gatataattg agacaatttt gccagagttc ctcgatgaca aggacgagat tgcgctggtc 420

aattcgttca acggattcac aacagcattc acaggcttct ttgataatcg ggaaaatatg 480

ttctctgagg aggcaaagtc cacttctatt gcgttcaggt gtatcaatga gaatctcact 540

aggtacattt ccaacatgga tatctttgag aaggttgacg caatttttga caagcacgaa 600

gttcaggaga ttaaggagaa gatcctcaat tccgattatg acgttgagga cttcttcgaa 660

ggtgagtttt ttaatttcgt gctcactcaa gagggtatcg acgtgtataa tgcgatcatc 720

ggtgggttcg tgactgagtc cggtgaaaag attaagggat tgaacgagta tatcaacctt 780

tacaaccaaa agacgaaaca gaagctgcca aagttcaagc ctctttacaa acaggttctt 840

tcagaccgcg agtcactctc gttctatggg gagggctaca cttcggatga ggaagtcctg 900

gaggtgttca ggaatactct caataagaat tcggagattt tctcttctat aaaaaaactg 960

gaaaagttgt ttaagaattt tgacgaatac tctagcgccg gcatatttgt gaaaaacggc 1020

ccggccatat caacgataag taaagatatc ttcggcgaat ggaacgtgat cagagacaaa 1080

tggaacgcgg agtatgacga tattcacctg aagaagaagg ctgtcgtaac ggagaagtac 1140

gaggatgatc gcaggaaaag cttcaaaaag atcggaagtt tcagcctgga acagttgcag 1200

gagtatgctg acgccgatct tagcgtcgtc gagaagttga aggagataat catccaaaag 1260

gtcgacgaga tatataaagt ctatggatca agtgaaaaac tgttcgacgc cgacttcgtt 1320

ttggagaagt ccctgaagaa gaacgacgct gttgttgcca ttatgaagga tctgctcgac 1380

agcgtgaaga gtttcgagaa ctatattaag gcttttttcg gggaggggaa ggagactaac 1440

agagatgagt ccttctacgg agacttcgtc ctcgcgtacg atatactcct taaggtagac 1500

cacatctacg acgcaatcag aaattacgtg acacaaaagc cgtacagcaa ggacaagttc 1560

aaactctact tccagaaccc ccagttcatg ggcggctggg acaaggacaa ggaaacggat 1620

tacagggcta cgatcctgag gtatggttca aaatactact tggcgattat ggacaagaag 1680

tacgccaagt gtctccagaa gattgacaaa gacgatgtca atggcaatta tgagaagatc 1740

aactacaagc tgcttccggg tccgaacaag atgctcccaa aggttttctt cagcaagaaa 1800

tggatggcct actataaccc aagcgaggac atccagaaga tttataagaa cggtacgttc 1860

aagaagggcg acatgttcaa tcttaacgac tgtcacaagc tgatcgactt cttcaaagac 1920

tcaattagcc ggtacccaaa gtggtctaac gcctatgact tcaacttttc ggaaaccgag 1980

aagtacaagg atatagccgg attttataga gaggtggaag agcagggcta caaggtgtca 2040

ttcgagtccg ccagcaagaa ggaagtggac aagctcgtgg aagagggtaa gctctacatg 2100

ttccagattt ataataaaga ctttagcgat aagagccacg ggacacctaa tctccacaca 2160

atgtatttca agctgctctt cgacgagaat aaccacggcc aaatcaggtt gtcaggaggg 2220

gctgaactct tcatgcggcg cgctagcctt aagaaggagg agcttgtagt ccaccctgcg 2280

aatagtccaa ttgcgaataa gaacccggac aatcctaaaa agactacaac attgagctac 2340

gacgtgtaca aggataagag gttttccgag gatcagtacg agctccacat cccgattgcg 2400

atcaacaagt gcccaaagaa tattttcaag ataaacacag aggtgcgtgt actcctgaag 2460

catgacgaca atccttacgt cattgggatt gatcggggcg agaggaacct cctctatatt 2520

gtggtggtgg acgggaaggg gaacatagtc gaacagtact cccttaacga aataattaac 2580

aatttcaacg gcatccgtat caagaccgac taccattcgt tgctggacaa gaaggagaag 2640

gagagatttg aggcgcggca aaattggaca agtatcgaga acatcaagga actcaaagca 2700

ggttatatct ctcaagttgt gcataagata tgcgagctgg ttgagaagta tgacgcagtg 2760

atcgctcttg aggacctcaa ctcgggcttt aagaattcta gagttaaagt ggagaagcag 2820

gtctatcaaa agttcgagaa gatgcttata gataagctca actacatggt cgataagaaa 2880

tcgaacccat gtgccaccgg cggcgcactc aaaggttacc aaataacaaa caaattcgag 2940

tccttcaaat cgatgagtac tcagaatggg ttcatatttt atataccggc gtggcttacg 3000

tctaagatcg acccgtcaac tggttttgtc aacctgttga agacgaaata cacgtccatt 3060

gccgattcga aaaagttcat atctagtttt gatcgtatta tgtacgtccc agaggaagat 3120

cttttcgagt ttgctctcga ctacaaaaac ttttcgcgga ccgatgcgga ttacattaaa 3180

aaatggaaac tctattcgta cggcaacaga atcaggattt ttcgcaaccc taagaagaat 3240

aacgtctttg attgggagga agtttgcttg actagcgcgt acaaggagct ctttaataag 3300

tatggcatta actaccaaca gggtgatatc agagcactgc tttgcgaaca atctgacaag 3360

gctttctact catccttcat ggctttgatg agcctgatgc tccagatgag aaattcaatt 3420

acaggcagaa ccgacgtgga tttcttgatc tccccggtta aaaattctga tggcatcttt 3480

tacgatagca ggaactatga agcgcaagag aatgcgattc tgccaaaaaa tgcagacgcc 3540

aacggtgcct ataacatcgc caggaaagtc ctgtgggcga tcggccagtt caaaaaggcc 3600

gaagacgaaa aattggacaa ggtcaaaatc gctatcagca acaaagagtg gctggagtat 3660

gctcagacat ccgtaaagca t 3681
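The Python sketch below is an illustrative consistency check between the DNA and protein entries and is not part of the patent. It translates two hand-transcribed excerpts using the standard genetic code. The first excerpt is the first 60 nucleotides of SEQ ID NO: 16, which should give residues 1 to 20 of SEQ ID NO: 15. The second is nucleotides 4621 to 4701 of SEQ ID NO: 11, read in the frame that starts at nucleotide 1 (4701 = 3 × 1567, so the last codon occupies positions 4699 to 4701). The helper names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code; codons are enumerated in TCAG order (first, second, third base).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA; whitespace is ignored and '*' marks a stop codon."""
    dna = "".join(dna.split()).lower()
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Nucleotides 1-60 of SEQ ID NO: 16 (position counter removed).
SEQ16_HEAD = "tcaaagctcg agaaattcac caactgttat tcgttgagca aaacactgcg gtttaaagcg"
# Nucleotides 4621-4701 of SEQ ID NO: 11 (position counters removed).
SEQ11_TAIL = ("ctggagtatg ctcagacatc cgtaaagcat aagcgtcctg ctgccaccaa "
              "aaaggccgga caggctaaga aaaagaagtg a")

SEQ15_HEAD = "SKLEKFTNCYSLSKTLRFKA"  # residues 1-20 of SEQ ID NO: 15
SEQ15_TAIL = "LEYAQTSVKH"            # residues 1218-1227 of SEQ ID NO: 15
SEQ13 = "KRPAATKKAGQAKKKK"           # SEQ ID NO: 13

# <211> of SEQ ID NO: 16 is exactly three times <211> of SEQ ID NO: 15.
assert 3681 == 3 * 1227
assert translate(SEQ16_HEAD) == SEQ15_HEAD

tail = translate(SEQ11_TAIL)
assert tail == SEQ15_TAIL + SEQ13 + "*"
print(tail)  # LEYAQTSVKHKRPAATKKAGQAKKKK*
```

Under these assumptions, the 3' end of SEQ ID NO: 11 encodes the last ten residues of SEQ ID NO: 15. These are followed directly by the 16 residues of SEQ ID NO: 13 and a TGA stop codon. SEQ ID NO: 16 has no stop codon, which is consistent with its length of exactly 3 × 1227 nucleotides.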

Claims

1. An isolated fusion polypeptide, wherein the fusion polypeptide comprises a CRISPR nuclease and a 5'→3' exonuclease, and wherein the fusion polypeptide consists of the amino acid sequence of SEQ ID NO: 1 or 10.

2. An isolated polynucleotide encoding the polypeptide of claim 1 and consisting of the nucleotide sequence of SEQ ID NO: 2 or 11.

3. A genome editing system comprising at least one of the following i) to v):

i) the fusion polypeptide of claim 1, and a guide RNA;

ii) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of claim 1, and a guide RNA;

iii) the fusion polypeptide of claim 1, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

iv) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of claim 1, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

v) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of claim 1 and a nucleotide sequence encoding a guide RNA.

4. The genome editing system of claim 3, wherein the guide RNA is a single guide RNA (sgRNA).

5. A method of genetically modifying a cell, comprising introducing the genome editing system of claim 3 or 4 into the cell, wherein the cell is a plant cell.