
CN110964741B - Nuclear localization signal FNB and application thereof in improving base editing efficiency - Google Patents


Publication number
CN110964741B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911323189.7A
Other languages
Chinese (zh)
Other versions
CN110964741A (en)
Inventor
王飞鹏
杨进孝
宋伟
李璐
袁爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201911323189.7A
Publication of CN110964741A
Application granted
Publication of CN110964741B
Legal status: Active
Anticipated expiration


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216: Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8218: Antisense, co-suppression, viral induced gene silencing [VIGS], post-transcriptional induced gene silencing [PTGS]
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K7/00: Peptides having 5 to 20 amino acids in a fully defined sequence; Derivatives thereof
    • C07K7/04: Linear peptides containing only normal peptide links
    • C07K7/06: Linear peptides containing only normal peptide links having 5 to 11 amino acids
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K7/00: Peptides having 5 to 20 amino acids in a fully defined sequence; Derivatives thereof
    • C07K7/04: Linear peptides containing only normal peptide links
    • C07K7/08: Linear peptides containing only normal peptide links having 12 to 20 amino acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases RNAses, DNAses
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/78: Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y305/00: Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04: Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001: Cytosine deaminase (3.5.4.1)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y305/00: Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04: Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04002: Adenine deaminase (3.5.4.2)
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/01: Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09: Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/40: Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
    • C07K2319/43: Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a FLAG-tag

Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medicinal Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Virology (AREA)
  • Cell Biology (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention discloses a nuclear localization signal FNB and its application in improving base editing efficiency. The nuclear localization signal FNB consists of a nuclear localization signal A and a nuclear localization signal B. The nuclear localization signal A comprises the 3 Flag1 tag protein and the NLS1 protein; the nuclear localization signal B comprises the bpNLS protein. The amino acid sequence of the 3 Flag1 tag protein is positions 1 to 22 of sequence 8; the amino acid sequence of the NLS1 protein is positions 25 to 31 of sequence 8; the amino acid sequence of the bpNLS protein is sequence 7. Experiments show that the nuclear localization signal FNB can improve base editing efficiency and has good application prospects in the field of genome editing.

Description

Nuclear localization signal FNB and application thereof in improving base editing efficiency
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a nuclear localization signal FNB and application thereof in improving base editing efficiency.
Background
CRISPR-Cas9 technology has become a powerful means of genome editing and is widely applied in many tissues and cells. The CRISPR/Cas9 protein-RNA complex is directed to its target by a guide RNA and cleaves it to generate a DNA double-strand break (DSB), after which the organism initiates a DNA repair mechanism to repair the DSB. There are generally two repair mechanisms: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ usually dominates, so repair produces random indels (insertions or deletions) far more often than precise repair. Because HDR is inefficient and requires a DNA template, its use for precise base substitution is greatly limited.
In 2016, the laboratories of David Liu and Akihiko Kondo independently reported two different types of cytosine base editors (CBEs), which use two different cytidine deaminases: rAPOBEC1 (rat APOBEC1) and PmCDA1 (an ortholog of activation-induced cytidine deaminase, AID). Both rest on the principle that a cytidine deaminase edits a single cytosine (C) directly, rather than by generating a DSB and initiating HDR, which greatly improves the efficiency of replacing C with thymine (T). Specifically, dead Cas9 (dCas9) or Cas9 nickase (Cas9n), together with rAPOBEC1 or PmCDA1, is directed to the target by the sgRNA; rAPOBEC1 or PmCDA1 catalyzes the deamination of C on the unpaired single-stranded DNA to uracil (U); U is then paired with adenine (A) through DNA repair, and A is finally paired with T through DNA replication, thereby realizing the C-to-T conversion. Among the editors tested, the SpCas9n (D10A) & rAPOBEC1/PmCDA1 & UGI base editing systems (which contain the uracil DNA glycosylase inhibitor, UGI) had the higher mean mutation rate, for two reasons: first, UGI inhibits uracil DNA glycosylase (UDG) from catalyzing the removal of U from DNA; second, SpCas9n (D10A) nicks the non-edited strand, which induces the eukaryotic mismatch repair mechanism or the long-patch base excision repair (BER) mechanism and so favors repair of the U:G mismatch into U:A.
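The net C-to-T outcome described above can be illustrated with a minimal Python sketch. It models only the end result on the edited strand, not the deamination, repair and replication steps. The function name, the example protospacer and the set of edited positions are all hypothetical; the text does not specify which positions of a target are edited.

```python
from typing import Iterable


def convert_c_to_t(protospacer: str, edited_positions: Iterable[int]) -> str:
    """Return the sequence with C replaced by T at the given positions.

    Positions are 1-based and counted from the 5' end. Bases other
    than C are left unchanged even if their position is listed.
    """
    positions = set(edited_positions)
    result = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and pos in positions:
            result.append("T")  # C -> U -> T after repair and replication
        else:
            result.append(base)
    return "".join(result)


# Hypothetical 20-nt target and hypothetical edited positions 4 to 8.
example_target = "GACCTCAGGCATCCGTAAGC"
edited = convert_c_to_t(example_target, range(4, 9))
print(edited)
```

Only the cytosines at positions 4 and 6 fall in the chosen range here, so the other cytosines in the example are left untouched.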
To improve working efficiency and reduce cost, improving base substitution efficiency has been a research direction for base editing systems of animal and plant genomes. With existing base editing systems, base editing efficiency may still be low, some targets cannot be edited, or a particular target base within a target is edited inefficiently or not at all.
Disclosure of Invention
The object of the present invention is to improve the efficiency of base editing in a base editing system.
To achieve the above object, the present invention first provides a kit comprising: an sgRNA or a biological material related to the sgRNA; a Cas9 nuclease or a biological material related to the Cas9 nuclease; a deaminase or a biological material related to the deaminase; a nuclear localization signal A or a biological material related to the nuclear localization signal A; and a nuclear localization signal B or a biological material related to the nuclear localization signal B;
the nuclear localization signal A comprises the 3 Flag1 tag protein and the NLS1 protein;
the nuclear localization signal B comprises the bpNLS protein;
the amino acid sequence of the 3 Flag1 tag protein is positions 1 to 22 of sequence 8;
the amino acid sequence of the NLS1 protein is positions 25 to 31 of sequence 8;
the amino acid sequence of the bpNLS protein is sequence 7.
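The position ranges above are 1-based and inclusive, which is easy to get wrong when slicing in code. The following Python sketch shows the conversion. The helper name and the 31-residue placeholder string are hypothetical: the placeholder is not sequence 8, it only has enough residues for the ranges 1 to 22 and 25 to 31 to be valid.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Placeholder standing in for sequence 8 (NOT the real sequence).
seq8_placeholder = "".join(chr(ord("A") + i % 26) for i in range(31))

flag_part = subsequence(seq8_placeholder, 1, 22)   # 3 Flag1 tag range
nls1_part = subsequence(seq8_placeholder, 25, 31)  # NLS1 range
print(len(flag_part), len(nls1_part))
```

With these ranges the tag portion spans 22 residues and the NLS1 portion 7, leaving positions 23 and 24 between them.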
In the kit, the nuclear localization signal A may contain 1, 2 or more 3 Flag1 tag proteins, and likewise 1, 2 or more NLS1 proteins. In a specific embodiment of the invention, the nuclear localization signal A comprises one 3 Flag1 tag protein and one NLS1 protein.
The nuclear localization signal B may contain 1, 2 or more bpNLS proteins. In a specific embodiment of the invention, the nuclear localization signal B comprises one bpNLS protein.
Further, the nuclear localization signal A is A1) or A2):
A1) a protein whose amino acid sequence is shown in sequence 8;
A2) a protein that is obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 8 in the sequence listing and has the same function;
the biological material related to the nuclear localization signal A is any one of B1) to B5):
B1) a nucleic acid molecule encoding the nuclear localization signal A;
B2) an expression cassette comprising the nucleic acid molecule of B1);
B3) a recombinant vector containing the nucleic acid molecule of B1) or a recombinant vector containing the expression cassette of B2);
B4) a recombinant microorganism containing B1) the nucleic acid molecule, or a recombinant microorganism containing B2) the expression cassette, or a recombinant microorganism containing B3) the recombinant vector;
B5) a transgenic cell line comprising B1) the nucleic acid molecule or a transgenic cell line comprising B2) the expression cassette;
the nuclear localization signal B is C1) or C2):
C1) a protein whose amino acid sequence is shown in sequence 7;
C2) a protein that is obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 7 in the sequence listing and has the same function;
the biological material related to the nuclear localization signal B is any one of D1) to D5):
D1) a nucleic acid molecule encoding the nuclear localization signal B;
D2) an expression cassette comprising the nucleic acid molecule of D1);
D3) a recombinant vector containing the nucleic acid molecule of D1) or a recombinant vector containing the expression cassette of D2);
D4) a recombinant microorganism containing D1) the nucleic acid molecule, or a recombinant microorganism containing D2) the expression cassette, or a recombinant microorganism containing D3) the recombinant vector;
D5) a transgenic cell line comprising D1) the nucleic acid molecule or a transgenic cell line comprising D2) the expression cassette.
Still further,
B1) the nucleic acid molecule is b1) or b2) or b3):
b1) a cDNA molecule or DNA molecule shown in positions 1 to 93 of sequence 5 in the sequence listing;
b2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in b1) and encoding the nuclear localization signal A;
b3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in b1) or b2) and encodes the nuclear localization signal A;
D1) the nucleic acid molecule is d1) or d2) or d3):
d1) a cDNA molecule or DNA molecule shown in positions 8647-8697 of sequence 1 in the sequence listing;
d2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in d1) and encoding the nuclear localization signal B;
d3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in d1) or d2) and encodes the nuclear localization signal B.
In the above kit, the Cas9 nuclease includes Cas9 nucleases from different sources or their variants, catalytically inactive Cas9 (dead Cas9, dCas9) or its variants, and Cas9 nickase (Cas9n) or its variants. The Cas9 nucleases from different sources or their variants include Cas9 derived from bacteria (such as SaCas9 and SaCas9-KKH), Cas9 variants recognizing different PAMs (such as xCas9, Cas9-NG, Cas9-VQR and Cas9-VRER), and high-fidelity Cas9 variants (such as HypaCas9, eSpCas9(1.1) and Cas9-HF1).
The deaminase may be a cytosine deaminase or an adenine deaminase.
The cytosine deaminase may be human APOBEC3A, human AID, PmCDA1, rAPOBEC1 or another such protein.
The adenine deaminase may come from different sources, such as adenine deaminase derived from Escherichia coli (e.g., proteins such as ecTadA and ecTadA) and adenine deaminase derived from plants (e.g., proteins such as OsTadA derived from rice or AtTadA derived from Arabidopsis).
In one embodiment of the present invention,
the Cas9 nuclease is a Cas9n protein;
the deaminase is cytosine deaminase; the cytosine deaminase is rAPOBEC1 protein;
the Cas9n protein is J1) or J2):
J1) a protein whose amino acid sequence is shown in sequence 3;
J2) a protein that is obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 3 in the sequence listing and has the same function;
the biological material related to the Cas9n protein is any one of K1) to K5):
K1) a nucleic acid molecule encoding the Cas9n protein;
K2) an expression cassette comprising the nucleic acid molecule of K1);
K3) a recombinant vector containing K1) the nucleic acid molecule or a recombinant vector containing K2) the expression cassette;
K4) a recombinant microorganism containing K1) the nucleic acid molecule, or a recombinant microorganism containing K2) the expression cassette, or a recombinant microorganism containing K3) the recombinant vector;
K5) a transgenic cell line containing K1) the nucleic acid molecule or a transgenic cell line containing K2) the expression cassette;
the rAPOBEC1 protein is L1) or L2):
L1) a protein whose amino acid sequence is shown in sequence 2;
L2) a protein that is obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 2 in the sequence listing and has the same function;
the biological material related to the rAPOBEC1 protein is any one of M1) to M5):
M1) a nucleic acid molecule encoding the rAPOBEC1 protein;
M2) an expression cassette containing the nucleic acid molecule of M1);
M3) a recombinant vector containing the nucleic acid molecule of M1) or a recombinant vector containing the expression cassette of M2);
M4) a recombinant microorganism containing the nucleic acid molecule of M1), or a recombinant microorganism containing the expression cassette of M2), or a recombinant microorganism containing the recombinant vector of M3);
M5) a transgenic cell line containing the nucleic acid molecule of M1) or a transgenic cell line containing the expression cassette of M2);
the sgRNA targets a target sequence;
the sgRNA is tRNA-sgRNA;
the tRNA-sgRNA has the following structure, in order: the tRNA, the RNA transcribed from the target sequence, and the esgRNA backbone;
the tRNA is N1) or N2) or N3):
N1) an RNA molecule obtained by replacing T with U in positions 474-550 of sequence 1;
N2) an RNA molecule that is obtained by substituting and/or deleting and/or adding one or more nucleotides in the RNA molecule of N1) and has the same function;
N3) and N1) or N2) and has the same function;
the esgRNA backbone is P1) or P2) or P3):
P1) an RNA molecule obtained by replacing T with U in positions 571-656 of sequence 1;
P2) an RNA molecule that is obtained by substituting and/or deleting and/or adding one or more nucleotides in the RNA molecule of P1) and has the same function;
P3) and P1) or P2) and has the same function.
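As a rough illustration of how a tRNA-sgRNA could be assembled from these definitions, the Python sketch below slices positions 474-550 and 571-656 out of a DNA string, replaces T with U, and joins the parts in the stated order. The function names, the toy stand-in for sequence 1 and the example target are hypothetical; the real sequence 1 is given only in the sequence listing.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq, 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def to_rna(dna: str) -> str:
    """Replace T with U, as described for N1) and P1)."""
    return dna.upper().replace("T", "U")


def build_trna_sgrna(seq1: str, target_dna: str) -> str:
    """Join tRNA, target-derived RNA and esgRNA backbone in order."""
    trna = to_rna(region(seq1, 474, 550))
    backbone = to_rna(region(seq1, 571, 656))
    return trna + to_rna(target_dna) + backbone


# Toy 800-nt stand-in for sequence 1 and a hypothetical 20-nt target.
seq1_toy = "ACGT" * 200
rna = build_trna_sgrna(seq1_toy, "GACCTCAGGCATCCGTAAGC")
print(len(rna))
```

The tRNA range covers 77 nucleotides and the backbone range 86, so a 20-nt target gives 183 nucleotides in total.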
In the above kit, when the deaminase is a cytosine deaminase, the kit may further comprise UGI protein or biological material related to the UGI protein.
The UGI protein is Q1) or Q2):
Q1) a protein whose amino acid sequence is shown in sequence 4;
Q2) a protein that is obtained by substituting and/or deleting and/or adding one or more amino acid residues in the amino acid sequence shown in sequence 4 in the sequence listing and has the same function;
the biological material related to the UGI protein is any one of R1) to R5):
R1) a nucleic acid molecule encoding the UGI protein;
R2) an expression cassette containing the nucleic acid molecule of R1);
R3) a recombinant vector containing the nucleic acid molecule of R1) or a recombinant vector containing the expression cassette of R2);
R4) a recombinant microorganism containing the nucleic acid molecule of R1), or a recombinant microorganism containing the expression cassette of R2), or a recombinant microorganism containing the recombinant vector of R3);
R5) a transgenic cell line containing the nucleic acid molecule of R1) or a transgenic cell line containing the expression cassette of R2).
The proteins of A2), C2), J2), L2) and Q2) are proteins having 75% or more identity to the amino acid sequence of the protein represented by SEQ ID NO. 8, SEQ ID NO. 7, SEQ ID NO. 3, SEQ ID NO. 2 or SEQ ID NO. 4, respectively, and having the same function. Identity of 75% or more includes 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity.
The proteins of A2), C2), J2), L2) and Q2) can be synthesized artificially, or can be obtained by synthesizing the coding gene and then expressing it biologically.
The coding genes of the proteins of A2), C2), J2), L2) and Q2) can be obtained by deleting the codons of one or more amino acid residues and/or introducing missense mutations of one or more base pairs in the DNA sequences shown in positions 1-93 of sequence 5 (encoding the protein shown in sequence 8), positions 8647-8697 of sequence 1 (encoding the protein shown in sequence 7), positions 4012-8112 of sequence 1 (encoding the protein shown in sequence 3), positions 3280-3963 of sequence 1 (encoding the protein shown in sequence 2) and positions 8386-8634 of sequence 1 (encoding the protein shown in sequence 4), respectively.
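A quick consistency check on the coordinate ranges listed in b1), d1), k1), m1) and r1) is to compute each region's length and confirm it is a whole number of codons. The Python sketch below does only that arithmetic. The dictionary and function names are hypothetical, and the source does not say whether a range includes a stop codon, so the codon count should be read as an upper bound on the number of residues.

```python
# (source sequence, start, end) for each coding region, 1-based inclusive.
CODING_REGIONS = {
    "nuclear localization signal A": ("sequence 5", 1, 93),
    "nuclear localization signal B": ("sequence 1", 8647, 8697),
    "Cas9n": ("sequence 1", 4012, 8112),
    "rAPOBEC1": ("sequence 1", 3280, 3963),
    "UGI": ("sequence 1", 8386, 8634),
}


def region_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive 1-based range."""
    return end - start + 1


# Map each name to (whole codons, leftover nucleotides).
codon_summary = {
    name: divmod(region_length(start, end), 3)
    for name, (_source, start, end) in CODING_REGIONS.items()
}

for name, (codons, leftover) in codon_summary.items():
    print(f"{name}: {codons} codons, {leftover} nt left over")
```

All five ranges divide evenly by three, which is what one would expect of in-frame coding segments.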
Further, K1) the nucleic acid molecule is k1) or k2) or k3):
k1) a cDNA molecule or DNA molecule shown in positions 4012-8112 of sequence 1 in the sequence listing;
k2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in k1) and encoding the Cas9n;
k3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in k1) or k2) and encodes the Cas9n;
M1) the nucleic acid molecule is m1) or m2) or m3):
m1) a cDNA molecule or DNA molecule shown in positions 3280-3963 of sequence 1 in the sequence listing;
m2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in m1) and encoding the rAPOBEC1;
m3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in m1) or m2) and encodes the rAPOBEC1;
R1) the nucleic acid molecule is r1) or r2) or r3):
r1) a cDNA molecule or DNA molecule shown in positions 8386-8634 of sequence 1 in the sequence listing;
r2) a cDNA molecule or DNA molecule having 75% or more identity with the nucleotide sequence defined in r1) and encoding the UGI;
r3) a cDNA molecule or DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in r1) or r2) and encodes the UGI.
Wherein the nucleic acid molecule may be DNA, such as cDNA, genomic DNA or recombinant DNA; the nucleic acid molecule may also be RNA, such as mRNA or hnRNA, etc.
A person of ordinary skill in the art can readily mutate the nucleotide sequences of the present invention encoding the nuclear localization signal A, the nuclear localization signal B, the Cas9n, the rAPOBEC1 or the UGI using known methods, such as directed evolution and point mutation. Artificially modified nucleotides having 75% or more identity to the nucleotide sequence of the nuclear localization signal A, the nuclear localization signal B, the Cas9n, the rAPOBEC1 or the UGI of the present invention are considered to be derived from, and equivalent to, the sequences of the present invention, as long as they encode the nuclear localization signal A, the nuclear localization signal B, the Cas9n, the rAPOBEC1 or the UGI and have the same function.
The term "identity" as used herein refers to sequence similarity to a native nucleic acid sequence. "Identity" includes nucleotide sequences that are 75% or more, 85% or more, 90% or more, or 95% or more identical to the nucleotide sequence encoding the protein consisting of the amino acid sequence shown in sequence 8, 7, 3, 2 or 4 of the present invention. Identity can be assessed visually or with computer software. With computer software, the identity between two or more sequences is expressed as a percentage (%), which can be used to assess the identity between related sequences.
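As a simplified illustration of a percent-identity calculation, the Python sketch below counts matching positions between two sequences of equal length. This is an assumption-laden toy: the text does not name the software or the alignment method, and real tools first align the sequences and handle gaps, which this function does not do. The function name and example strings are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match.

    Assumes the sequences are already aligned and contain no gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two hypothetical 8-nt sequences differing at two positions.
identity = percent_identity("ACGTACGT", "ACGTTCGA")
print(f"{identity:.1f}%")
print(identity >= 75.0)  # meets the 75% threshold used in the text
```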
The stringent conditions are: hybridization, then washing the membrane twice for 5 min each at 68 ℃ in a solution of 2× SSC, 0.1% SDS, and twice for 15 min each at 68 ℃ in a solution of 0.5× SSC, 0.1% SDS; alternatively, hybridization and membrane washing at 65 ℃ in a solution of 0.1× SSPE (or 0.1× SSC), 0.1% SDS.
The above-mentioned identity of 75% or more may be 80%, 85%, 90% or 95% or more.
B2) The expression cassette containing a nucleic acid molecule encoding a nuclear localization signal A (nuclear localization signal A expression cassette) refers to a DNA capable of expressing the nuclear localization signal A in a host cell, and the DNA may include not only a promoter for initiating transcription of the nuclear localization signal A but also a terminator for terminating transcription of the nuclear localization signal A. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the nuclear localization signal A expression cassette can be constructed by using the existing expression vector.
D2) The expression cassette containing a nucleic acid molecule encoding a nuclear localization signal B (a nuclear localization signal B expression cassette) refers to a DNA capable of expressing the nuclear localization signal B in a host cell, and the DNA may include not only a promoter for initiating transcription of the nuclear localization signal B but also a terminator for terminating transcription of the nuclear localization signal B. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the nuclear localization signal B expression cassette can be constructed by using the existing expression vector.
K2) The expression cassette containing a nucleic acid molecule encoding a Cas9n protein (Cas9n gene expression cassette) refers to a DNA capable of expressing a Cas9n protein in a host cell, and the DNA can comprise a promoter for starting the transcription of a Cas9n gene and a terminator for stopping the transcription of the Cas9n gene. Further, the expression cassette may also include an enhancer sequence. Existing expression vectors can be used to construct recombinant vectors containing the Cas9n gene expression cassette.
M2) the expression cassette containing a nucleic acid molecule encoding rAPOBEC1 protein (rAPOBEC1 gene expression cassette) refers to DNA capable of expressing rAPOBEC1 protein in a host cell, which DNA may include not only a promoter that initiates transcription of the rAPOBEC1 gene, but also a terminator that terminates transcription of the rAPOBEC1 gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the rAPOBEC1 gene expression cassette can be constructed by using the existing expression vector.
R2) the expression cassette containing a nucleic acid molecule encoding UGI protein (UGI gene expression cassette) refers to a DNA capable of expressing UGI protein in a host cell, which may include not only a promoter for initiating transcription of UGI gene but also a terminator for terminating transcription of UGI gene. Further, the expression cassette may also include an enhancer sequence. The recombinant vector containing the UGI gene expression cassette can be constructed using an existing expression vector.
The vector may be a plasmid, cosmid, phage or viral vector. In a particular embodiment of the invention, the recombinant vector is the FNB-sCBE-1 recombinant expression vector.
The FNB-sCBE-1 recombinant expression vector is obtained by replacing the bpNLS nucleotide sequence at positions 3229-3279 of sequence 1, the sequence of the sCBE-1 recombinant expression vector, with sequence 5 while keeping all other sequences unchanged.
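The replacement described above is a simple splice, sketched below in Python: the region at positions 3229-3279 is cut out and a new sequence is put in its place. The function name and the toy strings are hypothetical, and the 120-nt insert length is an arbitrary stand-in because the full length of sequence 5 is not stated in this section.

```python
def replace_region(vector: str, start: int, end: int, insert: str) -> str:
    """Replace positions start..end (1-based, inclusive) with insert."""
    if not 1 <= start <= end <= len(vector):
        raise ValueError("range lies outside the vector")
    return vector[:start - 1] + insert + vector[end:]


# Toy vector: 3228 nt upstream, a 51-nt region to replace, 100 nt downstream.
upstream = "A" * 3228
old_region = "C" * 51
downstream = "G" * 100
toy_vector = upstream + old_region + downstream

toy_insert = "T" * 120  # arbitrary length, standing in for sequence 5
new_vector = replace_region(toy_vector, 3229, 3279, toy_insert)

# Coordinates downstream of the splice shift by this amount.
shift = len(toy_insert) - len(old_region)
print(len(new_vector), shift)
```

One practical consequence is that any coordinate downstream of position 3279 in the original vector moves by the difference between the insert length and 51.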
The microorganism may be a yeast, bacterium, algae or fungus. Wherein the bacterium can be an Agrobacterium, such as Agrobacterium EHA 105. In a particular embodiment of the invention, said recombinant microorganism is in particular Agrobacterium EHA105 containing said recombinant expression vector FNB-sCBE-1.
The transgenic cell line does not include propagation material.
The above kit is used for any of S1) to S4):
S1) editing a genomic target sequence of an organism or biological cell;
S2) preparing a product for editing a genomic target sequence of an organism or biological cell;
S3) improving the efficiency of editing a genomic target sequence of an organism or biological cell;
S4) preparing a product for improving the efficiency of editing a genomic target sequence of an organism or biological cell.
The nuclear localization signal A or the biological material related to the nuclear localization signal A and/or the nuclear localization signal B or the biological material related to the nuclear localization signal B also belong to the protection scope of the invention.
In order to achieve the above object, the present invention also provides a new use of the above kit and/or the above nuclear localization signal a or a biological material related to the nuclear localization signal a and/or the above nuclear localization signal b or a biological material related to the nuclear localization signal b.
The invention provides the use of the kit as described above and/or of the nuclear localization signal A as described above or of a biological material associated with the nuclear localization signal A as described above and/or of the nuclear localization signal B as described above or of a biological material associated with the nuclear localization signal B as described above in any of S1) -S4):
S1) editing a genomic target sequence of an organism or biological cell;
S2) preparing a product for editing a genomic target sequence of an organism or biological cell;
S3) improving the efficiency of editing a genomic target sequence of an organism or biological cell;
S4) preparing a product for improving the efficiency of editing a genomic target sequence of an organism or biological cell.
To achieve the above object, the present invention also provides the method of T1) or T2):
T1) a method for editing a genomic target sequence of an organism or biological cell, or for increasing the efficiency of such editing, comprising the following step: allowing the organism or biological cell to express the nuclear localization signal A, the nuclear localization signal B, the sgRNA, the Cas9 nuclease and the deaminase; the sgRNA targets the target sequence;
T2) a method for preparing a biological mutant, comprising the following step: editing the genome of the organism according to the method of T1) to obtain the biological mutant.
In the above method, when the deaminase is a cytosine deaminase, the sgRNA in T1) is a tRNA-esgRNA. The tRNA-esgRNA obtained by transcribing its DNA molecule is an immature RNA precursor, and the tRNA in the precursor is cleaved by two enzymes (RNase P and RNase Z) to give mature RNA. As many independent mature RNAs are obtained as there are targets in the recombinant expression vector; each mature RNA consists, in order, of the RNA transcribed from the target sequence and the esgRNA backbone, or of the RNA transcribed from the target sequence, the esgRNA backbone and a few residual bases of the tRNA.
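The relationship between the number of targets and the number of mature RNAs can be shown with an idealized Python sketch. It treats tRNA excision as removing every exact copy of the tRNA sequence from the precursor, so it does not model the residual tRNA bases mentioned above or the actual RNase P and RNase Z cleavage chemistry. All sequences and names here are short hypothetical placeholders.

```python
from typing import List


def mature_rnas(precursor: str, trna: str) -> List[str]:
    """Split a precursor at each tRNA copy and drop empty fragments."""
    return [fragment for fragment in precursor.split(trna) if fragment]


# Hypothetical short placeholders, not real tRNA or backbone sequences.
trna = "GGGUUU"
backbone = "CCAA"
targets = ["AUAU", "GCGC"]

# Precursor: one tRNA-target-backbone unit per target, joined in tandem.
precursor = "".join(trna + target + backbone for target in targets)

products = mature_rnas(precursor, trna)
print(products)
```

In this toy case two targets yield two mature RNAs, each made up of a target-derived segment followed by the backbone.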
In the above method, when the deaminase is a cytosine deaminase, the T1) may further comprise a step of introducing the UGI protein into an organism or an organism cell, and the number of UGIs may be 1 or 2 or more. In a specific embodiment of the present invention, the number of the UGIs is specifically 2.
Further, the nuclear localization signal A, the nuclear localization signal B, the sgRNA, the Cas9 nuclease and the deaminase are expressed in the organism or the cell by introducing into it a gene encoding the nuclear localization signal A, a gene encoding the nuclear localization signal B, a DNA molecule from which the sgRNA is transcribed, a gene encoding the Cas9 nuclease, a gene encoding the cytosine deaminase and a gene encoding UGI.
Furthermore, the gene encoding the nuclear localization signal A, the gene encoding the nuclear localization signal B, the DNA molecule from which the sgRNA is transcribed, the gene encoding the Cas9 nuclease, the gene encoding the cytosine deaminase and the gene encoding UGI are introduced into the organism or the cell through a recombinant expression vector. They can all be introduced through the same recombinant expression vector, or through two or more recombinant expression vectors.
In a specific embodiment of the invention, the recombinant expression vector comprises an expression cassette consisting of a promoter, a coding gene of a nuclear localization signal A, a coding gene of cytosine deaminase rAPOBEC1, a coding gene of Cas9n nuclease, a coding gene of UGI, a coding gene of a nuclear localization signal B and a terminator in sequence. The recombinant expression vector can be the FNB-sCBE-1 recombinant expression vector.
In the kit or use or method, the number of target sequences may be 1 or 2 or more.
In the above kit, use or method, the editing of the target sequence may be a mutation of base A to base G or a mutation of base C to base T. The base A can be a base A at any position in the target sequence, and the base C can be a base C at any position in the target sequence.
In the above kit, use or method,
the organism is X1) or X2) or X3) or X4):
X1) a plant or an animal;
X2) a monocotyledonous or dicotyledonous plant;
X3) a gramineous plant;
X4) rice;
the cell of the organism is Y1) or Y2) or Y3) or Y4):
Y1) a plant cell or an animal cell;
Y2) a monocotyledonous or dicotyledonous plant cell;
Y3) a gramineous plant cell;
Y4) a rice cell.
The invention adds a 3×Flag1&1×NLS1 nuclear localization signal (nuclear localization signal A) in front of the rAPOBEC1 element and a bpNLS nuclear localization signal (nuclear localization signal B) behind the UGI element of the rAPOBEC1&Cas9n&UGI base editing system, and finds that the C·T base substitution efficiency of the base editing system is significantly improved (up to 85.7%).
Drawings
FIG. 1 is a schematic diagram of a structure of a recombinant expression vector of a cytosine base editor.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents, instruments and the like used in the following examples are commercially available unless otherwise specified. In the following examples, unless otherwise specified, the 1 st position of each nucleotide sequence in the sequence listing is the 5 'terminal nucleotide of the corresponding DNA/RNA, and the last position is the 3' terminal nucleotide of the corresponding DNA/RNA.
Primer pair T1 consists of primer T1-F: 5'-GCGAATGGCCACAGGG-3' and primer T1-R: 5'-TCTGATCATCATGGATTCCTTC-3', and is used for amplifying targets T1 and T2.
Primer pair T3 consists of primer T3-F: 5'-CTCATCCACGACACATCCATAC-3' and primer T3-R: 5'-ATACTTCTGGCTTATGCTTGCG-3', and is used for amplifying target T3.
Primer pair T4 consists of primer T4-F: 5'-GCCATCAACTAAACACAGCC-3' and primer T4-R: 5'-CATGAGCGTGAGAATTCTGATC-3', and is used for amplifying target T4.
In the following examples, a C·T base substitution refers to a C-to-T mutation at any position in the target sequence.
C·T base substitution efficiency = number of positive T0 seedlings with a C·T base substitution / total number of positive T0 seedlings analyzed × 100%.
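The definition and the efficiency formula above can be sketched in a few lines. The reference target and the four seedling reads below are hypothetical data chosen for illustration, and the reads are assumed to be already aligned to the target with equal length; real Sanger traces would need base calling and alignment first.

```python
def has_c_to_t(reference: str, observed: str) -> bool:
    """True if any position is C in the reference and T in the observed read."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(reference.upper(), observed.upper())
    return any(ref == "C" and obs == "T" for ref, obs in pairs)


def substitution_efficiency(reference: str, seedling_reads: list[str]) -> float:
    """Percentage of analyzed positive T0 seedlings with a C-to-T substitution."""
    if not seedling_reads:
        raise ValueError("no seedlings analyzed")
    edited = sum(has_c_to_t(reference, read) for read in seedling_reads)
    return edited / len(seedling_reads) * 100


# Hypothetical 20-nt target and one read per positive T0 seedling.
reference = "ACGTCCATGCAAGCTTGACC"
reads = [
    "ACGTCCATGCAAGCTTGACC",  # unedited
    "ACGTTTATGCAAGCTTGACC",  # C->T at positions 5 and 6
    "ACGTCCATGTAAGCTTGACC",  # C->T at position 10
    "ACGTCCATGCAAGCTTGACA",  # C->A only, not counted as a C-to-T substitution
]
print(f"{substitution_efficiency(reference, reads):.1f}%")
```

A seedling counts once regardless of how many C positions are converted, which matches the per-seedling definition of efficiency used here.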
Nipponbare rice: described in the reference: Effects of sodium nitroprusside and its photolysis products on the growth of Nipponbare rice seedlings and the expression of 5 hormone marker genes [J]. Journal of Henan Normal University (Natural Science Edition), 2017(2): 48-52; publicly available from the Beijing Academy of Agriculture and Forestry Sciences.
Recovery medium: N6 solid medium containing 200 mg/L timentin.
Screening medium: N6 solid medium containing 50 mg/L hygromycin.
Differentiation medium: N6 solid medium containing 2 mg/L KT, 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.
Rooting medium: N6 solid medium containing 0.2 mg/L NAA, 0.5 g/L glutamic acid and 0.5 g/L proline.
Example 1 Application of nuclear localization signal FNB in improving C·T base substitution efficiency
First, design and construction of recombinant expression vectors
1. Design of recombinant expression vectors
Nuclear localization signals were added to the rAPOBEC1&Cas9n&UGI base editing system. Four design types were distinguished according to the nuclear localization signals added, and the structures of the recombinant expression vectors of the four design types are shown schematically in FIG. 1. The four design types are as follows:
sCBE system: the bpNLS nuclear localization signal was added before the rAPOBEC1 element in the rAPOBEC1& Cas9n & UGI base editing system and after the UGI element. The amino acid sequence of the bpNLS nuclear localization signal is as follows: KRTADGSEFEPKKKRKV (SEQ ID NO: 7). This type of design is designated bpNLS-bpNLS.
FNLS-sCBE system: a 3×Flag1&1×NLS1 nuclear localization signal was added before the rAPOBEC1 element in the rAPOBEC1&Cas9n&UGI base editing system, and a 1×NLS1 nuclear localization signal was added after the UGI element. The 3×Flag1&1×NLS1 nuclear localization signal consists, in order, of one 3×Flag1 tag protein and one NLS1 protein, and its amino acid sequence is as follows:
Figure BDA0002327698710000081
(sequence 8), in which the amino acid sequence of the 3×Flag1 tag protein is underlined and the amino acid sequence of the NLS1 protein is marked with a wavy line. The 1×NLS1 nuclear localization signal consists of one NLS1 protein, and its amino acid sequence is: PKKKRKV. This design type is denoted 3×Flag1&1×NLS1-1×NLS1.
F4NLS-sCBE system: a 3×Flag2&4×NLS2 nuclear localization signal was added before the rAPOBEC1 element in the rAPOBEC1&Cas9n&UGI base editing system, and a 4×NLS2 nuclear localization signal was added after the UGI element. The 3×Flag2&4×NLS2 nuclear localization signal consists, in order, of one 3×Flag2 tag protein and four NLS2 proteins, and its amino acid sequence is as follows:
Figure BDA0002327698710000091
in which the amino acid sequence of the 3×Flag2 tag protein is underlined and the amino acid sequence of the NLS2 protein is marked with a wavy line. The 4×NLS2 nuclear localization signal consists of four NLS2 proteins, and its amino acid sequence is: PKKKRKVGGSPKKKRKVGGSPKKKRKVGGSPKKKRKV. This design type is denoted 3×Flag2&4×NLS2-4×NLS2.
FNB-sCBE system: the 3×Flag1&1×NLS1 nuclear localization signal was added before the rAPOBEC1 element in the rAPOBEC1&Cas9n&UGI base editing system, and the bpNLS nuclear localization signal was added after the UGI element. The amino acid sequence of the 3×Flag1&1×NLS1 nuclear localization signal is sequence 8, and the amino acid sequence of the bpNLS nuclear localization signal is sequence 7. This design type is denoted 3×Flag1&1×NLS1-bpNLS (FNB).
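The amino acid sequences quoted for the design types can be cross-checked with a short sketch. The three sequences are copied from the text; reading the 4× signal as four PKKKRKV units joined by a GGS spacer is an observation about that string rather than a statement from the source, and the helper names are illustrative.

```python
BPNLS = "KRTADGSEFEPKKKRKV"  # bpNLS, as quoted in the text
NLS_UNIT = "PKKKRKV"         # single NLS unit, as quoted in the text
NLS_4X = "PKKKRKVGGSPKKKRKVGGSPKKKRKVGGSPKKKRKV"  # 4x signal, as quoted


def tandem(unit: str, copies: int, spacer: str) -> str:
    """Join `copies` repeats of `unit` with `spacer` between neighbours."""
    return spacer.join([unit] * copies)


def coding_length(protein: str) -> int:
    """Nucleotides needed to encode the peptide, without a stop codon."""
    return 3 * len(protein)


# The quoted 4x signal is exactly four units separated by GGS (assumed spacer).
print(tandem(NLS_UNIT, 4, "GGS") == NLS_4X)

# bpNLS ends with the same basic motif as the single NLS unit.
print(BPNLS.endswith(NLS_UNIT))

# Coding lengths implied by the peptide lengths.
print(coding_length(BPNLS), coding_length(NLS_UNIT))
```

The 51-nt and 21-nt coding lengths agree with the sizes of the bpNLS region of sequence 1 (positions 3229-3279) and of the NLS1 region of sequence 5 (positions 73-93) given in the vector description.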
2. Construction of recombinant expression vectors
The following recombinant expression vectors, each a circular plasmid, were artificially synthesized:
a recombinant expression vector containing bpNLS-bpNLS: sCBE-1;
a recombinant expression vector containing 3×Flag1&1×NLS1-1×NLS1: FNLS-sCBE-1;
a recombinant expression vector containing 3×Flag2&4×NLS2-4×NLS2: F4NLS-sCBE-1;
a recombinant expression vector containing 3×Flag1&1×NLS1-bpNLS: FNB-sCBE-1.
The nucleotide sequence of the sCBE-1 recombinant expression vector is sequence 1 in the sequence listing. In sequence 1, positions 131-467 are the nucleotide sequence of the OsU3 promoter; positions 474-550, 657-733, 840-916 and 1023-1099 are the nucleotide sequences of the tRNA; positions 551-656, 734-839, 917-1022 and 1100-1205 are the nucleotide sequences of the esgRNA, in which positions 551-570, 734-753, 917-936 and 1100-1119 are the nucleotide sequences of targets T1, T2, T3 and T4, respectively, and positions 571-656, 754-839, 937-1022 and 1120-1205 are the nucleotide sequences of the esgRNA backbone; positions 1206-1496 are the nucleotide sequence of the OsU3 terminator. Positions 1503-3216 are the nucleotide sequence of the OsUbq3 promoter; positions 3229-3279 are the bpNLS nucleotide sequence; positions 3280-3963 are the coding sequence (without a stop codon) of the rAPOBEC1 protein, encoding the rAPOBEC1 protein shown in sequence 2. Positions 4012-8112 are the coding sequence (without a stop codon) of the Cas9n protein, encoding the Cas9n protein shown in sequence 3. Positions 8125-8373 and 8386-8634 are each a coding sequence (without a stop codon) of the UGI protein, encoding the UGI protein shown in sequence 4. Positions 8647-8697 are the bpNLS nucleotide sequence. Positions 9040-9292 are the Nos terminator sequence. Positions 9333-11325 are the nucleotide sequence of the ZmUbi1 promoter, positions 11332-12357 are the coding sequence of hygromycin phosphotransferase, and positions 12384-12599 are the nucleotide sequence of CaMV35S polyA. The four targets of the sCBE-1 recombinant expression vector are T1, T2, T3 and T4, whose sequences are shown in Table 1.
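The positions above are 1-based and inclusive, which differs from Python's 0-based, end-exclusive slicing. The sketch below converts between the two, using the line of the sequence listing that covers positions 541-600 of sequence 1; the function names are illustrative only.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def subsequence(fragment: str, fragment_start: int, start: int, end: int) -> str:
    """Extract 1-based inclusive positions start..end from a fragment that
    begins at position fragment_start of the full sequence."""
    lo = start - fragment_start
    hi = end - fragment_start + 1
    if lo < 0 or hi > len(fragment):
        raise ValueError("requested positions fall outside the fragment")
    return fragment[lo:hi]


# Positions 541-600 of sequence 1, copied from the sequence listing.
FRAGMENT_541_600 = (
    "ggctggtgca ttgtaatcaa ctccagtgtc gtttcagagc tatgctggaa acagcatagc"
).replace(" ", "")

# The 20 nt at positions 551-570, given in the text as target T1.
print(subsequence(FRAGMENT_541_600, 541, 551, 570))

# Lengths implied by the stated coordinates.
print(feature_length(3229, 3279))  # bpNLS
print(feature_length(571, 656))    # esgRNA backbone after T1

# Coding sequences without a stop codon should span whole codons.
for name, (start, end) in {
    "rAPOBEC1": (3280, 3963),
    "Cas9n": (4012, 8112),
    "UGI": (8125, 8373),
}.items():
    print(name, feature_length(start, end), feature_length(start, end) % 3 == 0)
```

Checking that each coding region is a multiple of three nucleotides is a quick way to catch a mistyped coordinate in an annotation of this kind.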
The nucleotide sequence of the FNLS-sCBE-1 recombinant expression vector is obtained from the sCBE-1 recombinant expression vector sequence by replacing the bpNLS nucleotide sequence at positions 3229-3279 of sequence 1 with sequence 5, replacing the bpNLS nucleotide sequence at positions 8647-8697 of sequence 1 with the nucleotide sequence at positions 73-93 of sequence 5, and keeping the other sequences unchanged. In sequence 5, positions 1-66 are the 3×Flag1 nucleotide sequence and positions 73-93 are the NLS1 nucleotide sequence; sequence 5 contains one NLS1 nucleotide sequence in total.
The nucleotide sequence of the F4NLS-sCBE-1 recombinant expression vector is obtained from the sCBE-1 recombinant expression vector sequence by replacing the bpNLS nucleotide sequence at positions 3229-3279 of sequence 1 with sequence 6, replacing the bpNLS nucleotide sequence at positions 8647-8697 of sequence 1 with the nucleotide sequence at positions 55-201 of sequence 6, and keeping the other sequences unchanged. In sequence 6, positions 1-66 are the 3×Flag2 nucleotide sequence, and positions 73-93, 103-123, 133-153 and 163-183 are NLS2 nucleotide sequences; sequence 6 contains four NLS2 nucleotide sequences in total.
The nucleotide sequence of the FNB-sCBE-1 recombinant expression vector is obtained from the sCBE-1 recombinant expression vector sequence by replacing the bpNLS nucleotide sequence at positions 3229-3279 of sequence 1 with sequence 5 and keeping the other sequences unchanged.
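The derived vectors are described as region replacements in sequence 1, with all positions given in sequence 1 coordinates. A minimal sketch of that operation follows. It uses short toy strings rather than the real sequences, and it applies the downstream replacement first so that a length change upstream does not shift the coordinates of a region still to be replaced.

```python
def replace_region(seq: str, start: int, end: int, new: str) -> str:
    """Replace 1-based inclusive positions start..end of seq with new."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region outside sequence")
    return seq[: start - 1] + new + seq[end:]


def apply_replacements(seq: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply several non-overlapping replacements, all given in the
    coordinates of the original sequence, from highest start to lowest."""
    for start, end, new in sorted(edits, key=lambda e: e[0], reverse=True):
        seq = replace_region(seq, start, end, new)
    return seq


# Toy backbone: two 3-nt "aaa" regions stand in for the two bpNLS regions.
backbone = "GGGG" + "aaa" + "CCCC" + "aaa" + "TT"

# Replace positions 5-7 with a longer insert and positions 12-14 with a shorter one.
derived = apply_replacements(backbone, [(5, 7, "ttttt"), (12, 14, "g")])
print(derived)
```

A derivation with a single replacement, as described for FNB-sCBE-1, needs only one call to `replace_region`.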
The target nucleotide sequence and the corresponding PAM sequence of each vector are shown in table 1.
TABLE 1
Figure BDA0002327698710000101
Second, obtaining positive T0 rice seedlings
The sCBE-1 vector, the FNLS-sCBE-1 vector, the F4NLS-sCBE-1 vector and the FNB-sCBE-1 vector obtained in step one were each processed according to the following steps 1-9:
1. The vector was introduced into Agrobacterium EHA105 (product of Shanghai Diego Biotechnology Ltd., CAT #: AC1010) to obtain recombinant Agrobacterium.
2. The recombinant Agrobacterium was cultured in medium (YEP medium containing 50 μg/ml kanamycin and 25 μg/ml rifampicin) with shaking at 28 ℃ and 150 rpm to the target OD600, then centrifuged at room temperature at 10000 rpm for 1 min. The cells were resuspended in infection solution (N6 liquid medium in which the sugar is replaced with glucose and sucrose, at concentrations of 10 g/L and 20 g/L in the infection solution, respectively) and diluted to an OD600 of 0.2 to obtain the Agrobacterium infection solution.
3. Mature seeds of the rice variety Nipponbare were dehulled, placed in a 100 mL Erlenmeyer flask, soaked in 70% (v/v) aqueous ethanol for 30 sec, then transferred to 25% (v/v) aqueous sodium hypochlorite and sterilized with shaking at 120 rpm for 30 min, rinsed 3 times with sterile water and blotted dry with filter paper. The seeds were then placed embryo-side down on N6 solid medium and cultured in the dark at 28 ℃ for 4-6 weeks to obtain rice callus.
4. After step 3, the rice callus was soaked for 10 min in Agrobacterium infection solution A (the Agrobacterium infection solution supplemented with acetosyringone at a volume ratio of 25 μl acetosyringone to 50 ml Agrobacterium infection solution), then placed on a culture dish lined with two layers of sterile filter paper (containing about 200ml of infection solution without Agrobacterium) and cultured in the dark at 21 ℃ for 1 day.
5. The rice callus obtained in step 4 was placed on recovery medium and cultured in the dark at 25-28 ℃ for 3 days.
6. The rice callus obtained in step 5 was placed on screening medium and cultured in the dark at 28 ℃ for 2 weeks.
7. The rice callus obtained in step 6 was placed on fresh screening medium and cultured in the dark at 28 ℃ for a further 2 weeks to obtain resistant rice callus.
8. The resistant rice callus obtained in step 7 was placed on differentiation medium and cultured under light at 25 ℃ for about 1 month; the differentiated plantlets were transplanted to rooting medium and cultured under light at 25 ℃ for 2 weeks to obtain rice T0 seedlings.
9. Genomic DNA was extracted from the rice T0 seedlings and used as the template for PCR amplification with a primer pair consisting of primer F (5'-CCGAGGAGACTATCACCCCT-3') and primer R (5'-CGACCCATAACCTTGACAAGC-3') to obtain a PCR amplification product. The PCR amplification product was subjected to agarose gel electrophoresis and judged as follows: if the PCR amplification product contains a DNA fragment of about 853 bp, the corresponding rice T0 seedling is a positive T0 rice seedling; if the PCR amplification product does not contain a DNA fragment of about 853 bp, the corresponding rice T0 seedling is not a positive T0 rice seedling.
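The PCR screen in step 9 can be mimicked in silico. The sketch below locates the forward primer and the reverse complement of the reverse primer on a template and reports the expected product length. The two primers are those given in step 9; the template is a synthetic placeholder built so that the product is 853 bp, not real vector sequence, and exact primer matching is a simplifying assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product bounded by the two primers, or None if either
    primer site is absent (exact matches on the given strand only)."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


PRIMER_F = "CCGAGGAGACTATCACCCCT"
PRIMER_R = "CGACCCATAACCTTGACAAGC"

# Synthetic template (placeholder): filler sized to give an 853-bp product.
filler = "A" * (853 - len(PRIMER_F) - len(PRIMER_R))
transgenic = "TTTT" + PRIMER_F + filler + reverse_complement(PRIMER_R) + "TTTT"
non_transgenic = "ACGT" * 50

print(amplicon_length(transgenic, PRIMER_F, PRIMER_R))
print(amplicon_length(non_transgenic, PRIMER_F, PRIMER_R))
```

A seedling would be scored positive when a product of the expected length is predicted, and negative when no product is found.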
Third, result analysis
1. For each vector, the genomic DNA of the positive T0 rice seedlings obtained in step two was used as the template. For the T1 and T2 targets, PCR amplification was performed with primer pair T1; for the T3 target, with primer pair T3; for the T4 target, with primer pair T4. A PCR amplification product was obtained in each case.
2. The PCR amplification products obtained in step 1 were subjected to Sanger sequencing and analysis. Only the target regions were analyzed in the sequencing results. The number of positive T0 seedlings with a C·T base substitution at T1, T2, T3 and T4 was counted for each target, and the C·T base substitution efficiency was calculated. The results are shown in Table 2.
The nuclear localization signal used by the sCBE system is bpNLS-bpNLS; the nuclear localization signal used by the FNLS-sCBE system is 3×Flag1&1×NLS1-1×NLS1; the nuclear localization signal used by the F4NLS-sCBE system is 3×Flag2&4×NLS2-4×NLS2; the nuclear localization signal used by the FNB-sCBE system is 3×Flag1&1×NLS1-bpNLS.
The base editing results of the different systems at the T1, T2, T3 and T4 targets show that, compared with the sCBE and FNLS-sCBE systems, the FNB-sCBE system greatly improves the C·T base substitution efficiency at all four targets. Compared with the F4NLS-sCBE system, the FNB-sCBE system greatly improves the C·T base substitution efficiency at every target except T2. For the T3 target in particular, none of the other three systems achieves editing, whereas the FNB-sCBE system does. It can be seen that FNB-sCBE, which combines the nuclear localization signals 3×Flag1&1×NLS1 and bpNLS, significantly improves the C·T base substitution efficiency.
TABLE 2
Figure BDA0002327698710000111
Sequence listing
<110> Beijing Academy of Agriculture and Forestry Sciences
<120> Nuclear localization signal FNB and application thereof in improving base editing efficiency
<160>8
<170>PatentIn version 3.5
<210>1
<211>19005
<212>DNA
<213>Artificial Sequence
<400>1
ggtggcagga tatattgtgg tgtaaacatg gcactagcct caccgtcttc gcagacgagg 60
ccgctaagtc gcagctacgc tctcaacggc actgactagg tagtttaaac gtgcacttaa 120
ttaaggtacc gaagcaactt aaagttatca ggcatgcatg gatcttggag gaatcagatg 180
tgcagtcagg gaccatagca caagacaggc gtcttctact ggtgctacca gcaaatgctg 240
gaagccggga acactgggta cgttggaaac cacgtgatgt gaagaagtaa gataaactgt 300
aggagaaaag catttcgtag tgggccatga agcctttcag gacatgtatt gcagtatggg 360
ccggcccatt acgcaattgg acgacaacaa agactagtat tagtaccacc tcggctatcc 420
acatagatca aagctgattt aaaagagttg tgcagatgat ccgtggcgga tccaacaaag 480
caccagtggt ctagtggtag aatagtaccc tgccacggta cagacccggg ttcgattccc 540
ggctggtgca ttgtaatcaa ctccagtgtc gtttcagagc tatgctggaa acagcatagc 600
aagttgaaat aaggctagtc cgttatcaac ttgaaaaagt ggcaccgagt cggtgcaaca 660
aagcaccagt ggtctagtgg tagaatagta ccctgccacg gtacagaccc gggttcgatt 720
cccggctggt gcaccttctc caggaatgac ggagtttcag agctatgctg gaaacagcat 780
agcaagttga aataaggcta gtccgttatc aacttgaaaa agtggcaccg agtcggtgca 840
acaaagcacc agtggtctag tggtagaata gtaccctgcc acggtacaga cccgggttcg 900
attcccggct ggtgcagacc agccagcgtc tggcgcgttt cagagctatg ctggaaacag 960
catagcaagt tgaaataagg ctagtccgtt atcaacttga aaaagtggca ccgagtcggt 1020
gcaacaaagc accagtggtc tagtggtaga atagtaccct gccacggtac agacccgggt 1080
tcgattcccg gctggtgcaa atcctgatga tgctgcagtg tttcagagct atgctggaaa 1140
cagcatagca agttgaaata aggctagtcc gttatcaact tgaaaaagtg gcaccgagtc 1200
ggtgcttttt tttttcgttt tgcattgagt tttctccgtc gcatgtttgc agttttattt 1260
tccgttttgc attgaaattt ctccgtctca tgtttgcagc gtgttcaaaa agtacgcagc 1320
tgtatttcac ttatttacgg cgccacattt tcatgccgtt tgtgccaact atcccgagct 1380
agtgaataca gcttggcttc acacaacact ggtgacccgc tgacctgctc gtacctcgta 1440
ccgtcgtacg gcacagcatt tggaattaaa gggtgtgatc gatactgctt gctgctaagc 1500
ttacaaattc gggtcaaggc ggaagccagc gcgccacccc acgtcagcaa atacggaggc 1560
gcggggttga cggcgtcacc cggtcctaac ggcgaccaac aaaccagcca gaagaaatta 1620
cagtaaaaaa aaagtaaatt gcactttgat ccacctttta ttacctaagt ctcaatttgg 1680
atcaccctta aacctatctt ttcaatttgg gccgggttgt ggtttggact accatgaaca 1740
acttttcgtc atgtctaact tccctttcag caaacatatg aaccatatat agaggagatc 1800
ggccgtatac tagagctgat gtgtttaagg tcgttgattg cacgagaaaa aaaaatccaa 1860
atcgcaacaa tagcaaattt atctggttca aagtgaaaag atatgtttaa aggtagtcca 1920
aagtaaaact tatagataat aaaatgtggt ccaaagcgta attcactcaa aaaaaatcaa 1980
cgagacgtgt accaaacgga gacaaacggc atcttctcga aatttcccaa ccgctcgctc 2040
gcccgcctcg tcttcccgga aaccgcggtg gtttcagcgt ggcggattct ccaagcagac 2100
ggagacgtca cggcacggga ctcctcccac cacccaaccg ccataaatac cagccccctc 2160
atctcctctc ctcgcatcag ctccaccccc gaaaaatttc tccccaatct cgcgaggctc 2220
tcgtcgtcga atcgaatcct ctcgcgtcct caaggtacgc tgcttctcct ctcctcgctt 2280
cgtttcgatt cgatttcgga cgggtgaggt tgttttgttg ctagatccga ttggtggtta 2340
gggttgtcga tgtgattatc gtgagatgtt taggggttgt agatctgatg gttgtgattt 2400
gggcacggtt ggttcgatag gtggaatcgt ggttaggttt tgggattgga tgttggttct 2460
gatgattggg gggaattttt acggttagat gaattgttgg atgattcgat tggggaaatc 2520
ggtgtagatc tgttggggaa ttgtggaact agtcatgcct gagtgattgg tgcgatttgt 2580
agcgtgttcc atcttgtagg ccttgttgcg agcatgttca gatctactgt tccgctcttg 2640
attgagttat tggtgccatg ggttggtgca aacacaggct ttaatatgtt atatctgttt 2700
tgtgtttgat gtagatctgt agggtagttc ttcttagaca tggttcaatt atgtagcttg 2760
tgcgtttcga tttgatttca tatgttcaca gattagataa tgatgaactc ttttaattaa 2820
ttgtcaatgg taaataggaa gtcttgtcgc tatatctgtc ataatgatct catgttacta 2880
tctgccagta atttatgcta agaactatat tagaatatca tgttacaatc tgtagtaata 2940
tcatgttaca atctgtagtt catctatata atctattgtg gtaatttctt tttactatct 3000
gtgtgaagat tattgccact agttcattct acttatttct gaagttcagg atacgtgtgc 3060
tgttactacc tatctgaata catgtgtgat gtgcctgtta ctatcttttt gaatacatgt 3120
atgttctgtt ggaatatgtt tgctgtttga tccgttgttg tgtccttaat cttgtgctag 3180
ttcttaccct atctgtttgg tgattatttc ttgcagtacg taagcatgaa gaggaccgcc 3240
gacggcagcg agttcgagcc gaagaagaag aggaaggtgt ccagcgagac aggaccagtg 3300
gcagtcgacc caacactgcg caggcggatc gagccacacg agttcgaggt gttcttcgat 3360
ccgagggagc tccggaagga gacatgcctc ctgtacgaga tcaactgggg cggccgccac 3420
tctatctgga ggcatacctc acagaacaca aataagcatg tggaggtcaa cttcatcgag 3480
aagttcacca cagagcggta cttctgcccg aatacgcgct gctccatcac ctggttcctg 3540
tcgtggtccc catgcggaga gtgctcgagg gcaatcacgg agttcctctc ccgctacccg 3600
cacgtcaccc tgttcatcta catcgcacgg ctctaccacc atgcggaccc gcggaatagg 3660
cagggcctcc gcgatctgat ctcttcaggc gtgacaatcc agatcatgac ggagcaggag 3720
tcaggctact gctggaggaa cttcgtcaat tacagcccat ctaacgaggc acactggccg 3780
cgctacccgc atctctgggt gcgcctctac gtgctcgagc tgtactgcat catcctcggc 3840
ctgccgccat gcctcaatat cctgcgcagg aagcagccgc agctgacgtt cttcaccatc 3900
gccctccaga gctgccacta ccagcggctc cctccgcata tcctgtgggc gacaggcctc 3960
aagtcaggct cggagacacc tggcacgtcc gagagcgcca ccccggagtc tgacaagaag 4020
tactccatcg gcctcgccat cggcaccaac agcgtcggct gggcggtgat caccgacgag 4080
tacaaggtcc cgtccaagaa gttcaaggtc ctgggcaaca ccgaccgcca ctccatcaag 4140
aagaacctca tcggcgccct cctcttcgac tccggcgaga cggcggaggc gacccgcctc 4200
aagcgcaccg cccgccgccg ctacacccgc cgcaagaacc gcatctgcta cctccaggag 4260
atcttctcca acgagatggc gaaggtcgac gactccttct tccaccgcct cgaggagtcc 4320
ttcctcgtgg aggaggacaa gaagcacgag cgccacccca tcttcggcaa catcgtcgac 4380
gaggtcgcct accacgagaa gtaccccact atctaccacc ttcgtaagaa gcttgttgac 4440
tctactgata aggctgatct tcgtctcatc taccttgctc tcgctcacat gatcaagttc 4500
cgtggtcact tccttatcga gggtgacctt aaccctgata actccgacgt ggacaagctc 4560
ttcatccagc tcgtccagac ctacaaccag ctcttcgagg agaaccctat caacgcttcc 4620
ggtgtcgacg ctaaggcgat cctttccgct aggctctcca agtccaggcg tctcgagaac 4680
ctcatcgccc agctccctgg tgagaagaag aacggtcttt tcggtaacct catcgctctc 4740
tccctcggtc tgacccctaa cttcaagtcc aacttcgacc tcgctgagga cgctaagctt 4800
cagctctcca aggataccta cgacgatgat ctcgacaacc tcctcgctca gattggagat 4860
cagtacgctg atctcttcct tgctgctaag aacctctccg atgctatcct cctttcggat 4920
atccttaggg ttaacactga gatcactaag gctcctcttt ctgcttccat gatcaagcgc 4980
tacgacgagc accaccagga cctcaccctc ctcaaggctc ttgttcgtca gcagctcccc 5040
gagaagtaca aggagatctt cttcgaccag tccaagaacg gctacgccgg ttacattgac 5100
ggtggagcta gccaggagga gttctacaag ttcatcaagc caatccttga gaagatggat 5160
ggtactgagg agcttctcgt taagcttaac cgtgaggacc tccttaggaa gcagaggact 5220
ttcgataacg gctctatccc tcaccagatc caccttggtg agcttcacgc catccttcgt 5280
aggcaggagg acttctaccc tttcctcaag gacaaccgtg agaagatcga gaagatcctt 5340
actttccgta ttccttacta cgttggtcct cttgctcgtg gtaactcccg tttcgcttgg 5400
atgactagga agtccgagga gactatcacc ccttggaact tcgaggaggt tgttgacaag 5460
ggtgcttccg cccagtcctt catcgagcgc atgaccaact tcgacaagaa cctccccaac 5520
gagaaggtcc tccccaagca ctccctcctc tacgagtact tcacggtcta caacgagctc 5580
accaaggtca agtacgtcac cgagggtatg cgcaagcctg ccttcctctc cggcgagcag 5640
aagaaggcta tcgttgacct cctcttcaag accaaccgca aggtcaccgt caagcagctc 5700
aaggaggact acttcaagaa gatcgagtgc ttcgactccg tcgagatcag cggcgttgag 5760
gaccgtttca acgcttctct cggtacctac cacgatctcc tcaagatcat caaggacaag 5820
gacttcctcg acaacgagga gaacgaggac atcctcgagg acatcgtcct cactcttact 5880
ctcttcgagg atagggagat gatcgaggag aggctcaaga cttacgctca tctcttcgat 5940
gacaaggtta tgaagcagct caagcgtcgc cgttacaccg gttggggtag gctctcccgc 6000
aagctcatca acggtatcag ggataagcag agcggcaaga ctatcctcga cttcctcaag 6060
tctgatggtt tcgctaacag gaacttcatg cagctcatcc acgatgactc tcttaccttc 6120
aaggaggata ttcagaaggc tcaggtgtcc ggtcagggcg actctctcca cgagcacatt 6180
gctaaccttg ctggttcccc tgctatcaag aagggcatcc ttcagactgt taaggttgtc 6240
gatgagcttg tcaaggttat gggtcgtcac aagcctgaga acatcgtcat cgagatggct 6300
cgtgagaacc agactaccca gaagggtcag aagaactcga gggagcgcat gaagaggatt 6360
gaggagggta tcaaggagct tggttctcag atccttaagg agcaccctgt cgagaacacc 6420
cagctccaga acgagaagct ctacctctac tacctccaga acggtaggga tatgtacgtt 6480
gaccaggagc tcgacatcaa caggctttct gactacgacg tcgaccacat tgttcctcag 6540
tctttcctta aggatgactc catcgacaac aaggtcctca cgaggtccga caagaacagg 6600
ggtaagtcgg acaacgtccc ttccgaggag gttgtcaaga agatgaagaa ctactggagg 6660
cagcttctca acgctaagct cattacccag aggaagttcg acaacctcac gaaggctgag 6720
aggggtggcc tttccgagct tgacaaggct ggtttcatca agaggcagct tgttgagacg 6780
aggcagatta ccaagcacgt tgctcagatc ctcgattcta ggatgaacac caagtacgac 6840
gagaacgaca agctcatccg cgaggtcaag gtgatcaccc tcaagtccaa gctcgtctcc 6900
gacttccgca aggacttcca gttctacaag gtccgcgaga tcaacaacta ccaccacgct 6960
cacgatgctt accttaacgc tgtcgttggt accgctctta tcaagaagta ccctaagctt 7020
gagtccgagt tcgtctacgg tgactacaag gtctacgacg ttcgtaagat gatcgccaag 7080
tccgagcagg agatcggcaa ggccaccgcc aagtacttct tctactccaa catcatgaac 7140
ttcttcaaga ccgagatcac cctcgccaac ggcgagatcc gcaagcgccc tcttatcgag 7200
acgaacggtg agactggtga gatcgtttgg gacaagggtc gcgacttcgc tactgttcgc 7260
aaggtccttt ctatgcctca ggttaacatc gtcaagaaga ccgaggtcca gaccggtggc 7320
ttctccaagg agtctatcct tccaaagaga aactcggaca agctcatcgc taggaagaag 7380
gattgggacc ctaagaagta cggtggtttc gactccccta ctgtcgccta ctccgtcctc 7440
gtggtcgcca aggtggagaa gggtaagtcg aagaagctca agtccgtcaa ggagctcctc 7500
ggcatcacca tcatggagcg ctcctccttc gagaagaacc cgatcgactt cctcgaggcc 7560
aagggctaca aggaggtcaa gaaggacctc atcatcaagc tccccaagta ctctcttttc 7620
gagctcgaga acggtcgtaa gaggatgctg gcttccgctg gtgagctcca gaagggtaac 7680
gagcttgctc ttccttccaa gtacgtgaac ttcctctacc tcgcctccca ctacgagaag 7740
ctcaagggtt cccctgagga taacgagcag aagcagctct tcgtggagca gcacaagcac 7800
tacctcgacg agatcatcga gcagatctcc gagttctcca agcgcgtcat cctcgctgac 7860
gctaacctcg acaaggtcct ctccgcctac aacaagcacc gcgacaagcc catccgcgag 7920
caggccgaga acatcatcca cctcttcacg ctcacgaacc tcggcgcccc tgctgctttc 7980
aagtacttcg acaccaccat cgacaggaag cgttacacgt ccaccaagga ggttctcgac 8040
gctactctca tccaccagtc catcaccggt ctttacgaga ctcgtatcga cctttcccag 8100
cttggtggtg attccggcgg cagcaccaac ctctccgaca tcatcgagaa ggagacaggc 8160
aagcagctcg tgatccagga gagcatcctc atgctcccgg aggaggtgga ggaggtcatc 8220
ggcaacaagc cggagtccga catcctcgtg cacaccgcct acgacgagtc caccgacgag 8280
aacgtgatgc tcctcacctc agatgcacca gagtacaagc catgggcact cgtgatccag 8340
gacagcaacg gcgagaacaa gatcaagatg ctctccggcg gctccaccaa cctctccgac 8400
atcatcgaga aggagacagg caagcagctc gtgatccagg agagcatcct catgctcccg 8460
gaggaggtgg aggaggtcat cggcaacaag ccggagtccg acatcctcgt gcacaccgcc 8520
tacgacgagt ccaccgacga gaacgtgatg ctcctcacct cagatgcacc agagtacaag 8580
ccatgggcac tcgtgatcca ggacagcaac ggcgagaaca agatcaagat gctctccggc 8640
ggctccaaga ggaccgccga cggcagcgag ttcgagccga agaagaagag gaaggtgtag 8700
actagttcag ccagtttggt ggagctgccg atgtgcctgg tcgtcccgag cctctgttcg 8760
tcaagtattt gtggtgctga tgtctacttg tgtctggttt aatggaccat cgagtccgta 8820
tgatatgtta gttttatgaa acagtttcct gtgggacagc agtatgcttt atgaataagt 8880
tggatttgaa cctaaatatg tgctcaattt gctcatttgc atctcattcc tgttgatgtt 8940
ttatctgagt tgcaagtttg aaaatgctgc atattcttat taaatcgtca tttactttta 9000
tcttaatgag ctttgcaatg gcctatggga tataaaagag atcgttcaaa catttggcaa 9060
taaagtttct taagattgaa tcctgttgcc ggtcttgcga tgattatcat ataatttctg 9120
ttgaattacg ttaagcatgt aataattaac atgtaatgca tgacgttatt tatgagatgg 9180
gtttttatga ttagagtccc gcaattatac atttaatacg cgatagaaaa caaaatatag 9240
cgcgcaaact aggataaatt atcgcgcgcg gtgtcatcta tgttactaga tcggcgcctg 9300
tccgggcgcg cctggtggat cgtccgccta ggctgcagtg cagcgtgacc cggtcgtgcc 9360
cctctctaga gataatgagc attgcatgtc taagttataa aaaattacca catatttttt 9420
ttgtcacact tgtttgaagt gcagtttatc tatctttata catatattta aactttactc 9480
tacgaataat ataatctata gtactacaat aatatcagtg ttttagagaa tcatataaat 9540
gaacagttag acatggtcta aaggacaatt gagtattttg acaacaggac tctacagttt 9600
tatcttttta gtgtgcatgt gttctccttt ttttttgcaa atagcttcac ctatataata 9660
cttcatccat tttattagta catccattta gggtttaggg ttaatggttt ttatagacta 9720
atttttttag tacatctatt ttattctatt ttagcctcta aattaagaaa actaaaactc 9780
tattttagtt tttttattta ataatttaga tataaaatag aataaaataa agtgactaaa 9840
aattaaacaa atacccttta agaaattaaa aaaactaagg aaacattttt cttgtttcga 9900
gtagataatg ccagcctgtt aaacgccgtc gacgagtcta acggacacca accagcgaac 9960
cagcagcgtc gcgtcgggcc aagcgaagca gacggcacgg catctctgtc gctgcctctg 10020
gacccctctc gagagttccg ctccaccgtt ggacttgctc cgctgtcggc atccagaaat 10080
tgcgtggcgg agcggcagac gtgagccggc acggcaggcg gcctcctcct cctctcacgg 10140
caccggcagc tacgggggat tcctttccca ccgctccttc gctttccctt cctcgcccgc 10200
cgtaataaat agacaccccc tccacaccct ctttccccaa cctcgtgttg ttcggagcgc 10260
acacacacac aaccagatct cccccaaatc cacccgtcgg cacctccgct tcaaggtacg 10320
ccgctcgtcc tccccccccc cccctctcta ccttctctag atcggcgttc cggtccatgg 10380
ttagggcccg gtagttctac ttctgttcat gtttgtgtta gatccgtgtt tgtgttagat 10440
ccgtgctgct agcgttcgta cacggatgcg acctgtacgt cagacacgtt ctgattgcta 10500
acttgccagt gtttctcttt ggggaatcct gggatggctc tagccgttcc gcagacggga 10560
tcgatttcat gatttttttt gtttcgttgc atagggtttg gtttgccctt ttcctttatt 10620
tcaatatatg ccgtgcactt gtttgtcggg tcatcttttc atgctttttt ttgtcttggt 10680
tgtgatgatg tggtctggtt gggcggtcgt tctagatcgg agtagaattc tgtttcaaac 10740
tacctggtgg atttattaat tttggatctg tatgtgtgtg ccatacatat tcatagttac 10800
gaattgaaga tgatggatgg aaatatcgat ctaggatagg tatacatgtt gatgcgggtt 10860
ttactgatgc atatacagag atgctttttg ttcgcttggt tgtgatgatg tggtgtggtt 10920
gggcggtcgt tcattcgttc tagatcggag tagaatactg tttcaaacta cctggtgtat 10980
ttattaattt tggaactgta tgtgtgtgtc atacatcttc atagttacga gtttaagatg 11040
gatggaaata tcgatctagg ataggtatac atgttgatgt gggttttact gatgcatata 11100
catgatggca tatgcagcat ctattcatat gctctaacct tgagtaccta tctattataa 11160
taaacaagta tgttttataa ttattttgat cttgatatac ttggatgatg gcatatgcag 11220
cagctatatg tggatttttt tagccctgcc ttcatacgct atttatttgc ttggtactgt 11280
ttcttttgtc gatgctcacc ctgttgtttg gtgttacttc tgcaggagct catgaaaaag 11340
cctgaactca ccgcgacgtc tgtcgagaag tttctgatcg aaaagttcga cagcgtctcc 11400
gacctgatgc agctctcgga gggcgaagaa tctcgtgctt tcagcttcga tgtaggaggg 11460
cgtggatatg tcctgcgggt aaatagctgc gccgatggtt tctacaaaga tcgttatgtt 11520
tatcggcact ttgcatcggc cgcgctcccg attccggaag tgcttgacat tggggagttt 11580
agcgagagcc tgacctattg catctcccgc cgttcacagg gtgtcacgtt gcaagacctg 11640
cctgaaaccg aactgcccgc tgttctacaa ccggtcgcgg aggctatgga tgcgatcgct 11700
gcggccgatc ttagccagac gagcgggttc ggcccattcg gaccgcaagg aatcggtcaa 11760
tacactacat ggcgtgattt catatgcgcg attgctgatc cccatgtgta tcactggcaa 11820
actgtgatgg acgacaccgt cagtgcgtcc gtcgcgcagg ctctcgatga gctgatgctt 11880
tgggccgagg actgccccga agtccggcac ctcgtgcacg cggatttcgg ctccaacaat 11940
gtcctgacgg acaatggccg cataacagcg gtcattgact ggagcgaggc gatgttcggg 12000
gattcccaat acgaggtcgc caacatcttc ttctggaggc cgtggttggc ttgtatggag 12060
cagcagacgc gctacttcga gcggaggcat ccggagcttg caggatcgcc acgactccgg 12120
gcgtatatgc tccgcattgg tcttgaccaa ctctatcaga gcttggttga cggcaatttc 12180
gatgatgcag cttgggcgca gggtcgatgc gacgcaatcg tccgatccgg agccgggact 12240
gtcgggcgta cacaaatcgc ccgcagaagc gcggccgtct ggaccgatgg ctgtgtagaa 12300
gtactcgccg atagtggaaa ccgacgcccc agcactcgtc cgagggcaaa gaaatagagt 12360
agatgccgac cgggatctgt cgatcgacaa gctcgagttt ctccataata atgtgtgagt 12420
agttcccaga taagggaatt agggttccta tagggtttcg ctcatgtgtt gagcatataa 12480
gaaaccctta gtatgtattt gtatttgtaa aatacttcta tcaataaaat ttctaattcc 12540
taaaaccaaa atccagtact aaaatccaga tcccccgaat taattcggcg ttaattcagc 12600
ctgcaggacg cgtttaatta agtgcacgcg gccgcctact tagtcaagag cctcgcacgc 12660
gactgtcacg cggccaggat cgcctcgtga gcctcgcaat ctgtacctag tgtttaaact 12720
atcagtgttt gacaggatat attggcgggt aaacctaaga gaaaagagcg tttattagaa 12780
taacggatat ttaaaagggc gtgaaaaggt ttatccgttc gtccatttgt atgtgcatgc 12840
caaccacagg gttcccctcg ggatcaaagt actttgatcc aacccctccg ctgctatagt 12900
gcagtcggct tctgacgttc agtgcagccg tcttctgaaa acgacatgtc gcacaagtcc 12960
taagttacgc gacaggctgc cgccctgccc ttttcctggc gttttcttgt cgcgtgtttt 13020
agtcgcataa agtagaatac ttgcgactag aaccggagac attacgccat gaacaagagc 13080
gccgccgctg gcctgctggg ctatgcccgc gtcagcaccg acgaccagga cttgaccaac 13140
caacgggccg aactgcacgc ggccggctgc accaagctgt tttccgagaa gatcaccggc 13200
accaggcgcg accgcccgga gctggccagg atgcttgacc acctacgccc tggcgacgtt 13260
gtgacagtga ccaggctaga ccgcctggcc cgcagcaccc gcgacctact ggacattgcc 13320
gagcgcatcc aggaggccgg cgcgggcctg cgtagcctgg cagagccgtg ggccgacacc 13380
accacgccgg ccggccgcat ggtgttgacc gtgttcgccg gcattgccga gttcgagcgt 13440
tccctaatca tcgaccgcac ccggagcggg cgcgaggccg ccaaggcccg aggcgtgaag 13500
tttggccccc gccctaccct caccccggca cagatcgcgc acgcccgcga gctgatcgac 13560
caggaaggcc gcaccgtgaa agaggcggct gcactgcttg gcgtgcatcg ctcgaccctg 13620
taccgcgcac ttgagcgcag cgaggaagtg acgcccaccg aggccaggcg gcgcggtgcc 13680
ttccgtgagg acgcattgac cgaggccgac gccctggcgg ccgccgagaa tgaacgccaa 13740
gaggaacaag catgaaaccg caccaggacg gccaggacga accgtttttc attaccgaag 13800
agatcgaggc ggagatgatc gcggccgggt acgtgttcga gccgcccgcg cacgtctcaa 13860
ccgtgcggct gcatgaaatc ctggccggtt tgtctgatgc caagctggcg gcctggccgg 13920
ccagcttggc cgctgaagaa accgagcgcc gccgtctaaa aaggtgatgt gtatttgagt 13980
aaaacagctt gcgtcatgcg gtcgctgcgt atatgatgcg atgagtaaat aaacaaatac 14040
gcaaggggaa cgcatgaagg ttatcgctgt acttaaccag aaaggcgggt caggcaagac 14100
gaccatcgca acccatctag cccgcgccct gcaactcgcc ggggccgatg ttctgttagt 14160
cgattccgat ccccagggca gtgcccgcga ttgggcggcc gtgcgggaag atcaaccgct 14220
aaccgttgtc ggcatcgacc gcccgacgat tgaccgcgac gtgaaggcca tcggccggcg 14280
cgacttcgta gtgatcgacg gagcgcccca ggcggcggac ttggctgtgt ccgcgatcaa 14340
ggcagccgac ttcgtgctga ttccggtgca gccaagccct tacgacatat gggccaccgc 14400
cgacctggtg gagctggtta agcagcgcat tgaggtcacg gatggaaggc tacaagcggc 14460
ctttgtcgtg tcgcgggcga tcaaaggcac gcgcatcggc ggtgaggttg ccgaggcgct 14520
ggccgggtac gagctgccca ttcttgagtc ccgtatcacg cagcgcgtga gctacccagg 14580
cactgccgcc gccggcacaa ccgttcttga atcagaaccc gagggcgacg ctgcccgcga 14640
ggtccaggcg ctggccgctg aaattaaatc aaaactcatt tgagttaatg aggtaaagag 14700
aaaatgagca aaagcacaaa cacgctaagt gccggccgtc cgagcgcacg cagcagcaag 14760
gctgcaacgt tggccagcct ggcagacacg ccagccatga agcgggtcaa ctttcagttg 14820
ccggcggagg atcacaccaa gctgaagatg tacgcggtac gccaaggcaa gaccattacc 14880
gagctgctat ctgaatacat cgcgcagcta ccagagtaaa tgagcaaatg aataaatgag 14940
tagatgaatt ttagcggcta aaggaggcgg catggaaaat caagaacaac caggcaccga 15000
cgccgtggaa tgccccatgt gtggaggaac gggcggttgg ccaggcgtaa gcggctgggt 15060
tgtctgccgg ccctgcaatg gcactggaac ccccaagccc gaggaatcgg cgtgacggtc 15120
gcaaaccatc cggcccggta caaatcggcg cggcgctggg tgatgacctg gtggagaagt 15180
tgaaggccgc gcaggccgcc cagcggcaac gcatcgaggc agaagcacgc cccggtgaat 15240
cgtggcaagc ggccgctgat cgaatccgca aagaatcccg gcaaccgccg gcagccggtg 15300
cgccgtcgat taggaagccg cccaagggcg acgagcaacc agattttttc gttccgatgc 15360
tctatgacgt gggcacccgc gatagtcgca gcatcatgga cgtggccgtt ttccgtctgt 15420
cgaagcgtga ccgacgagct ggcgaggtga tccgctacga gcttccagac gggcacgtag 15480
aggtttccgc agggccggcc ggcatggcca gtgtgtggga ttacgacctg gtactgatgg 15540
cggtttccca tctaaccgaa tccatgaacc gataccggga agggaaggga gacaagcccg 15600
gccgcgtgtt ccgtccacac gttgcggacg tactcaagtt ctgccggcga gccgatggcg 15660
gaaagcagaa agacgacctg gtagaaacct gcattcggtt aaacaccacg cacgttgcca 15720
tgcagcgtac gaagaaggcc aagaacggcc gcctggtgac ggtatccgag ggtgaagcct 15780
tgattagccg ctacaagatc gtaaagagcg aaaccgggcg gccggagtac atcgagatcg 15840
agctagctga ttggatgtac cgcgagatca cagaaggcaa gaacccggac gtgctgacgg 15900
ttcaccccga ttactttttg atcgatcccg gcatcggccg ttttctctac cgcctggcac 15960
gccgcgccgc aggcaaggca gaagccagat ggttgttcaa gacgatctac gaacgcagtg 16020
gcagcgccgg agagttcaag aagttctgtt tcaccgtgcg caagctgatc gggtcaaatg 16080
acctgccgga gtacgatttg aaggaggagg cggggcaggc tggcccgatc ctagtcatgc 16140
gctaccgcaa cctgatcgag ggcgaagcat ccgccggttc ctaatgtacg gagcagatgc 16200
tagggcaaat tgccctagca ggggaaaaag gtcgaaaagg tctctttcct gtggatagca 16260
cgtacattgg gaacccaaag ccgtacattg ggaaccggaa cccgtacatt gggaacccaa 16320
agccgtacat tgggaaccgg tcacacatgt aagtgactga tataaaagag aaaaaaggcg 16380
atttttccgc ctaaaactct ttaaaactta ttaaaactct taaaacccgc ctggcctgtg 16440
cataactgtc tggccagcgc acagccgaag agctgcaaaa agcgcctacc cttcggtcgc 16500
tgcgctccct acgccccgcc gcttcgcgtc ggcctatcgc ggccgctggc cgctcaaaaa 16560
tggctggcct acggccaggc aatctaccag ggcgcggaca agccgcgccg tcgccactcg 16620
accgccggcg cccacatcaa ggcaccctgc ctcgcgcgtt tcggtgatga cggtgaaaac 16680
ctctgacaca tgcagctccc ggagacggtc acagcttgtc tgtaagcgga tgccgggagc 16740
agacaagccc gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc agccatgacc 16800
cagtcacgta gcgatagcgg agtgtatact ggcttaacta tgcggcatca gagcagattg 16860
tactgagagt gcaccatatg cggtgtgaaa taccgcacag atgcgtaagg agaaaatacc 16920
gcatcaggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgc 16980
ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggata 17040
acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg 17100
cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct 17160
caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa 17220
gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc 17280
tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt 17340
aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg 17400
ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg 17460
cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct 17520
tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgc 17580
tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg 17640
ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc 17700
aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt 17760
aagggatttt ggtcatgcat tctaggtact aaaacaattc atccagtaaa atataatatt 17820
ttattttctc ccaatcaggc ttgatcccca gtaagtcaaa aaatagctcg acatactgtt 17880
cttccccgat atcctccctg atcgaccgga cgcagaaggc aatgtcatac cacttgtccg 17940
ccctgccgct tctcccaaga tcaataaagc cacttacttt gccatctttc acaaagatgt 18000
tgctgtctcc caggtcgccg tgggaaaaga caagttcctc ttcgggcttt tccgtcttta 18060
aaaaatcata cagctcgcgc ggatctttaa atggagtgtc ttcttcccag ttttcgcaat 18120
ccacatcggc cagatcgtta ttcagtaagt aatccaattc ggctaagcgg ctgtctaagc 18180
tattcgtata gggacaatcc gatatgtcga tggagtgaaa gagcctgatg cactccgcat 18240
acagctcgat aatcttttca gggctttgtt catcttcata ctcttccgag caaaggacgc 18300
catcggcctc actcatgagc agattgctcc agccatcatg ccgttcaaag tgcaggacct 18360
ttggaacagg cagctttcct tccagccata gcatcatgtc cttttcccgt tccacatcat 18420
aggtggtccc tttataccgg ctgtccgtca tttttaaata taggttttca ttttctccca 18480
ccagcttata taccttagca ggagacattc cttccgtatc ttttacgcag cggtattttt 18540
cgatcagttt tttcaattcc ggtgatattc tcattttagc catttattat ttccttcctc 18600
ttttctacag tatttaaaga taccccaaga agctaattat aacaagacga actccaattc 18660
actgttcctt gcattctaaa accttaaata ccagaaaaca gctttttcaa agttgttttc 18720
aaagttggcg tataacatag tatcgacgga gccgattttg aaaccgcggt gatcacaggc 18780
agcaacgctc tgtcatcgtt acaatcaaca tgctaccctc cgcgagatca tccgtgtttc 18840
aaacccggca gcttagttgc cgttcttccg aatagcatcg gtaacatgag caaagtctgc 18900
cgccttacaa cggctctccc gctgacgccg tcccggactg atgggctgcc tgtatcgagt 18960
ggtgattttg tgccgagctg ccggtcgggg agctgttggc tggct 19005
<210>2
<211>228
<212>PRT
<213>Artificial Sequence
<400>2
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His Ser
35 40 45
Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr Arg
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu Phe
100 105 110
Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr Ser Pro
145 150 155 160
Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg Leu
165 170 175
Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210>3
<211>1367
<212>PRT
<213>Artificial Sequence
<400>3
Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr
1025 1030 1035
Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn
1040 1045 1050
Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr
1055 1060 1065
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg
1070 1075 1080
Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu
1085 1090 1095
Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1100 1105 1110
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys
1115 1120 1125
Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu
1130 1135 1140
Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1145 1150 1155
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
1160 1165 1170
Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu
1175 1180 1185
Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe
1190 1195 1200
Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu
1205 1210 1215
Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn
1220 1225 1230
Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg
1265 1270 1275
Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr
1280 1285 1290
Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1295 1300 1305
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe
1310 1315 1320
Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
1325 1330 1335
Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1340 1345 1350
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210>4
<211>83
<212>PRT
<213>Artificial Sequence
<400>4
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1 5 10 15
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
20 25 30
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
35 40 45
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
50 55 60
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
65 70 75 80
Lys Met Leu
<210>5
<211>117
<212>DNA
<213>Artificial Sequence
<400>5
gactacaagg accacgacgg cgactacaag gatcatgaca tcgactacaa ggacgacgac 60
gacaagatgg ccccgaagaa gaagaggaaa gtgggcatcc acggcgtgcc ggccgcc 117
<210>6
<211>207
<212>DNA
<213>Artificial Sequence
<400>6
gactacaagg accacgacgg ggattacaaa gaccacgaca tagactacaa ggatgacgat 60
gacaaaatgg caccgaagaa aaaaaggaag gtcggcggct ccccgaagaa aaaaaggaag 120
gtcggcggct ccccgaagaa aaaaaggaag gtcggcggct ccccgaagaa aaaaaggaag 180
gtcggaatcc atggcgttcc agctgcc 207
<210>7
<211>17
<212>PRT
<213>Artificial Sequence
<400>7
Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg Lys
1 5 10 15
Val
<210>8
<211>31
<212>PRT
<213>Artificial Sequence
<400>8
Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp Tyr
1 5 10 15
Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val
20 25 30
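
The listing can be cross-checked against claim 2, which says that nuclear localization signal A (sequence 8, 31 residues) is encoded by positions 1 to 93 of sequence 5. The following is a minimal sketch of that check, not part of the patent. It assumes the standard genetic code, and the helper names (`translate`, `CODON_TABLE`, `THREE_TO_ONE`) are illustrative. Sequences 5 and 8 are copied from the listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the patent does not name a codon table, the standard one is used here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the residues that occur in sequence 8 are needed for this check.
THREE_TO_ONE = {
    "Asp": "D", "Tyr": "Y", "Lys": "K", "His": "H", "Gly": "G", "Ile": "I",
    "Met": "M", "Ala": "A", "Pro": "P", "Arg": "R", "Val": "V",
}

# Sequence 5 (117 nt) as printed in the listing, spaces included.
SEQ5 = (
    "gactacaagg accacgacgg cgactacaag gatcatgaca tcgactacaa ggacgacgac"
    "gacaagatgg ccccgaagaa gaagaggaaa gtgggcatcc acggcgtgcc ggccgcc"
)

# Sequence 8 (31 residues) as printed in the listing.
SEQ8 = (
    "Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp Tyr "
    "Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val"
)


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring spaces and case."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq5_clean = SEQ5.replace(" ", "")
nls_a_dna = seq5_clean[:93]  # positions 1 to 93 of sequence 5 (claim 2)
nls_a_protein = translate(nls_a_dna)
seq8_one_letter = "".join(THREE_TO_ONE[r] for r in SEQ8.split())

print(nls_a_protein)
print(nls_a_protein == seq8_one_letter)
```

The same arithmetic applies to nuclear localization signal B: sequence 7 has 17 residues, and the span 8647 to 8697 of sequence 1 named in claim 2 covers 8697 - 8647 + 1 = 51 nucleotides, that is, 17 codons.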

Claims (14)

1. A kit comprising an sgRNA, a Cas9 nuclease, a deaminase, a nuclear localization signal A, a nuclear localization signal B, and a UGI protein;
the Cas9 nuclease is a Cas9n protein;
the deaminase is the rAPOBEC1 protein;
the kit is obtained by expressing a recombinant expression vector, wherein the recombinant expression vector comprises an expression cassette consisting, in order, of a promoter, a gene encoding nuclear localization signal A, a gene encoding the cytosine deaminase rAPOBEC1, a gene encoding the Cas9n nuclease, a gene encoding UGI, a gene encoding nuclear localization signal B, and a terminator;
the nuclear localization signal A comprises one 3×Flag tag protein and one NLS protein, and the amino acid sequence of the nuclear localization signal A is shown as sequence 8;
the nuclear localization signal B comprises one bpNLS protein, and the amino acid sequence of the nuclear localization signal B is shown as sequence 7.
2. The kit of claim 1, wherein:
the nucleic acid molecule encoding the nuclear localization signal A is the DNA molecule shown at positions 1 to 93 of sequence 5 in the sequence listing;
the nucleic acid molecule encoding the nuclear localization signal B is the DNA molecule shown at positions 8647 to 8697 of sequence 1 in the sequence listing.
3. The kit of claim 1 or 2, wherein:
the amino acid sequence of the Cas9n protein is shown as sequence 3;
the amino acid sequence of the rAPOBEC1 protein is shown as sequence 2;
the sgRNA targets a target sequence;
the sgRNA is a tRNA-sgRNA;
the structure of the tRNA-sgRNA is, in order: the tRNA, the RNA transcribed from the target sequence, and the esgRNA backbone;
the tRNA is the RNA molecule obtained by replacing T with U in positions 474 to 550 of sequence 1;
the esgRNA backbone is the RNA molecule obtained by replacing T with U in positions 571 to 656 of sequence 1.
4. The kit of claim 3, wherein: the amino acid sequence of the UGI protein is shown as sequence 4.
5. Use of the kit of any one of claims 1 to 4 in any one of S1) to S4):
S1) editing a genomic target sequence of an organism or biological cell;
S2) preparing a product for editing a genomic target sequence of an organism or biological cell;
S3) improving the efficiency of editing a genomic target sequence of an organism or biological cell;
S4) preparing a product for improving the efficiency of editing a genomic target sequence of an organism or biological cell;
the organism is a plant; the biological cell is a plant cell.
6. Use according to claim 5, characterized in that: the target sequence is edited by mutating a base C into a base T.
7. Use according to claim 5 or 6, characterized in that: the plant is a monocotyledon or a dicotyledon;
the plant cell is a monocotyledon cell or a dicotyledon cell.
8. Use according to claim 7, characterized in that: the monocotyledon is a gramineous plant; the monocotyledon cell is a gramineous plant cell.
9. Use according to claim 8, characterized in that: the gramineous plant is rice; the gramineous plant cell is a rice cell.
10. The following T1) or T2):
T1) a method for editing a genomic target sequence of an organism or biological cell, or for increasing the efficiency of such editing, comprising the steps of: causing the organism or biological cell to express the nuclear localization signal A, the nuclear localization signal B, the sgRNA, the Cas9 nuclease and the deaminase as described in claims 1-5; the sgRNA targets the target sequence;
the organism or biological cell is caused to express the nuclear localization signal A, the nuclear localization signal B, the sgRNA, the Cas9 nuclease and the deaminase as described in claims 1-5 by introducing into it a gene encoding the nuclear localization signal A, a gene encoding the nuclear localization signal B, a DNA molecule transcribing the sgRNA, a gene encoding the Cas9 nuclease, a gene encoding the cytosine deaminase, and a gene encoding UGI;
the gene encoding the nuclear localization signal A, the gene encoding the nuclear localization signal B, the DNA molecule transcribing the sgRNA, the gene encoding the Cas9 nuclease, the gene encoding the cytosine deaminase and the gene encoding UGI are introduced into the organism or biological cell through a recombinant expression vector;
the recombinant expression vector comprises an expression cassette consisting, in order, of a promoter, a gene encoding nuclear localization signal A, a gene encoding the cytosine deaminase rAPOBEC1, a gene encoding the Cas9n nuclease, a gene encoding UGI, a gene encoding nuclear localization signal B, and a terminator;
T2) a method for preparing a biological mutant, comprising the following steps: editing the genome of the organism according to the method of T1) to obtain the biological mutant;
the organism is a plant; the biological cell is a plant cell.
11. The method of claim 10, wherein: the target sequence is edited by mutating a base C into a base T.
12. The method according to claim 10 or 11, characterized in that: the plant is a monocotyledon or a dicotyledon;
the plant cell is a monocotyledon cell or a dicotyledon cell.
13. The method of claim 12, wherein: the monocotyledon is a gramineous plant; the monocotyledon cell is a gramineous plant cell.
14. The method of claim 13, wherein: the gramineous plant is rice; the gramineous plant cell is a rice cell.
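
Claim 3 defines the tRNA-sgRNA as three RNA segments in a fixed order, each obtained from DNA by replacing T with U. The sketch below illustrates that assembly rule only. The three DNA strings are short hypothetical placeholders, not the actual positions 474 to 550 and 571 to 656 of sequence 1, and the function names are assumptions made for this example.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA form of a DNA string by replacing every T with U."""
    return dna.upper().replace("T", "U")


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based span such as 'positions 474 to 550'."""
    return end - start + 1


def assemble_trna_sgrna(trna_dna: str, target_dna: str, backbone_dna: str) -> str:
    """Concatenate tRNA, target-derived RNA and esgRNA backbone, in that order."""
    return "".join(dna_to_rna(part) for part in (trna_dna, target_dna, backbone_dna))


# Hypothetical placeholder segments, for illustration only.
TRNA_DNA = "GGTACC"
TARGET_DNA = "ACGTTGCA"
BACKBONE_DNA = "GTTTTAGA"

transcript = assemble_trna_sgrna(TRNA_DNA, TARGET_DNA, BACKBONE_DNA)
print(transcript)

# Segment lengths implied by the coordinates given in claim 3.
print(span_length(474, 550), span_length(571, 656))
```

With the coordinates in claim 3, the tRNA segment is 77 nucleotides long and the esgRNA backbone is 86 nucleotides long.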
CN201911323189.7A 2019-12-20 2019-12-20 Nuclear localization signal FNB and application thereof in improving base editing efficiency Active CN110964741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911323189.7A CN110964741B (en) 2019-12-20 2019-12-20 Nuclear localization signal FNB and application thereof in improving base editing efficiency

Publications (2)

Publication Number Publication Date
CN110964741A CN110964741A (en) 2020-04-07
CN110964741B CN110964741B (en) 2022-03-01

Family

ID=70035435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911323189.7A Active CN110964741B (en) 2019-12-20 2019-12-20 Nuclear localization signal FNB and application thereof in improving base editing efficiency

Country Status (1)

Country Link
CN (1) CN110964741B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019042284A1 (en) * 2017-09-01 2019-03-07 Shanghaitech University Fusion proteins for improved precision in base editing
CN109957569A (en) * 2017-12-22 2019-07-02 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences Base editing system and method based on Cpf1 protein
KR20190122596A (en) * 2018-04-20 2019-10-30 Institute for Basic Science Gene Construct for Base Editing, Vector Comprising the Same and Method for Base Editing Using the Same
CN110551752A (en) * 2019-08-30 2019-12-10 Beijing Academy of Agriculture and Forestry Sciences xCas9n-epBE base editing system and application thereof in genome base replacement
CN110564752A (en) * 2019-09-30 2019-12-13 Beijing Academy of Agriculture and Forestry Sciences Application of differential agent technology in enrichment of C.T base substitution cells

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Genome-wide target specificities of CRISPR RNA-guided programmable deaminases; Daesik Kim et al.; Nat Biotechnol; 20170410; Vol. 35, No. 5; pp. 475-480 *
Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction; Luke W Koblan et al.; Nat Biotechnol; 20181031; Vol. 36, No. 9; pp. 843-846 *
Optimized base editors enable efficient editing in cells, organoids and mice; Maria Paz Zafra et al.; Nat Biotechnol; 20180703; Vol. 36, No. 9; pp. 888-893 *
Research progress of base editing systems; Zong Yuan et al.; Yi Chuan (Hereditas); 20190930; Vol. 41, No. 9; pp. 777-800 *

Also Published As

Publication number Publication date
CN110964741A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN108368517B (en) Methods and compositions for rapid plant transformation
KR101222628B1 (en) Method for modulating gene expression by modifying the CpG content
CN109456973A (en) Application of the SpCas9n&PmCDA1&UGI base editing system in plant gene editor
CN105838733A (en) Cas9 mediated carnation gene editing carrier and application
CN109679949B (en) Breeding method for regulating miR156 and target gene IPA1 thereof and simultaneously improving disease resistance and yield of rice
CN111909953B (en) Recombinant vector for Phellinus linteus genetic expression, construction method and genetic transformation method
CN110951736B (en) Nuclear localization signal F4NLS and application thereof in improving base editing efficiency and expanding editable base range
CN107326043B (en) Construction and use method of multifunctional vector
CN108165579B (en) Optimized method for identifying VIGS silencing system of China rose RhPDS gene
CN113512562B (en) Method for improving plant stress resistance and yield by heterogeneously synthesizing gamma-polyglutamic acid in plant
CN110964741B (en) Nuclear localization signal FNB and application thereof in improving base editing efficiency
CN101818151B (en) Specific promoter of soybean seeds and use thereof
CN114774427B (en) Recombinant gene for improving luteolin content in honeysuckle and application thereof
CN110923235B (en) Non-coding gene for controlling corn grain filling and application thereof
CN112708633B (en) CRISPR-Cas9 gene editing system containing corn seed fluorescent reporter group and application
CN110872584B (en) Barley alpha-amylase and coding gene and application thereof
CN106755059B (en) Backbone plasmid vector for genetic engineering and application
CN113122556B (en) Oscillating gene expression system, construction method and application thereof in rhamnolipid fermentation
CN112011563A (en) AI-2-based quorum sensing self-induced protein expression vector and application
CN110257444B (en) Method for producing medium-chain fatty acid in plant cells
CN109797165B (en) Method for improving yield of dibasic acid by traceless editing technology
CN110923262B (en) Sorghum alpha-amylase and coding gene and application thereof
CN114457082B (en) Pepper NaCl-induced promoter, recombinant vector and application thereof
CN114561388B (en) Exogenous ABA (abscisic acid) inducible promoter of capsicum, expression vector and application of exogenous ABA inducible promoter
AU2013227286A1 (en) Expression cassettes for stress-induced gene expression in plants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant