US20090068732A1 - Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby
- Publication number
- US20090068732A1 (application US11/900,551)
- Authority
- US
- United States
- Prior art keywords
- protein
- folding
- nucleic acid
- fluorescent protein
- polypeptide
- Prior art date
- Legal status
- Abandoned
- 108010093031 Galactosidases Proteins 0.000 description 1
- 108050002220 Green fluorescent protein, GFP Proteins 0.000 description 1
- HVLSXIKZNLPZJJ-TXZCQADKSA-N HA peptide Chemical compound C([C@@H](C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](C)C(O)=O)NC(=O)[C@H]1N(CCC1)C(=O)[C@@H](N)CC=1C=CC(O)=CC=1)C1=CC=C(O)C=C1 HVLSXIKZNLPZJJ-TXZCQADKSA-N 0.000 description 1
- 241001517044 Heteractis magnifica Species 0.000 description 1
- 101000714692 Homo sapiens Calmodulin-like protein 3 Proteins 0.000 description 1
- 101000932590 Homo sapiens Cytosolic carboxypeptidase 4 Proteins 0.000 description 1
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 1
- 102000004867 Hydro-Lyases Human genes 0.000 description 1
- 108090001042 Hydro-Lyases Proteins 0.000 description 1
- PMMYEEVYMWASQN-DMTCNVIQSA-N Hydroxyproline Chemical compound O[C@H]1CN[C@H](C(O)=O)C1 PMMYEEVYMWASQN-DMTCNVIQSA-N 0.000 description 1
- 206010020751 Hypersensitivity Diseases 0.000 description 1
- 102000001706 Immunoglobulin Fab Fragments Human genes 0.000 description 1
- 108010054477 Immunoglobulin Fab Fragments Proteins 0.000 description 1
- 108010067060 Immunoglobulin Variable Region Proteins 0.000 description 1
- 101710120978 Kanamycin resistance protein Proteins 0.000 description 1
- 125000000998 L-alanino group Chemical group [H]N([*])[C@](C([H])([H])[H])([H])C(=O)O[H] 0.000 description 1
- UKAUYVFTDYCKQA-VKHMYHEASA-N L-homoserine Chemical group OC(=O)[C@@H](N)CCO UKAUYVFTDYCKQA-VKHMYHEASA-N 0.000 description 1
- QEFRNWWLZKMPFJ-ZXPFJRLXSA-N L-methionine (R)-S-oxide Chemical group C[S@@](=O)CC[C@H]([NH3+])C([O-])=O QEFRNWWLZKMPFJ-ZXPFJRLXSA-N 0.000 description 1
- QEFRNWWLZKMPFJ-UHFFFAOYSA-N L-methionine sulphoxide Chemical group CS(=O)CCC(N)C(O)=O QEFRNWWLZKMPFJ-UHFFFAOYSA-N 0.000 description 1
- 125000000205 L-threonino group Chemical group [H]OC(=O)[C@@]([H])(N([H])[*])[C@](C([H])([H])[H])([H])O[H] 0.000 description 1
- GUBGYTABKSRVRQ-QKKXKWKRSA-N Lactose Natural products OC[C@H]1O[C@@H](O[C@H]2[C@H](O)[C@@H](O)C(O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@H]1O GUBGYTABKSRVRQ-QKKXKWKRSA-N 0.000 description 1
- 241000668842 Lepidosaphes gloverii Species 0.000 description 1
- 101710175625 Maltose/maltodextrin-binding periplasmic protein Proteins 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 108010006519 Molecular Chaperones Proteins 0.000 description 1
- 241001507045 Montipora efflorescens Species 0.000 description 1
- 241000699660 Mus musculus Species 0.000 description 1
- 101001033003 Mus musculus Granzyme F Proteins 0.000 description 1
- 101710135898 Myc proto-oncogene protein Proteins 0.000 description 1
- 102100038895 Myc proto-oncogene protein Human genes 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 241001631646 Papillomaviridae Species 0.000 description 1
- 108010087702 Penicillinase Proteins 0.000 description 1
- 241000242751 Pennatulacea Species 0.000 description 1
- 102000057297 Pepsin A Human genes 0.000 description 1
- 108090000284 Pepsin A Proteins 0.000 description 1
- 206010035226 Plasma cell myeloma Diseases 0.000 description 1
- 241000276498 Pollachius virens Species 0.000 description 1
- 101710182846 Polyhedrin Proteins 0.000 description 1
- 101710101148 Probable 6-oxopurine nucleoside phosphorylase Proteins 0.000 description 1
- 108700040121 Protein Methyltransferases Proteins 0.000 description 1
- 102000055027 Protein Methyltransferases Human genes 0.000 description 1
- 241000589776 Pseudomonas putida Species 0.000 description 1
- 102000030764 Purine-nucleoside phosphorylase Human genes 0.000 description 1
- 241000242739 Renilla Species 0.000 description 1
- 241000242743 Renilla reniformis Species 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 102000002278 Ribosomal Proteins Human genes 0.000 description 1
- 108010000605 Ribosomal Proteins Proteins 0.000 description 1
- 241000220317 Rosa Species 0.000 description 1
- 241000714474 Rous sarcoma virus Species 0.000 description 1
- 241000607142 Salmonella Species 0.000 description 1
- 241000242732 Scleractinia Species 0.000 description 1
- 108090000787 Subtilisin Proteins 0.000 description 1
- 101710099760 Tetracycline resistance protein Proteins 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 101710150448 Transcriptional regulator Myc Proteins 0.000 description 1
- 241000251555 Tunicata Species 0.000 description 1
- 241000387514 Waldo Species 0.000 description 1
- 241001512733 Zoanthus sp. Species 0.000 description 1
- 238000000862 absorption spectrum Methods 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 239000003463 adsorbent Substances 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 102000012086 alpha-L-Fucosidase Human genes 0.000 description 1
- 108010061314 alpha-L-Fucosidase Proteins 0.000 description 1
- 125000003277 amino group Chemical group 0.000 description 1
- BFNBIHQBYMNNAN-UHFFFAOYSA-N ammonium sulfate Chemical compound N.N.OS(O)(=O)=O BFNBIHQBYMNNAN-UHFFFAOYSA-N 0.000 description 1
- 229910052921 ammonium sulfate Inorganic materials 0.000 description 1
- 238000012870 ammonium sulfate precipitation Methods 0.000 description 1
- 235000011130 ammonium sulphate Nutrition 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000003466 anti-cipated effect Effects 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- 239000007864 aqueous solution Substances 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 108091008324 binding proteins Proteins 0.000 description 1
- 239000011942 biocatalyst Substances 0.000 description 1
- 230000008436 biogenesis Effects 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 229960000074 biopharmaceutical Drugs 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 1
- UHBYWPGGCSDKFX-UHFFFAOYSA-N carboxyglutamic acid Chemical compound OC(=O)C(N)CC(C(O)=O)C(O)=O UHBYWPGGCSDKFX-UHFFFAOYSA-N 0.000 description 1
- 238000012219 cassette mutagenesis Methods 0.000 description 1
- 108020001778 catalytic domains Proteins 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000010261 cell growth Effects 0.000 description 1
- 239000013592 cell lysate Substances 0.000 description 1
- 238000009614 chemical analysis method Methods 0.000 description 1
- 238000003889 chemical engineering Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 238000002983 circular dichroism Methods 0.000 description 1
- 238000004440 column chromatography Methods 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000002425 crystallisation Methods 0.000 description 1
- 230000008025 crystallization Effects 0.000 description 1
- 108010082025 cyan fluorescent protein Proteins 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 125000000151 cysteine group Chemical group N[C@@H](CS)C(=O)* 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- 230000001086 cytosolic effect Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000000368 destabilizing effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000009274 differential gene expression Effects 0.000 description 1
- 238000002050 diffraction method Methods 0.000 description 1
- 230000006334 disulfide bridging Effects 0.000 description 1
- VHJLVAABSRFDPM-QWWZWVQMSA-N dithiothreitol Chemical compound SC[C@@H](O)[C@H](O)CS VHJLVAABSRFDPM-QWWZWVQMSA-N 0.000 description 1
- PMMYEEVYMWASQN-UHFFFAOYSA-N dl-hydroxyproline Natural products OC1C[NH2+]C(C([O-])=O)C1 PMMYEEVYMWASQN-UHFFFAOYSA-N 0.000 description 1
- 238000013551 empirical research Methods 0.000 description 1
- 108010048367 enhanced green fluorescent protein Proteins 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000010429 evolutionary process Effects 0.000 description 1
- 238000000695 excitation spectrum Methods 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 102000034356 gene-regulatory proteins Human genes 0.000 description 1
- 108091006104 gene-regulatory proteins Proteins 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 239000011544 gradient gel Substances 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- 238000003306 harvesting Methods 0.000 description 1
- 238000004128 high performance liquid chromatography Methods 0.000 description 1
- 235000014304 histidine Nutrition 0.000 description 1
- 125000000487 histidyl group Chemical group [H]N([H])C(C(=O)O*)C([H])([H])C1=C([H])N([H])C([H])=N1 0.000 description 1
- 102000051644 human CALML3 Human genes 0.000 description 1
- 125000004435 hydrogen atom Chemical group [H]* 0.000 description 1
- 230000002209 hydrophobic effect Effects 0.000 description 1
- 229960002591 hydroxyproline Drugs 0.000 description 1
- 229940072221 immunoglobulins Drugs 0.000 description 1
- 238000003364 immunohistochemistry Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000017730 intein-mediated protein splicing Effects 0.000 description 1
- 230000003834 intracellular effect Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 239000008101 lactose Substances 0.000 description 1
- 239000012160 loading buffer Substances 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 229920002521 macromolecule Polymers 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000008018 melting Effects 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- LSDPWZHWYPCBBB-UHFFFAOYSA-O methylsulfide anion Chemical compound [SH2+]C LSDPWZHWYPCBBB-UHFFFAOYSA-O 0.000 description 1
- 230000033607 mismatch repair Effects 0.000 description 1
- 108091005601 modified peptides Proteins 0.000 description 1
- 238000000302 molecular modelling Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 201000000050 myeloid neoplasm Diseases 0.000 description 1
- 239000006225 natural substrate Substances 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 101150045642 nirD gene Proteins 0.000 description 1
- MGFYIUFZLHCRTH-UHFFFAOYSA-N nitrilotriacetic acid Chemical compound OC(=O)CN(CC(O)=O)CC(O)=O MGFYIUFZLHCRTH-UHFFFAOYSA-N 0.000 description 1
- 238000007826 nucleic acid assay Methods 0.000 description 1
- 238000011275 oncology therapy Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000003647 oxidation Effects 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 229950009506 penicillinase Drugs 0.000 description 1
- 229940111202 pepsin Drugs 0.000 description 1
- 230000000144 pharmacologic effect Effects 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 1
- 150000008298 phosphoramidates Chemical class 0.000 description 1
- BZQFBWGGLXLEPQ-REOHCLBHSA-N phosphoserine Chemical compound OC(=O)[C@@H](N)COP(O)(O)=O BZQFBWGGLXLEPQ-REOHCLBHSA-N 0.000 description 1
- 238000002264 polyacrylamide gel electrophoresis Methods 0.000 description 1
- 108010077051 polycysteine Proteins 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 210000004896 polypeptide structure Anatomy 0.000 description 1
- 108010051566 polysulfide reductase Proteins 0.000 description 1
- 230000001323 posttranslational effect Effects 0.000 description 1
- 239000002244 precipitate Substances 0.000 description 1
- 239000002243 precursor Substances 0.000 description 1
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 210000001236 prokaryotic cell Anatomy 0.000 description 1
- 125000001500 prolyl group Chemical group [H]N1C([H])(C(=O)[*])C([H])([H])C([H])([H])C1([H])[H] 0.000 description 1
- 230000000069 prophylactic effect Effects 0.000 description 1
- 235000019833 protease Nutrition 0.000 description 1
- 230000004952 protein activity Effects 0.000 description 1
- 238000003157 protein complementation Methods 0.000 description 1
- 230000012846 protein folding Effects 0.000 description 1
- 230000026447 protein localization Effects 0.000 description 1
- 239000012474 protein marker Substances 0.000 description 1
- 230000017854 proteolysis Effects 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 239000011347 resin Substances 0.000 description 1
- 229920005989 resin Polymers 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 229920002477 rna polymer Polymers 0.000 description 1
- 102200142969 rs398124653 Human genes 0.000 description 1
- 238000003118 sandwich ELISA Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 230000007928 solubilization Effects 0.000 description 1
- 238000005063 solubilization Methods 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 230000009870 specific binding Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 238000004611 spectroscopical analysis Methods 0.000 description 1
- 230000002269 spontaneous effect Effects 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
- 239000007858 starting material Substances 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000004083 survival effect Effects 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 229940095064 tartrate Drugs 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- FGMPLJWBKKVCDB-UHFFFAOYSA-N trans-L-hydroxy-proline Natural products ON1CCCC1C(O)=O FGMPLJWBKKVCDB-UHFFFAOYSA-N 0.000 description 1
- 230000005026 transcription initiation Effects 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 238000011830 transgenic mouse model Methods 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
- 238000001262 western blot Methods 0.000 description 1
- 101150038987 xylR gene Proteins 0.000 description 1
- 210000005253 yeast cell Anatomy 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/43504—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates
- C07K14/43595—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates from coelenteratae, e.g. medusae
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K16/00—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/531—Production of immunochemical test materials
- G01N33/532—Production of labelled immunochemicals
- G01N33/533—Production of labelled immunochemicals with fluorescent label
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/58—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances
- G01N33/582—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances with fluorescent label
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2317/00—Immunoglobulins specific features
- C07K2317/50—Immunoglobulins specific features characterized by immunoglobulin fragments
- C07K2317/56—Immunoglobulins specific features characterized by immunoglobulin fragments variable (Fv) region, i.e. VH and/or VL
- C07K2317/565—Complementarity determining region [CDR]
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
Definitions
- Protein insolubility constitutes a significant problem in basic and applied bioscience, in many situations limiting the rate of progress in these areas. Protein folding and solubility have been the subject of considerable theoretical and empirical research. However, there still exists no general method for improving intrinsic protein solubility. Such a method would greatly facilitate protein structure-function studies, drug design, de novo peptide and protein design and associated structure-function studies, industrial process optimization using bioreactors and microorganisms, and many disciplines in which a process or application depends on the ability to tailor or improve the solubility of proteins, screen or modify the solubility of large numbers of unique proteins about which little or no structure-function information is available, or adapt the solubility of proteins to new environments when the structure and function of the protein(s) are poorly understood or unknown.
- a second set of approaches changes the sequence of the expressed protein.
- Rational approaches employ site-directed mutation of key residues to improve protein stability and solubility. Alternatively, a smaller, more soluble fragment of the protein may be expressed. These approaches require a priori knowledge about the structure of the protein, knowledge which is generally unavailable when the protein is insoluble. Furthermore, rational design approaches are best applied when the problem involves only a small number of amino-acid changes. Finally, even when the structure is known, the changes required to improve solubility may be unclear. Thus, many thousands of possible combinations of mutations may have to be investigated leading to what is essentially an “irrational” or random mutagenesis approach. Such an approach requires a method for rapidly determining the solubility of each version.
- Random or “irrational” mutagenesis redesign of protein solubility carries the possibility that the native function of the protein may be destroyed or modified by the inadvertent mutation of residues which are important for function, but not necessarily related to solubility.
- protein solubility is strongly influenced by interaction with the environment through surface amino acid residues, while catalytic activities and/or small substrate recognition often involve partially buried or cleft residues distant from the surface residues.
- rational mutation of proteins has demonstrated that the solubility of a protein can be modified without destroying the native function of the protein. Modification of the function of a protein without affecting its solubility has also been frequently observed.
- Wild type green fluorescent protein (GFP) cloned from Aequorea victoria normally misfolds and is poorly fluorescent when overexpressed in the heterologous host E. coli . It is found predominantly in the inclusion body fraction of cell lysates. The misfolding is incompletely understood, but is thought to result from the increased expression level or rate in E. coli , or the inadequacy of the bacterial chaperone and related folding machinery under conditions of overexpression. The folding yield also decreases dramatically at higher temperatures (37° C. vs. 27° C.). This wild type GFP is a very poor folder, as it is extremely sensitive to the expression environment.
- Green fluorescent protein has become a widely used reporter of gene expression and regulation.
- DNA shuffling has been used to obtain a mutant having a whole cell fluorescence 45 times greater than the standard, commercially available plasmid GFP. See, e.g., “Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling,” by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996).
- the screening process optimizes the function of GFP (green fluorescence), and thus uses a functional screen.
- the bacteria under the control of a T7 promoter, and that the bacteria contained inclusion bodies consisting of protein indistinguishable from jellyfish or soluble recombinant protein on denaturing gels, but that this material was completely nonfluorescent, lacked the visible absorbance bands of the chromophore, and did not become fluorescent when solubilized and subjected to protocols that renature GFP, as opposed to the soluble GFP in the bacteria which undergoes correct folding and, therefore, fluoresces.
- the recognition was linked to cell survival (binding of the antibody to a selectable protein marker, which was an antigen for the antibody of interest, providing selection for functional antibodies); in the case of phage displayed antibodies without disulfide bonds, the recognition was transduced to successful binding of the displayed phage to the target antigen of the displayed antibody in a biopanning protocol.
- An apparent increase in the amount of protein expressed in the soluble fraction relative to the unselected target proteins was noted upon expression of the proteins in E. coli .
- the apparent increase in activity of desirable mutants during the evolution was due at least in part to an increase in the number of correctly folded (and hence functional) protein molecules, and not exclusively to an increase in the specific activity of a given protein molecule.
- the driving force for the selection or screening process during the directed evolution depended on the functionality of (and functional assay for) the protein of interest.
- Functional assays therefore lack the generality needed to identify proteins which are soluble, or to find genetic variants (mutants and fragments) of proteins with improved solubility, in a high-throughput manner for proteomics or functional genomics wherein large numbers of different proteins about which little or no functional/structural information is known, are to be solubly expressed.
- GFP3 contains the mutations F99S, M153T and V163A.
- GFP3 is relatively insensitive to the expression environment and folds well in a wide variety of hosts, including E. coli .
- GFP3 folds equally well at 27° C. and 37° C.
- the GFP3 mutations also appear to eliminate potential temperature sensitive folding intermediates that occur during folding of wild type GFP.
- GFP3 can be made to misfold by expression as a fusion protein with another poorly folded polypeptide.
- GFP3 has been used to report on the “folding robustness” of N-terminally fused proteins during expression in E. coli (Waldo et al., Nat. Biotechnol. 17:691-695, 1999). If a test protein, Xi, misfolds and is insoluble when expressed in E. coli , cells expressing the corresponding fusion protein Xi-L-GFP3 (where L is a small flexible linker) are poorly fluorescent, indicating the high probability of failure of the GFP3 to fold and become fluorescent. On the other hand, when protein Xs folds well and is highly soluble when expressed in E.
- the present invention provides directed evolution methods for improving the folding and solubility characteristics of polypeptides.
- a number of fluorescent proteins having improved solubility and folding characteristics are provided, including superfolder GFP and DsRed fluorescent proteins.
- FIG. 1 Normalized whole cell fluorescence for E. coli BL21(DE3) expressing GFP variants as C-terminal fusions with poorly-folded bullfrog red cell H-subunit ferritin (bracketed). Expression at 37° C. (black) and 27° C. (grey). GFP variants (left to right): cycle-3 redshift, 6 single point mutants, superfolder (left, bracketed). Non-fusion GFP variants (cycle-3 redshift and superfolder, right) as reference. Note that the fluorescence of the optimized superfolder fused to ferritin is essentially identical to the non-fusion cycle-3 redshift GFP. In contrast, cycle-3 redshift GFP fused to ferritin is poorly folded (far left). As expected, the fluorescence is higher at 27° C. relative to 37° C., consistent with the improved folding at lower temperature.
- FIG. 2 Proteins from Pyrobaculum aerophilum expressed in Escherichia coli as N-terminal fusions with either cycle-3 GFP redshift (lower line, triangles) or superfolder GFP (upper line, circles).
- Y-axis whole cell
- FIG. 3 Tolerance of GFP to urea-induced unfolding during refolding from fully-denatured state.
- GFP variants unfolded in 9 M urea at 95° C. were refolded by rapidly diluting into TRIS buffer containing the indicated final concentration of urea (x-axis).
- Fraction of folded protein is determined by fraction of fluorescence recovered (y-axis) at indicated concentration of urea in the refolding buffer (x-axis).
- FIG. 4A Long-term progress curves during refolding of superfolder GFP (SF-GFP) and cycle-3 redshift GFP (C3-GFP). Fully denatured proteins were diluted 100-fold into TRIS buffer (100 mM TRIS-HCl pH 7.5, 150 mM NaCl, 10% v/v glycerol) and the fluorescence measured at 0.2 s intervals with a Perkin Elmer spectrofluorimeter. Note that after 10000 s, both proteins approach the same final value (ca. 375 units).
- FIG. 4B Initial rate progress curves during refolding of superfolder GFP (SF-GFP) and cycle-3 redshift GFP (C3-GFP). Fully denatured proteins were diluted 100-fold into TRIS buffer (100 mM TRIS-HCl pH 7.5, 150 mM NaCl, 10% v/v glycerol) and the fluorescence measured at 0.2 s intervals with a Perkin Elmer spectrofluorimeter. Initial rates were determined by fitting a 4th order polynomial to the first 40 s of each progress curve, and converted to pseudo first-order rates by normalizing to the fluorescence at infinite time (ca. 375 units). The superfolder refolds ca. 7 times faster than cycle-3 redshift.
- TRIS buffer: 100 mM TRIS-HCl pH 7.5, 150 mM NaCl, 10% v/v glycerol
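The initial-rate analysis described for FIG. 4B can be sketched as follows. This is a minimal illustration, not the procedure actually used to produce the figure: the function names (`solve_linear`, `pseudo_first_order_rate`), the least-squares implementation, and the synthetic single-exponential progress curve with rate constant `k_true` are all assumptions introduced here. Only the 4th order polynomial, the 40 s fitting window, the 0.2 s sampling interval, and the final value of ca. 375 units come from the text.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def pseudo_first_order_rate(times, fluor, f_inf, window=40.0, degree=4):
    """Fit a polynomial to the first `window` seconds of a progress curve
    and return the slope at t = 0 divided by the final fluorescence."""
    # Work in scaled time u = t / window to keep the fit well conditioned.
    pts = [(t / window, f) for t, f in zip(times, fluor) if t <= window]
    n = degree + 1
    # Normal equations of the least-squares polynomial fit.
    ata = [[sum(u ** (i + j) for u, _ in pts) for j in range(n)] for i in range(n)]
    aty = [sum(f * u ** i for u, f in pts) for i in range(n)]
    coeffs = solve_linear(ata, aty)
    initial_slope = coeffs[1] / window  # dF/dt at t = 0, in units per second
    return initial_slope / f_inf


# Synthetic example (assumed values): a single-exponential refolding curve.
k_true = 0.01   # hypothetical rate constant, 1/s
f_inf = 375.0   # fluorescence at infinite time, from the text
times = [0.2 * i for i in range(201)]  # 0 to 40 s at 0.2 s intervals
fluor = [f_inf * (1.0 - math.exp(-k_true * t)) for t in times]
rate = pseudo_first_order_rate(times, fluor, f_inf)
```

For a curve that really is single-exponential, the value returned should be close to the rate constant used to generate it. Applying the same function to the two measured progress curves and taking the ratio of the results would give a fold difference of the kind quoted for superfolder versus cycle-3 redshift.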
- FIG. 5 Increased solubility of superfolder mutant pool (right) versus cycle-3 redshift mutant pool (left). SDS-PAGE of (left to right) 10 kD molecular weight standard (M), soluble (S) and pellet (P) fractions of cycle-3 redshift mutant pool (C3-GFP) and superfolder mutant pool (SF-GFP) expressed at 37° C., 10 kD molecular weight standard (M).
- M: 10 kD molecular weight standard
- S: soluble
- P: pellet
- C3-GFP: cycle-3 redshift mutant pool
- SF-GFP: superfolder mutant pool
- the superfolder (right) has a higher proportion of soluble protein compared to the cycle-3 redshift (left), consistent with the improved folding of superfolder GFP.
- FIG. 6 Flow cytometric analyses of cycle-3 redshift mutant pool library (grey) or control parental cycle-3 redshift (dark grey). Number of events (cells) y-axis; fluorescence intensity of each event (x-axis). Note the logarithmic fluorescence scale.
- FIG. 7 Flow cytometric analyses of superfolder mutant pool library (grey) or control parental superfolder variant (dark grey). Number of events (cells) y-axis; fluorescence intensity of each event (x-axis). Note the logarithmic fluorescence scale.
- FIG. 8 Solubility of various circular permutants expressed in BL21(DE3) at 37° C. of cycle-3 redshift (black) and superfolder GFP (grey). Normal, non-permutated variants (control). Y-axis, fraction soluble determined by SDS-PAGE densitometry. X-axis, indicated circular permutant (see Table 1 for new starting codon position). As expected, the superfolder is more tolerant to circular permutation (as evidenced by the higher solubility) compared to cycle-3 redshift.
- FIG. 9 Whole-cell fluorescence at 37° C. for BL21(DE3) expressing various circular permutants of cycle-3 redshift (black) and superfolder GFP (grey). Fluorescence (488 nm ex/520 nm em) normalized by culture density (absorbance at 600 nm). Normal, non-permutated variants (control). Y-axis, normalized whole cell-fluorescence. X-axis, indicated circular permutant (see Table 1 for new starting codon position). As expected, the superfolder is more tolerant to circular permutation (as evidenced by the higher fluorescence) compared to cycle-3 redshift.
- FIG. 10 Whole-cell fluorescence at 37° C. for BL21(DE3) expressing dsRED variants as C-terminal fusions with poorly-folded bullfrog red-cell H ferritin. Left to right: starting variant (wt); pools of top 10 optima from each round of directed evolution (rounds 1 to 5); non-fusion starting variant (non fusion). Fluorescence (580 nm ex/610 nm em) normalized by culture density (absorbance at 600 nm). As expected, the folding of superfolder dsRED (round 5) is more tolerant to fused upstream misfolded bullfrog red-cell H-ferritin compared to the starting (wt) variant.
- the current invention provides polypeptides with improved folding activity and/or solubility, including superfolding variants of the Aequorea victoria Green Fluorescent Protein and Discosoma sp. Red Fluorescent Protein, and methods of obtaining such polypeptides.
- fluorescent protein as used herein is a protein that has intrinsic fluorescence.
- a fluorescent protein has a structure that includes an 11-stranded beta-barrel.
- chromophoric protein or “chromoprotein” are used interchangeably and refer to a class of proteins, recently identified from various corals, anemones and other sea organisms, which have intrinsic color and, in some cases, variable degrees of intrinsic or inducible fluorescence.
- a chromo-protein has a structure similar to the fluorescent proteins, i.e., an 11-stranded beta-barrel.
- MMDB Id: 5742 structure refers to the GFP structure disclosed by Ormo & Remington, MMDB Id: 5742, in the Molecular Modeling Database (MMDB), PDB Id: 1EMA PDB Authors: M. Ormo & S. J. Remington PDB Deposition: 1 Aug. 96 PDB Class: Fluorescent Protein PDB Title: Green Fluorescent Protein From Aequorea Victoria .
- PDB Protein Data Bank
- RMSD Root mean square deviation
- a “folding interference domain” as used herein refers to a domain that interferes with the folding of a polypeptide (“Xid”).
- the presence of a folding interference domain in a fusion protein of a polypeptide of interest should detectably interfere with folding, as measured by any criteria capable of discriminating between better and poorer folded versions of the polypeptide of interest, P, within the context of a fusion with Xid.
- the folding interference domain need not be misfolded itself. In fact, it may not actually be folded at all, and it might be soluble or it might be insoluble.
- Domain refers to a unit of a protein or protein complex, comprising a polypeptide subsequence, a complete polypeptide sequence, or a plurality of polypeptide sequences where that unit has a defined function.
- the function is understood to be broadly defined and can be ligand binding, catalytic activity or can have a stabilizing effect on the structure of the protein.
- “Join” or “link” refers to any method known in the art for functionally connecting protein domains, including without limitation recombinant fusion with or without intervening domains; intein-mediated fusion; non-covalent association; and covalent bonding, including disulfide bonding; hydrogen bonding; electrostatic bonding; and conformational bonding, e.g., antibody-antigen, and biotin-avidin associations.
- a “fusion protein” refers to a chimeric molecule formed by the joining of two or more polypeptides through a bond formed between one polypeptide and another polypeptide. Fusion proteins may also contain a linker polypeptide in between the constituent polypeptides of the fusion protein.
- the term “fusion construct” or “fusion protein construct” is generally meant to refer to a polynucleotide encoding a fusion protein.
- heterologous when used with reference to portions of a nucleic acid indicates that the nucleic acid comprises two or more subsequences that are not found in the same relationship to each other in nature.
- a nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a nucleic acid encoding a fluorescent protein from one source and a nucleic acid encoding a peptide sequence from another source.
- a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
- a “reporter molecule” has a detectable phenotype.
- the reporter molecule is a polypeptide, such as an enzyme, or a fluorescent polypeptide.
- a reporter polypeptide may have intrinsic activity.
- a reporter molecule has a detectable phenotype associated with correct folding or solubility of the reporter molecule.
- the reporter could be an enzyme or a fluorescent polypeptide.
- the detectable phenotype would then be the ability to turn over a substrate giving a detectable product or change in substrate concentration or physical state.
- the activity would be the emission of fluorescence upon excitation by the appropriate wavelength(s) of light.
- The term “isolated,” when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It is preferably in a homogeneous state although it can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein which is the predominant species present in a preparation is substantially purified. In particular, an isolated gene is separated from open reading frames which flank the gene and encode a protein other than the gene of interest. The term “purified” denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least 85% pure, more preferably at least 95% pure, and most preferably at least 99% pure.
- Nucleic acid refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form.
- the term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs).
- nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated.
- degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
- nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.
- The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
- amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
- Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
- Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
- Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions in a manner similar to a naturally occurring amino acid.
- Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
- peptidomimetic and “mimetic” refer to a synthetic chemical compound that has substantially the same structural and functional characteristics of the polypeptides of the invention.
- Peptide analogs are commonly used in the pharmaceutical industry as non-peptide drugs with properties analogous to those of the template peptide. These types of non-peptide compound are termed “peptide mimetics” or “peptidomimetics” (Fauchere, J. Adv. Drug Res. 15:29 (1986); Veber and Freidinger TINS p. 392 (1985); and Evans et al. J. Med. Chem. 30:1229 (1987), which are incorporated herein by reference).
- Peptide mimetics that are structurally similar to therapeutically useful peptides may be used to produce an equivalent or enhanced therapeutic or prophylactic effect.
- peptidomimetics are structurally similar to a paradigm polypeptide (i.e., a polypeptide that has a biological or pharmacological activity), but have one or more peptide linkages optionally replaced by a linkage selected from the group consisting of, e.g., —CH2NH—, —CH2S—, —CH2-CH2-, —CH=CH— (cis and trans), —COCH2-, —CH(OH)CH2-, and —CH2SO—.
- the mimetic can be either entirely composed of synthetic, non-natural analogues of amino acids, or, is a chimeric molecule of partly natural peptide amino acids and partly non-natural analogs of amino acids.
- the mimetic can also incorporate any amount of natural amino acid conservative substitutions as long as such substitutions also do not substantially alter the mimetic's structure and/or activity.
- a mimetic composition is within the scope of the invention if it is capable of carrying out the binding or fluorescent activities of green fluorescent protein.
- “Conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide.
- Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid.
- One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule.
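- The degeneracy argument above can be checked with a short sketch. It assumes the standard genetic code and uses hypothetical helper names (`translate`, `synonymous_codons`, `is_silent_variation`) that do not appear in the specification; RNA codons written with U are accepted and treated as their DNA equivalents.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def _as_dna(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def translate(seq):
    """Translate a coding sequence (length divisible by 3) to one-letter codes."""
    dna = _as_dna(seq)
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon):
    """All codons that encode the same amino acid as `codon`."""
    amino_acid = CODON_TABLE[_as_dna(codon)]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_silent_variation(seq_a, seq_b):
    """True if the sequences differ but encode the same polypeptide."""
    return _as_dna(seq_a) != _as_dna(seq_b) and translate(seq_a) == translate(seq_b)
```

- With this table, `synonymous_codons("GCU")` lists the four alanine codons named in the text, while the methionine and tryptophan codons each return only themselves.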
- As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence are “conservatively modified variants” where the alteration results in the substitution of an amino acid with a chemically similar amino acid.
- Conservative substitution tables providing functionally similar amino acids are well known in the art. For example, substitutions may be made wherein an aliphatic amino acid (G, A, I, L, or V) is substituted with another member of the group.
- an aliphatic polar-uncharged group such as C, S, T, M, N, or Q, may be substituted with another member of the group; and basic residues, e.g., K, R, or H, may be substituted for one another.
- an amino acid with an acidic side chain, E or D may be substituted with its uncharged counterpart, Q or N, respectively; or vice versa.
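- The substitution rules stated above can be written as a small lookup. This is a sketch that encodes only the groupings named in this passage (aliphatic, polar-uncharged, basic, and the acid/amide pairs E/Q and D/N); the function name and the decision to treat identical residues as "not a substitution" are assumptions made for illustration.

```python
# Groups taken directly from the passage above (one-letter amino acid codes).
ALIPHATIC = frozenset("GAILV")
POLAR_UNCHARGED = frozenset("CSTMNQ")
BASIC = frozenset("KRH")
# Acidic residues paired with their uncharged counterparts, in either direction.
ACID_AMIDE_PAIRS = {frozenset("EQ"), frozenset("DN")}


def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` follows the stated rules."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues: nothing was substituted
    for group in (ALIPHATIC, POLAR_UNCHARGED, BASIC):
        if a in group and b in group:
            return True
    return frozenset((a, b)) in ACID_AMIDE_PAIRS
```

- Under these rules an isoleucine-to-leucine or glutamate-to-glutamine change counts as conservative, whereas glycine-to-lysine does not.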
- Each of the following eight groups contains other exemplary amino acids that are conservative substitutions for one another:
- Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
- Primary structure refers to the amino acid sequence of a particular peptide.
- “Secondary structure” refers to locally ordered, three dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically 25 to approximately 500 amino acids long.
- Typical domains are made up of sections of lesser organization such as stretches of β-sheet and α-helices.
- Tertiary structure refers to the complete three dimensional structure of a polypeptide monomer.
- Quaternary structure refers to the three dimensional structure formed by the noncovalent association of independent tertiary units. Anisotropic terms are also known as energy terms.
- The terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 70% identity, preferably 75%, 80%, 85%, 90%, or 95% identity over a specified region), when compared and aligned for maximum correspondence over a comparison window, or designated region, as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection. Such sequences are then said to be “substantially identical.” This definition also refers to the complement of a test sequence.
- the identity exists over a region that is at least about 22 amino acids or nucleotides in length, or more preferably over a region that is 30, 40, or 50-100 amino acids or nucleotides in length.
- similarity refers to two or more sequences or subsequences that have a specified percentage of amino acid residues that are either the same or similar as defined in the 8 conservative amino acid substitutions defined above (i.e., 60%, optionally 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% similar over a specified region or, when not specified, over the entire sequence), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.
- sequences are then said to be “substantially similar.”
- this identity exists over a region that is at least about 50 amino acids in length, or more preferably over a region that is at least about 100, 200, 300, 400, 500 or 1000 or more amino acids in length.
- For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
- When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
- The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
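- As a minimal sketch of the final step, percent identity can be computed from two sequences that have already been aligned. The alignment itself is not performed here, and the choice of denominator (all alignment columns, excluding columns that are gaps in both sequences) is an assumption; other conventions exist. The 70% default threshold follows the "about 70% identity" figure given above, and the function names are hypothetical.

```python
def percent_identity(aligned_ref, aligned_test, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be the same length, i.e. already aligned with gap characters.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(r, t) for r, t in zip(aligned_ref, aligned_test)
               if not (r == gap and t == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for r, t in columns if r == t and r != gap)
    return 100.0 * matches / len(columns)


def is_substantially_identical(aligned_ref, aligned_test, threshold=70.0):
    """Apply an identity threshold (in percent) to a pre-aligned pair."""
    return percent_identity(aligned_ref, aligned_test) >= threshold
```

- The minimum region lengths discussed in the text are not enforced by this sketch and would need to be checked separately.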
- a “comparison window”, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
- Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol.
- BLAST and BLAST 2.0 are used, typically with the default parameters described herein, to determine percent sequence identity for the nucleic acids and proteins of the invention.
- Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information.
- This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence.
- T is referred to as the neighborhood word score threshold (Altschul et al., supra).
- a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
- the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
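- The seed-and-extend idea described above can be illustrated with a toy sketch. It is a simplification and not the BLAST implementation: seeds are exact word matches only (the neighborhood threshold T is not modelled), scoring uses a single match reward and mismatch penalty rather than a scoring matrix, and extension is ungapped. All function names and default values are assumptions chosen for illustration.

```python
def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) pairs where words of length w match exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend_one_way(query, subject, i, j, step, base, x_drop, match, mismatch):
    """Extend from (i, j) in one direction; return (best_gain, best_length).

    Stops when the running score falls x_drop below its maximum, when the
    cumulative score (base + running score) reaches zero or below, or when
    either sequence ends.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop or base + score <= 0:
            break
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, qi, sj, w, x_drop=5, match=1, mismatch=-3):
    """Ungapped extension of a word hit at (qi, sj) in both directions."""
    seed_score = w * match
    right_gain, right_len = _extend_one_way(
        query, subject, qi + w, sj + w, 1, seed_score, x_drop, match, mismatch)
    left_gain, left_len = _extend_one_way(
        query, subject, qi - 1, sj - 1, -1, seed_score + right_gain,
        x_drop, match, mismatch)
    return {
        "score": seed_score + left_gain + right_gain,
        "query_start": qi - left_len,
        "query_end": qi + w + right_len,      # exclusive end index
        "subject_start": sj - left_len,
    }
```

- A larger word length W yields fewer seeds and faster searches, while a larger X lets extensions tolerate longer runs of mismatches, which is the sensitivity and speed trade-off the parameters control.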
- the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)).
- One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
- a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
- the default parameters of BLAST are also often employed to determine percent identity or percent similarity.
- An indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the antibodies raised against the polypeptide encoded by the second nucleic acid, as described below.
- a polypeptide is typically substantially identical to a second polypeptide, for example, where the two peptides differ only by conservative substitutions.
- Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below.
- Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequence.
- Antibody refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen.
- the recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region genes.
- Light chains are classified as either kappa or lambda.
- Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
- An exemplary immunoglobulin (antibody) structural unit comprises a tetramer.
- Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa).
- the N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition.
- the terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
- Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases.
- pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)′2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond.
- the F(ab)′2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)′2 dimer into an Fab′ monomer.
- the Fab′ monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed.
- While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by using recombinant DNA methodology.
- the term antibody also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554 (1990)).
- any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)).
- Techniques for the production of single chain antibodies can be adapted to produce antibodies to polypeptides of this invention.
- transgenic mice, or other organisms such as other mammals may be used to express humanized antibodies.
- phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992)).
- the phrase “selectively (or specifically) hybridizes to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent hybridization conditions when that sequence is present in a complex mixture (e.g., total cellular or library DNA or RNA).
- The phrase “stringent hybridization conditions” refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acid, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
- the Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
- Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides).
- Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
- a positive signal is at least two times background, optionally 10 times background hybridization.
- Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes.
- Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions.
- Exemplary “moderately stringent hybridization conditions” include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1×SSC at 45° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.
- a nucleic acid sequence encoding refers to a nucleic acid which contains sequence information for a structural RNA such as rRNA, a tRNA, or the primary amino acid sequence of a specific protein or peptide, or a binding site for a trans-acting regulatory agent. This phrase specifically encompasses degenerate codons (i.e., different codons which encode a single amino acid) of the native sequence or sequences which may be introduced to conform with codon preference in a specific host cell.
- recombinant when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified.
- recombinant cells express genes that are not found within the native (nonrecombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all.
- an “expression vector” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell.
- the expression vector can be part of a plasmid, virus, or nucleic acid fragment.
- the expression vector includes a nucleic acid to be transcribed operably linked to a promoter.
- the specified antibodies bind to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
- antibodies raised against a protein having an amino acid sequence encoded by any of the polynucleotides of the invention can be selected to obtain antibodies specifically immunoreactive with that protein and not with other proteins, except for polymorphic variants.
- a variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
- solid-phase ELISA immunoassays, Western blots, or immunohistochemistry are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See, Harlow and Lane Antibodies, A Laboratory Manual , Cold Spring Harbor Publications, NY (1988) for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.
- a specific or selective reaction will be at least twice the background signal or noise and more typically more than 10 to 100 times background.
- the polypeptide is joined to a folding interference domain, which causes the polypeptide to fold poorly.
- the DNA encoding the polypeptide can then be mutagenized.
- Sequence alterations that overcome the poor folding imposed by the folding interference domain can be identified by an increase in the activity of the polypeptide or a reporter linked to the polypeptide.
- sequence mutations can include modification of coding sequence, deletion of coding sequence, insertion of additional coding sequences, change of order of coding sequences, within the existing coding sequence or at the N or C termini (5′ or 3′ end of the encoding nucleic acid), non-native amino acids.
- This method was used to generate “superfolder” variants of the Green Fluorescent Protein, GFP, of the luminescent jellyfish Aequorea victoria and the red fluorescent protein from Discosoma species, DsRed, both of which exhibit enhanced folding and stability properties.
- a detectable moiety can be linked to the target polypeptide/folding interference domain fusion protein to provide a means of assaying for enhanced folding.
- the method of selecting robustly-folding proteins has wide applicability.
- target protein P has an easily measured phenotype
- its folding (or solubility) success can be monitored in the presence of a bait protein domain, herein termed a “folding interference domain” (Xid), as Xid-L-P, for example.
- Xid folding interference domain
- These bait domains may also be inserted internally into permissive sites of P, e.g., for GFP at position 145 as further described in the Examples, infra. New variants of target protein P, better suited for folding and/or solubility under stringent conditions can thereby be produced.
- a reporter domain can be used, for example, in a construct such as Xid-L1-P-L2-R, where R is the reporter domain that tells about the folding of P, Xid is the folding interference domain, and L1 and L2 are flexible linkers.
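- The domain order of a construct such as Xid-L1-P-L2-R can be sketched in a few lines of Python. This is a minimal illustration only: the `Segment` class, the `assemble_fusion` helper, and all sequences shown are hypothetical placeholders, not sequences or tools from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str     # e.g. "Xid", "L1", "P", "L2", "R"
    sequence: str  # amino acid sequence, one-letter code


def assemble_fusion(segments):
    """Concatenate segments from N- to C-terminus and return (name, sequence)."""
    name = "-".join(s.label for s in segments)
    sequence = "".join(s.sequence for s in segments)
    return name, sequence


# Placeholder sequences chosen only for illustration (assumptions).
construct = [
    Segment("Xid", "MAAA"),   # folding interference domain
    Segment("L1", "GGGGS"),   # flexible linker
    Segment("P", "MKKK"),     # target polypeptide
    Segment("L2", "GGGGS"),   # flexible linker
    Segment("R", "MTTT"),     # reporter domain
]

name, sequence = assemble_fusion(construct)
print(name, len(sequence))
```

Keeping each domain as a separate labeled segment makes it easy to reorder domains or swap a reporter when exploring alternative configurations.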
- this method can also be applied in a block-optimization of a new protein scaffolding, P, comprised of a series of smaller domains, or subdomains of P (P1, P2, etc.).
- a construct such as Xid-L-P1-R is used to optimize P1 using R as the reporter.
- a subdomain, P2, can be added, e.g., in a construct such as Xid-L-P2-P1-R and used to optimize P2 using R as the reporter.
- P1 can be optimized for folding at the same time.
- the same reporter domain need not be used to optimize each PN.
- the entire P domain is built from the smaller subdomains.
- the methods of the invention can be used to increase folding and solubility of a target polypeptide as well as subdomains contained within the target polypeptide.
- the current invention employs basic nucleic acid methodology that is routine in the field of recombinant genetics.
- Basic texts disclosing the general methods of obtaining and manipulating nucleic acids in this invention include Sambrook and Russell, MOLECULAR CLONING, A LABORATORY MANUAL (3rd ed. 2001) and CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel et al., eds., John Wiley & Sons, Inc. 1994-1997, 2001 version).
- nucleic acid sequences encoding the fusion proteins of the invention are generated using amplification techniques.
- Examples of techniques sufficient to direct persons of skill through in vitro amplification methods are found in Berger, Sambrook, and Ausubel, as well as Dieffenbach & Dveksler, PCR Primers: A Laboratory Manual (1995); Mullis et al., (1987) U.S. Pat. No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al., eds) Academic Press Inc. San Diego, Calif. (1990) (Innis); Arnheim & Levinson (Oct.
- Folding interference domains can be identified by screening a library.
- a library can be generated in which peptide fragments are fused to a target protein, e.g., green fluorescent protein, and recombinants are selected in which the signal from the target protein fused to a peptide fragment is less than, for example, about 10% of the signal from a control recombinant that encodes only the target polypeptide.
- an assay such as the folding assay disclosed by Waldo et al., in Nature Biotech. 17:691-695, 1999, may be employed. Waldo et al. describe a GFP that does not fold well when fused after bull frog red cell H-ferritin. The folding yield of the GFP in the RanaH-L-GFP fusion was approximately 1/50 that of GFP expressed alone. In that work, several other proteins substantially reduced the folding yield of the GFP domain (<10% that of the GFP alone).
- Desirable folding interference domains would be those that decrease the folding yield of the test protein or fused reporter domain, while maintaining the level of expression of the fusion protein at a level similar to that of the test protein alone, or the test protein plus reporter domain (i.e., the expression level of the test protein or test protein-reporter domain fusion should be similar to the fusion containing the trapped peptide fragment).
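- The two screening criteria described above (a low folding signal and an expression level similar to the control) can be expressed as a simple filter. The following Python sketch is illustrative only: the function name, the clone data, and the `expression_tolerance` parameter are assumptions, since the text specifies only that expression should be "similar" and gives about 10% as an example signal cutoff.

```python
def select_interference_domains(clones, control_signal, control_expression,
                                max_signal_fraction=0.10,
                                expression_tolerance=0.5):
    """Return names of clones with low reporter signal but similar expression.

    clones: iterable of (name, signal, expression) tuples.
    max_signal_fraction: signal cutoff relative to the control (about 10%
        is the example given in the text).
    expression_tolerance: allowed relative deviation of expression from the
        control; this numeric value is an assumption for illustration.
    """
    selected = []
    for name, signal, expression in clones:
        low_signal = signal < max_signal_fraction * control_signal
        similar_expression = (
            abs(expression - control_expression)
            <= expression_tolerance * control_expression
        )
        if low_signal and similar_expression:
            selected.append(name)
    return selected


# Hypothetical measurements (arbitrary units).
clones = [
    ("frag_A", 5.0, 95.0),    # dim, well expressed: candidate
    ("frag_B", 60.0, 100.0),  # still bright: rejected
    ("frag_C", 2.0, 10.0),    # dim only because poorly expressed: rejected
]
candidates = select_interference_domains(clones, 100.0, 100.0)
print(candidates)
```

Checking expression alongside signal avoids selecting fragments that merely reduce the amount of fusion protein rather than its folding yield.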
- any number of proteins or protein domains can be used as a folding interference domain.
- bull frog red cell H-ferritin folds poorly when expressed by itself, and when included in a fusion polypeptide, causes the fusion polypeptide to fold poorly.
- Other poorly folding domains include, but are not limited to the Alzheimer's ⁇ / ⁇ peptide (amino acids 1-40 of the Alzheimer's precursor protein); domain A of the xylR TOL operon regulatory protein of Pseudomonas putida Perez-Martin, J; Cases, I; deLorenzo, V Design of a solubilization pathway for recombinant polypeptides in vivo through processing of a bi-protein with a viral protease PROTEIN ENGINEERING; JUN 1997; v. 10, no. 6, p.
- nucleoside diphosphate kinase of the hyperthermophile Pyrobaculum aerophilum (Pedelacq et al, 2002, Nature Biotechnol. 20 (9): 927-932). Any of the insoluble, poorly folded domains described in Waldo et al., in Nature Biotech. 17:691-695, 1999.
- folding interference domains are mostly insoluble when expressed alone in E. coli .
- the folding interference domain need not be insoluble when expressed alone.
- Some peptides are at least partially soluble when expressed alone or with well-folded highly soluble polypeptides ( ⁇ at least 40% soluble), but can nonetheless induce misfolding and poor solubility of many fused polypeptides.
- Such polypeptides include the lacZ α domain (the first 80-100 N-terminal amino acids of beta-galactosidase, a fragment commonly used in protein complementation assays).
- the folding interference domain may be linked, either directly or via a linker, to either the N-terminus or C-terminus of the target polypeptide sequence.
- the domain may be inserted into an internal site of the target polypeptide that is permissive to the insertion.
- a permissive site of a host protein is one which tolerates the insertion of well-folded, soluble proteins or polypeptides (guest polypeptides) within the host protein scaffolding. Typical sites are turns and sterically open regions.
- One such example is amino acid residue 87 of Escherichia coli dihydrofolate reductase.
- a site is defined as permissive if the host protein containing the guest polypeptide retains at least 5%, or 10%, or preferably at least 20% of the host protein activity observed without the guest.
- a target polypeptide can be any polypeptide for which it is desirable to improve the folding properties. Often such polypeptides include those with reporter activity, such as a fluorescent protein, e.g., green or red fluorescent protein.
- Other proteins include various enzymes, e.g., antibiotic resistance proteins such as chloramphenicol acetyltransferase, kanamycin resistance protein, beta-lactamase, tetracycline resistance protein, and dihydrofolate reductase; and other enzymes such as subtilisin and fungal xylanases.
- Other target proteins include antibodies, for which increased binding to the target antigen can be used as the selection criterion.
- a particular aspect of the invention relates to the generation of superfolder fluorescent and chromophoric protein variants, and is described in further detail below and in the Examples, infra.
- a variety of fluorescent proteins and chromoproteins may be “evolved” according to the methods of the invention to generate variants having improved folding and/or solubility properties.
- the superfolder fluorescent and chromophoric protein variants generally share a common tertiary structure comprising an 11-stranded beta-barrel structure surrounding a centrally-located self-activating chromophore.
- GFP Green Fluorescent Protein isolated from Aequorea victoria
- GFP variants such as cyan fluorescent protein, blue fluorescent protein, yellow fluorescent protein, etc.
- color shift mutants of GFP have been developed and may be employed in the directed evolution methods of the present invention. These color-shift GFP mutants have emission colors blue to yellow-green, increased brightness, and photostability (Tsien et al., 1998, Annual Review of Biochemistry 67: 509-544).
- One such GFP mutant termed the Enhanced Yellow Fluorescent Protein, displays an emission maximum at 529 nm.
- GFP-based variants having modified excitation and emission spectra (Tsien et al., U.S. Patent Appn. 20020123113A1), enhanced fluorescence intensity and thermal tolerance (Thastrup et al., U.S. Patent Appn. 20020107362A1; Bjorn et al., U.S. Patent Appn. 20020177189A1), and chromophore formation under reduced oxygen levels (Fisher, U.S. Pat. No. 6,414,119) have also been described. Most recently, GFPs from the anthozoans Renilla reniformis and Renilla kollikeri were described (Ward et al., U.S. Patent Appn. 20030013849).
- DsRed red fluorescent protein isolated from Discosoma species of coral
- DsRed accession number AF168419 version AF168419.2
- DsRed and the other anthozoan fluorescent proteins share only about 26-30% amino acid sequence identity to the wild-type GFP from Aequorea victoria , yet all the crucial motifs are conserved, indicating the formation of the 11-stranded beta-barrel structure characteristic of GFP.
- Mutants of the longer wavelength red fluorescent protein DsRed have also been described and, similarly, may be employed in the directed evolution methods of the invention.
- DsRed mutants with emission spectra shifted further to the red may be employed in the practice of the invention (Wiehler et al., 2001, FEBS Letters 487: 384-389; Terskikh et al., 2000, Science 290: 1585-1588; Baird et al., 2000, Proc. Natl. Acad. Sci. USA 97: 11984-11989).
- Fluorescent proteins from Anemonia majano, Zoanthus sp., Discosoma striata, Discosoma sp. and Clavularia sp. have also been reported (Matz et al., supra).
- a fluorescent protein cloned from the stony coral species, Trachyphyllia geoffroyi has been reported to emit green, yellow, and red light, and to convert from green light to red light emission upon exposure to UV light (Ando et al., 2002, Proc. Natl. Acad. Sci. USA 99: 12651-12656).
- fluorescent proteins from sea anemones include green and orange fluorescent proteins cloned from Anemonia sulcata (Wiedenmann et al., 2000, Proc. Natl. Acad. Sci. USA 97: 14091-14096), a naturally enhanced green fluorescent protein cloned from the tentacles of Heteractis magnifica (Hongbin et al., 2003, Biochem. Biophys. Res. Commun.
- GFP-related proteins having chromophoric and fluorescent properties
- One such group of coral-derived proteins, the pocilloporins, exhibit a broad range of spectral and fluorescent characteristics (Dove and Hoegh-Guldberg, 1999, PCT application WO 00/46233; Dove et al., 2001, Coral Reefs 19: 197-204).
- Rtms5 is deep blue in color, yet is weakly fluorescent.
- Rtms5 as well as other chromoproteins with sequence homology to Rtms5, can be interconverted to a far-red fluorescent protein via single amino acid substitutions (Beddoe et al., 2003, supra; Bulina et al., 2002, BMC Biochem. 3: 7; Lukyanov et al., 2000, supra).
- fluorescent and chromophoric protein variants exhibiting enhanced folding or solubility are generated from any fluorescent or chromophoric protein having a structure with a root mean square deviation of less than 5 angstroms, often less than 3 or 4 angstroms, and preferably less than 2 angstroms from the 11-stranded beta-barrel structure of Aequorea victoria GFP (MMDB Id: 5742).
- fluorescent proteins exist in multimeric form.
- DsRed is tetrameric (Cotlet et al., 2001, Proc. Natl. Acad. Sci. USA 98: 14398-14403).
- structural deviation between such multimeric fluorescent proteins and GFP (a monomer) is evaluated on the basis of the monomeric unit of the structure of the fluorescent protein.
- such a suitable fluorescent protein or chromoprotein structure can be identified using comparison methodology well known in the art.
- in identifying the protein, a crucial feature in the alignment and comparison to the MMDB ID:5742 structure is the conservation of the 11 beta strands, and the topology or connection order of the secondary structural elements (see, e.g., Ormo et al. “Crystal structure of the Aequorea victoria green fluorescent protein.” Yang et al, 1996, Science 273: 5280, 1392-5; Yang et al., 1996 Nat. Biotechnol. 10:1246-51).
- the two structures to be compared are aligned using algorithms familiar to those with average skill in the art, using for example the CCP4 program suite.
- COLLABORATIVE COMPUTATIONAL PROJECT NUMBER 4. 1994. “The CCP4 Suite: Programs for Protein Crystallography”. Acta Cryst. D50, 760-763.
- the user inputs the PDB coordinate files of the two structures to be aligned, and the program generates output coordinates of the atoms of the aligned structures using a rigid body transformation (rotation and translation) to minimize the global differences in position of the atoms in the two structures.
- the output aligned coordinates for each structure can be visualized separately or as a superposition by readily-available molecular graphics programs such as RASMOL (Roger A. Sayle and E. J. Milner-White, “RasMol: Biomolecular graphics for all”, Trends in Biochemical Science (TIBS), September 1995, Vol. 20, No. 9, p. 374) or Swiss PDB Viewer (Guex, N. and Peitsch, M. C. (1996) Swiss-PdbViewer: A Fast and Easy-to-use PDB Viewer for Macintosh and PC. Protein Data Bank Quarterly Newsletter 77, pp. 7).
- the RMSD value scales with the extent of the structural alignments and this size is taken into consideration when using the RMSD as a descriptor of overall structural similarity.
- the issue of scaling of RMSD is typically dealt with by including blocks of amino acids that are aligned within a certain threshold. The longer the unbroken block of aligned sequence that satisfies a specified criterion, the ‘better’ aligned the structures are.
- 164 of the c-alpha carbons can be aligned to within 1 angstrom of the GFP.
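- The RMSD descriptor and the count of alpha carbons aligned within a threshold can be computed as sketched below. This is a minimal Python illustration under stated assumptions: the two coordinate lists are assumed to be already superposed (for example by a rigid body fit in a separate program) and already paired residue by residue; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def count_within(coords_a, coords_b, threshold=1.0):
    """Number of paired atoms lying within `threshold` angstroms of each other."""
    return sum(1 for a, b in zip(coords_a, coords_b)
               if math.dist(a, b) <= threshold)


# Toy alpha-carbon coordinates in angstroms (assumptions for illustration).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]

print(rmsd(reference, candidate))
print(count_within(reference, candidate, threshold=1.0))
```

Reporting the count of atoms within a threshold alongside the RMSD reflects the point made above that a single outlying region can inflate the RMSD even when most of the structure superposes closely.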
- users skilled in the art will select a program that can align the two trial structures based on rigid body transformations, for example DALI, Holm, L.
- the server site for the computer implementation of the algorithm is available, for example, at dali@ebi.ac.uk.
- GFP3 “Crameri” cycle 3 GFP
- the improved GFPs of the invention comprise at least 80% identity to SEQ ID NO: 5 and contain at least one amino acid substitution selected from the group consisting of a substitution at position 30 that is an arginine or a conservative variant of arginine; a substitution at position 39 that is an asparagine or a conservative variant of asparagine; a substitution at position 105 that is a threonine or a conservative variant of threonine; a substitution at position 171 that is a valine or a conservative variant of valine; and a substitution at position 206 that is a valine or a conservative variant of valine.
- GFP SF superfolder GFP variant
- the positions are typically determined with reference to SEQ ID NO: 5. Thus, as appreciated by one of skill in the art, the positions do not refer to the number of amino acids in the protein, but the position relative to SEQ ID NO: 5.
- a GFP sequence is maximally aligned with SEQ ID NO: 5, for example by manual alignment or using the Smith & Waterman alignment (see, e.g., Adv. Appl. Math. 2:482 (1981)) with the default parameters.
- the residue of the GFP sequence that aligns with position 30 of SEQ ID NO: 5 is considered to be position 30 of the GFP sequence.
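- The position numbering convention described above can be illustrated with a short Python sketch. It assumes the alignment has already been produced (manually or by a Smith & Waterman implementation) and is supplied as two gapped strings of equal length; the function name and the toy sequences are hypothetical and are not SEQ ID NO: 5.

```python
def map_to_reference_positions(aligned_ref, aligned_query):
    """Map 1-based reference positions to (query position, query residue).

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Reference positions that align to a gap in the query are omitted.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res != "-":
            ref_pos += 1
        if query_res != "-":
            query_pos += 1
        if ref_res != "-" and query_res != "-":
            mapping[ref_pos] = (query_pos, query_res)
    return mapping


# Toy alignment: the query lacks reference residue 2 and has one insertion.
aligned_ref = "MSK-GEEL"
aligned_query = "M-KAGEEL"
positions = map_to_reference_positions(aligned_ref, aligned_query)
print(positions[4])
```

In this toy case the residue that counts as "position 4" is also the fourth residue of the query, but "position 3" is the query's second residue, which shows why positions are stated relative to the reference rather than by counting along the query.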
- a “green” fluorescent protein of the invention often fluoresces green, but may also have yellow or blue fluorescence.
- a single amino acid change shifts the fluorescence from green to blue.
- a superfolding yellow fluorescent protein (sfYFP) can be made from the superfolding GFP disclosed herein by adding the single amino acid change T203Y.
- folding of the existing BFP and YFP proteins (Tsien, 1998, Annu. Rev. Biochem. 67: 509-544; Miyawaki et al., 1999, Proc. Natl. Acad. Sci. USA 96: 2135-2140; the YFP being equivalent to the canonical GFP with the mutations S65G, V68L, Q69K, S72A, and T203Y) can each also be improved by making the substitutions disclosed herein.
- the directed evolution method of the invention has also been applied to the generation of a superfolder DsRed fluorescent protein.
- a superfolder DsRed variant (“DsRed SF ”) is provided, and has the amino acid sequence of SEQ ID NO: 4
- DsRed SF has the amino acid sequence of SEQ ID NO: 3
- an amino acid linker sequence is employed to separate the first and second polypeptide components by a distance sufficient to ensure that each polypeptide could fold into its secondary and tertiary structures.
- Such an amino acid linker sequence is incorporated into the fusion protein using standard techniques well known in the art.
- Suitable peptide linker sequences may be chosen based on the following factors: (1) their ability to adopt a flexible extended conformation; (2) their inability to adopt a secondary structure that can interact with functional epitopes on the first and second polypeptides; and (3) the lack of hydrophobic or charged residues that might react with the polypeptide functional epitopes.
- Typical peptide linker sequences contain Gly, Ser, Ala, Val and Thr residues.
- a linker is a “flexible linker” that has a sequence such as (Gly4Ser)x, e.g., (Gly4Ser)3.
- linker sequences which may be usefully employed as linkers include those disclosed in Maratea et al. (1985) Gene 40:39-46; Murphy et al. (1986) Proc. Natl. Acad. Sci. USA 83:8258-8262; U.S. Pat. Nos. 4,935,233 and 4,751,180.
- the linker sequence may generally be from 1 to about 50 amino acids in length, e.g., 3, 4, 6, or 10 amino acids in length, but can be 100 or 200 amino acids in length. Linker sequences may not be required when the first and second polypeptides have non-essential N-terminal amino acid regions that can be used to separate the functional domains and prevent steric interference.
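- A flexible linker built from repeated Gly-Gly-Gly-Gly-Ser units can be written out programmatically. The Python sketch below is illustrative; the helper name `flexible_linker` is hypothetical, and the only assumption about the repeat is the one-letter spelling GGGGS for four glycines followed by one serine.

```python
def flexible_linker(repeats, unit="GGGGS"):
    """Return a linker made of `repeats` copies of the repeat unit."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


linker = flexible_linker(3)  # three repeats of the Gly4Ser unit
print(linker, len(linker))
```

Three repeats give a 15-residue linker, which falls within the 1 to about 50 amino acid range described above.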
- Other means of joining the components of the chimeric protein include ionic binding by expressing negative and positive tails, and indirect binding through antibodies and streptavidin-biotin interactions. (See, e.g., Bioconjugate Techniques, supra).
- the components can also be joined together through an intermediate interacting sequence.
- the moieties included in the conjugate molecules can be joined in any order, although the most favorable configuration may be determined empirically.
- Fusion constructs can be made by ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other by methods known in the art, in the proper reading frame, and expressing the product by methods known in the art.
- Nucleic acids encoding the domains to be incorporated into the fusion proteins of the invention can be obtained using routine techniques in the field of recombinant genetics (see, e.g., Sambrook and Russell, eds, Molecular Cloning: A Laboratory Manual, 3rd Ed, vols. 1-3, Cold Spring Harbor Laboratory Press, 2001; and Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc. New York, 1997).
- nucleic acid sequences encoding the component domains to be incorporated into the fusion protein are cloned from cDNA and genomic DNA libraries by hybridization with probes, or isolated using amplification techniques with oligonucleotide primers. Amplification techniques can be used to amplify and isolate sequences from DNA or RNA (see, e.g., Dieffenbach & Dveksler, PCR Primers: A Laboratory Manual (1995)). Alternatively, overlapping oligonucleotides can be produced synthetically and joined to produce one or more of the domains. Nucleic acids encoding the component domains can also be isolated from expression libraries using antibodies as probes.
- the nucleic acid sequence or subsequence is PCR amplified, using a sense primer containing one restriction site and an antisense primer containing another restriction site. This will produce a nucleic acid encoding the desired domain sequence or subsequence and having terminal restriction sites.
- This nucleic acid can then be easily ligated into a vector containing a nucleic acid encoding the second domain and having the appropriate corresponding restriction sites.
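- The primer design described above, a sense primer carrying one restriction site and an antisense primer carrying another, can be sketched in Python. Everything specific here is an assumption for illustration: the EcoRI (GAATTC) and BamHI (GGATCC) sites, the 18-base annealing length, the two-base 5′ clamp, and the toy template are not taken from this disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(template, site_5prime, site_3prime,
                   anneal_len=18, clamp="GC"):
    """Return (sense, antisense) primers, each written 5' to 3'.

    Each primer is clamp + restriction site + annealing region. The
    antisense annealing region is the reverse complement of the 3' end
    of the template.
    """
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template is shorter than the annealing length")
    sense = clamp + site_5prime + template[:anneal_len]
    antisense = clamp + site_3prime + reverse_complement(template[-anneal_len:])
    return sense, antisense


ECORI = "GAATTC"  # illustrative choice of site
BAMHI = "GGATCC"  # illustrative choice of site
template = "ATGGCTAGCAAAGGTGAAGAACTGTTTACCGGTGTTGTACCGATCTAA"  # toy sequence

sense, antisense = design_primers(template, ECORI, BAMHI)
print(sense)
print(antisense)
```

Using two different sites on the two primers allows the amplified fragment to be ligated into the vector in a defined orientation; in practice one would also confirm that neither site occurs within the domain sequence itself.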
- the domains can be directly joined or may be separated by a linker, or other, protein sequence.
- Suitable PCR primers can be determined by one of skill in the art using the sequence information provided in GenBank or other sources.
- Appropriate restriction sites can also be added to the nucleic acid encoding the protein or protein subsequence by site-directed mutagenesis.
- the plasmid containing the domain-encoding nucleotide sequence or subsequence is cleaved with the appropriate restriction endonuclease and then ligated into an appropriate vector for amplification and/or expression according to standard methods.
- it may be desirable to modify the polypeptides encoding the components of the conjugate molecules.
- One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See, e.g., Giliman and Smith (1979) Gene 8:81-97, Roberts et al. (1987) Nature 328: 731-734.
- the domains can be modified to facilitate the linkage of the two domains to obtain the polynucleotides that encode the fusion polypeptides of the invention.
- Catalytic domains and binding domains that are modified by such methods are also part of the invention.
- a codon for a cysteine residue can be placed at either end of a domain so that the domain can be linked by, for example, a disulfide linkage.
- the modification can be performed using either recombinant or chemical methods (see, e.g., Pierce Chemical Co. catalog, Rockford Ill.).
- linkers are usually polypeptide sequences of neutral amino acids, such as serine or glycine, that can be of varying lengths, for example, about 200 amino acids or more in length, with 1 to 100 amino acids being typical.
- the linkers are 10, 15, 20, 25, 30, 35, 40, 45, or 50 amino acid residues or less in length.
- proline residues are incorporated into the linker to prevent the formation of significant secondary structural elements by the linker.
- Linkers can often be flexible amino acid subsequences that are synthesized as part of a recombinant fusion protein. Such flexible linkers are known to persons of skill in the art.
- a flexible linker is a peptide linker of any length whose amino acid composition is rich in glycine to minimize the formation of rigid structure by interaction of amino acid side chains with each other or with the polypeptide backbone.
- a typical flexible linker has the composition (Gly4Ser)x.
- the recombinant nucleic acids encoding the fusion proteins of the invention are modified to provide preferred codons which enhance translation of the nucleic acid in a selected organism (e.g., yeast preferred codons are substituted into a coding nucleic acid for expression in yeast).
- Target polypeptides with enhanced folding ability are typically identified by mutating the nucleic acid sequence encoding the target polypeptide, generating a fusion protein (comprising the mutated target polypeptide, a poorly folding domain, and optionally, a reporter gene), and selecting those polypeptides with enhanced reporter activity, thus identifying target polypeptides that overcome the poor folding property imposed by the poorly folding domain.
- the nucleic acid sequences encoding the target polypeptide of interest can be mutated using methods well known to those of ordinary skill in the art.
- the target polypeptide is usually mutated by mutating the nucleic acid.
- Techniques for mutagenizing are well known in the art. These include, but are not limited to, such techniques as error-prone PCR, chemical mutagenesis, and cassette mutagenesis. Alternatively, mutator strains of host cells may be employed to increase mutational frequency (Greener and Callahan (1995) Strategies in Mol. Biol. 7: 32).
- error-prone PCR see, e.g., Ausubel, supra
- mutagenesis methods include, for example, recombination (WO98/42727); oligonucleotide-directed mutagenesis (see, e.g., the review in Smith, Ann. Rev. Genet. 19: 423-462 (1985); Botstein and Shortle, Science 229: 1193-1201 (1985); Carter, Biochem. J. 237: 1-7 (1986); Kunkel, “The efficiency of oligonucleotide directed mutagenesis” in Nucleic acids & Molecular Biology, Eckstein and Lilley, eds., Springer Verlag, Berlin (1987), Methods in Enzymol. 100: 468-500 (1983), and Methods in Enzymol.
- Additional methods include point mismatch repair (Kramer et al., Cell 38: 879-887 (1984)), mutagenesis using repair-deficient host strains (Carter et al., Nucl. Acids Res. 13: 4431-4443 (1985); Carter, Methods in Enzymol. 154: 382-403 (1987)), deletion mutagenesis (Eghtedarzadeh and Henikoff, Nucl. Acids Res. 14: 5115 (1986)), restriction-selection and restriction-purification (Wells et al., Phil. Trans. R. Soc. Lond.
- Kits for mutagenesis are commercially available (e.g., Bio-Rad, Amersham International).
- More recent approaches include codon-based mutagenesis, in which entire codons are replaced, thereby increasing the diversity of mutants generated, as exemplified by the RID method described in Murakami et al., 2002, Nature Biotechnology, 20: 76-81.
- Folding may be detected and assessed using various tests commonly used to determine correct folding, including without limitation spectroscopy, resistance to denaturation, kinetics, and tolerance for additional random mutations and polypeptide insertions.
- circular dichroism may be used to distinguish between folded and unfolded forms of a polypeptide.
- folding kinetics may be used, wherein better folded versions of P are identified by their ability to adopt a correctly folded conformation faster than poorer folding variants or the wild type protein.
- the evolved polypeptide will display about a 25% faster refolding time following denaturation.
- resistance to denaturation may be used to assess folding.
- increasing concentrations of urea may be used to assess more robustly folding variants.
- a polypeptide variant with significantly improved folding activity is typically one which can tolerate about a 0.5 molar higher urea concentration compared to the wild type or starting polypeptide.
- Tolerance to random mutations may also be used to assess the folding enhancement achieved following polypeptide evolution. Briefly, a library of random mutants of both the wild type (or pre-evolved) polypeptide and the test evolved polypeptide are generated. A 0.7% amino acid mutation rate, for example, may be appropriate. The library clones are then evaluated for fluorescence as a measure of correct folding. The presence and extent to which the evolved polypeptide mutant library displays a greater number of fluorescent clones relative to the wild type mutant library indicates the folding robustness of the evolved test polypeptide.
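- The comparison of mutant libraries described above reduces to simple arithmetic, sketched here in Python. The function names, the 200-residue length, and the clone counts are hypothetical; only the 0.7% example mutation rate comes from the text.

```python
def expected_substitutions(length_aa, mutation_rate):
    """Average number of amino acid substitutions per clone."""
    return length_aa * mutation_rate


def fluorescent_fraction(clone_is_fluorescent):
    """Fraction of library clones scored as fluorescent."""
    clones = list(clone_is_fluorescent)
    if not clones:
        raise ValueError("library is empty")
    return sum(1 for c in clones if c) / len(clones)


def tolerance_ratio(evolved_library, wild_type_library):
    """Fluorescent fraction of the evolved library relative to wild type."""
    wild_type = fluorescent_fraction(wild_type_library)
    if wild_type == 0:
        return float("inf")
    return fluorescent_fraction(evolved_library) / wild_type


# Hypothetical screening results: True marks a fluorescent clone.
evolved_library = [True] * 6 + [False] * 4
wild_type_library = [True] * 2 + [False] * 8

print(expected_substitutions(200, 0.007))
print(tolerance_ratio(evolved_library, wild_type_library))
```

A ratio above 1 would indicate that the evolved polypeptide tolerates random substitutions better than the starting polypeptide; how large a ratio counts as meaningful depends on library size and is not specified here.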
- random insertion mutant libraries may be created using, for example, transposon-mediated mutagenesis techniques (Goryshin et al., 2000, Nature Biotechnol. 18: 97) and commercially available kits (e.g., Epicentre Technologies, Madison, Wis.). A greater number of robustly folding mutants in the evolved mutant library relative to the unevolved polypeptide mutant library provides an indication of the extent to which the evolved test polypeptide has enhanced folding properties. Similarly, the tolerance to larger insertions may provide an indication of the extent to which the evolved polypeptide has acquired enhanced folding properties.
- Another method for evaluating acquisition of enhanced folding in evolved polypeptides involves the generation of circular permutants of the test evolved polypeptide. Briefly, the native N and C termini of the test evolved polypeptide are ligated together at the polynucleotide level, and start codons are randomly introduced into the coding sequence. A library of circular permutants is then expressed and compared to a library of circular permutants generated from the unevolved polypeptide, wherein the relative number of permissive sites for the randomly inserted start codons may be determined by a functional screen indicative of correct folding and thereby provides an indication of folding enhancement acquired by the evolved polypeptide.
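- At the protein level, each circular permutant corresponds to a rotation of the original sequence, with the native termini joined and a new N-terminus at the chosen site. The Python sketch below enumerates these rotations; the function name, the optional `linker` argument, and the toy sequence are assumptions for illustration, and the sketch does not model the random placement of start codons in an actual library.

```python
def circular_permutants(sequence, linker=""):
    """Yield (new_start_position, permuted_sequence) for each internal start.

    new_start_position is the 1-based position in the original sequence
    that becomes the new N-terminus. The original C-terminus is joined to
    the original N-terminus, optionally through `linker`.
    """
    for start in range(1, len(sequence)):
        yield start + 1, sequence[start:] + linker + sequence[:start]


permutants = list(circular_permutants("ABCD"))
print(permutants)
```

Scoring which of these start positions still give a functional protein, for the evolved and unevolved polypeptides separately, yields the relative number of permissive sites referred to above.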
- FRET Fluorescence Resonance Energy Transfer
- FRET is the non-radiative transfer of energy from a donor fluorophore to an acceptor fluorophore spatially located within about 80 Angstroms of each other.
- the relative geometric context of the two fluorophores is an important component of FRET. Circular permutation may be used to alter the geometric orientation of the fluorophores relative to each other.
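- The distance dependence of FRET is commonly described by the Förster relation, in which transfer efficiency falls off with the sixth power of the donor-acceptor separation. The Python sketch below evaluates that relation; the 50 angstrom Förster radius used in the example is an assumed value for illustration, and the orientation dependence noted above is folded into that radius rather than modeled explicitly.

```python
def fret_efficiency(distance, forster_radius):
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    distance and forster_radius must be in the same units (e.g. angstroms).
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


R0 = 50.0  # assumed Förster radius in angstroms, for illustration only
for r in (40.0, 50.0, 80.0):
    print(r, fret_efficiency(r, R0))
```

Efficiency is one half when the separation equals the Förster radius and drops steeply beyond it, which is consistent with the practical distance limit of roughly 80 angstroms mentioned above.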
- a biological property of a protein of interest may be measured as an indication of folding.
- where the protein is a fluorescent or chromophoric protein, the presence and intensity of emitted fluorescence or color, respectively, provides an indication of folding.
- Brighter fluorescence, for example, provides an indication of better folding in relation to dimmer variants of P (or colonies expressing P).
- misfolded proteins often aggregate and become insoluble, and a corresponding test may be applied by first determining that the correctly folded protein is soluble, and that the incorrectly folded protein is insoluble. For example, if the protein is an enzyme, and the correctly folded enzyme is active and its activity can be measured, and the soluble protein is active while the insoluble protein is inactive, then if Xid-L-P is soluble and active, P would be inferred to be correctly folded. If Xid-L-P is not active, and also insoluble, then it may be concluded that P is misfolded. Xid-L-P might be active and yet insoluble, or Xid-L-P might be soluble but inactive.
- the solubility of Xid-L-P could be used to determine the folding of P in Xid-L-P as above. Alternatively, the correctly folded version of P may bind a target peptide Pt such that the binding can be detected, for example if Pt is an antibody that is conjugated to a reporter domain R, or has an intrinsically detectable signal, or P and Pt are binding or folding partners, or P and Pt comprise two of at least two domains of a split protein or multiprotein complex, which has a detectable phenotype when the fragments or components are assembled, the assembly being dependent on the correct folding of P in Xid-L-P.
- folding of P could be measured by the resistance of P to limited proteolysis coupled to selection by phage display, in which case the method is a way of increasing the stringency of selection by phage display (Martin et al., 2001, J. Mol. Biol. 309(3): 717-26).
- the folding of P in Xid-L-P could be detected by using a folding reporter such as GFP or some other protein with a detectable phenotype (enzyme activity, fluorescence, ability to bind other proteins or molecules) such that the detection of R in Xid-L-P-R is an indication of correct folding of R and therefore of P (see Waldo patent “method for determining and modifying protein/peptide solubility”).
- Detectable phenotypes are not limited to enzymatic activity or fluorescence.
- the phenotype associated with correct folding of P could be the ability of P to bind a target molecule, the binding event being detectable by some means. In this case, the reporter domain might not have activity until the binding event occurs.
- P could be a component of a complementation system or split protein such as the S-protein or S-peptide (which associate to form active RNASE-A), or the split dihydrofolate reductase, or the split beta lactamase (Galarneau, A; Primeau, M; Trudeau, L E; Michnick, S W Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions Nature Biotechnology; JUN 2002; v. 20, no. 6, p.
- split beta galactosidase (Wigley, W C; Stidham, R D; Smith, N M; Hunt, J F; Thomas, P J Protein solubility and folding monitored in vivo by structural complementation of a genetic marker protein Nature Biotechnology; FEB 2001; v. 19, no. 2, p. 131-136).
- the split proteins could be self-assembling, or require the association via fused partners that are capable of association, such as coiled-coils (Galarneau, A; Primeau, M; Trudeau, L E; Michnick, S W Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions Nature Biotechnology; June 2002; v. 20, no. 6, p. 619-622).
- the signal level given as the detectable phenotype be proportionate to the amount of correctly folded reporter molecule.
- the binding event could be that of an antibody that recognizes an epitope of the correctly-folded target P, binding of the antibody measured by some means such as the enzymatic activity of a linked enzyme.
- the target polypeptide may itself have reporter activity or may be joined to another molecule that has reporter activity.
- Reporter molecules that can be used include those with activities that can be directly measured, e.g., fluorescent polypeptides, e.g., green, blue, yellow, or red fluorescent proteins and variants of those proteins; polypeptides encoded by antibiotic resistance genes; and molecules that can be indirectly measured, e.g., enzymes such as β-galactosidase, alkaline phosphatase, horseradish peroxidase, β-lactamase, or other enzymes that require a secondary detection reagent.
- Other polypeptides, such as antibodies or other binding proteins, may be measured by assessing their ability to specifically bind to a binding partner.
- Other polypeptides could be parts of ‘split protein’ complementing pairs.
- DHFR 106-186 from murine dihydrofolate reductase
- various split proteins such as beta lactamase, beta galactosidase, etc.
- this assay can be performed in vitro using cell-free expression and appropriate substrates (fluorogenic, chemiluminescent, etc.); see, for example, the Galacton Star reagent for beta galactosidase, or a ribonucleic acid donor/quencher substrate that is the target of RNASE-A for the split S-protein/S-peptide system (Novagen) (Kelemen, B R; Klink, T A; Behlke, M A; Eubanks, S R; Leland, P A; Raines, R T Hypersensitive substrate for ribonucleases Nucleic Acids Research; Sep. 15, 1999; v. 27, no. 18, p. 3696-3701).
- Non-polypeptide reporters may also be employed, such as cyclic arseno compounds capable of binding to polycysteine tags on proteins and cyclizing to become fluorescent (Adams et al., 2002, Journal Of The American Chemical Society, 124: 6063-6076). Polypeptides with enhanced folding properties are then selected and can be obtained in the quantity desired using various expression systems.
- expression control sequences such as ribosome binding sites, transcription termination sites and the like are also optionally included. Constructs that include one or more of these control sequences are termed “expression cassettes.” Accordingly, the nucleic acids that encode the joined polypeptides are incorporated for high level expression in a desired host cell.
- prokaryotic control sequences, which are defined herein to include promoters for transcription initiation, optionally with an operator, along with ribosome binding site sequences, include such commonly used promoters as the beta-lactamase (penicillinase) and lactose (lac) promoter systems (Chang et al., Nature (1977) 198: 1056), the tryptophan (trp) promoter system (Goeddel et al., Nucleic Acids Res. (1980) 8: 4057), the tac promoter (DeBoer, et al., Proc. Natl. Acad. Sci. U.S.A.
- Standard bacterial expression vectors include plasmids such as pBR322-based plasmids, e.g., pBLUESCRIPT™, pSKF, pET23D, λ-phage derived vectors, p15A-based vectors (Rose, Nucleic Acids Res. (1988) 16:355 and 356) and fusion expression systems such as GST.
- Epitope tags can also be added to recombinant proteins to provide convenient methods of isolation, e.g., c-myc, HA-tag, 6-His tag, maltose binding protein, VSV-G tag, anti-DYKDDDDK tag, or any such tag, a large number of which are well known to those of skill in the art.
- fusion polypeptides in prokaryotic cells other than E. coli, regulatory sequences for transcription and translation that function in the particular prokaryotic species are required.
- promoters can be obtained from genes that have been cloned from the species, or heterologous promoters can be used.
- the hybrid trp-lac promoter functions in Bacillus in addition to E. coli .
- suitable bacterial promoters are well known in the art and are described, e.g., in Russell & Sambrook and Ausubel et al.
- Bacterial expression systems for expressing the proteins of the invention are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., Gene 22:229-235 (1983); Mosbach et al., Nature 302:543-545 (1983)). Kits for such expression systems are commercially available.
- eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
- vectors include Yeast Integrating plasmids (e.g., YIp5) and Yeast Replicating plasmids (the YRp series plasmids) and pGPD-2.
- Expression vectors containing regulatory elements from eukaryotic viruses are typically used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus.
- eukaryotic expression vectors include those employing the CMV promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
- Either constitutive or regulated promoters can be used in the present invention.
- Regulated promoters can be advantageous because the host cells can be grown to high densities before expression of the fusion polypeptides is induced. High level expression of heterologous proteins slows cell growth in some situations.
- An inducible promoter is a promoter that directs expression of a gene where the level of expression is alterable by environmental or developmental factors such as, for example, temperature, pH, anaerobic or aerobic conditions, light, transcription factors and chemicals.
- inducible promoters are known to those of skill in the art. These include, for example, the lac promoter, the bacteriophage lambda PL promoter, the hybrid trp-lac promoter (Amann et al. (1983) Gene 25: 167; de Boer et al. (1983) Proc. Nat'l. Acad. Sci. USA 80: 21), and the bacteriophage T7 promoter (Studier et al. (1986) J. Mol. Biol.; Tabor et al. (1985) Proc. Nat'l. Acad. Sci. USA 82: 1074-8). These promoters and their use are discussed in Sambrook et al., supra.
- Inducible promoters for other organisms are also well known to those of skill in the art. These include, for example, the metallothionein promoter, the heat shock promoter, as well as many others.
- Translational coupling may be used to enhance expression.
- the strategy uses a short upstream open reading frame derived from a highly expressed gene native to the translational system, which is placed downstream of the promoter, and a ribosome binding site followed after a few amino acid codons by a termination codon. Just prior to the termination codon is a second ribosome binding site, and following the termination codon is a start codon for the initiation of translation.
- the system dissolves secondary structure in the RNA, allowing for the efficient initiation of translation. See Squires et al. (1988), J. Biol. Chem. 263: 16297-16302.
- polynucleotide constructs generally requires the use of vectors able to replicate in host bacterial cells, or able to integrate into the genome of host bacterial cells. Such vectors are commonly used in the art.
- kits are commercially available for the purification of plasmids from bacteria (for example, EasyPrep™ and FlexiPrep™, from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System, Qiagen).
- the isolated and purified plasmids can then be further manipulated to produce other plasmids, and used to transform cells.
- the polypeptides can be expressed intracellularly, or can be secreted from the cell. Intracellular expression often results in high yields. If necessary, the amount of soluble, active fusion polypeptide may be increased by performing refolding procedures (see, e.g., Sambrook et al., supra.; Marston et al., Bio/Technology (1984) 2: 800; Schoner et al., Bio/Technology (1985) 3: 151). Fusion polypeptides of the invention can be expressed in a variety of host cells, including E. coli, other bacterial hosts, yeast, and various higher eukaryotic cells such as the COS, CHO and HeLa cell lines and myeloma cell lines.
- the host cells can be mammalian cells, insect cells, or microorganisms, such as, for example, yeast cells, bacterial cells, or fungal cells.
- the recombinant fusion polypeptides can be purified according to standard procedures of the art, including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like (see, generally, R. Scopes, Protein Purification , Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification , Academic Press, Inc. N.Y. (1990)). Substantially pure compositions of at least about 90 to 95% homogeneity are preferred, and 98 to 99% or more homogeneity are most preferred.
- the nucleic acids that encode the fusion polypeptides can also include a coding sequence for an epitope or “tag” for which an affinity binding reagent is available.
- suitable epitopes include the myc and V-5 reporter genes; expression vectors useful for recombinant production of fusion polypeptides having these epitopes are commercially available (e.g., Invitrogen (Carlsbad Calif.) vectors pcDNA3.1/Myc-His and pcDNA3.1/V5-His are suitable for expression in mammalian cells).
- Additional expression vectors suitable for attaching a tag to the fusion proteins of the invention, and corresponding detection systems, are known to those of skill in the art, and several are commercially available (e.g., FLAG (Kodak, Rochester N.Y.)).
- Another example of a suitable tag is a polyhistidine sequence, which is capable of binding to metal chelate affinity ligands. Typically, six adjacent histidines are used, although one can use more or less than six.
- Suitable metal chelate affinity ligands that can serve as the binding moiety for a polyhistidine tag include nitrilo-tri-acetic acid (NTA) (Hochuli, E.
- Evolved polypeptides with improved folding can be used in any number of applications.
- those target polypeptides that can be used as reporter proteins can be used to report expression level, unaffected by folding.
- Conventional methods for assessing protein expression in vivo require poorly folded proteins to be unfolded, for example, prior to probing with labeled antibodies. These proteins do not generally refold well prior to probing or sandwich ELISA, leading to an underestimate of expression level as the misfolded aggregated protein domains are not available for binding by the antibody. Obviously this denaturing method is not suited for intact, high throughput in vivo protein expression monitoring.
- conventional methods for assessing protein expression in vivo do not work well when the protein domains are buried in aggregates. In contrast, the reporter activity of a polypeptide that has enhanced folding can more accurately reflect expression.
- the GFP and DsRed variants described herein that have improved folding activity can be used in many in vivo and high throughput applications.
- Xid-L-GFP SF fluorescence is a direct indicator of total expression levels. The assay can thus be applied to single cells using flow cytometry.
- the superfolder fluorescent proteins provided herein provide new and more stable scaffolds for the creation of new GFP variants based on circular permutation.
- FIG. 1 shows normalized whole cell fluorescence for E. coli BL21(DE3) expressing GFP variants as C-terminal fusions with poorly-folded bullfrog red cell H-subunit ferritin (bracketed). Expression at 37° C. (black) and 27° C. (grey). GFP variants (left to right): cycle-3 redshift, 6 single point mutants, superfolder (left, bracketed). Non-fusion GFP variants (cycle-3 redshift and superfolder, right) are shown as reference. Note that the fluorescence of the optimized superfolder fused to ferritin is essentially identical to the non-fusion cycle-3 redshift GFP. In contrast, cycle-3 redshift GFP fused to ferritin is poorly folded (far left). As expected, the fluorescence is higher at 27° C. relative to 37° C., consistent with the improved folding at lower temperature.
- the ferritin-linker-GFPSF fusion protein partitioned quantitatively to the inclusion body fraction, as was the case with the ferritin-linker-GFP3 variant.
- the solubility of the fusion protein was therefore controlled by the solubility of its most poorly folded domain (ferritin).
- the aggregated fusion protein also failed to catalyze the oxidation of Fe²⁺, yet was brightly fluorescent. This observation suggested that the aggregated fusion protein still contained a misfolded and poorly soluble ferritin domain, but a correctly folded and functional GFP domain. Accordingly, it was concluded that the superfolder mutations uncoupled the folding of the GFP domain and the formation of the chromophore from the presence of the misfolded fused ferritin domain.
- an improved variant of dsRED with decreased aggregation and increased rate of chromophore formation, termed dsRED T4, previously described by Glick and co-workers (Bevis B J, Glick B S. Rapidly maturing variants of the Discosoma red fluorescent protein (DsRed). Nat. Biotechnol. 2002 January; 20(1):83-87).
- the starting variant has the dsRED wild-type sequence, with the indicated mutations of Glick (see Table 1).
- A monomeric variant of dsRED was recently engineered by Tsien (Campbell R E, Tour O, Palmer A E, Steinbach P A, Baird G S, Zacharias D A, Tsien R Y. A monomeric red fluorescent protein. Proc Natl Acad Sci USA. 2002 Jun. 11; 99(12):7877-82). This sequence is included in Table 1 for reference.
- the monomeric variant of Tsien contains several of the Glick T4 mutations (this was the starting parental variant used by Tsien & co-workers for engineering the monomeric dsRED).
- One of the superfolder amino acid positions (177) was found as F177V by Tsien, and F177I in this work.
- Tsien specified that this mutation was associated with the monomeric character (wild type dsRed is a tetramer). There is no teaching in the work of Tsien that this mutation improves folding above that of the starting variant. F177I in this example, contributing to the improved folding of the dsRED cycle 5, is a new and surprising property of mutation at F177, not anticipated by Tsien. Similarly, the negatively charged R2E of superfolder dsRED cycle 5 in our work differs from the R2A non-charged variant previously described by Glick, and there is no teaching in Glick or Tsien that mutations at R improve the folding of dsRED or increase its tolerance to misfolded fused proteins.
- dsRED: wild-type amino acid at position cited. mdsRED: amino acid of monomeric variant of Tsien. Glick T4: amino acid of improved variant of Glick. sfdsRED: amino acid of superfolder dsRED (this work). Grey rows: amino acid positions in common with this work, at which previous workers also specify a mutation relative to wild type.
- 10 μl cell aliquots were mixed with 180 μl of buffer D and the fluorescence measured (488 ex/520 em) using an FL600 plate reader (Biotek). 10 μl cell aliquots were mixed with SDS loading buffer containing dithiothreitol in PCR tubes and denatured for 5 min at 95° C. 8 μl of the denatured samples were run on 4-20% gradient gels (BioRad), stained using Gelcode Blue (BioRad), and protein quantified by scanning densitometry using a GS-800 calibrated densitometer (BioRad).
- FIG. 2 represents a plot of the normalized fluorescence versus the total whole cell expression (determined by SDS-PAGE densitometry). Many of the proteins are poorly folded and the cells carrying these constructs are only weakly fluorescent in the case of cycle-3 GFP, as expected. Thus the whole cell fluorescence is poorly correlated with total expression level. Instead, the fluorescence of the cycle-3 GFP fusions was correlated with the non-fusion solubility of the proteins expressed alone as previously reported (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695; Waldo G S. (2002) Method for determining and modifying protein/peptide solubility, U.S. Pat. No. 6,448,087).
- fluorescent GFP was denatured in 9M urea at 95° C. for 5 min until unfolded and non-fluorescent.
- GFP was renatured (refolded) by rapidly diluting 500-fold in the indicated concentration of urea in 100 mM TRIS pH 7.5, 150 mM NaCl, 10% glycerol, and allowed to refold for 1 h.
- the fluorescence was measured using a BioTek FL600 plate reader.
- Fluorescent cycle-3 redshift or superfolder GFP were unfolded in 9M urea at 95° C. for 5 minutes until non-fluorescent.
- the proteins were refolded by diluting 100-fold in 100 mM TRIS pH 7.5, 150 mM NaCl, 10% glycerol, in a rapidly stirred cuvette and the kinetics measured at 0.2 s intervals on a Perkin Elmer spectrofluorimeter (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695). The long-scale kinetics are shown in FIG. 4A .
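The two refolding protocols differ in how much denaturant is carried over from the 9M urea stock. The arithmetic can be sketched as follows, assuming that "n-fold" means one part sample in n parts total volume; the function name `carried_over_conc` is hypothetical. For the 500-fold equilibrium experiment the diluent itself contains urea, so the value computed here is only the contribution from the denatured sample.

```python
def carried_over_conc(stock_molar: float, fold_dilution: float) -> float:
    """Concentration contributed by the stock after an n-fold dilution.

    Assumes 'n-fold' means 1 part stock in n parts total volume.
    """
    if fold_dilution <= 0:
        raise ValueError("fold_dilution must be positive")
    return stock_molar / fold_dilution


UREA_STOCK_M = 9.0  # urea concentration used for unfolding, from the text

# 500-fold dilution (refolding yield) and 100-fold dilution (kinetics).
equilibrium_urea = carried_over_conc(UREA_STOCK_M, 500)
kinetic_urea = carried_over_conc(UREA_STOCK_M, 100)

print(f"500-fold: {equilibrium_urea:.3f} M urea carried over")
print(f"100-fold: {kinetic_urea:.2f} M urea carried over")
```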
- both superfolder and cycle-3 redshift approached the same final fluorescence values asymptotically (approximation of infinite time), ca. 375 fluorescence units.
- the initial rates were determined by fitting 4th order polynomials to the first 40 seconds of each progress curve (see FIG. 4B ). Rates were normalized to pseudo-first-order rate constants by dividing by the fluorescence values at infinite time (ca. 375).
- the superfolder refolds approximately 7 times faster than cycle-3 redshift (9.2 × 10⁻² s⁻¹ for superfolder, 1.3 × 10⁻² s⁻¹ for cycle-3 redshift). This is consistent with the improved folding of superfolder relative to the starting cycle-3 redshift parental variant.
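The normalization and the "approximately 7 times" comparison can be checked with a few lines of arithmetic. This is a minimal sketch: the rate constants (9.2 × 10⁻² and 1.3 × 10⁻² s⁻¹) and the final fluorescence (ca. 375 units) are taken from the text, while the initial slopes are back-calculated here for illustration and are not values reported in the source. The function name `pseudo_first_order_k` is hypothetical.

```python
F_INF = 375.0  # approximate fluorescence at infinite time, from the text


def pseudo_first_order_k(initial_rate: float, f_inf: float = F_INF) -> float:
    """Normalize an initial rate (fluorescence units/s) to a rate constant (1/s)."""
    if f_inf <= 0:
        raise ValueError("f_inf must be positive")
    return initial_rate / f_inf


# Rate constants quoted in the text (s^-1).
K_SUPERFOLDER = 9.2e-2
K_CYCLE3 = 1.3e-2

# Back-calculated initial slopes in fluorescence units/s (illustrative only;
# the text reports the normalized constants, not these slopes).
rate_superfolder = K_SUPERFOLDER * F_INF
rate_cycle3 = K_CYCLE3 * F_INF

# Ratio of the normalized rate constants.
fold_faster = (pseudo_first_order_k(rate_superfolder)
               / pseudo_first_order_k(rate_cycle3))
print(f"superfolder refolds about {fold_faster:.1f}x faster")
```

Because both variants approach the same final fluorescence, the normalization cancels in the ratio, so the fold difference follows directly from the initial slopes.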
- the mutant pools and the starting variants were expressed in BL21(DE3) at 37° C., sonicated to lyse the cells, fractionated into soluble and pellet fractions by centrifugation, and the soluble and pellet fractions resolved on 20% SDS-PAGE gels and scanned by densitometer.
- the starting variants were fully-soluble as expected.
- the mutant pools displayed a significant fraction of misfolded, insoluble protein.
- the superfolder GFP mutant pool contained ca. 2.5 times the soluble protein of the cycle-3 redshift mutant pool, consistent with the improved folding (and subsequent increased solubility) of the superfolder variant (see FIG. 5).
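A comparison of this kind reduces to computing a soluble fraction from the densitometry signal in the soluble and pellet lanes. The sketch below shows that calculation under the assumption that the two band intensities are directly comparable; the function name `soluble_fraction` is hypothetical, and the band intensities are arbitrary placeholders, not measurements from this work.

```python
def soluble_fraction(soluble_signal: float, pellet_signal: float) -> float:
    """Fraction of the expressed protein found in the soluble lane."""
    total = soluble_signal + pellet_signal
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return soluble_signal / total


# Placeholder densitometry readings (arbitrary units): (soluble, pellet).
pools = {
    "pool A": (60.0, 40.0),
    "pool B": (30.0, 70.0),
}

fractions = {name: soluble_fraction(s, p) for name, (s, p) in pools.items()}
ratio = fractions["pool A"] / fractions["pool B"]

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} soluble")
print(f"pool A has {ratio:.1f}x the soluble fraction of pool B")
```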
- GFP cycle-3 redshift (F64L, S65T) or superfolder
- the starting (parental) variants (superfolder or cycle-3 redshift) were cloned and expressed in BL21(DE3) at 37° C. as a standard and analyzed by flow cytometry.
- the superfolder variant mutant pool has a higher fraction of brighter cells (FIG. 7) compared to the cycle-3 redshift mutant pool (FIG. 6).
- the increased tolerance of the folding of superfolder GFP to additional random mutations is consistent with the improved folding of the superfolder GFP versus cycle-3 redshift.
- each GFP variant was linked by a short GGGS amino acid linker, and new start codons were created at the indicated sites (see Table 2). Sites were chosen to correspond to the middle of loops between structural elements using the published structures of GFP. Manipulation was by primer-based PCR according to standard methods well known in the art. Most proteins do not tolerate circular permutation and still fold (Baird et al., 1999, Proc. Natl. Acad. Sci. USA, 96: 11241-11246). The effect of circular permutation was investigated by studying the solubility of the permutants as well as the fluorescence yield.
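At the level of the amino acid sequence, the circular permutation described above amounts to joining the original termini through the linker and reopening the chain at a new site. The following sketch illustrates that rearrangement on a placeholder string. The function name `circular_permute`, the 0-based indexing, and the prepended start residue are assumptions made for illustration; the text does not state how the original initiator methionine was handled, and the toy sequence is not a GFP sequence.

```python
def circular_permute(seq: str, new_start: int,
                     linker: str = "GGGS", start_residue: str = "M") -> str:
    """Return a circular permutant of `seq`.

    The original C-terminus is joined to the original N-terminus through
    `linker`, and the chain is reopened so that the residue at 0-based
    index `new_start` follows the new start residue.
    """
    if not 0 < new_start < len(seq):
        raise ValueError("new_start must fall strictly inside the sequence")
    return start_residue + seq[new_start:] + linker + seq[:new_start]


toy_sequence = "ABCDEFGHIJ"  # placeholder letters, not a real protein
permutant = circular_permute(toy_sequence, 4)
print(permutant)  # MEFGHIJGGGSABCD
```

The permutant is longer than the parent by the linker plus the added start residue, which is one reason to compare solubility and fluorescence yield against the parent rather than assume the fold is retained.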
- Circular permutants were cloned into the pET vector equipped with an in-frame Spe-1 and Kpn-1 cloning site as Spe-1/Kpn-1 inserts and expressed in BL21(DE3) at 37° C. for 4 h.
- the cells were pelleted and fractionated into soluble and pellet fractions according to previously published methods (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695), resolved on SDS-PAGE gels, and the soluble and pellet fractions quantitated by densitometry.
Description
- This application is a Divisional of U.S. patent application Ser. No. 10/423,688, filed Apr. 24, 2003, which is a continuation-in-part of U.S. patent application Ser. No. 10/132,067, filed Apr. 24, 2002.
- This invention was made with government support under Contract No. W-7405-ENG-36 awarded by the U.S. Department of Energy. The government has certain rights in the invention.
- Protein insolubility constitutes a significant problem in basic and applied bioscience, in many situations limiting the rate of progress in these areas. Protein folding and solubility have been the subject of considerable theoretical and empirical research. However, there still exists no general method for improving intrinsic protein solubility. Such a method would greatly facilitate protein structure-function studies, drug design, de novo peptide and protein design and associated structure-function studies, industrial process optimization using bioreactors and microorganisms, and many disciplines in which a process or application depends on the ability to tailor or improve the solubility of proteins, screen or modify the solubility of large numbers of unique proteins about which little or no structure-function information is available, or adapt the solubility of proteins to new environments when the structure and function of the protein(s) are poorly understood or unknown.
- Overexpression of cloned genes using an expression host, for example E. coli, is the principal method of obtaining proteins for most applications. Unfortunately, many such cloned foreign proteins are insoluble or unstable when overexpressed. There are two sets of approaches currently in use which deal with such insoluble proteins. One set of approaches modifies the environment of the protein in vivo and/or in vitro. For example, proteins may be expressed as fusions with more soluble proteins, or directed to specific cellular locations. Chaperones may be coexpressed to assist folding pathways. Insoluble proteins may be purified from inclusion bodies using denaturants and the protein subsequently refolded in the absence of the denaturant. Modified growth media and/or growth conditions can sometimes improve the folding and solubility of a foreign protein. However, these methods are frequently cumbersome, unreliable, ineffective, or lack generality. A second set of approaches changes the sequence of the expressed protein. Rational approaches employ site-directed mutation of key residues to improve protein stability and solubility. Alternatively, a smaller, more soluble fragment of the protein may be expressed. These approaches require a priori knowledge about the structure of the protein, knowledge which is generally unavailable when the protein is insoluble. Furthermore, rational design approaches are best applied when the problem involves only a small number of amino-acid changes. Finally, even when the structure is known, the changes required to improve solubility may be unclear. Thus, many thousands of possible combinations of mutations may have to be investigated, leading to what is essentially an “irrational” or random mutagenesis approach. Such an approach requires a method for rapidly determining the solubility of each version.
- Random or “irrational” mutagenesis redesign of protein solubility carries the possibility that the native function of the protein may be destroyed or modified by the inadvertent mutation of residues which are important for function, but not necessarily related to solubility. However, protein solubility is strongly influenced by interaction with the environment through surface amino acid residues, while catalytic activities and/or small substrate recognition often involve partially buried or cleft residues distant from the surface residues. Thus, in many situations, rational mutation of proteins has demonstrated that the solubility of a protein can be modified without destroying the native function of the protein. Modification of the function of a protein without affecting its solubility has also been frequently observed. Furthermore, spontaneous mutants of proteins bearing only 1 or 2 point mutations have been serendipitously isolated which have converted a previously insoluble protein into a soluble one. This suggests that the solubility of a protein can be optimized with a low level of mutation and that protein function can be maintained independently of enhancements or modifications to solubility. Furthermore, a screen for function may be applied concomitantly after each round of solubility selection during the directed evolution process.
- In the absence of a screen for function, for example when the function is unknown, the final version of the protein can be backcrossed against the wild type in vitro to remove nonessential mutations. This approach has been successfully applied by Stemmer in “Rapid Evolution Of A Protein In Vitro By DNA Shuffling,” by W. P. C. Stemmer, Nature 370, 389 (1994), and in “DNA Shuffling By Random Fragmentation And Reassembly: In Vitro Recombination For Molecular Evolution,” by W. P. C. Stemmer, Proc. Natl. Acad. Sci. USA 91, 10747 (1994) to problems in which the function of a protein had been optimized and it was desired to remove nonessential mutations accumulated during directed evolution. The development of highly specialized protein variants by directed, in vitro evolution, which exerts unidirectional selection pressure on organisms, is further discussed in: “Searching Sequence Space Using Recombination To Search More Efficiently And Thoroughly Instead Of Making Bigger Combinatorial Libraries,” by Willem P. C. Stemmer, Biotechnology 13, 549 (1995); in “Directed Evolution: Creating Biocatalysts For The Future,” by Frances H. Arnold, Chemical Engineering Science 51, 5091 (1996); in “Directed Evolution Of A Fucosidase From A Galactosidase By DNA Shuffling And Screening,” by Ji-Hu Zhang et al., Proc. Natl. Acad. Sci. USA 94, 4504 (1997); in “Functional And Nonfunctional Mutations Distinguished By Random Combination Of Homologous Genes,” by Huimin Zhao and Frances H. Arnold, Proc. Natl. Acad. Sci. USA 94, 7007 (1997); and in “Strategies For The In Vitro Evolution of Protein Function: Enzyme Evolution By Random Recombination of Improved Sequences,” by Jeff Moore et al., J. Mol. Biol. 272, 336-346 (1997). Therein, efficient strategies for engineering new proteins by multiple generations of random mutagenesis and recombination coupled with screening for improved variants are described. However, there are no teachings concerning the use of directed evolutionary processes to improve solubility of proteins; rather, the mutagenesis was directed to improvement of protein function. It should be mentioned, however, that in order for the protein to function properly in any environment, it must at least be correctly folded.
- Finally, for structural determination it is often not necessary or even desirable to have a fully functional version of the protein. If the mutational rate is low (ensured by molecular backcrossing), it is likely that the structure of the wild-type and solubility optimized versions of a protein will be similar. As long as the protein is soluble, and a structure can be obtained, it should then be possible to redesign the solubility of the protein using rational methods, if desired.
- Wild type green fluorescent protein (GFP) cloned from Aequorea victoria normally misfolds and is poorly fluorescent when overexpressed in the heterologous host E. coli. It is found predominantly in the inclusion body fraction of cell lysates. The misfolding is incompletely understood, but is thought to result from the increased expression level or rate in E. coli, or the inadequacy of the bacterial chaperone and related folding machinery under conditions of overexpression. The folding yield also decreases dramatically at higher temperatures (37° C. vs. 27° C.). This wild type GFP is a very poor folder, as it is extremely sensitive to the expression environment.
- Green fluorescent protein has become a widely used reporter of gene expression and regulation. DNA shuffling has been used to obtain a mutant having a whole cell fluorescence 45-times greater than the standard, commercially available plasmid GFP. See, e.g., “Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling,” by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996). The screening process optimizes the function of GFP (green fluorescence), and thus uses a functional screen. Although the screening process coincidentally optimizes the solubility of the GFP, in that the GFP is only fluorescent when properly folded, there is no mention of using soluble GFP as a tag to monitor solubility of other proteins; that is, the function of the protein and not its solubility is being modified. In “Wavelength Mutations And Post-translational Auto-oxidation Of Green Fluorescent Protein,” by Roger Heim et al., Proc. Natl. Acad. Sci. USA 91, 12501 (1994), GFP was mutagenized and screened for variants with altered absorption or emission spectra. The authors mention that in place of proteins labeled with fluorescent tags to detect location and sometimes their conformational changes both in vitro and in intact cells, a possible strategy would be to concatenate the gene for the nonfluorescent protein of interest with the gene for a naturally fluorescent protein and express the fusion product. However, the focus of this paper is the extension of the usefulness of GFP by enabling visualization of differential gene expression and protein localization and measurement of protein association by fluorescence resonance energy transfer, by making available two visibly distinct colors. There is no mention of the use of the gene construct for solubility determinations. The paper further discusses the expression of GFP in E. coli under the control of a T7 promoter, and that the bacteria contained inclusion bodies consisting of protein indistinguishable from jellyfish or soluble recombinant protein on denaturing gels, but that this material was completely nonfluorescent, lacked the visible absorbance bands of the chromophore, and did not become fluorescent when solubilized and subjected to protocols that renature GFP, as opposed to the soluble GFP in the bacteria which undergoes correct folding and, therefore, fluoresces.
- Chun Wu et al. in “Novel Green Fluorescent Protein (GFP) Baculovirus Expression Vectors,” Gene 190, 157 (1997), describe the construction of Baculovirus expression vectors which contain GFP as a reporter gene. The authors follow the production and purification of a protein of interest by in-frame cloning of the gene that expresses the protein in insect cells with the GFP open reading frame, thereby permitting visualization of the produced GFP-fusion protein using UV light. However, the purified GFP-XylE fusion protein was found to be insoluble after harvest. The authors did not correlate the level of fluorescence of the cells expressing the GFP-XylE fusion protein with the solubility of cells expressing the XylE protein alone. Therefore, this reference does not teach the use of the fusion protein fluorescence as an indicator of the solubility of the specific protein XylE or of the solubility of other proteins.
- In “Application Of A Chimeric Green Protein Fluorescent Protein To Study Protein-Protein Interactions,” by N. Garamszegi et al., Biotechniques 23, 864 (1997), the authors discuss the fusion between GFP and human calmodulin-like protein (CLP) and show that this protein retains fluorescence and the known characteristics of CLP. That is, the GFP portion remains responsible for efficient fluorescent signals with little or no influence on the properties of the fused protein of interest. The authors maintain that the exhibited GFP fluorescence provides information concerning the maintenance of the GFP structural integrity in the chimeric protein, but does not provide information about the integrity of the entire fusion protein and, in particular, does not allow any statements concerning the maintenance of CLP function or integrity. From these statements, it is clear that this paper does not contemplate the use of the GFP as a solubility reporter for the CLP.
- It has been demonstrated that improving the apparent functionality of a protein can sometimes increase the concomitant solubility of the protein, as in: “Redesigning enzyme topology by directed evolution,” by G. Macbeath, P. Kast, and D Hilvert, Science 279, 1958-1961 (1998); “Expression of an antibody fragment at high levels in the bacterial cytoplasm,” by P. Martineau, P. Jones, and G. Winter, J. Mol. Biol. 280, 117-127 (1998); “Antibody scFv fragments without disulfide bonds made by molecular evolution,” K. Proba, A. Worn, A. Honegger, and A. Pluckthun, J. Mol. Biol. 275, 245-253 (1998); and “Functional Expression of Horseradish Peroxidase in E. coli by Directed Evolution,” Lin Zhanglin, Todd Thorsen, and Frances H. Arnold, Biotechnol. Prog. 15, 467-471 (1999). In each case, the driving force for the directed evolution was the functionality of the protein of interest. For example, if the protein was an enzyme, the assay for improved function was the turnover of a chromogenic analog of the enzyme's natural substrate; if the protein was an antibody, it was the recognition of the target antigen by the antibody.
- For cytoplasmic expression of antibodies, the recognition was linked to cell survival (binding of the antibody to a selectable protein marker, which was an antigen for the antibody of interest, provided selection for functional antibodies); in the case of phage displayed antibodies without disulfide bonds, the recognition was transduced to successful binding of the displayed phage to the target antigen of the displayed antibody in a biopanning protocol. An apparent increase in the amount of protein expressed in the soluble fraction relative to the unselected target proteins was noted upon expression of the proteins in E. coli. The apparent increase in activity of desirable mutants during the evolution was due at least in part to an increase in the number of correctly folded (and hence functional) protein molecules, and not exclusively to an increase in the specific activity of a given protein molecule. However, the driving force for the selection or screening process during the directed evolution depended on the functionality of (and a functional assay for) the protein of interest.
- Many proteins have no easily detectable functional assay, and thus identification of proteins with improved folding yield by an increase in apparent activity due to a larger number of correctly folded molecules, is not a general method for improving folding by directed evolution. Furthermore, even when functional assays are available, apparent increases in activity can also be due to increases in the specific activity (activity of an individual protein molecule) even when the total number of correctly folded molecules remains the same. Thus, increases in apparent activity do not necessarily translate to increases in the solubility of proteins. Furthermore, functional assays are protein-specific, and thus must be developed on a case-by-case basis for each new protein. Functional assays therefore lack the generality needed to identify proteins which are soluble, or to find genetic variants (mutants and fragments) of proteins with improved solubility, in a high-throughput manner for proteomics or functional genomics wherein large numbers of different proteins about which little or no functional/structural information is known, are to be solubly expressed.
- Stemmer and coworkers applied directed evolution to screen for mutants or variants of GFP that exhibited increased fluorescence and folding yield in E. coli (see, e.g., Crameri et al., Nat. Biotechnol. 14:315-319, 1996). They identified a mutant that exhibited increased folding ability. This version of GFP, termed cycle-3 or GFP3, contains the mutations F99S, M153T and V163A. GFP3 is relatively insensitive to the expression environment and folds well in a wide variety of hosts, including E. coli. GFP3 folds equally well at 27° C. and 37° C. Thus, the GFP3 mutations also appear to eliminate potential temperature sensitive folding intermediates that occur during folding of wild type GFP.
- GFP3 can be made to misfold by expression as a fusion protein with another poorly folded polypeptide. GFP3 has been used to report on the “folding robustness” of N-terminally fused proteins during expression in E. coli (Waldo et al., Nat. Biotechnol. 17:691-695, 1999). If a test protein, Xi, misfolds and is insoluble when expressed in E. coli, cells expressing the corresponding fusion protein Xi-L-GFP3 (where L is a small flexible linker) are poorly fluorescent, indicating a high probability that the GFP3 domain fails to fold and become fluorescent. On the other hand, when a protein Xs folds well and is highly soluble when expressed in E. coli, cells expressing the corresponding fusion protein Xs-L-GFP3 are highly fluorescent, indicating the successful folding of the GFP3 domain. These observations suggest the presence of latent folding defects in the folding trajectory of GFP3 and that poorly folded fused polypeptides effectively ‘bait’ the GFP3 to misfold.
- This aspect of GFP3 folding has been used to evolve soluble versions of proteins that normally misfold and aggregate when expressed in E. coli. This methodology is described, for example, in WO 01/23602. In these methods, the sequence of the reporter, e.g., GFP3 domain, remains constant and a poorly folded upstream domain is mutated. Better folded variants of domain X are identified by increased fluorescence.
- The present invention provides directed evolution methods for improving the folding and solubility characteristics of polypeptides. A number of fluorescent proteins having improved solubility and folding characteristics are provided, including superfolder GFP and DsRed fluorescent proteins.
- FIG. 1. Normalized whole cell fluorescence for E. coli BL21(DE3) expressing GFP variants as C-terminal fusions with poorly-folded bullfrog red cell H-subunit ferritin (bracketed). Expression at 37° C. (black) and 27° C. (grey). GFP variants (left to right): cycle-3 redshift, 6 single point mutants, superfolder (left, bracketed). Non-fusion GFP variants (cycle-3 redshift and superfolder, right) as reference. Note that the fluorescence of the optimized superfolder fused to ferritin is essentially identical to the non-fusion cycle-3 redshift GFP. In contrast, cycle-3 redshift GFP fused to ferritin is poorly folded (far left). As expected, the fluorescence is higher at 27° C. relative to 37° C., consistent with the improved folding at lower temperature.
- FIG. 2. Proteins from Pyrobaculum aerophilum expressed in Escherichia coli as N-terminal fusions with either cycle-3 GFP redshift (lower line, triangles) or superfolder GFP (upper line, circles). Sixteen proteins listed in order of increasing expression level: tartrate dehydratase beta subunit, nucleoside diphosphate kinase, tyrosine tRNA synthetase, polysulfide reductase subunit, methyltransferase, aspartate-semialdehyde dehydrogenase, purine-nucleoside phosphorylase, soluble hydrogenase, 3-hexulose 6-phosphate synthase, nirD protein, C-type cytochrome biogenesis factor, phosphate cyclase, hydrogenase expression/formation, chorismate mutase, DNA-directed RNA polymerase, and ribosomal protein S9p. Y-axis: whole cell fluorescence (488 nm excitation, 520 nm emission, 10 nm bandpass); X-axis: trace quantity of protein in whole cell fraction determined by SDS-PAGE densitometry.
- FIG. 3. Tolerance of GFP to urea-induced unfolding during refolding from the fully-denatured state. GFP variants unfolded in 9 M urea at 95° C. were refolded by rapidly diluting into TRIS buffer containing the indicated final concentration of urea (x-axis). Cycle-3 redshift (triangles) or superfolder (circles). Fraction of folded protein is determined by the fraction of fluorescence recovered (y-axis) at the indicated concentration of urea in the refolding buffer (x-axis).
- FIG. 4A. Long-term progress curves during refolding of superfolder GFP (SF-GFP) and cycle-3 redshift GFP (C3-GFP). Fully denatured proteins were diluted 100-fold into TRIS buffer (100 mM TRIS-HCl pH 7.5, 150 mM NaCl, 10% v/v glycerol) and the fluorescence measured at 0.2 s intervals with a Perkin Elmer spectrofluorimeter. Note that after 10000 s, both proteins approach the same final value (ca. 375 units).
- FIG. 4B. Initial rate progress curves during refolding of superfolder GFP (SF-GFP) and cycle-3 redshift GFP (C3-GFP). Fully denatured proteins were diluted 100-fold into TRIS buffer (100 mM TRIS-HCl pH 7.5, 150 mM NaCl, 10% v/v glycerol) and the fluorescence measured at 0.2 s intervals with a Perkin Elmer spectrofluorimeter. Initial rates were determined by fitting a 4th order polynomial to the first 40 s of each progress curve, and converted to pseudo first-order rates by normalizing to the fluorescence at infinite time (ca. 375 units). The superfolder refolds ca. 7 times faster than cycle-3 redshift.
- FIG. 5. Increased solubility of superfolder mutant pool (right) versus cycle-3 redshift mutant pool (left). SDS-PAGE of (left to right) 10 kD molecular weight standard (M), soluble (S) and pellet (P) fractions of cycle-3 redshift mutant pool (C3-GFP) and superfolder mutant pool (SF-GFP) expressed at 37° C., 10 kD molecular weight standard (M). The superfolder (right) has a higher proportion of soluble protein compared to the cycle-3 redshift (left), consistent with the improved folding of superfolder GFP.
- FIG. 6. Flow cytometric analyses of cycle-3 redshift mutant pool library (grey) or control parental cycle-3 redshift (dark grey). Number of events (cells) (y-axis); fluorescence intensity of each event (x-axis). Note the logarithmic fluorescence scale.
- FIG. 7. Flow cytometric analyses of superfolder mutant pool library (grey) or control parental superfolder variant (dark grey). Number of events (cells) (y-axis); fluorescence intensity of each event (x-axis). Note the logarithmic fluorescence scale.
- FIG. 8. Solubility of various circular permutants of cycle-3 redshift (black) and superfolder GFP (grey) expressed in BL21(DE3) at 37° C. Normal, non-permutated variants (control). Y-axis, fraction soluble determined by SDS-PAGE densitometry. X-axis, indicated circular permutant (see Table 1 for new starting codon position). As expected, the superfolder is more tolerant to circular permutation (as evidenced by the higher solubility) compared to cycle-3 redshift.
- FIG. 9. Whole-cell fluorescence at 37° C. for BL21(DE3) expressing various circular permutants of cycle-3 redshift (black) and superfolder GFP (grey). Fluorescence (488 nm ex/520 nm em) normalized by culture density (absorbance at 600 nm). Normal, non-permutated variants (control). Y-axis, normalized whole-cell fluorescence. X-axis, indicated circular permutant (see Table 1 for new starting codon position). As expected, the superfolder is more tolerant to circular permutation (as evidenced by the higher fluorescence) compared to cycle-3 redshift.
- FIG. 10. Whole-cell fluorescence at 37° C. for BL21(DE3) expressing dsRED variants as C-terminal fusions with poorly-folded bullfrog red-cell H ferritin. Left to right: starting variant (wt); pools of top 10 optima from each round of directed evolution (rounds 1 to 5); non-fusion starting variant (non fusion). Fluorescence (580 nm ex/610 nm em) normalized by culture density (absorbance at 600 nm). As expected, the folding of superfolder dsRED (round 5) is more tolerant to fused upstream misfolded bullfrog red-cell H-ferritin compared to the starting (wt) variant.
- The current invention provides polypeptides with improved folding activity and/or solubility, including superfolding variants of the Aequorea victoria Green Fluorescent Protein and Discosoma sp. Red Fluorescent Protein, and methods of obtaining such polypeptides.
- Unless otherwise defined, all terms of art, notations and other scientific terminology used herein are intended to have the meanings commonly understood by those of skill in the art to which this invention pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art. The techniques and procedures described or referenced herein are generally well understood and commonly employed using conventional methodology by those skilled in the art, such as, for example, the widely utilized molecular cloning methodologies described in Sambrook et al., Molecular Cloning: A Laboratory Manual 3rd. edition (2001) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Current Protocols in Molecular Biology (Ausubel et al., eds., John Wiley & Sons, Inc. 2001). As appropriate, procedures involving the use of commercially available kits and reagents are generally carried out in accordance with manufacturer defined protocols and/or parameters unless otherwise noted.
- A “fluorescent protein” as used herein is a protein that has intrinsic fluorescence. Typically, a fluorescent protein has a structure that includes an 11-stranded beta-barrel.
- The terms “chromophoric protein” and “chromoprotein” are used interchangeably and refer to a class of proteins, recently identified from various corals, anemones and other sea organisms, which have intrinsic color and, in some cases, variable degrees of intrinsic or inducible fluorescence. Typically, a chromoprotein has a structure similar to the fluorescent proteins, i.e., an 11-stranded beta-barrel.
- The “MMDB Id: 5742 structure” as used herein refers to the GFP structure disclosed by Ormo & Remington, MMDB Id: 5742, in the Molecular Modeling Database (MMDB). The corresponding Protein Data Bank (PDB) reference is PDB Id: 1EMA; PDB Authors: M. Ormo & S. J. Remington; PDB Deposition: 1 Aug. 96; PDB Class: Fluorescent Protein; PDB Title: Green Fluorescent Protein From Aequorea Victoria (see, e.g., Ormo et al. “Crystal structure of the Aequorea Victoria green fluorescent protein.” Science 1996 Sep. 6; 273(5280):1392-5; Yang et al, “The molecular structure of green fluorescent protein.” Nat. Biotechnol. 1996 October; 14(10):1246-51).
- “Root mean square deviation” (“RMSD”) refers to the root mean square superposition residual in Angstroms. This number is calculated after optimal superposition of two structures, as the square root of the mean square distances between equivalent C-alpha-atoms.
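- As an illustration only, the definition above can be written as a formula. The symbols are assumptions introduced here and do not appear in the original text: N is the number of equivalent C-alpha atom pairs, and x_i and y_i are the coordinates (in Angstroms) of the i-th equivalent C-alpha atom in each structure after optimal superposition.

```latex
\mathrm{RMSD} \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
```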
- A “folding interference domain” as used herein refers to a domain that interferes with the folding of a polypeptide (“Xid”). The presence of a folding interference domain in a fusion protein of a polypeptide of interest should detectably interfere with folding, as measured by any criteria capable of discriminating between better and poorer folded versions of the polypeptide of interest, P, within the context of a fusion with Xid. In the practice of the method of the invention, the folding interference domain need not be misfolded itself. In fact, it may not actually be folded at all, and it might be soluble or it might be insoluble. For a folding interference domain, the only requirement is that P in Xid-L-P is detectably less well-folded than P alone (“L” indicates an optional linker polypeptide incorporated between P and Xid in the fusion protein). Further details regarding the detection and assessment of folding are set forth infra.
- “Domain” refers to a unit of a protein or protein complex, comprising a polypeptide subsequence, a complete polypeptide sequence, or a plurality of polypeptide sequences where that unit has a defined function. The function is understood to be broadly defined and can be ligand binding, catalytic activity or can have a stabilizing effect on the structure of the protein.
- “Join” or “link” refers to any method known in the art for functionally connecting protein domains, including without limitation recombinant fusion with or without intervening domains; intein-mediated fusion; non-covalent association; and covalent bonding, including disulfide bonding; hydrogen bonding; electrostatic bonding; and conformational bonding, e.g., antibody-antigen, and biotin-avidin associations.
- “Fused” refers to linkage by covalent bonding.
- A “fusion protein” refers to a chimeric molecule formed by the joining of two or more polypeptides through a bond formed between one polypeptide and another polypeptide. Fusion proteins may also contain a linker polypeptide in between the constituent polypeptides of the fusion protein. The term “fusion construct” or “fusion protein construct” is generally meant to refer to a polynucleotide encoding a fusion protein.
- The term “heterologous” when used with reference to portions of a nucleic acid indicates that the nucleic acid comprises two or more subsequences that are not found in the same relationship to each other in nature. For instance, a nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a nucleic acid encoding a fluorescent protein from one source and a nucleic acid encoding a peptide sequence from another source. Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
- A “reporter molecule” has a detectable phenotype. Often, the reporter molecule is a polypeptide, such as an enzyme, or a fluorescent polypeptide. A reporter polypeptide may have intrinsic activity. In the context of the methods of the invention, a reporter molecule has a detectable phenotype associated with correct folding or solubility of the reporter molecule. For example, the reporter could be an enzyme or a fluorescent polypeptide. For an enzyme, the detectable phenotype would then be the ability to turn over a substrate giving a detectable product or change in substrate concentration or physical state. For a fluorescent protein, the activity would be the emission of fluorescence upon excitation by the appropriate wavelength(s) of light.
- The term “isolated,” when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It is preferably in a homogeneous state although it can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein which is the predominant species present in a preparation is substantially purified. In particular, an isolated gene is separated from open reading frames which flank the gene and encode a protein other than the gene of interest. The term “purified” denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least 85% pure, more preferably at least 95% pure, and most preferably at least 99% pure.
- “Nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs).
- Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide. The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical mimetics of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
- The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
- Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
- The terms “peptidomimetic” and “mimetic” refer to a synthetic chemical compound that has substantially the same structural and functional characteristics of the polypeptides of the invention. Peptide analogs are commonly used in the pharmaceutical industry as non-peptide drugs with properties analogous to those of the template peptide. These types of non-peptide compound are termed “peptide mimetics” or “peptidomimetics” (Fauchere, J. Adv. Drug Res. 15:29 (1986); Veber and Freidinger TINS p. 392 (1985); and Evans et al. J. Med. Chem. 30:1229 (1987), which are incorporated herein by reference). Peptide mimetics that are structurally similar to therapeutically useful peptides may be used to produce an equivalent or enhanced therapeutic or prophylactic effect. Generally, peptidomimetics are structurally similar to a paradigm polypeptide (i.e., a polypeptide that has a biological or pharmacological activity), but have one or more peptide linkages optionally replaced by a linkage selected from the group consisting of, e.g., —CH2NH—, —CH2S—, —CH2-CH2-, —CH═CH— (cis and trans), —COCH2-, —CH(OH)CH2-, and —CH2SO—. The mimetic can be either entirely composed of synthetic, non-natural analogues of amino acids, or, is a chimeric molecule of partly natural peptide amino acids and partly non-natural analogs of amino acids. The mimetic can also incorporate any amount of natural amino acid conservative substitutions as long as such substitutions also do not substantially alter the mimetic's structure and/or activity. For example, a mimetic composition is within the scope of the invention if it is capable of carrying out the binding or fluorescent activities of green fluorescent protein.
- “Conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
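- The notion of a silent variation can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosed methods: the partial codon table, the function names and the example sequences are all assumptions chosen for the example, and codons are written in RNA form (so tryptophan appears as UGG rather than TGG).

```python
# Partial standard genetic code (RNA codons); only the entries needed here.
# The table contents are standard, but the selection is an illustrative assumption.
CODON_TABLE = {
    "GCA": "A", "GCC": "A", "GCG": "A", "GCU": "A",  # alanine: four synonymous codons
    "AAA": "K", "AAG": "K",                          # lysine: two synonymous codons
    "AUG": "M",                                      # methionine: ordinarily a single codon
    "UGG": "W",                                      # tryptophan: ordinarily a single codon
}


def translate(rna):
    """Translate an RNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def synonymous_codons(codon):
    """Return every codon in the table encoding the same amino acid as `codon`."""
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == amino_acid)


# Hypothetical sequences: the second differs from the first only by silent changes
# (GCA -> GCG for alanine, AAA -> AAG for lysine).
original = "AUGGCAAAAUGG"
silent_variant = "AUGGCGAAGUGG"

print(translate(original), translate(silent_variant))
print(synonymous_codons("GCA"))
```

Both sequences translate to the same peptide, which is what makes the second a silent variation of the first; AUG and UGG have no synonymous alternatives in the table, matching the exception noted in the text.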
- As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. For example, substitutions may be made wherein an aliphatic amino acid (G, A, I, L, or V) is substituted with another member of the group. Similarly, an aliphatic polar-uncharged group such as C, S, T, M, N, or Q, may be substituted with another member of the group; and basic residues, e.g., K, R, or H, may be substituted for one another. In some embodiments, an amino acid with an acidic side chain, E or D, may be substituted with its uncharged counterpart, Q or N, respectively; or vice versa. Each of the following eight groups contains other exemplary amino acids that are conservative substitutions for one another:
-
- 1) Alanine (A), Glycine (G);
- 2) Aspartic acid (D), Glutamic acid (E);
- 3) Asparagine (N), Glutamine (Q);
- 4) Arginine (R), Lysine (K);
- 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
- 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
- 7) Serine (S), Threonine (T); and
- 8) Cysteine (C), Methionine (M)
- (see, e.g., Creighton, Proteins (1984)).
Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.
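- The eight exemplary groups listed above can be encoded directly as a lookup. The following Python sketch is illustrative only; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and treating a substitution as conservative whenever both residues share at least one group is an assumption about how the table would be applied.

```python
# The eight exemplary groups from the text, as sets of one-letter codes.
# Methionine (M) appears in two groups, as it does in the listed groups.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original, replacement):
    """Return True if the two residues differ and share at least one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("I", "V"))  # both in group 5
print(is_conservative("A", "W"))  # no shared group
```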
- Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980). “Primary structure” refers to the amino acid sequence of a particular peptide. “Secondary structure” refers to locally ordered, three dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically 25 to approximately 500 amino acids long. Typical domains are made up of sections of lesser organization such as stretches of β-sheet and α-helices. “Tertiary structure” refers to the complete three dimensional structure of a polypeptide monomer. “Quaternary structure” refers to the three dimensional structure formed by the noncovalent association of independent tertiary units. Anisotropic terms are also known as energy terms.
- The terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., about 70% identity, preferably 75%, 80%, 85%, 90%, or 95% identity over a specified region), when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection. Such sequences are then said to be “substantially identical.” This definition also refers to the complement of a test sequence. Preferably, the identity exists over a region that is at least about 22 amino acids or nucleotides in length, or more preferably over a region that is 30, 40, or 50-100 amino acids or nucleotides in length.
- The term “similarity,” or percent “similarity,” in the context of two or more polypeptide sequences, refers to two or more sequences or subsequences that have a specified percentage of amino acid residues that are either the same or similar as defined in the 8 conservative amino acid substitutions defined above (i.e., 60%, optionally 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% similar over a specified region or, when not specified, over the entire sequence), when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. Such sequences are then said to be “substantially similar.” Optionally, this similarity exists over a region that is at least about 50 amino acids in length, or more preferably over a region that is at least about 100, 200, 300, 400, 500 or 1000 or more amino acids in length.
- For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
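- By way of non-limiting illustration, the percent identity calculation can be sketched as follows for a test sequence that has already been aligned to a reference sequence. The function names, the gap character, and the convention of counting only ungapped columns are assumptions made for this sketch; the 70% threshold and the 22-residue minimum region are the values recited in the definition of “identity” herein.

```python
def percent_identity(aligned_ref, aligned_test, gap="-"):
    """Percent identity over an already-aligned region.

    Assumption: identity is counted over columns in which neither
    sequence carries a gap; other conventions exist.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for r, t in zip(aligned_ref, aligned_test):
        if r == gap or t == gap:
            continue  # skip gapped columns
        compared += 1
        if r == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_substantially_identical(aligned_ref, aligned_test,
                               threshold=70.0, min_region=22, gap="-"):
    """Apply the 'about 70% over at least about 22 positions' test."""
    compared = sum(1 for r, t in zip(aligned_ref, aligned_test)
                   if r != gap and t != gap)
    return (compared >= min_region
            and percent_identity(aligned_ref, aligned_test, gap) >= threshold)
```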
- A “comparison window”, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)). Typically, the Smith & Waterman alignment with the default parameters is used for the purposes of this invention.
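- For illustration only, a local alignment of the Smith & Waterman type can be sketched as below. The scoring values (match, mismatch, and a linear gap penalty) are arbitrary assumptions chosen for readability and are not the default parameters of any particular software package; production implementations typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Illustrative scoring only: fixed match/mismatch scores, linear gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, floor at zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```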
- Other examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 are used, typically with the default parameters described herein, to determine percent sequence identity for the nucleic acids and proteins of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=−4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)), alignments (B) of 50, expectation (E) of 10, M=5, N=−4, and a comparison of both strands.
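- The seed-and-extend procedure described above can be sketched, for nucleotide sequences only, as follows. This is a simplified illustration and not the BLAST implementation: it seeds on exact word matches, extends without gaps, and uses the W=11, M=5, N=−4 values recited above, whereas the drop-off value X=20 and all function names are assumptions introduced for the sketch.

```python
def find_word_hits(query, subject, w=11):
    """Return (query_pos, subject_pos) pairs where an exact w-mer is shared."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query, subject, qi, sj, w=11, m=5, n=-4, x=20):
    """Ungapped extension of one word hit in both directions.

    Extension stops when the running score drops by x from its maximum,
    falls to zero or below, or either sequence ends.
    Returns (best_score, query_start, query_end_exclusive).
    """
    best = w * m  # score of the exactly matching seed word
    run, right, k = best, 0, 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        run += m if query[qi + w + k] == subject[sj + w + k] else n
        k += 1
        if run > best:
            best, right = run, k
        elif best - run >= x or run <= 0:
            break
    run, left, k = best, 0, 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        run += m if query[qi - k - 1] == subject[sj - k - 1] else n
        k += 1
        if run > best:
            best, left = run, k
        elif best - run >= x or run <= 0:
            break
    return best, qi - left, qi + w + right
```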
- The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001. The default parameters of BLAST are also often employed to determine percent identity or percent similarity.
- An indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the antibodies raised against the polypeptide encoded by the second nucleic acid, as described below. Thus, a polypeptide is typically substantially identical to a second polypeptide, for example, where the two peptides differ only by conservative substitutions. Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below. Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequence.
- “Antibody” refers to a polypeptide comprising a framework region from an immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
- An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
- Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)′2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)′2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)′2 dimer into an Fab′ monomer. The Fab′ monomer is essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed. 1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554 (1990)).
- For preparation of monoclonal or polyclonal antibodies, any technique known in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy (1985)). Techniques for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or other organisms such as other mammals, may be used to express humanized antibodies. Alternatively, phage display technology can be used to identify antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g., McCafferty et al., Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992)).
- The phrase “selectively (or specifically) hybridizes to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent hybridization conditions when that sequence is present in a complex mixture (e.g., total cellular or library DNA or RNA).
- The phrase “stringent hybridization conditions” refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acid, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, optionally 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes.
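- As a non-limiting sketch, the numerical criteria recited above can be expressed as simple checks. The function names are hypothetical, and treating each “about” value as a sharp cutoff is a simplifying assumption; the sketch ignores destabilizing agents such as formamide, which lower the effective temperature requirement.

```python
def stringent_temperature_window(tm_celsius):
    """Temperature range about 5-10 degrees C below the melting point Tm."""
    return (tm_celsius - 10.0, tm_celsius - 5.0)


def meets_stringent_conditions(temp_c, sodium_molar, ph, probe_length):
    """Check salt, pH, and minimum temperature against the recited values.

    Assumption: probes of 50 nucleotides or fewer are treated as 'short'
    (minimum about 30 C); longer probes as 'long' (minimum about 60 C).
    """
    min_temp = 30.0 if probe_length <= 50 else 60.0
    return sodium_molar < 1.0 and 7.0 <= ph <= 8.3 and temp_c >= min_temp
```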
- Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Exemplary “moderately stringent hybridization conditions” include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1×SSC at 45° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.
- The phrase “a nucleic acid sequence encoding” refers to a nucleic acid which contains sequence information for a structural RNA such as rRNA, a tRNA, or the primary amino acid sequence of a specific protein or peptide, or a binding site for a trans-acting regulatory agent. This phrase specifically encompasses degenerate codons (i.e., different codons which encode a single amino acid) of the native sequence or sequences which may be introduced to conform with codon preference in a specific host cell.
- The term “recombinant” when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found within the native (nonrecombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all.
- An “expression vector” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be transcribed operably linked to a promoter.
- The phrase “specifically (or selectively) binds to an antibody” or “specifically (or selectively) immunoreactive with”, when referring to a protein or peptide, refers to a binding reaction which is determinative of the presence of the protein in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, antibodies raised against a protein having an amino acid sequence encoded by any of the polynucleotides of the invention can be selected to obtain antibodies specifically immunoreactive with that protein and not with other proteins, except for polymorphic variants. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays, Western blots, or immunohistochemistry are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See, Harlow and Lane Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, NY (1988) for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity. Typically, a specific or selective reaction will be at least twice the background signal or noise and more typically more than 10 to 100 times background.
- To improve the folding of a polypeptide, the polypeptide is joined to a folding interference domain, which causes the polypeptide to fold poorly. The DNA encoding the polypeptide can then be mutagenized. Sequence alterations that overcome the poor folding imposed by the folding interference domain can be identified by an increase in the activity of the polypeptide or a reporter linked to the polypeptide. Such sequence mutations can include modification of coding sequence, deletion of coding sequence, insertion of additional coding sequences, and change of order of coding sequences, within the existing coding sequence or at the N or C termini (5′ or 3′ end of the encoding nucleic acid), as well as incorporation of non-native amino acids. This method was used to generate “superfolder” variants of the Green Fluorescent Protein, GFP, of the luminescent jellyfish Aequorea victoria and the red fluorescent protein from Discosoma species, DsRed, both of which exhibit enhanced folding and stability properties.
- It is often desirable to improve folding of a protein that does not have a detectable activity. For such an application, a detectable moiety can be linked to the target polypeptide/folding interference domain fusion protein to provide a means of assaying for enhanced folding. Thus, the method of selecting robustly-folding proteins has wide applicability.
- Where the target protein P has an easily measured phenotype, its folding (or solubility) success can be monitored in the presence of a bait protein domain, herein termed a “folding interference domain” (Xid), as Xid-L-P, for example. These bait domains may also be inserted internally into permissive sites of P, e.g., for GFP at position 145 as further described in the Examples, infra. New variants of target protein P, better suited for folding and/or solubility under stringent conditions can thereby be produced.
- When P has no easily measured phenotype associated with correct folding, a reporter domain can be used, for example, in a construct such as Xid-L1-P-L2-R, where R is the reporter domain that tells about the folding of P, Xid is the folding interference domain, and L1 and L2 are flexible linkers.
- As will be appreciated by one of skill in the art, this method can also be applied in a block-optimization of a new protein scaffolding, P, comprised of a series of smaller domains, or subdomains of P (P1, P2, etc.). In this embodiment, for example, a construct such as Xid-L-P1-R is used to optimize P1 using R as the reporter. Next, a subdomain, P2, can be added, e.g., in a construct such as Xid-L-P2-P1-R and used to optimize P2 using R as the reporter. Optionally, P1 can be optimized for folding at the same time. The same reporter domain need not be used to optimize each PN. Eventually, after PN is added, the entire P domain is built from the smaller subdomains.
- Thus, the methods of the invention can be used to increase folding and solubility of a target polypeptide as well as subdomains contained within the target polypeptide.
- The current invention employs basic nucleic acid methodology that is routine in the field of recombinant genetics. Basic texts disclosing the general methods of obtaining and manipulating nucleic acids in this invention include Sambrook and Russell, MOLECULAR CLONING, A LABORATORY MANUAL (3rd ed. 2001) and CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel et al., eds., John Wiley & Sons, Inc. 1994-1997, 2001 version).
- Often, the nucleic acid sequences encoding the fusion proteins of the invention are generated using amplification techniques. Examples of techniques sufficient to direct persons of skill through in vitro amplification methods are found in Berger, Sambrook, and Ausubel, as well as Dieffenbach & Dveksler, PCR Primers: A Laboratory Manual (1995); Mullis et al., (1987) U.S. Pat. No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al., eds) Academic Press Inc. San Diego, Calif. (1990) (Innis); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem., 35: 1826; Landegren et al., (1988) Science 241: 1077-1080; Van Brunt (1990) Biotechnology 8: 291-294; Wu and Wallace (1989) Genomics 4: 560; and Barringer et al. (1990) Gene 89: 117.
- Folding interference domains can be identified by screening a library. For example, a library can be generated in which peptide fragments are fused to a target protein, e.g., green fluorescent protein, and recombinants are selected in which the signal from the target protein fused to a peptide fragment is less than, for example, about 10% of the signal from a control recombinant that encodes only the target polypeptide. For example, an assay such as the folding assay disclosed by Waldo et al., in Nature Biotech. 17:691-695, 1999, may be employed. Waldo et al. describe a GFP that does not fold well when fused after bull frog red cell H-ferritin. The folding yield of the GFP in the RanaH-L-GFP fusion was approximately 1/50 that of GFP expressed alone. In that work, several other proteins substantially reduced the folding yield of the GFP domain (<10% that of the GFP alone).
- Since the reduction in fluorescence could also be due to a reduction in the level of expression of the fusion protein caused by the trapped peptide fragment, the expression levels of these candidate fusion proteins could subsequently be determined by SDS-PAGE densitometry, for example. Desirable folding interference domains would be those that decrease the folding yield of the test protein or fused reporter domain, while maintaining the level of expression of the fusion protein at a level similar to that of the test protein alone, or the test protein plus reporter domain (i.e., the expression level of the test protein or test protein-reporter domain fusion should be similar to the fusion containing the trapped peptide fragment).
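- The two-part selection described above (low reporter signal combined with an expression level similar to the control) can be sketched as follows. The function name, the tuple layout of the candidate records, and the 25% expression tolerance are assumptions for illustration; only the roughly 10% signal cutoff comes from the description above, and measurement values would come from the fluorescence assay and SDS-PAGE densitometry.

```python
def select_interference_domains(candidates, control_fluorescence,
                                control_expression,
                                max_signal_fraction=0.10,
                                expression_tolerance=0.25):
    """Return names of fragments that depress folding but not expression.

    candidates: iterable of (name, fluorescence, expression_level) tuples,
    in the same units as the control measurements.
    """
    selected = []
    for name, fluorescence, expression in candidates:
        signal_fraction = fluorescence / control_fluorescence
        expression_ratio = expression / control_expression
        poorly_folded = signal_fraction < max_signal_fraction
        similar_expression = abs(expression_ratio - 1.0) <= expression_tolerance
        if poorly_folded and similar_expression:
            selected.append(name)
    return selected
```

A fragment that lowers fluorescence only because the fusion is poorly expressed is rejected by the second condition, which is the distinction drawn in the preceding paragraph.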
- Any number of proteins or protein domains can be used as a folding interference domain. For example, bull frog red cell H-ferritin folds poorly when expressed by itself and, when included in a fusion polypeptide, causes the fusion polypeptide to fold poorly. Other poorly folding domains include, but are not limited to, the Alzheimer's α/β peptide (amino acids 1-40 of the Alzheimer's precursor protein); domain A of the xylR TOL operon regulatory protein of Pseudomonas putida (Perez-Martin, J.; Cases, I.; de Lorenzo, V., “Design of a solubilization pathway for recombinant polypeptides in vivo through processing of a bi-protein with a viral protease,” Protein Engineering, June 1997, v. 10, no. 6, p. 725-730); and nucleoside diphosphate kinase of the hyperthermophile Pyrobaculum aerophilum (Pedelacq et al., 2002, Nature Biotechnol. 20 (9): 927-932). Any of the insoluble, poorly folded domains described in Waldo et al., Nature Biotech. 17:691-695, 1999, may also be used.
- The aforementioned folding interference domains are mostly insoluble when expressed alone in E. coli. However, the folding interference domain need not be insoluble when expressed alone. Some peptides are at least partially soluble when expressed alone or with well-folded highly soluble polypeptides (˜at least 40% soluble), but can nonetheless induce misfolding and poor solubility of many fused polypeptides. Such polypeptides include the lacZα domain (the first 80-100 N-terminal amino acids of the beta galactosidase, a fragment commonly used in protein complementation assays).
- The folding interference domain may be linked, either directly or via a linker, to either the N-terminus or C-terminus of the target polypeptide sequence. Alternatively, the domain may be inserted into an internal site of the target polypeptide that is permissive to the insertion. A permissive site of a host protein is one which tolerates the insertion of well-folded, soluble proteins or polypeptides (guest polypeptides) within the host protein scaffolding. Typical sites are turns and sterically open regions. One such example is amino acid residue 87 of Escherichia coli dihydrofolate reductase. If the protein has a measurable activity (enzyme, fluorescence, binding ability) associated with the native structure, a site is defined as permissive if the host protein containing the guest polypeptide retains at least 5%, or 10%, or preferably at least 20% of the host protein activity observed without the guest.
- A target polypeptide can be any polypeptide for which it is desirable to improve the folding properties. Often such polypeptides include those with reporter activity, such as a fluorescent protein, e.g., green or red fluorescent protein. Other proteins include various enzymes, e.g., antibiotic resistance proteins such as chloramphenicol acetyltransferase, kanamycin resistance protein, beta-lactamase, tetracycline resistance protein, and dihydrofolate reductase; and other enzymes such as subtilisin and fungal xylanases. Other target proteins include antibodies, for which increased binding to the target antigen can be used as the selection criterion.
- A particular aspect of the invention relates to the generation of superfolder fluorescent and chromophoric protein variants, and is described in further detail below and in the Examples, infra.
- A variety of fluorescent proteins and chromoproteins may be “evolved” according to the methods of the invention to generate variants having improved folding and/or solubility properties. The superfolder fluorescent and chromophoric protein variants generally share a common tertiary structure comprising an 11-stranded beta-barrel structure surrounding a centrally-located self-activating chromophore.
- One group of such fluorescent proteins includes the Green Fluorescent Protein isolated from Aequorea victoria (GFP), as well as a number of GFP variants, such as cyan fluorescent protein, blue fluorescent protein, yellow fluorescent protein, etc. Typically, these variants share about 80% or greater sequence identity with the GFP sequence or with SEQ ID NO:2. A number of color shift mutants of GFP have been developed and may be employed in the directed evolution methods of the present invention. These color-shift GFP mutants have emission colors blue to yellow-green, increased brightness, and photostability (Tsien et al., 1998, Annual Review of Biochemistry 67: 509-544). One such GFP mutant, termed the Enhanced Yellow Fluorescent Protein, displays an emission maximum at 529 nm.
- Additional GFP-based variants having modified excitation and emission spectra (Tsien et al., U.S. Patent Appn. 20020123113A1), enhanced fluorescence intensity and thermal tolerance (Thastrup et al., U.S. Patent Appn. 20020107362A1; Bjorn et al., U.S. Patent Appn. 20020177189A1), and chromophore formation under reduced oxygen levels (Fisher, U.S. Pat. No. 6,414,119) have also been described. Most recently, GFPs from the anthozoans Renilla reniformis and Renilla kollikeri were described (Ward et al., U.S. Patent Appn. 20030013849).
- Another group of such fluorescent proteins includes the fluorescent proteins isolated from anthozoans, including without limitation the red fluorescent protein isolated from Discosoma species of coral, DsRed (Matz et al., 1999, Nat. Biotechnol. 17:969-973), (see, e.g., accession number AF168419 version AF168419.2). DsRed and the other anthozoan fluorescent proteins share only about 26-30% amino acid sequence identity to the wild-type GFP from Aequorea victoria, yet all the crucial motifs are conserved, indicating the formation of the 11-stranded beta-barrel structure characteristic of GFP.
- The crystal structure of DsRed has also been solved, and shows conservation of the 11-stranded beta-barrel structure of GFP MMDB Id: 5742 (Yarbrough et al., 2001, Proc. Natl. Acad. Sci. USA 98: 462-467).
- A number of mutants of the longer wavelength red fluorescent protein DsRed have also been described, and similarly, may be employed in the directed evolution methods of the invention. For example, recently described DsRed mutants with emission spectra shifted further to the red may be employed in the practice of the invention (Wiehler et al., 2001, FEBS Letters 487: 384-389; Terskikh et al., 2000, Science 290: 1585-1588; Baird et al., 2000, Proc. Natl. Acad. Sci. USA 97: 11984-11989).
- An increasingly large number of other fluorescent proteins from a number of ocean life forms have recently been described, and the Protein Data Bank currently lists a number of GFP and GFP mutant crystal structures, as well as the crystal structures of various GFP analogs. Related fluorescent proteins with similar structures to GFP from corals, sea pens, sea squirts, and sea anemones have been described, and may be used to generate “superfolder” variants (for reviews, see Zimmer, 2002, Chem. Rev. 102: 759-781; Zhang et al., 2002, Nature Reviews 3: 906-918).
- Fluorescent proteins from Anemonia majano, Zoanthus sp., Discosoma striata, Discosoma sp. and Clavularia sp. have also been reported (Matz et al., supra). A fluorescent protein cloned from the stony coral species, Trachyphyllia geoffroyi, has been reported to emit green, yellow, and red light, and to convert from green light to red light emission upon exposure to UV light (Ando et al., 2002, Proc. Natl. Acad. Sci. USA 99: 12651-12656). Recently described fluorescent proteins from sea anemones include green and orange fluorescent proteins cloned from Anemonia sulcata (Wiedenmann et al., 2000, Proc. Natl. Acad. Sci. USA 97: 14091-14096), a naturally enhanced green fluorescent protein cloned from the tentacles of Heteractis magnifica (Hongbin et al., 2003, Biochem. Biophys. Res. Commun. 301: 879-885), and a generally non fluorescent purple chromoprotein displaying weak red fluorescence cloned from Anemonia sulcata, and a mutant thereof displaying far-red shift emission spectra (595 nm) (Lukyanov et al., 2000, J. Biol. Chem. 275: 25879-25882).
- Additionally, another class of GFP-related proteins having chromophoric and fluorescent properties have been described. One such group of coral-derived proteins, the pocilloporins, exhibit a broad range of spectral and fluorescent characteristics (Dove and Hoegh-Guldberg, 1999, PCT application WO 00/46233; Dove et al., 2001, Coral Reefs 19: 197-204). Recently, the purification and crystallization of the pocilloporin Rtms5 from the reef-building coral Montipora efflorescens has been described (Beddoe et al., 2003, Acta Cryst. D59: 597-599). Rtms5 is deep blue in color, yet is weakly fluorescent. However, it has been reported that Rtms5, as well as other chromoproteins with sequence homology to Rtms5, can be interconverted to a far-red fluorescent protein via single amino acid substitutions (Beddoe et al., 2003, supra; Bulina et al., 2002, BMC Biochem. 3: 7; Lukyanov et al., 2000, supra).
- Various other coral-derived chromoproteins closely related to the pocilloporins are also known (see, for example, Lukyanov et al. 2000, J. Biol. Chem. 275: 25879-82; Gurskaya et al., 2001, FEBS Letters 507: 16-20).
- In one embodiment, fluorescent and chromophoric protein variants exhibiting enhanced folding or solubility are generated from any fluorescent or chromophoric protein having a structure with a root mean square deviation of less than 5 angstroms, often less than 3 or 4 angstroms, and preferably less than 2 angstroms from the 11-stranded beta-barrel structure of Aequorea victoria GFP MMDB Id:5742. In some cases, fluorescent proteins exist in multimeric form. For example, DsRed is tetrameric (Cotlet et al., 2001, Proc. Natl. Acad. Sci. USA 98: 14398-14403). As will be appreciated by those skilled in the art, structural deviation between such multimeric fluorescent proteins and GFP (a monomer) is evaluated on the basis of the monomeric unit of the structure of the fluorescent protein.
- As appreciated by one of ordinary skill in the art, such a suitable fluorescent protein or chromoprotein structure can be identified using comparison methodology well known in the art. In identifying the protein, a crucial feature in the alignment and comparison to the MMDB ID:5742 structure is the conservation of the 11 beta strands, and the topology or connection order of the secondary structural elements (see, e.g., Ormo et al., 1996, “Crystal structure of the Aequorea victoria green fluorescent protein,” Science 273(5280): 1392-5; Yang et al., 1996, Nat. Biotechnol. 10:1246-51). Typically, most of the deviations between a fluorescent protein and the GFP structure are in the length(s) of the connecting strands or linkers between the crucial beta strands, see, e.g., the comparison of DsRed and GFP (Yarbrough et al., 2001, Proc Natl Acad Sci USA 98:462-7). In Yarbrough et al., alignment of GFP and DsRed is shown pictorially. From the stereo diagram, it is apparent that the 11 beta-strand barrel is rigorously conserved between the two structures. The c-alpha backbones are aligned to within 1 angstrom RMSD over 169 amino acids although the sequence identity is only 23% comparing DsRed and GFP.
- In comparing structure, the two structures to be compared are aligned using algorithms familiar to those with average skill in the art, using for example the CCP4 program suite (Collaborative Computational Project, Number 4, 1994, “The CCP4 Suite: Programs for Protein Crystallography”, Acta Cryst. D50, 760-763). In using such a program, the user inputs the PDB coordinate files of the two structures to be aligned, and the program generates output coordinates of the atoms of the aligned structures using a rigid body transformation (rotation and translation) to minimize the global differences in position of the atoms in the two structures. The output aligned coordinates for each structure can be visualized separately or as a superposition by readily-available molecular graphics programs such as RASMOL (Roger A. Sayle and E. J. Milner-White, “RasMol: Biomolecular graphics for all”, Trends in Biochemical Science (TIBS), September 1995, Vol. 20, No. 9, p. 374), or Swiss PDB Viewer (Guex, N. and Peitsch, M. C. (1996) Swiss-PdbViewer: A Fast and Easy-to-use PDB Viewer for Macintosh and PC. Protein Data Bank Quarterly Newsletter 77, pp. 7).
- In considering the RMSD, the RMSD value scales with the extent of the structural alignments and this size is taken into consideration when using the RMSD as a descriptor of overall structural similarity. The issue of scaling of RMSD is typically dealt with by including blocks of amino acids that are aligned within a certain threshold. The longer the unbroken block of aligned sequence that satisfies a specified criterion, the ‘better’ aligned the structures are. In the DsRed example, 164 of the c-alpha carbons can be aligned to within 1 angstrom of the GFP. Typically, users skilled in the art will select a program that can align the two trial structures based on rigid body transformations, for example DALI (Holm, L. & Sander, C. Protein-structure comparison by alignment of distance matrices. Journal of Molecular Biology 1993, 233, 123-138). The server site for the computer implementation of the algorithm is available, for example, at dali@ebi.ac.uk. The output of the DALI algorithm is a set of blocks of sequence that can be superimposed between two structures using rigid body transformations. Regions with Z-scores at or above a threshold of Z=2 are reported as similar. For each such block, the overall RMSD is reported.
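- As an illustration of the RMSD descriptor discussed above, the following minimal sketch computes the RMSD over paired c-alpha coordinates that have already been superimposed by an alignment program, and counts how many pairs fall within a distance threshold (for example 1 angstrom). It does not perform the rigid body superposition itself. The function names and the toy coordinates are hypothetical and are not taken from any structure file.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superimposed and paired
    residue-by-residue (e.g., c-alpha atoms of aligned positions).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def count_within(coords_a, coords_b, threshold=1.0):
    """Number of paired atoms whose separation is at most `threshold` angstroms."""
    return sum(1 for a, b in zip(coords_a, coords_b) if math.dist(a, b) <= threshold)

# Toy, made-up coordinates (angstroms) for three paired atoms.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)]
print(rmsd(toy_a, toy_b), count_within(toy_a, toy_b, threshold=1.0))
```

Reporting the count of atoms within the threshold alongside the RMSD reflects the point made above that the RMSD value is only meaningful together with the extent of the alignment.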
- GFP Proteins with Improved Folding Activity
- Superfolding GFP proteins were generated using the methods set forth herein. These proteins exhibit increased folding compared to wild type GFP or the “Crameri” cycle 3 GFP (GFP3) (Crameri et al., Eur. J. Biochem. 226:53-58, 1994). The improved GFPs of the invention have at least 80% identity to SEQ ID NO: 5 and contain at least one amino acid substitution selected from the group consisting of a substitution at position 30 that is an arginine or a conservative variant of arginine; a substitution at position 39 that is an asparagine or a conservative variant of asparagine; a substitution at position 105 that is a threonine or a conservative variant of threonine; a substitution at position 171 that is a valine or a conservative variant of valine; and a substitution at position 206 that is a valine or a conservative variant of valine.
- In a particular embodiment, a superfolder GFP variant (“GFPSF”) containing the foregoing five amino acid substitutions on a GFP3 background is provided.
- The positions are typically determined with reference to SEQ ID NO: 5. Thus, as appreciated by one of skill in the art, the positions do not refer to the number of amino acids in the protein, but the position relative to SEQ ID NO: 5. For example, a GFP sequence is maximally aligned with SEQ ID NO: 5, for example by manual alignment or using the Smith & Waterman alignment (see, e.g., Adv. Appl. Math. 2:482 (1981)) with the default parameters. The residue of the GFP sequence that aligns with position 30 of SEQ ID NO: 5 is considered to be position 30 of the GFP sequence.
- The presence of the substitution at the position of the protein results in improved folding of the green fluorescent protein.
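- The position convention described above can be sketched as follows, assuming the alignment has already been produced (manually or by an alignment program) and is supplied as two gapped strings of equal length. The function name and the short toy sequences are hypothetical; they are not SEQ ID NO: 5 or any GFP sequence.

```python
def map_reference_positions(aligned_ref, aligned_query, gap="-"):
    """Map 1-based reference positions to the aligned query residue.

    Returns a dict: reference position -> (query position, query residue),
    or None when the reference position aligns with a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res != gap:
            ref_pos += 1
        if query_res != gap:
            query_pos += 1
        if ref_res != gap:
            mapping[ref_pos] = None if query_res == gap else (query_pos, query_res)
    return mapping

# Toy alignment: the query lacks reference residue 2 and has one insertion.
toy_ref = "MSKG-EEL"
toy_query = "M-KGAEEL"
positions = map_reference_positions(toy_ref, toy_query)
print(positions[3])  # query residue regarded as "position 3" of the reference
```

Under this convention, a substitution "at position 30" refers to whichever query residue the mapping returns for reference position 30, regardless of that residue's own index in the query sequence.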
- A “green” fluorescent protein of the invention often fluoresces green, but may also have yellow or blue fluorescence. For example, a single amino acid change shifts the fluorescence from green to blue. A superfolding yellow fluorescent protein (sfYFP) can be made from the superfolding GFP disclosed herein by adding the single amino acid change T203Y. Alternatively, folding of the existing BFP and YFP proteins (Tsien, 1998, Annu. Rev. Biochem. 67: 509-544; Miyawaki et al., 1999, Proc. Natl. Acad. Sci. USA 96: 2135-2140; the YFP being equivalent to the canonical GFP with the mutations S65G, V68L, Q69K, S72A, and T203Y) can each also be improved by making the substitutions disclosed herein.
- DsRed Fluorescent Proteins with Enhanced Folding
- The directed evolution method of the invention has also been applied to the generation of a superfolder DsRed fluorescent protein. In a particular embodiment, a superfolder DsRed variant (“DsRedSF”) is provided, and has the amino acid sequence of SEQ ID NO: 4. One example of a polynucleotide encoding DsRedSF has the nucleotide sequence of SEQ ID NO: 3.
- Typically an amino acid linker sequence is employed to separate the first and second polypeptide components by a distance sufficient to ensure that each polypeptide could fold into its secondary and tertiary structures. Such an amino acid linker sequence is incorporated into the fusion protein using standard techniques well known in the art. Suitable peptide linker sequences may be chosen based on the following factors: (1) their ability to adopt a flexible extended conformation; (2) their inability to adopt a secondary structure that can interact with functional epitopes on the first and second polypeptides; and (3) the lack of hydrophobic or charged residues that might react with the polypeptide functional epitopes. Typical peptide linker sequences contain Gly, Ser, Ala, Val and Thr residues. Often, a linker is a “flexible linker”, that has a sequence such as (Gly4Ser)x, e.g., (Gly4Ser)3.
- Amino acid sequences which may be usefully employed as linkers include those disclosed in Maratea et al. (1985) Gene 40:39-46; Murphy et al. (1986) Proc. Natl. Acad. Sci. USA 83:8258-8262; U.S. Pat. Nos. 4,935,233 and 4,751,180. The linker sequence may generally be from 1 to about 50 amino acids in length, e.g., 3, 4, 6, or 10 amino acids in length, but can be 100 or 200 amino acids in length. Linker sequences may not be required when the first and second polypeptides have non-essential N-terminal amino acid regions that can be used to separate the functional domains and prevent steric interference.
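- By way of illustration only, a flexible linker of the (Gly4Ser)x type and its placement between two polypeptide components can be sketched as below. The function names and the short toy domain sequences are hypothetical, and the sketch operates on one-letter amino acid strings rather than on nucleic acid constructs.

```python
def flexible_linker(repeats=3):
    """Return a (Gly4Ser)x linker as a one-letter amino acid string."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return "GGGGS" * repeats

def fuse(first, second, linker=""):
    """Join two polypeptide sequences, optionally separated by a linker."""
    return first + linker + second

# Toy domains (made-up sequences) joined by (Gly4Ser)3.
linker = flexible_linker(3)
fusion = fuse("MKTAYIAK", "AVLSPADK", linker)
print(len(linker), fusion)
```

The number of repeats is the parameter that would be varied when the spacing between the two components is determined empirically.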
- Other methods of joining the components of the chimeric protein include ionic binding by expressing negative and positive tails, and indirect binding through antibodies and streptavidin-biotin interactions. (See, e.g., Bioconjugate Techniques, supra). The components can also be joined together through an intermediate interacting sequence. The moieties included in the conjugate molecules can be joined in any order, although the most favorable configuration may be determined empirically.
- Well known recombinant methodology is used to generate the fusion proteins used in the practice of the method of the invention. Fusion constructs can be made by ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other by methods known in the art, in the proper reading frame, and expressing the product by methods known in the art. Nucleic acids encoding the domains to be incorporated into the fusion proteins of the invention can be obtained using routine techniques in the field of recombinant genetics (see, e.g., Sambrook and Russell, eds, Molecular Cloning: A Laboratory Manual, 3rd Ed, vols. 1-3, Cold Spring Harbor Laboratory Press, 2001; and Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc. New York, 1997).
- Often, the nucleic acid sequences encoding the component domains to be incorporated into the fusion protein are cloned from cDNA and genomic DNA libraries by hybridization with probes, or isolated using amplification techniques with oligonucleotide primers. Amplification techniques can be used to amplify and isolate sequences from DNA or RNA (see, e.g., Dieffenbach & Dveksler, PCR Primers: A Laboratory Manual (1995)). Alternatively, overlapping oligonucleotides can be produced synthetically and joined to produce one or more of the domains. Nucleic acids encoding the component domains can also be isolated from expression libraries using antibodies as probes.
- In an example of obtaining a nucleic acid encoding a domain to be included in the conjugate molecule using PCR, the nucleic acid sequence or subsequence is PCR amplified, using a sense primer containing one restriction site and an antisense primer containing another restriction site. This will produce a nucleic acid encoding the desired domain sequence or subsequence and having terminal restriction sites. This nucleic acid can then be easily ligated into a vector containing a nucleic acid encoding the second domain and having the appropriate corresponding restriction sites. The domains can be directly joined or may be separated by a linker, or other, protein sequence. Suitable PCR primers can be determined by one of skill in the art using the sequence information provided in GenBank or other sources. Appropriate restriction sites can also be added to the nucleic acid encoding the protein or protein subsequence by site-directed mutagenesis. The plasmid containing the domain-encoding nucleotide sequence or subsequence is cleaved with the appropriate restriction endonuclease and then ligated into an appropriate vector for amplification and/or expression according to standard methods.
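- The primer arrangement described above, a sense primer carrying one restriction site and an antisense primer carrying another, can be sketched as follows. This is a simplified illustration: the 18-nucleotide annealing length, the two-base 5' clamp, the choice of EcoRI (GAATTC) and BamHI (GGATCC) sites, and the toy template are all assumptions made for the example, and the sketch does not check reading frame or melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_primers(template, sense_site, antisense_site, anneal_len=18, clamp="GC"):
    """Return (sense, antisense) primers with 5' restriction sites appended.

    Raises ValueError if either site already occurs inside the template,
    since digestion would then cut within the amplified domain.
    """
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template shorter than annealing length")
    for site in (sense_site, antisense_site):
        if site in template:
            raise ValueError("restriction site %s occurs within the template" % site)
    sense = clamp + sense_site + template[:anneal_len]
    antisense = clamp + antisense_site + reverse_complement(template[-anneal_len:])
    return sense, antisense

# Toy coding sequence (made up for illustration).
toy_template = "ATGGCTAGCAAAGGTGAAGAACTGTTTACCGGTGTTGTGCCGTAA"
sense_primer, antisense_primer = design_primers(toy_template, "GAATTC", "GGATCC")
print(sense_primer)
print(antisense_primer)
```

Because the two primers carry different sites, the amplified fragment can be ligated into the vector in only one orientation.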
- Examples of techniques sufficient to direct persons of skill through in vitro amplification methods are found in Berger, Sambrook, and Ausubel, as well as Mullis et al., (1987) U.S. Pat. No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al., eds) Academic Press Inc. San Diego, Calif. (1990) (Innis); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem., 35: 1826; Landegren et al., (1988) Science 241: 1077-1080; Van Brunt (1990) Biotechnology 8: 291-294; Wu and Wallace (1989) Genomics 4: 560; and Barringer et al. (1990) Gene 89:117.
- In some embodiments, it may be desirable to modify the polynucleotides encoding the components of the conjugate molecules. One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See, e.g., Gillam and Smith (1979) Gene 8:81-97; Roberts et al. (1987) Nature 328: 731-734.
- For example, the domains can be modified to facilitate the linkage of the two domains to obtain the polynucleotides that encode the fusion polypeptides of the invention. Catalytic domains and binding domains that are modified by such methods are also part of the invention. For example, a codon for a cysteine residue can be placed at either end of a domain so that the domain can be linked by, for example, a disulfide linkage. The modification can be performed using either recombinant or chemical methods (see, e.g., Pierce Chemical Co. catalog, Rockford Ill.).
- The domains of the recombinant fusion proteins are often joined by linkers, usually polypeptide sequences of neutral amino acids such as serine or glycine, that can be of varying lengths, for example, about 200 amino acids or more in length, with 1 to 100 amino acids being typical. Often, the linkers are 10, 15, 20, 25, 30, 35, 40, 45, or 50 amino acid residues or less in length. In some embodiments, proline residues are incorporated into the linker to prevent the formation of significant secondary structural elements by the linker. Linkers can often be flexible amino acid subsequences that are synthesized as part of a recombinant fusion protein. Such flexible linkers are known to persons of skill in the art. Typically, a flexible linker is a peptide linker of any length whose amino acid composition is rich in glycine to minimize the formation of rigid structure by interaction of amino acid side chains with each other or with the polypeptide backbone. A typical flexible linker has the composition (Gly4Ser)x.
- In some embodiments, the recombinant nucleic acids encoding the fusion proteins of the invention are modified to provide preferred codons which enhance translation of the nucleic acid in a selected organism (e.g., yeast preferred codons are substituted into a coding nucleic acid for expression in yeast).
- Target polypeptides with enhanced folding ability are typically identified by mutating the nucleic acid sequence encoding the target polypeptide, generating a fusion protein (comprising the mutated target polypeptide, a poorly folding domain, and optionally, a reporter gene), and selecting those polypeptides with enhanced reporter activity, thus identifying target polypeptides that overcome the poor folding property imposed by the poorly folding domain.
- The nucleic acid sequences encoding the target polypeptide of interest can be mutated using methods well known to those of ordinary skill in the art. The target polypeptide is usually mutated by mutating the nucleic acid. Techniques for mutagenizing are well known in the art. These include, but are not limited to, such techniques as error-prone PCR, chemical mutagenesis, and cassette mutagenesis. Alternatively, mutator strains of host cells may be employed to increase mutational frequency (Greener and Callahan (1995) Strategies in Mol. Biol. 7: 32). For example, error-prone PCR (see, e.g., Ausubel, supra) uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. Other mutagenesis methods include, for example, recombination (WO98/42727); oligonucleotide-directed mutagenesis (see, e.g., the review in Smith, Ann. Rev. Genet. 19: 423-462 (1985); Botstein and Shortle, Science 229: 1193-1201 (1985); Carter, Biochem. J. 237: 1-7 (1986); Kunkel, “The efficiency of oligonucleotide directed mutagenesis” in Nucleic acids & Molecular Biology, Eckstein and Lilley, eds., Springer Verlag, Berlin (1987), Methods in Enzymol. 100: 468-500 (1983), and Methods in Enzymol. 154: 329-350 (1987)); phosphorothioate-modified DNA mutagenesis (Taylor et al., Nucl. Acids Res. 13: 8749-8764 (1985); Taylor et al., Nucl. Acids Res. 13: 8765-8787 (1985); Nakamaye and Eckstein, Nucl. Acids Res. 14: 9679-9698 (1986); Sayers et al., Nucl. Acids Res. 16:791-802 (1988); Sayers et al., Nucl. Acids Res. 16: 803-814 (1988)); mutagenesis using uracil-containing templates (Kunkel, Proc. Nat'l. Acad. Sci. USA 82: 488-492 (1985) and Kunkel et al., Methods in Enzymol. 154:367-382, 1987); and mutagenesis using gapped duplex DNA (Kramer et al., Nucl. Acids Res. 12: 9441-9456 (1984); Kramer and Fritz, Methods in Enzymol. 154:350-367 (1987); Kramer et al., Nucl. Acids Res. 16: 7207 (1988); and Fritz et al., Nucl. Acids Res. 16: 6987-6999 (1988)). Additional methods include point mismatch repair (Kramer et al., Cell 38: 879-887 (1984)), mutagenesis using repair-deficient host strains (Carter et al., Nucl. Acids Res. 13: 4431-4443 (1985); Carter, Methods in Enzymol. 154: 382-403 (1987)), deletion mutagenesis (Eghtedarzadeh and Henikoff, Nucl. Acids Res. 14: 5115 (1986)), restriction-selection and restriction-purification (Wells et al., Phil. Trans. R. Soc. Lond. A 317: 415-423 (1986)), and mutagenesis by total gene synthesis (Nambiar et al., Science 223: 1299-1301 (1984); Sakamar and Khorana, Nucl. Acids Res. 14: 6361-6372 (1988); Wells et al., Gene 34:315-323 (1985); and Grundstrom et al., Nucl. Acids Res. 13: 3305-3316 (1985)). Kits for mutagenesis are commercially available (e.g., Bio-Rad, Amersham International). More recent approaches include codon-based mutagenesis, in which entire codons are replaced, thereby increasing the diversity of mutants generated, as exemplified by the RID method described in Murakami et al., 2002, Nature Biotechnology, 20: 76-81.
- Folding may be detected and assessed using various tests commonly used to determine correct folding, including without limitation spectroscopy, resistance to denaturation, kinetics, and tolerance for additional random mutations and polypeptide insertions. In one embodiment, circular dichroism may be used to distinguish between folded and unfolded forms of a polypeptide. In another embodiment, folding kinetics may be used, wherein better folded versions of P are identified by their ability to adopt a correctly folded conformation faster than poorer folding variants or the wild type protein. Preferably, the evolved polypeptide will display about a 25% faster refolding time following denaturation.
- In another embodiment, resistance to denaturation may be used to assess folding. For example, increasing concentrations of urea may be used to assess more robustly folding variants. A polypeptide variant with significantly improved folding activity is typically one which can tolerate about a 0.5 molar higher urea concentration compared to the wild type or starting polypeptide.
- Tolerance to random mutations may also be used to assess the folding enhancement achieved following polypeptide evolution. Briefly, libraries of random mutants of both the wild type (or pre-evolved) polypeptide and the test evolved polypeptide are generated. A 0.7% amino acid mutation rate, for example, may be appropriate. The library clones are then evaluated for fluorescence as a measure of correct folding. The extent to which the evolved polypeptide mutant library displays a greater number of fluorescent clones relative to the wild type mutant library indicates the folding robustness of the evolved test polypeptide.
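- To give a sense of scale for a mutation rate such as 0.7%, the following sketch computes the expected number of amino acid substitutions per clone and the fraction of clones expected to carry no substitution at all. It assumes, as a simplification not stated in the text, that each residue mutates independently at the given rate; the 238-residue length used in the example is that of wild type Aequorea victoria GFP, chosen here only for illustration.

```python
def expected_substitutions(length, rate):
    """Mean number of substituted residues per clone."""
    return length * rate

def fraction_unmutated(length, rate):
    """Fraction of clones with no substitution, assuming independent residues."""
    return (1.0 - rate) ** length

protein_length = 238   # illustrative assumption
mutation_rate = 0.007  # 0.7% per amino acid position
print(expected_substitutions(protein_length, mutation_rate))
print(fraction_unmutated(protein_length, mutation_rate))
```

Under these assumptions most clones in each library carry at least one substitution, so a difference in the number of fluorescent clones between the two libraries reflects tolerance to mutation rather than a large unmutated background.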
- Similarly, tolerance to terminally fused or inserted polypeptides may provide an indication of the folding enhancement achieved following the directed evolution method of the invention. In one embodiment, random insertion mutant libraries may be created using, for example, transposon-mediated mutagenesis techniques (Goryshin et al., 2000, Nature Biotechnol. 18: 97) and commercially available kits (e.g., Epicentre Technologies, Madison, Wis.). A greater number of robustly folding mutants in the evolved mutant library relative to the unevolved mutant polypeptide library provides an indication of the extent to which the evolved test polypeptide has enhanced folding properties. Similarly, the tolerance to larger insertions may provide an indication of the extent to which the evolved polypeptide has acquired enhanced folding properties.
- Another method for evaluating acquisition of enhanced folding in evolved polypeptides involves the generation of circular permutants of the test evolved polypeptide. Briefly, the native N and C termini of the test evolved polypeptide are ligated together at the polynucleotide level, and start codons are randomly introduced into the coding sequence. A library of circular permutants is then expressed and compared to a library of circular permutants generated from the unevolved polypeptide, wherein the relative number of permissive sites for the randomly inserted start codons may be determined by a functional screen indicative of correct folding and thereby provides an indication of folding enhancement acquired by the evolved polypeptide.
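- The rearrangement underlying a circular permutant can be sketched at the amino acid level as follows. The sketch joins the native termini (optionally through a linker, which is an assumption added here for illustration) and reopens the chain at a chosen position; it does not model the polynucleotide-level ligation or the added start codon. The function names and the toy sequence are hypothetical.

```python
def circular_permutant(seq, new_start, linker=""):
    """Return the circular permutant beginning at 1-based position `new_start`.

    The original C-terminus is joined to the original N-terminus through
    `linker`, and the chain is reopened just before `new_start`.
    """
    if not 1 <= new_start <= len(seq):
        raise ValueError("new_start must lie within the sequence")
    i = new_start - 1
    return seq[i:] + linker + seq[:i]

def all_permutants(seq, linker=""):
    """Every permutant with a new start at positions 2..len(seq)."""
    return [circular_permutant(seq, s, linker) for s in range(2, len(seq) + 1)]

toy_seq = "ABCDEF"  # placeholder letters, not a real protein
print(circular_permutant(toy_seq, 4))
print(len(all_permutants(toy_seq)))
```

In a screen of the kind described above, each member of such a library would be scored for a functional readout, and the number of start positions that remain permissive would be compared between the evolved and unevolved polypeptides.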
- In general, superfolder polypeptides will enable the generation of a greater range of circular permutants, relative to the wild type or pre-evolved polypeptide from which the superfolder was generated. This is a particularly important consideration in regards to fluorescent proteins, for which the generation of a variety of circular permutants is desirable for developing appropriate FRET pairs. FRET, or Fluorescence Resonance Energy Transfer, is the non-radiative transfer of energy from a donor fluorophore to an acceptor fluorophore spatially located within about 80 Angstroms of each other. The relative geometric context of the two fluorophores is an important component of FRET. Circular permutation may be used to alter the geometric orientation of the fluorophores relative to each other.
- Functional assays may also be utilized where appropriate, and may be preferred. For example, a biological property of a protein of interest may be measured as an indication of folding. For example, if the protein is a fluorescent or chromophoric protein, the presence and intensity of emitted fluorescence or color, respectively, provides an indication of folding. Brighter fluorescence, for example, provides an indication of better folding in relation to dimmer variants of P (or colonies expressing P).
- Additionally, misfolded proteins often aggregate and become insoluble, and a corresponding test may be applied by first determining that the correctly folded protein is soluble, and that the incorrectly folded protein is insoluble. For example, if the protein is an enzyme, and the correctly folded enzyme is active and its activity can be measured, and the soluble protein is active while the insoluble protein is inactive, then if Xid-L-P is soluble and active, P would be inferred to be correctly folded. If Xid-L-P is not active, and also insoluble, then it may be concluded that P is misfolded. Xid-L-P might be active and yet insoluble, or Xid-L-P might be soluble but inactive.
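- The inference described above can be summarized as a small decision function. This is only a restatement of the four cases in the preceding paragraph, assuming it has first been established that the correctly folded protein is soluble and active; the function name and return strings are hypothetical.

```python
def infer_folding(soluble, active):
    """Infer the folding state of P in Xid-L-P from solubility and activity."""
    if soluble and active:
        return "P inferred to be correctly folded"
    if not soluble and not active:
        return "P inferred to be misfolded"
    # Active yet insoluble, or soluble but inactive: the two readouts disagree.
    return "inconclusive"

for soluble in (True, False):
    for active in (True, False):
        print(soluble, active, infer_folding(soluble, active))
```

The two mixed outcomes are returned as inconclusive because the text notes that they can occur without assigning them an interpretation.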
- Alternatively, the solubility of Xid-L-P could be used to determine the folding of P in Xid-L-P as above. Folding may also be assessed where the correctly folded version of P binds a target peptide Pt and the binding can be detected, for example where Pt is an antibody that is conjugated to a reporter domain R or has an intrinsically detectable signal, where P and Pt are binding or folding partners, or where P and Pt comprise two of at least two domains of a split protein or multiprotein complex which has a detectable phenotype when the fragments or components are assembled, the assembly being dependent on the correct folding of P in Xid-L-P. Also, folding of P could be measured by the resistance of P to limited proteolysis coupled to selection by phage display, in which case the method is a way of increasing the stringency of selection by phage display (Martin et al., 2001, J. Mol. Biol. 309(3): 717-26).
- Also, the folding of P in Xid-L-P could be detected by using a folding reporter such as GFP or some other protein with a detectable phenotype (enzyme activity, fluorescence, ability to bind other proteins or molecules) such that the detection of R in Xid-L-P-R is an indication of correct folding by R and therefore of P (see Waldo patent, “Method for determining and modifying protein/peptide solubility”).
- Detectable phenotypes are not limited to enzymatic activity or fluorescence. For example, the phenotype associated with correct folding of P could be the ability of P to bind a target molecule, the binding event being detectable by some means. In this case, the reporter domain might not have activity until the binding event occurs. For example, P could be a component of a complementation system or split protein such as the S-protein or S-peptide (which associate to form active RNASE-A), or the split dihydrofolate reductase, or the split beta lactamase (Galarneau, A; Primeau, M; Trudeau, L E; Michnick, S W, Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions, Nature Biotechnology; June 2002; v. 20, no. 6, p. 619-622), or the split beta galactosidase (Wigley, W C; Stidham, R D; Smith, N M; Hunt, J F; Thomas, P J, Protein solubility and folding monitored in vivo by structural complementation of a genetic marker protein, Nature Biotechnology; February 2001; v. 19, no. 2, p. 131-136). The split proteins could be self-assembling, or require association via fused partners that are capable of association, such as coiled-coils (Galarneau, A; Primeau, M; Trudeau, L E; Michnick, S W, Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein-protein interactions, Nature Biotechnology; June 2002; v. 20, no. 6, p. 619-622).
- It is desirable that the signal level given as the detectable phenotype be proportionate to the amount of correctly folded reporter molecule. The binding event could be that of an antibody that recognizes an epitope of the correctly-folded target P, binding of the antibody measured by some means such as the enzymatic activity of a linked enzyme.
- The mutated target polypeptides are tested for folding activity in the context of a fusion protein comprising a poorly folding domain, which was selected for its poor folding properties in the expression system of interest. Folding activity is typically measured by measuring the amount of reporter activity, as the amount of active protein is dependent on proper folding. The target polypeptide may itself have reporter activity or may be joined to another molecule that has reporter activity.
- Reporter molecules that can be used include those with activities that can be directly measured, e.g., fluorescent polypeptides such as green, blue, yellow, or red fluorescent proteins and variants of those proteins; polypeptides encoded by antibiotic resistance genes; and molecules that can be indirectly measured, e.g., enzymes such as β-galactosidase, alkaline phosphatase, horseradish peroxidase, β-lactamase, or other enzymes that require a secondary detection reagent. Other polypeptides, such as antibodies or other binding proteins, may be measured by assessing their ability to specifically bind to a binding partner. Other polypeptides could be parts of ‘split protein’ complementing pairs, such as DHFR (1-105) and DHFR (106-186) from murine dihydrofolate reductase (see Remy et al., 1999, Proc. Natl. Acad. Sci. USA, 96: 5394-5399), or various other split proteins such as beta lactamase, beta galactosidase, etc. This assay can also be performed in vitro using cell-free expression and appropriate substrates (fluorogenic, chemiluminescent, etc.), for example the Galacton Star reagent for beta galactosidase or, for the split S-protein S-peptide system (Novagen), a ribonucleic acid donor/quencher substrate which is the target of RNASE-A (Kelemen, B R; Klink, T A; Behlke, M A; Eubanks, S R; Leland, P A; Raines, R T, Hypersensitive substrate for ribonucleases, Nucleic Acids Research; Sep. 15, 1999; v. 27, no. 18, p. 3696-3701).
- Various non-polypeptide reporters may also be employed, such as cyclic arseno compounds capable of binding to poly cysteine tags on proteins and cyclizing to become fluorescent (Adams et al., 2002, Journal Of The American Chemical Society, 124: 6063-6076). Polypeptides with enhanced folding properties are then selected and can be obtained in the quantity desired using various expression systems.
- There are many expression systems for producing the proteins of the invention, e.g., the GFP variants with enhanced folding or the fusion proteins, that are well known to those of ordinary skill in the art. (See, e.g., Gene Expression Systems, Fernandes and Hoeffler, Eds. Academic Press, 1999; Ausubel, supra; Russell & Sambrook, supra.) The protein may be, but need not be, expressed in the system in which the folding properties were determined. The polynucleotide that encodes the fusion polypeptide is placed under the control of a promoter that is functional in the desired host cell. An extremely wide variety of promoters are available, and can be used in the expression vectors of the invention, depending on the particular application. Other expression control sequences such as ribosome binding sites, transcription termination sites and the like are also optionally included. Constructs that include one or more of these control sequences are termed “expression cassettes.” Accordingly, the nucleic acids that encode the joined polypeptides are incorporated for high level expression in a desired host cell.
- Commonly used prokaryotic control sequences, which are defined herein to include promoters for transcription initiation, optionally with an operator, along with ribosome binding site sequences, include such commonly used promoters as the beta-lactamase (penicillinase) and lactose (lac) promoter systems (Chang et al., Nature (1977) 198: 1056), the tryptophan (trp) promoter system (Goeddel et al., Nucleic Acids Res. (1980) 8: 4057), the tac promoter (DeBoer, et al., Proc. Natl. Acad. Sci. U.S.A. (1983) 80:21-25); and the lambda-derived PL promoter and N-gene ribosome binding site (Shimatake et al., Nature (1981) 292: 128). The particular promoter system is not critical to the invention; any available promoter that functions in prokaryotes can be used. Standard bacterial expression vectors include plasmids such as pBR322-based plasmids, e.g., pBLUESCRIPT™, pSKF, pET23D, λ-phage derived vectors, p15A-based vectors (Rose, Nucleic Acids Res. (1988) 16:355 and 356) and fusion expression systems such as GST. Epitope tags can also be added to recombinant proteins to provide convenient methods of isolation, e.g., c-myc, HA-tag, 6-His tag, maltose binding protein, VSV-G tag, anti-DYKDDDDK tag, or any such tag, a large number of which are well known to those of skill in the art.
- For expression of fusion polypeptides in prokaryotic cells other than E. coli, regulatory sequences for transcription and translation that function in the particular prokaryotic species are required. Such promoters can be obtained from genes that have been cloned from the species, or heterologous promoters can be used. For example, the hybrid trp-lac promoter functions in Bacillus in addition to E. coli. These and other suitable bacterial promoters are well known in the art and are described, e.g., in Russell & Sambrook and Ausubel et al. Bacterial expression systems for expressing the proteins of the invention are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., Gene 22:229-235 (1983); Mosbach et al., Nature 302:543-545 (1983)). Kits for such expression systems are commercially available.
- Similarly, for expression of fusion polypeptides in eukaryotic cells, transcription and translation sequences that function in the particular eukaryotic species are required. For example, eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available. In yeast, vectors include Yeast Integrating plasmids (e.g., YIp5) and Yeast Replicating plasmids (the YRp series plasmids) and pGPD-2. Expression vectors containing regulatory elements from eukaryotic viruses are typically used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include those employing the CMV promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
- Either constitutive or regulated promoters can be used in the present invention. Regulated promoters can be advantageous because the host cells can be grown to high densities before expression of the fusion polypeptides is induced. High level expression of heterologous proteins slows cell growth in some situations. An inducible promoter is a promoter that directs expression of a gene where the level of expression is alterable by environmental or developmental factors such as, for example, temperature, pH, anaerobic or aerobic conditions, light, transcription factors and chemicals.
- For E. coli and other bacterial host cells, inducible promoters are known to those of skill in the art. These include, for example, the lac promoter, the bacteriophage lambda PL promoter, the hybrid trp-lac promoter (Amann et al. (1983) Gene 25: 167; de Boer et al. (1983) Proc. Nat'l. Acad. Sci. USA 80: 21), and the bacteriophage T7 promoter (Studier et al. (1986) J. Mol. Biol.; Tabor et al. (1985) Proc. Nat'l. Acad. Sci. USA 82: 1074-8). These promoters and their use are discussed in Sambrook et al., supra.
- Inducible promoters for other organisms are also well known to those of skill in the art. These include, for example, the metallothionein promoter, the heat shock promoter, as well as many others.
- Translational coupling may be used to enhance expression. The strategy uses a short upstream open reading frame derived from a highly expressed gene native to the translational system, which is placed downstream of the promoter, and a ribosome binding site followed after a few amino acid codons by a termination codon. Just prior to the termination codon is a second ribosome binding site, and following the termination codon is a start codon for the initiation of translation. The system dissolves secondary structure in the RNA, allowing for the efficient initiation of translation. See Squires, et. al. (1988), J. Biol. Chem. 263: 16297-16302.
- The construction of polynucleotide constructs generally requires the use of vectors able to replicate in host bacterial cells, or able to integrate into the genome of host bacterial cells. Such vectors are commonly used in the art. A plethora of kits are commercially available for the purification of plasmids from bacteria (for example, EasyPrep™ and FlexiPrep™, from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System, Qiagen). The isolated and purified plasmids can then be further manipulated to produce other plasmids, and used to transform cells.
- The polypeptides can be expressed intracellularly, or can be secreted from the cell. Intracellular expression often results in high yields. If necessary, the amount of soluble, active fusion polypeptide may be increased by performing refolding procedures (see, e.g., Sambrook et al., supra.; Marston et al., Bio/Technology (1984) 2: 800; Schoner et al., Bio/Technology (1985) 3: 151). Fusion polypeptides of the invention can be expressed in a variety of host cells, including E. coli, other bacterial hosts, yeast, and various higher eukaryotic cells such as the COS, CHO and HeLa cell lines and myeloma cell lines. The host cells can be mammalian cells, insect cells, or microorganisms, such as, for example, yeast cells, bacterial cells, or fungal cells.
- Once expressed, the recombinant fusion polypeptides can be purified according to standard procedures of the art, including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like (see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990)). Substantially pure compositions of at least about 90 to 95% homogeneity are preferred, and 98 to 99% or more homogeneity are most preferred.
- To facilitate purification of the fusion polypeptides of the invention, the nucleic acids that encode the fusion polypeptides can also include a coding sequence for an epitope or “tag” for which an affinity binding reagent is available. Examples of suitable epitopes include the myc and V-5 reporter genes; expression vectors useful for recombinant production of fusion polypeptides having these epitopes are commercially available (e.g., Invitrogen (Carlsbad Calif.) vectors pcDNA3.1/Myc-His and pcDNA3.1/V5-His are suitable for expression in mammalian cells).
- Additional expression vectors suitable for attaching a tag to the fusion proteins of the invention, and corresponding detection systems, are known to those of skill in the art, and several are commercially available (e.g., FLAG (Kodak, Rochester N.Y.)). Another example of a suitable tag is a polyhistidine sequence, which is capable of binding to metal chelate affinity ligands. Typically, six adjacent histidines are used, although one can use more or fewer than six. Suitable metal chelate affinity ligands that can serve as the binding moiety for a polyhistidine tag include nitrilo-tri-acetic acid (NTA) (Hochuli, E. (1990) “Purification of recombinant proteins with metal chelating adsorbents” In Genetic Engineering: Principles and Methods, J. K. Setlow, Ed., Plenum Press, NY; commercially available from Qiagen (Santa Clarita, Calif.)).
- Uses of Evolved Polypeptides with Improved Folding Properties.
- Evolved polypeptides with improved folding can be used in any number of applications. In particular, those target polypeptides that can be used as reporter proteins can be used to report expression level, unaffected by folding. Conventional methods for assessing protein expression in vivo require poorly folded proteins to be unfolded, for example, prior to probing with labeled antibodies. These proteins do not generally refold well prior to probing or sandwich ELISA, leading to an underestimate of expression level because the misfolded, aggregated protein domains are not available for binding by the antibody. This denaturing method is not suited for intact, high throughput in vivo protein expression monitoring. Furthermore, conventional methods for assessing protein expression in vivo do not work well when the protein domains are buried in aggregates. In contrast, the reporter activity of a polypeptide that has enhanced folding can more accurately reflect expression.
- In particular, the GFP and DsRed variants described herein that have improved folding activity can be used in many in vivo and high throughput applications. For example, Xid-L-GFPSF fluorescence is a direct indicator of total expression levels. The assay can thus be applied to single cells using flow cytometry.
- Furthermore, the superfolder fluorescent proteins provided herein provide new and more stable scaffolds for the creation of new GFP variants based on circular permutation.
- Various aspects of the invention are further described and illustrated by way of the several examples which follow, none of which are intended to limit the scope of the invention.
- The following example describes the use of the method of the invention to generate superfolding variants of GFP.
- To create the ‘superfolder’ GFP variant, a ‘directed evolution’ experiment was performed in which a poorly folded ferritin domain was linked to the sequence of a GFP3 domain (Crameri variant plus F64L and S65T) (Waldo et al., 1999, Nature Biotechnol. 17: 691-695). The ferritin domain provided the ‘bait’ to challenge the GFP3 to fold under stringent conditions.
- After three rounds of in vitro mutation and recombination, followed by in vivo selection, there was no further increase in the brightness of the colonies. Twelve clones were selected and sequenced by fluorescent dye dideoxy-terminator sequencing technology. Most of the clones contained at least 5 of 6 consensus mutations. The consensus mutations were S30R, Y39N, N105T, Y145F, I171V, and A206V. The resulting GFP, termed superfolder GFP (GFPSF), was many-fold brighter as a fusion with ferritin compared to the starting GFP3 variant.
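The mutation shorthand used above (for example S30R: serine at position 30 replaced by arginine) can be illustrated with a short Python sketch. This is only a notation aid, not the procedure used in this work: the helper name `apply_point_mutations` is hypothetical, and the six consensus substitutions are applied here to the wild-type GFP amino acid sequence listed as SEQ ID NO:5 rather than to the GFP3 starting variant, so the result is not the full superfolder sequence.

```python
import re

# Wild-type GFP amino acid sequence as listed in SEQ ID NO:5 (238 residues).
WT_GFP = (
    "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT"
    "GKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFF"
    "KDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNV"
    "YIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHY"
    "LSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"
)

# Consensus mutations reported for the superfolder clones.
CONSENSUS = ["S30R", "Y39N", "N105T", "Y145F", "I171V", "A206V"]


def apply_point_mutations(sequence, mutations):
    """Apply substitutions written as <old><position><new> (1-based position).

    Hypothetical helper for illustration: raises ValueError if the residue
    found at a position does not match the 'old' residue in the mutation name.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        old, new = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert to 0-based index
        if index >= len(residues) or residues[index] != old:
            raise ValueError(f"{mutation} does not match the sequence")
        residues[index] = new
    return "".join(residues)


mutated = apply_point_mutations(WT_GFP, CONSENSUS)
# Report the residue now present at each mutated position.
for name in CONSENSUS:
    position = int(name[1:-1])
    print(name, "->", mutated[position - 1])
```

Because the helper checks the expected original residue before substituting, a numbering mismatch between a mutation list and a sequence is reported as an error rather than silently producing a wrong variant.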
- FIG. 1 shows normalized whole cell fluorescence for E. coli BL21(DE3) expressing GFP variants as C-terminal fusions with poorly-folded bullfrog red cell H-subunit ferritin (bracketed). Expression at 37° C. (black) and 27° C. (grey). GFP variants (left to right): cycle-3 redshift, 6 single point mutants, superfolder (left, bracketed). Non-fusion GFP variants (cycle-3 redshift and superfolder, right) as reference. Note that the fluorescence of the optimized superfolder fused to ferritin is essentially identical to the non-fusion cycle-3 redshift GFP. In contrast, cycle-3 redshift GFP fused to ferritin is poorly folded (far left). As expected, the fluorescence is higher at 27° C. relative to 37° C., consistent with the improved folding at lower temperature.
- The ferritin-linker-GFPSF fusion protein partitioned quantitatively to the inclusion body fraction, as was the case with the ferritin-linker-GFP3 variant. The solubility of the fusion protein was therefore controlled by the solubility of its most poorly folded domain (ferritin). The aggregated fusion protein also failed to catalyze the oxidation of Fe2+, yet was brightly fluorescent. This observation suggested that the aggregated fusion protein still contained a misfolded and poorly soluble ferritin domain, but a correctly folded and functional GFP domain. Accordingly, it was concluded that the superfolder mutations uncoupled the folding of the GFP domain and the formation of the chromophore from the presence of the misfolded fused ferritin domain.
- The following example describes the use of the method of the invention to generate superfolding variants of DsRed.
- To create the evolved superfolder dsRED, we followed the same protocol used to create the superfolder GFP (supra) with the following modifications. The starting material was an improved variant of dsRED with decreased aggregation and increased rate of chromophore formation, termed dsRED T4, previously described by Glick and co-workers (Bevis B J, Glick B S. Rapidly maturing variants of the Discosoma red fluorescent protein (DsRed). Nat. Biotechnol. 2002 January; 20(1):83-87). The starting variant has the dsRED wild-type sequence, with the indicated mutations of Glick (see Table 1).
- Clone optima were picked from each round of directed evolution as for GFP, except that the IllumaTool (Light Tools Research) was equipped with a 580 nm excitation filter, and the plates were either visually examined or photographed through a 610 nm long pass red filter. After 5 rounds, the rate of fluorescence increase with each cycle began to reach a plateau, as determined by examining the whole-cell culture fluorescence for the pooled top 10 optima from each round in a BioTek FL600 plate reader (580 nm ex/610 nm em, 40) (see FIG. 10). The process was stopped and 10 colonies from round 5 were sequenced. The top 3 brightest colonies all shared the same consensus sequence (see Table 1). Amino acid position 2, which was alanine in the Glick T4 mutant, mutated to glutamic acid in the dsRED superfolder.
- A monomeric variant of dsRED was recently engineered by Tsien (Campbell R E, Tour O, Palmer A E, Steinbach P A, Baird G S, Zacharias D A, Tsien R Y. A monomeric red fluorescent protein. Proc Natl Acad Sci USA. 2002 Jun. 11; 99(12):7877-82). This sequence is included in Table 1 for reference. The monomeric variant of Tsien contains several of the Glick T4 mutations (this was the starting parental variant used by Tsien & co-workers for engineering the monomeric dsRED). One of the superfolder amino acid positions (177) was found as F177V by Tsien, and F177I in this work. However, Tsien specified that this mutation was associated with the monomeric character (wild type dsRed is a tetramer). There is no teaching in the work of Tsien that this mutation improves folding above that of the starting variant. The contribution of F177I in this example to the improved folding of the dsRED cycle 5 is a new and surprising property of mutation at F177, not anticipated by Tsien. Similarly, the negatively charged R2E of superfolder dsRED cycle 5 in our work differs from the R2A non-charged variant previously described by Glick, and there is no teaching in Glick or Tsien that mutations at R2 improve the folding of dsRED or increase its tolerance to misfolded fused proteins. Instead, Glick simply states that replacing basic residues near the N-terminus of dsRED can improve its solubility (no statement regarding folding or fluorescence yield). Thus, the property of R2E in increasing the folding yield of dsRED fused to poorly folded proteins is a surprising property of R2.
TABLE 1 Amino acid mutations of various dsRED variants.
# aa dsRED mdsRED Glick T4 sfdsRED (based on Glick T4)
1 2 R A A E
2 5 K E E
3 6 N D D
4 17 R H
5 21 T S S
6 41 H T T
7 42 N Q
8 43 T N
9 44 V A
10 71 V A
11 83 K L
12 105 V A
13 114 Q E
14 117 C E
15 118 F L
16 124 F L
17 125 I R
18 127 V T
19 145 A P
20 150 L M
21 153 R E
22 156 V A
23 160 E D
24 162 H K
25 163 K M
26 164 A R
27 174 L D
28 175 V A
29 176 E D
30 177 F V I
31 179 S T
32 180 I T
33 192 Y A
34 194 Y K
35 195 V T
36 197 S I
37 203 S N
38 217 T A A
39 222 H S
40 223 L T
41 224 F G
42 225 L A
#: index of amino acid cited.
aa: position in dsRED amino acid coding sequence of the amino acid cited.
dsRED: wild-type amino acid at position cited.
mdsRED: amino acid of monomeric variant of Tsien.
Glick T4: amino acid of improved variant of Glick.
sfdsRED: amino acid of superfolder dsRED (this work).
Grey rows: amino acid positions in common with this work, at which previous workers also specify a mutation relative to wild type.
- To test the effect of the superfolder mutations in greater detail, 6 single-point mutants of cycle-3 redshift were engineered by PCR using methods well-established in the art. Each mutant incorporated one of the 6 mutations found in the superfolder GFP variant. These were cloned into a pET vector as C-terminal fusions with poorly-folded bullfrog red cell ferritin (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695). Overnight cultures in Luria-Bertani (LB) media containing kanamycin (35 μg.ml−1) were diluted 100-fold and grown for 2 h at 37° C. Proteins were expressed for 4 h by adding isopropyl-β-D-thiogalactopyranoside (IPTG) to 1 mM in 3 ml cultures of LB media at either 37° C. or 27° C. in E. coli BL21(DE3) as C-terminal fusions with poorly-folded bullfrog red cell H-subunit ferritin. Cycle-3 redshift and superfolder were cloned and expressed similarly as controls, both with and without the N-terminal ferritin. The fluorescence (488 nm ex/520 nm em) and absorbance (600 nm) were measured for each culture using a BioTek FL-600 plate reader (FIG. 1).
- Single colony transformants of either the cycle-3 or superfolder GFP in E. coli BL21 (DE3) were grown in LB and shaken overnight at 37° C. This pre-culture was used to inoculate LB medium containing kanamycin (35 μg.ml−1). One colony was picked and inoculated into a larger volume culture (˜1 L) that was grown to mid-log phase at 37° C. and subsequently induced with 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) for about six hours. The cell pellets were harvested by centrifugation at 5° C. and stored at −20° C.
- Cell-free extract was centrifuged (100000 g, 30 min at 15° C.) and the supernatant loaded onto a 10 ml volume metal affinity resin (Talon, Clontech) equilibrated in buffer A (150 mM NaCl, 100 mM Hepes-NaOH pH=7.5). Unbound proteins were washed off with buffer A containing 10 mM imidazole. The bound protein was then eluted with buffer B (200 mM Imidazole, 150 mM NaCl, 100 mM Hepes-NaOH pH=7.5) to a final volume of 15 ml.
- Ammonium sulfate was added to 80% saturation (ca. 0.48 mg added to 1 ml of protein solution) at 27° C. The solution was stirred for 15 min at the same temperature until dissolved, then incubated on ice for an additional 30 min. The mixture containing the precipitated protein was centrifuged and the supernatant discarded. The precipitate was progressively dissolved in 3 ml buffer C (20 mM Hepes-NaOH pH=7.5), and the protein solution was dialyzed overnight against the same buffer.
- Sixteen proteins from the hyperthermophile Pyrobaculum aerophilum that had been previously cloned and characterized (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695; Waldo GS, (2002) Method for determining and modifying protein/peptide solubility, U.S. Pat. No. 6,448,087) were expressed in E. coli BL21 (DE3) as N-terminal fusions with either cycle-3 or superfolder GFP. Overnight LB cultures containing kanamycin (35 μg.ml−1) were diluted 100-fold into fresh 1 ml cultures at 37° C. After 1.5 h, protein expression was induced with 1 mM IPTG at 37° C., then arrested after 45 min by adding chloramphenicol to a final concentration of 100 μg/ml. Cells were pelleted by centrifugation and suspended in buffer D (100 mM TRIS HCl pH 8.0, 150 mM NaCl). Aliquots of these suspended cells were examined for GFP fusion fluorescence and total protein expression as follows. 10 μl cell aliquots were mixed with 180 μl of buffer D and the fluorescence measured (488 ex/520 em) using an FL600 plate reader (Biotek). 10 μl cell aliquots were mixed with SDS loading buffer containing dithiothreitol in PCR tubes and denatured for 5 min at 95° C. 8 μl of the denatured samples were run on 4-20% gradient gels (BioRad), stained using Gelcode Blue (BioRad), and protein quantified by scanning densitometry using a GS-800 calibrated densitometer (BioRad).
- FIG. 2 represents a plot of the normalized fluorescence versus the total whole cell expression (determined by SDS-PAGE densitometry). Many of the proteins are poorly folded and the cells carrying these constructs are only weakly fluorescent in the case of cycle-3 GFP, as expected. Thus the whole cell fluorescence is poorly correlated with total expression level. Instead, the fluorescence of the cycle-3 GFP fusions was correlated with the non-fusion solubility of the proteins expressed alone as previously reported (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695; Waldo GS, (2002) Method for determining and modifying protein/peptide solubility, U.S. Pat. No. 6,448,087).
- In contrast, the fluorescence of the superfolder GFP fusions was overall much higher than that of the cycle-3 GFP fusions (FIG. 2). The fluorescence of the superfolder GFP fusions was well correlated with total expression, suggesting that the folding yield of the GFP domain was independent of the folding yield of the attached upstream protein. Thus, the folding trajectory of the superfolder GFP appears to be considerably more robust than that of cycle-3 GFP (FIG. 2).
- To test the stability of the GFP variants to urea denaturation during refolding, fluorescent GFP was denatured in 9M urea at 95° C. for 5 min until unfolded and non-fluorescent. GFP was renatured (refolded) by rapidly diluting 500-fold in the indicated concentration of urea in 100 mM TRIS pH 7.5, 150 mM NaCl, 10% glycerol, and allowed to refold for 1 h. The fluorescence was measured using a BioTek FL600 plate reader. The equilibrium unfolding concentration of urea (where 50% of the GFP is folded and 50% unfolded) is 3.8 M for superfolder and 2.4 M for folding reporter (cycle-3 redshift) GFP, consistent with the improved stability and folding of superfolder (FIG. 3).
- Fluorescent cycle-3 redshift or superfolder GFP was unfolded in 9M urea at 95° C. for 5 minutes until non-fluorescent. The proteins were refolded by diluting 100-fold in 100 mM TRIS pH 7.5, 150 mM NaCl, 10% glycerol, in a rapidly stirred cuvette and the kinetics measured at 0.2 s intervals on a Perkin Elmer spectrofluorimeter (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695). The long-scale kinetics are shown in FIG. 4A. After 10000 s, both superfolder and cycle-3 redshift approached the same final fluorescence values asymptotically (approximation of infinite time), ca. 375 fluorescence units. The initial rates were determined by fitting 4th order polynomials to the first 40 seconds of each progress curve (see FIG. 4B). Rates were normalized to pseudo-first-order rate constants by dividing by the fluorescence values at infinite time (ca. 375). The superfolder refolds approximately 7 times faster than cycle-3 redshift (9.2×10−2 s−1 for superfolder, 1.3×10−2 s−1 for cycle-3 redshift), consistent with the improved folding of superfolder relative to the starting cycle-3 redshift parental variant.
- GFP (either cycle-3 redshift or superfolder) was shuffled to create a point mutation rate of ca. 0.7% (Stemmer, W. P. C. (1994). Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389-391; Stemmer, W. P. C. (1994). DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. PNAS USA 91, 10747-10751). The mutant pools and the starting variants (cycle-3 redshift or superfolder) were expressed in BL21(DE3) at 37° C., sonicated to lyse the cells, and fractionated into soluble and pellet fractions by centrifugation; the soluble and pellet fractions were resolved on 20% SDS-PAGE gels and scanned by densitometer. The starting variants were fully soluble as expected. In contrast, the mutant pools displayed a significant fraction of misfolded, insoluble protein. The superfolder GFP mutant pool contained ca. 2.5 times the soluble protein of the cycle-3 redshift mutant pool, consistent with the improved folding (and subsequent increased solubility) of the superfolder variant (see FIG. 5).
- GFP (either cycle-3 redshift (F64L, S65T) or superfolder) was shuffled to create a point mutation rate of ca. 0.7% (Stemmer, W. P. C. (1994). Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389-391; Stemmer, W. P. C. (1994). DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. PNAS USA 91, 10747-10751). The mutant pools were expressed in BL21(DE3) at 37° C. and analyzed by flow cytometry. The starting (parental) variants (superfolder or cycle-3 redshift) were cloned and expressed in BL21(DE3) at 37° C. as a standard and analyzed by flow cytometry. The superfolder variant mutant pool has a higher fraction of brighter cells (FIG. 7) compared to the cycle-3 redshift mutant pool (FIG. 6). The increased tolerance of the folding of superfolder GFP to additional random mutations is consistent with the improved folding of the superfolder GFP versus cycle-3 redshift.
- To create the circular permutants, the native N and C termini of each GFP variant were linked by a short GGGS amino acid linker, and new start codons were created at the indicated sites (see Table 2). Sites were chosen to correspond to the middle of loops between structural elements using the published structures of GFP. Manipulation was by primer-based PCR according to standard methods well known in the art. Most proteins do not tolerate circular permutation and still fold (Baird et al., 1999, Proc. Natl. Acad. Sci. USA, 96: 11241-11246). The effect of circular permutation was investigated by studying the solubility of the permutants as well as the fluorescence yield. Circular permutants were cloned into the pET vector equipped with an in-frame Spe-1 and Kpn-1 cloning site as Spe-1/Kpn-1 inserts and expressed in BL21(DE3) at 37° C. for 4 h. The cells were pelleted and fractionated into soluble and pellet fractions according to previously published methods (Waldo G S, Standish B M, Berendzen J, Terwilliger T C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17(7): 691-695), resolved on SDS-PAGE gels, and the soluble and pellet fractions quantitated by densitometry. Many of the superfolder circular permutants are substantially soluble; in contrast, most of the cycle-3 redshift circular permutants are poorly soluble (see FIG. 8). Fluorescence (480 nm ex/520 nm em) was measured for whole cells in suspension and normalized by dividing by the cell density (optical density 600 nm) (see FIG. 9). As expected, the superfolder is much more tolerant of circular permutation, as evidenced by the greater fluorescence for superfolder compared to cycle-3 redshift for the various circular permutants.
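The sequence rearrangement that a circular permutation produces can be sketched at the amino acid level as follows. This is a minimal illustration under stated assumptions, not the primer-based PCR procedure itself: the function name `circular_permutant` is hypothetical, the example uses a toy ten-letter sequence rather than a GFP sequence, and details such as adding an initiator methionine at the new start, the cloning-site residues, and the stop codon are deliberately left out.

```python
def circular_permutant(sequence, new_start, linker="GGGS"):
    """Return a circularly permuted sequence (hypothetical helper).

    The residues from `new_start` (1-based) to the native C terminus come
    first, followed by the linker that joins the native termini, followed
    by the residues from the native N terminus up to `new_start` - 1.
    """
    if not 1 < new_start <= len(sequence):
        raise ValueError("new_start must lie inside the sequence")
    index = new_start - 1  # convert to 0-based index
    return sequence[index:] + linker + sequence[:index]


# Toy example: new start at position 4 of a ten-residue placeholder sequence.
toy = "ABCDEFGHIJ"
permuted = circular_permutant(toy, 4)
print(permuted)  # DEFGHIJGGGSABC
```

The permuted product contains every original residue exactly once plus the linker, so its length is the original length plus the linker length; only the position of the chain termini changes.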
TABLE 2 Primers used to create circular permutants. SEQ. ID CP Name c3 sf Name Code Primer NO. a 2-3 1 GFP23+ GATATAACTAGTAATGGGCACAAATTTTCTGTCAGAGGA 6 a 2-3 1 GFP23 + wt GATATAACTAGTAATGGGCACAAATTTTCTGTCAGTGGA 7 a 2-3 1 1 GFP23− TACTTCGGTACCATTAACATCACCATCTAATTCAACAAG 8 b 3-4 1 GFP39+ GATATAACTAGTAACGGAAAACTCACCCTTAAATTTATT 9 b 3-4 1 GFP39 + wt GATATAACTAGTTACGGAAAACTCACCCTTAAATTTATT 10 b 3-4 1 GFP39− TACTTCGGTACCGTTTGTAGCATCACCTTCACCCTCTCC 11 b 3-4 1 GFP39 − wt TACTTCGGTACCGTATGTAGCATCACCTTCACCCTCTCC 12 c chrome 4-3 1 1 GFP51+ GATATAACTAGTGGAAAACTACCTGTTCCATGGCCAACA 13 c chrome 4-3 1 1 GFP51− TACTTCGGTACCTCCAGTAGTGCAAATAAATTTAAGGGT 14 d 4-3 1 1 GFP91+ GATATAACTAGTGGTTATGTACAGGAACGCACTATATCT 15 d 4-3 1 1 GFP91− TACTTCGGTACCACCTTCGGGCATGGCACTCTTGAAAAA 16 e 5-4 1 GFP102+ GATATAACTAGTGATGACGGGACCTACAAGACGCGTGCT 17 e 5-4 1 GFP102 + wt GATATAACTAGTGATGACGGGAACTACAAGACGCGTGCT 18 e 5-4 1 1 GFP102− TACTTCGGTACCATCTTTGAAAGATATAGTGCGTTCCTG 19 f 6-5 1 1 GFP117+ GATATAACTAGTGATACCCTTGTTAATCGTATCGAGTTA 20 f 6-5 1 1 GFP117− TACTTCGGTACCATCACCTTCAAACTTGACTTCAGCACG 21 g Pre 7-6 1 1 GFP129+ GATATAACTAGTGATTTTAAAGAAGATGGAAACATTCTC 22 g Pre 7-6 1 1 GFP129− TACTTCGGTACCATCAATACCTTTTAACTCGATACGATT 23 h Pre140 7-6 1 GFP140+ GATATAACTAGTAAACTCGAGTACAACTTTAACTCACAC 24 h Pre140 7-6 1 GFP140 + wt GATATAACTAGTAAACTCGAGTACAACTATAACTCACAC 25 h Pre140 7-6 1 1 GFP140− TACTTCGGTACCTTTGTGTCCGAGAATGTTTCCATCTTC 26 i 7-6 1 GFP145+ GATATAACTAGTTTTAACTCACACAATGTATACATCACG 27 i 7-6 1 GFP145 + wt GATATAACTAGTTATAACTCACACAATGTATACATCACG 28 i 7-6 1 GFP145− TACTTCGGTACCAAAGTTGTACTCGAGTTTGTGTCCGAG 29 i 7-6 1 GFP145 − wt TACTTCGGTACCATAGTTGTACTCGAGTTTGTGTCCGAG 30 j 8-7 1 1 GFP157+ GATATAACTAGTCAAAAGAATGGAATCAAAGCTAACTTC 31 j 8-7 1 1 GFP157− TACTTCGGTACCTTGTTTGTCTGCCGTGATGTATACATT 32 k 9-8 1 1 GFP173+ GATATAACTAGTGATGGTTCCGTTCAACTAGCAGACCAT 33 k 9-8 1 GFP173− TACTTCGGTACCATCTTCAACGTTGTGGCGAATTTTGAA 34 k 9-8 1 GFP173 − wt TACTTCGGTACCATCTTCAATGTTGTGGCGAATTTTGAA 35 l Pre 10-9 1 1 GFP189+ 
GATATAACTAGTGGCGATGGCCCTGTCCTTTTACCAGAC 36 l Pre 10-9 1 1 GFP189− TACTTCGGTACCGCCAATTGGAGTATTTTGTTGATAATG 37 m 10-9 1 1 GFP195+ GATATAACTAGTTTACCAGACAACCATTACCTGTCGACA 38 m 10-9 1 1 GFP195− TACTTCGGTACCTAAAAGGACAGGGCCATCGCCAATTGG 39 n 11-10 1 1 GFP214+ GATATAACTAGTAAGCGTGACCACATGGTCCTTCTTGAG 40 n 11-10 1 GFP214− TACTTCGGTACCCTTTTCGTTGGGATCTTTCGAAAGGAC 41 n 11-10 1 GFP214 − wt TACTTCGGTACCCTTTTCGTTGGGATCTTTCGAAAGGGC 42 Legend. CP Single-letter name of each of the 14 circular permutants (a-n). Name Name of each of the 14 circular permutants cited in FIGS. 9 and 10. C3 Primer used to make cycle-3 redshift circular permutant variant. SF Primer used to make superfolder circular permutant variant. Name Code Code name of primer. Number indicates amino acid of new start codon. Primer Sequence of primer (5′ to 3′ sense) used to make circular permutant. - All publications, patents, and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
- The present invention is not to be limited in scope by the embodiments disclosed herein, which are intended as single illustrations of individual aspects of the invention, and any which are functionally equivalent are within the scope of the invention. Various modifications to the models and methods of the invention, in addition to those described herein, will become apparent to those skilled in the art from the foregoing description and teachings, and are similarly intended to fall within the scope of the invention. Such modifications or other embodiments can be practiced without departing from the true scope and spirit of the invention.
-
TABLE OF SEQUENCES SEQ ID NO:1 GFP variant nucleotide coding sequence (optimal) ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGA ATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTG AAGGTGATGCTACATACGGAAAACTCACCCTTAAATTTATTTGCACTACT GGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTCTGACCTATGG TGTTCAATGCTTTTCCCGTTATCCGGATCACATGAAACGGCATGACTTTT TCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTC AAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGA TACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATG GAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTA TACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAAT TCGCCACAACATTGAAGATGGTTCCGTTCAACTAGCAGACCATTATCAAC AAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTAC CTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCA CATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGG ATGAGCTCTACAAATAA SEQ ID NO:2-GFP variant amino acid sequence: MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT GKLPVPWPTLVTTLTYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISF KDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNV YITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHY LSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK* SEQ ID NO:3-DsRedSF variant nucleotide coding sequence: ATGGAGTCTTCCGAGGATGTTATCAAGGAGTTCATGAGGTTTAAGGTTCA CATGGAAGGATCGGTCAATGGGCACGAGTTTGAAATAGAAGGCGAAGGAG AGGGGAGGCCATACGAAGGCACCCAGAACGTAAAGCTTAAGGTAACTAAG GGGGGACCTTTGCCATTTGCTTGGGATATTTTGTCACCACAATTTCAGTA TGGAAGCAAGGTATATGTCAAGCACCCTGCCGACATACCAGACTATAAAA AGCTGTCATTTCCTGAAGGATTTAAATGGGAAAGGGTCATGAACTTTGAA GACGGTGGCGTCGCTACTGTAACCCAGGATTCCAGTTTGGAGGATGGCTG TTTGATCTACAAGGTCAAGTTCATTGGCGTGAACTTTCCTTCCGATGGAC CTGTTATGCAAAAGAAGACAATGGGCTGGGAACCGAGCACTGAGCGTTTG TATCCTCGTGATGGCGTGTTGAAAGGAGATATTCATAAGGCTCTGAAGCT GAAAGACGGTGGTCATTACCTAGTTGATATCAAAAGTATTTACATGGCAA AGAAGCCTGTGCAGCTACCAGGGTACTACTATGTTGACTCCAAACTGGAT ATAACAAACCACAACGAAGACTATACAATCGTTGAGCAGTATGAAAGAGC CGAGGGACGCCACCATCTGTTCCTTTAA SEQ ID NO: 4-DsRedSF variant amino acid sequence: MESSEDVIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQNVKLKVTK 
GGPLPFAWDILSPQFQYGSKVYVKHPADIPDYKKLSFPEGFKWERVMNFE DGGVATVTQDSSLEDGCLIYKVKFIGVNFPSDGPVMQKKTMGWEPSTERL YPRDGVLKGDIHKALKLKDGGHYLVDIKSIYMAKKPVQLPGYYYVDSKLD ITNHNEDYTIVEQYERAEGRHHLFL SEQ ID NO:5 Wild type GFP amino acid sequence (Swiss protein database accession P42212): MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT GKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFF KDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNV YIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHY LSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
Claims (44)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/900,551 US20090068732A1 (en) | 2002-04-24 | 2007-09-11 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
US12/286,967 US9637528B2 (en) | 2002-04-24 | 2008-10-02 | Method of generating ploynucleotides encoding enhanced folding variants |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/132,067 US20030203355A1 (en) | 2002-04-24 | 2002-04-24 | Fluorobodies: binding ligands with intrinsic fluorescence |
US10/423,688 US7271241B2 (en) | 2002-04-24 | 2003-04-24 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
US11/900,551 US20090068732A1 (en) | 2002-04-24 | 2007-09-11 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/423,688 Division US7271241B2 (en) | 2002-04-24 | 2003-04-24 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/286,967 Continuation-In-Part US9637528B2 (en) | 2002-04-24 | 2008-10-02 | Method of generating ploynucleotides encoding enhanced folding variants |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090068732A1 true US20090068732A1 (en) | 2009-03-12 |
Family
ID=46204812
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/423,688 Expired - Lifetime US7271241B2 (en) | 2002-04-24 | 2003-04-24 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
US11/900,551 Abandoned US20090068732A1 (en) | 2002-04-24 | 2007-09-11 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/423,688 Expired - Lifetime US7271241B2 (en) | 2002-04-24 | 2003-04-24 | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby |
Country Status (1)
Country | Link |
---|---|
US (2) | US7271241B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014142955A1 (en) * | 2013-03-15 | 2014-09-18 | Carnegie Mellon University | Linked peptide fluorogenic biosensors |
WO2015080178A1 (en) * | 2013-11-28 | 2015-06-04 | 学校法人 京都産業大学 | Novel fluorescent protein |
US10202466B2 (en) | 2007-12-03 | 2019-02-12 | Carnegie Mellon University | Linked peptide fluorogenic biosensors |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1886420B (en) * | 2003-10-24 | 2012-02-29 | 加利福尼亚大学 | Self-assembling split-fluorescent protein systems |
US7390640B2 (en) | 2005-02-22 | 2008-06-24 | Los Alamos National Security, Llc | Circular permutant GFP insertion folding reporters |
FR2886943B1 (en) * | 2005-06-10 | 2007-09-07 | Biomethodes Sa | METHOD OF SELECTING STABLE PROTEINS UNDER STANDARD PHYSICO-CHEMICAL CONDITIONS |
CA2656298A1 (en) | 2006-06-02 | 2007-12-13 | President And Fellows Of Harvard College | Protein surface remodeling |
EP2297182A4 (en) * | 2008-04-28 | 2012-08-15 | Harvard College | Supercharged proteins for cell penetration |
US9221886B2 (en) | 2009-04-28 | 2015-12-29 | President And Fellows Of Harvard College | Supercharged proteins for cell penetration |
US10544414B2 (en) | 2014-10-22 | 2020-01-28 | Danmarks Tekniske Universitet | Two-cassette reporter system for assessing target gene translation and target gene product inclusion body formation |
WO2016186948A1 (en) | 2015-05-15 | 2016-11-24 | Albert Einstein College Of Medicine, Inc. | Gfp-derived fusion tags for protein expression |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6307024B1 (en) * | 1999-03-09 | 2001-10-23 | Zymogenetics, Inc. | Cytokine zalpha11 Ligand |
US6414119B1 (en) * | 1998-10-16 | 2002-07-02 | Rutgers, The State University | Rapidly greening, low oxygen mutant of the aequoria victoria green fluorescent protein |
US20020107362A1 (en) * | 1995-09-22 | 2002-08-08 | Bioimage A/S | Novel fluorescent proteins |
US20020123113A1 (en) * | 1994-11-10 | 2002-09-05 | The Regents Of The University Of California | Modified green fluorescent proteins |
US6448087B1 (en) * | 1997-12-12 | 2002-09-10 | The Regents Of The University Of California | Method for determining and modifying protein/peptide solubility |
US20020177189A1 (en) * | 2000-06-19 | 2002-11-28 | Bjorn Sara Petersen | Novel fluorescent proteins |
US20030013849A1 (en) * | 1999-10-29 | 2003-01-16 | Ward William W. | Renilla reniformis green fluorescent protein |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US107362A (en) * | 1870-09-13 | Improved spittoon-holder | ||
US123113A (en) * | 1872-01-30 | Improvement in mail-bags | ||
US13849A (en) * | 1855-11-27 | krake | ||
US177189A (en) * | 1876-05-09 | Solomon w | ||
AUPP846399A0 (en) | 1999-02-02 | 1999-02-25 | University Of Sydney, The | Pigment protein from coral tissue |
- 2003-04-24: US10/423,688 (US7271241B2), Expired - Lifetime
- 2007-09-11: US11/900,551 (US20090068732A1), Abandoned
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020123113A1 (en) * | 1994-11-10 | 2002-09-05 | The Regents Of The University Of California | Modified green fluorescent proteins |
US20020107362A1 (en) * | 1995-09-22 | 2002-08-08 | Bioimage A/S | Novel fluorescent proteins |
US6448087B1 (en) * | 1997-12-12 | 2002-09-10 | The Regents Of The University Of California | Method for determining and modifying protein/peptide solubility |
US6414119B1 (en) * | 1998-10-16 | 2002-07-02 | Rutgers, The State University | Rapidly greening, low oxygen mutant of the aequoria victoria green fluorescent protein |
US6307024B1 (en) * | 1999-03-09 | 2001-10-23 | Zymogenetics, Inc. | Cytokine zalpha11 Ligand |
US20030013849A1 (en) * | 1999-10-29 | 2003-01-16 | Ward William W. | Renilla reniformis green fluorescent protein |
US20020177189A1 (en) * | 2000-06-19 | 2002-11-28 | Bjorn Sara Petersen | Novel fluorescent proteins |
Non-Patent Citations (4)
Title |
---|
Chapagain et al. (Biopolymers, vol. 81, pages 167-178, 2006) * |
Sergel et al. (J. of Virology, vol. 175, no. 17, 2001) * |
Waldo et al. (Nature Biotechnology, vol. 17, 1999) * |
Yang et al. (Nature Biotechnology, vol. 14, October 1996, pages 1246-1251) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10202466B2 (en) | 2007-12-03 | 2019-02-12 | Carnegie Mellon University | Linked peptide fluorogenic biosensors |
WO2014142955A1 (en) * | 2013-03-15 | 2014-09-18 | Carnegie Mellon University | Linked peptide fluorogenic biosensors |
WO2015080178A1 (en) * | 2013-11-28 | 2015-06-04 | 学校法人 京都産業大学 | Novel fluorescent protein |
JPWO2015080178A1 (en) * | 2013-11-28 | 2017-03-16 | 学校法人 京都産業大学 | Novel fluorescent protein |
Also Published As
Publication number | Publication date |
---|---|
US20040078148A1 (en) | 2004-04-22 |
US7271241B2 (en) | 2007-09-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20090068732A1 (en) | Directed evolution methods for improving polypeptide folding and solubility and superfolder fluorescent proteins generated thereby | |
US9081014B2 (en) | Nucleic acid encoding a self-assembling split-fluorescent protein system | |
Pédelacq et al. | Engineering and characterization of a superfolder green fluorescent protein | |
US7666606B2 (en) | Protein- protein interaction detection system using fluorescent protein microdomains | |
WO2005116267A2 (en) | Emission ratiometric indicators of phosphorylation by c-kinase | |
US8420390B2 (en) | Circular permutant GFP insertion folding reporters | |
WO2003095610A2 (en) | Directed evolution method of generating enhanced folding polypeptide variants | |
US9637528B2 (en) | Method of generating ploynucleotides encoding enhanced folding variants | |
WO2016186948A1 (en) | Gfp-derived fusion tags for protein expression | |
CA2949355A1 (en) | Genetically encoded sensors for imaging proteins and their complexes | |
US20140024555A1 (en) | Method of identifying soluble proteins and soluble protein complexes | |
Štrancar et al. | A Practical Guide for the Quality Evaluation of Fluobodies/Chromobodies | |
Waldo et al. | Circular permutant GFP insertion folding reporters |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: LOS ALAMOS NATIONAL SECURITY LLC, NEW MEXICO. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA; REEL/FRAME: 026082/0395. Effective date: 20060501. Owner name: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, CALIF. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: WALDO, GEOFFREY S.; REEL/FRAME: 026082/0354. Effective date: 20030722. |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |