
WO2008137958A1 - Cellobiohydrolase-encoding nucleotide sequences having refined translational kinetics and methods for their preparation - Google Patents


Info

Publication number
WO2008137958A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleotides
replaced
codon
encoding
translational
Prior art date
Application number
PCT/US2008/062957
Other languages
English (en)
Inventor
Kirsty A. Salmon
David A. Roth
G. Wesley Hatfield
Yimeng Dou
Original Assignee
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of California
Publication of WO2008137958A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N1/00 Microorganisms, e.g. protozoa; Compositions thereof; Processes of propagating, maintaining or preserving microorganisms or compositions thereof; Processes of preparing or isolating a composition containing a microorganism; Culture media therefor
    • C12N1/22 Processes using, or culture media containing, cellulose or hydrolysates thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2405 Glucanases
    • C12N9/2434 Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437 Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/04 Preparation of oxygen-containing organic compounds containing a hydroxy group acyclic
    • C12P7/06 Ethanol, i.e. non-beverage
    • C12P7/08 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate
    • C12P7/10 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate substrate containing cellulosic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01004 Cellulase (3.2.1.4), i.e. endo-1,4-beta-glucanase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01021 Beta-glucosidase (3.2.1.21)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01091 Cellulose 1,4-beta-cellobiosidase (3.2.1.91)
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E50/00 Technologies for the production of fuel of non-fossil origin
    • Y02E50/10 Biofuels, e.g. bio-diesel

Definitions

  • The present invention relates to refining the kinetics of translation of an mRNA into polypeptide, and to polypeptide-encoding nucleotide sequences that have refined translational properties.
  • Cellulases are enzymes capable of hydrolyzing the β-D-glucosidic linkages in cellulose; they are used in the treatment of textiles and in the production of cellulosic ethanol.
  • Cellulolytic enzymes have traditionally been divided into three major classes: endoglucanases, exoglucanases or cellobiohydrolases, and β-glucosidases (Knowles, J. et al., (1987), Trends Biotech. 5, 255-261); they are known to be produced by a large number of bacteria, yeasts and fungi.
  • An endoglucanase hydrolyses β-1,4-glycosidic linkages randomly.
  • Cellobiohydrolase acts on cellulose and, in particular, splits off cellobiose units from the non-reducing end of the chain.
  • Cellobiohydrolase hydrolyses cellodextrins but not cellobiose.
  • β-Glucosidase hydrolyses cellobiose and cellooligosaccharides to glucose, but does not attack cellulose or higher cellodextrins.
  • The filamentous fungus Trichoderma reesei produces a complete set of the cellulolytic enzymes needed for efficient solubilization of native cellulose. Its two cellobiohydrolases, cellobiohydrolase-I (CBH-I) and cellobiohydrolase-II (CBH-II), are key enzymes in the breakdown of crystalline cellulose.
  • CBH-II is an exoglucanase releasing predominantly cellobiose from the ends of the polymeric glucose chains. In large-scale ethanol production, a steady supply of CBH-II can play a valuable role.
  • T. reesei CBH-II (TrCBH-II) does not express well in host organisms such as Escherichia coli or Saccharomyces cerevisiae. As a result, large-scale production is limited. Therefore, there is a continued need for improved expression of cellobiohydrolase enzymes.
  • Some translational pauses result from the presence of particular codon pairs in the nucleotide sequence encoding the polypeptide to be translated. As provided herein, inappropriate or excessive translation pauses can reduce protein expression considerably. Further, the translational pausing properties of codon pairs vary from organism to organism. As a result, exogenous expression of genes foreign to the expression organism can lead to inefficient translation and poor expression. Even when the gene is translated efficiently enough that recoverable quantities of the translation product are produced, the protein is often inactive, insoluble, aggregated, or otherwise different in properties from the native protein. Thus, removing inappropriate or excessive translation pause structures coded for by specific di-codon nucleotide sequences in the open reading frame (ORF) can improve protein expression.
  • Provided herein are cellobiohydrolase-encoding nucleotide sequences with refined translational kinetics, and methods of designing and synthesizing the same.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has amino acid sequence identity with an original cellobiohydrolase polypeptide, and wherein predicted translation pauses in the expression organism have been removed or reduced by replacing original codon pairs with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • The resultant cellobiohydrolase-encoding nucleotide sequence is predicted to be translated rapidly along its entire length.
  • Expression of the resultant cellobiohydrolase-encoding nucleotide is predicted to result in improved protein expression levels in cases where inappropriate or excessive translation pauses reduce protein expression.
  • expression of the resultant cellobiohydrolase-encoding nucleotide is predicted to result in improved levels of active and/or natively folded polypeptide expression products in cases where inappropriate or excessive translation pauses cause expression of inactive, insoluble or aggregated cellobiohydrolase.
  • cellobiohydrolase-encoding DNA sequences wherein the encoded sequence has amino acid sequence identity with an original cellobiohydrolase-encoding DNA sequence and is adapted for expression in a heterologous host organism, wherein at least 1, 2, or 3 codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CCCTCT (nucleotides 463-468); GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); GTGGAA (nucleotides 691-696); GGATTT (nucleotides 1192-1197); GGTATT (nucleotides 1198-1203).
  • CCCTCT nucleotides 463-468 replaced with CCTTCT
  • GGCCAA nucleotides 94-99 replaced with GGTCAA
  • CAGTTT nucleotides 565-570 replaced with CAATTT
  • GATATC nucleotides 703-708 replaced with GACATT
  • GTGGAA nucleotides 691-696 replaced with GTTGAA
  • GGATTT nucleotides 1192-1197 replaced with GGTTTC
  • GGTATT nucleotides 1198-1203 replaced with GGAATT.
  • the DNA sequence is optimized for expression in S. cerevisiae.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTCGGT (nucleotides 760-765); ATTGCC (nucleotides 631-636); GACAGC (nucleotides 1285-1290); GTCTGG (nucleotides 88-93); GTCTGG (nucleotides 1246-1251); TTGCTG (nucleotides 1231-1236); GTGGTG (nucleotides 571-576); ACGCTG (nucleotides 22-27); ACGCTG (nucleotides 31-36); GACTGG (nucleotides 1168-1173); GCCGGA (nucleotides 559-564); CTGGTG (nucleotides 748-753).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTCGGT (nucleotides 760-765) replaced with CTGGGT; ATTGCC (nucleotides 631-636) replaced with ATTGCG; GACAGC (nucleotides 1285-1290) replaced with GACTCT; GTCTGG (nucleotides 88-93) replaced with GTTTGG; GTCTGG (nucleotides 1246-1251) replaced with GTTTGG; TTGCTG (nucleotides 1231-1236) replaced with CTGCTG; GTGGTG (nucleotides 571-576) replaced with GTTGTT; ACGCTG (nucleotides 22-27) replaced with ACCCTC; ACGCTG (nucleotides
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CAGTTT (nucleotides 565-570); TTTGAC (nucleotides 1303-1308); TCGTTT (nucleotides 1240-1245); GGCCAA (nucleotides 94-99); AAGAAT (nucleotides 541-546); AAGAAT (nucleotides 934-939); GCCAAA (nucleotides 649-654); GTCAAG (nucleotides 1252-1257); GGTATT (nucleotides 1198-1203); ATCAAC (nucleotides 808-813); GGCCAT (nucleotides 865-870); CTTCCA (nucleotides 835-840); GATATC (nucleotides 703-708); TCGTTG (nucleotides 1228-1233).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CAGTTT (nucleotides 565-570) replaced with CAATTT; TTTGAC (nucleotides 1303-1308) replaced with TTTGAT; TCGTTT (nucleotides 1240-1245) replaced with TCTTTT; GGCCAA (nucleotides 94-99) replaced with GGACAA; AAGAAT (nucleotides 541-546) replaced with AAAAAT; AAGAAT (nucleotides 934-939) replaced with AAAAAC; GCCAAA (nucleotides 649-654) replaced with GCTAAA; GTCAAG (nucleotides 1252-1257) replaced with GTTAAA; GGTATT
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); TATTTG (nucleotides 853-858); GGCCAT (nucleotides 865-870); TCGTTG (nucleotides 1228-1233); TTTGTC (nucleotides 1243-1248); TTCCAA (nucleotides 1363-1368).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGCCAA (nucleotides 94-99) replaced with GGTCAA; CAGTTT (nucleotides 565-570) replaced with CAATTC; GATATC (nucleotides 703-708) replaced with GACATT; TATTTG (nucleotides 853-858) replaced with TATTTA; GGCCAT (nucleotides 865-870) replaced with GGACAT; TCGTTG (nucleotides 1228-1233) replaced with TCTTTA; TTTGTC (nucleotides 1243-1248) replaced with TTCGTT; TTCCAA (nucleotides 1363-1368) replaced with TTCCAG.
  • the following codon pair replacements have been made: GGCCAA (nucleo
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GTGCCT (nucleotides 55-60); GCCAAT (nucleotides 370-375); GCTATT (nucleotides 406-411); GCCGGA (nucleotides 559-564); GCCAAT (nucleotides 778-783); TTGGCA (nucleotides 967-972); AAGCTG (nucleotides 1051-1056); GCTATT (nucleotides 1066-1071); GCCAAT (nucleotides 1084-1089); ACCGGA (nucleotides 1147-1152); ACCGGA (nucleotides 1189-1194); GGTATT (nucleotides 1198-1203); GACAGC (nucleotides 1285-1290); GATGCC (nucleotides 1327-1332); GCCTTG (nucleotides 1330-1335);
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GTGCCT (nucleotides 55-60) replaced with GTTCCG; GCCAAT (nucleotides 370-375) replaced with GCTAAT; GCTATT (nucleotides 406-411) replaced with GCCATT; GCCGGA (nucleotides 559-564) replaced with GCTGGT; GCCAAT (nucleotides 778-783) replaced with GCGAAT; TTGGCA (nucleotides 967-972) replaced with TTGGCT; AAGCTG (nucleotides 1051-1056) replaced with AAATTG; GCTATT (nucleotides 1066-1071) replaced with GCCATT; GCCAAT (nucleotides
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a cellobiohydrolase-encoding DNA sequence that has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the DNA sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for degrading cellulose comprising one or more host organisms that collectively include DNA sequences operably encoding the following enzymes: endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein the translational kinetics of each of the DNA sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the exo-l,4- ⁇ -D-glucanase retains at least 75% of the enzymatic activity of wild-type TrCBH-II (SEQ ID NO: 2) under normal physiological conditions.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 1 and which encode amino acids 27-62 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 27-62 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 27-62 when expressed in the native organism.
  • no replacement codon encoding amino acids 27-62 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair TCCAAC when expressed in the native organism.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 1 and which encode amino acids 107-471 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 107-471 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 107-471 when expressed in the native organism.
  • no replacement codon encoding amino acids 107-471 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair GCAAAG when expressed in the native organism.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 1 and which encode amino acids 62-107 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 62-107 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 62-107 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 62-107 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair TCTACT when expressed in the native organism.
  • polynucleotides comprising any of the DNA sequences provided herein.
  • isolated polynucleotides comprising the DNA sequence of SEQ ID NOs:3, 5, 7, 9, 11, 13, 15, 17, 19, 21 or 23.
  • In some embodiments, such a polynucleotide is a DNA polynucleotide; also contemplated herein is an RNA polynucleotide comprising the RNA equivalent of said DNA sequence.
  • cells comprising such a polynucleotide. In some such cells, the cell expresses the polypeptide encoded by the polynucleotide.
  • Also provided are methods of introducing a polynucleotide into a host cell comprising providing a host cell; and contacting said host cell with any of the polynucleotides provided herein under conditions that permit the polynucleotide to be introduced into the host cell. Also provided are methods of expressing a polypeptide comprising providing a cell comprising any of the polynucleotides provided herein; and placing the cell under conditions that permit the cell to express the polypeptide encoded by the DNA sequence, whereby said encoded polypeptide is expressed by said cell.
  • Also provided are methods of hydrolyzing a carbohydrate comprising providing a carbohydrate comprising at least one glycosidic bond; providing a polypeptide encoded by any of the polynucleotides provided herein; and contacting said carbohydrate with said polypeptide under conditions that permit said polypeptide to hydrolyze at least one glycosidic bond of said carbohydrate; whereby at least one glycosidic bond of said carbohydrate is hydrolyzed.
  • the carbohydrate is cellulose.
  • the carbohydrate comprises two or more β-1,4-linked glucose units.
  • Figure 1 depicts a graphical display of z scores of translational kinetics values for codon pair utilization in T. reesei of nucleic acid sequences encoding the cellobiohydrolase-II enzyme of T. reesei (TrCBH-II), plotted as a function of codon pair position.
  • Figures 2-6 depict the effects of Translational Engineering™ on protein expression levels. Each of Figures 2-6 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding TrCBH-II, plotted as a function of codon pair position.
  • Figure 2A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 2B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 3A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 3B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 4A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 4B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 5A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 5B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 6A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 6B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • DETAILED DESCRIPTION
  • Biomass is the earth's most attractive alternative fuel source and its most sustainable energy resource, and it is regenerated through the bioconversion of carbon dioxide.
  • Ethanol produced from biomass and blended with gasoline is today the most widely used biofuel.
  • The use of biofuels can significantly reduce the accumulation of greenhouse gases.
  • Ethanol is just one example of the products of biomass harvesting with industrial enzymes. The technologies associated with biomass harvesting are similarly applicable to the production of other biofuels and fine chemicals, as well as to other diverse applications.
  • a variety of highly specialized microorganisms have evolved to produce enzymes that either synergistically or in complexes can carry out the complete hydrolysis of cellulose.
  • the anaerobic bacteria Clostridium thermocellum and Clostridium cellulovorans and the filamentous fungus Trichoderma reesei are known as cellulolytic and xylanolytic microorganisms.
  • The bacteria C. thermocellum and C. cellulovorans produce a cellulosome complex, consisting of cellulases and hemicellulases, organized on the cell surface (Doi and Tamaru (2001) Chem. Rec. 1:24-32; Shoham et al. (1999) Trends Microbiol. 7:275-281).
  • In T. reesei, three types of cellulolytic enzyme are secreted extracellularly, including five endoglucanases (EG [EC 3.2.1.4]) (Okada et al. (1998) Appl. Environ. Microbiol. 64:555-563), two cellobiohydrolases (CBH [EC 3.2.1.91]) (Henrissat et al. (1985) Bio/Technology 3:722-726; Teeri et al. (1987) Gene 51:43-52), and two β-glucosidases (BGL [EC 3.2.1.21]) (Chen et al. (1992) Biochim. Biophys.
  • Endoglucanases act randomly on the amorphous regions of the cellulose chain, producing reducing and nonreducing ends for cellobiohydrolases, which release cellobiose from the reducing or nonreducing ends of crystalline cellulose.
  • Exoglucanases, including CBH-I and CBH-II, liberate the disaccharide D-cellobiose from 1,4-β-glucans.
  • Cellulose chains are thus efficiently degraded to soluble cellobiose and cellooligosaccharides by the endo-exo synergism of EG and CBH (Henrissat et al. (1985) Bio/Technology 3:722-726).
  • The predominant polysaccharide in the primary cell wall of biomass is cellulose, the second most abundant is hemicellulose, and the third is pectin.
  • The secondary cell wall, produced after the cell has stopped growing, also contains polysaccharides and is strengthened by polymeric lignin covalently cross-linked to hemicellulose.
  • Cellulose is a homopolymer of anhydrocellobiose and thus a linear β-(1-4)-D-glucan, whereas hemicelluloses include a variety of compounds, such as xylans, xyloglucans, arabinoxylans, and mannans, in complex branched structures with a spectrum of substituents.
  • Cellulose is found in plant tissue primarily as an insoluble crystalline matrix of parallel glucan chains. Hemicelluloses usually hydrogen-bond to cellulose, as well as to other hemicelluloses, which helps stabilize the cell wall matrix.
  • DNA constructs encoding cellulase enzymes are known in the art.
  • U.S. Patent No. 5,686,593 relates to cellulose- or hemicellulose-degrading enzymes that are derivable from a fungus other than Trichoderma or Phanerochaete, and which comprise a carbohydrate binding domain homologous to a terminal A region of T. reesei cellulases.
  • T. reesei CBH-II does not express well in host organisms such as E. coli or S. cerevisiae. Accordingly, provided herein are cellobiohydrolase-encoding nucleotide sequences and methods of making the same for improved expression of cellobiohydrolase enzymes.
  • Some translational pauses are resultant from the presence of particular codon pairs in the nucleotide sequence encoding the polypeptide to be translated. As provided herein, inappropriate or excessive translation pauses can reduce protein expression considerably. Further, the translational pausing properties of codon pairs vary from organism to organism. As a result, exogenous expression of genes foreign to the expression organism can lead to inefficient translation. Even when the gene is translated in a sufficiently efficient manner that recoverable quantities of the translation product are produced, the protein is often inactive, insoluble, aggregated, or otherwise different in properties from the native protein. Thus, removing inappropriate or excessive translation pauses can improve protein expression.
  • A translational pause can serve to slow translation of the nascent amino acid chain.
  • The pause(s) can facilitate proper polypeptide folding, post-translational modification, reorganization or folding at protein domain boundaries, or other steps toward arriving at the native, active wild-type protein.
  • One or more pauses that are predicted to be present in native translation of cellobiohydrolase can be preserved in a modified cellobiohydrolase-encoding polynucleotide provided in accordance with the teachings herein.
  • A codon pair in the modified cellobiohydrolase-encoding polynucleotide can be selected to have a predicted translational kinetics value that is at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% that of the native codon pair whose predicted pause is to be preserved; further, the codon pair in the modified cellobiohydrolase-encoding polynucleotide can be selected to be located within 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 codons of the native codon pair whose predicted pause is to be preserved.
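The preservation rule above (a replacement codon pair that keeps a stated fraction of the native pair's predicted pause strength and lies within a stated number of codons of it) can be sketched in Python as follows. This is a minimal illustration, not the method of this disclosure: it assumes each candidate is supplied as a hypothetical `(codon_index, kinetics_value)` tuple already scored for the expression host, and that larger values indicate a stronger predicted pause.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (codon index in the redesigned gene, kinetics value in the host).
Candidate = Tuple[int, float]


def pick_preserving_pair(native_pos: int,
                         native_value: float,
                         candidates: Sequence[Candidate],
                         min_fraction: float = 0.8,
                         max_distance: int = 5) -> Optional[Candidate]:
    """Pick a codon pair that preserves a native predicted pause.

    A candidate qualifies if it lies within `max_distance` codons of the native
    pair and its kinetics value is at least `min_fraction` of the native value.
    Among qualifying candidates the closest wins; ties go to the higher value.
    Returns None when no candidate qualifies.
    """
    eligible = [
        (pos, val) for pos, val in candidates
        if abs(pos - native_pos) <= max_distance and val >= min_fraction * native_value
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda pv: (abs(pv[0] - native_pos), -pv[1]))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any TrCBH-II analysis.
    print(pick_preserving_pair(100, 3.0, [(100, 2.0), (102, 2.7), (97, 2.9), (120, 3.5)]))
```

With `min_fraction=0.8` and `max_distance=5`, the example keeps the pair two codons downstream because the in-place pair is too weak and the pair at codon 120 is too far away.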
  • Translation Engineering™ refers to a process used to modify the translational kinetics of a polypeptide-encoding nucleic acid sequence.
  • Translation Engineering™ can be applied to modify the translational kinetics of a polypeptide-encoding nucleic acid sequence when expressed in its native organism.
  • This process alters the polypeptide-encoding nucleic acid sequence to optimize codon usage and codon pair usage in the organism in which the sequence is expressed.
  • Sequence modifications can also be made to introduce or prevent restriction sites, eliminate strong RNA secondary structures, and avoid inadvertent Shine-Dalgarno sequences.
  • Translation Engineering™ involves modifying the translational kinetics of a polypeptide-encoding nucleic acid sequence by removing, preserving, and/or inserting translational pauses.
  • Provided herein are cellobiohydrolase-encoding nucleotide sequences with refined translational kinetics and methods of making the same.
  • Also provided is a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has amino acid sequence identity with wild-type cellobiohydrolase, and wherein predicted translational pauses in the expression organism have been removed or reduced by replacing input-sequence codon pairs with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • The resultant cellobiohydrolase-encoding nucleotide sequence is predicted to be translated rapidly along its entire length.
  • Expression of the resultant sequence is predicted to improve protein expression levels in cases where inappropriate or excessive translational pauses reduce protein expression.
  • Expression of the resultant sequence is also predicted to improve levels of active and/or natively folded polypeptide in cases where inappropriate or excessive translational pauses cause expression of inactive, insoluble or aggregated cellobiohydrolase.
  • Likewise, expression of the resultant sequence is predicted to improve levels of active and/or natively folded polypeptide in cases where one or more predicted pauses are preserved from the native expression profile, or are added, to maintain expression of active and/or soluble cellobiohydrolase.
  • The cellobiohydrolase-encoding nucleotide sequences provided herein allow for one or more of the following results: higher expression levels; higher enzymatic activity; greater protein stability and resistance to degradation; and increased solubility.
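The core redesign step described above (replacing pause-prone codon pairs with different codon pairs encoding identical amino acids) can be sketched as a greedy pass over the gene. This is an illustrative sketch only, assuming a hypothetical `pair_score` dictionary of host-specific codon-pair kinetics values and a single `threshold`; it is not the Translation Engineering™ software, and a greedy left-to-right pass does not guarantee a globally optimal design.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def split_codons(seq):
    """Split an in-frame coding sequence into codons (a trailing partial codon is dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate(seq):
    return "".join(CODON_TABLE[c] for c in split_codons(seq.upper()))


def remove_pauses(seq, pair_score, threshold):
    """Greedy left-to-right redesign that never changes the encoded protein.

    Whenever the codon pair at positions (i, i+1) scores above `threshold`, every
    synonymous choice for those two codons is tried, and the combination that
    minimises the summed score of all pairs touching positions i and i+1 is kept.
    Pairs missing from `pair_score` are treated as score 0 (no predicted pause).
    """
    cs = split_codons(seq.upper())

    def score(a, b):
        return pair_score.get(a + b, 0.0)

    for i in range(len(cs) - 1):
        if score(cs[i], cs[i + 1]) <= threshold:
            continue

        def local_cost(a, b):
            # Score of the replaced pair plus its left and right neighbours.
            cost = score(a, b)
            if i > 0:
                cost += score(cs[i - 1], a)
            if i + 2 < len(cs):
                cost += score(b, cs[i + 2])
            return cost

        options = product(SYNONYMS[CODON_TABLE[cs[i]]], SYNONYMS[CODON_TABLE[cs[i + 1]]])
        cs[i], cs[i + 1] = min(options, key=lambda ab: local_cost(*ab))
    return "".join(cs)
```

For example, with a hypothetical table in which only `GGCCAA` is pause-prone, `remove_pauses("GGCCAA", {"GGCCAA": 5.0}, 1.0)` returns `"GGTCAA"`, which still encodes Gly-Gln. Neighbouring pairs are rescored because changing one codon also changes the pairs on either side of it.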
  • Nucleic acid sequences encoding the cellobiohydrolase-II enzyme of T. reesei are provided.
  • The nucleotide sequences provided herein include the native sequence from T. reesei shown in the sequence listing (SEQ ID NO: 1), which encodes the TrCBH-II amino acid sequence (SEQ ID NO: 2).
  • Also provided are nucleic acid sequences encoding TrCBH-II with refined translational kinetics for expression in S. cerevisiae (SEQ ID NO: 3), E. coli (SEQ ID NO: 9), P. pastoris (SEQ ID NO: 15), K. lactis (SEQ ID NO: 21) and Z. mobilis (SEQ ID NO: 23), as well as sequences in which additional sequence has been added to the 3' or 5' end, or both.
  • nucleotide sequences may be added 3' or 5' of any nucleic acid, for example, to facilitate hybridization of PCR primers, to add cloning restriction sites or other sites that facilitate cloning and/or expression. Accordingly, provided in the sequence listing are nucleic acid sequences with additional 5' and 3' cloning and/or PCR sequences, and which encode TrCBH-II with refined translational kinetics for expression in S. cerevisiae (SEQ ID NOS: 5 and 7), E. coli (SEQ ID NOS: 11 and 13) and P. pastoris (SEQ ID NOS: 17 and 19).
  • Also provided are the TrCBH-II amino acid sequences encoded by the nucleotide sequences with refined translational kinetics described herein.
  • TrCBH-II nucleic acid sequences with refined translational kinetics for expression in S. cerevisiae encode the amino acid sequences shown in the sequence listing (SEQ ID NOS: 4, 6 and 8).
  • TrCBH-II nucleic acid sequences with refined translational kinetics for expression in E. coli (SEQ ID NOS: 9, 11 and 13) encode the amino acid sequences shown in the sequence listing (SEQ ID NOS: 10, 12 and 14).
  • TrCBH-II nucleic acid sequences with refined translational kinetics for expression in P. pastoris (SEQ ID NOS: 15, 17 and 19) encode the amino acid sequences shown in the sequence listing (SEQ ID NOS: 16, 18 and 20).
  • TrCBH-II nucleic acid sequences with refined translational kinetics for expression in K. lactis (SEQ ID NO: 21) encode the amino acid sequences shown in the sequence listing (SEQ ID NO: 22).
  • TrCBH-II nucleic acid sequences with refined translational kinetics for expression in Z. mobilis (SEQ ID NO: 23) encode the amino acid sequence shown in the sequence listing (SEQ ID NO: 24).
  • The at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly over-represented codon pairs in that organism and have been replaced with codon pairs that are not highly over-represented therein.
  • The host organism is not human, E. coli or S. cerevisiae.
  • A cellobiohydrolase polynucleotide encodes a polypeptide having cellobiohydrolase activity.
  • Cellobiohydrolase, exoglucanase, exo-1,4-β-D-glucanase and like terms refer to enzymes that catalyze the hydrolysis of a glucosidic bond in a polysaccharide or an oligosaccharide containing D-glucose subunits linked through β-1,4 bonds, releasing cellobiose, a disaccharide in which two D-glucose units are linked through a β-1,4 bond.
  • Cellobiohydrolase activity can be measured by a known method in which an enzymatic reaction is carried out using phosphoric acid-swollen cellulose as a substrate and the presence of cellobiose in the reaction is confirmed by thin-layer silica gel chromatography, as described in U.S. Patent No. 6,566,113, hereby incorporated by reference in its entirety.
  • The polynucleotides provided herein encode polypeptides that have cellobiohydrolase activity.
  • A cellobiohydrolase-encoding polynucleotide comprising any of the DNA sequences provided herein can be transcribed, and the resulting RNA translated, to produce a polypeptide with cellobiohydrolase activity.
  • The cellobiohydrolase-encoding DNA sequence is adapted for expression in a heterologous host organism.
  • A DNA sequence that has been adapted for expression is one that has been inserted into an expression vector, or otherwise modified to contain the regulatory elements necessary for expression of the DNA in the host cell, positioned so as to permit that expression.
  • Regulatory elements required for expression include promoter sequences, transcription initiation sequences and, optionally, enhancer sequences.
  • A DNA sequence may be inserted into a plasmid vector adapted for expression in a bacterial cell, such as E. coli or Z. mobilis; a eukaryotic cell, such as S. cerevisiae, K. lactis or another yeast; or any other host organism.
  • A heterologous host organism is an organism used to express DNA, RNA or protein that is foreign to it.
  • The polynucleotides provided herein can also encode polypeptides that have other glycosidase activities, such as endoglucanase activity and β-D-glucosidase activity.
  • The translational kinetics of an mRNA into polypeptide can be changed to achieve any of a variety of expression profiles. For example, translational kinetics can be changed to remove some or all translational pauses, to replace some or all translational pauses predicted to occur within an autonomous folding unit of a nascent protein, or to replace some or all over-represented codon pairs.
  • It is proposed herein that the presence of a pause or translation-slowing codon pair can queue ribosomes back to the beginning of the coding sequence, inhibiting further ribosome attachment to the message. This can down-regulate protein expression, because the rate of translation initiation readily saturates and the slowest translation step becomes rate limiting. It is also proposed herein that the presence of a pause or translation-slowing codon pair can stall or detach a ribosome, or can expose naked mRNA, which is then subject to degradation.
  • Organism-specific codon usage and codon pair usage, and the presence of organism-specific pause sites result in gene translation that is highly adapted to the original host organism.
  • Ribosomal pausing sites that may be functional in a human cell will typically be scrambled, random, inappropriate, or not recognized in the proper context in a bacterium or other non-native host.
  • A heterologous cDNA or synthetic polynucleotide therefore has a random but high probability of inadvertently encoding a pause site somewhere, often leading to failure of protein expression and/or activity.
  • Methods for refining translational kinetics of an mRNA into polypeptide can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2008/0046192, published on February 21, 2008, which is incorporated by reference herein in its entirety.
  • A polypeptide-encoding nucleotide sequence can be designed so that it is predicted to be translated rapidly along its entire length.
  • Some polypeptide-encoding nucleotide sequences provided herein have been engineered to remove all predicted pauses. Expression of such a sequence can result in improved protein expression levels and improved levels of active and/or natively folded polypeptide.
  • Translational pausing or slowing caused by codon pair usage can be tested by comparing a series of genes that contain random pauses with modified genes in which codon pairs predicted to cause translational pauses have been replaced. Unmodified genes moved from their source organism and expressed in a heterologous host can have an altered set of codon pairs predicted to cause a translational pause or ribosomal slowing (e.g., an altered set of over-represented codon pairs), resulting in an altered configuration and location of presumed pause sites.
  • The translational kinetics of an mRNA into TrCBH-II polypeptide can be changed to remove some or all translational pauses, or to replace other codon pairs that cause translational slowing, message instability and degradation, and poor protein translation, expression, and functional properties. While not intending to be limited to the following, it is believed that, for at least some proteins, reducing or eliminating translational pauses can increase the expression level and/or improve the quality and characteristics of the protein. Accordingly, by removing some or all translational pauses or replacing other codon pairs that cause translational slowing, the expression level and/or quality of an expressed protein can be increased.
  • The cellobiohydrolase-encoding nucleotide sequences provided herein allow for one or more of the following results compared with the original native gene when expressed in a heterologous host: higher expression levels, higher enzymatic activity, greater protein stability, resistance to degradation, and increased solubility.
  • Provided herein are cellobiohydrolase-encoding nucleotide sequences that have been modified to remove one or more translational pauses or slowing sites by changing one or more codon pairs to a corresponding codon pair that is less likely to cause a translational pause or slowing. While in some embodiments it is preferred to replace all codon pairs predicted to cause a translational pause or slowing, in other embodiments it is sufficient to replace a subset of them. For example, expression levels can be increased by replacing at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more codon pairs predicted to cause a translational pause or slowing.
  • At least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% of codon pairs predicted to cause a translational pause or slowing are replaced by, for example, substituting different codon pairs that encode the same amino acids.
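Replacing a stated number or percentage of the pause-prone codon pairs, as described above, needs a rule for which pairs to take first. A minimal sketch, assuming the most pause-prone pairs (highest kinetics values) are replaced first; that ordering and the `flagged` tuple format are illustrative assumptions, not requirements stated here.

```python
import math
from typing import List, Sequence, Tuple


def select_for_replacement(flagged: Sequence[Tuple[int, float]],
                           fraction: float,
                           min_count: int = 0) -> List[int]:
    """Choose which flagged codon pairs to replace.

    `flagged` holds (nucleotide_position, kinetics_value) for every pair predicted
    to cause a pause or slowing. The highest-scoring pairs are taken first until at
    least `fraction` of them (and at least `min_count`) are selected.
    Returns the selected positions in sequence order.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # round() guards against floating-point noise in fraction * len(flagged).
    k = max(min_count, math.ceil(round(fraction * len(flagged), 9)))
    k = min(k, len(flagged))
    ranked = sorted(flagged, key=lambda pv: pv[1], reverse=True)
    return sorted(pos for pos, _ in ranked[:k])
```

With four hypothetical flagged pairs `[(10, 2.1), (40, 3.5), (25, 1.2), (70, 2.8)]`, a fraction of 0.5 selects positions `[40, 70]`, the two strongest predicted pauses.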
  • The translational kinetics of an mRNA into polypeptide can be changed to remove some or all translational pauses predicted to occur within an autonomous folding unit of a protein.
  • An autonomous folding unit of a protein is an element of the overall protein structure that is self-stabilizing and often folds independently of the rest of the protein chain. Such autonomous folding units typically correspond to protein domains.
  • Expression of a gene in a heterologous host organism can result in translational pauses located in regions that inhibit protein expression and/or protein folding.
  • Preserving or inserting a translational pause in a region predicted to separate autonomous folding units of a protein can improve the folding and/or solubility of expressed proteins.
  • Also provided are methods of changing the translational kinetics of an mRNA into polypeptide by preserving (relative to native) or inserting one or more translational pauses in one or more regions predicted to separate autonomous folding units of a protein, thereby improving the folding and/or solubility of the expressed protein.
  • One step of such methods can include identifying the predicted autonomous folding units of a protein.
  • Methods for identifying predicted autonomous folding units of a protein or protein domains are known in the art, and include alignment of amino acid sequences with protein sequences having known structures, and threading amino acid sequences against template protein domain databases.
  • Such methods can employ any of a variety of software algorithms in searching any of a variety of databases known in the art for predicting the location of protein domains.
  • The results of such methods will typically include an identification of the amino acids predicted to be present in a particular domain, and can also include an identification of the domain itself and of the secondary structural element, if any, in which each amino acid of a domain is located.
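Once domain boundaries have been predicted, each predicted pause can be classified as lying within a domain or between domains, which is the distinction the methods above rely on. A minimal sketch, assuming domains are given as inclusive codon ranges from some domain-prediction tool; the ranges in the usage note are hypothetical and are not the actual TrCBH-II domain map.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def classify_pauses(pause_codons: Sequence[int],
                    domains: Sequence[Tuple[int, int]]) -> Dict[int, Optional[int]]:
    """Map each predicted pause (codon index) to the index of the domain containing it.

    `domains` is a list of inclusive (start, end) codon ranges. A value of None
    means the pause falls outside every predicted domain, i.e. in a linker or
    other inter-domain region.
    """
    result: Dict[int, Optional[int]] = {}
    for p in pause_codons:
        result[p] = next((idx for idx, (start, end) in enumerate(domains)
                          if start <= p <= end), None)
    return result


def between_domains(pause_codons: Sequence[int],
                    domains: Sequence[Tuple[int, int]]) -> List[int]:
    """Pauses lying outside all predicted domains (candidates to preserve)."""
    return [p for p, d in classify_pauses(pause_codons, domains).items() if d is None]
```

For hypothetical domains `[(1, 80), (110, 470)]` and pauses at codons 45, 95 and 300, only codon 95 falls between domains.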
  • In some instances, it is not possible to modify the polypeptide-encoding nucleotide sequence to remove a translational pause that is not present in the expression profile of the polypeptide in the native host organism. For example, there may be no codon pair that encodes the corresponding pair of amino acids and is not predicted to cause a translational pause or slowing. In such instances, several options are available: the codon pair least likely to cause a translational pause or slowing can be selected; an amino acid insertion, deletion or mutation can be introduced to yield a codon pair that is not predicted to cause a translational pause or slowing; or no change is made.
  • One option in a computational method is to request human input in order to resolve the issue.
  • The computational method may, for example, involve the use of a computer that is programmed to request human input.
  • The computer may also be programmed to make a selection, or combination of selections, such that multiple genes, Ordered Gene Sets, or small permutation libraries are designed and synthetically produced for use in expression analysis.
  • When an amino acid insertion, deletion or mutation is made in order to change translational kinetics, it is preferable to select a change that is predicted not to substantially influence the final three-dimensional structure and/or the activity of the protein.
  • Such an amino acid insertion, deletion or mutation can include, for example, a conservative amino acid substitution such as the conservative substitutions shown in Table 1.
  • The substitutions shown are based on amino acid physicochemical properties and, as such, are independent of organism.
  • The conservative amino acid substitution is a substitution listed under the heading of exemplary substitutions.
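The fallback options above can be combined into one decision function. This sketch assumes a particular order of preference (synonymous change first, then a conservative substitution, then the least pause-prone synonymous pair flagged for human review); that order, the `synonyms` and `pair_score` inputs, and the `conservative` map are illustrative assumptions, and the map in the usage note is not Table 1.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def resolve_pair(aa1: str, aa2: str,
                 synonyms: Dict[str, List[str]],
                 pair_score: Dict[str, float],
                 threshold: float,
                 conservative: Optional[Dict[str, List[str]]] = None) -> Tuple[str, str]:
    """Decide how to handle a pause-prone codon pair encoding amino acids (aa1, aa2).

    Returns (action, codon_pair), trying in this assumed order:
      'synonymous'   - a same-protein codon pair scoring at or below threshold;
      'conservative' - a pair encoding an allowed substitution at or below threshold;
      'least_pause'  - nothing acceptable exists, so the lowest-scoring synonymous
                       pair is returned and the site should be flagged for review.
    Codon pairs absent from `pair_score` are treated as score 0.
    """
    conservative = conservative or {}

    def score(pair: str) -> float:
        return pair_score.get(pair, 0.0)

    synonymous = [a + b for a, b in product(synonyms[aa1], synonyms[aa2])]
    best_syn = min(synonymous, key=score)
    if score(best_syn) <= threshold:
        return "synonymous", best_syn

    substituted: List[str] = []
    for x in [aa1] + conservative.get(aa1, []):
        for y in [aa2] + conservative.get(aa2, []):
            if (x, y) != (aa1, aa2):
                substituted += [a + b for a, b in product(synonyms[x], synonyms[y])]
    if substituted:
        best_sub = min(substituted, key=score)
        if score(best_sub) <= threshold:
            return "conservative", best_sub

    return "least_pause", best_syn
```

For instance, with hypothetical scores in which every Trp-Ile codon pair exceeds the threshold but `TGGCTG` (Trp-Leu) does not, an Ile-to-Leu map yields `("conservative", "TGGCTG")`; without the map the function falls back to `"least_pause"`.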
  • In one approach, all codon pairs predicted to cause a translational pause or slowing are treated equally.
  • Alternatively, one or more different threshold levels can be established for differential treatment of codon pairs, where codon pairs above a highest threshold are those most likely to cause a translational pause or slowing, and successively lower threshold-based groups correspond to successively lower likelihoods of the respective codon pairs causing a translational pause or slowing.
  • Different numbers or percentages of codon pairs can be replaced for each of these threshold-based groups. For example, 95% or more of the codon pairs above a highest threshold level can be replaced, while 90% or less of the codon pairs between that level and an intermediate threshold level are replaced.
  • Codon pairs likely to cause a translational pause or slowing can be segregated into two or more, three or more, four or more, five or more, six or more, or even more different threshold-based groups. Specific thresholds are discussed elsewhere herein; typically, however, the higher the threshold, the higher the likelihood that a codon pair with a translational kinetics value above that threshold causes a translational pause or slowing. When codon pairs are segregated into two or more threshold-based groups, different numbers or percentages of codon pairs can be replaced for each group.
  • Codon pairs above a highest threshold are replaced, while the same or a lower percentage of codon pairs are replaced from codon pair groups corresponding to one or more lower thresholds.
  • All codon pairs above a highest threshold are replaced, while a codon pair above an intermediate threshold is replaced only if the codon pair is located within an autonomous folding unit.
  • All codon pairs above a highest threshold are replaced, while a codon pair above an intermediate threshold is replaced only if the codon pair can be replaced without requiring a change in the encoded polypeptide sequence.
  • All codon pairs above a highest threshold are replaced, while a codon pair above a first, higher intermediate threshold is replaced only if it can be replaced without changing the encoded polypeptide sequence or with only a conservative change to that sequence, and a codon pair above a second, lower intermediate threshold is replaced only if it can be replaced without requiring any change in the encoded polypeptide sequence.
  • An evaluation method can be used that determines the degree to which a codon pair should be replaced according to its translational kinetics value, where that degree can be counterbalanced by any of a variety of user-determined factors such as, for example, the presence of the codon pair within or between autonomous folding units and the degree of change to the encoded polypeptide sequence.
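The three-threshold policy described above (replace everything above the highest threshold; above a first intermediate threshold, replace only with no change or a conservative change; above a second intermediate threshold, replace only with no change) reduces to a small decision function. A sketch under stated assumptions: the default thresholds of 3, 2 and 1.5 standard deviations and the change labels are illustrative choices, not values fixed by this disclosure.

```python
CHANGE_KINDS = ("none", "conservative", "nonconservative")


def should_replace(value: float, change: str,
                   high: float = 3.0, mid_high: float = 2.0, mid_low: float = 1.5) -> bool:
    """Tiered replacement decision for one codon pair.

    `value` is the pair's translational kinetics value (e.g. a z-score) and
    `change` is the kind of amino acid change the best available replacement
    would require: 'none', 'conservative' or 'nonconservative'.
    """
    if change not in CHANGE_KINDS:
        raise ValueError(f"unknown change kind: {change!r}")
    if not high > mid_high > mid_low:
        raise ValueError("thresholds must satisfy high > mid_high > mid_low")
    if value > high:
        return True                                    # always replaced
    if value > mid_high:
        return change in ("none", "conservative")      # at most a conservative change
    if value > mid_low:
        return change == "none"                        # protein sequence must not change
    return False                                       # below all thresholds
```

The same skeleton could take an extra argument (for example, whether the pair lies inside an autonomous folding unit) to express the other tiered variants listed above.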
  • A translational kinetics value of a codon pair represents the degree to which the codon pair is expected to be associated with a translational pause. Methods of determining the translational kinetics value of a codon pair are discussed elsewhere herein. Such values can be normalized to facilitate comparison between species. In some embodiments, the translational kinetics value can be the degree of over-representation of a codon pair. An over-represented codon pair is one that is present in protein-encoding sequences in higher abundance than would be expected if all codon pairs occurred at statistically random frequencies.
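One simple way to quantify over-representation is an observed/expected ratio computed over a set of host coding sequences. This sketch assumes that adjacent codons would be independent if codon pairs were randomly abundant, so the expected count of pair AB is the total number of pairs times the frequencies of A and B. That independence model is an assumption chosen for illustration; the methods referred to above may use a different statistic.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_codons_and_pairs(seqs: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count in-frame codons and adjacent codon pairs across coding sequences."""
    codon_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        cs = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        codon_counts.update(cs)
        pair_counts.update(a + b for a, b in zip(cs, cs[1:]))
    return codon_counts, pair_counts


def over_representation(seqs: Iterable[str]) -> Dict[str, float]:
    """Observed / expected ratio for every codon pair seen in `seqs`.

    expected(AB) = total_pairs * f(A) * f(B), with f the codon frequency.
    Ratios above 1 mark over-represented pairs.
    """
    codon_counts, pair_counts = count_codons_and_pairs(seqs)
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios: Dict[str, float] = {}
    for pair, observed in pair_counts.items():
        f_a = codon_counts[pair[:3]] / n_codons
        f_b = codon_counts[pair[3:]] / n_codons
        ratios[pair] = observed / (n_pairs * f_a * f_b)
    return ratios
```

For the toy input `["AAAGGG", "AAAGGG", "GGGAAA"]`, `AAAGGG` is observed twice against an expected 0.75, giving a ratio of 8/3. A realistic table would be built from a genome-scale set of host coding sequences.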
  • A codon pair predicted to cause a translational pause or slowing is a codon pair whose translational kinetics value is at least one standard deviation above the mean. Here, a value above the mean refers to a value indicative of a greater likelihood of causing translational pausing or slowing relative to the mean translational kinetics value, and is not strictly limited to a particular mathematical relationship (e.g., greater than the mean), since the propensity to cause a translational pause can be depicted by either negative or positive translational kinetics values, depending on the implementation selected by one skilled in the art.
  • Over-represented codon pairs may be graphically displayed as a positive function in a SpeedPlot™, as depicted in Figure 1, where a positive deflection or peak above a selected threshold indicates a translational pause or slowing at the exact nucleotide location given by the abscissa.
  • A threshold for the translational kinetics value of codon pairs predicted to cause a translational pause or slowing can be set according to the method and level of stringency desired by one skilled in the art.
  • For example, a threshold can be set at 5, 3, 2, or 1.5 or more standard deviations above the mean.
  • Typical threshold values are at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 or 5 or more standard deviations above the mean.
  • A plurality of thresholds can be applied in the methods provided herein to segregate codon pairs into a plurality of groups. Each threshold of such a plurality can be a different value selected from 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 and 5 or more standard deviations above the mean.
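The standard-deviation thresholds above can be applied by converting a host's codon-pair kinetics values to z-scores, listing the in-frame pairs of a gene that exceed a threshold (the peaks a pause plot would show along its nucleotide axis), and assigning each score to a threshold group. A minimal sketch, assuming larger values mean stronger predicted pauses and that pairs missing from the table score 0; it does not reproduce the SpeedPlot™ software.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def z_table(values: Dict[str, float]) -> Dict[str, float]:
    """Convert raw codon-pair kinetics values into standard deviations from the mean."""
    vals = list(values.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return {pair: (v - mu) / sd for pair, v in values.items()}


def pause_profile(seq: str, z: Dict[str, float],
                  threshold: float = 2.0) -> List[Tuple[int, str, float]]:
    """In-frame codon pairs whose z-score exceeds `threshold`.

    Each hit is (1-based position of the pair's first nucleotide, pair, z),
    matching the 'nucleotides 463-468' style of coordinates.
    """
    seq = seq.upper()
    hits = []
    for i in range(0, len(seq) - 5, 3):
        pair = seq[i:i + 6]
        score = z.get(pair, 0.0)
        if score > threshold:
            hits.append((i + 1, pair, score))
    return hits


def tier(score: float, thresholds: Sequence[float] = (3.0, 2.0, 1.5)) -> Optional[int]:
    """Threshold group of a z-score: 0 = above the highest threshold, None = below all."""
    for idx, t in enumerate(sorted(thresholds, reverse=True)):
        if score > t:
            return idx
    return None
```

The thresholds `(3.0, 2.0, 1.5)` are just one illustrative choice from the range of values listed above.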
  • The translational kinetics of an mRNA into polypeptide can be changed to add or retain one or more translational pauses predicted to occur before, after or within an autonomous folding unit of a protein, or between autonomous folding units. While not intending to be limited to the following, it is proposed that translational pauses are present in wild-type genes to slow translation of a nascent polypeptide after a protein domain has been translated, thus providing time for the domain to acquire secondary and at least partial tertiary structure before further downstream translation and reorganization or reconfiguration of the growing polypeptide or domain. By modifying the translational kinetics of complex multi-domain proteins, it may be possible to experimentally alter the time each domain has available to organize.
  • The folding of a heterologously expressed gene product having two or more independent domains can be altered by the presence of pause sites between the domains. Refolding studies indicate that a protein may take longer to settle into its final configuration than it takes to be translated. Pausing may allow each domain to partially organize and commit to a particular, independent fold. Other co-translational events, such as those associated with cofactors, protein subunits, protein complexes, membranes, chaperones, secretion, or proteolysis complexes, also can depend on the kinetics of the emerging nascent polypeptide. Pauses can be introduced by engineering one, or two or more, codon pairs predicted to cause a translational pause or slowing into the sequence to facilitate these co-translational interactions.
  • Typically, a translational pause is preserved. For a polypeptide-encoding nucleotide sequence expressed in the native host organism, this means maintaining the same codon pair; for a heterologously expressed sequence, it means changing the codon pair as appropriate so that its translational kinetics value is comparable, or closest, to that of the native codon pair in the native host organism.
  • Proximal codon pairs can be selected for replacement in order to introduce a translational pause or slowing.
  • For example, one of the 1, 2, 3, 4 or 5 most proximal codon pairs upstream (5' of the desired pause site) or downstream (3' of the desired pause site) can be chosen for replacement to introduce the translational pause or slowing.
  • The codon pair selected for replacement is typically the one closest to the originally desired location of the translational pause or slowing, provided the desired pause or slowing can be attained (e.g., a codon pair 1 position upstream or downstream is typically selected over one 2 positions upstream or downstream).
  • A translational pause or slowing can also be introduced by selecting a replacement codon pair encoding a conservative amino acid substitution, such as the conservative substitutions shown in Table 1.
  • Replacing a proximal codon pair to introduce a translational pause or slowing is preferred over replacing a codon pair in a way that changes the encoded amino acid sequence.
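The proximity rule above (try the desired site first, then the nearest codon pairs upstream and downstream, up to five positions away, using only synonymous changes) can be sketched as an outward search. Assumptions for illustration: a hypothetical `pair_score` table for the host in which larger values mean stronger pauses, a `min_value` that counts as "the desired pause", and upstream-before-downstream tie-breaking, which is not specified above.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS: Dict[str, List[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def nearest_pause_site(codons: List[str], target: int, pair_score: Dict[str, float],
                       min_value: float, max_offset: int = 5) -> Optional[Tuple[int, str]]:
    """Closest codon pair to `target` that can carry a pause without changing the protein.

    Pair index i covers codons i and i+1. Offsets are searched as 0, then 1
    upstream, 1 downstream, 2 upstream, ... up to `max_offset`. At each position
    the highest-scoring synonymous pair is tried; the first reaching `min_value`
    is returned as (pair_index, new_codon_pair). Returns None if none qualifies.
    """
    for offset in range(max_offset + 1):
        positions = [target] if offset == 0 else [target - offset, target + offset]
        for idx in positions:
            if not 0 <= idx < len(codons) - 1:
                continue
            aa1, aa2 = CODON_TABLE[codons[idx]], CODON_TABLE[codons[idx + 1]]
            options = [x + y for x, y in product(SYNONYMS[aa1], SYNONYMS[aa2])]
            best = max(options, key=lambda p: pair_score.get(p, 0.0))
            if pair_score.get(best, 0.0) >= min_value:
                return idx, best
    return None
```

With codons `["ATG", "GGT", "CAA", "TTT", "GAA"]`, a desired pause at pair 2 and a hypothetical table that scores only `GGACAG` highly, the search settles one pair upstream and returns `(1, "GGACAG")`, still encoding Gly-Gln.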
  • Graphical displays of the translational kinetics values of one or more proteins can provide information to assist in selecting a translational pause or slowing to preserve or insert in a redesigned polypeptide-encoding nucleotide sequence.
  • For example, such graphical displays can permit alignment of homologous proteins from different species and identification, based on this alignment, of predicted translational pause or slowing sites that are conserved in the aligned proteins.
  • Such predicted translational pause or slowing sites can be preserved or inserted in a redesigned polypeptide-encoding nucleotide sequence.
  • Regions between autonomous folding units in one or more proteins within a particular species can also be graphically examined for the presence or absence of predicted pause sites.
  • Such graphical display methods can result in an identification of a region between autonomous folding units in which a translational pause or slowing is desirably preserved in a redesigned polypeptide-encoding sequence.
  • Methods for identifying and selecting conserved translational pauses can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2007/0298503, published on December 27, 2007, and U.S. Patent Publication No. 2007/0275399, published on November 29, 2007.
  • The codon pair translational kinetics values can be compared with a database of related gene sequences to identify conserved pause sites.
  • A synthetic gene can then be designed in which at least one conserved pause site is maintained, providing a synthetic gene with modified translational kinetics.
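Identifying conserved pause sites across aligned homologs, as described above, amounts to mapping each sequence's predicted pauses onto alignment columns and counting agreement. A minimal sketch, assuming a gapped protein alignment with `-` as the gap character and each predicted pause recorded as the 0-based index of the first residue of its codon pair; the toy alignment in the tests is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def residue_to_columns(aligned: str, residue_positions: Sequence[int]) -> Set[int]:
    """Map 0-based residue indices of the ungapped sequence to alignment columns."""
    column_of = {}
    residue = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            column_of[residue] = col
            residue += 1
    return {column_of[p] for p in residue_positions if p in column_of}


def conserved_pause_columns(alignment: Dict[str, str],
                            pauses: Dict[str, List[int]],
                            min_fraction: float = 1.0) -> List[int]:
    """Alignment columns carrying a predicted pause in at least `min_fraction` of sequences."""
    counts: Counter = Counter()
    for name, aligned in alignment.items():
        counts.update(residue_to_columns(aligned, pauses.get(name, [])))
    needed = min_fraction * len(alignment)
    return sorted(col for col, n in counts.items() if n >= needed)
```

Columns returned with `min_fraction=1.0` are pauses predicted in every homolog and are natural candidates to keep in a redesigned gene; a lower fraction relaxes the requirement.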
  • Codon pairs are associated with translational pauses and can thereby influence the translational kinetics of an mRNA into polypeptide.
  • The methods of changing translational kinetics provided herein will typically be performed by modifying or designing one or more nucleotide sequences encoding the polypeptide to be expressed.
  • Modifying a gene, or designing a synthetic nucleotide sequence encoding the polypeptide encoded by the gene, is collectively referred to herein as redesigning a polypeptide-encoding gene sequence or redesigning a polypeptide-encoding nucleotide sequence.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity to the wild type cellobiohydrolase polypeptide sequence as set forth in SEQ ID NO: 1.
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the cellobiohydrolase (SEQ ID NO: 2) have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 1, 2 or 3 codon pairs to be replaced are selected from the following: CCCTCT (nucleotides 463-468); GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); GTGGAA (nucleotides 691-696); GGATTT (nucleotides 1192-1197); GGTATT (nucleotides 1198-1203), or any other codon pair that can suitably be substituted.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • CCCTCT nucleotides 463-468 replaced with CCTTCT
  • GGCCAA nucleotides 94-99 replaced with GGTCAA
  • CAGTTT nucleotides 565-570 replaced with CAATTT
  • GATATC nucleotides 703-708 replaced with GACATT
  • GTGGAA nucleotides 691-696 replaced with GTTGAA
  • GGATTT nucleotides 1192-1197
  • GGTATT nucleotides 1198-1203 replaced with GGAATT or any other codon pair replacement that can suitably be made.
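  • Each replacement listed above is meant to encode the same two amino acids. A translation check against the standard genetic code can therefore confirm that a proposed swap is synonymous. This Python sketch covers only the identical-amino-acid case; it does not judge conservative substitutions. The helper names are hypothetical.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_pair(pair: str) -> str:
    """Translate a 6-nucleotide codon pair into two one-letter amino acids."""
    pair = pair.upper()
    if len(pair) != 6:
        raise ValueError("a codon pair must be 6 nucleotides long")
    return CODON_TABLE[pair[:3]] + CODON_TABLE[pair[3:]]


def is_synonymous(old: str, new: str) -> bool:
    """True when both codon pairs encode the same dipeptide."""
    return translate_pair(old) == translate_pair(new)


# Replacements listed in this section (wild-type pair, replacement pair).
REPLACEMENTS = [
    ("CCCTCT", "CCTTCT"),  # nucleotides 463-468
    ("GGCCAA", "GGTCAA"),  # nucleotides 94-99
    ("CAGTTT", "CAATTT"),  # nucleotides 565-570
    ("GATATC", "GACATT"),  # nucleotides 703-708
    ("GTGGAA", "GTTGAA"),  # nucleotides 691-696
    ("GGTATT", "GGAATT"),  # nucleotides 1198-1203
]

for old, new in REPLACEMENTS:
    status = "synonymous" if is_synonymous(old, new) else "CHANGES PROTEIN"
    print(f"{old} -> {new}: {translate_pair(old)} -> {translate_pair(new)} ({status})")
```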
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the cellobiohydrolase (SEQ ID NO: 2) have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 1, 2 or 3 codon pairs to be replaced are selected from the following: CTCGGT (nucleotides 760-765); ATTGCC (nucleotides 631-636); GACAGC (nucleotides 1285-1290); GTCTGG (nucleotides 88-93); GTCTGG (nucleotides 1246-1251); TTGCTG (nucleotides 1231-1236); GTGGTG (nucleotides 571-576); ACGCTG (nucleotides 22-27); ACGCTG (nucleotides 31-36); GACTGG (nucleotides 1168-1173); GCCGGA (nucleotides 559-564); CTGGTG (nucleotides 748-753), or any other codon pair that can suitably be substituted.
  • CTCGGT nucleotides 760-765
  • ATTGCC nucleotides 631-636
  • GACAGC nucleo
  • At least 1, 2, or 3 of the following codon pair replacements have been made: CTCGGT (nucleotides 760-765) replaced with CTGGGT; ATTGCC (nucleotides 631-636) replaced with ATTGCG; GACAGC (nucleotides 1285-1290) replaced with GACTCT; GTCTGG (nucleotides 88-93) replaced with GTTTGG; GTCTGG (nucleotides 1246-1251) replaced with GTTTGG; TTGCTG (nucleotides 1231-1236) replaced with CTGCTG; GTGGTG (nucleotides 571-576) replaced with GTTGTT; ACGCTG (nucleotides 22-27) replaced with ACCCTC; ACGCTG (nucleotides 31-36) replaced with ACCCTG; GACTGG (nucleotides 1168-1173) replaced with GATTGG; GCCGGA (nucleo
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the cellobiohydrolase (SEQ ID NO: 2) have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 1, 2 or 3 codon pairs to be replaced are selected from the following: CAGTTT (nucleotides 565-570); TTTGAC (nucleotides 1303-1308); TCGTTT (nucleotides 1240-1245); GGCCAA (nucleotides 94-99); AAGAAT (nucleotides 541-546); AAGAAT (nucleotides 934-939); GCCAAA (nucleotides 649-654); GTCAAG (nucleotides 1252-1257); GGTATT (nucleotides 1198-1203); ATCAAC (nucleotides 808-813); GGCCAT (nucleotides 865-870); CTTCCA (nucleotides 835-840); GATATC (nucleotides 703-708); TCGTTG (nucleotides 1228-1233), or any other codon pair that can suitably be substituted.
  • At least 1, 2, or 3 of the following codon pair replacements have been made: CAGTTT (nucleotides 565-570) replaced with CAATTT; TTTGAC (nucleotides 1303-1308) replaced with TTTGAT; TCGTTT (nucleotides 1240-1245) replaced with TCTTTT; GGCCAA (nucleotides 94-99) replaced with GGACAA; AAGAAT (nucleotides 541-546) replaced with AAAAAT; AAGAAT (nucleotides 934-939) replaced with AAAAAC; GCCAAA (nucleotides 649-654) replaced with GCTAAA; GTCAAG (nucleotides 1252-1257) replaced with GTTAAA; GGTATT (nucleotides 1198-1203) replaced with GGAATC; ATCAAC (nucleotides 808-813) replaced with ATTAAT; G
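  • The replacements are specified by 1-based nucleotide coordinates. Applying them safely means checking that each coordinate starts a codon and that the expected wild-type pair is actually present before swapping. This Python sketch does both. The function name and the toy sequence are hypothetical, and the cellobiohydrolase sequence itself is not reproduced here.

```python
from typing import Iterable, Tuple


def apply_pair_replacements(
    seq: str, edits: Iterable[Tuple[int, str, str]]
) -> str:
    """Apply (start, old_pair, new_pair) swaps, where `start` is the 1-based
    position of the first nucleotide of the codon pair."""
    out = list(seq.upper())
    for start, old, new in edits:
        old, new = old.upper(), new.upper()
        if len(old) != 6 or len(new) != 6:
            raise ValueError("codon pairs must be 6 nucleotides long")
        if start < 1 or (start - 1) % 3 != 0:
            raise ValueError(f"position {start} does not begin a codon")
        i = start - 1
        found = "".join(out[i:i + 6])
        if found != old:
            raise ValueError(f"expected {old} at {start}-{start + 5}, found {found}")
        # Edits are applied in order, so overlapping pairs see earlier changes.
        out[i:i + 6] = new
    return "".join(out)


toy = "ATGGCCAAAGTC"  # hypothetical: Met-Ala-Lys-Val
print(apply_pair_replacements(toy, [(4, "GCCAAA", "GCTAAA")]))  # ATGGCTAAAGTC
```

Adjacent codon pairs share a codon (e.g., nucleotides 1228-1233 and 1231-1236 above). For that reason the sketch re-reads the current sequence for every edit rather than trusting the original.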
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the cellobiohydrolase (SEQ ID NO: 2) have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 1, 2 or 3 codon pairs to be replaced are selected from the following: GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); TATTTG (nucleotides 853-858); GGCCAT (nucleotides 865-870); TCGTTG (nucleotides 1228-1233); TTTGTC (nucleotides 1243-1248); TTCCAA (nucleotides 1363-1368), or any other codon pair that can suitably be substituted.
  • GGCCAA nucleotides 94-99
  • CAGTTT nucleotides 565-570
  • GATATC nucleotides 703-708
  • TATTTG nucleotides 853-858
  • GGCCAT nucleotides 865-870
  • TCGTTG nucleotides 1228-1233
  • TTTGTC nucleo
  • GGCCAA nucleotides 94-99
  • CAGTTT nucleotides 565-570 replaced with CAATTC
  • GATATC nucleotides 703-708 replaced with GACATT
  • TATTTG nucleotides 853-858 replaced with TATTTA
  • GGCCAT nucleotides 865-870
  • TCGTTG nucleotides 1228-1233 replaced with TCTTTA
  • TTTGTC nucleotides 1243-1248
  • TTCCAA nucleotides 1363-1368
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the cellobiohydrolase (SEQ ID NO: 2) have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 1, 2 or 3 codon pairs to be replaced are selected from the following: GTGCCT (nucleotides 55-60); GCCAAT (nucleotides 370-375); GCTATT (nucleotides 406-411); GCCGGA (nucleotides 559-564); GCCAAT (nucleotides 778-783); TTGGCA (nucleotides 967-972); AAGCTG (nucleotides 1051-1056); GCTATT (nucleotides 1066-1071); GCCAAT (nucleotides 1084-1089); ACCGGA (nucleotides 1147-1152); ACCGGA (nucleotides 1189-1194); GGTATT (nucleotides 1198-1203); GACAGC (nucleotides 1285-1290); GATGCC (nucleotides 1327-1332); GCCTTG (nucleotides 1285-12
  • GGTATT nucleotides 1198-1203
  • GTGCCT nucleotides 55-60 replaced with GTTCCG
  • GCCAAT nucleotides 370-375 replaced with GCTAAT
  • GCTATT nucleotides 406-411 replaced with GCCATT
  • GCCGGA nucleotides 559-564 replaced with GCTGGT
  • GCCAAT nucleotides 778-783 replaced with GCGAAT
  • TTGGCA nucleotides 967-972 replaced with TTGGCT
  • AAGCTG nucleotides 1051-1056 replaced with AAATTG
  • GCTATT nucleotides 1066-1071 replaced with GCCATT
  • GCCAAT nucleotides 1084-1089
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs that are present in the wild-type nucleotide sequence and that encode the cellulose binding domain of the cellobiohydrolase have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for cellulose binding domains are known in the art.
  • the cellulose binding domain includes at least amino acids 35-58, 30-61 or 27-62.
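  • The domain boundaries are given in amino acids, while the codon pairs are given in nucleotides. A Python sketch can relate the two, assuming that nucleotide 1 is the first base of the codon for amino acid 1 in the same numbering. This is an assumption: the text does not state how the two numberings are aligned. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def aa_range_to_nt(first_aa: int, last_aa: int) -> Tuple[int, int]:
    """1-based inclusive amino acid range -> 1-based inclusive nucleotide range."""
    return 3 * (first_aa - 1) + 1, 3 * last_aa


def pairs_in_region(pair_starts: Iterable[int], first_aa: int, last_aa: int) -> List[int]:
    """Codon pairs (given by 1-based start) lying entirely inside the region."""
    lo, hi = aa_range_to_nt(first_aa, last_aa)
    return [s for s in pair_starts if s >= lo and s + 5 <= hi]


print(aa_range_to_nt(35, 58))    # (103, 174)
print(aa_range_to_nt(124, 437))  # (370, 1311)
# Start positions of some codon pairs listed in this section.
starts = [22, 55, 88, 94, 370, 1285]
print(pairs_in_region(starts, 30, 61))    # [88, 94]
print(pairs_in_region(starts, 124, 437))  # [370, 1285]
```

Pairs that straddle a boundary are excluded here. A looser variant could count any pair that overlaps the region.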
  • the replacement codon pairs are predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism. That is, the embodiments in which one or more codon pairs encoding amino acids of the cellulose binding domain have been replaced include embodiments in which the nucleotide sequence encoding the cellulose binding domain is changed to increase the predicted translational kinetics of translation of the cellulose binding domain.
  • incomplete translation, improper folding, or other protein expression shortcomings can result from the presence of one or more translational pauses in a heterologously-expressed polypeptide.
  • removal of one or more of these pauses can increase the speed of translation of the cellulose binding domain, and thereby increase the quantity of protein produced and/or increase the amount of stable, properly folded, active, and/or soluble protein produced.
  • the replacement codons, i.e., the codons added as replacements for the wild type codons, are typically predicted to be less likely to cause a translational pause.
  • the replacement codon pair can have a translational kinetics value in the heterologous host organism that is 95%, 90%, 85%, 80%, 75%, 70%, or less of the translational kinetics value of the wild type codon pair when expressed in the heterologous host organism.
  • the replacement codon pair is selected to have a translational kinetics value similar to the translational kinetics value of the wild type codon pair in the native organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism can be no more than 250%, 200%, 150%, 125% or 100% of the z score for the wild type codon pair when expressed in the native organism.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs that are present in the wild-type nucleotide sequence and that encode the glycosyl hydrolase domain of the cellobiohydrolase have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for glycosyl hydrolase domains are known in the art.
  • the glycosyl hydrolase domain includes at least amino acids 124-437, 115-450 or 107-471.
  • the replacement codon pairs are predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism. That is, the embodiments in which one or more codon pairs encoding amino acids of the glycosyl hydrolase domain have been replaced include embodiments in which the nucleotide sequence encoding the glycosyl hydrolase domain is changed to increase the predicted translational kinetics of translation of the glycosyl hydrolase domain.
  • incomplete translation, improper folding, or other protein expression shortcomings can result from the presence of one or more translational pauses in a heterologously-expressed polypeptide.
  • removal of one or more of these pauses can increase the speed of translation of the glycosyl hydrolase domain, and thereby increase the quantity of protein produced and/or increase the amount of stable, properly folded, active, and/or soluble protein produced.
  • the replacement codons, i.e., the codons added as replacements for the wild type codons, are typically predicted to be less likely to cause a translational pause.
  • the replacement codon pair can have a translational kinetics value in the heterologous host organism that is 95%, 90%, 85%, 80%, 75%, 70%, or less of the translational kinetics value of the wild type codon pair when expressed in the heterologous host organism.
  • the replacement codon pair is selected to have a translational kinetics value similar to the translational kinetics value of the wild type codon pair in the native organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism can be no more than 250%, 200%, 150%, 125% or 100% of the z score for the wild type codon pair when expressed in the native organism.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs that are present in the wild-type nucleotide sequence and that encode the region between the cellulose binding domain and the glycosyl hydrolase domain of the cellobiohydrolase have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the cellulose binding domain and glycosyl hydrolase domain are described hereinabove.
  • the replacement codon pairs are predicted to be more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism. That is, the embodiments in which one or more codon pairs encoding amino acids in the region between the cellulose binding domain and the glycosyl hydrolase domain have been replaced include embodiments in which the nucleotide sequence encoding the region between the cellulose binding domain and the glycosyl hydrolase domain is changed to decrease the predicted translational kinetics of translation of the region between the cellulose binding domain and the glycosyl hydrolase domain.
  • incomplete translation, improper folding, or other protein expression shortcomings can result from the absence of one or more translational pauses in a heterologously-expressed polypeptide.
  • adding one or more of these pauses can increase the speed of translation of the glycosyl hydrolase domain, and thereby increase the quantity of protein produced and/or increase the amount of stable, properly folded, active, and/or soluble protein produced.
  • the replacement codons, i.e., the codons added as replacements for the wild type codons, are typically predicted to be more likely to cause a translational pause.
  • the replacement codon pair can have a translational kinetics value in the heterologous host organism that is 105%, 110%, 115%, 120%, 125%, 130%, or more of the translational kinetics value of the wild type codon pair when expressed in the heterologous host organism.
  • the replacement codon pair is selected to have a translational kinetics value similar to the translational kinetics value of the wild type codon pair in the native organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism can be at least 75%, 80%, 85%, 90%, 95% or 100% of the z score for the wild type codon pair when expressed in the native organism.
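  • The percentage criteria above for the domain-encoding regions and for the region between the domains can be written as small predicates. This Python sketch assumes that translational kinetics values and z scores are positive numbers, since percentage comparisons are not meaningful otherwise. It uses one illustrative percentage from each listed range as the default, and the function names are hypothetical.

```python
def domain_swap_ok(new_tk_host: float, wt_tk_host: float, max_fraction: float = 0.90) -> bool:
    """Domain regions: the replacement pair's kinetics value in the host is at most
    `max_fraction` (e.g., 0.95 down to 0.70) of the wild-type pair's host value."""
    return new_tk_host <= max_fraction * wt_tk_host


def linker_swap_ok(new_tk_host: float, wt_tk_host: float, min_fraction: float = 1.10) -> bool:
    """Inter-domain region: the replacement pair's host value is at least
    `min_fraction` (e.g., 1.05 up to 1.30) of the wild-type pair's host value."""
    return new_tk_host >= min_fraction * wt_tk_host


def domain_z_ok(new_z_host: float, wt_z_native: float, max_ratio: float = 1.50) -> bool:
    """Domain regions: host z score of the replacement is no more than `max_ratio`
    (2.5 down to 1.0) of the wild-type pair's z score in the native organism."""
    return new_z_host <= max_ratio * wt_z_native


def linker_z_ok(new_z_host: float, wt_z_native: float, min_ratio: float = 0.90) -> bool:
    """Inter-domain region: host z score of the replacement is at least `min_ratio`
    (0.75 up to 1.0) of the wild-type pair's z score in the native organism."""
    return new_z_host >= min_ratio * wt_z_native


print(domain_swap_ok(0.8, 1.0), linker_swap_ok(1.2, 1.0))  # True True
print(domain_z_ok(3.0, 2.0), linker_z_ok(1.7, 2.0))        # True False
```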
  • polypeptide-encoding nucleotide sequence provided herein to modify the translational kinetics of the polypeptide-encoding nucleotide sequence, where the polypeptide-encoding nucleotide sequence is altered such that one or more codon pairs have a decreased likelihood of causing a translational pause or slowing relative to the unaltered polypeptide-encoding nucleotide sequence.
  • one or more nucleotides of a polypeptide-encoding nucleotide sequence can be changed such that a codon pair containing the changed nucleotides has a translational kinetics value indicative of a decreased likelihood of causing a translational pause or slowing relative to the unchanged polypeptide-encoding nucleotide sequence.
  • while the redesigned polypeptide-encoding nucleotide sequence need not possess a high degree of identity to the polypeptide-encoding nucleotide sequence of the original gene, in some embodiments the redesigned polypeptide-encoding nucleotide sequence will have at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% nucleotide identity with the polypeptide-encoding nucleotide sequence of the original gene.
  • an original gene refers to a gene for which codon pair refinement is to be performed; such original genes can be, for example, wild type genes, native genes, naturally occurring mutant genes, other mutant genes such as site-directed mutant genes or engineered or completely synthetic genes.
  • the polynucleotide sequence will be completely synthetic, and will bear much lower identity with the original gene, e.g., no more than 90%, 80%, 70%, 60%, 50%, 40%, or lower.
  • the resulting sequence can be designed to: (1) reduce or eliminate translational problems caused by inappropriate ribosome pausing, such as those caused by over-represented codon pairs or other codon pairs with translational kinetics values predictive of a translational pause; (2) have codon usage refined to avoid over-reliance on rare codons; (3) reduce in number or remove particular restriction sites, splice sites, internal Shine-Dalgarno sequences, or other sites that may cause problems in cloning or in interactions with the host organism; or (4) have controlled RNA secondary structure to avoid detrimental translational termination effects, translation initiation effects, or RNA processing, which can arise from, for example, RNA self-hybridization.
  • this sequence also can be designed to avoid oligonucleotides that mis-hybridize, resulting in genes that can be assembled from refined oligonucleotides that by thermodynamic necessity only pair up in the desired manner, using methods known in the art, as exemplified in U.S. Patent Publication No. 2005/0106590, which is hereby incorporated by reference in its entirety.
  • for some polypeptide-encoding nucleotide sequences, it is not possible to modify the polypeptide-encoding nucleotide sequence to suitably modify the translational kinetics of the mRNA into polypeptide without modifying the amino acid sequence of the encoded polypeptide.
  • an amino acid insertion, deletion or mutation can be introduced to yield a codon pair that is not predicted to cause a translational pause or slowing; or no change is made.
  • the change is preferably predicted to not substantially influence the final three-dimensional structure of the protein and/or the activity of the protein.
  • Such non-identical polypeptides can vary by containing one or more insertions, deletions and/or mutations.
  • polypeptide sequence can vary according to the purpose of the change, typically such a change results in a polypeptide that is at least 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to the wild type polypeptide sequence.
  • the sequence of the polynucleotide can be generated, optionally in conjunction with optimization of a plurality of parameters where one such parameter can be codon pair usage, where the resultant polynucleotide can be prepared by assembly of a plurality of oligonucleotides sufficiently small to be synthesized by known oligonucleotide synthetic methods.
  • Methods known in the art for optimizing multiple parameters in synthetic nucleotide sequences can be applied to optimizing the parameters recited in the present claims. Such methods may advantageously include those exemplified in U.S. Patent App. Publication No. 2005/0106590, U.S. Patent App. Publication No. 2007/0009928, and R. H.
  • an exemplary method for generating a sequence can also include dividing the desired sequence into a plurality of partially overlapping segments; optimizing the melting temperatures of the overlapping regions of each segment to disfavor hybridization to the overlapping segments which are non-adjacent in the desired sequence; allowing the overlapping regions of single stranded segments which are adjacent to one another in the desired sequence to hybridize to one another under conditions which disfavor hybridization of non-adjacent segments; and filling in, ligating, or repairing the gaps between the overlapping regions, thereby forming a double-stranded DNA with the desired sequence.
  • This process can be performed manually or can be automated, e.g., in a general purpose digital computer.
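  • The segmentation step of the assembly method above can be illustrated in Python. This sketch splits a sequence into fixed-length segments with fixed overlaps and scores each overlap with the Wallace rule (2 °C per A/T plus 4 °C per G/C). The Wallace rule is a crude stand-in chosen here for brevity. The design methods cited above optimize overlap melting temperatures and screen against mis-hybridization of non-adjacent segments, which this sketch does not do. Segment and overlap lengths are illustrative, and the function names are hypothetical.

```python
from typing import List


def split_with_overlaps(seq: str, segment_len: int = 60, overlap: int = 20) -> List[str]:
    """Split `seq` into segments of `segment_len` that share `overlap` bases
    with their neighbours (the final segment may be shorter)."""
    if not 0 < overlap < segment_len:
        raise ValueError("require 0 < overlap < segment_len")
    step = segment_len - overlap
    segments, start = [], 0
    while True:
        end = min(start + segment_len, len(seq))
        segments.append(seq[start:end])
        if end == len(seq):
            return segments
        start += step


def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C (short oligos only)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def overlap_tms(seq: str, segment_len: int = 60, overlap: int = 20) -> List[int]:
    """Estimated Tm of each overlap between consecutive segments."""
    segments = split_with_overlaps(seq, segment_len, overlap)
    # Every segment except the last is full length, so its last `overlap`
    # bases are exactly the region shared with the next segment.
    return [wallace_tm(seg[-overlap:]) for seg in segments[:-1]]


demo = "ATGC" * 30  # hypothetical 120-nt sequence
print([len(s) for s in split_with_overlaps(demo)])  # [60, 60, 40]
print(overlap_tms(demo))                            # [60, 60]
```

A fuller design would move segment boundaries so that the overlaps have similar melting temperatures that are well separated from any non-adjacent pairing.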
  • the search of possible codon assignments is mapped into an anytime branch and bound computerized algorithm developed for biological applications.
  • a synthetic nucleotide sequence for the polynucleotides provided herein, where the synthetic nucleotide sequence also is typically designed to have desirable translational kinetics properties, such as the removal of some or all codon pairs predicted to result in a translational pause or slowing.
  • Such design methods include determining a set of partially overlapping segments with optimized melting temperatures, and determining the translational kinetics of the synthetic sequence, where if it is desired to change the translational kinetics of the synthetic gene, the sequences of the overlapping segments are modified and refined in order to approximate the desired translational kinetics while still possessing acceptable hybridization properties. In some embodiments, this process is performed iteratively.
  • a criterion is established for selecting codon pairs having high translational kinetics values to be replaced with codon pairs having lower translational kinetics values, unless a codon pair of this group is the site of a planned pause.
  • the top 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, or 10% of codon pairs ranked by translational kinetics values can be replaced by codon pairs having lower translational kinetics values, such as translational kinetics value below a user defined level that can be, for example, a translational kinetics value equal to or below the translational kinetics values of codon pairs not in the top selected percentage, unless a codon pair of this group is the site of a planned pause (in which case it is not necessarily replaced).
  • all codon pairs above a user-selected translational kinetics value such as more than 5, 4.5, 4, 3.5, 3, 2.5 or 2 standard deviations above the mean translational kinetics value can be replaced by codon pairs having lower translational kinetics values, such as translational kinetics value below a user defined level that can be, for example, a translational kinetics value that is 4, 3.5, 3, 2.5, 2, 1.5 or 1 standard deviations less than the mean translational kinetics value, unless a codon pair of this group is the site of a planned pause (in which case it is not necessarily replaced).
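  • The two selection rules above can be sketched in Python. The first flags the top fraction of a host codon-pair table ranked by translational kinetics value; the second flags pairs more than a chosen number of standard deviations above the mean. Either way, occurrences of flagged pairs in a gene are listed unless they are planned pauses. The table here is a tiny hypothetical stand-in for a real host table (which would cover all 61 × 61 sense codon pairs). Ties in the ranking are broken arbitrarily, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Optional, Set


def flagged_pairs(
    table: Dict[str, float],
    top_fraction: Optional[float] = None,
    n_sd: Optional[float] = None,
) -> Set[str]:
    """Codon pairs to avoid: the top `top_fraction` of the table by kinetics
    value, or pairs more than `n_sd` standard deviations above the mean."""
    if (top_fraction is None) == (n_sd is None):
        raise ValueError("give exactly one of top_fraction or n_sd")
    if top_fraction is not None:
        ranked = sorted(table, key=table.get, reverse=True)
        k = max(1, int(len(ranked) * top_fraction))
        return set(ranked[:k])
    values = list(table.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return {pair for pair, v in table.items() if v > cutoff}


def positions_to_replace(
    gene: str,
    table: Dict[str, float],
    planned_pause_starts: Iterable[int] = (),
    **criteria: float,
) -> List[int]:
    """1-based starts of in-frame codon pairs in `gene` that are flagged,
    skipping positions deliberately kept as planned pauses."""
    flagged = flagged_pairs(table, **criteria)
    keep = set(planned_pause_starts)
    gene = gene.upper()
    hits = []
    for i in range(0, len(gene) - 5, 3):  # overlapping pairs: codons 1-2, 2-3, ...
        if gene[i:i + 6] in flagged and i + 1 not in keep:
            hits.append(i + 1)
    return hits


table = {"AAAAAA": 5.0, "AAAGGG": 1.0, "GGGAAA": 0.0, "GGGGGG": -1.0}  # hypothetical
gene = "AAAAAAGGGAAA"
print(positions_to_replace(gene, table, top_fraction=0.25))                          # [1]
print(positions_to_replace(gene, table, planned_pause_starts=[1], top_fraction=0.25))  # []
```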
  • polynucleotide sequence design methods provided herein can be employed where a plurality of properties of the polynucleotide sequences can be refined in addition to codon pair usage properties, where such properties can include, but are not limited to, melting temperature gap between oligonucleotides of the synthetic gene, average codon usage, average codon pair chi-squared (e.g., z score), worst codon usage, worst codon pair (e.g., z score), maximum usage in adjacent codons, Shine-Dalgarno sequences (for E. coli expression), occurrences of 5 consecutive G's or 5 consecutive C's, occurrences of 6 consecutive A's or 6 consecutive T's, long exactly repeated subsequences, cloning restriction sites, user-prohibited sequences (e.g., other restriction sites), codon usage of a specific codon above a user-specified limit, and out-of-frame stop codons (framecatchers).
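  • Several of the sequence properties listed above are simple pattern counts. This Python sketch reports runs of at least 5 G's or C's, runs of at least 6 A's or T's, and occurrences of user-prohibited motifs. The run lengths follow the list above. The EcoRI site (GAATTC) is only an example default, and the function names and test sequence are hypothetical.

```python
import re
from typing import Dict, List, Sequence


def find_all(s: str, motif: str) -> List[int]:
    """1-based start positions of every (possibly overlapping) occurrence."""
    hits, i = [], s.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = s.find(motif, i + 1)
    return hits


def screen(seq: str, prohibited: Sequence[str] = ("GAATTC",)) -> Dict[str, object]:
    """Report homopolymer runs and user-prohibited motifs (default: EcoRI site)."""
    s = seq.upper()
    return {
        # Greedy {5,} / {6,} quantifiers report each maximal run once.
        "g_or_c_runs": [(m.start() + 1, m.group()) for m in re.finditer(r"G{5,}|C{5,}", s)],
        "a_or_t_runs": [(m.start() + 1, m.group()) for m in re.finditer(r"A{6,}|T{6,}", s)],
        "prohibited": {motif: find_all(s, motif.upper()) for motif in prohibited},
    }


report = screen("ATGGGGGGAAAAAAGAATTCTAA")  # hypothetical test sequence
print(report["g_or_c_runs"])  # [(3, 'GGGGGG')]
print(report["a_or_t_runs"])  # [(9, 'AAAAAA')]
print(report["prohibited"])   # {'GAATTC': [15]}
```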
  • additional properties that can be considered in a process of designing a polynucleotide sequence include, but are not limited to, occurrences of RNA splice sites, occurrences of polyA sites, and occurrence of ribosome binding sequence.
  • a process of designing a polynucleotide sequence can include constraints including, but not limited to, minimum melting temperature gap between oligonucleotides of synthetic gene, minimum average codon usage, maximum average codon pair chi-squared (z score), minimum absolute codon usage, maximum absolute codon pair (z score), minimum maximum usage in adjacent codons, no Shine-Dalgarno sequence (for E.
  • additional constraints can include, but are not limited to, minimum occurrences of RNA splice sites, minimum occurrences of polyA sites, and occurrence of ribosome binding sequence.
  • a process of designing a polynucleotide sequence can include preferences including, but not limited to, prefer high average codon usage, prefer low average codon pair chi-squared, prefer larger melting temperature gap, prefer more out of frame stop codons (framecatchers), and optionally prefer evenly distributed codon usage.
  • Any of a variety of nucleotide sequence refinement/optimization methods known in the art can be used to refine the polynucleotide sequence according to the codon pair usage properties, and according to any of the additional properties specifically described above, or other properties that are refined in nucleotide sequence redesign methods known in the art.
  • a branch and bound method is employed to refine the polynucleotide sequence according to codon pair usage properties and at least one additional property, such as codon usage.
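  • A compact Python sketch of depth-first branch and bound over codon assignments is given below. It assumes a hypothetical non-negative penalty for each adjacent codon pair (e.g., derived from its translational kinetics value) plus a hypothetical per-codon usage penalty. The lower bound adds the cheapest penalty still available at every unassigned position and junction, so pruning never discards the optimum. This is not the "anytime" algorithm referred to above. Because both penalties here are local, dynamic programming would also be exact; branch and bound is shown because it extends to non-local constraints. The function and dictionary names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def bb_design(
    protein: str,
    synonyms: Dict[str, List[str]],
    pair_penalty: Dict[Pair, float],
    usage_penalty: Dict[str, float],
) -> Tuple[float, List[str]]:
    """Return (cost, codons) minimising the sum of usage penalties plus pair
    penalties of adjacent codons. All penalties must be non-negative."""
    n = len(protein)
    if n == 0:
        return 0.0, []
    options = [synonyms[aa] for aa in protein]

    def pp(a: str, b: str) -> float:
        return pair_penalty.get((a, b), 0.0)  # unlisted pairs cost nothing

    # Optimistic (admissible) bound pieces: cheapest choice at each site/junction.
    min_usage = [min(usage_penalty[c] for c in opts) for opts in options]
    min_pair = [min(pp(a, b) for a in options[i] for b in options[i + 1]) for i in range(n - 1)]
    usage_suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        usage_suffix[k] = usage_suffix[k + 1] + min_usage[k]
    pair_suffix = [0.0] * n
    for j in range(n - 2, -1, -1):
        pair_suffix[j] = pair_suffix[j + 1] + min_pair[j]

    best_cost, best = math.inf, []
    path: List[str] = []

    def dfs(k: int, cost: float) -> None:
        nonlocal best_cost, best
        # k codons placed; junctions k-1 .. n-2 are still unpaid.
        if cost + usage_suffix[k] + pair_suffix[max(k - 1, 0)] >= best_cost:
            return  # prune: cannot beat the incumbent
        if k == n:
            best_cost, best = cost, path.copy()
            return
        for c in sorted(options[k], key=lambda codon: usage_penalty[codon]):
            step = usage_penalty[c] + (pp(path[-1], c) if path else 0.0)
            path.append(c)
            dfs(k + 1, cost + step)
            path.pop()

    dfs(0, 0.0)
    return best_cost, best


# Hypothetical penalties for a three-residue example.
syn = {"M": ["ATG"], "K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}
usage = {"ATG": 0.0, "AAA": 0.0, "AAG": 0.5, "GAA": 0.0, "GAG": 0.4}
pairs = {("ATG", "AAA"): 3.0, ("AAG", "GAA"): 1.0}
cost, codons = bb_design("MKE", syn, pairs, usage)
print(codons)  # ['ATG', 'AAG', 'GAG']
```

The search is exponential in the worst case and recursive, so very long genes would need an iterative version or a time limit.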
  • the methods provided herein can further include analyzing at least a portion of the candidate polynucleotide sequence in frame shift, and selecting codons for the candidate polynucleotide sequence such that stop codons are added to at least one said frame shift.
  • the generating step further includes analyzing at least a portion of the candidate polynucleotide sequence in frame shift, and selecting codons for the candidate polynucleotide sequence such that one or more stop codons in one, two or three reading frames are added downstream of polypeptide-encoding region of the nucleotide sequence.
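  • Counting framecatchers is straightforward. This Python sketch counts stop codons (standard genetic code) in the +1 and +2 shifted reading frames of a coding sequence. A design procedure could reward this count when choosing among synonymous codons. The function name and test sequence are hypothetical.

```python
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def out_of_frame_stops(cds: str) -> Dict[int, int]:
    """Number of stop codons in the +1 and +2 reading frames of `cds`."""
    s = cds.upper()
    return {
        shift: sum(1 for i in range(shift, len(s) - 2, 3) if s[i:i + 3] in STOP_CODONS)
        for shift in (1, 2)
    }


print(out_of_frame_stops("ATGGTAAGTGAC"))  # {1: 1, 2: 1}
```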
  • methods for redesigning a polypeptide-encoding gene for expression in a host organism, by providing a data set representative of codon pair translational kinetics for the host organism which includes translational kinetics values of the codon pairs utilized by the host organism, providing a desired polypeptide sequence for expression in the host organism, and generating a polynucleotide sequence encoding the polypeptide sequence by analyzing candidate nucleotides to select, where possible, codon pairs that are predicted not to cause a translational pause in the host organism, with reference to the data set, thereby providing a candidate polynucleotide sequence encoding the desired polypeptide.
  • Also provided herein are methods for redesigning a polypeptide-encoding gene for expression in a host organism by providing a first data set representative of codon pair translational kinetics for the host organism which includes translational kinetics values of the codon pairs utilized by the host organism, providing a second data set representative of at least one additional desired property of the synthetic gene, providing a desired polypeptide sequence for expression in the host organism, and generating a polynucleotide sequence encoding the polypeptide sequence by analyzing candidate nucleotides to select, where possible, both (i) codon pairs that are predicted not to cause a translational pause in the host organism, with reference to the first data set, and (ii) nucleotides that provide a desired property, with reference to the second data set, thereby providing a candidate polynucleotide sequence encoding the desired polypeptide.
  • a branch and bound method is employed to refine the polypeptide-encoding nucleotide sequence according to codon pair usage properties of the first data set and according to the properties of the second data set.
  • the second data set contains codon preferences representative of codon usage by the host organism, including the most common codons used by the host organism for a given amino acid.
  • a cellobiohydrolase-encoding DNA sequence wherein the encoded sequence has at least a 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity to the wild type cellobiohydrolase polypeptide sequence as set forth in SEQ ID NO: 2.
  • the polynucleotide provided herein is adapted for expression in a heterologous host organism.
  • a heterologous host organism is an organism used to express DNA, RNA or protein that is foreign to the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • At least 1, 2 or 3 codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein.
  • a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than a designated threshold, wherein a threshold value can be at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 or 5 or more standard deviations above the mean translational kinetics value.
  • a cellobiohydrolase-encoding DNA sequence having at least a 75% sequence identity with an original cellobiohydrolase polypeptide sequence as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organisms are selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E. coli K12 W3110; E.
  • E. coli UTI89; E. coli O157:H7 EDL933; E. coli O157:H7 str. Sakai; Bombyx mori; Spodoptera frugiperda; Drosophila melanogaster; Kluyveromyces lactis; Zymomonas mobilis and Schizosaccharomyces pombe.
  • the methods provided herein can include analyzing the candidate polynucleotide sequence to confirm that no codon pairs are predicted to cause a translational pause in the host organism by more than a designated threshold.
  • the likelihood that a particular codon pair will cause translational pausing or slowing in an organism can be represented by a translational kinetics value.
  • the translational kinetics value can be expressed in any of a variety of manners in accordance with the guidance provided herein. In one example, a translational kinetics value can be expressed in terms of the mean translational kinetics value and the corresponding standard deviation for all codon pairs in an organism.
  • the translational kinetics value for a particular codon pair can be expressed in terms of the number of standard deviations that separate the translational kinetics value of the codon pair from the mean translational kinetics value.
  • a threshold value can be at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 or 5 or more standard deviations above the mean translational kinetics value.
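  • Expressing translational kinetics values as standard deviations from the mean, and applying a threshold to them, can be sketched in Python. The sketch assumes one precomputed value per codon pair and uses the population standard deviation; the text does not say which form of standard deviation is used. The labels, values and function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(values: Dict[str, float]) -> Dict[str, float]:
    """Each codon pair's value as (value - mean) / standard deviation."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    if sd == 0:
        raise ValueError("all values are identical; z scores are undefined")
    return {pair: (v - mu) / sd for pair, v in values.items()}


def over_threshold(values: Dict[str, float], threshold_sd: float = 2.0) -> List[str]:
    """Pairs whose value lies more than `threshold_sd` SDs above the mean."""
    return sorted(pair for pair, z in z_scores(values).items() if z > threshold_sd)


demo = {"p1": 1.0, "p2": 1.0, "p3": 1.0, "p4": 1.0, "p5": 6.0}  # hypothetical values
print(z_scores(demo)["p5"])       # 2.0
print(over_threshold(demo, 1.5))  # ['p5']
print(over_threshold(demo, 2.0))  # []
```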
  • the methods provided herein also include generating a candidate nucleotide sequence according to codon usage.
  • with respect to codon usage, as is known in the art, different organisms can have different preferences for the three-nucleotide codon sequence encoding a particular amino acid. As a result, translation can often be improved by using the most common three-nucleotide codon sequence encoding a particular amino acid.
  • some methods provided herein also include generating a candidate nucleotide sequence such that codon utilization is non-randomly biased in favor of codons most commonly used by the host organism. Codon usage preferences are known in the art for a variety of organisms and methods for selecting the more commonly used codons are well known in the art.
  • the methods of redesigning a polypeptide-encoding nucleotide sequence are based on a plurality of properties, where a conflict in the preferred nucleotide sequence arising from the plurality of properties is resolved so as to optimize the predicted translational kinetics. That is, when the plurality of properties being optimized would lead to more than one possible nucleotide sequence depending on which property is accorded more weight, the conflict is typically resolved by selecting the nucleotide sequence predicted to be translated more rapidly, for example, due to fewer predicted translational pauses.
  • the methods of redesigning a polypeptide-encoding nucleotide sequence are based on a plurality of properties, where a conflict in the preferred nucleotide sequence arising from the plurality of properties is resolved so as to optimize codon pair usage preferences. That is, when the plurality of properties being optimized would lead to more than one possible nucleotide sequence depending on which property is accorded more weight, codon pair usage will typically be accorded more weight in order to resolve the conflict between the possible nucleotide sequences.
  • the methods provided herein can include identifying at least one instance of a conflict between selecting common codons and avoiding codon pairs predicted to cause a translational pause; in such instances, the conflict is resolved in favor of avoiding codon pairs predicted to cause a translational pause.
  • Some embodiments provided herein include generating a candidate polynucleotide sequence encoding the polypeptide sequence, the candidate polynucleotide sequence having a non-random codon pair usage, such that the codon pairs encoding any particular pair of amino acids have the lowest translational kinetics values.
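Choosing synonymous codons so that adjacent codon pairs have the lowest translational kinetics values, without changing the encoded amino acids, can be sketched with Viterbi-style dynamic programming. This is an illustrative technique chosen here, not necessarily the method used in this document; the synonym lists and pair values are hypothetical, higher values are assumed to indicate stronger predicted pauses, and pairs missing from the table are treated as 0.

```python
from __future__ import annotations


def min_pause_design(protein: str,
                     synonyms: dict[str, list[str]],
                     pair_tk: dict[tuple[str, str], float]) -> list[str]:
    """Choose one synonymous codon per residue so that the summed translational
    kinetics value of adjacent codon pairs is as low as possible.
    Pairs missing from `pair_tk` are treated as having value 0.0."""
    # cost[c]: lowest total value of any encoding of the prefix that ends in codon c
    cost = {c: 0.0 for c in synonyms[protein[0]]}
    back: list[dict[str, str]] = []  # back[k][c]: best predecessor of codon c at residue k+1
    for aa in protein[1:]:
        new_cost: dict[str, float] = {}
        pointers: dict[str, str] = {}
        for c in synonyms[aa]:
            prev, total = min(((p, cost[p] + pair_tk.get((p, c), 0.0)) for p in cost),
                              key=lambda item: item[1])
            new_cost[c], pointers[c] = total, prev
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final codon.
    path = [min(cost, key=cost.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical synonym lists and pair values (higher value = stronger predicted pause).
SYN = {"K": ["AAA", "AAG"], "L": ["CTG", "TTA"]}
TK = {("AAA", "CTG"): 5.0, ("AAG", "CTG"): -1.0, ("AAA", "TTA"): 0.0, ("AAG", "TTA"): 2.0}
print(min_pause_design("KL", SYN, TK))  # ['AAG', 'CTG']
```

Because every candidate codon is drawn from the synonym list of its residue, the encoded amino acid sequence is unchanged. Codon usage could be added as a secondary cost term, with the pair term weighted more heavily so that pause avoidance wins any conflict.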
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that the encoded amino acid sequence is not altered.
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that the three dimensional structure of the encoded polypeptide is not substantially altered.
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that no more than conservative amino acid changes are made to the encoded polypeptide.
  • the methods provided herein can further include a step of refining or altering the candidate polynucleotide sequence in accordance with a second nucleotide sequence property to be refined.
  • the methods further include generating or refining a candidate polynucleotide sequence encoding a polypeptide sequence such that the candidate polynucleotide sequence has a non-random codon usage, where the most common codons used by the host organism are over-represented in the candidate polynucleotide sequence.
  • the methods can include refining or altering the candidate polynucleotide sequence in accordance with any of a variety of additional properties provided herein, including, but not limited to, the melting temperature gap between oligonucleotides of a synthetic gene, Shine-Dalgarno sequences, occurrences of 5 consecutive G's or 5 consecutive C's, occurrences of 6 consecutive A's or 6 consecutive T's, long exactly repeated subsequences, cloning restriction sites, or any other user-prohibited sequences. Further, any of a variety of combinations of these properties can be additionally included in the nucleotide sequence refinement methods provided herein.
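Screening a candidate sequence for several of the prohibited features listed above (runs of 5 G's or C's, runs of 6 A's or T's, and user-supplied sites) can be sketched with regular-expression lookaheads, which also report overlapping occurrences. The site dictionary is a caller-supplied assumption; EcoRI (GAATTC) is used only as an example. Repeated subsequences and melting-temperature gaps are not covered here.

```python
from __future__ import annotations

import re


def find_prohibited(seq: str, forbidden_sites: dict[str, str]) -> list[tuple[str, int]]:
    """Return (feature name, 0-based start) for every prohibited feature found,
    sorted by position. Lookaheads make overlapping hits visible."""
    seq = seq.upper()
    patterns = {
        "GGGGG run": r"(?=GGGGG)",
        "CCCCC run": r"(?=CCCCC)",
        "AAAAAA run": r"(?=AAAAAA)",
        "TTTTTT run": r"(?=TTTTTT)",
    }
    for name, site in forbidden_sites.items():
        patterns[name] = "(?=" + re.escape(site.upper()) + ")"
    hits = []
    for name, pattern in patterns.items():
        hits.extend((name, m.start()) for m in re.finditer(pattern, seq))
    return sorted(hits, key=lambda hit: hit[1])


print(find_prohibited("ATGGGGGGAATTCAAAAAA", {"EcoRI": "GAATTC"}))
```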
  • the method provided herein can further include an evaluation step in which after the candidate polynucleotide sequence is altered, the sequence is compared with at least a portion of a data set of a property against which the sequence was refined.
  • an evaluation step in which after the candidate polynucleotide sequence is altered, the sequence is compared with at least a portion of a data set of a property against which the sequence was refined.
  • the candidate nucleotide sequence can be compared to each property considered in the refinement, and, if the values for all properties are deemed to be acceptable or desired, no further sequence alteration is required. If the values for fewer than all properties are deemed to be acceptable or desired, the candidate nucleotide sequence can be subjected to further sequence alteration and evaluation.
  • sequence alteration steps of methods provided herein can be performed iteratively. That is, one or more steps of altering the nucleotide sequence can be performed, and the candidate nucleotide sequence can be evaluated to determine whether or not further sequence alteration is necessary and/or desirable. These steps can be repeated until values for all properties are deemed to be acceptable or desired, or until no further improvement can be achieved.
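The iterate-until-acceptable loop described above can be sketched generically. The evaluation, acceptance and alteration functions are hypothetical callables supplied by the caller, and "no further improvement" is taken here to mean that an alteration step leaves the sequence unchanged. The toy alteration in the usage example only breaks up G runs and does not preserve codons; it illustrates the control flow, not a real design rule.

```python
from __future__ import annotations

from typing import Callable


def refine(candidate: str,
           evaluate: Callable[[str], dict[str, float]],
           acceptable: Callable[[dict[str, float]], bool],
           alter: Callable[[str, dict[str, float]], str],
           max_rounds: int = 100) -> str:
    """Alternate evaluation and alteration until every property is acceptable,
    an alteration no longer changes the sequence, or max_rounds is reached."""
    for _ in range(max_rounds):
        scores = evaluate(candidate)
        if acceptable(scores):
            break
        altered = alter(candidate, scores)
        if altered == candidate:  # no further improvement achievable
            break
        candidate = altered
    return candidate


# Toy usage: a single property, the number of GGGGG runs.
result = refine("ATGGGGGTTGGGGGC",
                evaluate=lambda s: {"g_runs": s.count("GGGGG")},
                acceptable=lambda sc: sc["g_runs"] == 0,
                alter=lambda s, sc: s.replace("GGGGG", "GGAGG", 1))
print(result)  # ATGGAGGTTGGAGGC
```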
  • the methods and sequences provided herein include determination and use of translational kinetics values for codon pairs. As provided herein, such a translational kinetics value can be calculated and/or empirically measured, and the final translational kinetics value used in graphical displays and methods of predicting translational kinetics can be a refined value resultant from two or more types of codon pair translational kinetics information.
  • codon pair translational kinetics information that can be used in refining or replacing a translational kinetics value for a codon pair include, for example, values of observed versus expected codon pair frequencies in a particular organism, normalized values of observed versus expected codon pair frequencies in a particular organism, the degree to which observed versus expected codon pair frequency values are conserved in related proteins across two or more species, the degree to which observed versus expected codon pair frequency values are conserved at predicted pause sites such as boundaries between autonomous folding units in related proteins across two or more species, the degree to which codon pairs are conserved at predicted pause sites across different proteins in the same species, and empirical measurement of translational kinetics for a codon pair.
  • the values of observed versus expected codon pair frequencies in a host organism can be determined by any of a variety of methods known in the art for statistically evaluating observed occurrences relative to expected occurrences. Regardless of the statistical method used, this typically involves obtaining codon sequence data for the organism, for example, on a gene-by-gene basis. In some embodiments, the analysis is focused only on the coding regions of the genome. Because the analysis is a statistical one, a large database is preferred. Initially, the total number of codons is determined and the number of times each of the 61 non-terminating codons appears is determined.
  • the expected frequency of each of the 3721 ($61^2$) possible non-terminating codon pairs is calculated, typically by multiplying together the frequencies with which each of the component codons appears.
  • This frequency analysis can be carried out on a global basis, analyzing all of the sequences in the database together; however, it is typically done on a local basis, analyzing each sequence individually. This will tend to minimize the statistical effect of an unusually high proportion of rare codons in a sequence.
  • the expected number of occurrences of each codon pair is calculated by, for example, multiplying the expected frequency by the number of pairs in the sequence. This information can then be added to a global table, and each next succeeding sequence can be analyzed in like manner.
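The local (gene-by-gene) calculation of observed and expected codon pair counts described above can be sketched as follows. Assumptions: genes are in-frame DNA strings, stop codons are excluded from codon frequencies and from pairs, and expected counts for a gene are the product of its component codon frequencies times the gene's number of pairs, accumulated into global tables.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

STOPS = frozenset({"TAA", "TAG", "TGA"})


def accumulate_pair_counts(genes: list[str]) -> tuple[Counter, Counter]:
    """Observed and expected codon-pair counts, with expectations computed
    gene by gene (local basis) and summed into global tables."""
    observed: Counter = Counter()
    expected: Counter = Counter()
    for gene in genes:
        seq = gene.upper()
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        pairs = [(a, b) for a, b in zip(codons, codons[1:])
                 if a not in STOPS and b not in STOPS]
        if not pairs:
            continue
        usage = Counter(c for c in codons if c not in STOPS)
        total = sum(usage.values())
        observed.update(pairs)
        for a, b in product(usage, repeat=2):
            # expected frequency = product of component codon frequencies,
            # scaled by the number of codon pairs in this gene
            expected[(a, b)] += (usage[a] / total) * (usage[b] / total) * len(pairs)
    return observed, expected


obs, exp = accumulate_pair_counts(["AAAGGGAAATAA"])
print(obs[("AAA", "GGG")], round(exp[("AAA", "GGG")], 3))  # 1 0.444
```

Computing frequencies per gene keeps an unusually rare-codon-rich gene from distorting the expectations for every other gene.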
  • the values of observed versus expected codon pair frequencies are chi-squared values, such as chi-squared 2 (chisq2) values or chi-squared 3 (chisq3) values.
  • Methods for calculating chi-squared values can be performed according to any method known in the art, as exemplified in U.S. Patent No. 5,082,767, which is incorporated by reference herein in its entirety.
  • a new value chi-squared 2 (chisq2) can be calculated as follows. For each group of codon pairs encoding the same amino acid pair (i.e., 400 groups), the sums of the expected and observed values are tallied; any non-randomness in amino acid pairs is reflected in the difference between these two values. Therefore, each of the expected values within the group is multiplied by the factor [sum observed/sum expected], so that the sums of the expected and observed values within the group are equal. The new chi-squared, chisq2, is evaluated using these new expected values.
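The chisq2 correction above can be sketched in a few lines. Assumptions: observed and expected counts are dictionaries keyed by (codon, codon) tuples, the expected table covers every observed pair, and the caller supplies a codon-to-amino-acid mapping. The chi-squared statistic is the usual sum of (observed - expected)² / expected over the rescaled expectations.

```python
from __future__ import annotations

from collections import defaultdict


def chisq2(observed: dict[tuple[str, str], float],
           expected: dict[tuple[str, str], float],
           code: dict[str, str]) -> tuple[float, dict[tuple[str, str], float]]:
    """Rescale expected codon-pair counts so that, within each amino-acid-pair
    group, expected and observed totals are equal; then evaluate chi-squared."""
    def group(pair: tuple[str, str]) -> tuple[str, str]:
        return code[pair[0]], code[pair[1]]

    sum_obs: dict[tuple[str, str], float] = defaultdict(float)
    sum_exp: dict[tuple[str, str], float] = defaultdict(float)
    for pair, e in expected.items():
        sum_exp[group(pair)] += e
        sum_obs[group(pair)] += observed.get(pair, 0)
    # multiply each expected value by [sum observed / sum expected] of its group
    adjusted = {pair: e * sum_obs[group(pair)] / sum_exp[group(pair)]
                for pair, e in expected.items() if sum_exp[group(pair)] > 0}
    value = sum((observed.get(p, 0) - e) ** 2 / e for p, e in adjusted.items() if e > 0)
    return value, adjusted


CODE = {"AAA": "K", "AAG": "K", "GGG": "G"}
value, adj = chisq2({("AAA", "GGG"): 6, ("AAG", "GGG"): 0},
                    {("AAA", "GGG"): 2.0, ("AAG", "GGG"): 2.0}, CODE)
print(value)  # 6.0
```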
  • a new value chi-squared 3 (chisq3) can be calculated. Correction is made only for those dinucleotides formed between adjacent codon pairs; any bias of dinucleotides within codons (codon triplet positions I-II and II-III) will directly affect codon usage and is, therefore, automatically taken into account in the underlying calculations.
  • for each group of codon pairs that form the same dinucleotide across the junction between adjacent codons, the sums of the expected and observed values are tallied; any non-randomness in dinucleotide pairs is reflected in the difference between these two values. Therefore, each of the expected values within the group is multiplied by the factor [sum observed/sum expected], so that the sums of the expected and observed values within the group are equal.
  • the new chi-squared, chisq3, is evaluated using these new expected values.
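The chisq3 correction can be sketched the same way, grouping codon pairs by the dinucleotide that spans the codon boundary (base III of the first codon plus base I of the second). Assumptions: dictionary inputs keyed by (codon, codon) tuples, and whether the expectations passed in are raw or already chisq2-adjusted is left to the caller, since the text above does not fix that ordering.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Pair = tuple[str, str]


def junction(pair: Pair) -> str:
    """Dinucleotide spanning the codon boundary: base III of codon 1 + base I of codon 2."""
    return pair[0][2] + pair[1][0]


def rescale_by_group(observed: dict[Pair, float], expected: dict[Pair, float],
                     key: Callable[[Pair], str]) -> dict[Pair, float]:
    """Multiply each expected value by [sum observed / sum expected] of its group."""
    sum_obs: dict[str, float] = defaultdict(float)
    sum_exp: dict[str, float] = defaultdict(float)
    for pair, e in expected.items():
        sum_exp[key(pair)] += e
        sum_obs[key(pair)] += observed.get(pair, 0)
    return {p: e * sum_obs[key(p)] / sum_exp[key(p)]
            for p, e in expected.items() if sum_exp[key(p)] > 0}


def chi_squared(observed: dict[Pair, float], expected: dict[Pair, float]) -> float:
    return sum((observed.get(p, 0) - e) ** 2 / e for p, e in expected.items() if e > 0)


def chisq3(observed: dict[Pair, float], expected: dict[Pair, float]) -> float:
    return chi_squared(observed, rescale_by_group(observed, expected, junction))


OBS = {("AAA", "GGG"): 4, ("CCA", "GTT"): 0, ("TTT", "CCC"): 5}
EXP = {("AAA", "GGG"): 1.0, ("CCA", "GTT"): 1.0, ("TTT", "CCC"): 5.0}
print(chisq3(OBS, EXP))  # 4.0
```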
  • Dinucleotide bias represents a smaller effect in yeast, and only a very minor one in E. coli.
  • although the predominant dinucleotide bias in the human database is the well-known CpG deficit, other dinucleotides are also very highly biased. For example, there is a deficit of TA, as well as an excess of TG, CA and CT. Overall, the CpG deficit contributes only 35% of the total dinucleotide bias in the human database, and 17% in yeast.
  • the values of observed versus expected codon pair frequencies in a host organism herein can be normalized. Normalization permits different sets of values of observed versus expected codon pair frequencies to be compared by placing these values on the same numerical scale. For example, normalized codon pair frequency values can be compared between different organisms, or can be compared for different codon pair frequency value calculations within a particular organism (e.g., calculations based on different input sequence information, or different statistics such as chisq1, chisq2 or chisq3). Typically, normalization results in codon pair frequency values that are described in terms of their mean and standard deviation from the mean. An exemplary method for normalizing codon pair frequency values is the calculation of z scores.
  • the z score for an item indicates how far and in what direction that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation.
  • the mathematics of the z score transformation are such that if every item in a distribution is converted to its z score, the transformed scores will have a mean of zero and a standard deviation of one.
  • the z score transformation can be especially useful when seeking to compare the relative standings of items from distributions with different means and/or different standard deviations. Z scores are especially informative when the distribution to which they refer is normal: in a normal distribution, the area under the curve between the mean and a given z score is a fixed proportion of the total area.
  • An exemplary method for determining z scores for codon pair chi-squared values is as follows. First, a list of all 3721 possible non-terminating codon pairs is generated. Second, for the $i$th codon pair, the $i$th chi-squared value, denoted $c_i$, is calculated. The chi-squared value $c_i$ is given the sign of (observed - expected), so that over-represented codon pairs are assigned a positive $c_i$ and under-represented codon pairs are assigned a negative $c_i$.
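  • the formula for $c_i$ is: $c_i = \operatorname{sgn}(o_i - e_i)\,\dfrac{(o_i - e_i)^2}{e_i}$, where $o_i$ and $e_i$ are the observed and expected numbers of occurrences of the $i$th codon pair.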
  • the mean chi-squared value, denoted $m$, is calculated.
  • the standard deviation of the chi-squared values, denoted $s$, is calculated.
  • a z score is calculated by subtracting the mean and then dividing by the standard deviation; the $i$th z score is denoted $z_i$.
  • the formula for the z score is: $z_i = \dfrac{c_i - m}{s}$
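The z score procedure above can be sketched as follows, assuming the per-pair chi-squared contribution (o - e)² / e signed by (o - e), and the population standard deviation for $s$ (the text does not specify population versus sample). Keys are arbitrary hashable pair labels; in practice the expected table would list all 3721 non-terminating pairs with positive expectations.

```python
from __future__ import annotations

from statistics import mean, pstdev


def signed_chi(o: float, e: float) -> float:
    """Chi-squared contribution of one codon pair, carrying the sign of (o - e)."""
    d = o - e
    sign = 1 if d > 0 else -1 if d < 0 else 0
    return sign * d * d / e


def z_scores(observed: dict, expected: dict) -> dict:
    """z_i = (c_i - m) / s over every codon pair listed in `expected`."""
    pairs = sorted(expected)
    c = [signed_chi(observed.get(p, 0), expected[p]) for p in pairs]
    m, s = mean(c), pstdev(c)
    return {p: (ci - m) / s for p, ci in zip(pairs, c)}


z = z_scores({"a": 3, "b": 1, "c": 2}, {"a": 2, "b": 2, "c": 2})
print({k: round(v, 3) for k, v in z.items()})  # {'a': 1.225, 'b': -1.225, 'c': 0.0}
```

After the transformation the scores have mean 0 and standard deviation 1, which is what allows values from different organisms or statistics to be compared on one scale.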
  • provided herein are methods of refining the predictive capability of a translational kinetics value of a codon pair in a host organism by providing an initial translational kinetics value based on the value of observed codon pair frequency versus expected codon pair frequency for a codon pair in a host organism, providing additional translational kinetics data for the codon pair in the host organism, and modifying the initial translational kinetics value according to the additional codon pair translational kinetics data to generate a refined translational kinetics value for the codon pair in the host organism.
  • the translational kinetics data that can be used to refine translational kinetics values and methods of modifying translational kinetics values according to such additional translational kinetics data to generate a refined translational kinetics value for a codon pair in a host organism are provided below.
  • translational kinetics data that can be used to refine translational kinetics values are based on recurrence of a codon pair and/or recurrence of a predicted translational kinetics value associated with a codon pair.
  • Recurrence-based refinement of translational kinetics values is based on the investigation of multiple polypeptide-encoding nucleotide sequences to determine whether or not there are multiple occurrences of either codon pairs or predicted translational kinetics values in those sequences.
  • Recurrence-based refinement of translational kinetics can be performed using any of a variety of known sequence comparison methods consistent with the examples provided herein. For purposes of exemplification, and not for limitation, the following example of recurrence-based refinement of translational kinetics is provided.
  • the predicted translational kinetics value for a codon pair can be refined according to the degree to which observed versus expected codon pair frequency values are conserved in related proteins across two or more species.
  • related proteins are proteins having homologous amino acid sequences and/or similar three dimensional structures.
  • Related proteins having homologous amino acid sequences will typically have at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% sequence identity.
  • Related proteins having similar three dimensional structures will typically share similar secondary structure topology and similar relative positioning of secondary structural elements; exemplary related proteins having similar three dimensional structures are members of the same SCOP-classified Family (see, e.g., Murzin A. G., Brenner S.
  • the codon pair located at the position on a protein that is confirmed as, or considered to have an increased likelihood of, containing an actual translational pause or slowing can itself be confirmed as being, or considered to have an increased likelihood of being, a functional translational kinetics signal.
  • a codon pair located at a position on a protein that is confirmed as not containing, or considered to have a decreased likelihood of containing, an actual translational pause or slowing can itself be confirmed as not acting, or considered to have a decreased likelihood of acting, as a functional translational kinetics signal.
  • initially predicted translational kinetics data (e.g., data based on values of observed codon pair frequency versus expected codon pair frequency)
  • the predicted translational kinetics value for a codon pair can be refined according to the presence of the codon pair at a location predicted by methods other than codon pair frequency methods to contain a translational pause or slowing site.
  • a predicted location is a boundary location between autonomous folding units of a protein.
  • translational pauses are present in wild type genes in order to slow translation of a nascent polypeptide subsequent to translation of a secondary structural element of a protein and/or a protein domain, thus providing time for acquisition of secondary and at least partial tertiary structure by the nascent protein prior to further downstream translation, and thereby allowing each domain to partially organize and commit to a particular, independent fold.
  • codon pairs can be associated with translational pauses between autonomous folding units of a protein, where autonomous folding units can be secondary structural elements such as an alpha helix, or can be tertiary structural elements such as a protein domain.
  • the presence of a codon pair at a boundary location between autonomous folding units of a protein can confirm or increase the likelihood that the codon pair acts to pause or slow translation.
  • predicted translational kinetics data (e.g., data based on values of observed codon pair frequency versus expected codon pair frequency) can be modified according to the presence of the codon pair at a boundary location between autonomous folding units of a protein, which can increase the likelihood that the codon pair acts to pause or slow translation.
  • an over-represented codon pair that is present at a boundary location between autonomous folding units of a protein can be confirmed as acting as a translational pause or slowing codon pair.
  • a single observation of the codon pair at a boundary location between autonomous folding units of a protein can confirm or increase the likely translational pause or slowing properties of a codon pair.
  • typically a plurality of observations will be used to more accurately estimate the translational pause or slowing properties of a codon pair.
  • methods of using, for example, predicted boundary locations can be combined with methods that are based on recurrence of a codon pair and/or recurrence of a predicted translational kinetics value associated with a codon pair in methods of refining a predicted translational kinetics value for a codon pair.
  • a protein present in two or more species can have conserved boundary locations between autonomous folding units of the protein, and recurrent presence of an over-represented codon pair at the boundary locations can confirm the likelihood of an actual translational pause at that boundary location, leading to confirmation, or increased likelihood, that the corresponding codon pair for the respective species acts as a translational pause or slowing codon pair.
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and recurrent presence of an over-represented codon pair at the boundary locations can confirm or indicate the likelihood of an actual translational pause at that boundary location, leading to confirmation or indication of increased likelihood that the corresponding codon pair acts as a translational pause or slowing codon pair.
  • Such recurrence-based methods also can be used to confirm or indicate increased likelihood that a non-over-represented codon pair (e.g., an under-represented codon pair or a represented-as-expected codon pair) acts as a translational pause or slowing codon pair.
  • a non-over-represented codon pair e.g., an under-represented codon pair or a represented-as-expected codon pair
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and recurrent presence of a non-over-represented codon pair at the boundary locations, particularly if no over-represented codon pair is present, can confirm or indicate the likelihood of an actual translational pause at that boundary location, leading to confirmation or indication of increased likelihood that the corresponding codon pair acts as a translational pause or slowing codon pair.
  • Such recurrence-based methods also can be used to confirm or indicate the likelihood that a codon pair, such as an over-represented codon pair, does not act as a translational pause or slowing codon pair.
  • a codon pair such as an over-represented codon pair
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and consistent absence of a non-over-represented codon pair at the boundary locations can confirm or indicate increased likelihood that the codon pair does not act as a translational pause or slowing codon pair.
  • the predicted translational kinetics value for a codon pair can be refined according to empirical measurement of translational kinetics for a codon pair.
  • the influence of a codon pair on translational kinetics can be experimentally measured, and these experimental measurements can be used to refine or replace the predicted translational kinetics values for a codon pair.
  • Several methods of experimentally measuring the translational kinetics of a codon pair are known in the art, and can be used herein, as exemplified in Irwin et al, J. Biol. Chem., (1995) 270:22801.
  • One such exemplary assay is based on the observation that a ribosome pausing at a site near the beginning of an mRNA coding sequence can inhibit translation initiation by physically interfering with the attachment of a new ribosome to the message, and, thus, the codon pair to be assayed can be placed at the beginning of a polypeptide-encoding nucleotide sequence and the effect of the codon pair on translational initiation can be measured as an indication of the ability of the codon pair to cause a translational pause.
  • Another such exemplary assay is based on the fact that the transit time of a ribosome through the leader polypeptide coding region of the trp operon leader RNA sets the basal level of transcription through the trp attenuator. Thus, the codon pair to be assayed can be placed into a trpLep leader polypeptide coding region, and the level of expression can indicate the translational pause properties of the codon pair: faster translation favors formation of the attenuator stem-loop in the leader RNA, which results in transcriptional attenuation, so expression varies inversely with translation speed.
  • the methods provided herein for calculation of translational kinetics values can be applied to the native organism of the polypeptide of SEQ ID NO: 2, and also can be applied to a selected organism in which the polypeptide of SEQ ID NO: 2, or a modification thereof, is to be heterologously expressed.
  • the nucleotide sequence information of an organism can be used to calculate chi-squared values in accordance with the methods provided herein, and the translational kinetics values can be based on these chi-squared values as well as on additional translational kinetics information provided herein, including, but not limited to, codon pairs conserved in domain boundaries and empirically measured translational kinetics for a codon pair.
  • the translational kinetics data described herein can be combined in such a manner as to provide a refined translational kinetics value for a codon pair in a host organism.
  • Methods of combining predictive data to arrive at a refined predictive value are known in the art and can be used herein.
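  • one such method is Bayes' rule, $P(H \mid D) = \dfrac{P(D \mid H)\,P(H)}{P(D)}$, where H is a hypothesis and D is the observed data.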
  • P(D) is constant for all H.
  • P(H) is identified with the degree of belief in hypothesis H before the data was observed.
  • $P(D \mid H)$, read "the probability of D given H," is identified with how well hypothesis H predicts the observed data D.
  • a hypothesis H is that a given sequence feature, e.g., a given codon pair, has utility for translational kinetics engineering, e.g., creates a translational pause site.
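  • $P(D \mid H) = P(D_1 \,\&\, D_2 \,\&\, D_3 \,\&\, D_4 \mid H)$, where $D_1, \ldots, D_4$ are individual items of translational kinetics data; if the data items are treated as independent, this factors as $P(D \mid H) = P(D_1 \mid H)\,P(D_2 \mid H)\,P(D_3 \mid H)\,P(D_4 \mid H)$.
  • an experimental measurement $D_1$ that has been confirmed by replicate testing would have a very low probability of error, and therefore it would dominate the estimate if available.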
  • P(Di is correct) and P(Di is not correct) can be estimated a priori by the correlation of Di with previous experimental measurements.
  • the corresponding conditional probabilities given H are obtained by observing whether or not hypothesis H is consistent with observed data item Di. More complex and powerful Bayesian approaches are also well known in the art. The fully general approach rewrites P(D
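A minimal sketch of this kind of Bayesian combination for a single binary hypothesis H (e.g., "this codon pair creates a pause site") follows. The likelihood model is an assumption made for illustration: each data item has a reliability $P(D_i \text{ is correct})$, and $P(D_i \mid H)$ equals that reliability when $D_i$ is consistent with H and its complement otherwise, with the mirror image for "not H". Data items are treated as independent.

```python
from __future__ import annotations


def posterior(prior_h: float, evidence: list[tuple[bool, float]]) -> float:
    """P(H | D1..Dn) by Bayes' rule with independent data items.

    evidence: (consistent_with_h, reliability) per data item, where reliability
    is P(Di is correct). Assumed likelihoods: P(Di | H) = reliability if Di is
    consistent with H, else 1 - reliability; P(Di | not H) is the mirror image.
    """
    like_h, like_not_h = 1.0, 1.0
    for consistent, r in evidence:
        like_h *= r if consistent else 1 - r
        like_not_h *= 1 - r if consistent else r
    # P(D) by total probability; it is the same for every hypothesis
    p_d = like_h * prior_h + like_not_h * (1 - prior_h)
    return like_h * prior_h / p_d


# A replicated measurement (reliability 0.99) outweighs a weak, conflicting
# frequency-based signal (reliability 0.6).
print(round(posterior(0.5, [(True, 0.99), (False, 0.6)]), 3))  # 0.985
```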
  • the translational kinetics values for a codon pair can be refined by consideration of, for example, chi-squared value of observed versus expected codon pair frequency and the degree to which codon pairs are conserved at predicted pause sites across different proteins in the same species, for example, at protein structure domain boundaries.
  • An over-represented codon pair which is present with above-random frequency at boundary locations between autonomous folding units of proteins in the same species can have a translational kinetics value reflecting higher predicted translational pause properties of the codon pair.
  • an over-represented codon pair which is present with below-random frequency at boundary locations between autonomous folding units of proteins in the same species can have a translational kinetics value reflecting lower predicted translational pause properties of the codon pair.
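Whether a codon pair occurs at folding-unit boundaries with above- or below-random frequency can be estimated with a simple observed/expected ratio. In this sketch, which uses hypothetical inputs, each protein is a list of codon-pair labels, boundary positions are given as index sets, and the expected count assumes that each boundary site draws the pair at its overall frequency across all proteins. A ratio above 1 suggests above-random presence; below 1, below-random presence.

```python
from __future__ import annotations


def boundary_enrichment(proteins: list[list[str]], boundaries: list[set[int]], pair: str) -> float:
    """Observed/expected ratio of `pair` at folding-unit boundary positions.

    proteins: codon-pair lists, one per protein of the same species
    boundaries: for each protein, the codon-pair indices lying at boundaries
    """
    total = sum(len(p) for p in proteins)
    overall = sum(p.count(pair) for p in proteins) / total
    n_sites = sum(len(b) for b in boundaries)
    at_sites = sum(1 for p, b in zip(proteins, boundaries) for i in b if p[i] == pair)
    expected = overall * n_sites
    return at_sites / expected if expected else float("nan")


# Hypothetical codon-pair labels "X" and "Y" in two proteins.
ratio = boundary_enrichment([["X", "Y", "X", "Y"], ["Y", "Y", "X", "Y"]], [{0, 2}, {2}], "X")
print(round(ratio, 3))  # 2.667
```

A real analysis would also want a significance test (for example, a binomial test on `at_sites`) before adjusting a translational kinetics value.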
  • the translational kinetics values for a codon pair can be refined by consideration of, for example, experimentally measured translation step times in one species and the degree to which codon pairs that correspond to measured pause sites in the first species are conserved across homologous proteins in other species, for example, in a multiple sequence alignment.
  • if an over-represented codon pair in another species is aligned with above-random frequency to a codon pair that corresponds to a measured translation pause site in the first species, it can have a translational kinetics value reflecting higher predicted translational pause properties of that codon pair in the other species.
  • if an over-represented codon pair in another species is aligned with below-random frequency to a codon pair that corresponds to a measured translation pause site in the first species, it can have a translational kinetics value reflecting lower predicted translational pause properties of that codon pair in the other species.
  • translational kinetics values for codon pairs can be determined.
  • the translational kinetics values can be organized according to the likelihood of causing a translational pause or slowing based on any method known in the art.
  • the translational kinetics values for two or more codon pairs, up to all codon pairs, in an organism are determined, and the mean translational kinetics value and associated standard deviation are calculated. Based on this, the translational kinetics value for a particular codon pair can be described in terms of the number of standard deviations by which it differs from the mean translational kinetics value. Accordingly, reference herein to mean translational kinetics values and standard deviations, whether or not applied to a particular expression of translational kinetics value, can be applied to any of a variety of expressions of translational kinetics values provided herein.
  • Such a graphical display provides a visual display of the predicted translational influence, including translational pause or slowing for numerous or all codon pairs of a polypeptide-encoding nucleotide sequence.
  • This visual display can be used in methods of modifying polypeptide-encoding nucleotide sequences in order to thereby modify the predicted translational kinetics of the mRNA into polypeptide in methods such as those provided herein.
  • the graphical displays can be used to identify one or more codon pairs to be modified in a polypeptide-encoding nucleotide sequence.
  • the graphical displays can be used in analyzing a polypeptide-encoding nucleotide sequence prior to modifying the polypeptide-encoding nucleotide sequence, or can be used in analyzing a modified polypeptide-encoding nucleotide sequence to determine, for example, whether or not further modifications are desired.
  • Methods for creating and using graphical displays can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2007/0298503, published on December 27, 2007, and U.S. Patent Publication No. 2007/0275399, published on November 29, 2007, which are incorporated by reference herein in their entireties.
  • graphical displays as described therein can be created to illustrate the translational kinetics of an original or redesigned polypeptide-encoding nucleotide sequence in the native or a heterologous organism, or to illustrate differences and/or similarities in the translational kinetics of a polypeptide-encoding nucleotide sequence in which one or more codon pairs have been modified.
  • numerous normalized graphical displays can be created to illustrate differences and/or similarities in the translational kinetics of a polypeptide-encoding nucleotide sequence when expressed in two or more different organisms.
  • the graphical displays can be created using translational kinetics values based on any of the methods for determining translational kinetics values provided herein or otherwise known in the art. For example, the plotted values can be chi-squared as a function of codon pair position, chi-squared 2 as a function of codon pair position, or chi-squared 3 as a function of codon pair position, translational kinetics values thereof, empirical measurements of translational pausing by codon pairs in a host organism, estimated translational pause capability based on observed presence and/or recurrence of a codon pair at a predicted pause site, and variations and combinations thereof as provided herein.
  • the exact format of the graphical displays can take any of a variety of forms, and the specific form is typically selected for ease of analysis and comparison between plots.
  • the abscissa typically lists the position along the nucleotide sequence or polypeptide sequence, and can be represented by nucleotide position, codon position, codon pair position, amino acid position, or amino acid pair position.
  • the ordinate typically lists the translational kinetics value of the codon pair, such as, but not limited to, a translational kinetics value of codon pair frequency, including, but not limited to, the z score of chisq1, the z score of chisq2, the z score of chisq3, the empirically measured value, and the refined translational kinetics value.
  • the sequence position can be plotted along the ordinate and the translational kinetics value can be plotted along the abscissa.
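The data series behind such a display (codon pair position on one axis, translational kinetics value on the other) can be computed as follows. The value table is a hypothetical input keyed by (codon, codon) tuples, and pairs absent from it are plotted as 0.0. The resulting (position, value) list can be passed to any plotting library, for example matplotlib's `plt.plot`.

```python
from __future__ import annotations


def kinetics_profile(seq: str, tk: dict[tuple[str, str], float]) -> list[tuple[int, float]]:
    """(codon pair position, translational kinetics value) for each adjacent
    codon pair of an in-frame sequence; positions are 1-based."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [(i + 1, tk.get((a, b), 0.0)) for i, (a, b) in enumerate(zip(codons, codons[1:]))]


print(kinetics_profile("AAAGGGCCC", {("AAA", "GGG"): 2.5}))  # [(1, 2.5), (2, 0.0)]
```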
  • a set of graphical displays, including at least a first graphical display and a second graphical display, is prepared. These displays can be compared in order to determine the difference in predicted translational efficiency or translational kinetics between the two plots.
  • the plots can differ according to any of a variety of criteria. For example, each plot can represent a different polypeptide-encoding nucleotide sequence, each plot can represent a different host organism, each plot can represent differently determined translational kinetics values, or any combination thereof.
  • any number of different graphical displays can be compared in accordance with the methods provided herein, for example, 2, 3, 4, 5, 6, 7, 8 or more different graphical displays can be compared.
  • two plots will represent different polypeptide-encoding nucleotide sequences, the same sequence in different host organisms, or different sequences in different host organisms.
  • Comparison of different graphical displays can be used to analyze the predicted change in translational kinetics resulting from the difference represented by the graphical displays. For example, comparison of the same polypeptide-encoding nucleotide sequence in different host organisms can be used to identify predicted translational pauses that can be removed. Accordingly, provided herein are methods of analyzing the translational kinetics of an mRNA into polypeptide in a host organism by comparing two graphical displays to understand or predict the differences in translational kinetics of the mRNA into polypeptide, where the differences in the graphical displays can result from, for example, a difference in the polypeptide-encoding nucleotide sequence or a difference in the host organism.
  • a graphical display of the translational kinetics values of codon pairs for the original polypeptide-encoding nucleotide sequence in the heterologous host can be compared to a graphical display of the translational kinetics values of codon pairs for a modified polypeptide-encoding nucleotide sequence in the heterologous host, and it can be determined whether or not the modification to the polypeptide-encoding nucleotide sequence resulted in improved translational kinetics.
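A numeric counterpart to comparing two such displays can be sketched as follows. Assumptions: each profile is a list of (codon pair position, value) tuples, a position counts as a predicted pause when its value meets a chosen threshold (2.0 here, purely illustrative), and "improved" simply means fewer predicted pauses in the modified sequence. That criterion is a simplification for illustration, not a rule stated in this document.

```python
from __future__ import annotations


def predicted_pauses(profile: list[tuple[int, float]], threshold: float) -> list[int]:
    """Positions whose translational kinetics value meets the threshold."""
    return [pos for pos, value in profile if value >= threshold]


def compare_profiles(original: list[tuple[int, float]],
                     modified: list[tuple[int, float]],
                     threshold: float = 2.0) -> dict:
    before = set(predicted_pauses(original, threshold))
    after = set(predicted_pauses(modified, threshold))
    return {"removed": sorted(before - after),
            "introduced": sorted(after - before),
            "improved": len(after) < len(before)}


print(compare_profiles([(1, 2.5), (2, 0.1), (3, 3.0)], [(1, 0.2), (2, 0.1), (3, 2.1)]))
# {'removed': [1], 'introduced': [], 'improved': True}
```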
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes a DNA sequence of the embodiments provided herein operably linked to an expression control sequence.
  • an expression vector is a DNA or RNA vector that is capable of transforming a host cell and of effecting expression of a specified nucleic acid molecule.
  • the expression vector is also capable of replicating within the host cell.
  • Expression vectors can be either prokaryotic or eukaryotic, and are typically viruses or plasmids.
  • The term "operably linked" refers to a functional linkage between a nucleic acid expression control sequence (such as a promoter or an array of transcription factor binding sites) and a second nucleic acid sequence, wherein the expression control sequence directs transcription of the nucleic acid corresponding to the second sequence.
  • An operably linked expression vector can also include secretion signals and other modifying sequences, and may encode chaperones and proteins for a variety of organisms and systems.
  • Methods of expressing polypeptides from polypeptide-encoding nucleotide sequences are known in the art, as exemplified, for example, by the techniques described in Maniatis et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, N. Y. and Ausubel et al., 2006, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, N.Y.
  • the methods include inserting a polypeptide-encoding nucleotide sequence designed by the methods provided herein into a cell, and expressing the polypeptide-encoding nucleotide sequence under conditions suitable for gene expression. Additionally provided expression methods include cell-free expression systems as known in the art, where such methods include providing a polypeptide-encoding nucleotide sequence designed by the methods provided herein and contacting the polypeptide-encoding nucleotide sequence with a cell-free expression system under conditions suitable for protein translation.
  • one or more, or all, of the enzymes are heterologous to the one or more host organisms.
  • the translational kinetics of each of the DNA sequences encoding the enzymes has been increased by silent permutation or conservative amino acid substitution of at least 1, 2, or 3 codon pairs present in the original sequence for each enzyme.
  • a silent permutation is a change to one or more nucleotides of a codon such that the encoded amino acid does not change.
  • the at least 1, 2, or 3 substituted codon pairs are predicted to cause a translational pause or slowing in the host organism, and the substituting codon pair is typically a codon pair not predicted to cause a translational pause or slowing in the host organism.
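The silent-permutation criterion above can be checked mechanically by translating the original and the replacement codon pair and comparing the amino acids. This minimal sketch assumes the standard genetic code (NCBI translation table 1) and does not handle conservative amino acid substitutions; the function names and example codons are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with bases in T, C, A, G order,
# first position varying slowest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_pair(pair: str) -> str:
    """Translate a 6-nucleotide codon pair into its two amino acids."""
    pair = pair.upper()
    if len(pair) != 6:
        raise ValueError("a codon pair has exactly 6 nucleotides")
    return CODON_TABLE[pair[:3]] + CODON_TABLE[pair[3:]]


def is_silent_substitution(original_pair: str, replacement_pair: str) -> bool:
    """True if the replacement codon pair encodes the same two amino acids."""
    return translate_pair(original_pair) == translate_pair(replacement_pair)


print(is_silent_substitution("CTGAGA", "TTACGT"))  # True: Leu-Arg -> Leu-Arg
print(is_silent_substitution("CTGAGA", "CTGAAA"))  # False: Leu-Arg -> Leu-Lys
```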
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme in the system has at least 50%, 60%, 70%, or 80%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, amino acid sequence identity with the original sequence of the enzyme.
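Percent identity figures like those above are normally computed from a sequence alignment. Because silent permutations and single conservative substitutions leave the protein length unchanged, this sketch assumes the two sequences are already aligned without gaps and simply counts matching positions; the example fragments are hypothetical and are not taken from TrCBH-II.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length protein sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-residue fragments differing by one conservative substitution (I -> V)
print(percent_identity("MIVGILTTLA", "MIVGVLTTLA"))  # 90.0
```

Sequences that differ in length would need a real alignment (for example a global Needleman-Wunsch alignment) before identity is counted.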
  • one or more of the endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase enzymes in the system retain at least 75% of the enzymatic activity of the enzyme encoded by the original sequence under conditions suitable for degradation of cellulose.
  • Methods for measuring the activity of the enzymes in the system are known in the art.
  • the incorporated materials of U.S. Patent No. 6,566,113 provide methods for measuring the activity of cellobiohydrolases that have been recombinantly expressed.
  • Also provided are methods of hydrolyzing a carbohydrate comprising providing a carbohydrate comprising at least one glycosidic bond, providing a polypeptide encoded by any of the polynucleotides provided herein, and contacting said carbohydrate with said polypeptide under conditions that permit said polypeptide to hydrolyze at least one glycosidic bond of said carbohydrate, whereby at least one glycosidic bond of said carbohydrate is hydrolyzed.
  • the carbohydrate is cellulose.
  • the carbohydrate comprises two or more β-1,4-linked glucose units.
  • Such methods can be performed using the cells and systems provided herein. Such methods can be performed in order to provide smaller polysaccharides and/or monosaccharides which can be used by a cell or processed extracellularly according to any one of a variety of known methods in the art.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in yeast.
  • the chi-squared value "chisq1" was calculated from the observed and expected numbers of occurrences determined for each codon pair.
  • chisq1 was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • chisq2 was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value so that it is reported as the number of standard deviations from the mean chisq3 value.
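The first and last steps of the calculation outlined above can be sketched in Python. This sketch assumes that the expected count for a codon pair under random use is the total number of observed pairs multiplied by the frequencies of its two codons, and it applies the z-score normalization directly to chisq1, because the amino-acid-pair and dinucleotide corrections that yield chisq2 and chisq3 are not specified here. Function names, the use of the population standard deviation, and the toy sequences are assumptions.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def chisq1_by_pair(cds_list: Iterable[str]) -> Dict[Pair, float]:
    """Per-pair chi-squared term (O - E)^2 / E for every pair of observed codons.

    Assumption: E = total_pairs * f(c1) * f(c2), i.e. codons pair at random
    in proportion to their overall frequencies."""
    codon_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for cds in cds_list:
        cs = codons(cds)
        codon_counts.update(cs)
        pair_counts.update(zip(cs, cs[1:]))  # adjacent in-frame pairs within one gene only
    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        raise ValueError("need at least one sequence with two or more codons")
    result: Dict[Pair, float] = {}
    for c1, c2 in product(codon_counts, repeat=2):
        expected = total_pairs * (codon_counts[c1] / total_codons) * (codon_counts[c2] / total_codons)
        observed = pair_counts[(c1, c2)]  # Counter returns 0 for unseen pairs
        result[(c1, c2)] = (observed - expected) ** 2 / expected
    return result


def z_scores(values: Dict[Pair, float]) -> Dict[Pair, float]:
    """Number of (population) standard deviations each value lies from the mean."""
    vals = list(values.values())
    mu = mean(vals)
    sd = pstdev(vals)
    return {pair: ((v - mu) / sd if sd else 0.0) for pair, v in values.items()}


# Hypothetical toy "genome" of two short coding sequences
genes = ["ATGGCTGCTAAAGCTTAA", "ATGAAAGCTGCTAAATAA"]
z = z_scores(chisq1_by_pair(genes))
print(len(z))  # 16 codon pairs scored (4 distinct codons, squared)
```

Because each term is squared, chisq1 on its own does not distinguish over-represented from under-represented codon pairs.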
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for S. cerevisiae.
  • the DNA sequence encoding TrCBH-II (SEQ ID NO: 1) was derived from GenBank accession number M16190 by removing untranslated sequence (5' untranslated region and introns).
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in T. reesei was prepared by plotting z scores of translational kinetics values for codon pair utilization in T. reesei as a function of codon pair position. The graphical display is provided in Figure 1.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • the graphical display is provided in Figure 2A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 3) was found to encode a protein (SEQ ID NO: 4) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 3) encoding the TrCBH-II protein (SEQ ID NO: 4) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 2B.
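The graphical displays in this example plot z scores against codon-pair position, and the modified gene was required to contain no codon pair with a z score above 3. Assuming a precomputed table of host-specific z scores keyed by codon pair, this sketch builds that per-position profile for a coding sequence and lists the positions that would need replacing. The table values, the sequence, and the choice to score unlisted pairs as 0.0 are hypothetical; choosing the replacement codons is not shown.

```python
from typing import Dict, List, Tuple


def pair_profile(cds: str, z_table: Dict[Tuple[str, str], float]) -> List[float]:
    """z score for each codon-pair position (pair i = codons i and i + 1)."""
    cds = cds.upper()
    cs = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Pairs absent from the host table are scored 0.0 (assumed neutral)
    return [z_table.get((a, b), 0.0) for a, b in zip(cs, cs[1:])]


def flag_pauses(profile: List[float], threshold: float = 3.0) -> List[int]:
    """0-based codon-pair positions whose z score exceeds the threshold."""
    return [i for i, z in enumerate(profile) if z > threshold]


# Hypothetical host table and sequence
host_z = {("ATG", "GCT"): 0.5, ("GCT", "CGA"): 4.2, ("CGA", "AAA"): -0.7, ("AAA", "TAA"): 3.1}
profile = pair_profile("ATGGCTCGAAAATAA", host_z)
print(profile)               # [0.5, 4.2, -0.7, 3.1]
print(flag_pauses(profile))  # [1, 3]
```

Plotting `profile` against its index reproduces the kind of display described in this example; an optimized sequence would return an empty list from `flag_pauses`.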
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for E. coli.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 3A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 9) was found to encode a protein (SEQ ID NO: 10) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 9) encoding the TrCBH-II protein (SEQ ID NO: 10) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 3B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for P. pastoris.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 4A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 15) was found to encode a protein (SEQ ID NO: 16) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 15) encoding the TrCBH-II protein (SEQ ID NO: 16) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 4B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for K. lactis.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 5A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 15) was found to encode a protein (SEQ ID NO: 16) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 15) encoding the TrCBH-II protein (SEQ ID NO: 16) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 5B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for Z. mobilis.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 6A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 15) was found to encode a protein (SEQ ID NO: 16) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 15) encoding the TrCBH-II protein (SEQ ID NO: 16) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 6B.
  • An overnight culture is inoculated at 1:100 into 5 ml of LB medium plus 100 µg/ml ampicillin and grown at 37°C to an OD600 of 0.5. Protein expression is induced by addition of 0.002% or 0.02% L-arabinose, and the culture is grown for 3 h at 37°C.
  • Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce).
  • Proteins are transferred to an Immobilon-P membrane (Millipore, Bedford, MA), which is incubated with rabbit polyclonal anti-CBH-II antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
  • Western blot analysis demonstrates that changes to a polypeptide-encoding nucleic acid sequence can increase expression of the polypeptide, particularly when the polypeptide is heterologously expressed.
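The culture volumes in the expression protocol above follow from simple dilution arithmetic. This sketch reads 1:100 as one part inoculum in 100 parts final volume and treats the arabinose percentages as % (w/v); the 20% (w/v) stock concentration is a hypothetical value, since the protocol does not state one.

```python
def inoculum_volume_ul(culture_ml: float, dilution: int = 100) -> float:
    """Inoculum for a 1:dilution subculture, read as 1 part in `dilution` parts of final volume."""
    return culture_ml / dilution * 1000.0


def stock_volume_ul(final_pct: float, culture_ml: float, stock_pct: float) -> float:
    """Solve C_stock * V_stock = C_final * V_final for V_stock, in microlitres."""
    return final_pct * culture_ml / stock_pct * 1000.0


CULTURE_ML = 5.0          # culture volume from the protocol
ASSUMED_STOCK_PCT = 20.0  # hypothetical L-arabinose stock, % (w/v); not stated in the source

print(f"inoculum: {inoculum_volume_ul(CULTURE_ML):.0f} ul")
for final_pct in (0.002, 0.02):
    vol = stock_volume_ul(final_pct, CULTURE_ML, ASSUMED_STOCK_PCT)
    print(f"{final_pct}% arabinose: {vol:.2f} ul of stock")
```

Under these assumptions the sketch reports a 50 µl inoculum and 0.5 µl or 5 µl of stock for 0.002% or 0.02% L-arabinose; a more dilute stock can be substituted if sub-microlitre additions are impractical.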

Abstract

The present invention relates to polynucleotide sequences and synthetic genes encoding cellobiohydrolase enzymes for expression in a host organism with improved and/or refined translational kinetics, and to methods for preparing them. The resulting cellobiohydrolase-encoding nucleotide sequence is expected to be translated rapidly over its entire length. Expression of the resulting cellobiohydrolase-encoding nucleotide sequence is expected to yield improved levels of protein expression in cases where inappropriate or excessive translational pauses reduce protein expression. In addition, expression of the resulting cellobiohydrolase-encoding nucleotide sequence is expected to yield improved levels of active and/or natively folded, functional polypeptides in cases where inappropriate or excessive translational pauses cause expression of an inactive, insoluble, dysfunctional, or poorly active cellobiohydrolase.
PCT/US2008/062957 2007-05-07 2008-05-07 Cellobiohydrolase-encoding nucleotide sequences having refined translational kinetics and methods for their preparation WO2008137958A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US91637507P 2007-05-07 2007-05-07
US60/916,375 2007-05-07

Publications (1)

Publication Number Publication Date
WO2008137958A1 true WO2008137958A1 (fr) 2008-11-13

Family

ID=39684161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/062957 WO2008137958A1 (fr) 2007-05-07 2008-05-07 Cellobiohydrolase-encoding nucleotide sequences having refined translational kinetics and methods for their preparation

Country Status (1)

Country Link
WO (1) WO2008137958A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010066411A2 (fr) * 2008-12-10 2010-06-17 Direvo Industrial Biotechnology Gmbh Polypeptides having cellobiohydrolase II activity
WO2011051806A3 (fr) * 2009-10-26 2011-06-23 Riaan Den Haan Heterologous expression of fungal cellobiohydrolase 2 genes in yeast
CN106650307A (zh) * 2016-09-21 2017-05-10 武汉伯远生物科技有限公司 A gene codon optimization method based on codon pair usage frequency

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007130606A2 (fr) * 2006-05-04 2007-11-15 The Regents Of The University Of California Translational kinetics analysis using graphical displays of translational kinetics values of codon pairs

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DATABASE Geneseq [online] 25 August 2005 (2005-08-25), "T. reesei cellobiohydrolase II (Cel6A) cDNA.", XP002492982, retrieved from EBI accession no. GSN:AEA49934 Database accession no. AEA49934 *
GUTMAN G A ET AL: "Nonrandom utilization of codon pairs in Escherichia coli", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE, WASHINGTON, DC, vol. 86, no. 10, 1 May 1989 (1989-05-01), pages 3699 - 3703, XP002460057, ISSN: 0027-8424 *
HATFIELD G WESLEY ET AL: "Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering™", BIOTECHNOLOGY ANNUAL REVIEW, XX, XX, vol. 13, 1 January 2007 (2007-01-01), pages 27 - 42, XP009092735 *
IRWIN B ET AL: "Codon pair utilization biases influence translational elongation step times", JOURNAL OF BIOLOGICAL CHEMISTRY, AMERICAN SOCIETY OF BIOLOCHEMICAL BIOLOGISTS, BIRMINGHAM,; US, vol. 270, no. 39, 29 September 1995 (1995-09-29), pages 22801 - 22806, XP002406003, ISSN: 0021-9258 *
THANARAJ T A ET AL: "RIBOSOME-MEDIATED TRANSLATIONAL PAUSE AND PROTEIN DOMAIN ORGANIZATION", PROTEIN SCIENCE, CAMBRIDGE UNIVERSITY PRESS, CAMBRIDGE, GB, vol. 5, no. 8, 1 January 1996 (1996-01-01), pages 1594 - 1612, XP009014613, ISSN: 0961-8368 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010066411A2 (fr) * 2008-12-10 2010-06-17 Direvo Industrial Biotechnology Gmbh Polypeptides having cellobiohydrolase II activity
WO2010066411A3 (fr) * 2008-12-10 2010-09-16 Direvo Industrial Biotechnology Gmbh Polypeptides having cellobiohydrolase II activity
US8409839B2 (en) 2008-12-10 2013-04-02 Direvo Industrial Biotechnology Gmbh Polypeptides having cellobiohydrolase II activity
EP2626421A1 (fr) * 2008-12-10 2013-08-14 Direvo Industrial Biotechnology GmbH Improved enzymes for biomass conversion
WO2011051806A3 (fr) * 2009-10-26 2011-06-23 Riaan Den Haan Heterologous expression of fungal cellobiohydrolase 2 genes in yeast
CN102666849A (zh) * 2009-10-26 2012-09-12 斯泰伦博斯大学 Heterologous expression of fungal cellobiohydrolase 2 genes in yeast
US9447398B2 (en) 2009-10-26 2016-09-20 Stellenbosch University Heterologous expression of fungal cellobiohydrolase 2 genes in yeast
US10196622B2 (en) 2009-10-26 2019-02-05 Stellenbosch University Heterologous expression of fungal cellobiohydrolase 2 genes in yeast
CN106650307A (zh) * 2016-09-21 2017-05-10 武汉伯远生物科技有限公司 A gene codon optimization method based on codon pair usage frequency
CN106650307B (zh) * 2016-09-21 2019-04-05 武汉伯远生物科技有限公司 A gene codon optimization method based on codon pair usage frequency

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08755135

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08755135

Country of ref document: EP

Kind code of ref document: A1