US20150246961A1 - Method for the expression of polypeptides using modified nucleic acids - Google Patents

Info

Publication number
US20150246961A1
Authority
US
United States
Prior art keywords
amino acid
codons
codon
seq
acid residue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/517,516
Inventor
Stefan Klostermann
Erhard Kopetzki
Ursula Schwarz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hoffmann La Roche Inc
Original Assignee
Hoffmann La Roche Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=48184161&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US20150246961(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Hoffmann La Roche Inc
Assigned to HOFFMANN-LA ROCHE INC. reassignment HOFFMANN-LA ROCHE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: F. HOFFMANN-LA ROCHE AG
Assigned to ROCHE DIAGNOSTICS GMBH reassignment ROCHE DIAGNOSTICS GMBH ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KOPETZKI, ERHARD, SCHWARZ, URSULA, KLOSTERMANN, STEFAN
Assigned to F. HOFFMANN-LA ROCHE AG reassignment F. HOFFMANN-LA ROCHE AG ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ROCHE DIAGNOSTICS GMBH
Publication of US20150246961A1
Priority to US15/833,012 (US20180100006A1)
Current legal status: Abandoned

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K16/00 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 General methods for enhancing the expression
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/10 Immunoglobulins specific features characterized by their source of isolation or production
    • C07K2317/14 Specific host cells or culture conditions, e.g. components, pH or temperature

Definitions

  • the methods as reported herein are in the field of optimization of a polypeptide encoding nucleic acid and improved expression of a polypeptide encoded by a nucleic acid optimized with the method as reported herein.
  • the polypeptide encoding nucleic acid is characterized in that each amino acid is encoded by a group of codons, whereby each codon in the group of codons is defined by a specific usage frequency within the group that is related to the overall usage frequency of this codon in the genome of the cell, and whereby the usage frequency of the codons in the (total) polypeptide encoding nucleic acid is about the same as the usage frequency within the respective group.
  • One aspect as reported herein is a method for recombinantly producing a polypeptide in a cell comprising the step of cultivating a cell which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the cell or the cultivation medium,
  • amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons and the amino acid residues M and W are encoded by a single codon.
  • amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons comprising at least two codons and the amino acid residues M and W are encoded by a single codon.
  • the specific usage frequency of a codon is 100% if the amino acid residue is encoded by exactly one codon.
  • the amino acid residue G is encoded by a group of at most 4 codons.
  • the amino acid residue A is encoded by a group of at most 4 codons.
  • the amino acid residue V is encoded by a group of at most 4 codons.
  • the amino acid residue L is encoded by a group of at most 6 codons.
  • the amino acid residue I is encoded by a group of at most 3 codons.
  • the amino acid residue M is encoded by exactly 1 codon.
  • the amino acid residue P is encoded by a group of at most 4 codons.
  • the amino acid residue F is encoded by a group of at most 2 codons.
  • the amino acid residue W is encoded by exactly 1 codon.
  • the amino acid residue S is encoded by a group of at most 6 codons.
  • the amino acid residue T is encoded by a group of at most 4 codons.
  • the amino acid residue N is encoded by a group of at most 2 codons.
  • the amino acid residue Q is encoded by a group of at most 2 codons.
  • the amino acid residue Y is encoded by a group of at most 2 codons.
  • the amino acid residue C is encoded by a group of at most 2 codons.
  • the amino acid residue K is encoded by a group of at most 2 codons.
  • the amino acid residue R is encoded by a group of at most 6 codons.
  • the amino acid residue H is encoded by a group of at most 2 codons.
  • the amino acid residue D is encoded by a group of at most 2 codons.
  • the amino acid residue E is encoded by a group of at most 2 codons.
  • the amino acid residue G is encoded by a group of 1 to 4 codons.
  • the amino acid residue A is encoded by a group of 1 to 4 codons.
  • the amino acid residue V is encoded by a group of 1 to 4 codons.
  • the amino acid residue L is encoded by a group of 1 to 6 codons.
  • the amino acid residue I is encoded by a group of 1 to 3 codons.
  • the amino acid residue M is encoded by a group of 1 codon, i.e. by exactly 1 codon.
  • the amino acid residue P is encoded by a group of 1 to 4 codons.
  • the amino acid residue F is encoded by a group of 1 to 2 codons.
  • the amino acid residue W is encoded by a group of 1 codon, i.e. by exactly 1 codon.
  • the amino acid residue S is encoded by a group of 1 to 6 codons.
  • the amino acid residue T is encoded by a group of 1 to 4 codons.
  • the amino acid residue N is encoded by a group of 1 to 2 codons.
  • the amino acid residue Q is encoded by a group of 1 to 2 codons.
  • the amino acid residue Y is encoded by a group of 1 to 2 codons.
  • the amino acid residue C is encoded by a group of 1 to 2 codons.
  • the amino acid residue K is encoded by a group of 1 to 2 codons.
  • the amino acid residue R is encoded by a group of 1 to 6 codons.
  • the amino acid residue H is encoded by a group of 1 to 2 codons.
  • the amino acid residue D is encoded by a group of 1 to 2 codons.
  • the amino acid residue E is encoded by a group of 1 to 2 codons.
  • each of the groups comprises only codons with an overall usage frequency within the genome of the cell of more than 5%. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 8% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 10% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 15% or more.
  • the sequence of codons in the nucleic acid encoding the polypeptide for a specific amino acid residue in 5′ to 3′ direction corresponds to the sequence of codons in the respective amino acid codon motif.
  • the encoding nucleic acid comprises the codon that is the same as that at the corresponding sequential position in the amino acid codon motif of the respective specific amino acid.
  • the usage frequency of a codon in the amino acid codon motif is about the same as its specific usage frequency within its group.
  • the encoding nucleic acid comprises the codon that is at the first position of the amino acid codon motif.
  • the codons in the amino acid codon motif are distributed randomly throughout the amino acid codon motif.
  • the amino acid codon motif is selected from a group of amino acid codon motifs comprising all possible amino acid codon motifs obtainable by permutating codons therein wherein all motifs have the same number of codons and the codons in each motif have the same specific usage frequency.
  • the codons in the amino acid codon motif are arranged with decreasing specific usage frequency, whereby all codons of one usage frequency directly succeed each other. In one embodiment the codons of one codon usage frequency are grouped together.
  • the (different) codons in the amino acid codon motif are distributed uniformly throughout the amino acid codon motif.
  • the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency or the codon with the second lowest specific usage frequency the codon with the highest specific usage frequency is present (used).
  • the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency the codon with the highest specific usage frequency is present (used).
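One possible reading of the arrangement in which the codon with the highest specific usage frequency follows the codon with the lowest one is a round-robin ordering. The sketch below is an illustration under that assumption, not the applicants' procedure. The function name `round_robin_motif` and the input format (a list of codon/count pairs) are hypothetical; the 4:4:3:2 ratio for four alanine codons is used only as sample input.

```python
def round_robin_motif(ratio):
    """Build a codon motif by cycling through the codons in decreasing order.

    ratio: list of (codon, count) pairs, e.g. [("GCG", 4), ("GCT", 4)].
    Each pass emits every codon that still has occurrences left, so the most
    frequent codon always follows the least frequent codon still available.
    """
    # sorted() is stable, so codons with equal counts keep their input order.
    remaining = [[codon, count] for codon, count in
                 sorted(ratio, key=lambda item: -item[1])]
    motif = []
    while any(count > 0 for _, count in remaining):
        for entry in remaining:
            if entry[1] > 0:
                motif.append(entry[0])
                entry[1] -= 1
    return motif


# Illustrative input: four alanine codons at a 4:4:3:2 ratio.
ALA_RATIO = [("GCG", 4), ("GCT", 4), ("GCA", 3), ("GCC", 2)]
motif = round_robin_motif(ALA_RATIO)
print(" ".join(motif))
```

The resulting motif keeps the codon counts of the ratio and starts with the most frequent codon. A grouped or random arrangement would use the same counts in a different order.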
  • the cell is a prokaryotic cell.
  • the prokaryotic cell is an E. coli cell.
  • the cell is a eukaryotic cell selected from a CHO cell, a BHK cell, a HEK cell, a SP2/0 cell, or a NS0 cell.
  • the eukaryotic cell is a CHO cell.
  • the polypeptide is an antibody, or an antibody fragment, or an antibody fusion polypeptide.
  • One aspect as reported herein is a nucleic acid encoding a polypeptide, characterized in that each of the amino acid residues of the polypeptide is encoded by one or more (at least one) codon(s),
  • amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons and the amino acid residues M and W are encoded by a single codon.
  • amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons comprising at least two codons and the amino acid residues M and W are encoded by a single codon.
  • the specific usage frequency of a codon is 100% if the amino acid residue is encoded by exactly one codon.
  • the amino acid residue G is encoded by a group of at most 4 codons.
  • the amino acid residue A is encoded by a group of at most 4 codons.
  • the amino acid residue V is encoded by a group of at most 4 codons.
  • the amino acid residue L is encoded by a group of at most 6 codons.
  • the amino acid residue I is encoded by a group of at most 3 codons.
  • the amino acid residue M is encoded by exactly 1 codon.
  • the amino acid residue P is encoded by a group of at most 4 codons.
  • the amino acid residue F is encoded by a group of at most 2 codons.
  • the amino acid residue W is encoded by exactly 1 codon.
  • the amino acid residue S is encoded by a group of at most 6 codons.
  • the amino acid residue T is encoded by a group of at most 4 codons.
  • the amino acid residue N is encoded by a group of at most 2 codons.
  • the amino acid residue Q is encoded by a group of at most 2 codons.
  • the amino acid residue Y is encoded by a group of at most 2 codons.
  • the amino acid residue C is encoded by a group of at most 2 codons.
  • the amino acid residue K is encoded by a group of at most 2 codons.
  • the amino acid residue R is encoded by a group of at most 6 codons.
  • the amino acid residue H is encoded by a group of at most 2 codons.
  • the amino acid residue D is encoded by a group of at most 2 codons.
  • the amino acid residue E is encoded by a group of at most 2 codons.
  • the amino acid residue G is encoded by a group of 1 to 4 codons.
  • the amino acid residue A is encoded by a group of 1 to 4 codons.
  • the amino acid residue V is encoded by a group of 1 to 4 codons.
  • the amino acid residue L is encoded by a group of 1 to 6 codons.
  • the amino acid residue I is encoded by a group of 1 to 3 codons.
  • the amino acid residue M is encoded by a group of 1 codon.
  • the amino acid residue P is encoded by a group of 1 to 4 codons.
  • the amino acid residue F is encoded by a group of 1 to 2 codons.
  • the amino acid residue W is encoded by a group of 1 codon.
  • the amino acid residue S is encoded by a group of 1 to 6 codons.
  • the amino acid residue T is encoded by a group of 1 to 4 codons.
  • the amino acid residue N is encoded by a group of 1 to 2 codons.
  • the amino acid residue Q is encoded by a group of 1 to 2 codons.
  • the amino acid residue Y is encoded by a group of 1 to 2 codons.
  • the amino acid residue C is encoded by a group of 1 to 2 codons.
  • the amino acid residue K is encoded by a group of 1 to 2 codons.
  • the amino acid residue R is encoded by a group of 1 to 6 codons.
  • the amino acid residue H is encoded by a group of 1 to 2 codons.
  • the amino acid residue D is encoded by a group of 1 to 2 codons.
  • the amino acid residue E is encoded by a group of 1 to 2 codons.
  • each of the groups comprises only codons with an overall usage frequency within the genome of the cell of more than 5%. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 8% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 10% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 15% or more.
  • the sequence of codons in the nucleic acid encoding the polypeptide for a specific amino acid residue in 5′ to 3′ direction corresponds to the sequence of codons in the respective amino acid codon motif.
  • the encoding nucleic acid comprises the codon that is the same as that at the corresponding sequential position in the amino acid codon motif of the respective specific amino acid.
  • the usage frequency of a codon in the amino acid codon motif is about the same as its specific usage frequency within its group.
  • the encoding nucleic acid comprises the codon that is at the first position of the amino acid codon motif.
  • each of the codons in the amino acid codon motif is distributed randomly throughout the amino acid codon motif.
  • each of the codons in the amino acid codon motif is distributed evenly throughout the amino acid codon motif.
  • the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency or the codon with the second lowest specific usage frequency the codon with the highest specific usage frequency is used.
  • the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency the codon with the highest specific usage frequency is used.
  • One aspect as reported herein is a cell comprising a nucleic acid as reported herein.
  • One aspect as reported herein is a method for increasing the expression of a polypeptide in a prokaryotic cell or a eukaryotic cell comprising the step of,
  • FIG. 1 shows the Western blot of the polypeptide containing supernatants of differently encoded poly-His-tagged test-polypeptide.
  • FIG. 2 shows the Western blot of the SDS-extracted cell pellet of differently encoded poly-His-tagged test-polypeptide.
  • FIG. 3 shows the protein reference standard curve obtained from five known scFv-poly-His concentrations.
  • amino acid denotes the group of carboxy α-amino acids, which directly or in form of a precursor can be encoded by a nucleic acid.
  • the individual amino acids are encoded by nucleic acids consisting of three nucleotides, so called codons or base-triplets. Each amino acid is encoded by at least one codon. The encoding of the same amino acid by different codons is known as “degeneration of the genetic code”.
  • amino acid denotes the naturally occurring carboxy α-amino acids, comprising alanine (three letter code: ala, one letter code: A), arginine (arg, R), asparagine (asn, N), aspartic acid (asp, D), cysteine (cys, C), glutamine (gln, Q), glutamic acid (glu, E), glycine (gly, G), histidine (his, H), isoleucine (ile, I), leucine (leu, L), lysine (lys, K), methionine (met, M), phenylalanine (phe, F), proline (pro, P), serine (ser, S), threonine (thr, T), tryptophan (trp, W), tyrosine (tyr, Y), and valine (val, V).
  • antibody herein is used in the broadest sense and encompasses various antibody structures, including but not limited to monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments so long as they exhibit the desired antigen-binding activity.
  • antibody fragment refers to a molecule other than an intact antibody that comprises a portion of an intact antibody that binds the antigen to which the intact antibody binds.
  • antibody fragments include but are not limited to Fv, Fab, Fab′, Fab′-SH, F(ab′)2; diabodies; linear antibodies; single-chain antibody molecules (e.g. scFv); and multispecific antibodies formed from antibody fragments.
  • codon denotes an oligonucleotide consisting of three nucleotides that is encoding a defined amino acid. Due to the degeneracy of the genetic code most amino acids are encoded by more than one codon. These different codons encoding the same amino acid have different relative usage frequencies in individual host cells. Thus, a specific amino acid is encoded either by exactly one codon or by a group of different codons. Likewise the amino acid sequence of a polypeptide can be encoded by different nucleic acids. Therefore, a specific amino acid (residue) in a polypeptide can be encoded by a group of different codons, whereby each of these codons has a usage frequency within a given host cell.
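The group sizes listed above (at most 6 codons for L, S and R, exactly 1 for M and W, and so on) follow from the degeneracy of the standard genetic code. As a minimal sketch, the table can be built from the widely used 64-letter encoding of the standard code (bases in T, C, A, G order) and the synonymous codons counted per amino acid. The names `CODON_TABLE` and `synonymous_codon_counts` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_counts():
    """Return {one-letter amino acid: number of codons encoding it}."""
    return dict(Counter(aa for aa in CODON_TABLE.values() if aa != "*"))


counts = synonymous_codon_counts()
for aa in sorted(counts):
    print(aa, counts[aa])
```

The counts give the upper bound on the size of each group of codons; a group used for optimization may contain fewer codons than this maximum.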
  • codon usage tables are available from e.g. the “Codon Usage Database” (www.kazusa.or.jp/codon/), Nakamura, Y., et al., Nucl. Acids Res. 28 (2000) 292.
  • the codon usage tables for yeast, E. coli, Homo sapiens and hamster have been reproduced from “EMBOSS: The European Molecular Biology Open Software Suite” (Rice, P., et al., Trends Gen. 16 (2000) 276-277, Release 6.0.1, 15.07.2009) and are shown in the following tables.
  • the different codon usage frequencies for the 20 naturally occurring amino acids for E. coli, yeast, human cells, and CHO cells have been calculated for each amino acid, rather than for all 64 codons.
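Calculating usage frequencies "for each amino acid, rather than for all 64 codons" amounts to normalizing each codon count by the total count of its synonymous codons. The sketch below shows that normalization. The function name, the input format, and the counts in the example are hypothetical placeholders and are not taken from any real codon usage table.

```python
from collections import defaultdict


def per_amino_acid_frequencies(codon_counts, codon_to_aa):
    """Convert raw codon counts into percentages within each amino acid.

    codon_counts: {codon: count or per-thousand value}
    codon_to_aa:  {codon: one-letter amino acid}
    Assumes every amino acid present has a positive total count.
    """
    totals = defaultdict(float)
    for codon, n in codon_counts.items():
        totals[codon_to_aa[codon]] += n
    return {
        codon: 100.0 * n / totals[codon_to_aa[codon]]
        for codon, n in codon_counts.items()
    }


# Hypothetical counts, for illustration only.
EXAMPLE_COUNTS = {"GCG": 30, "GCT": 20, "GCA": 25, "GCC": 25, "ATG": 27}
EXAMPLE_MAP = {"GCG": "A", "GCT": "A", "GCA": "A", "GCC": "A", "ATG": "M"}

freqs = per_amino_acid_frequencies(EXAMPLE_COUNTS, EXAMPLE_MAP)
print(freqs)
```

With this normalization the frequencies of all codons for one amino acid sum to 100%, and an amino acid with a single codon, such as methionine, has a frequency of 100%.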
  • the term “expression” as used herein refers to transcription and/or translation processes occurring within a cell.
  • the level of transcription of a nucleic acid sequence of interest in a cell can be determined on the basis of the amount of corresponding mRNA that is present in the cell.
  • mRNA transcribed from a sequence of interest can be quantitated by RT-PCR (qRT-PCR) or by Northern hybridization (see Sambrook, J., et al., 1989, supra).
  • Polypeptides encoded by a nucleic acid of interest can be quantitated by various methods, e.g.
  • An “expression cassette” refers to a construct that contains the necessary regulatory elements, such as promoter and polyadenylation site, for expression of at least the contained nucleic acid in a cell.
  • polypeptide(s) of interest are in general secreted polypeptides and therefore contain an N-terminal extension (also known as the signal sequence) which is necessary for the transport/secretion of the polypeptide through the cell wall into the extracellular medium.
  • the signal sequence can be derived from any gene encoding a secreted polypeptide. If a heterologous signal sequence is used, it preferably is one that is recognized and processed (i.e. cleaved by a signal peptidase) by the host cell.
  • the native signal sequence of a heterologous gene to be expressed may be substituted by a homologous yeast signal sequence derived from a secreted gene, such as the yeast invertase signal sequence, alpha-factor leader (including Saccharomyces, Kluyveromyces, Pichia, and Hansenula α-factor leaders, the second described in U.S. Pat. No. 5,010,182), acid phosphatase signal sequence, or the C. albicans glucoamylase signal sequence (EP 0 362 179).
  • the native signal sequence of the protein of interest is satisfactory, although other mammalian signal sequences may be suitable, such as signal sequences from secreted polypeptides of the same or related species, e.g. for immunoglobulins from human or murine origin, as well as viral secretory signal sequences, for example, the herpes simplex glycoprotein D signal sequence.
  • the DNA fragment encoding for such a pre-segment is ligated in frame, i.e. operably linked, to the DNA fragment encoding a polypeptide of interest.
  • cell refers to a cell into which a nucleic acid, e.g. encoding a heterologous polypeptide, can be or is transfected.
  • the term “cell” includes both prokaryotic cells, which are used for expression of a nucleic acid and production of the encoded polypeptide including propagation of plasmids, and eukaryotic cells, which are used for the expression of a nucleic acid and production of the encoded polypeptide.
  • the eukaryotic cells are mammalian cells.
  • the mammalian cell is a CHO cell, optionally a CHO K1 cell (ATCC CCL-61 or DSM ACC 110), or a CHO DG44 cell (also known as CHO-DHFR[-], DSM ACC 126), or a CHO XL99 cell, a CHO-T cell (see e.g. Morgan, D., et al., Biochemistry 26 (1987) 2959-2963), or a CHO-S cell, or a Super-CHO cell (Pak, S. C. O., et al. Cytotechnology 22 (1996) 139-146). If these cells are not adapted to growth in serum-free medium or in suspension an adaptation prior to the use in the current method is to be performed.
  • the expression “cell” includes the subject cell and its progeny.
  • the words “transformant” and “transformed cell” include the primary subject cell and cultures derived therefrom without regard for the number of transfers or subcultivations. It is also understood that all progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Variant progeny that have the same function or biological activity as screened for in the originally transformed cell are included.
  • the eukaryotic cell is a yeast cell.
  • the yeast cell is of the genus Saccharomyces, or Pichia, or Hansenula, or Kluyveromyces, or Schizosaccharomyces.
  • the prokaryotic cell is an Escherichia cell, or a Salmonella cell, or a Bacillus cell, or a Lactococcus cell, or a Streptococcus cell.
  • the eukaryotic cell is a plant cell.
  • the plant cell is of the genus Arabidopsis, Tobacco and Tomato.
  • codon optimization denotes the exchange of one, at least one, or more than one codon in a polypeptide encoding nucleic acid for a different codon with a different usage frequency in a respective cell.
  • codon-optimized nucleic acid denotes a nucleic acid encoding a polypeptide that has been adapted for improved expression in a cell, e.g. a mammalian cell or a bacterial cell, by replacing one, at least one, or more than one codon in a parent polypeptide encoding nucleic acid with a codon encoding the same amino acid residue with a different relative frequency of usage in the cell.
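Because codon optimization only exchanges codons for synonymous ones, the optimized nucleic acid must translate to the same polypeptide as its parent. The following sketch checks that property and counts the exchanged codons. It assumes the standard genetic code; the function names and the two short example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in T, C, A, G order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence (length divisible by 3) codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_codon_changes(parent, optimized):
    """Return the number of exchanged codons; fail if the protein changes."""
    if translate(parent) != translate(optimized):
        raise ValueError("sequences do not encode the same polypeptide")
    parent, optimized = parent.upper(), optimized.upper()
    return sum(
        parent[i:i + 3] != optimized[i:i + 3] for i in range(0, len(parent), 3)
    )


# Hypothetical example: Met-Ala-Lys encoded in two different ways.
PARENT = "ATGGCCAAA"
OPTIMIZED = "ATGGCGAAG"
print(translate(PARENT), count_codon_changes(PARENT, OPTIMIZED))
```

A check of this kind is a useful guard after any automated codon exchange, since a single wrong base would otherwise alter the encoded polypeptide unnoticed.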
  • a “gene” denotes a nucleic acid which is a segment e.g. on a chromosome or on a plasmid which can effect the expression of a peptide, polypeptide, or protein. Besides the coding region, i.e. the structural gene, a gene comprises other functional elements e.g. a signal sequence, promoter(s), introns, and/or terminators.
  • group of codons and semantic equivalents thereof denote a defined number of different codons encoding one (i.e. the same) amino acid residue.
  • the individual codons of one group differ in their overall usage frequency in the genome of a cell.
  • Each codon in a group of codons has a specific usage frequency within the group that depends on the number of codons in the group. This specific usage frequency within the group can differ from the overall usage frequency in the genome of a cell but depends on (i.e. is related to) the overall usage frequency.
  • a group of codons may comprise only one codon but can comprise also up to six codons.
  • overall usage frequency in the genome of a cell denotes the frequency of occurrence of a specific codon in the entire genome of a cell.
  • specific usage frequency of a codon in a group of codons denotes the frequency with which a single (i.e. a specific) codon of a group of codons in relation to all codons of one group can be found in a nucleic acid encoding a polypeptide obtained with a method as reported herein.
  • the value of the specific usage frequency depends on the overall usage frequency of the specific codon in the genome of a cell and the number of codons in the group.
  • the specific usage frequency of a codon in a group of codons is at least the same as its overall usage frequency in the genome of a cell and at most 100%, i.e. it is at least the same but can be more than the overall usage frequency in the genome of a cell.
  • the sum of specific codon usage frequencies of all members of a group of codons is always about 100%.
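The properties stated above (codons below a threshold are excluded from the group, each specific usage frequency is at least the overall frequency, and the group sums to about 100%) are all satisfied by dropping rare codons and renormalizing the rest. The sketch below shows that calculation as one plausible reading, not as the applicants' exact rule. The function name, the threshold parameter, and the leucine percentages are hypothetical.

```python
def specific_usage_frequencies(overall, min_overall=5.0):
    """Derive specific usage frequencies (in %) for one amino acid.

    overall: {codon: overall usage frequency in %, per amino acid}
    min_overall: codons are kept only if their overall frequency is
                 strictly greater than this value (the "more than 5%" case).
    """
    kept = {codon: f for codon, f in overall.items() if f > min_overall}
    if not kept:
        raise ValueError("no codon passes the usage threshold")
    total = sum(kept.values())
    # Renormalize so the remaining codons of the group sum to 100%.
    return {codon: 100.0 * f / total for codon, f in kept.items()}


# Hypothetical per-amino-acid percentages for the six leucine codons.
LEU_OVERALL = {"CTG": 50.0, "TTA": 13.0, "TTG": 13.0,
               "CTT": 10.0, "CTC": 10.0, "CTA": 4.0}

leu_specific = specific_usage_frequencies(LEU_OVERALL)
print(leu_specific)
```

In this example the rare codon CTA is excluded, and the share it held is redistributed proportionally over the five remaining codons. Rounding the result to small integer ratios, as in a 4:4:3:2 motif, would be a separate step.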
  • amino acid codon motif denotes a sequence of codons, which all are members of the same group of codons and, thus, encode the same amino acid residue.
  • the number of different codons in an amino acid codon motif is the same as the number of different codons in a group of codons but each codon can be present more than once in the amino acid codon motif. Further, each codon is present in the amino acid codon motif at its specific usage frequency.
  • the amino acid codon motif represents a sequence of different codons encoding the same amino acid residue wherein each of the different codons is present at its specific usage frequency, wherein the sequence starts with the codon having the highest specific usage frequency, and wherein the codons are arranged in a defined sequence.
  • the group of codons encoding the amino acid residue alanine comprises the four codons GCG, GCT, GCA and GCC with a specific usage frequency of 32%, 28%, 24% and 16%, respectively (corresponding to a 4:4:3:2 ratio).
  • the amino acid codon motif for the amino acid residue alanine is defined in comprising the four codons GCG, GCT, GCA, and GCC at a ratio of 4:4:3:2, wherein the first codon is GCG.
  • at the first occurrence of the amino acid alanine in the amino acid sequence of the polypeptide, the first codon of the amino acid codon motif is used in the corresponding encoding nucleic acid.
  • at the second occurrence, the second codon of the amino acid codon motif is used, and so on.
  • at the thirteenth occurrence, the codon at the thirteenth, i.e. the last, position of the amino acid codon motif is used in the corresponding encoding nucleic acid.
  • at the fourteenth occurrence of the amino acid alanine in the amino acid sequence of the polypeptide, the first codon of the amino acid codon motif is used again, and so on.
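The alanine example can be sketched in a few lines: a 13-position motif is built from the 4:4:3:2 ratio and then read cyclically, one position per occurrence of the amino acid. The order of codons inside the motif is an assumption here (grouped by decreasing frequency, starting with GCG); the function names `grouped_motif` and `assign_codons` are illustrative.

```python
from itertools import cycle


def grouped_motif(ratio):
    """Expand (codon, count) pairs into a motif, codons grouped in input order."""
    motif = []
    for codon, count in ratio:
        motif.extend([codon] * count)
    return motif


def assign_codons(protein, motifs):
    """Pick a codon for each residue by walking that residue's motif cyclically.

    protein: amino acid sequence in one-letter code
    motifs:  {one-letter amino acid: list of codons (the motif)}
    """
    cursors = {aa: cycle(motif) for aa, motif in motifs.items()}
    return [next(cursors[aa]) for aa in protein]


# Alanine codons and ratio as given in the text; the ordering is assumed.
ALA_MOTIF = grouped_motif([("GCG", 4), ("GCT", 4), ("GCA", 3), ("GCC", 2)])

# Fourteen alanine residues: the 14th wraps around to the first motif position.
codons = assign_codons("A" * 14, {"A": ALA_MOTIF})
print(" ".join(codons))
```

Each amino acid keeps its own position counter, so occurrences of other residues between two alanines do not advance the alanine motif.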
  • nucleic acid or a “nucleic acid sequence”, which terms are used interchangeably within this application, refers to a polymeric molecule consisting of individual nucleotides (also called bases) a, c, g, and t (or u in RNA), for example to DNA, RNA, or modifications thereof.
  • This polynucleotide molecule can be a naturally occurring polynucleotide molecule or a synthetic polynucleotide molecule or a combination of one or more naturally occurring polynucleotide molecules with one or more synthetic polynucleotide molecules. Also encompassed by this definition are naturally occurring polynucleotide molecules in which one or more nucleotides are changed (e.g.
  • a nucleic acid can either be isolated, or integrated in another nucleic acid, e.g. in an expression cassette, a plasmid, or the chromosome of a host cell.
  • a nucleic acid is characterized by its nucleic acid sequence consisting of individual nucleotides.
  • nucleic acid is characterized by its nucleic acid sequence consisting of individual nucleotides and likewise by the amino acid sequence of a polypeptide encoded thereby.
  • a “structural gene” denotes the region of a gene without a signal sequence, i.e. the coding region.
  • a “transfection vector” is a nucleic acid (also denoted as nucleic acid molecule) providing all required elements for the expression, in a host cell, of the coding nucleic acids/structural gene(s) comprised in the transfection vector.
  • a transfection vector comprises a prokaryotic plasmid propagation unit, e.g. for E. coli, in turn comprising a prokaryotic origin of replication and a nucleic acid conferring resistance to a prokaryotic selection agent; the transfection vector further comprises one or more nucleic acid(s) conferring resistance to a eukaryotic selection agent, and one or more nucleic acid(s) encoding a polypeptide of interest.
  • each expression cassette comprises a promoter, a coding nucleic acid, and a transcription terminator including a polyadenylation signal.
  • Gene expression is usually placed under the control of a promoter, and such a structural gene is said to be “operably linked to” the promoter.
  • a regulatory element and a core promoter are operably linked if the regulatory element modulates the activity of the core promoter.
  • vector refers to a nucleic acid molecule capable of propagating another nucleic acid to which it is linked.
  • the term includes the vector as a self-replicating nucleic acid structure as well as the vector incorporated into the genome of a host cell into which it has been introduced.
  • Certain vectors are capable of directing the expression of nucleic acids to which they are operatively linked. Such vectors are referred to herein as “expression vectors”.
  • Antibodies may be produced using recombinant methods and compositions, e.g., as described in U.S. Pat. No. 4,816,567.
  • isolated nucleic acid encoding an antibody as reported herein is provided.
  • Such nucleic acid may encode an amino acid sequence comprising the VL and/or an amino acid sequence comprising the VH of the antibody (e.g., the light and/or heavy chains of the antibody).
  • one or more vectors e.g., expression vectors
  • a cell comprising such nucleic acid is provided.
  • a cell comprises (e.g., has been transformed with): (1) a vector comprising a nucleic acid that encodes an amino acid sequence comprising the VL of the antibody and an amino acid sequence comprising the VH of the antibody, or (2) a first vector comprising a nucleic acid that encodes an amino acid sequence comprising the VL of the antibody and a second vector comprising a nucleic acid that encodes an amino acid sequence comprising the VH of the antibody.
  • the cell is eukaryotic, e.g. a Chinese Hamster Ovary (CHO) cell or lymphoid cell (e.g., Y0, NS0, Sp2/0 cell).
  • a method of making an antibody comprises culturing a cell comprising a nucleic acid encoding the antibody, as provided herein, under conditions suitable for expression of the antibody, and optionally recovering the antibody from the cell (or culture medium).
  • nucleic acid encoding an antibody is isolated and inserted into one or more vectors for further cloning and/or expression in a cell.
  • nucleic acid may be readily isolated and sequenced using conventional procedures (e.g., by using oligonucleotide probes that are capable of binding specifically to genes encoding the heavy and light chains of the antibody).
  • Suitable cells for cloning or expression of antibody-encoding vectors include prokaryotic or eukaryotic cells described herein.
  • antibodies may be produced in bacteria, in particular when glycosylation and Fc effector function are not needed.
  • For expression of antibody fragments and polypeptides in bacteria see, e.g., U.S. Pat. No. 5,648,237, U.S. Pat. No. 5,789,199, and U.S. Pat. No. 5,840,523 (see also Charlton, K. A., In: Methods in Molecular Biology, Vol. 248, Lo, B. K. C. (ed.), Humana Press, Totowa, N.J. (2003) pp. 245-254, describing expression of antibody fragments in E. coli).
  • the antibody may be isolated from the bacterial cell paste in a soluble fraction and can be further purified.
  • eukaryotic microbes such as filamentous fungi or yeast are suitable cloning or expression hosts for antibody-encoding vectors, including fungi and yeast strains whose glycosylation pathways have been “humanized,” resulting in the production of an antibody with a partially or fully human glycosylation pattern (see Gerngross, T. U., Nat. Biotech. 22 (2004) 1409-1414; and Li, H., et al., Nat. Biotech. 24 (2006) 210-215).
  • Suitable host cells for the expression of glycosylated antibody are also derived from multicellular organisms (invertebrates and vertebrates). Examples of invertebrate cells include plant and insect cells. Numerous baculoviral strains have been identified which may be used in conjunction with insect cells, particularly for transfection of Spodoptera frugiperda cells.
  • Plant cell cultures can also be utilized as hosts (see e.g. U.S. Pat. No. 5,959,177, U.S. Pat. No. 6,040,498, U.S. Pat. No. 6,420,548, U.S. Pat. No. 7,125,978, and U.S. Pat. No. 6,417,429, describing PLANTIBODIES™ technology for producing antibodies in transgenic plants).
  • Vertebrate cells may also be used as hosts.
  • mammalian cell lines that are adapted to grow in suspension may be useful.
  • useful mammalian cell lines are monkey kidney CV1 line transformed by SV40 (COS-7); human embryonic kidney line (293 cells as described, e.g., in Graham, F. L., et al., J. Gen Virol. 36 (1977) 59-74); baby hamster kidney cells (BHK); mouse sertoli cells (TM4 cells as described, e.g., in Mather, J. P., Biol. Reprod.
  • monkey kidney cells (CV1); African green monkey kidney cells (VERO-76); human cervical carcinoma cells (HELA); canine kidney cells (MDCK); buffalo rat liver cells (BRL 3A); human lung cells (W138); human liver cells (Hep G2); mouse mammary tumor (MMT 060562); TRI cells, as described, e.g., in Mather, J. P., et al., Annals N.Y. Acad. Sci. 383 (1982) 44-68; MRC 5 cells; and FS4 cells.
  • Other useful mammalian cell lines include Chinese hamster ovary (CHO) cells, including DHFR⁻ CHO cells (Urlaub, G., et al., Proc. Natl.
  • Codon usage tables are readily available, for example, at the “Codon Usage Database” available at http://www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways (Nakamura, Y., et al., Nucl. Acids Res. 28 (2000) 292).
  • the encoding nucleic acid plays an important role.
  • Encoding nucleic acids that occur in nature or have been isolated from nature are generally not optimized for high yield expression, especially if expressed in a heterologous host cell.
  • one amino acid residue can be encoded by more than one nucleotide triplet (codon) except for the amino acids tryptophan and methionine.
  • codons corresponding encoding nucleic acid sequences
  • codons encoding one amino acid residue are employed by different organisms with different relative frequency (codon usage). Generally one specific codon is used with higher frequency than the other possible codons.
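The degeneracy mentioned above (every amino acid except tryptophan and methionine has more than one codon) can be checked with a short sketch. It assumes the standard genetic code written in the usual T, C, A, G ordering; the variable names are illustrative only and do not come from the source.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the sense codons by the amino acid residue they encode.
codons_by_residue = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        codons_by_residue[aa].append(codon)

# Residues encoded by exactly one codon.
single_codon = sorted(aa for aa, cs in codons_by_residue.items() if len(cs) == 1)
print(single_codon)
```

Only methionine (M) and tryptophan (W) end up in `single_codon`; all other residues offer a choice of two to six codons, which is the freedom a codon motif exploits.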
  • One aspect herein is a method for producing a polypeptide comprising the step of cultivating a cell, which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the cell or the cultivation medium,
  • a codon optimized nucleic acid encoding a polypeptide is used to express the polypeptide, whereby the obtainable expression yield is increased compared to other nucleic acids.
  • nucleic acid that encodes a polypeptide can be provided that upon expression results in an improved yield, e.g. compared to a nucleic acid in which always the codon with the highest usage frequency is present.
  • the cell is a prokaryotic cell. In one embodiment the cell is a bacterial cell.
  • the cell is an E. coli cell.
  • the overall codon usage frequency taking into account all codons encoding a specific amino acid residue for E. coli is given in the following table.
  • the group of codons encoding the amino acid residue can comprise at most four codons.
  • each codon has a specific usage frequency in the group that is the same as the overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific and overall usage frequency of 32%, the codon GCA has a specific and overall usage frequency of 24%, the codon GCT has a specific and overall usage frequency of 28%, and the codon GCC has a specific and overall usage frequency of 16%. If the number of codons in the group is reduced, e.g.
  • each codon has a specific usage frequency in the group that is higher than its overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific usage frequency of 53% and an overall usage frequency of 32%, and the codon GCT has a specific usage frequency of 47% and an overall usage frequency of 28%.
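The step from overall to specific usage frequency can be sketched as follows. This is a minimal illustration, assuming that the specific frequency is simply the overall frequency renormalized over the codons kept in the group and rounded to whole percent; the function name `specific_usage` is hypothetical.

```python
def specific_usage(overall, keep):
    """Rescale overall usage percentages so that the kept codons sum to 100."""
    total = sum(overall[codon] for codon in keep)
    return {codon: round(100 * overall[codon] / total) for codon in keep}

# Overall E. coli usage of the alanine codons in percent, as quoted in the text.
ALA_ECOLI = {"GCG": 32, "GCA": 24, "GCT": 28, "GCC": 16}

full_group = specific_usage(ALA_ECOLI, ["GCG", "GCA", "GCT", "GCC"])
reduced_group = specific_usage(ALA_ECOLI, ["GCG", "GCT"])
print(full_group)
print(reduced_group)
```

With all four codons kept the frequencies are unchanged; with only GCG and GCT kept they become 53% and 47%, the values stated above.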
  • the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCG, GCT, GCA and GCC at a ratio of 32:28:24:16, which corresponds to 8:7:6:4. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 8:8:6:4, which corresponds to 4:4:3:2.
  • the use of the different codons encoding a specific amino acid is alternating within the genome. This alternation is reflected herein by the definition of an amino acid codon motif.
  • Within an amino acid codon motif the individual codons are distributed taking into account the specific usage frequency, whereby codons with a higher frequency are chosen first.
  • the amino acid sequence motif comprises a specific sequence of codons, wherein the total number of codons in an amino acid codon motif is at least the same or even higher than the number of codons in a group of codons in order to allow a mapping of the usage frequency of a group of codons to the corresponding amino acid codon motif.
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gcg gcg gcg gct gct gct gct gca gca gcc gcc (SEQ ID NO: 01).
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct gca gcc gcg gct gca gcg gct gcc gcg gct gca (SEQ ID NO: 02).
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gct gca gcg gct (SEQ ID NO: 03).
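The ratio arithmetic for the four-codon alanine group (32:28:24:16 reduced to 8:7:6:4, then adjusted to 8:8:6:4 and reduced to 4:4:3:2) can be reproduced with a greatest-common-divisor reduction. The sketch below only performs the reduction; the adjustment from 8:7:6:4 to 8:8:6:4 is taken from the text as given, and the helper name `simplify` is hypothetical.

```python
from functools import reduce
from math import gcd

def simplify(ratio):
    """Divide every entry of a ratio by the greatest common divisor."""
    divisor = reduce(gcd, ratio)
    return tuple(r // divisor for r in ratio)

raw = simplify((32, 28, 24, 16))      # usage frequencies in percent
adjusted = simplify((8, 8, 6, 4))     # adjusted ratio stated in the text

print(raw, "positions:", sum(raw))
print(adjusted, "positions:", sum(adjusted))
```

The unadjusted ratio needs 25 motif positions, whereas the adjusted one fits into 13, which is why a small deviation from the exact frequencies is accepted.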
  • the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCG and GCT at a ratio of 53:47. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:50, which corresponds to 1:1.
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct (SEQ ID NO: 04).
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gct gcg (SEQ ID NO: 05).
  • the amino acid codon motif of the codons encoding the amino acid residue arginine comprises the codons CGT and CGC at a ratio of 66:34, which corresponds to 33:17. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 66:33, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue arginine is cgt cgt cgc (SEQ ID NO: 06).
  • one amino acid codon motif of the codons encoding the amino acid residue arginine is cgt cgc cgt (SEQ ID NO: 07).
  • the amino acid codon motif of the codons encoding the amino acid residue asparagine comprises the codons AAC and AAT at a ratio of 84:16, which corresponds to 21:4. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 20:4, which corresponds to 5:1.
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aac aac aat (SEQ ID NO: 08).
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aac aat aac (SEQ ID NO: 09).
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aat aac aac (SEQ ID NO: 10).
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aat aac aac aac (SEQ ID NO: 11).
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aat aac aac aac aac (SEQ ID NO: 12).
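The five asparagine motifs above differ only in where the single minor codon sits. A small sketch, assuming that the first motif position is always reserved for the more frequent codon (as in SEQ ID NO: 08 to 12), enumerates these placements; the function name is hypothetical.

```python
def minor_codon_placements(major, minor, n_major):
    """Motifs with one minor codon among n_major major codons.

    The minor codon is moved from the last position towards the second
    position; the first position always holds the major codon.
    """
    motifs = []
    for index in range(n_major, 0, -1):
        codons = [major] * n_major
        codons.insert(index, minor)
        motifs.append(" ".join(codons))
    return motifs

asn_motifs = minor_codon_placements("aac", "aat", 5)   # ratio 5:1
for motif in asn_motifs:
    print(motif)
```

Called with `("cag", "caa", 4)` or `("aaa", "aag", 4)`, the same helper yields the four 4:1 placements listed for glutamine and lysine.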
  • the amino acid codon motif of the codons encoding the amino acid residue aspartic acid comprises the codons GAC and GAT at a ratio of 54:46, which corresponds to 27:23. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 25:25, which corresponds to 1:1.
  • one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gat (SEQ ID NO: 13).
  • one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gat gac (SEQ ID NO: 14).
  • the amino acid codon motif of the codons encoding the amino acid residue cysteine comprises the codons TGC and TGT at a ratio of 64:36, which corresponds to 16:9. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 15:9, which corresponds to 5:3.
  • one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgc tgc tgc tgt tgt tgt tgt (SEQ ID NO: 15).
  • one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgt tgc tgc tgt tgc tgc tgc tgc tgtgtgc tgtgt (SEQ ID NO: 16).
  • one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgt tgc tgt tgc tgt tgc tgt tgc (SEQ ID NO: 17).
  • the amino acid codon motif of the codons encoding the amino acid residue glutamine comprises the codons CAG and CAA at a ratio of 82:18, which corresponds to 41:9. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 40:10, which corresponds to 4:1.
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag cag caa (SEQ ID NO: 18).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag caa cag (SEQ ID NO: 19).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag caa cag cag (SEQ ID NO: 20).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag caa cag cag cag (SEQ ID NO: 21).
  • the amino acid codon motif of the codons encoding the amino acid residue glutamic acid comprises the codons GAA and GAG at a ratio of 76:24, which corresponds to 19:6. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 18:6, which corresponds to 3:1.
  • one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gaa gaa gag (SEQ ID NO: 22).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gaa gag gaa (SEQ ID NO: 23).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gag gaa gaa (SEQ ID NO: 24).
  • the amino acid codon motif of the codons encoding the amino acid residue glycine comprises the codons GGT and GGC at a ratio of 54:46, which corresponds to 27:23. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:24 taking into account that glycine can be encoded by four different codons, which corresponds to 5:4.
  • one amino acid codon motif of the codons encoding the amino acid residue glycine is ggt ggt ggt ggt ggt ggt ggc ggc ggc ggc ggc ggc (SEQ ID NO: 25).
  • one amino acid codon motif of the codons encoding the amino acid residue glycine is ggt ggc ggt ggc ggt ggc ggt ggc ggt ggc ggt ggt ggt (SEQ ID NO: 26).
  • the amino acid codon motif of the codons encoding the amino acid residue histidine comprises the codons CAC and CAT at a ratio of 71:29. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 66:33, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cac cat (SEQ ID NO: 27).
  • one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cat cac (SEQ ID NO: 28).
  • the amino acid codon motif of the codons encoding the amino acid residue isoleucine comprises the codons ATC and ATT at a ratio of 68:32, which corresponds to 17:8. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 16:8, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc atc att (SEQ ID NO: 29).
  • one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc att atc (SEQ ID NO: 30).
  • the amino acid codon motif of the codons encoding the amino acid residue leucine comprises the codons CTG and CTC at a ratio of 91:9. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 90:10, which corresponds to 9:1.
  • one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctg ctg ctc ctg ctg ctg ctg ctg (SEQ ID NO: 32).
  • one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctc ctg ctg ctg ctg ctg ctg ctg ctg (SEQ ID NO: 33).
  • the amino acid codon motif of the codons encoding the amino acid residue lysine comprises the codons AAA and AAG at a ratio of 80:20, which corresponds to 4:1.
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aaa aaa aag (SEQ ID NO: 34).
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aaa aag aaa (SEQ ID NO: 35).
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aag aaa aaa (SEQ ID NO: 36).
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aag aaa aaa aaa (SEQ ID NO: 37).
  • the amino acid codon motif of the codons encoding the amino acid residue phenylalanine comprises the codons TTC and TTT at a ratio of 72:28, which corresponds to 18:7. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 18:6, which corresponds to 3:1.
  • one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttc ttt (SEQ ID NO: 38).
  • one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttt ttc (SEQ ID NO: 39).
  • one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttt ttc ttc (SEQ ID NO: 40).
  • the amino acid codon motif of the codons encoding the amino acid residue proline comprises the codons CCG, CCA and CCT at a ratio of 76:14:10, which corresponds to 38:7:5. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 35:7:7, which corresponds to 5:1:1.
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg ccg ccg ccg cca cct (SEQ ID NO: 41).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg ccg cca ccg cct ccg (SEQ ID NO: 42).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg cca ccg ccg cct ccg (SEQ ID NO: 43).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg cca ccg cct ccg ccg (SEQ ID NO: 44).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg cca ccg ccg cct ccg ccg (SEQ ID NO: 45).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccg cca ccg cct ccg ccg ccg (SEQ ID NO: 46).
  • the amino acid codon motif of the codons encoding the amino acid residue serine comprises the codons TCT, TCC and AGC at a ratio of 38:32:30, which corresponds to 19:16:15. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 18:15:15, which corresponds to 6:5:5.
  • one amino acid codon motif of the codons encoding the amino acid residue serine is tct tct tct tct tct tct tcc tcc tcc tcc tcc tcc agc agc agc agc (SEQ ID NO: 47).
  • one amino acid codon motif of the codons encoding the amino acid residue serine is tct tcc agc tct tcc agc tct tcc agc tct tcc agc tcc tct agc tcct tct (SEQ ID NO: 48).
  • the amino acid codon motif of the codons encoding the amino acid residue threonine comprises the codons ACC, ACT and ACG at a ratio of 58:29:13. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 56:28:14, which corresponds to 4:2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc acc acc acc act act acg (SEQ ID NO: 49).
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc act acc act acc acg acc (SEQ ID NO: 50).
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc act acc acg acc act acc (SEQ ID NO: 51).
  • the amino acid codon motif of the codons encoding the amino acid residue tyrosine comprises the codons TAC and TAT at a ratio of 64:34, which corresponds to 32:17. As this would result in an amino acid codon motif comprising 49 positions it is adjusted to 32:16, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tac tac tat (SEQ ID NO: 52).
  • one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tac tat tac (SEQ ID NO: 53).
  • the amino acid codon motif of the codons encoding the amino acid residue valine comprises the codons GTT, GTG, GTA and GTC at a ratio of 40:27:20:13. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:20:10, which corresponds to 4:3:2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtt gtt gtt gtg gtg gtg gta gta gtc (SEQ ID NO: 54).
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtg gta gtc gtt gtg gta gtt gtg gtt (SEQ ID NO: 55).
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtg gta gtt gtg gtt gtc gtt gtg gta (SEQ ID NO: 56).
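One way to obtain a distributed motif from a ratio is a round-robin pass over the codons in descending order of usage, emitting each codon until its count is used up. This is an illustrative sketch and not necessarily the procedure used to derive the listed sequences: it reproduces SEQ ID NO: 55 for valine (4:3:2:1), but other listed motifs (e.g. SEQ ID NO: 50) follow a different interleaving. The function name is hypothetical.

```python
def round_robin_motif(counts):
    """Build a motif from (codon, count) pairs sorted by descending usage.

    Each pass visits the codons in order and emits every codon that still
    has positions left, so more frequent codons are always chosen first.
    """
    remaining = dict(counts)
    order = [codon for codon, _ in counts]
    motif = []
    while any(remaining.values()):
        for codon in order:
            if remaining[codon] > 0:
                motif.append(codon)
                remaining[codon] -= 1
    return " ".join(motif)

# E. coli valine, ratio 4:3:2:1 as stated in the text.
valine = round_robin_motif([("gtt", 4), ("gtg", 3), ("gta", 2), ("gtc", 1)])
print(valine)
```

The printed motif is `gtt gtg gta gtc gtt gtg gta gtt gtg gtt`, identical to SEQ ID NO: 55.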
  • a purification tag can be attached to the N- or C-terminus of the polypeptide.
  • the purification tag can be fused directly to the amino acid sequence of the polypeptide or it can be separated from the amino acid sequence by a short linker or a protease cleavage site.
  • One exemplary purification tag is the hexa-histidine tag with an N-terminal GS linker for fusion to the C-terminus of the polypeptide:
  • test-polypeptide can be preceded by a short carrier peptide.
  • carrier peptide is derived from the N-terminal part of mature human interferon-alpha:
  • the nucleic acid sequence encoding the test-polypeptide of SEQ ID NO: 57 which is obtained with a backtranslation method using always the codon with the highest usage in the respective cell, is
  • the nucleic acid sequence encoding the test-polypeptide of SEQ ID NO: 57 which is obtained with a method as reported herein, is
  • nucleic acid sequence that has been obtained by using always the codon with the highest usage frequency with the nucleic acid that has been obtained with a method as reported herein it can be seen that the two nucleic acids differ in 146 codons out of 300 codons, i.e. the optimized sequences differ by 48.7% of all coding codons (differing codons are underlined in the following alignment).
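The codon-level comparison of two encodings can be sketched as below. The two short sequences are invented toy inputs, not the test-polypeptide sequences; only the figures 146 and 300 are taken from the text. The function names are hypothetical.

```python
def split_codons(seq):
    """Split a coding sequence into codons, ignoring spaces and case."""
    seq = seq.replace(" ", "").lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def codon_difference(seq_a, seq_b):
    """Return (differing codons, total codons, percent differing)."""
    a, b = split_codons(seq_a), split_codons(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences encode a different number of codons")
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing, len(a), round(100 * differing / len(a), 1)

# Toy example (hypothetical sequences): two of four codons differ.
toy = codon_difference("gcg gcg gcg aaa", "gcg gct gcg aag")
print(toy)

# Percentage for the figures stated in the text.
reported = round(100 * 146 / 300, 1)
print(reported)
```

For 146 differing codons out of 300 the percentage evaluates to 48.7, in line with the value given above.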
  • codons of the respective encoding nucleic acid are given in sequences of their appearance starting from the 5′ end of the respective nucleic acid.
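A backtranslation that uses codon motifs can be sketched as follows, under the assumption (not spelled out in this excerpt) that for each amino acid the codons are taken from its motif in 5′ to 3′ order and the motif is restarted once it is exhausted. The peptide `MAKAEAKKKK` is an invented example; the motifs are SEQ ID NO: 04, 22 and 34 from the E. coli lists, plus the single methionine codon atg.

```python
from itertools import cycle

# Motifs quoted from the text (E. coli); methionine has only one codon.
MOTIFS = {
    "A": "gcg gct",              # SEQ ID NO: 04
    "E": "gaa gaa gaa gag",      # SEQ ID NO: 22
    "K": "aaa aaa aaa aaa aag",  # SEQ ID NO: 34
    "M": "atg",
}

def backtranslate(protein, motifs):
    """Assign codons by cycling through the motif of each amino acid.

    Assumption: each residue type keeps its own position in its motif,
    independent of the other residue types.
    """
    iterators = {aa: cycle(motif.split()) for aa, motif in motifs.items()}
    return " ".join(next(iterators[aa]) for aa in protein)

encoded = backtranslate("MAKAEAKKKK", MOTIFS)
print(encoded)
```

The alanine residues alternate between gcg and gct, and the fifth lysine receives aag, so the codon mix approaches the motif ratio instead of repeating the most frequent codon throughout.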
  • E. coli comprising an N-terminal interferon-alpha carrier peptide of SEQ ID NO: 60 encoded by a nucleic acid of SEQ ID NO: 61 and a C-terminal purification tag of SEQ ID NO: 58 encoded by a nucleic acid of SEQ ID NO: 59.
  • the E. coli expression vector including the expression cassette was identical for the differently encoded test-polypeptide.
  • the expression yield of the differently encoded test-polypeptide has been determined by quantitative Western blot analysis. Therefore, E. coli whole cell lysates were prepared and fractionated into a soluble supernatant and an insoluble cell pellet fraction by centrifugation.
  • proteins were separated electrophoretically by SDS PAGE, transferred electrophoretically to a nitrocellulose membrane and then stained with an antibody POD conjugate recognizing the poly-His purification tag.
  • the stained poly-His containing differently encoded test-polypeptide was quantified by comparison with a pure protein reference standard containing the same poly-His purification tag (scFv-poly-His antibody fragment) of known scFv-poly-His protein concentration using the Lumi-Imager F1 analyzer (Roche Molecular Biochemicals) and the Lumi Analyst software version 3.1.
  • In FIGS. 1 and 2 the Western blots of the differently encoded poly-His-tagged test-polypeptide are shown.
  • E. coli whole cell lysates were fractionated into a soluble supernatant and an insoluble cell pellet fraction before SDS PAGE and immunoblotting.
  • a molecular weight protein standard and a scFv antibody fragment comprising a poly-His-tag of known protein concentration have been used.
  • Lanes 2, 4, 6, and 8 are samples showing the amount of expressed test-polypeptide obtained with the nucleic acid generated by using always the codon with the highest usage frequency after 0 hours, 4 hours, 6 hours, and 21 hours of cultivation.
  • Lanes 3, 5, 7, and 9 are samples showing the amount of expressed test-polypeptide obtained with the nucleic acid generated with the new/inventive protein backtranslation method as reported herein after 0 hours, 4 hours, 6 hours, and 21 hours of cultivation.
  • Lanes 11 to 14 correspond to the purified scFv-poly-His protein reference standard of known concentration (5 ng, 10 ng, 20 ng, 30 ng, and 40 ng).
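Quantification against a reference standard of known concentration can be illustrated with a simple calibration line. This is a sketch under stated assumptions: the band signal is taken to be linear in the protein amount, the signal values are invented placeholders, and the actual algorithm of the Lumi Analyst software is not described in the text. Only the standard amounts (5 to 40 ng) come from the source.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

# Amounts of reference standard loaded (ng), as listed in the text.
standards_ng = [5, 10, 20, 30, 40]
# Hypothetical band signals for those lanes (arbitrary units, invented).
signals = [500.0, 1000.0, 2000.0, 3000.0, 4000.0]

slope, intercept = fit_line(standards_ng, signals)

def amount_ng(signal):
    """Convert a band signal into an amount using the calibration line."""
    return (signal - intercept) / slope

print(amount_ng(2500.0))
```

With these placeholder values a sample band of 2500 units corresponds to 25 ng; real blots would need signals within the range covered by the standards.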
  • test-polypeptide expressed was determined within the insoluble cell debris fraction (pellet) after solubilization/extraction of insoluble protein aggregates with SDS sample buffer since the major fraction of the expressed test-polypeptide was found in the insoluble cell pellet fraction after cell lysis and cell fractionation (precipitated insoluble protein aggregates also known as inclusion bodies).
  • test-polypeptide solubilized from the insoluble cell pellet fraction are shown in the following table.
  • the expression yield obtained by using a nucleic acid encoding a test-polypeptide in which the encoding codons are chosen according to the method as reported herein is at least about 1.8 times the yield that is obtained using a classical codon optimization method.
  • the cell is a CHO cell.
  • the overall codon usage frequency taking into account all codons encoding a specific amino acid residue for Cricetulus species (CHO cells; Mesocricetus species; hamster) is given in the following table.
  • the group of codons encoding the amino acid residue can comprise at most four codons.
  • each codon has a specific usage frequency in the group that is the same as the overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific and overall usage frequency of 9%, the codon GCA has a specific and overall usage frequency of 23%, the codon GCT has a specific and overall usage frequency of 30%, and the codon GCC has a specific and overall usage frequency of 38%. If the number of codons in the group is reduced, e.g.
  • each codon has a specific usage frequency in the group that is higher than its overall usage frequency in the genome of the cell, i.e. the codon GCT has a specific usage frequency of 44% and an overall usage frequency of 30%, and the codon GCC has a specific usage frequency of 56% and an overall usage frequency of 38%.
  • the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCC, GCT, GCA and GCG at a ratio of 38:30:23:9. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:20:10, which corresponds to 4:3:2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gcc gcc gcc gct gct gct gca gca gcg (SEQ ID NO: 64).
  • as codons are distributed within the genome, a distribution within the amino acid codon motif is also used, taking into account the above ratio and the usage frequency, whereby codons with a higher frequency are chosen first.
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gca gcc gct gca gcg gcc gct gcc (SEQ ID NO: 65).
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gcc gct gca gcg gcc gct gca gcc (SEQ ID NO: 66).
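Whether a listed motif actually realizes its stated ratio can be checked by counting codons. The sketch below does this for the three four-codon CHO alanine motifs quoted above (SEQ ID NO: 64 to 66) against the adjusted ratio 4:3:2:1; the helper name `motif_counts` is hypothetical.

```python
from collections import Counter

def motif_counts(motif):
    """Count how often each codon occurs in a space-separated motif."""
    return Counter(motif.split())

# Motifs quoted from the text (CHO, alanine, four codons).
ALA_CHO_MOTIFS = {
    64: "gcc gcc gcc gcc gct gct gct gca gca gcg",
    65: "gcc gct gca gcc gct gca gcg gcc gct gcc",
    66: "gcc gct gcc gct gca gcg gcc gct gca gcc",
}
EXPECTED = {"gcc": 4, "gct": 3, "gca": 2, "gcg": 1}

results = {
    seq_id: motif_counts(motif) == EXPECTED
    for seq_id, motif in ALA_CHO_MOTIFS.items()
}
print(results)
```

All three motifs contain the codons at 4:3:2:1; they differ only in the order in which the codons appear.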
  • the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCC, GCT and GCA at a ratio of 42:33:25. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:30, which corresponds to 4:3:3.
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gcc gcc gcc gct gct gct gca gca (SEQ ID NO: 67).
  • one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gca gcc gct gca gcc gct gca gcc gct gca gcc (SEQ ID NO: 68).
  • the amino acid codon motif of the codons encoding the amino acid residue arginine comprises the codons AGG, AGA, CGG and CGC at a ratio of 27:25:24:24. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 25:25:25:25, which corresponds to 1:1:1:1.
  • one amino acid codon motif of the codons encoding the amino acid residue arginine is agg aga cgg cgc (SEQ ID NO: 69).
  • one amino acid codon motif of the codons encoding the amino acid residue arginine is agg cgg aga cgc (SEQ ID NO: 70).
  • the amino acid codon motif of the codons encoding the amino acid residue asparagine comprises the codons AAC and AAT at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aat aat (SEQ ID NO: 71).
  • one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aat aac aat aac (SEQ ID NO: 72).
  • the amino acid codon motif of the codons encoding the amino acid residue aspartic acid comprises the codons GAC and GAT at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gac gac gat gat (SEQ ID NO: 73).
  • one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gat gac gat gac (SEQ ID NO: 74).
  • the amino acid codon motif of the codons encoding the amino acid residue cysteine comprises the codons TGC and TGT at a ratio of 58:42, which corresponds to 29:21. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:20, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgc tgt tgt (SEQ ID NO: 75).
  • one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgt tgc tgt tgc (SEQ ID NO: 76).
  • the amino acid codon motif of the codons encoding the amino acid residue glutamine comprises the codons CAG and CAA at a ratio of 78:22, which corresponds to 39:11. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 40:10, which corresponds to 4:1.
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag cag caa (SEQ ID NO: 77).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag caa cag (SEQ ID NO: 78).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag caa cag cag (SEQ ID NO: 79).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag caa cag cag cag (SEQ ID NO: 80).
  • the amino acid codon motif of the codons encoding the amino acid residue glutamic acid comprises the codons GAG and GAA at a ratio of 64:36, which corresponds to 32:18. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 32:16, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gag gag gaa (SEQ ID NO: 81).
  • one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gag gaa gag (SEQ ID NO: 82).
  • the amino acid codon motif of the codons encoding the amino acid residue glycine comprises the codons GGC, GGA, GGG and GGT at a ratio of 33:25:24:19. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 35:25:25:20, which corresponds to 7:5:5:4.
  • one amino acid codon motif of the codons encoding the amino acid residue glycine is ggc ggc ggc ggc ggc ggc ggc ggc ggc gga gga gga gga ggg ggg ggg ggg ggg ggt ggt ggt ggt (SEQ ID NO: 83).
  • one amino acid codon motif of the codons encoding the amino acid residue glycine is ggc gga ggg ggt ggc gga ggc ggg ggt ggc gga ggg ggt ggc gga ggg ggt ggc gga ggc ggg ggc gga ggg ggt (SEQ ID NO: 84).
  • the amino acid codon motif of the codons encoding the amino acid residue histidine comprises the codons CAC and CAT at a ratio of 58:42, which corresponds to 29:21. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:20, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cac cac cat cat (SEQ ID NO: 85).
  • one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cat cac cat cac (SEQ ID NO: 86).
  • the amino acid codon motif of the codons encoding the amino acid residue isoleucine comprises the codons ATC, ATT and ATA at a ratio of 51:35:15. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:35:15, which corresponds to 10:7:3.
  • one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc atc atc atc atc atc atc atc atc atc att att att att att att att att att att att att att att ata ata (SEQ ID NO: 87).
  • one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc att atc atc att ata atc att atc atc att ata atc att atc atc att ata atc att atc att ata atc att (SEQ ID NO: 88).
  • the amino acid codon motif of the codons encoding the amino acid residue leucine comprises the codons CTG, CTC, CTT and TTG at a ratio of 44:19:13:12. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:20:10:10, which corresponds to 4:2:1:1.
  • one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctg ctc ctc ctt ttg (SEQ ID NO: 89).
  • one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctc ctg ctt ctg ctc ctg ttg (SEQ ID NO: 90).
  • one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctc ctt ttg ctg ctc ctg ctg (SEQ ID NO: 91).
  • the amino acid codon motif of the codons encoding the amino acid residue lysine comprises the codons AAG and AAA at a ratio of 67:33. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 66:33, which corresponds to 2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aag aag aaa (SEQ ID NO: 92).
  • one amino acid codon motif of the codons encoding the amino acid residue lysine is aag aaa aag (SEQ ID NO: 93).
  • the amino acid codon motif of the codons encoding the amino acid residue phenylalanine comprises the codons TTC and TTT at a ratio of 56:44, which corresponds to 14:11. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 15:10, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttc ttt ttt (SEQ ID NO: 94).
  • one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttt ttc ttt ttc (SEQ ID NO: 95).
  • the amino acid codon motif of the codons encoding the amino acid residue proline comprises the codons CCC, CCA and CCT at a ratio of 34:29:29. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 35:30:30, which corresponds to 7:6:6.
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccc ccc ccc ccc ccc cca cca cca cca cca cca cca cct cct cct cct cct cct cct cct cct (SEQ ID NO: 96).
  • one amino acid codon motif of the codons encoding the amino acid residue proline is ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct cccccccc (SEQ ID NO: 97).
  • the amino acid codon motif of the codons encoding the amino acid residue serine comprises the codons TCC, AGC, TCT and TCA at a ratio of 24:24:18:15, which corresponds to 8:8:6:3. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 9:9:6:3, which corresponds to 3:3:2:1.
  • one amino acid codon motif of the codons encoding the amino acid residue serine is tcc tcc tcc agc agc agc tct tct tca (SEQ ID NO: 98).
  • one amino acid codon motif of the codons encoding the amino acid residue serine is tcc agc tct tca tcc agc tct tcc agc (SEQ ID NO: 99).
  • one amino acid codon motif of the codons encoding the amino acid residue serine is tcc agc tct tcc agc tca tct tcc agc (SEQ ID NO: 100).
  • the amino acid codon motif of the codons encoding the amino acid residue threonine comprises the codons ACC, ACA and ACT at a ratio of 45:32:23. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:30:20, which corresponds to 5:3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc acc acc acc acc acc aca aca aca act act (SEQ ID NO: 101).
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc aca act acc aca acc aca act acc acc (SEQ ID NO: 102).
  • one amino acid codon motif of the codons encoding the amino acid residue threonine is acc aca acc aca act acc acc aca act acc (SEQ ID NO: 103).
  • the amino acid codon motif of the codons encoding the amino acid residue tyrosine comprises the codons TAT and TAC at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tat tat tat tac tac (SEQ ID NO: 104).
  • one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tat tac tat tac tat (SEQ ID NO: 105).
  • the amino acid codon motif of the codons encoding the amino acid residue valine comprises the codons GTG, GTC, GTT and GTA at a ratio of 48:25:16:11. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 48:24:18:12, which corresponds to 8:4:3:2.
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtg gtg gtg gtg gtg gtg gtg gtg gtg gtg gtg gtg gtc gtc gtt gtt gtt gta gta (SEQ ID NO: 106).
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtg gtc gtt gta gtg gtg gtc gtt gtg gtg gtc gtt gta gtg gtg gtc (SEQ ID NO: 107).
  • one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtc gtt gta gtg gtc gtt gta gtg gtc gtt gtg gtc gtg gtg gtg gtg gtg (SEQ ID NO: 108).
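The reduction of an adjusted ratio to its smallest whole-number form (for example 30:20 to 3:2 for histidine) and the construction of a motif in which identical codons directly succeed each other can be sketched as follows. This is a minimal illustration in Python, not the procedure used by the inventors. The function names `reduce_ratio` and `grouped_motif` are hypothetical, and the preceding manual adjustment of the raw frequencies (e.g. 58:42 to 30:20) is taken as given.

```python
from functools import reduce
from math import gcd


def reduce_ratio(counts):
    """Divide all counts by their greatest common divisor (30:20 -> 3:2)."""
    divisor = reduce(gcd, counts)
    return [c // divisor for c in counts]


def grouped_motif(codons, counts):
    """Build a motif in which all copies of one codon directly succeed each other.

    `codons` is assumed to be ordered by decreasing usage frequency.
    """
    motif = []
    for codon, n in zip(codons, reduce_ratio(counts)):
        motif.extend([codon.lower()] * n)
    return motif


# Histidine, adjusted ratio CAC:CAT = 30:20 as stated in the text
print(reduce_ratio([30, 20]))                             # [3, 2]
print(" ".join(grouped_motif(["CAC", "CAT"], [30, 20])))  # cac cac cac cat cat
```

For histidine the result has the same codon order as the motif given as SEQ ID NO: 85.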
  • the protein concentration was determined by measuring the optical density (OD) at 280 nm, using the molar extinction coefficient calculated on the basis of the amino acid sequence.
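A concentration determination of this kind is commonly based on the Beer-Lambert law, c = A / (ε · l). The sketch below assumes that relationship; the text does not spell out the calculation. The function names and the numerical values in the example are hypothetical and serve only to show the arithmetic.

```python
def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/l.

    a280     -- measured optical density at 280 nm
    epsilon  -- molar extinction coefficient in l/(mol*cm)
    path_cm  -- optical path length in cm (assumed 1 cm by default)
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def mass_concentration(a280, epsilon, molecular_weight, path_cm=1.0):
    """Convert the molar concentration to g/l (numerically equal to mg/ml)."""
    return molar_concentration(a280, epsilon, path_cm) * molecular_weight


# Hypothetical example values, not taken from the text
c_molar = molar_concentration(a280=0.5, epsilon=25000.0)
c_mass = mass_concentration(a280=0.5, epsilon=25000.0, molecular_weight=20000.0)
print(c_molar, c_mass)
```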
  • the fusion test-polypeptide was prepared by recombinant means.
  • the amino acid sequence of the expressed fusion test-polypeptide was encoded by a nucleic acid comprising in 5′ to 3′ direction a nucleic acid of SEQ ID NO: 61 encoding the carrier peptide, a nucleic acid of SEQ ID NO: 62 or SEQ ID NO: 63 encoding the test polypeptide, and a nucleic acid of SEQ ID NO: 59 encoding a hexa-histidine purification tag (poly-His tag).
  • the encoding fusion gene was assembled with known recombinant methods and techniques by connection of appropriate nucleic acid segments. Nucleic acid sequences made by chemical synthesis were verified by DNA sequencing. The expression plasmid for the production of the fusion polypeptide was prepared as outlined below.
  • Plasmid 4980 (4980-pBRori-URA3-LACI-SAC) is an expression plasmid for the expression of core-streptavidin in E. coli. It was generated by ligation of the 3142 bp long EcoRI/CelII-vector fragment derived from plasmid 1966 (1966-pBRori-URA3-LACI-T-repeat; reported in EP-B 1 422 237) with a 435 bp long core-streptavidin encoding EcoRI/CelII-fragment.
  • the core-streptavidin E. coli expression plasmid comprises the following elements:
  • the final expression plasmid for the expression of the fusion test-polypeptide was prepared by excising the core-streptavidin structural gene from vector 4980 using the singular flanking EcoRI and CelII restriction endonuclease cleavage sites and inserting the EcoRI/CelII restriction site flanked nucleic acid encoding the fusion test-polypeptide into the 3142 bp long EcoRI/CelII-4980 vector fragment.
  • the expression plasmid containing the test-polypeptide gene generated with the classic codon usage was designated 11020 while the expression plasmid containing the test-polypeptide gene generated with the new codon usage was designated 11021.
  • E. coli host/vector system which enables an antibiotic-free plasmid selection by complementation of an E. coli auxotrophy (PyrF) (EP 0 972 838 and U.S. Pat. No. 6,291,245).
  • the E. coli K12 strain CSPZ-2 (leuB, proC, trpE, thi-1, ΔpyrF) was transformed with the expression plasmid (11020 and 11021, respectively) obtained in the previous step.
  • the transformed CSPZ-2 cells were first grown at 37° C. on agar plates and subsequently in a shaking culture in M9 minimal medium containing 0.5% casamino acids (Difco) up to an optical density at 550 nm (OD550) of 0.6-0.9 and subsequently induced with IPTG (1-5 mmol/l final concentration).
  • E. coli lysate was processed by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis (SDS-PAGE), and the separated polypeptides were transferred to a membrane from the gel and subsequently detected and quantified by an immunological method.
  • E. coli cell culture samples were drawn from the shaking culture over a time course of about 24 h. One sample was drawn prior to induction of recombinant protein expression. Further samples were taken at dedicated time points e.g. 4, 6 and 21 hours after induction.
  • the insoluble cell components were sedimented (centrifugation 14,000 rpm, 5 min.) and an aliquot of the clarified supernatant was admixed with 1/4 volume (v/v) of 4× LDS sample buffer and 1/10 volume (v/v) of 0.5 M 1,4-dithiothreitol (DTT).
  • The insoluble cell debris fraction (pellet) was resuspended/extracted in 0.3 ml 1× LDS sample buffer containing 50 mM 1,4-dithiothreitol (DTT) under shaking for 15 min and centrifuged again.
  • the NuPAGE® Pre-Cast gel system (Invitrogen) was used according to the manufacturer's instruction (10% NuPAGE® Novex® Bis-TRIS Pre-Cast gels, pH 6.4; Cat.-No.: NP0301). The samples were incubated for 10 min. at 70° C. and after cooling to room temperature 5-40 µL were loaded onto the gels. In addition, 5 µl MagicMark™ XP Western Protein Standard (20-220 kDa) (Invitrogen, Cat. No.: LC5602), 5 µl of Precision Plus Protein™ prestained protein standard (Bio-Rad, Cat.
  • Transfer buffer 39 mM glycine, 48 mM TRIS-hydrochloride, 0.04% by weight (w/w) SDS, and 20% by volume methanol (v/v).
  • the mouse monoclonal anti-Penta-His antibody (Qiagen, Cat. No.: 34660) was used as primary antibody at a dilution of 1:1,000 in TBS, 0.5% (w/v) Western Blocking Solution. After two washes in TBS (Bio-Rad, Cat. No.: 170-6435) and two washes in TBS supplemented with 0.05% (v/v) Tween-20 (TBST) the poly-His containing polypeptides were visualized using a purified rabbit anti-mouse IgG antibody conjugated to a peroxidase (Roche Molecular Biochemicals, Cat. No.: 11693 506) as secondary antibody at a dilution of 1:400 in TBS with 3% (w/v) non-fat dry milk powder.
  • the membranes were incubated in 10 ml Luminol/peroxide-solution for 10 seconds to 5 minutes, and the emitted light was then detected with a LUMI-Imager F1 Analysator (Roche Molecular Biochemicals). A protein reference standard curve was obtained by plotting the known protein concentrations of the scFv-poly-His proteins against their cognate measured LUMI-Imager signals (intensity of the spots expressed in BLU units); this curve was used to calculate the concentrations of target protein in the original samples.
  • the intensity of the spots was quantified with the LumiAnalyst Software (Version 3.1).
  • the protein reference standard curve obtained from five known scFv-poly-His concentrations is shown in FIG. 3.
  • The Western blot of the polypeptide containing supernatants is shown in FIG. 1.
  • the Western blot of the SDS-extracted cell pellet fraction is shown in FIG. 2.
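The quantification via a protein reference standard curve described above can be illustrated with a simple straight-line fit. The text does not state which curve model was used, so the linear least-squares fit below is an assumption, and the function names and the example values are hypothetical.

```python
def fit_line(concentrations, signals):
    """Ordinary least-squares fit of signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return (signal - intercept) / slope


# Five hypothetical standards (arbitrary concentration units vs. BLU signal)
standards = [1.0, 2.0, 3.0, 4.0, 5.0]
blu = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(standards, blu)
print(concentration_from_signal(8.0, slope, intercept))
```

A sample signal should fall within the range covered by the standards; extrapolating beyond the highest or lowest standard is less reliable.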

Abstract

Herein is reported a method for recombinantly producing a polypeptide in a cell comprising the step of cultivating a cell which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the cell or the cultivation medium, wherein each of the amino acid residues of the polypeptide is encoded by at least one codon, whereby the different codons encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%, and wherein the usage frequency of a codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of International Patent Application No. PCT/EP2013/057808, filed on Apr. 15, 2013, which claims priority to European Patent Application No. 12164430.6, filed on Apr. 17, 2012, the entire contents of which are incorporated herein by reference.
  • SEQUENCE LISTING
  • The instant application contains a Sequence Listing submitted via EFS-Web and hereby incorporated by reference in its entirety. Said ASCII copy, created on Sep. 4, 2014, is named P4923C1SeqList.txt, and is 27,396 bytes in size.
  • FIELD OF THE INVENTION
  • The methods as reported herein are in the field of optimization of a polypeptide encoding nucleic acid and improved expression of a polypeptide encoded by a nucleic acid optimized with the method as reported herein.
  • BACKGROUND OF THE INVENTION
  • Cannarozzi, G., et al. report the role of codon order in translation dynamics (Cell 141 (2010) 355-367). The cause and consequence of codon bias is reported by Plotkin, J. B. and Kudla, G. (Nat. Rev. Gen. 12 (2011) 32-42). Weygand-Durasevic, I. and Ibba, M., report new roles for codon usage (Science 329 (2010) 1473-1474). Overlapping codes within protein-coding sequences are reported by Itzkovitz, S., et al. (Gen. Res. 20 (2010) 1582-1589).
  • In WO 97/11086 high level expression of proteins is reported. Plant polypeptide production is reported in WO 03/70957. In WO 03/85114 a method for designing synthetic nucleic acid sequences for optimal protein expression in a host cell is reported. Codon pair optimization is reported in U.S. Pat. No. 5,082,767. In WO 2008/000632 a method for achieving improved polypeptide expression is reported. A codon optimization method is reported in WO 2007/142954 and U.S. Pat. No. 8,128,938.
  • Watkins, N. E., et al., report nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes (Nucl. Acids Res. 33 (2005) 6258-6267).
  • SUMMARY OF THE INVENTION
  • It has been found that for the expression of a polypeptide in a cell the use of a polypeptide encoding nucleic acid with the characteristics as reported herein is beneficial. The polypeptide encoding nucleic acid is characterized in that each amino acid is encoded by a group of codons, whereby each codon in the group of codons is defined by a specific usage frequency within the group that is related to the overall usage frequency of this codon in the genome of the cell, and whereby the usage frequency of the codons in the (total) polypeptide encoding nucleic acid is about the same as the usage frequency within the respective group.
  • One aspect as reported herein is a method for recombinantly producing a polypeptide in a cell comprising the step of cultivating a cell which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the cell or the cultivation medium,
      • wherein each of the amino acid residues of the polypeptide is encoded by one or more (at least one) codon(s), whereby the (different) codons encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%,
      • wherein the overall usage frequency of each codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.
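Whether a given coding sequence meets this condition can be checked by counting its codons and expressing each count as a percentage of all codons for the same amino acid. The following sketch assumes the standard genetic code; the function name `group_usage` is hypothetical, and the short test sequence is only an illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def group_usage(cds):
    """Return, for every codon in `cds`, its usage in percent within its group.

    A group is the set of codons encoding the same amino acid residue, so the
    percentages of all codons of one group add up to 100.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: 100.0 * n / totals[CODON_TABLE[codon]]
            for codon, n in counts.items()}


# Met followed by three Glu residues encoded as gag gaa gag
usage = group_usage("atggaggaagag")
print(usage)
```

The observed values can then be compared with the specific usage frequencies defined for each group; how close counts as "about the same" is not fixed by this sketch.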
  • In one embodiment the amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons and the amino acid residues M and W are encoded by a single codon.
  • In one embodiment the amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons comprising at least two codons and the amino acid residues M and W are encoded by a single codon.
  • In one embodiment the specific usage frequency of a codon is 100% if the amino acid residue is encoded by exactly one codon.
  • In one embodiment the amino acid residue G is encoded by a group of at most 4 codons. In one embodiment the amino acid residue A is encoded by a group of at most 4 codons. In one embodiment the amino acid residue V is encoded by a group of at most 4 codons. In one embodiment the amino acid residue L is encoded by a group of at most 6 codons. In one embodiment the amino acid residue I is encoded by a group of at most 3 codons. In one embodiment the amino acid residue M is encoded by exactly 1 codon. In one embodiment the amino acid residue P is encoded by a group of at most 4 codons. In one embodiment the amino acid residue F is encoded by a group of at most 2 codons. In one embodiment the amino acid residue W is encoded by exactly 1 codon. In one embodiment the amino acid residue S is encoded by a group of at most 6 codons. In one embodiment the amino acid residue T is encoded by a group of at most 4 codons. In one embodiment the amino acid residue N is encoded by a group of at most 2 codons. In one embodiment the amino acid residue Q is encoded by a group of at most 2 codons. In one embodiment the amino acid residue Y is encoded by a group of at most 2 codons. In one embodiment the amino acid residue C is encoded by a group of at most 2 codons. In one embodiment the amino acid residue K is encoded by a group of at most 2 codons. In one embodiment the amino acid residue R is encoded by a group of at most 6 codons. In one embodiment the amino acid residue H is encoded by a group of at most 2 codons. In one embodiment the amino acid residue D is encoded by a group of at most 2 codons. In one embodiment the amino acid residue E is encoded by a group of at most 2 codons.
  • In one embodiment the amino acid residue G is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue A is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue V is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue L is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue I is encoded by a group of 1 to 3 codons. In one embodiment the amino acid residue M is encoded by a group of 1 codon, i.e. by exactly 1 codon. In one embodiment the amino acid residue P is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue F is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue W is encoded by a group of 1 codon, i.e. by exactly 1 codon. In one embodiment the amino acid residue S is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue T is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue N is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue Q is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue Y is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue C is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue K is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue R is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue H is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue D is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue E is encoded by a group of 1 to 2 codons.
  • In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of more than 5%. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 8% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 10% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 15% or more.
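Restricting a group to codons at or above a cut-off, and rescaling the remaining frequencies so that the group again sums to 100%, can be sketched as below. The function name `filter_group` is hypothetical. The valine values are the 48:25:16:11 ratio given for GTG, GTC, GTT and GTA elsewhere in this document; treating them as the percentages to which the cut-off is applied is an assumption made for illustration.

```python
def filter_group(usage, threshold):
    """Keep codons with usage >= threshold (percent) and renormalize to 100%."""
    kept = {codon: f for codon, f in usage.items() if f >= threshold}
    if not kept:
        raise ValueError("no codon reaches the threshold")
    total = sum(kept.values())
    return {codon: 100.0 * f / total for codon, f in kept.items()}


# Valine ratio taken from the document; the 15% cut-off is one of the
# thresholds named in the text
valine = {"GTG": 48.0, "GTC": 25.0, "GTT": 16.0, "GTA": 11.0}
for codon, freq in filter_group(valine, 15.0).items():
    print(codon, round(freq, 1))
```

The comparison `>=` matches the "or more" thresholds (8%, 10%, 15%); for the "more than 5%" embodiment a strict `>` would be needed instead.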
  • In one embodiment the sequence of codons in the nucleic acid encoding the polypeptide for a specific amino acid residue in 5′ to 3′ direction is, i.e. corresponds to, the sequence of codons in a respective amino acid codon motif.
  • In one embodiment for each sequential occurrence of a specific amino acid in the polypeptide starting from the N-terminus of the polypeptide the encoding nucleic acid comprises the codon that is the same as that at the corresponding sequential position in the amino acid codon motif of the respective specific amino acid.
  • In one embodiment the usage frequency of a codon in the amino acid codon motif is about the same as its specific usage frequency within its group.
  • In one embodiment after the final codon of the amino acid codon motif has been reached at the next occurrence of the specific amino acid in the polypeptide the encoding nucleic acid comprises the codon that is at the first position of the amino acid codon motif.
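The motif-based codon assignment described in the preceding embodiments (the n-th occurrence of an amino acid receives the codon at the n-th motif position, restarting at the first position once the motif is exhausted) can be sketched as follows. The function name `encode` and the short example polypeptide are hypothetical; the histidine and lysine motifs are those given as SEQ ID NO: 86 and SEQ ID NO: 93.

```python
def encode(protein, motifs):
    """Assign codons by walking through one motif per amino acid.

    protein -- amino acid sequence in one-letter code, N- to C-terminus
    motifs  -- dict mapping each amino acid to its codon motif (list of codons)
    """
    position = {aa: 0 for aa in motifs}  # occurrences seen so far, per residue
    codons = []
    for aa in protein:
        motif = motifs[aa]
        # the modulo restarts at the first motif position after the last one
        codons.append(motif[position[aa] % len(motif)])
        position[aa] += 1
    return codons


motifs = {
    "M": ["atg"],                          # single codon
    "H": "cac cat cac cat cac".split(),    # SEQ ID NO: 86
    "K": "aag aaa aag".split(),            # SEQ ID NO: 93
}
print(" ".join(encode("MHHHHHHKKKK", motifs)))
# atg cac cat cac cat cac cac aag aaa aag aag
```

The sixth histidine and the fourth lysine show the restart: both receive the codon at the first position of their motif.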
  • In one embodiment the codons in the amino acid codon motif are distributed randomly throughout the amino acid codon motif.
  • In one embodiment the amino acid codon motif is selected from a group of amino acid codon motifs comprising all possible amino acid codon motifs obtainable by permutating codons therein wherein all motifs have the same number of codons and the codons in each motif have the same specific usage frequency.
  • In one embodiment the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby all codons of one usage frequency directly succeed each other. In one embodiment the codons of one codon usage frequency are grouped together.
  • In one embodiment the (different) codons in the amino acid codon motif are distributed uniformly throughout the amino acid codon motif.
  • In one embodiment the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency or the codon with the second lowest specific usage frequency the codon with the highest specific usage frequency is present (used).
  • In one embodiment the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency the codon with the highest specific usage frequency is present (used).
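One way to read the last arrangement is a round-robin: the codons are visited in order of decreasing specific usage frequency, and after the least frequent codon the most frequent one is used again, skipping codons whose share is used up. The sketch below implements this reading; it is an interpretation, and the function name `interleaved_motif` is hypothetical. With the adjusted ratios given elsewhere in this document it yields the same codon order as the motifs listed as SEQ ID NO: 82 (glutamic acid), SEQ ID NO: 91 (leucine) and SEQ ID NO: 99 (serine).

```python
def interleaved_motif(codons, counts):
    """Round-robin arrangement of codons.

    codons -- codons ordered by decreasing specific usage frequency
    counts -- number of motif positions for each codon (e.g. 4, 2, 1, 1)
    """
    remaining = dict(zip(codons, counts))
    motif = []
    while any(remaining.values()):
        for codon in codons:
            if remaining[codon] > 0:  # skip codons that are already used up
                motif.append(codon)
                remaining[codon] -= 1
    return motif


# Leucine, 4:2:1:1
print(" ".join(interleaved_motif(["ctg", "ctc", "ctt", "ttg"], [4, 2, 1, 1])))
# ctg ctc ctt ttg ctg ctc ctg ctg
```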
  • In one embodiment the cell is a prokaryotic cell.
  • In one embodiment the prokaryotic cell is an E. coli cell.
  • In one embodiment the amino acid codon motif for
      • alanine is selected from SEQ ID NO: 01, 02, 03, 04 and 05, and/or
      • arginine is selected from SEQ ID NO: 06 and 07, and/or
      • asparagine is selected from SEQ ID NO: 08, 09, 10, 11, and 12, and/or
      • aspartic acid is selected from SEQ ID NO: 13 and 14, and/or
      • cysteine is selected from SEQ ID NO: 15, 16 and 17, and/or
      • glutamine is selected from SEQ ID NO: 18, 19, 20, and 21, and/or
      • glutamic acid is selected from SEQ ID NO: 22, 23 and 24, and/or
      • glycine is selected from SEQ ID NO: 25 and 26, and/or
      • histidine is selected from SEQ ID NO: 27 and 28, and/or
      • isoleucine is selected from SEQ ID NO: 29 and 30, and/or
      • leucine is selected from SEQ ID NO: 31, 32 and 33, and/or
      • lysine is selected from SEQ ID NO: 34, 35, 36 and 37, and/or
      • phenylalanine is selected from SEQ ID NO: 38, 39 and 40, and/or
      • proline is selected from SEQ ID NO: 41, 42, 43, 44, 45 and 46, and/or
      • serine is selected from SEQ ID NO: 47 and 48, and/or
      • threonine is selected from SEQ ID NO: 49, 50 and 51, and/or
      • tyrosine is selected from SEQ ID NO: 52 and 53, and/or
      • valine is selected from SEQ ID NO: 54, 55 and 56.
  • In one embodiment the amino acid codon motif for
      • alanine is SEQ ID NO: 03,
      • arginine is SEQ ID NO: 07,
      • asparagine is SEQ ID NO: 10,
      • aspartic acid is SEQ ID NO: 13,
      • cysteine is SEQ ID NO: 17,
      • glutamine is SEQ ID NO: 20,
      • glutamic acid is SEQ ID NO: 23,
      • glycine is SEQ ID NO: 26,
      • histidine is SEQ ID NO: 28,
      • isoleucine is SEQ ID NO: 30,
      • leucine is SEQ ID NO: 33,
      • lysine is SEQ ID NO: 36,
      • phenylalanine is SEQ ID NO: 39,
      • proline is SEQ ID NO: 43,
      • serine is SEQ ID NO: 48,
      • threonine is SEQ ID NO: 51,
      • tyrosine is SEQ ID NO: 53, and
      • valine is SEQ ID NO: 56.
  • In one embodiment the cell is a eukaryotic cell selected from a CHO cell, a BHK cell, a HEK cell, a SP2/0 cell, or a NS0 cell.
  • In one embodiment the eukaryotic cell is a CHO cell.
  • In one embodiment the amino acid codon motif for
      • alanine is selected from SEQ ID NO: 64, 65, 66, 67 and 68, and/or
      • arginine is selected from SEQ ID NO: 69 and 70, and/or
      • asparagine is selected from SEQ ID NO: 71 and 72, and/or
      • aspartic acid is selected from SEQ ID NO: 73 and 74, and/or
      • cysteine is selected from SEQ ID NO: 75 and 76, and/or
      • glutamine is selected from SEQ ID NO: 77, 78, 79, and 80, and/or
      • glutamic acid is selected from SEQ ID NO: 81 and 82, and/or
      • glycine is selected from SEQ ID NO: 83 and 84, and/or
      • histidine is selected from SEQ ID NO: 85 and 86, and/or
      • isoleucine is selected from SEQ ID NO: 87 and 88, and/or
      • leucine is selected from SEQ ID NO: 89, 90 and 91, and/or
      • lysine is selected from SEQ ID NO: 92 and 93, and/or
      • phenylalanine is selected from SEQ ID NO: 94 and 95, and/or
      • proline is selected from SEQ ID NO: 96 and 97, and/or
      • serine is selected from SEQ ID NO: 98, 99 and 100, and/or
      • threonine is selected from SEQ ID NO: 101, 102 and 103, and/or
      • tyrosine is selected from SEQ ID NO: 104 and 105, and/or
      • valine is selected from SEQ ID NO: 106, 107 and 108.
  • In one embodiment the amino acid codon motif for
      • alanine is SEQ ID NO: 68,
      • arginine is SEQ ID NO: 69,
      • asparagine is SEQ ID NO: 72,
      • aspartic acid is SEQ ID NO: 74,
      • cysteine is SEQ ID NO: 76,
      • glutamine is SEQ ID NO: 79,
      • glutamic acid is SEQ ID NO: 82,
      • glycine is SEQ ID NO: 84,
      • histidine is SEQ ID NO: 86,
      • isoleucine is SEQ ID NO: 88,
      • leucine is SEQ ID NO: 90,
      • lysine is SEQ ID NO: 93,
      • phenylalanine is SEQ ID NO: 95,
      • proline is SEQ ID NO: 97,
      • serine is SEQ ID NO: 99,
      • threonine is SEQ ID NO: 103,
      • tyrosine is SEQ ID NO: 105, and
      • valine is SEQ ID NO: 108.
  • In one embodiment the polypeptide is an antibody, or an antibody fragment, or an antibody fusion polypeptide.
  • One aspect as reported herein is a nucleic acid encoding a polypeptide, characterized in that each of the amino acid residues of the polypeptide is encoded by one or more (at least one) codon(s),
      • whereby the different codons encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%,
      • wherein the usage frequency of a codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.
  • In one embodiment the amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons and the amino acid residues M and W are encoded by a single codon.
  • In one embodiment the amino acid residues G, A, V, L, I, P, F, S, T, N, Q, Y, C, K, R, H, D, and E are each encoded by a group of codons comprising at least two codons and the amino acid residues M and W are encoded by a single codon.
  • In one embodiment the specific usage frequency of a codon is 100% if the amino acid residue is encoded by exactly one codon.
  • In one embodiment the amino acid residue G is encoded by a group of at most 4 codons. In one embodiment the amino acid residue A is encoded by a group of at most 4 codons. In one embodiment the amino acid residue V is encoded by a group of at most 4 codons. In one embodiment the amino acid residue L is encoded by a group of at most 6 codons. In one embodiment the amino acid residue I is encoded by a group of at most 3 codons. In one embodiment the amino acid residue M is encoded by exactly 1 codon. In one embodiment the amino acid residue P is encoded by a group of at most 4 codons. In one embodiment the amino acid residue F is encoded by a group of at most 2 codons. In one embodiment the amino acid residue W is encoded by exactly 1 codon. In one embodiment the amino acid residue S is encoded by a group of at most 6 codons. In one embodiment the amino acid residue T is encoded by a group of at most 4 codons. In one embodiment the amino acid residue N is encoded by a group of at most 2 codons. In one embodiment the amino acid residue Q is encoded by a group of at most 2 codons. In one embodiment the amino acid residue Y is encoded by a group of at most 2 codons. In one embodiment the amino acid residue C is encoded by a group of at most 2 codons. In one embodiment the amino acid residue K is encoded by a group of at most 2 codons. In one embodiment the amino acid residue R is encoded by a group of at most 6 codons. In one embodiment the amino acid residue H is encoded by a group of at most 2 codons. In one embodiment the amino acid residue D is encoded by a group of at most 2 codons. In one embodiment the amino acid residue E is encoded by a group of at most 2 codons.
  • In one embodiment the amino acid residue G is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue A is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue V is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue L is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue I is encoded by a group of 1 to 3 codons. In one embodiment the amino acid residue M is encoded by a group of 1 codon. In one embodiment the amino acid residue P is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue F is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue W is encoded by a group of 1 codon. In one embodiment the amino acid residue S is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue T is encoded by a group of 1 to 4 codons. In one embodiment the amino acid residue N is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue Q is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue Y is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue C is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue K is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue R is encoded by a group of 1 to 6 codons. In one embodiment the amino acid residue H is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue D is encoded by a group of 1 to 2 codons. In one embodiment the amino acid residue E is encoded by a group of 1 to 2 codons.
  • In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of more than 5%. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 8% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 10% or more. In one embodiment each of the groups comprises only codons with an overall usage frequency within the genome of the cell of 15% or more.
  • In one embodiment the sequence of codons in the nucleic acid encoding the polypeptide for a specific amino acid residue in 5′ to 3′ direction is, i.e. corresponds to, the sequence of codons in a respective amino acid codon motif.
  • In one embodiment for each sequential occurrence of a specific amino acid in the polypeptide starting from the N-terminus of the polypeptide the encoding nucleic acid comprises the codon that is the same as that at the corresponding sequential position in the amino acid codon motif of the respective specific amino acid.
  • In one embodiment the usage frequency of a codon in the amino acid codon motif is about the same as its specific usage frequency within its group.
  • In one embodiment after the final codon of the amino acid codon motif has been reached at the next occurrence of the specific amino acid in the polypeptide the encoding nucleic acid comprises the codon that is at the first position of the amino acid codon motif.
  • In one embodiment each of the codons in the amino acid codon motif is distributed randomly throughout the amino acid codon motif.
  • In one embodiment each of the codons in the amino acid codon motif is distributed evenly throughout the amino acid codon motif.
  • In one embodiment the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency or the codon with the second lowest specific usage frequency the codon with the highest specific usage frequency is used.
  • In one embodiment the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency the codon with the highest specific usage frequency is used.
  • One aspect as reported herein is a cell comprising a nucleic acid as reported herein.
  • One aspect as reported herein is a method for increasing the expression of a polypeptide in a prokaryotic cell or a eukaryotic cell comprising the step of,
      • providing a nucleic acid encoding the polypeptide,
      • wherein each of the amino acid residues of the polypeptide is encoded by at least one codon, whereby the different codons encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%,
      • wherein the usage frequency of a codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.
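  • The condition that the usage frequency of a codon in the encoding nucleic acid is "about the same" as its specific usage frequency within its group can be checked on a finished sequence. The following is a minimal sketch, not part of the reported method: the function names and the 5 percentage point tolerance are assumptions made for illustration, and the alanine target values are taken from the E. coli codon usage table of this document.

```python
from collections import Counter

def observed_group_usage(cds, group):
    """Return the usage [%] of each codon of `group` among the group's codons in `cds`."""
    group = {codon.upper() for codon in group}
    cds = cds.upper()
    # Split the coding sequence into in-frame triplets; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codon for codon in codons if codon in group)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 100.0 * counts[codon] / total for codon in sorted(group)}

def is_about_the_same(observed, target, tolerance=5.0):
    """True if every codon is within `tolerance` percentage points of its target.
    The tolerance value is a hypothetical choice, not taken from the source."""
    return all(abs(observed.get(codon, 0.0) - pct) <= tolerance
               for codon, pct in target.items())

# Alanine group with E. coli usage frequencies [%].
ALA_TARGET = {"GCG": 32, "GCT": 28, "GCA": 24, "GCC": 16}

# Toy coding sequence consisting of thirteen alanine codons in a 4:4:3:2 ratio.
cds = "gcg" * 4 + "gct" * 4 + "gca" * 3 + "gcc" * 2
usage = observed_group_usage(cds, ALA_TARGET)
print(usage)
print(is_about_the_same(usage, ALA_TARGET))
```

    For a real coding sequence the same check would be repeated for every amino acid group.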
    BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 shows the Western blot of the polypeptide containing supernatants of differently encoded poly-His-tagged test-polypeptide.
  • FIG. 2 shows the Western blot of the SDS-extracted cell pellet of differently encoded poly-His-tagged test-polypeptide.
  • FIG. 3 shows the protein reference standard curve obtained from five known scFv-poly-His concentrations.
    DETAILED DESCRIPTION OF THE INVENTION
  • Definitions
  • The term “amino acid” as used within this application denotes the group of carboxy α-amino acids, which directly or in the form of a precursor can be encoded by a nucleic acid. The individual amino acids are encoded by nucleic acids consisting of three nucleotides, so-called codons or base-triplets. Each amino acid is encoded by at least one codon. The encoding of the same amino acid by different codons is known as “degeneracy of the genetic code”. The term “amino acid” as used within this application denotes the naturally occurring carboxy α-amino acids and comprises alanine (three letter code: ala, one letter code: A), arginine (arg, R), asparagine (asn, N), aspartic acid (asp, D), cysteine (cys, C), glutamine (gln, Q), glutamic acid (glu, E), glycine (gly, G), histidine (his, H), isoleucine (ile, I), leucine (leu, L), lysine (lys, K), methionine (met, M), phenylalanine (phe, F), proline (pro, P), serine (ser, S), threonine (thr, T), tryptophan (trp, W), tyrosine (tyr, Y), and valine (val, V).
  • The term “antibody” herein is used in the broadest sense and encompasses various antibody structures, including but not limited to monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments so long as they exhibit the desired antigen-binding activity.
  • An “antibody fragment” refers to a molecule other than an intact antibody that comprises a portion of an intact antibody that binds the antigen to which the intact antibody binds. Examples of antibody fragments include but are not limited to Fv, Fab, Fab′, Fab′-SH, F(ab′)2; diabodies; linear antibodies; single-chain antibody molecules (e.g. scFv); and multispecific antibodies formed from antibody fragments.
  • The term “codon” denotes an oligonucleotide consisting of three nucleotides that is encoding a defined amino acid. Due to the degeneracy of the genetic code most amino acids are encoded by more than one codon. These different codons encoding the same amino acid have different relative usage frequencies in individual host cells. Thus, a specific amino acid is encoded either by exactly one codon or by a group of different codons. Likewise the amino acid sequence of a polypeptide can be encoded by different nucleic acids. Therefore, a specific amino acid (residue) in a polypeptide can be encoded by a group of different codons, whereby each of these codons has a usage frequency within a given host cell.
  • As a large number of gene sequences is available for a number of frequently used host cells the relative frequencies of codon usage can be calculated. Calculated codon usage tables are available from e.g. the “Codon Usage Database” (www.kazusa.or.jp/codon/), Nakamura, Y., et al., Nucl. Acids Res. 28 (2000) 292.
  • The codon usage tables for yeast, E. coli, Homo sapiens and hamster have been reproduced from “EMBOSS: The European Molecular Biology Open Software Suite” (Rice, P., et al., Trends Gen. 16 (2000) 276-277, Release 6.0.1, 15.07.2009) and are shown in the following tables. The codon usage frequencies for the 20 naturally occurring amino acids for E. coli, yeast, human cells, and CHO cells have been calculated separately for each amino acid, i.e. relative to the codons encoding that amino acid, rather than relative to all 64 codons.
  • TABLE
    Saccharomyces cerevisiae overall codon usage
    frequency (encoded amino acid|codon|usage
    frequency [%])
    Ala GCG 1
    Ala GCA 6
    Ala GCT 64
    Ala GCC 29
    Arg AGG 3
    Arg AGA 77
    Arg CGG 0
    Arg CGA 0
    Arg CGT 19
    Arg CGC 1
    Asn AAT 23
    Asn AAC 77
    Asp GAT 49
    Asp GAC 51
    Cys TGT 87
    Cys TGC 13
    Gln CAG 6
    Gln CAA 94
    Glu GAG 11
    Glu GAA 89
    Gly GGG 1
    Gly GGA 3
    Gly GGT 89
    Gly GGC 7
    His CAT 37
    His CAC 63
    Ile ATA 2
    Ile ATT 52
    Ile ATC 45
    Leu CTG 3
    Leu CTA 9
    Leu CTT 4
    Leu CTC 1
    Leu TTG 64
    Leu TTA 20
    Lys AAG 74
    Lys AAA 26
    Met ATG 100
    Phe TTT 29
    Phe TTC 71
    Pro CCG 1
    Pro CCA 80
    Pro CCT 17
    Pro CCC 2
    Ser AGT 6
    Ser AGC 5
    Ser TCG 2
    Ser TCA 8
    Ser TCT 49
    Ser TCC 31
    Thr ACG 1
    Thr ACA 8
    Thr ACT 52
    Thr ACC 40
    Trp TGG 100
    Tyr TAT 22
    Tyr TAC 78
    Val GTG 4
    Val GTA 3
    Val GTT 54
    Val GTC 38
  • TABLE
    Escherichia coli overall codon usage frequency
    (encoded amino acid|codon|usage frequency [%])
    Ala GCG 32
    Ala GCA 24
    Ala GCT 28
    Ala GCC 16
    Arg AGG 0
    Arg AGA 0
    Arg CGG 1
    Arg CGA 1
    Arg CGT 65
    Arg CGC 33
    Asn AAT 16
    Asn AAC 84
    Asp GAT 46
    Asp GAC 54
    Cys TGT 36
    Cys TGC 64
    Gln CAG 82
    Gln CAA 18
    Glu GAG 24
    Glu GAA 76
    Gly GGG 4
    Gly GGA 2
    Gly GGT 51
    Gly GGC 43
    His CAT 29
    His CAC 71
    Ile ATA 0
    Ile ATT 32
    Ile ATC 68
    Leu CTG 79
    Leu CTA 1
    Leu CTT 5
    Leu CTC 8
    Leu TTG 5
    Leu TTA 3
    Lys AAG 20
    Lys AAA 80
    Met ATG 100
    Phe TTT 28
    Phe TTC 72
    Pro CCG 75
    Pro CCA 14
    Pro CCT 10
    Pro CCC 1
    Ser AGT 4
    Ser AGC 25
    Ser TCG 7
    Ser TCA 5
    Ser TCT 32
    Ser TCC 27
    Thr ACG 12
    Thr ACA 4
    Thr ACT 28
    Thr ACC 56
    Trp TGG 100
    Tyr TAT 36
    Tyr TAC 64
    Val GTG 27
    Val GTA 20
    Val GTT 40
    Val GTC 13
  • TABLE
    Homo sapiens overall codon usage frequency
    (encoded amino acid|codon|usage frequency [%])
    Ala GCG 10
    Ala GCA 22
    Ala GCT 27
    Ala GCC 41
    Arg AGG 20
    Arg AGA 20
    Arg CGG 20
    Arg CGA 11
    Arg CGT 9
    Arg CGC 20
    Asn AAT 45
    Asn AAC 55
    Asp GAT 46
    Asp GAC 54
    Cys TGT 44
    Cys TGC 56
    Gln CAG 74
    Gln CAA 26
    Glu GAG 58
    Glu GAA 42
    Gly GGG 24
    Gly GGA 25
    Gly GGT 17
    Gly GGC 34
    His CAT 40
    His CAC 60
    Ile ATA 15
    Ile ATT 35
    Ile ATC 50
    Leu CTG 42
    Leu CTA 7
    Leu CTT 13
    Leu CTC 20
    Leu TTG 12
    Leu TTA 7
    Lys AAG 59
    Lys AAA 41
    Met ATG 100
    Phe TTT 45
    Phe TTC 55
    Pro CCG 11
    Pro CCA 28
    Pro CCT 28
    Pro CCC 33
    Ser AGT 15
    Ser AGC 25
    Ser TCG 6
    Ser TCA 14
    Ser TCT 18
    Ser TCC 23
    Thr ACG 12
    Thr ACA 27
    Thr ACT 24
    Thr ACC 37
    Trp TGG 100
    Tyr TAT 43
    Tyr TAC 57
    Val GTG 47
    Val GTA 11
    Val GTT 17
    Val GTC 25
  • TABLE
    Hamster overall codon usage frequency
    (encoded amino acid|codon|usage frequency [%])
    Ala GCG 9
    Ala GCA 23
    Ala GCT 30
    Ala GCC 38
    Arg AGG 22
    Arg AGA 20
    Arg CGG 19
    Arg CGA 9
    Arg CGT 10
    Arg CGC 19
    Asn AAT 39
    Asn AAC 61
    Asp GAT 39
    Asp GAC 61
    Cys TGT 42
    Cys TGC 58
    Gln CAG 78
    Gln CAA 22
    Glu GAG 64
    Glu GAA 36
    Gly GGG 24
    Gly GGA 25
    Gly GGT 19
    Gly GGC 33
    His CAT 42
    His CAC 58
    Ile ATA 15
    Ile ATT 35
    Ile ATC 51
    Leu CTG 44
    Leu CTA 6
    Leu CTT 13
    Leu CTC 19
    Leu TTG 12
    Leu TTA 6
    Lys AAG 67
    Lys AAA 33
    Met ATG 100
    Phe TTT 44
    Phe TTC 56
    Pro CCG 7
    Pro CCA 29
    Pro CCT 29
    Pro CCC 34
    Ser AGT 14
    Ser AGC 24
    Ser TCG 5
    Ser TCA 15
    Ser TCT 18
    Ser TCC 24
    Thr ACG 10
    Thr ACA 29
    Thr ACT 21
    Thr ACC 40
    Trp TGG 100
    Tyr TAT 39
    Tyr TAC 61
    Val GTG 48
    Val GTA 11
    Val GTT 16
    Val GTC 25
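  • The tables above give usage frequencies calculated per amino acid, so that the codons of one amino acid sum to about 100%. The following minimal sketch shows one way such a normalization could be done from raw codon counts. The function name and the counts are invented for illustration only; they are not data from the Codon Usage Database or from EMBOSS.

```python
from collections import defaultdict

def per_amino_acid_percent(counts, codon_to_aa):
    """Convert raw codon counts into usage frequencies [%] within each amino acid."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[codon_to_aa[codon]] += n
    # Each codon is expressed relative to all codons of the SAME amino acid,
    # not relative to all 64 codons.
    return {codon: round(100 * n / totals[codon_to_aa[codon]])
            for codon, n in counts.items()}

# Toy input: invented counts for three codons (illustrative only).
toy_counts = {"AAT": 2, "AAC": 8, "ATG": 5}
toy_map = {"AAT": "Asn", "AAC": "Asn", "ATG": "Met"}

print(per_amino_acid_percent(toy_counts, toy_map))
```

    Because each value is rounded separately, the percentages of one amino acid may sum to 99 or 101, as is the case for some groups in the tables above.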
  • The term “expression” as used herein refers to transcription and/or translation processes occurring within a cell. The level of transcription of a nucleic acid sequence of interest in a cell can be determined on the basis of the amount of corresponding mRNA that is present in the cell. For example, mRNA transcribed from a sequence of interest can be quantitated by RT-PCR (qRT-PCR) or by Northern hybridization (see Sambrook, J., et al., 1989, supra). Polypeptides encoded by a nucleic acid of interest can be quantitated by various methods, e.g. by ELISA, by assaying for the biological activity of the polypeptide, or by employing assays that are independent of such activity, such as Western blotting or radioimmunoassay, using immunoglobulins that recognize and bind to the polypeptide (see Sambrook, J., et al., 1989, supra).
  • An “expression cassette” refers to a construct that contains the necessary regulatory elements, such as promoter and polyadenylation site, for expression of at least the contained nucleic acid in a cell.
  • Expression of a gene is performed either as transient or as permanent expression. The polypeptide(s) of interest are in general secreted polypeptides and therefore contain an N-terminal extension (also known as the signal sequence) which is necessary for the transport/secretion of the polypeptide through the cell wall into the extracellular medium. In general, the signal sequence can be derived from any gene encoding a secreted polypeptide. If a heterologous signal sequence is used, it preferably is one that is recognized and processed (i.e. cleaved by a signal peptidase) by the host cell. For secretion in yeast for example the native signal sequence of a heterologous gene to be expressed may be substituted by a homologous yeast signal sequence derived from a secreted gene, such as the yeast invertase signal sequence, alpha-factor leader (including Saccharomyces, Kluyveromyces, Pichia, and Hansenula α-factor leaders, the second described in U.S. Pat. No. 5,010,182), acid phosphatase signal sequence, or the C. albicans glucoamylase signal sequence (EP 0 362 179). In mammalian cell expression the native signal sequence of the protein of interest is satisfactory, although other mammalian signal sequences may be suitable, such as signal sequences from secreted polypeptides of the same or related species, e.g. for immunoglobulins from human or murine origin, as well as viral secretory signal sequences, for example, the herpes simplex glycoprotein D signal sequence. The DNA fragment encoding for such a pre-segment is ligated in frame, i.e. operably linked, to the DNA fragment encoding a polypeptide of interest.
  • The term “cell” or “host cell” refers to a cell into which a nucleic acid, e.g. encoding a heterologous polypeptide, can be or is transfected. The term “cell” includes both prokaryotic cells, which are used for expression of a nucleic acid and production of the encoded polypeptide including propagation of plasmids, and eukaryotic cells, which are used for the expression of a nucleic acid and production of the encoded polypeptide. In one embodiment, the eukaryotic cells are mammalian cells. In one embodiment the mammalian cell is a CHO cell, optionally a CHO K1 cell (ATCC CCL-61 or DSM ACC 110), or a CHO DG44 cell (also known as CHO-DHFR[-], DSM ACC 126), or a CHO XL99 cell, a CHO-T cell (see e.g. Morgan, D., et al., Biochemistry 26 (1987) 2959-2963), or a CHO-S cell, or a Super-CHO cell (Pak, S. C. O., et al. Cytotechnology 22 (1996) 139-146). If these cells are not adapted to growth in serum-free medium or in suspension an adaptation prior to the use in the current method is to be performed. As used herein, the expression “cell” includes the subject cell and its progeny. Thus, the words “transformant” and “transformed cell” include the primary subject cell and cultures derived there from without regard for the number of transfers or subcultivations. It is also understood that all progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Variant progeny that have the same function or biological activity as screened for in the originally transformed cell are included.
  • In one embodiment the eukaryotic cell is a yeast cell. In one embodiment the yeast cell is of the genus Saccharomyces, or Pichia, or Hansenula, or Kluyveromyces, or Schizosaccharomyces.
  • In one embodiment the prokaryotic cell is an Escherichia cell, or a Salmonella cell, or a Bacillus cell, or a Lactococcus cell, or a Streptococcus cell.
  • In one embodiment the eukaryotic cell is a plant cell. In one embodiment the plant cell is an Arabidopsis, tobacco, or tomato cell.
  • The term “codon optimization” denotes the exchange of one, at least one, or more than one codon in a polypeptide encoding nucleic acid for a different codon with a different usage frequency in a respective cell.
  • The term “codon-optimized nucleic acid” denotes a nucleic acid encoding a polypeptide that has been adapted for improved expression in a cell, e.g. a mammalian cell or a bacterial cell, by replacing one, at least one, or more than one codon in a parent polypeptide encoding nucleic acid with a codon encoding the same amino acid residue with a different relative frequency of usage in the cell.
  • A “gene” denotes a nucleic acid which is a segment e.g. on a chromosome or on a plasmid which can effect the expression of a peptide, polypeptide, or protein. Beside the coding region, i.e. the structural gene, a gene comprises other functional elements e.g. a signal sequence, promoter(s), introns, and/or terminators.
  • The term “group of codons” and semantic equivalents thereof denote a defined number of different codons encoding one (i.e. the same) amino acid residue. The individual codons of one group differ in their overall usage frequency in the genome of a cell. Each codon in a group of codons has a specific usage frequency within the group that depends on the number of codons in the group. This specific usage frequency within the group can differ from the overall usage frequency in the genome of a cell but depends on (i.e. is related to) that overall usage frequency. A group of codons may comprise only one codon but can also comprise up to six codons.
  • The term “overall usage frequency in the genome of a cell” denotes the frequency of occurrence of a specific codon in the entire genome of a cell.
  • The term “specific usage frequency” of a codon in a group of codons denotes the frequency with which a single (i.e. a specific) codon of a group of codons in relation to all codons of one group can be found in a nucleic acid encoding a polypeptide obtained with a method as reported herein. The value of the specific usage frequency depends on the overall usage frequency of the specific codon in the genome of a cell and the number of codons in the group. Thus, as a group of codons does not necessarily comprise all possible codons encoding one specific amino acid residue, the specific usage frequency of a codon in a group of codons is at least the same as its overall usage frequency in the genome of a cell and at most 100%, i.e. it is at least the same but can be more than the overall usage frequency in the genome of a cell. The sum of specific codon usage frequencies of all members of a group of codons is always about 100%.
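  • The relation between overall and specific usage frequency can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function name is hypothetical, and rescaling the retained codons proportionally so that they sum to 100% is one plausible reading of the definition above, not a procedure spelled out in the source. The proline values are taken from the E. coli table of this document, and the cut-off of more than 5% corresponds to one of the embodiments.

```python
def specific_usage(overall, min_overall=0.0):
    """Rescale the overall usage frequencies [%] of one amino acid's codons.

    Codons with an overall usage frequency of `min_overall` or less are left
    out of the group; the remaining codons are rescaled to sum to 100%.
    """
    group = {codon: freq for codon, freq in overall.items() if freq > min_overall}
    total = sum(group.values())
    if total == 0:
        raise ValueError("no codon passes the usage frequency cut-off")
    return {codon: 100.0 * freq / total for codon, freq in group.items()}

# Proline in E. coli, overall usage frequencies [%].
ECOLI_PRO = {"CCG": 75, "CCA": 14, "CCT": 10, "CCC": 1}

# Group restricted to codons with an overall usage frequency of more than 5%.
pro = specific_usage(ECOLI_PRO, min_overall=5)
for codon, freq in pro.items():
    print(codon, round(freq, 1))
```

    In this example CCC is excluded, and each remaining codon has a specific usage frequency that is slightly higher than its overall usage frequency, consistent with the statement that the specific usage frequency is at least the overall usage frequency and at most 100%.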
  • The term “amino acid codon motif” denotes a sequence of codons, which all are members of the same group of codons and, thus, encode the same amino acid residue. The number of different codons in an amino acid codon motif is the same as the number of different codons in a group of codons but each codon can be present more than once in the amino acid codon motif. Further, each codon is present in the amino acid codon motif at its specific usage frequency. Therefore, the amino acid codon motif represents a sequence of different codons encoding the same amino acid residue wherein each of the different codons is present at its specific usage frequency, wherein the sequence starts with the codon having the highest specific usage frequency, and wherein the codons are arranged in a defined sequence. For example, the group of codons encoding the amino acid residue alanine comprises the four codons GCG, GCT, GCA and GCC with a specific usage frequency of 32%, 28%, 24% and 16%, respectively (corresponding to a 4:4:3:2 ratio). The amino acid codon motif for the amino acid residue alanine is defined as comprising the four codons GCG, GCT, GCA, and GCC at a ratio of 4:4:3:2, wherein the first codon is GCG. One exemplary amino acid codon motif for alanine is gcg gcg gcg gcg gct gct gct gct gca gca gca gcc gcc (SEQ ID NO: 01). This motif consists of thirteen sequential codons (4+4+3+2=13). Upon the first occurrence of the amino acid residue alanine in the amino acid sequence of a polypeptide the first codon of the amino acid codon motif is used in the corresponding encoding nucleic acid. Upon the second occurrence of alanine the second codon of the amino acid codon motif is used and so on. Upon the thirteenth occurrence of alanine in the amino acid sequence of the polypeptide the codon at the thirteenth, i.e. the last, position of the amino acid codon motif is used in the corresponding encoding nucleic acid. Upon the fourteenth occurrence of the amino acid alanine in the amino acid sequence of the polypeptide again the first codon of the amino acid codon motif is used and so on.
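  • The use of an amino acid codon motif described above can be sketched in a few lines. The function names are hypothetical and the code is an illustration of the described cycling scheme, not the applicants' implementation; only the alanine motif (SEQ ID NO: 01) and the single methionine codon are taken from this document.

```python
from collections import defaultdict

def build_motif(ratio):
    """Expand (codon, count) pairs into a motif with the codons in blocks.

    `ratio` is expected to be ordered by decreasing specific usage frequency,
    so that the motif starts with the most frequently used codon.
    """
    motif = []
    for codon, count in ratio:
        motif.extend([codon] * count)
    return motif

def encode(protein, motifs):
    """Assign a codon to every residue by cycling through that residue's motif."""
    position = defaultdict(int)  # number of occurrences seen so far, per amino acid
    codons = []
    for residue in protein:
        motif = motifs[residue]
        # After the last motif position the first position is used again.
        codons.append(motif[position[residue] % len(motif)])
        position[residue] += 1
    return codons

# Alanine motif in a 4:4:3:2 ratio, as in SEQ ID NO: 01.
ALA_MOTIF = build_motif([("gcg", 4), ("gct", 4), ("gca", 3), ("gcc", 2)])
MOTIFS = {"A": ALA_MOTIF, "M": ["atg"]}

print(" ".join(ALA_MOTIF))
print(" ".join(encode("M" + "A" * 14, MOTIFS)))
```

    Each amino acid keeps its own position counter, so occurrences of other residues between two alanines do not advance the alanine motif.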
  • A “nucleic acid” or a “nucleic acid sequence”, which terms are used interchangeably within this application, refers to a polymeric molecule consisting of individual nucleotides (also called bases) a, c, g, and t (or u in RNA), for example to DNA, RNA, or modifications thereof. This polynucleotide molecule can be a naturally occurring polynucleotide molecule or a synthetic polynucleotide molecule or a combination of one or more naturally occurring polynucleotide molecules with one or more synthetic polynucleotide molecules. Also encompassed by this definition are naturally occurring polynucleotide molecules in which one or more nucleotides are changed (e.g. by mutagenesis), deleted, or added. A nucleic acid can either be isolated, or integrated in another nucleic acid, e.g. in an expression cassette, a plasmid, or the chromosome of a host cell. A nucleic acid is characterized by its nucleic acid sequence consisting of individual nucleotides.
  • To a person skilled in the art procedures and methods are well known to convert an amino acid sequence, e.g. of a polypeptide, into a corresponding nucleic acid sequence encoding this amino acid sequence. Therefore, a nucleic acid is characterized by its nucleic acid sequence consisting of individual nucleotides and likewise by the amino acid sequence of a polypeptide encoded thereby.
  • A “structural gene” denotes the region of a gene without a signal sequence, i.e. the coding region.
  • A “transfection vector” is a nucleic acid (also denoted as nucleic acid molecule) providing all elements required for the expression, in a host cell, of the coding nucleic acid(s)/structural gene(s) comprised in the transfection vector. A transfection vector comprises a prokaryotic plasmid propagation unit, e.g. for E. coli, which in turn comprises a prokaryotic origin of replication and a nucleic acid conferring resistance to a prokaryotic selection agent. The transfection vector further comprises one or more nucleic acid(s) conferring resistance to a eukaryotic selection agent, and one or more nucleic acid(s) encoding a polypeptide of interest. Preferably, the nucleic acids conferring resistance to a selection agent and the nucleic acid(s) encoding a polypeptide of interest are each placed within an expression cassette, whereby each expression cassette comprises a promoter, a coding nucleic acid, and a transcription terminator including a polyadenylation signal. Gene expression is usually placed under the control of a promoter, and such a structural gene is said to be “operably linked to” the promoter. Similarly, a regulatory element and a core promoter are operably linked if the regulatory element modulates the activity of the core promoter.
  • The term “vector”, as used herein, refers to a nucleic acid molecule capable of propagating another nucleic acid to which it is linked. The term includes the vector as a self-replicating nucleic acid structure as well as the vector incorporated into the genome of a host cell into which it has been introduced. Certain vectors are capable of directing the expression of nucleic acids to which they are operatively linked. Such vectors are referred to herein as “expression vectors”.
  • Recombinant Methods
  • Antibodies may be produced using recombinant methods and compositions, e.g., as described in U.S. Pat. No. 4,816,567. In one embodiment, isolated nucleic acid encoding an antibody as reported herein is provided. Such nucleic acid may encode an amino acid sequence comprising the VL and/or an amino acid sequence comprising the VH of the antibody (e.g., the light and/or heavy chains of the antibody). In one embodiment, one or more vectors (e.g., expression vectors) comprising such nucleic acid are provided. In one embodiment, a cell comprising such nucleic acid is provided. In one embodiment, a cell comprises (e.g., has been transformed with): (1) a vector comprising a nucleic acid that encodes an amino acid sequence comprising the VL of the antibody and an amino acid sequence comprising the VH of the antibody, or (2) a first vector comprising a nucleic acid that encodes an amino acid sequence comprising the VL of the antibody and a second vector comprising a nucleic acid that encodes an amino acid sequence comprising the VH of the antibody. In one embodiment, the cell is eukaryotic, e.g. a Chinese Hamster Ovary (CHO) cell or lymphoid cell (e.g., Y0, NS0, Sp2/0 cell). In one embodiment, a method of making an antibody is provided, wherein the method comprises culturing a cell comprising a nucleic acid encoding the antibody, as provided herein, under conditions suitable for expression of the antibody, and optionally recovering the antibody from the cell (or culture medium).
  • For recombinant production of an antibody, nucleic acid encoding an antibody, e.g., as reported herein, is isolated and inserted into one or more vectors for further cloning and/or expression in a cell. Such nucleic acid may be readily isolated and sequenced using conventional procedures (e.g., by using oligonucleotide probes that are capable of binding specifically to genes encoding the heavy and light chains of the antibody).
  • Suitable cells for cloning or expression of antibody-encoding vectors include prokaryotic or eukaryotic cells described herein. For example, antibodies may be produced in bacteria, in particular when glycosylation and Fc effector function are not needed. For expression of antibody fragments and polypeptides in bacteria, see, e.g., U.S. Pat. No. 5,648,237, U.S. Pat. No. 5,789,199, and U.S. Pat. No. 5,840,523 (see also Charlton, K. A., In: Methods in Molecular Biology, Vol. 248, Lo, B. K. C. (ed.), Humana Press, Totowa, N.J. (2003) pp. 245-254, describing expression of antibody fragments in E. coli). After expression, the antibody may be isolated from the bacterial cell paste in a soluble fraction and can be further purified.
  • In addition to prokaryotes, eukaryotic microbes such as filamentous fungi or yeast are suitable cloning or expression hosts for antibody-encoding vectors, including fungi and yeast strains whose glycosylation pathways have been “humanized,” resulting in the production of an antibody with a partially or fully human glycosylation pattern (see Gerngross, T. U., Nat. Biotech. 22 (2004) 1409-1414; and Li, H., et al., Nat. Biotech. 24 (2006) 210-215).
  • Suitable host cells for the expression of glycosylated antibody are also derived from multicellular organisms (invertebrates and vertebrates). Examples of invertebrate cells include plant and insect cells. Numerous baculoviral strains have been identified which may be used in conjunction with insect cells, particularly for transfection of Spodoptera frugiperda cells.
  • Plant cell cultures can also be utilized as hosts (see e.g. U.S. Pat. No. 5,959,177, U.S. Pat. No. 6,040,498, U.S. Pat. No. 6,420,548, U.S. Pat. No. 7,125,978, and U.S. Pat. No. 6,417,429, describing PLANTIBODIES™ technology for producing antibodies in transgenic plants).
  • Vertebrate cells may also be used as hosts. For example, mammalian cell lines that are adapted to grow in suspension may be useful. Other examples of useful mammalian cell lines are monkey kidney CV1 line transformed by SV40 (COS-7); human embryonic kidney line (293 cells as described, e.g., in Graham, F. L., et al., J. Gen Virol. 36 (1977) 59-74); baby hamster kidney cells (BHK); mouse sertoli cells (TM4 cells as described, e.g., in Mather, J. P., Biol. Reprod. 23 (1980) 243-252); monkey kidney cells (CV1); African green monkey kidney cells (VERO-76); human cervical carcinoma cells (HELA); canine kidney cells (MDCK); buffalo rat liver cells (BRL 3A); human lung cells (WI-38); human liver cells (Hep G2); mouse mammary tumor (MMT 060562); TRI cells, as described, e.g., in Mather, J. P., et al., Annals N.Y. Acad. Sci. 383 (1982) 44-68; MRC 5 cells; and FS4 cells. Other useful mammalian cell lines include Chinese hamster ovary (CHO) cells, including DHFR CHO cells (Urlaub, G., et al., Proc. Natl. Acad. Sci. USA 77 (1980) 4216-4220); and myeloma cell lines such as Y0, NS0 and Sp2/0. For a review of certain mammalian host cell lines suitable for antibody production, see, e.g., Yazaki, P. and Wu, A. M., Methods in Molecular Biology, Vol. 248, Lo, B. K. C. (ed.), Humana Press, Totowa, N.J. (2004) pp. 255-268.
  • Codon Usage
  • Codon usage tables (see tables above for examples) are readily available, for example, at the “Codon Usage Database” available at http://www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways (Nakamura, Y., et al., Nucl. Acids Res. 28 (2000) 292).
  • For high yield expression of recombinant polypeptides the encoding nucleic acid plays an important role. Naturally occurring encoding nucleic acids, and those isolated from nature, are generally not optimized for high yield expression, especially if expressed in a heterologous host cell. Due to the degeneracy of the genetic code, one amino acid residue can be encoded by more than one nucleotide triplet (codon), except for the amino acids tryptophan and methionine. Thus, for one amino acid sequence different encoding codon sequences (= corresponding encoding nucleic acid sequences) are possible.
  • The different codons encoding one amino acid residue are employed by different organisms with different relative frequency (codon usage). Generally one specific codon is used with higher frequency than the other possible codons.
  • In WO 2001/088141 a reading frame optimization according to codon usage found in highly expressed mammalian genes is reported. For that purpose, a matrix was generated considering almost exclusively those codons that are used most frequently and, less preferably, those that are used second most frequently in highly expressed mammalian genes as depicted in the following table. Using these codons from highly expressed human genes a fully synthetic reading frame not occurring in nature was created, which, however encodes the very same product as the original wild-type gene construct.
  • In U.S. Pat. No. 8,128,938 different methods of codon optimization using the usage frequency of individual codons are reported, such as uniform optimization, full-optimization and minimal optimization.
  • In the following table the most frequently used codon (codon 1) and the second most frequently used codon (codon 2) found in highly expressed mammalian genes are shown.
  • TABLE
    amino acid codon 1 codon 2
    Ala GCC GCT
    Arg AGG AGA
    Asn AAC AAT
    Asp GAC GAT
    Cys TGC TGT
    End TGA TAA
    Gln CAG CAA
    Glu GAG GAA
    Gly GGC GGA
    His CAC CAT
    Ile ATC ATT
    Leu CTG CTC
    Lys AAG AAA
    Met ATG ATG
    Phe TTC TTT
    Pro CCC CCT
    Ser AGC TCC
    Thr ACC ACA
    Trp TGG TGG
    Tyr TAC TAT
    Val GTG GTC
    (Ausubel, F. M., et al., Current Protocols in Molecular Biology 2 (1994), A1.8-A1.9).
  • A few deviations from strict adherence to the usage of the most frequently found codons may be made (i) to accommodate the introduction of unique restriction sites, or (ii) to break G or C stretches extending more than 7 base pairs in order to allow consecutive PCR amplification and sequencing of the synthetic gene product.
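  • Stretches of G or C that exceed 7 base pairs can be located with a simple pattern search before codons are exchanged to break them. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a "G or C stretch" means an uninterrupted run of a single base (only G or only C); if mixed G/C runs are meant, the pattern `[GC]{8,}` would be used instead.

```python
import re

def long_gc_runs(seq, max_run=7):
    """Return (start, run) for every run of G or of C longer than `max_run`."""
    n = max_run + 1
    # Assumption: homopolymer runs only, i.e. GGGGGGGG or CCCCCCCC.
    pattern = re.compile(r"G{%d,}|C{%d,}" % (n, n))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]

# Toy sequences: eight consecutive G are reported, seven are not.
print(long_gc_runs("at" + "g" * 8 + "at"))
print(long_gc_runs("at" + "g" * 7 + "at"))
```

    The reported start positions are zero-based offsets into the sequence, so they can be mapped to the affected codon by integer division by three.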
  • Method for Producing a Polypeptide by Expressing an Encoding Nucleic Acid with Modified Codon Usage
  • It has been found that for the expression of a polypeptide in a cell the use of a polypeptide encoding nucleic acid, in which each amino acid is encoded by a group of codons, whereby each codon in the group of codons is defined by a specific usage frequency in the group corresponding to the overall usage frequency in the genome of the cell, and whereby the usage frequency of the codons in the (total) polypeptide encoding nucleic acid is about the same as the usage frequency within the group, is beneficial.
  • One aspect herein is a method for producing a polypeptide comprising the step of cultivating a cell, which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the cell or the cultivation medium,
      • wherein each of the amino acid residues of the polypeptide is encoded by at least one codon, whereby the different codons encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%,
      • wherein the usage frequency of a codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.
  • In the method as reported herein a codon optimized nucleic acid encoding a polypeptide is used to express the polypeptide, whereby the obtainable expression yield is increased compared to other nucleic acids.
  • It has been found that by using a group of codons for encoding each amino acid in a polypeptide that is in nature also encoded by a group of codons, and by using the same relative ratio between the individual codons of one group also within the entire nucleic acid, a nucleic acid that encodes a polypeptide can be provided that upon expression results in an improved yield, e.g. compared to a nucleic acid in which the codon with the highest usage frequency is always present.
  • In one embodiment the cell is a prokaryotic cell. In one embodiment the cell is a bacterial cell.
  • In one preferred embodiment the cell is an E. coli cell. The overall codon usage frequency taking into account all codons encoding a specific amino acid residue for E. coli is given in the following table.
  • TABLE
    amino acid codon usage frequency (%)
    Ala GCG 32
    Ala GCA 24
    Ala GCT 28
    Ala GCC 16
    Arg AGG 0
    Arg AGA 0
    Arg CGG 1
    Arg CGA 1
    Arg CGT 65
    Arg CGC 33
    Asn AAT 16
    Asn AAC 84
    Asp GAT 46
    Asp GAC 54
    Cys TGT 36
    Cys TGC 64
    Gln CAG 82
    Gln CAA 18
    Glu GAG 24
    Glu GAA 76
    Gly GGG 4
    Gly GGA 2
    Gly GGT 51
    Gly GGC 43
    His CAT 29
    His CAC 71
    Ile ATA 0
    Ile ATT 32
    Ile ATC 68
    Leu CTG 79
    Leu CTA 1
    Leu CTT 5
    Leu CTC 8
    Leu TTG 5
    Leu TTA 3
    Lys AAG 20
    Lys AAA 80
    Met ATG 100
    Phe TTT 28
    Phe TTC 72
    Pro CCG 75
    Pro CCA 14
    Pro CCT 10
    Pro CCC 1
    Ser AGT 4
    Ser AGC 25
    Ser TCG 7
    Ser TCA 5
    Ser TCT 32
    Ser TCC 27
    Thr ACG 12
    Thr ACA 4
    Thr ACT 28
    Thr ACC 56
    Trp TGG 100
    Tyr TAT 36
    Tyr TAC 64
    Val GTG 27
    Val GTA 20
    Val GTT 40
    Val GTC 13
  • For encoding the amino acid residue alanine four different codons are available: GCG, GCA, GCT, and GCC. Thus, the group of codons encoding the amino acid residue can comprise at most four codons. In the group comprising four codons, each codon has a specific usage frequency in the group that is the same as the overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific and overall usage frequency of 32%, the codon GCA has a specific and overall usage frequency of 24%, the codon GCT has a specific and overall usage frequency of 28%, and the codon GCC has a specific and overall usage frequency of 16%. If the number of codons in the group is reduced, e.g. by excluding the codons GCA and GCC with a specific usage frequency of 24% and 16%, respectively, from the group, the specific usage frequency of the remaining members of the group, i.e. GCG and GCT, changes to 53% (=32/(32+28)*100) and 47% (=28/(32+28)*100), respectively, as the sum of the specific usage frequencies of all codons in one group is 100%. Thus, in the group comprising the two codons GCG and GCT, each codon has a specific usage frequency in the group that is higher than its overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific usage frequency of 53% and an overall usage frequency of 32%, and the codon GCT has a specific usage frequency of 47% and an overall usage frequency of 28%.
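  • The rescaling described for the reduced alanine group can be reproduced with a few lines of code. The following is a minimal sketch; the function and variable names are hypothetical, and rounding to whole percentages is an assumption made to match the figures given in the text.

```python
def specific_usage_frequencies(overall, selected):
    """Rescale the overall usage frequencies (in %) of the selected codons
    so that they sum to 100% within the reduced group."""
    total = sum(overall[codon] for codon in selected)
    return {codon: round(100 * overall[codon] / total) for codon in selected}


# Overall E. coli usage frequencies (%) for alanine, taken from the table in this document
ALA_OVERALL = {"GCG": 32, "GCA": 24, "GCT": 28, "GCC": 16}

# Reduced group without GCA and GCC
print(specific_usage_frequencies(ALA_OVERALL, ["GCG", "GCT"]))
```

  • Applied to the arginine values of the same table (CGT 65, CGC 33), the same calculation gives 66% and 34%.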
  • If the group of codons encoding the amino acid residue alanine comprises all four available codons the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCG, GCT, GCA and GCC at a ratio of 32:28:24:16, which corresponds to 8:7:6:4. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 8:8:6:4, which corresponds to 4:4:3:2.
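  • The reduction of a frequency ratio to its smallest integer form can be sketched as follows. The helper name is hypothetical, and only the reduction step is covered; the intermediate adjustment (here from 8:7:6:4 to 8:8:6:4) is a manual choice in the text and is not derived by the code.

```python
from functools import reduce
from math import gcd


def reduce_ratio(counts):
    """Divide all terms of a ratio by their greatest common divisor."""
    divisor = reduce(gcd, counts)
    return [count // divisor for count in counts]


raw = reduce_ratio([32, 28, 24, 16])   # ratio of the overall usage frequencies
adjusted = reduce_ratio([8, 8, 6, 4])  # ratio after the manual adjustment
print(raw, sum(raw))
print(adjusted, sum(adjusted))
```

  • The sum of the reduced terms is the number of positions in the resulting amino acid codon motif.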
  • The use of the different codons encoding a specific amino acid is alternating within the genome. This alternation is reflected herein by the definition of an amino acid codon motif. Within an amino acid codon motif the individual codons are distributed taking into account the specific usage frequency, whereby codons with a higher frequency are chosen first. The amino acid codon motif comprises a specific sequence of codons, wherein the total number of codons in an amino acid codon motif is at least the same as or even higher than the number of codons in a group of codons, in order to allow a mapping of the usage frequency of a group of codons to the corresponding amino acid codon motif.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gcg gcg gcg gct gct gct gct gca gca gca gcc gcc (SEQ ID NO: 01).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct gca gcc gcg gct gca gcg gct gcc gcg gct gca (SEQ ID NO: 02).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gct (SEQ ID NO: 03).
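  • Two of the alanine motifs above can be generated from the ratio 4:4:3:2 with simple ordering rules. The following is a sketch under assumptions: the function names are hypothetical, and the round-robin rule is only one possible way to distribute the codons; it is not stated in the text to be the procedure actually used, and it does not produce SEQ ID NO: 02.

```python
def blocked_motif(codon_counts):
    """List every codon as one block, in the given (descending frequency) order."""
    return [codon for codon, count in codon_counts for _ in range(count)]


def round_robin_motif(codon_counts):
    """Cycle through the codons in the given order and skip a codon
    once its count is used up."""
    remaining = [[codon, count] for codon, count in codon_counts]
    motif = []
    while any(count > 0 for _, count in remaining):
        for entry in remaining:
            if entry[1] > 0:
                motif.append(entry[0])
                entry[1] -= 1
    return motif


# Alanine codons in descending order of usage frequency with the ratio 4:4:3:2
ALA = [("gcg", 4), ("gct", 4), ("gca", 3), ("gcc", 2)]
print(" ".join(blocked_motif(ALA)))      # same order as SEQ ID NO: 01
print(" ".join(round_robin_motif(ALA)))  # same order as SEQ ID NO: 03
```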
  • If the group of codons encoding the amino acid residue alanine comprises the two codons GCG and GCT the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCG and GCT at a ratio of 53:47. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:50, which corresponds to 1:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcg gct (SEQ ID NO: 04).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gct gcg (SEQ ID NO: 05).
  • If the group of codons encoding the amino acid residue arginine comprises the two codons CGT and CGC the amino acid codon motif of the codons encoding the amino acid residue arginine comprises the codons CGT and CGC at a ratio of 66:34, which corresponds to 33:17. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 66:33, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue arginine is cgt cgt cgc (SEQ ID NO: 06).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue arginine is cgt cgc cgt (SEQ ID NO: 07).
  • If the group of codons encoding the amino acid residue asparagine comprises the two codons AAC and AAT the amino acid codon motif of the codons encoding the amino acid residue asparagine comprises the codons AAC and AAT at a ratio of 84:16, which corresponds to 21:4. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 20:4, which corresponds to 5:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aac aac aat (SEQ ID NO: 08).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aac aat aac (SEQ ID NO: 09).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aat aac aac (SEQ ID NO: 10).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aat aac aac aac (SEQ ID NO: 11).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aat aac aac aac aac (SEQ ID NO: 12).
  • If the group of codons encoding the amino acid residue aspartic acid comprises the two codons GAC and GAT the amino acid codon motif of the codons encoding the amino acid residue aspartic acid comprises the codons GAC and GAT at a ratio of 54:46, which corresponds to 27:23. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 25:25, which corresponds to 1:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gat (SEQ ID NO: 13).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gat gac (SEQ ID NO: 14).
  • If the group of codons encoding the amino acid residue cysteine comprises the two codons TGC and TGT the amino acid codon motif of the codons encoding the amino acid residue cysteine comprises the codons TGC and TGT at a ratio of 64:36, which corresponds to 16:9. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 15:9, which corresponds to 5:3.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgc tgc tgc tgt tgt tgt (SEQ ID NO: 15).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgt tgc tgc tgt tgc tgc tgt (SEQ ID NO: 16).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgt tgc tgt tgc tgt tgc (SEQ ID NO: 17).
  • If the group of codons encoding the amino acid residue glutamine comprises the two codons CAG and CAA the amino acid codon motif of the codons encoding the amino acid residue glutamine comprises the codons CAG and CAA at a ratio of 82:18, which corresponds to 41:9. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 40:10, which corresponds to 4:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag cag caa (SEQ ID NO: 18).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag caa cag (SEQ ID NO: 19).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag caa cag cag (SEQ ID NO: 20).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag caa cag cag cag (SEQ ID NO: 21).
  • If the group of codons encoding the amino acid residue glutamic acid comprises the two codons GAA and GAG the amino acid codon motif of the codons encoding the amino acid residue glutamic acid comprises the codons GAA and GAG at a ratio of 76:24, which corresponds to 19:6. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 18:6, which corresponds to 3:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gaa gaa gag (SEQ ID NO: 22).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gaa gag gaa (SEQ ID NO: 23).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gaa gag gaa gaa (SEQ ID NO: 24).
  • If the group of codons encoding the amino acid residue glycine comprises the two codons GGT and GGC the amino acid codon motif of the codons encoding the amino acid residue glycine comprises the codons GGT and GGC at a ratio of 54:46, which corresponds to 27:23. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:24 taking into account that glycine can be encoded by four different codons, which corresponds to 5:4.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glycine is ggt ggt ggt ggt ggt ggc ggc ggc ggc (SEQ ID NO: 25).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glycine is ggt ggc ggt ggc ggt ggc ggt ggc ggt (SEQ ID NO: 26).
  • If the group of codons encoding the amino acid residue histidine comprises the two codons CAC and CAT the amino acid codon motif of the codons encoding the amino acid residue histidine comprises the codons CAC and CAT at a ratio of 71:29. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 66:33, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cac cat (SEQ ID NO: 27).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cat cac (SEQ ID NO: 28).
  • If the group of codons encoding the amino acid residue isoleucine comprises the two codons ATC and ATT the amino acid codon motif of the codons encoding the amino acid residue isoleucine comprises the codons ATC and ATT at a ratio of 68:32, which corresponds to 17:8. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 16:8, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc atc att (SEQ ID NO: 29).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc att atc (SEQ ID NO: 30).
  • If the group of codons encoding the amino acid residue leucine comprises the two codons CTG and CTC the amino acid codon motif of the codons encoding the amino acid residue leucine comprises the codons CTG and CTC at a ratio of 91:9. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 90:10, which corresponds to 9:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctg ctg ctg ctg ctg ctc (SEQ ID NO: 31).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctg ctc ctg ctg ctg ctg (SEQ ID NO: 32).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctc ctg ctg ctg ctg ctg (SEQ ID NO: 33).
  • If the group of codons encoding the amino acid residue lysine comprises the two codons AAA and AAG the amino acid codon motif of the codons encoding the amino acid residue lysine comprises the codons AAA and AAG at a ratio of 80:20, which corresponds to 4:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aaa aaa aag (SEQ ID NO: 34).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aaa aag aaa (SEQ ID NO: 35).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aaa aag aaa aaa (SEQ ID NO: 36).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aaa aag aaa aaa aaa (SEQ ID NO: 37).
  • If the group of codons encoding the amino acid residue phenylalanine comprises the two codons TTC and TTT the amino acid codon motif of the codons encoding the amino acid residue phenylalanine comprises the codons TTC and TTT at a ratio of 72:28, which corresponds to 18:7. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 18:6, which corresponds to 3:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttc ttt (SEQ ID NO: 38).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttt ttc (SEQ ID NO: 39).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttt ttc ttc (SEQ ID NO: 40).
  • If the group of codons encoding the amino acid residue proline comprises the three codons CCG, CCA and CCT the amino acid codon motif of the codons encoding the amino acid residue proline comprises the codons CCG, CCA and CCT at a ratio of 76:14:10, which corresponds to 38:7:5. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 35:7:7, which corresponds to 5:1:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg ccg ccg ccg cca cct (SEQ ID NO: 41).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg ccg cca ccg cct ccg (SEQ ID NO: 42).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg cca ccg ccg cct ccg (SEQ ID NO: 43).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg ccg cca ccg cct ccg ccg (SEQ ID NO: 44).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg cca ccg ccg cct ccg ccg (SEQ ID NO: 45).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccg cca ccg cct ccg ccg ccg (SEQ ID NO: 46).
  • If the group of codons encoding the amino acid residue serine comprises the three codons TCT, TCC and AGC the amino acid codon motif of the codons encoding the amino acid residue serine comprises the codons TCT, TCC and AGC at a ratio of 38:32:30, which corresponds to 19:16:15. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 18:15:15, which corresponds to 6:5:5.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue serine is tct tct tct tct tct tct tcc tcc tcc tcc tcc agc agc agc agc agc (SEQ ID NO: 47).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue serine is tct tcc agc tct tcc agc tct tcc agc tct tcc agc tcc tct agc tct (SEQ ID NO: 48).
  • If the group of codons encoding the amino acid residue threonine comprises the three codons ACC, ACT and ACG the amino acid codon motif of the codons encoding the amino acid residue threonine comprises the codons ACC, ACT and ACG at a ratio of 58:29:13. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 56:28:14, which corresponds to 4:2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc acc acc acc act act acg (SEQ ID NO: 49).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc act acc act acc acg acc (SEQ ID NO: 50).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc act acc acg acc act acc (SEQ ID NO: 51).
  • If the group of codons encoding the amino acid residue tyrosine comprises the two codons TAC and TAT the amino acid codon motif of the codons encoding the amino acid residue tyrosine comprises the codons TAC and TAT at a ratio of 64:36, which corresponds to 32:18. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 32:16, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tac tac tat (SEQ ID NO: 52).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tac tat tac (SEQ ID NO: 53).
  • If the group of codons encoding the amino acid residue valine comprises all four available codons the amino acid codon motif of the codons encoding the amino acid residue valine comprises the codons GTT, GTG, GTA and GTC at a ratio of 40:27:20:13. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:20:10, which corresponds to 4:3:2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtt gtt gtt gtg gtg gtg gta gta gtc (SEQ ID NO: 54).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtg gta gtc gtt gtg gta gtt gtg gtt (SEQ ID NO: 55).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtt gtg gta gtt gtg gtt gtc gtt gtg gta (SEQ ID NO: 56).
  • In the following the method as reported herein is exemplified with a polypeptide that has the following amino acid sequence:
  • (SEQ ID NO: 57)
    DSAVDSQGTS FSEYVGAFVS VDAGHKAAES
    QASVSSVYNL AVPAYRASYV RSDTSDIDTA
    AVSSPDVVDI IERVKSYSRG SVTAAYAIGV
    RYDWSRSHSG SETSTSNFAY TYSLNSTQTF
    VYASKARSAL AAVVVGVRES ITGSSGQVFF
    AATSTASDAH ASTGADIDPT AVVHTDVSVV
    ISAFAVAAHG VARVHHVIAS IDYAVDAGAA
    GAAGSSGGTR IAGVVVSVTI RGFSLTGLGA
    GDVGPHTARY AGESFSVDCS RGASHVASSA
    KPASVTDMTP YRSVTDDASD DGPASVSDGY.
  • For ease of purification a purification tag can be attached to the N- or C-terminus of the polypeptide. The purification tag can be fused directly to the amino acid sequence of the polypeptide or it can be separated from the amino acid sequence by a short linker or a protease cleavage site. One exemplary purification tag is the hexa-histidine tag with an N-terminal GS linker for fusion to the C-terminus of the polypeptide:
  • (SEQ ID NO: 58)
    GSHHHHHH,

    which is encoded by the following nucleic acid sequence:
  • (SEQ ID NO: 59)
    ggttctcaccaccaccaccaccac.
  • The test-polypeptide can be preceded by a short carrier peptide. One exemplary carrier peptide is derived from the N-terminal part of mature human interferon-alpha:
  • (SEQ ID NO: 60)
    MCDLPQTHSL GS,

    which is encoded by the following nucleic acid sequence:
  • (SEQ ID NO: 61)
    atgtgcgacctgccgcagacccactcccttggatcc.
  • The nucleic acid sequence encoding the test-polypeptide of SEQ ID NO: 57, which is obtained with a backtranslation method always using the codon with the highest usage frequency in the respective cell, is
  • (SEQ ID NO: 62)
    gactctgcggttgactctcagggtacctctttctctgaatacgttggtg
    cgttcgtttctgttgacgcgggtcacaaagcggcggaatctcaggcgtc
    tgtttcttctgtttacaacctggcggttccggcgtaccgtgcgtcttac
    gttcgttctgacacctctgacatcgacaccgcggcggtttcttctccgg
    acgttgttgacatcatcgaacgtgttaaatcttactctcgtggttctgt
    taccgcggcgtacgcgatcggtgttcgttacgactggtctcgttctcac
    tctggttctgaaacctctacctctaacttcgcgtacacctactctctga
    actctacccagaccttcgtttacgcgtctaaagcgcgttctgcgctggc
    ggcggttgttgttggtgttcgtgaatctatcaccggttcttctggtcag
    gttttcttcgcggcgacctctaccgcgtctgacgcgcacgcgtctaccg
    gtgcggacatcgacccgaccgcggttgttcacaccgacgtttctgttgt
    tatctctgcgttcgcggttgcggcgcacggtgttgcgcgtgttcaccac
    gttatcgcgtctatcgactacgcggttgacgcgggtgcggcgggtgcgg
    cgggttcttctggtggtacccgtatcgcgggtgttgttgtttctgttac
    catccgtggtttctctctgaccggtctgggtgcgggtgacgttggtccg
    cacaccgcgcgttacgcgggtgaatctttctctgttgactgctctcgtg
    gtgcgtctcacgttgcgtcttctgcgaaaccggcgtctgttaccgacat
    gaccccgtaccgttctgttaccgacgacgcgtctgacgacggtccggcg
    tctgtttctgacggttac.
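  • A backtranslation that always selects the codon with the highest usage frequency can be sketched as follows. The names are hypothetical, and only the residues needed for the first ten amino acids of SEQ ID NO: 57 are included; the frequencies are the E. coli values from the table in this document.

```python
# E. coli usage frequencies (%) for the residues used below
USAGE = {
    "D": {"gat": 46, "gac": 54},
    "S": {"agt": 4, "agc": 25, "tcg": 7, "tca": 5, "tct": 32, "tcc": 27},
    "A": {"gcg": 32, "gca": 24, "gct": 28, "gcc": 16},
    "V": {"gtg": 27, "gta": 20, "gtt": 40, "gtc": 13},
    "Q": {"cag": 82, "caa": 18},
    "G": {"ggg": 4, "gga": 2, "ggt": 51, "ggc": 43},
    "T": {"acg": 12, "aca": 4, "act": 28, "acc": 56},
}


def backtranslate_most_frequent(peptide, usage=USAGE):
    """Encode every residue with the codon of highest usage frequency."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in peptide)


# First ten residues of SEQ ID NO: 57
print(backtranslate_most_frequent("DSAVDSQGTS"))
```

  • For these ten residues the result is identical to the first 30 nucleotides of SEQ ID NO: 62.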
  • The nucleic acid sequence encoding the test-polypeptide of SEQ ID NO: 57, which is obtained with a method as reported herein, is
  • (SEQ ID NO: 63)
    gactctgcggttgattcccagggtaccagcttctctgaatacgtgggcg
    ctttcgtatccgttgacgcaggtcacaaagccgcggaaagccaggcttc
    tgtgtccagcgtttataacctggcagtcccggcctaccgtgcgtcttac
    gttcgctccgatactagcgacatcgataccgctgcagtgtcttccccgg
    acgtagttgatattatcgagcgtgtgaaaagctattctcgtggctctgt
    aacggcggcgtacgctatcggtgttcgctacgactggtcccgtagccat
    tctggctccgaaaccagcacttctaactttgcatatacctactccctga
    acagcacccaaactttcgtgtacgcctctaaggcgcgttccgctctggc
    agccgttgtcgttggtgtgcgcgaaagcattaccggctcttccggtcag
    gtattcttcgcggctacgagcaccgcatctgatgcgcacgcgtctactg
    gtgctgacatcgatccaaccgcagttgtgcacaccgacgtatccgttgt
    gatcagcgcctttgcggttgctgcacatggcgtcgcccgtgttcaccac
    gtgattgcgtctatcgattatgctgtagacgcaggtgcggcgggcgctg
    caggttccagcggcggtactcgtatcgccggcgttgtggtatctgttac
    cattcgcggtttctccctgacgggtctcggcgcgggtgatgtgggcccg
    cataccgctcgttacgcaggtgaaagcttctctgttgactgctcccgtg
    gcgccagccacgtcgcgtcttccgctaaaccggcaagcgttactgatat
    gaccccttaccgctctgtgaccgacgatgcgtctgacgatggtccggcg
    tccgtaagcgacggctat.
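  • The way the amino acid codon motifs are mapped onto the polypeptide can be illustrated as follows: each occurrence of an amino acid takes the next codon of its motif, and the motif starts again once its end is reached. This is a minimal sketch with hypothetical names; the motifs are those listed for SEQ ID NO: 63 in this document, restricted to the residues occurring in the first 18 positions of SEQ ID NO: 57.

```python
from itertools import cycle

# Amino acid codon motifs used for SEQ ID NO: 63 (subset)
MOTIFS = {
    "D": "gac gat",
    "S": "tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct",
    "A": "gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg",
    "V": "gtt gtg gta gtt gtg gtt gtc gtt gtg gta",
    "Q": "cag cag caa",
    "G": "ggt ggc ggt ggc ggt ggc ggt ggc ggt",
    "T": "acc act acc acg acc act acc",
    "F": "ttc ttc ttt ttc",
    "E": "gaa gaa gag gaa",
    "Y": "tac tat tac",
}


def backtranslate_with_motifs(peptide, motifs=MOTIFS):
    """Assign to each residue the next codon of its motif, restarting the
    motif after its last codon."""
    iterators = {aa: cycle(motif.split()) for aa, motif in motifs.items()}
    return "".join(next(iterators[aa]) for aa in peptide)


# First 18 residues of SEQ ID NO: 57
print(backtranslate_with_motifs("DSAVDSQGTSFSEYVGAF"))
```

  • For these 18 residues the output corresponds to the first 54 nucleotides of SEQ ID NO: 63.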
  • By comparing the nucleic acid sequence that has been obtained by always using the codon with the highest usage frequency with the nucleic acid sequence that has been obtained with a method as reported herein, it can be seen that the two nucleic acids differ in 146 of 300 codons, i.e. the optimized sequences differ in 48.7% of all coding codons (differing codons are underlined in the following alignment).
  • gactctgcggttgactctcagggtacctctttctctgaatacgttggtgcgttc
    gactctgcggttgattcccagggtaccagcttctctgaatacgtgggcgctttc
    D  S  A  V  D  S  Q  G  T  S  F  S  E  Y  V  G  A  F
    gtttctgttgacgcgggtcacaaagcggcggaatctcaggcgtctgtttcttct
    gtatccgttgacgcaggtcacaaagccgcggaaagccaggcttctgtgtccagc
    V  S  V  D  A  G  H  K  A  A  E  S  Q  A  S  V  S  S
    gtttacaacctggcggttccggcgtaccgtgcgtcttacgttcgttctgacacc
    gtttataacctggcagtcccggcctaccgtgcgtcttacgttcgctccgatact
    V  Y  N  L  A  V  P  A  Y  R  A  S  Y  V  R  S  D  T
    tctgacatcgacaccgcggcggtttcttctccggacgttgttgacatcatcgaa
    agcgacatcgataccgctgcagtgtcttccccggacgtagttgatattatcgag
    S  D  I  D  T  A  A  V  S  S  P  D  V  V  D  I  I  E
    cgtgttaaatcttactctcgtggttctgttaccgcggcgtacgcgatcggtgtt
    cgtgtgaaaagctattctcgtggctctgtaacggcggcgtacgctatcggtgtt
    R  V  K  S  Y  S  R  G  S  V  T  A  A  Y  A  I  G  V
    cgttacgactggtctcgttctcactctggttctgaaacctctacctctaacttc
    cgctacgactggtcccgtagccattctggctccgaaaccagcacttctaacttt
    R  Y  D  W  S  R  S  H  S  G  S  E  T  S  T  S  N  F
    gcgtacacctactctctgaactctacccagaccttcgtttacgcgtctaaagcg
    gcatatacctactccctgaacagcacccaaactttcgtgtacgcctctaaggcg
    A  Y  T  Y  S  L  N  S  T  Q  T  F  V  Y  A  S  K  A
    cgttctgcgctggcggcggttgttgttggtgttcgtgaatctatcaccggttct
    cgttccgctctggcagccgttgtcgttggtgtgcgcgaaagcattaccggctct
    R  S  A  L  A  A  V  V  V  G  V  R  E  S  I  T  G  S
    tctggtcaggttttcttcgcggcgacctctaccgcgtctgacgcgcacgcgtct
    tccggtcaggtattcttcgcggctacgagcaccgcatctgatgcgcacgcgtct
    S  G  Q  V  F  F  A  A  T  S  T  A  S  D  A  H  A  S
    accggtgcggacatcgacccgaccgcggttgttcacaccgacgtttctgttgtt
    actggtgctgacatcgatccaaccgcagttgtgcacaccgacgtatccgttgtg
    T  G  A  D  I  D  P  T  A  V  V  H  T  D  V  S  V  V
    atctctgcgttcgcggttgcggcgcacggtgttgcgcgtgttcaccacgttatc
    atcagcgcctttgcggttgctgcacatggcgtcgcccgtgttcaccacgtgatt
    I  S  A  F  A  V  A  A  H  G  V  A  R  V  H  H  V  I
    gcgtctatcgactacgcggttgacgcgggtgcggcgggtgcggcgggttcttct
    gcgtctatcgattatgctgtagacgcaggtgcggcgggcgctgcaggttccagc
    A  S  I  D  Y  A  V  D  A  G  A  A  G  A  A  G  S  S
    ggtggtacccgtatcgcgggtgttgttgtttctgttaccatccgtggtttctct
    ggcggtactcgtatcgccggcgttgtggtatctgttaccattcgcggtttctcc
    G  G  T  R  I  A  G  V  V  V  S  V  T  I  R  G  F  S
    ctgaccggtctgggtgcgggtgacgttggtccgcacaccgcgcgttacgcgggt
    ctgacgggtctcggcgcgggtgatgtgggcccgcataccgctcgttacgcaggt
    L  T  G  L  G  A  G  D  V  G  P  H  T  A  R  Y  A  G
    gaatctttctctgttgactgctctcgtggtgcgtctcacgttgcgtcttctgcg
    gaaagcttctctgttgactgctcccgtggcgccagccacgtcgcgtcttccgct
    E  S  F  S  V  D  C  S  R  G  A  S  H  V  A  S  S  A
    aaaccggcgtctgttaccgacatgaccccgtaccgttctgttaccgacgacgcg
    aaaccggcaagcgttactgatatgaccccttaccgctctgtgaccgacgatgcg
    K  P  A  S  V  T  D  M  T  P  Y  R  S  V  T  D  D  A
    tctgacgacggtccggcgtctgtttctgacggttac
    tctgacgatggtccggcgtccgtaagcgacggctat
    S  D  D  G  P  A  S  V  S  D  G  Y
    (upper row: SEQ ID NO: 62; middle row: SEQ ID NO: 63; lower
    row: SEQ ID NO: 57).
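  • The codon-wise comparison of the two nucleic acids can be sketched as follows. The function name is hypothetical; for brevity only the first row of the alignment above (18 codons) is compared, whereas the figure of 146 of 300 codons refers to the complete sequences.

```python
def count_differing_codons(seq_a, seq_b):
    """Compare two in-frame nucleic acid sequences of equal length codon by
    codon and return (number of differing codons, total number of codons)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")
    total = len(seq_a) // 3
    differing = sum(
        seq_a[i:i + 3] != seq_b[i:i + 3] for i in range(0, len(seq_a), 3)
    )
    return differing, total


# First alignment row of SEQ ID NO: 62 and SEQ ID NO: 63
row_62 = "gactctgcggttgactctcagggtacctctttctctgaatacgttggtgcgttc"
row_63 = "gactctgcggttgattcccagggtaccagcttctctgaatacgtgggcgctttc"
differing, total = count_differing_codons(row_62, row_63)
print(f"{differing} of {total} codons differ ({100 * differing / total:.1f}%)")
```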
  • In the following table the codons of the respective encoding nucleic acid are given in the order of their appearance starting from the 5′ end of the respective nucleic acid.
  • TABLE
    amino acid residue | SEQ ID NO: 62 | SEQ ID NO: 63
    G | ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt ggt | ggt ggc ggt ggc ggt ggc ggt ggc ggt ggt ggc ggt ggc ggt ggc ggt ggc ggt ggt ggc ggt ggc ggt ggc ggt ggc
    A | gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg gcg | gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg
    V | gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt gtt | gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta
    L | ctg ctg ctg ctg ctg | ctg ctg ctg ctg ctc
    I | atc atc atc atc atc atc atc atc atc atc atc | atc att atc atc att atc atc att atc atc att
    M | atg | atg
    P | ccg ccg ccg ccg ccg ccg ccg | ccg ccg cca ccg ccg cct ccg
    F | ttc ttc ttc ttc ttc ttc ttc ttc ttc | ttc ttc ttt ttc ttc ttc ttt ttc ttc
    W | tgg | tgg
    S | tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct tct | tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc
    T | acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc acc | acc act acc acg acc act acc acc act acc acg acc act acc acc act acc acg acc act acc acc
    N | aac aac aac | aac aac aac
    Q | cag cag cag cag | cag cag caa cag
    Y | tac tac tac tac tac tac tac tac tac tac tac tac tac tac | tac tat tac tac tat tac tac tat tac tac tat tac tac tat
    C | tgc | tgc
    K | aaa aaa aaa aaa | aaa aaa aag aaa
    R | cgt cgt cgt cgt cgt cgt cgt cgt cgt cgt cgt cgt cgt cgt | cgt cgc cgt cgt cgc cgt cgt cgc cgt cgt cgc cgt cgt cgc
    H | cac cac cac cac cac cac cac cac cac | cac cat cac cac cat cac cac cat cac
    D | gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac gac | gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac
    E | gaa gaa gaa gaa gaa gaa | gaa gaa gag gaa gaa gaa
  • In the following table the codon usage frequency in the amino acid codon motifs with respect to the overall usage frequency in the cell is given.
  • TABLE
    amino acid residue | sequence of codons (motif) in SEQ ID NO: 62 | relative frequency of used codon in motif | relative frequency of used codon in group for specific amino acid residue | sequence of codons (motif) in SEQ ID NO: 63 | relative frequency of used codon in motif | No. of codons in group | relative frequency of used codon in group
    G | ggt | 1/1 = 100% (only codon used) | 100% | ggt ggc ggt ggc ggt ggc ggt ggc ggt | ggt: 5/9 = 56%; ggc: 4/9 = 44% | 2 | ggt: 54%; ggc: 46%
    A | gcg | 1/1 = 100% (only codon used) | 100% | gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg | gcg: 4/12 = 33%; gct: 3/12 = 25%; gca: 3/12 = 25%; gcc: 2/12 = 17% | 4 | gcg: 32%; gct: 28%; gca: 24%; gcc: 16%
    V | gtt | 1/1 = 100% (only codon used) | 100% | gtt gtg gta gtt gtg gtt gtc gtt gtg gta | gtt: 4/10 = 40%; gtg: 3/10 = 30%; gta: 2/10 = 20%; gtc: 1/10 = 10% | 4 | gtt: 40%; gtg: 27%; gta: 20%; gtc: 12%
    L | ctg | 1/1 = 100% (only codon used) | 100% | ctg ctg ctg ctg ctc | ctg: 4/5 = 80%; ctc: 1/5 = 20% | 2 | ctg: 91%; ctc: 9%
    I | atc | 1/1 = 100% (only codon used) | 100% | atc att atc | atc: 2/3 = 67%; att: 1/3 = 33% | 2 | atc: 68%; att: 32%
    M | atg | 1/1 = 100% (only codon used) | 100% | atg | 1/1 = 100% | 1 | 100%
    P | ccg | 1/1 = 100% (only codon used) | 100% | ccg ccg cca ccg ccg cct | ccg: 4/6 = 67%; cca: 1/6 = 16.5%; cct: 1/6 = 16.5% | 3 | ccg: 76%; cca: 14%; cct: 10%
    F | ttc | 1/1 = 100% (only codon used) | 100% | ttc ttc ttt ttc | ttc: 3/4 = 75%; ttt: 1/4 = 25% | 2 | ttc: 72%; ttt: 28%
    W | tgg | 1/1 = 100% (only codon used) | 100% | tgg | 1/1 = 100% | 1 | 100%
    S | tct | 1/1 = 100% (only codon used) | 100% | tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct | tct: 6/16 = 38%; tcc: 5/16 = 31%; agc: 5/16 = 31% | 3 | tct: 38%; tcc: 32%; agc: 30%
    T | acc | 1/1 = 100% (only codon used) | 100% | acc act acc acg acc act acc | acc: 4/7 = 57%; act: 2/7 = 29%; acg: 1/7 = 14% | 3 | acc: 58%; act: 29%; acg: 13%
    N | aac | 1/1 = 100% (only codon used) | 100% | aac aac aac | 1/1 = 100% | 2 | aac: 84%; aat: 16%
    Q | cag | 1/1 = 100% (only codon used) | 100% | cag cag caa | cag: 2/3 = 67%; caa: 1/3 = 33% | 2 | cag: 82%; caa: 18%
    Y | tac | 1/1 = 100% (only codon used) | 100% | tac tat tac | tac: 2/3 = 67%; tat: 1/3 = 33% | 2 | tac: 64%; tat: 36%
    C | tgc | 1/1 = 100% (only codon used) | 100% | tgc | 1/1 = 100% | 2 | tgc: 64%; tgt: 36%
    K | aaa | 1/1 = 100% (only codon used) | 100% | aaa aaa aag | aaa: 2/3 = 67%; aag: 1/3 = 33% | 2 | aaa: 80%; aag: 20%
    R | cgt | 1/1 = 100% (only codon used) | 100% | cgt cgc cgt | cgt: 2/3 = 67%; cgc: 1/3 = 33% | 2 | cgt: 66%; cgc: 34%
    H | cac | 1/1 = 100% (only codon used) | 100% | cac cat cac | cac: 2/3 = 67%; cat: 1/3 = 33% | 2 | cac: 71%; cat: 29%
    D | gac | 1/1 = 100% (only codon used) | 100% | gac gat | gac: 1/2 = 50%; gat: 1/2 = 50% | 2 | gac: 54%; gat: 46%
    E | gaa | 1/1 = 100% (only codon used) | 100% | gaa gaa gag gaa | gaa: 3/4 = 75%; gag: 1/4 = 25% | 2 | gaa: 76%; gag: 24%
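  • The "relative frequency of used codon in motif" column of the table above can be recomputed by counting codons in a motif. The following is a minimal sketch; the function name motif_frequencies is hypothetical and not part of the method as reported, and the alanine motif is copied from the SEQ ID NO: 63 column.

```python
from collections import Counter

def motif_frequencies(motif: str) -> dict[str, float]:
    """Return the relative frequency (in percent) of each codon in a motif."""
    codons = motif.split()
    counts = Counter(codons)
    return {codon: 100 * n / len(codons) for codon, n in counts.items()}

# Alanine motif of SEQ ID NO: 63 as listed in the table above.
ala_motif = "gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg"
ala_freq = motif_frequencies(ala_motif)

# Print codons in descending order of frequency for comparison with the table.
for codon, pct in sorted(ala_freq.items(), key=lambda kv: -kv[1]):
    print(f"{codon}: {pct:.0f}%")
```

  • Comparing these values with the group frequencies (32%, 28%, 24% and 16% for alanine) shows how closely a short motif can approximate the target usage.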
  • In the following table the frequency of the codons in the entire encoding nucleic acid is given.
  • TABLE
    amino acid residue | codons in SEQ ID NO: 62 | relative frequency of used codon in motif | relative frequency of used codon in group for specific amino acid residue | codons in SEQ ID NO: 63 (in order of occurrence) | relative frequency of used codon in motif | relative frequency of used codon in group
    G | ggt (26×) | 26/26 = 100% | 100% | ggt ggc ggt ggc ggt ggc ggt ggc ggt ggt ggc ggt ggc ggt ggc ggt ggc ggt ggt ggc ggt ggc ggt ggc ggt ggc | ggt: 14/26 = 54%; ggc: 12/26 = 46% | ggt: 54%; ggc: 46%
    A | gcg (49×) | 49/49 = 100% | 100% | gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg gct gca gcc gcg gct gca gcc gcg gct gca gcg gcg | gcg: 17/49 = 35%; gct: 12/49 = 24.5%; gca: 12/49 = 24.5%; gcc: 8/49 = 16% | gcg: 32%; gct: 28%; gca: 24%; gcc: 16%
    V | gtt (40×) | 40/40 = 100% | 100% | gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta gtt gtg gta gtt gtg gtt gtc gtt gtg gta | gtt: 16/40 = 40%; gtg: 12/40 = 30%; gta: 8/40 = 20%; gtc: 4/40 = 10% | gtt: 40%; gtg: 27%; gta: 20%; gtc: 12%
    L | ctg (5×) | 5/5 = 100% | 100% | ctg ctg ctg ctg ctc | ctg: 4/5 = 80%; ctc: 1/5 = 20% | ctg: 91%; ctc: 9%
    I | atc (11×) | 11/11 = 100% | 100% | atc att atc atc att atc atc att atc atc att | atc: 7/11 = 64%; att: 4/11 = 36% | atc: 68%; att: 32%
    M | atg | 1/1 = 100% | 100% | atg | 1/1 = 100% | 100%
    P | ccg (7×) | 7/7 = 100% | 100% | ccg ccg cca ccg ccg cct ccg | ccg: 5/7 = 72%; cca: 1/7 = 14%; cct: 1/7 = 14% | ccg: 76%; cca: 14%; cct: 10%
    F | ttc (9×) | 9/9 = 100% | 100% | ttc ttc ttt ttc ttc ttc ttt ttc ttc | ttc: 7/9 = 78%; ttt: 2/9 = 22% | ttc: 72%; ttt: 28%
    W | tgg | 1/1 = 100% | 100% | tgg | 1/1 = 100% | 100%
    S | tct (51×) | 51/51 = 100% | 100% | tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tcc agc tct tct tcc agc | tct: 19/51 = 38%; tcc: 16/51 = 31%; agc: 16/51 = 31% | tct: 38%; tcc: 32%; agc: 30%
    T | acc (22×) | 22/22 = 100% | 100% | acc act acc acg acc act acc acc act acc acg acc act acc acc act acc acg acc act acc acc | acc: 13/22 = 59%; act: 6/22 = 27%; acg: 3/22 = 14% | acc: 58%; act: 29%; acg: 13%
    N | aac aac aac | 3/3 = 100% | 100% | aac aac aac | 3/3 = 100% | 100%
    Q | cag (4×) | 4/4 = 100% | 100% | cag cag caa cag | cag: 3/4 = 75%; caa: 1/4 = 25% | cag: 82%; caa: 18%
    Y | tac (14×) | 14/14 = 100% | 100% | tac tat tac tac tat tac tac tat tac tac tat tac tac tat | tac: 9/14 = 64%; tat: 5/14 = 36% | tac: 64%; tat: 36%
    C | tgc | 1/1 = 100% | 100% | tgc | 1/1 = 100% | 100%
    K | aaa (4×) | 4/4 = 100% | 100% | aaa aaa aag aaa | aaa: 3/4 = 75%; aag: 1/4 = 25% | aaa: 80%; aag: 20%
    R | cgt (14×) | 14/14 = 100% | 100% | cgt cgc cgt cgt cgc cgt cgt cgc cgt cgt cgc cgt cgt cgc | cgt: 9/14 = 64%; cgc: 5/14 = 36% | cgt: 66%; cgc: 34%
    H | cac (9×) | 9/9 = 100% | 100% | cac cat cac cac cat cac cac cat cac | cac: 6/9 = 67%; cat: 3/9 = 33% | cac: 71%; cat: 29%
    D | gac (23×) | 23/23 = 100% | 100% | gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac gat gac | gac: 12/23 = 52%; gat: 11/23 = 48% | gac: 54%; gat: 46%
    E | gaa (6×) | 6/6 = 100% | 100% | gaa gaa gag gaa gaa gaa | gaa: 5/6 = 83%; gag: 1/6 = 17% | gaa: 76%; gag: 24%
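  • The codon order listed for SEQ ID NO: 63 in the table above is consistent with each amino acid codon motif being walked through cyclically, one codon per occurrence of the residue. The following is a minimal sketch of such a backtranslation under that assumption; the function name backtranslate and the example inputs are hypothetical, and only the glycine, leucine and methionine motifs from the preceding table are included.

```python
from itertools import cycle

def backtranslate(protein: str, motifs: dict[str, str]) -> str:
    """Encode a protein by cycling through one codon motif per amino acid."""
    # One independent, endlessly repeating iterator per amino acid residue.
    iterators = {aa: cycle(motif.split()) for aa, motif in motifs.items()}
    return " ".join(next(iterators[aa]) for aa in protein)

# Motifs taken from the SEQ ID NO: 63 column of the motif table.
motifs = {
    "G": "ggt ggc ggt ggc ggt ggc ggt ggc ggt",
    "L": "ctg ctg ctg ctg ctc",
    "M": "atg",
}

print(backtranslate("MGLGG", motifs))

# 26 glycine residues, as in the test-polypeptide.
gly_codons = backtranslate("G" * 26, motifs).split()
print(gly_codons.count("ggt"), gly_codons.count("ggc"))
```

  • With 26 glycine residues the sketch yields 14 ggt and 12 ggc codons, which agrees with the counts given for SEQ ID NO: 63 in the table above.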
  • Both nucleic acids have been expressed in E. coli comprising an N-terminal interferon-alpha carrier peptide of SEQ ID NO: 60 encoded by a nucleic acid of SEQ ID NO: 61 and a C-terminal purification tag of SEQ ID NO: 58 encoded by a nucleic acid of SEQ ID NO: 59. The E. coli expression vector including the expression cassette was identical for the differently encoded test-polypeptide. The expression yield of the differently encoded test-polypeptide has been determined by quantitative Western blot analysis. For this purpose, E. coli whole cell lysates were prepared and fractionated into a soluble supernatant and an insoluble cell pellet fraction by centrifugation. Thereafter, proteins were separated electrophoretically by SDS PAGE, transferred electrophoretically to a nitrocellulose membrane and then stained with an antibody POD conjugate recognizing the poly-His purification tag. The stained poly-His containing differently encoded test-polypeptide was quantified by comparison with a pure protein reference standard containing the same poly-His purification tag (scFv-poly-His antibody fragment) of known scFv-poly-His protein concentration using the Lumi-Imager F1 analyzer (Roche Molecular Biochemicals) and the Lumi Analyst software version 3.1.
  • In FIGS. 1 and 2 the Western blots of the differently encoded poly-His-tagged test-polypeptide are shown. E. coli whole cell lysates were fractionated into a soluble supernatant and an insoluble cell pellet fraction before SDS PAGE and immunoblotting. As reference a molecular weight protein standard and a scFv antibody fragment comprising a poly-His-tag of known protein concentration have been used.
  • Lanes 2, 4, 6, and 8 are samples showing the amount of expressed test-polypeptide obtained with the nucleic acid generated by always using the codon with the highest usage frequency after 0 hours, 4 hours, 6 hours, and 21 hours of cultivation.
  • Lanes 3, 5, 7, and 9 are samples showing the amount of expressed test-polypeptide obtained with the nucleic acid generated with the new/inventive protein backtranslation method as reported herein after 0 hours, 4 hours, 6 hours, and 21 hours of cultivation.
  • Lanes 10 to 14 correspond to the purified scFv-poly-His protein reference standard of known concentration (5 ng, 10 ng, 20 ng, 30 ng, and 40 ng).
  • The amount of the test-polypeptide expressed was determined within the insoluble cell debris fraction (pellet) after solubilization/extraction of insoluble protein aggregates with SDS sample buffer since the major fraction of the expressed test-polypeptide was found in the insoluble cell pellet fraction after cell lysis and cell fractionation (precipitated insoluble protein aggregates also known as inclusion bodies).
  • The determined amounts of test-polypeptide solubilized from the insoluble cell pellet fraction are shown in the following table.
  • TABLE
    lane | sample | sample amount [μl] | Lumi Imager signal [BLU] | determined amount of test-polypeptide per lane [ng] | total amount of test-polypeptide per sample [ng]
    2 | reference - 0 hours | 5 | no signal | 0 | 0
    4 | reference - 4 hours | 0.02* | 9703 | 6.3 | 315.7
    6 | reference - 6 hours | 0.02* | 13461 | 9.0 | 448.9
    8 | reference - 21 hours | 0.02* | 1448 | 0.5 | 22.9
    3 | reported herein - 0 hours | 5 | no signal | 0 | 0
    5 | reported herein - 4 hours | 0.02* | 19440 | 13.2 | 660.9
    7 | reported herein - 6 hours | 0.02* | 24014 | 16.5 | 823.1
    9 | reported herein - 21 hours | 0.02* | 7738 | 4.9 | 246.0
    10 | standard 1 (5 ng) | 1 | 4803 | |
    11 | standard 2 (10 ng) | 2 | 16847 | |
    12 | standard 3 (20 ng) | 4 | 32733 | |
    13 | standard 4 (30 ng) | 6 | 40476 | |
    14 | standard 5 (40 ng) | 8 | 43858 | |
    *sample diluted 1:50 with sample buffer; analyzed volume 5 μl
  • As can be seen from the above table the expression yield obtained by using a nucleic acid encoding a test-polypeptide in which the encoding codons are chosen according to the method as reported herein is at least about 1.8 times the yield that is obtained using a classical codon optimization method.
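  • The factor of about 1.8 can be checked directly against the total amounts per sample in the above table. The following is a minimal sketch of that arithmetic; the variable names are hypothetical, and the 0 hour samples are omitted because no signal was detected.

```python
# Total amount of test-polypeptide per sample [ng], keyed by hours of cultivation.
reference = {4: 315.7, 6: 448.9, 21: 22.9}   # classical codon optimization
reported = {4: 660.9, 6: 823.1, 21: 246.0}   # method as reported herein

# Yield ratio (reported herein / reference) at each time point.
ratios = {hours: reported[hours] / reference[hours] for hours in reference}

for hours, ratio in ratios.items():
    print(f"{hours} h: {ratio:.2f}-fold")

print(f"smallest ratio: {min(ratios.values()):.2f}")
```

  • The smallest ratio occurs at 6 hours, which is where the "at least about 1.8 times" statement comes from; at 4 hours and 21 hours the ratio is higher.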
  • In the following the method as reported herein is exemplified using a eukaryotic cell.
  • In one embodiment the cell is a CHO cell. The overall codon usage frequency (in percent) taking into account all codons encoding a specific amino acid residue for Cricetulus species (CHO cells; Mesocricetus species; hamster) is given in the following table.
  • TABLE
    Ala GCG 9
    Ala GCA 23
    Ala GCT 30
    Ala GCC 38
    Arg AGG 22
    Arg AGA 20
    Arg CGG 19
    Arg CGA 9
    Arg CGT 10
    Arg CGC 19
    Asn AAT 39
    Asn AAC 61
    Asp GAT 39
    Asp GAC 61
    Cys TGT 42
    Cys TGC 58
    Gln CAG 78
    Gln CAA 22
    Glu GAG 64
    Glu GAA 36
    Gly GGG 24
    Gly GGA 25
    Gly GGT 19
    Gly GGC 33
    His CAT 42
    His CAC 58
    Ile ATA 15
    Ile ATT 35
    Ile ATC 51
    Leu CTG 44
    Leu CTA 6
    Leu CTT 13
    Leu CTC 19
    Leu TTG 12
    Leu TTA 6
    Lys AAG 67
    Lys AAA 33
    Met ATG 100
    Phe TTT 44
    Phe TTC 56
    Pro CCG 7
    Pro CCA 29
    Pro CCT 29
    Pro CCC 34
    Ser AGT 14
    Ser AGC 24
    Ser TCG 5
    Ser TCA 15
    Ser TCT 18
    Ser TCC 24
    Thr ACG 10
    Thr ACA 29
    Thr ACT 21
    Thr ACC 40
    Trp TGG 100
    Tyr TAT 39
    Tyr TAC 61
    Val GTG 48
    Val GTA 11
    Val GTT 16
    Val GTC 25
  • For encoding the amino acid residue alanine four different codons are available: GCG, GCA, GCT, and GCC. Thus, the group of codons encoding the amino acid residue can comprise at most four codons. In the group comprising four codons, each codon has a specific usage frequency in the group that is the same as the overall usage frequency in the genome of the cell, i.e. the codon GCG has a specific and overall usage frequency of 9%, the codon GCA has a specific and overall usage frequency of 23%, the codon GCT has a specific and overall usage frequency of 30%, and the codon GCC has a specific and overall usage frequency of 38%. If the number of codons in the group is reduced, e.g. by excluding the codons GCG and GCA with specific usage frequencies of 9% and 23%, respectively, from the group, the specific usage frequency of the remaining members of the group, i.e. GCT and GCC, changes to 44% (=30/(30+38)*100) and 56% (=38/(30+38)*100), respectively, as the sum of the specific usage frequencies of all codons in one group is 100%. Thus, in the group comprising the two codons GCT and GCC, each codon has a specific usage frequency in the group that is higher than its overall usage frequency in the genome of the cell, i.e. the codon GCT has a specific usage frequency of 44% and an overall usage frequency of 30%, and the codon GCC has a specific usage frequency of 56% and an overall usage frequency of 38%.
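  • The renormalization described above can be written out as a short calculation. The following is a minimal sketch; the function name group_frequencies is hypothetical, and the alanine values are the overall usage frequencies from the CHO table.

```python
def group_frequencies(overall: dict[str, float], group: list[str]) -> dict[str, float]:
    """Rescale overall usage frequencies so that the chosen group sums to 100%."""
    total = sum(overall[codon] for codon in group)
    return {codon: 100 * overall[codon] / total for codon in group}

# Overall usage frequencies [%] of the alanine codons in CHO cells.
ala_overall = {"GCG": 9, "GCA": 23, "GCT": 30, "GCC": 38}

# Group reduced to GCT and GCC (GCG and GCA excluded).
ala_reduced = group_frequencies(ala_overall, ["GCT", "GCC"])
for codon, pct in ala_reduced.items():
    print(f"{codon}: {pct:.0f}%")
```

  • The same function applied to the three codons GCC, GCT and GCA gives approximately 42%, 33% and 25%.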
  • If the group of codons encoding the amino acid residue alanine comprises all four available codons the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCC, GCT, GCA and GCG at a ratio of 38:30:23:9. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:20:10, which corresponds to 4:3:2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gcc gcc gcc gct gct gct gca gca gcg (SEQ ID NO: 64).
  • As the codons are distributed within the genome also a distribution within the amino acid codon motif is used taking into account the above ratio and the usage frequency, whereby codons with a higher frequency are chosen first.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gca gcc gct gca gcg gcc gct gcc (SEQ ID NO: 65).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gcc gct gca gcg gcc gct gca gcc (SEQ ID NO: 66).
  • If the group of codons encoding the amino acid residue alanine comprises the three codons GCC, GCT and GCA the amino acid codon motif of the codons encoding the amino acid residue alanine comprises the codons GCC, GCT and GCA at a ratio of 42:33:25. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:30:30, which corresponds to 4:3:3.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gcc gcc gcc gct gct gct gca gca gca (SEQ ID NO: 67).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue alanine is gcc gct gca gcc gct gca gcc gct gca gcc (SEQ ID NO: 68).
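  • Both kinds of motif shown above, the blocked one and the distributed one, can be generated from the integer ratio. The following is a minimal sketch. The round-robin distribution used in interleaved_motif is an assumption chosen for illustration, not a rule stated in the text; it reproduces SEQ ID NO: 68 for the 4:3:3 ratio but not every distributed motif listed herein (e.g. not SEQ ID NO: 65). Both function names are hypothetical.

```python
def blocked_motif(counts: dict[str, int]) -> str:
    """List each codon as one block, most frequent codon first."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return " ".join(c for codon in ordered for c in [codon] * counts[codon])

def interleaved_motif(counts: dict[str, int]) -> str:
    """Distribute codons round-robin, most frequent codon first (assumed scheme)."""
    remaining = dict(counts)
    ordered = sorted(remaining, key=remaining.get, reverse=True)
    motif = []
    while any(remaining.values()):
        # One pass over all codons that still have positions left.
        for codon in ordered:
            if remaining[codon] > 0:
                motif.append(codon)
                remaining[codon] -= 1
    return " ".join(motif)

# Alanine group with three codons at the adjusted ratio 4:3:3.
ala_counts = {"gcc": 4, "gct": 3, "gca": 3}
print(blocked_motif(ala_counts))
print(interleaved_motif(ala_counts))
```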
  • If the group of codons encoding the amino acid residue arginine comprises the four codons AGG, AGA, CGG and CGC the amino acid codon motif of the codons encoding the amino acid residue arginine comprises the codons AGG, AGA, CGG and CGC at a ratio of 27:25:24:24. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 25:25:25:25, which corresponds to 1:1:1:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue arginine is agg aga cgg cgc (SEQ ID NO: 69).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue arginine is agg cgg aga cgc (SEQ ID NO: 70).
  • If the group of codons encoding the amino acid residue asparagine comprises the two codons AAC and AAT the amino acid codon motif of the codons encoding the amino acid residue asparagine comprises the codons AAC and AAT at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aac aac aat aat (SEQ ID NO: 71).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue asparagine is aac aat aac aat aac (SEQ ID NO: 72).
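  • The adjustment from a percentage ratio such as 61:39 to a short integer ratio such as 3:2 can be sketched as rounding followed by division by the greatest common divisor. This is an illustration only: the function name simplify_ratio and the fixed rounding step of 10 are assumptions, and other adjustments reported herein use a different granularity (e.g. 7:5:5:4 for glycine).

```python
from functools import reduce
from math import gcd

def simplify_ratio(percentages: list[int], step: int = 10) -> list[int]:
    """Round each percentage to the nearest multiple of step and reduce the ratio."""
    # Integer arithmetic rounds halves up (25 -> 3 at step 10).
    rounded = [(p + step // 2) // step for p in percentages]
    divisor = reduce(gcd, rounded)
    return [r // divisor for r in rounded]

print(simplify_ratio([61, 39]))         # asparagine: AAC, AAT
print(simplify_ratio([38, 30, 23, 9]))  # alanine: GCC, GCT, GCA, GCG
print(simplify_ratio([42, 33, 25]))     # alanine: GCC, GCT, GCA
```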
  • If the group of codons encoding the amino acid residue aspartic acid comprises the two codons GAC and GAT the amino acid codon motif of the codons encoding the amino acid residue aspartic acid comprises the codons GAC and GAT at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gac gac gat gat (SEQ ID NO: 73).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue aspartic acid is gac gat gac gat gac (SEQ ID NO: 74).
  • If the group of codons encoding the amino acid residue cysteine comprises the two codons TGC and TGT the amino acid codon motif of the codons encoding the amino acid residue cysteine comprises the codons TGC and TGT at a ratio of 58:42, which corresponds to 29:21. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:20, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgc tgc tgt tgt (SEQ ID NO: 75).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue cysteine is tgc tgt tgc tgt tgc (SEQ ID NO: 76).
  • If the group of codons encoding the amino acid residue glutamine comprises the two codons CAG and CAA the amino acid codon motif of the codons encoding the amino acid residue glutamine comprises the codons CAG and CAA at a ratio of 78:22, which corresponds to 39:11. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 40:10, which corresponds to 4:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag cag caa (SEQ ID NO: 77).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag cag caa cag (SEQ ID NO: 78).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag cag caa cag cag (SEQ ID NO: 79).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamine is cag caa cag cag cag (SEQ ID NO: 80).
  • If the group of codons encoding the amino acid residue glutamic acid comprises the two codons GAG and GAA the amino acid codon motif of the codons encoding the amino acid residue glutamic acid comprises the codons GAG and GAA at a ratio of 64:36, which corresponds to 32:18. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 32:16, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gag gag gaa (SEQ ID NO: 81).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glutamic acid is gag gaa gag (SEQ ID NO: 82).
  • If the group of codons encoding the amino acid residue glycine comprises all available codons the amino acid codon motif of the codons encoding the amino acid residue glycine comprises the codons GGC, GGA, GGG and GGT at a ratio of 33:25:24:19. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 35:25:25:20, which corresponds to 7:5:5:4.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glycine is ggc ggc ggc ggc ggc ggc ggc gga gga gga gga gga ggg ggg ggg ggg ggg ggt ggt ggt ggt (SEQ ID NO: 83).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue glycine is ggc gga ggg ggt ggc gga ggc ggg ggt ggc gga ggg ggt ggc gga ggc ggg ggc gga ggg ggt (SEQ ID NO: 84).
  • If the group of codons encoding the amino acid residue histidine comprises the two codons CAC and CAT the amino acid codon motif of the codons encoding the amino acid residue histidine comprises the codons CAC and CAT at a ratio of 58:42, which corresponds to 29:21. As this would result in an amino acid codon motif comprising 50 positions it is adjusted to 30:20, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cac cac cat cat (SEQ ID NO: 85).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue histidine is cac cat cac cat cac (SEQ ID NO: 86).
  • If the group of codons encoding the amino acid residue isoleucine comprises all available codons the amino acid codon motif of the codons encoding the amino acid residue isoleucine comprises the codons ATC, ATT and ATA at a ratio of 51:35:15. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:35:15, which corresponds to 10:7:3.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc atc atc atc atc atc atc atc atc atc att att att att att att att ata ata ata (SEQ ID NO: 87).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue isoleucine is atc att atc atc att ata atc att atc atc att ata atc att atc atc att ata atc att (SEQ ID NO: 88).
  • If the group of codons encoding the amino acid residue leucine comprises the four codons CTG, CTC, CTT and TTG the amino acid codon motif of the codons encoding the amino acid residue leucine comprises the codons CTG, CTC, CTT and TTG at a ratio of 44:19:13:12. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 40:20:10:10, which corresponds to 4:2:1:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctg ctg ctg ctc ctc ctt ttg (SEQ ID NO: 89).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctc ctg ctt ctg ctc ctg ttg (SEQ ID NO: 90).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue leucine is ctg ctc ctt ttg ctg ctc ctg ctg (SEQ ID NO: 91).
  • If the group of codons encoding the amino acid residue lysine comprises the two codons AAG and AAA the amino acid codon motif of the codons encoding the amino acid residue lysine comprises the codons AAG and AAA at a ratio of 67:33. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 66:33, which corresponds to 2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aag aag aaa (SEQ ID NO: 92).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue lysine is aag aaa aag (SEQ ID NO: 93).
  • If the group of codons encoding the amino acid residue phenylalanine comprises the two codons TTC and TTT the amino acid codon motif of the codons encoding the amino acid residue phenylalanine comprises the codons TTC and TTT at a ratio of 56:44, which corresponds to 14:11. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 15:10, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttc ttc ttt ttt (SEQ ID NO: 94).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue phenylalanine is ttc ttt ttc ttt ttc (SEQ ID NO: 95).
  • If the group of codons encoding the amino acid residue proline comprises the three codons CCC, CCA and CCT the amino acid codon motif of the codons encoding the amino acid residue proline comprises the codons CCC, CCA and CCT at a ratio of 34:29:29. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 35:30:30, which corresponds to 7:6:6.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccc ccc ccc ccc ccc ccc ccc cca cca cca cca cca cca cct cct cct cct cct cct (SEQ ID NO: 96).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue proline is ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc cca cct ccc (SEQ ID NO: 97).
  • If the group of codons encoding the amino acid residue serine comprises the four codons TCC, AGC, TCT and TCA the amino acid codon motif of the codons encoding the amino acid residue serine comprises the codons TCC, AGC, TCT and TCA at a ratio of 24:24:18:15, which corresponds to 8:8:6:3. As this would result in an amino acid codon motif comprising 25 positions it is adjusted to 9:9:6:3, which corresponds to 3:3:2:1.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue serine is tcc tcc tcc agc agc agc tct tct tca (SEQ ID NO: 98).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue serine is tcc agc tct tca tcc agc tct tcc agc (SEQ ID NO: 99).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue serine is tcc agc tct tcc agc tca tct tcc agc (SEQ ID NO: 100).
  • If the group of codons encoding the amino acid residue threonine comprises the three codons ACC, ACA and ACT the amino acid codon motif of the codons encoding the amino acid residue threonine comprises the codons ACC, ACA and ACT at a ratio of 45:32:23. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 50:30:20, which corresponds to 5:3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc acc acc acc acc aca aca aca act act (SEQ ID NO: 101).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc aca act acc aca acc aca act acc acc (SEQ ID NO: 102).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue threonine is acc aca acc aca act acc acc aca act acc (SEQ ID NO: 103).
  • If the group of codons encoding the amino acid residue tyrosine comprises the two codons TAT and TAC the amino acid codon motif of the codons encoding the amino acid residue tyrosine comprises the codons TAT and TAC at a ratio of 61:39. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 60:40, which corresponds to 3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tat tat tat tac tac (SEQ ID NO: 104).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue tyrosine is tat tac tat tac tat (SEQ ID NO: 105).
  • If the group of codons encoding the amino acid residue valine comprises all four available codons the amino acid codon motif of the codons encoding the amino acid residue valine comprises the codons GTG, GTC, GTT and GTA at a ratio of 48:25:16:11. As this would result in an amino acid codon motif comprising 100 positions it is adjusted to 48:24:18:12, which corresponds to 8:4:3:2.
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtg gtg gtg gtg gtg gtg gtg gtc gtc gtc gtc gtt gtt gtt gta gta (SEQ ID NO: 106).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtg gtc gtt gta gtg gtg gtc gtt gtg gtg gtc gtt gta gtg gtg gtc (SEQ ID NO: 107).
  • Thus, one amino acid codon motif of the codons encoding the amino acid residue valine is gtg gtc gtt gta gtg gtc gtt gta gtg gtc gtt gtg gtc gtg gtg gtg gtg (SEQ ID NO: 108).
  • The following examples, sequences and figures are provided to aid the understanding of the present invention, the true scope of which is set forth in the appended claims. It is understood that modifications can be made in the procedures set forth without departing from the spirit of the invention.
  • EXAMPLES
  • Protein Determination
  • The protein concentration was determined by determining the optical density (OD) at 280 nm, using the molar extinction coefficient calculated on the basis of the amino acid sequence.
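  • The calculation behind this determination can be sketched with the Beer-Lambert law. The text does not state how the molar extinction coefficient was calculated; the sketch below assumes the commonly used sequence-based estimate of 5500 per tryptophan, 1490 per tyrosine and 125 per cystine (in l/(mol·cm) at 280 nm). The function names, the example sequence and the example numbers are hypothetical.

```python
def molar_extinction_280(sequence: str, n_cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm [l/(mol*cm)]."""
    # Assumed per-residue contributions: Trp 5500, Tyr 1490, cystine 125.
    return 5500 * sequence.count("W") + 1490 * sequence.count("Y") + 125 * n_cystines

def concentration_mg_per_ml(a280: float, epsilon: float,
                            molar_mass: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c [mol/l] = A / (epsilon * d); times molar mass gives g/l."""
    return a280 / (epsilon * path_cm) * molar_mass

# Hypothetical example: one Trp and two Tyr residues, molar mass 10,000 g/mol.
epsilon = molar_extinction_280("MWYYK")
print(epsilon)
print(concentration_mg_per_ml(a280=0.848, epsilon=epsilon, molar_mass=10_000))
```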
  • Recombinant DNA Technique:
  • Standard methods were used to manipulate DNA as described in Sambrook, J., et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). The molecular biological reagents were used according to the manufacturer's instructions.
  • Example 1 Making and Description of the Test-Polypeptide Expression Plasmids
  • The fusion test-polypeptide was prepared by recombinant means. The amino acid sequence of the expressed fusion test-polypeptide was encoded by a nucleic acid comprising in 5′ to 3′ direction a nucleic acid of SEQ ID NO: 61 encoding the carrier peptide, a nucleic acid of SEQ ID NO: 62 or SEQ ID NO: 63 encoding the test polypeptide, and a nucleic acid of SEQ ID NO: 59 encoding a hexa-histidine purification tag (poly-His tag).
  • The encoding fusion gene was assembled with known recombinant methods and techniques by connection of appropriate nucleic acid segments. Nucleic acid sequences made by chemical synthesis were verified by DNA sequencing. The expression plasmid for the production of the fusion polypeptide was prepared as outlined below.
  • Making of the E. coli Expression Plasmid
  • Plasmid 4980 (4980-pBRori-URA3-LACI-SAC) is an expression plasmid for the expression of core-streptavidin in E. coli. It was generated by ligation of the 3142 bp long EcoRI/CelII-vector fragment derived from plasmid 1966 (1966-pBRori-URA3-LACI-T-repeat; reported in EP-B 1 422 237) with a 435 bp long core-streptavidin encoding EcoRI/CelII-fragment.
  • The core-streptavidin E. coli expression plasmid comprises the following elements:
      • the origin of replication from the vector pBR322 for replication in E. coli (corresponding to bp position 2517-3160 according to Sutcliffe, G., et al., Quant. Biol. 43 (1979) 77-90),
      • the URA3 gene of Saccharomyces cerevisiae coding for orotidine 5′-phosphate decarboxylase (Rose, M., et al., Gene 29 (1984) 113-124) which allows plasmid selection by complementation of E. coli pyrF mutant strains (uracil auxotrophy),
      • the core-streptavidin expression cassette comprising
        • the T5 hybrid promoter (T5-PN25/03/04 hybrid promoter according to Bujard, H., et al., Methods. Enzymol. 155 (1987) 416-433 and Stueber, D., et al., Immunol. Methods IV (1990) 121-152) including a synthetic ribosomal binding site according to Stueber, D., et al. (see before),
        • the core-streptavidin gene,
        • two bacteriophage-derived transcription terminators, the λ-T0 terminator (Schwarz, E., et al., Nature 272 (1978) 410-414) and the fd-terminator (Beck E. and Zink, B. Gene 1-3 (1981) 35-58),
      • the lacI repressor gene from E. coli (Farabaugh, P. J., Nature 274 (1978) 765-769).
  • The final expression plasmid for the expression of the fusion test-polypeptide was prepared by excising the core-streptavidin structural gene from vector 4980 using the singular flanking EcoRI and CelII restriction endonuclease cleavage sites and inserting the EcoRI/CelII restriction site flanked nucleic acid encoding the fusion test-polypeptide into the 3142 bp long EcoRI/CelII-4980 vector fragment.
  • The expression plasmid containing the test-polypeptide gene generated with the classic codon usage was designated 11020 while the expression plasmid containing the test-polypeptide gene generated with the new codon usage was designated 11021.
  • Example 2 Expression of the Test-Polypeptide in E. coli
  • For the expression of the fusion test-polypeptide there was employed an E. coli host/vector system which enables an antibiotic-free plasmid selection by complementation of an E. coli auxotrophy (PyrF) (EP 0 972 838 and U.S. Pat. No. 6,291,245).
  • Transformation, Cell Culturing and Induction of Transformed E. coli Cells
  • The E. coli K12 strain CSPZ-2 (leuB, proC, trpE, thi-1, ΔpyrF) was transformed with the expression plasmid (11020 and 11021, respectively) obtained in the previous step. The transformed CSPZ-2 cells were first grown at 37° C. on agar plates and subsequently in a shaking culture in M9 minimal medium containing 0.5% casamino acids (Difco) up to an optical density at 550 nm (OD550) of 0.6-0.9 and subsequently induced with IPTG (1-5 mmol/l final concentration).
  • Example 3 Expression Analysis of Test-Polypeptide
  • The expressed fusion test-polypeptide was visualized after SDS PAGE by quantitative Western blot analysis. For this purpose, E. coli lysate was processed by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis (SDS-PAGE), and the separated polypeptides were transferred to a membrane from the gel and subsequently detected and quantified by an immunological method.
  • Sampling of E. coli Cells and Sample Preparation for SDS PAGE
  • For expression analysis, E. coli cell culture samples were drawn from the shaking culture over a time course of about 24 h. One sample was drawn prior to induction of recombinant protein expression. Further samples were taken at defined time points, e.g. 4, 6 and 21 hours after induction.
  • The E. coli cell pellets from 3 OD550 nm units (1 OD550 nm unit = 1 ml cell suspension with an OD at 550 nm of 1) of centrifuged culture medium were resuspended in 0.25 ml 10 mmol/l potassium phosphate buffer, pH 6.5, and the cells were lysed by ultrasonic treatment (two pulses of 30 sec. at 50% intensity). The insoluble cell components were sedimented (centrifugation 14,000 rpm, 5 min.) and an aliquot of the clarified supernatant was admixed with 1/4 volume (v/v) of 4×LDS sample buffer and 1/10 volume (v/v) of 0.5 M 1,4-dithiothreitol (DTT). The insoluble cell debris fraction (pellet) was resuspended/extracted in 0.3 ml 1×LDS sample buffer containing 50 mM 1,4-dithiothreitol (DTT) under shaking for 15 min and centrifuged again.
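  • The normalization to 3 OD550 nm units described above implies that the culture volume to be centrifuged depends on the optical density measured at the sampling time. A minimal sketch of that arithmetic, assuming only the stated definition (1 OD550 nm unit = 1 ml of suspension at an OD550 of 1); the function name and the example OD values are illustrative and are not taken from the source:

```python
def harvest_volume_ml(od550: float, od_units: float = 3.0) -> float:
    """Return the culture volume (ml) that contains `od_units` OD550 units."""
    if od550 <= 0:
        raise ValueError("OD550 must be positive")
    # units = OD * volume, so volume = units / OD
    return od_units / od550


# Illustrative OD550 readings (hypothetical values)
for od in (0.6, 0.9, 2.0):
    print(f"OD550 {od:.1f}: harvest {harvest_volume_ml(od):.2f} ml")
```

    Normalizing by OD units rather than by a fixed volume keeps the amount of cell material per lane roughly comparable between early and late time points.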
  • SDS PAGE
  • LDS sample buffer, fourfold concentrate (4×): 4 g glycerol, 0.682 g TRIS-Base, 0.666 g TRIS-hydrochloride, 0.8 g LDS (lithium dodecyl sulfate), 0.006 g EDTA (ethylenediaminetetraacetic acid), 0.75 ml of a 1% by weight (w/w) solution of Serva Blue G250 in water, 0.75 ml of a 1% by weight (w/w) solution of phenol red, add water to make a total volume of 10 ml.
  • The NuPAGE® Pre-Cast gel system (Invitrogen) was used according to the manufacturer's instructions (10% NuPAGE® Novex® Bis-TRIS Pre-Cast gels, pH 6.4; Cat.-No.: NP0301). The samples were incubated for 10 min. at 70° C. and after cooling to room temperature 5-40 μl were loaded onto the gels. In addition, 5 μl MagicMark™ XP Western Protein Standard (20-220 kDa) (Invitrogen, Cat. No.: LC5602), 5 μl of Precision Plus Protein™ prestained protein standard (Bio-Rad, Cat. No.: 161-0373) and 1, 2, 4, 6 and 8 μl of purified scFv-poly-His quantification standard (protein concentration: 5 ng/μl) were loaded onto the gel. Separation of proteins took place in reducing NuPAGE® MOPS SDS running buffer (Invitrogen, Cat. No.: NP0001) for 60 min. at 180 V.
  • The sample arrangement for the SDS PAGE/Western blots shown in FIGS. 1 and 2 was as follows.
  • Supernatant samples (FIG. 1):
    lane  sample                    source       amount
    1     MagicMark™                -            5 μl
    2     reference, 0 h            supernatant  5 μl
    3     reported herein, 0 h      supernatant  5 μl
    4     reference, 4 h            supernatant  2 μl
    5     reported herein, 4 h      supernatant  2 μl
    6     reference, 6 h            supernatant  2 μl
    7     reported herein, 6 h      supernatant  2 μl
    8     reference, 21 h           supernatant  5 μl
    9     reported herein, 21 h     supernatant  5 μl
    10    standard                  -            5 ng
    11    standard                  -            10 ng
    12    standard                  -            20 ng
    13    standard                  -            30 ng
    14    standard                  -            40 ng
    15    Precision Plus Protein™   -            5 μl
  • Pellet samples (FIG. 2):
    lane  sample                    source       amount
    1     MagicMark™                -            5 μl
    2     reference, 0 h            pellet       5 μl
    3     reported herein, 0 h      pellet       5 μl
    4     reference, 4 h            pellet       40 μl
    5     reported herein, 4 h      pellet       40 μl
    6     reference, 6 h            pellet       40 μl
    7     reported herein, 6 h      pellet       40 μl
    8     reference, 21 h           pellet       40 μl
    9     reported herein, 21 h     pellet       40 μl
    10    standard                  -            5 ng
    11    standard                  -            10 ng
    12    standard                  -            20 ng
    13    standard                  -            30 ng
    14    standard                  -            40 ng
    15    Precision Plus Protein™   -            5 μl
  • Western Blotting
  • Transfer buffer: 39 mM glycine, 48 mM TRIS-hydrochloride, 0.04% by weight (w/w) SDS, and 20% by volume methanol (v/v).
  • After SDS-PAGE the separated polypeptides were transferred electrophoretically to a nitrocellulose filter membrane (pore size: 0.45 μm, Invitrogen, Cat. No. LC2001) according to the “Semidry-Blotting-Method” of Burnette (Burnette, W. N., Anal. Biochem. 112 (1981) 195-203).
  • Immunological Detection and Quantification of the Poly-his-Tagged Test-Polypeptide
  • After electro-transfer the membranes were washed in 50 mM Tris-HCl, pH 7.5, 150 mM NaCl (TBS, Tris-buffered saline) and nonspecific binding sites were blocked overnight at 4° C. in TBS, 1% (w/v) Western Blocking Reagent (Roche, Cat. No.: 11921673001).
  • The mouse monoclonal anti-Penta-His antibody (Qiagen, Cat. No.: 34660) was used as primary antibody at a dilution of 1:1,000 in TBS, 0.5% (w/v) Western Blocking Solution. After two washes in TBS (Bio-Rad, Cat. No.: 170-6435) and two washes in TBS supplemented with 0.05% (v/v) Tween-20 (TBST) the poly-His containing polypeptides were visualized using a purified rabbit anti-mouse IgG antibody conjugated to a peroxidase (Roche Molecular Biochemicals, Cat. No.: 11693 506) as secondary antibody at a dilution of 1:400 in TBS with 3% (w/v) non-fat dry milk powder.
  • After washing the membranes three times with TBST buffer and once with TBS buffer for 10 min. at room temperature, the Western blot membranes were developed with a luminol/peroxide solution generating chemiluminescence (Lumi-LightPLUS Western Blotting Substrate, Roche Molecular Biochemicals, Cat. No.: 12015196001). For this purpose the membranes were incubated in 10 ml luminol/peroxide solution for 10 seconds to 5 minutes, and the emitted light was then detected with a LUMI-Imager F1 Analysator (Roche Molecular Biochemicals). A protein reference standard curve was obtained by plotting the known protein concentration of the scFv-poly-His proteins against their cognate measured LUMI-Imager signals (intensity of the spots expressed in BLU units); this curve was used to calculate the concentrations of target protein in the original samples.
  • The intensity of the spots was quantified with the LumiAnalyst Software (Version 3.1).
  • TABLE (amounts refer to the test-polypeptide)
    lane  sample                     sample amount [μl]  Lumi Imager signal [BLU]  determined amount per lane [ng]  total amount per sample [ng]
    2     reference - 0 hours        5                   no signal                 0                                0
    4     reference - 4 hours        0.02*               9703                      6.3                              315.7
    6     reference - 6 hours        0.02*               13461                     9.0                              448.9
    8     reference - 21 hours       0.02*               1448                      0.5                              22.9
    3     reported herein 0 hours    5                   no signal                 0                                0
    5     reported herein 4 hours    0.02*               19440                     13.2                             660.9
    7     reported herein 6 hours    0.02*               24014                     16.5                             823.1
    9     reported herein 21 hours   0.02*               7738                      4.9                              246.0
    10    standard 1 (5 ng)          1                   4803                      -                                -
    11    standard 2 (10 ng)         2                   16847                     -                                -
    12    standard 3 (20 ng)         4                   32733                     -                                -
    13    standard 4 (30 ng)         6                   40476                     -                                -
    14    standard 5 (40 ng)         8                   43858                     -                                -
    *sample diluted 1:50 with sample buffer; analyzed volume 5 μl
  • The protein reference standard curve obtained from five known scFv-poly-His concentrations is shown in FIG. 3.
  • The Western blot of the polypeptide containing supernatants is shown in FIG. 1.
  • The Western blot of the SDS-extracted cell pellet fraction is shown in FIG. 2.
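  • The standard-curve quantification described above can be sketched as follows. The five standard points are taken from the table; everything else is an assumption made for illustration. In particular, the source does not state which curve model the LumiAnalyst software applied, so piecewise-linear interpolation between adjacent standards is used here as a stand-in, and the function names are hypothetical. The values it returns therefore differ from those in the table (for 9703 BLU it gives roughly 7 ng rather than 6.3 ng).

```python
from bisect import bisect_right

# Standards from the table: (amount in ng, Lumi-Imager signal in BLU)
STANDARDS = [(5, 4803), (10, 16847), (20, 32733), (30, 40476), (40, 43858)]


def amount_from_signal(blu: float) -> float:
    """Estimate ng per lane by linear interpolation between adjacent standards.

    Assumption: piecewise-linear model; the source does not specify the fit.
    """
    signals = [signal for _, signal in STANDARDS]
    if not signals[0] <= blu <= signals[-1]:
        raise ValueError("signal outside the calibrated range")
    # Index of the upper standard of the bracketing pair
    i = min(bisect_right(signals, blu), len(signals) - 1)
    (x0, y0), (x1, y1) = STANDARDS[i - 1], STANDARDS[i]
    return x0 + (blu - y0) * (x1 - x0) / (y1 - y0)


def amount_per_sample(ng_per_lane: float, dilution_factor: float = 50.0) -> float:
    """Scale a per-lane amount by the 1:50 dilution given in the table footnote."""
    return ng_per_lane * dilution_factor


print(round(amount_from_signal(9703), 2))  # reference, 4 hours
print(amount_per_sample(6.3))              # per-lane value from the table
```

    This sketch refuses to extrapolate: a signal such as 1448 BLU (reference, 21 hours) lies below the lowest standard and raises an error, whereas the table reports a value for it, which suggests that the original evaluation used a fitted curve extending beyond the standards.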

Claims (11)

1. A method for recombinantly producing a polypeptide in a prokaryotic cell comprising the step of cultivating a prokaryotic cell which comprises a nucleic acid encoding the polypeptide, and recovering the polypeptide from the prokaryotic cell or the cultivation medium,
wherein each of the amino acid residues of the polypeptide is encoded by at least one codon, whereby the codon(s) encoding the same amino acid residue are combined in one group and each of the codons in a group is defined by a specific usage frequency within the group, whereby the sum of the specific usage frequencies of all codons in one group is 100%,
wherein the overall usage frequency of each codon in the polypeptide encoding nucleic acid is about the same as its specific usage frequency within its group.
2. The method according to claim 1, characterized in that the groups comprise only codons with an overall usage frequency within the genome of the cell of more than 5%.
3. The method according to claim 1, characterized in that the groups comprise only codons with an overall usage frequency within the genome of the cell of 8% or more.
4. The method according to claim 1, characterized in that the groups comprise only codons with an overall usage frequency within the genome of the cell of 10% or more.
5. The method according to claim 1, characterized in that for each sequential occurrence of a specific amino acid in the polypeptide starting from the N-terminus of the polypeptide in the corresponding position of the encoding nucleic acid the same codon is used as that which is present at the corresponding sequential position in the respective amino acid codon motif for the specific amino acid.
6. The method according to claim 5, characterized in that
i) after usage of the final codon of the amino acid codon motif at the next occurrence of the specific amino acid in the polypeptide the codon that is at the first position of the respective amino acid codon motif is used again in the corresponding encoding nucleic acid,
ii) for each further sequential occurrence of this specific amino acid in the polypeptide in the corresponding position of the encoding nucleic acid the codon is used which is present at the corresponding position in the respective amino acid codon motif for the specific amino acid.
7. The method according to claim 5, characterized in that the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency or the codon with the second lowest specific usage frequency the codon with the highest specific usage frequency is used.
8. The method according to claim 7, characterized in that the codons in the amino acid codon motif are arranged with decreasing specific usage frequency whereby after the codon with the lowest specific usage frequency the codon with the highest specific usage frequency is used.
9. The method according to claim 1, characterized in that the cell is E. coli.
10. The method according to claim 9, characterized in that the amino acid codon motif for
alanine is selected from SEQ ID NO: 01, 02, 03, 04 and 05, and/or
arginine is selected from SEQ ID NO: 06 and 07, and/or
asparagine is selected from SEQ ID NO: 08, 09, 10, 11, and 12, and/or
aspartic acid is selected from SEQ ID NO: 13 and 14, and/or
cysteine is selected from SEQ ID NO: 15, 16 and 17, and/or
glutamine is selected from SEQ ID NO: 18, 19, 20, and 21, and/or
glutamic acid is selected from SEQ ID NO: 22, 23 and 24, and/or
glycine is selected from SEQ ID NO: 25 and 26, and/or
histidine is selected from SEQ ID NO: 27 and 28, and/or
isoleucine is selected from SEQ ID NO: 29 and 30, and/or
leucine is selected from SEQ ID NO: 31, 32 and 33, and/or
lysine is selected from SEQ ID NO: 34, 35, 36 and 37, and/or
phenylalanine is selected from SEQ ID NO: 38, 39 and 40, and/or
proline is selected from SEQ ID NO: 41, 42, 43, 44, 45 and 46, and/or
serine is selected from SEQ ID NO: 47 and 48, and/or
threonine is selected from SEQ ID NO: 49, 50 and 51, and/or
tyrosine is selected from SEQ ID NO: 52 and 53, and/or
valine is selected from SEQ ID NO: 54, 55 and 56.
11. The method according to any one of claims 1 to 10, characterized in that the polypeptide is an antibody, or an antibody fragment, or an antibody fusion polypeptide.
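
As a reading aid for claims 5 to 8, the cyclic use of an amino acid codon motif can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `encode_with_motifs` is hypothetical, and the two example motifs are invented for the demonstration and do not correspond to any of SEQ ID NO: 01 to 56.

```python
from itertools import cycle


def encode_with_motifs(protein: str, motifs: dict[str, list[str]]) -> str:
    """Back-translate `protein`, taking codons for each amino acid in motif order.

    For the n-th occurrence of an amino acid (counted from the N-terminus) the
    n-th codon of its motif is used; after the final codon the motif restarts
    at its first position.
    """
    # One independent, endlessly repeating iterator per amino acid
    iterators = {aa: cycle(codons) for aa, codons in motifs.items()}
    return "".join(next(iterators[aa]) for aa in protein)


# Hypothetical motifs (one-letter amino acid code -> ordered codon list)
EXAMPLE_MOTIFS = {
    "A": ["GCG", "GCC", "GCA"],
    "K": ["AAA", "AAG"],
}

print(encode_with_motifs("AKAAKA", EXAMPLE_MOTIFS))  # prints GCGAAAGCCGCAAAGGCG
```

Because each amino acid keeps its own position in its motif, the codon chosen for one residue depends only on how many times that same amino acid has already occurred, which is what lets the codon frequencies in the encoding nucleic acid track the frequencies built into the motif.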
US14/517,516 2012-04-17 2014-10-17 Method for the expression of polypeptides using modified nucleic acids Abandoned US20150246961A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/833,012 US20180100006A1 (en) 2012-04-17 2017-12-06 Method for the expression of polypeptides using modified nucleic acids

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP12164430 2012-04-17
EP12164430.6 2012-04-17
PCT/EP2013/057808 WO2013156443A1 (en) 2012-04-17 2013-04-15 Method for the expression of polypeptides using modified nucleic acids

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/057808 Continuation WO2013156443A1 (en) 2012-04-17 2013-04-15 Method for the expression of polypeptides using modified nucleic acids

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/833,012 Continuation US20180100006A1 (en) 2012-04-17 2017-12-06 Method for the expression of polypeptides using modified nucleic acids

Publications (1)

Publication Number Publication Date
US20150246961A1 true US20150246961A1 (en) 2015-09-03

Family

ID=48184161

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/517,516 Abandoned US20150246961A1 (en) 2012-04-17 2014-10-17 Method for the expression of polypeptides using modified nucleic acids
US15/833,012 Pending US20180100006A1 (en) 2012-04-17 2017-12-06 Method for the expression of polypeptides using modified nucleic acids

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/833,012 Pending US20180100006A1 (en) 2012-04-17 2017-12-06 Method for the expression of polypeptides using modified nucleic acids

Country Status (12)

Country Link
US (2) US20150246961A1 (en)
EP (2) EP3138917B2 (en)
JP (2) JP6224077B2 (en)
KR (1) KR102064025B1 (en)
CN (2) CN114107352A (en)
BR (1) BR112014025693B1 (en)
CA (1) CA2865676C (en)
ES (1) ES2599386T3 (en)
HK (1) HK1205186A1 (en)
MX (1) MX349596B (en)
RU (1) RU2014144881A (en)
WO (1) WO2013156443A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112513263A (en) * 2018-05-29 2021-03-16 弗门尼舍有限公司 Method for producing a bryodin compound

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114107352A (en) * 2012-04-17 2022-03-01 弗·哈夫曼-拉罗切有限公司 Methods of expressing polypeptides using modified nucleic acids
AU2014296574A1 (en) 2013-07-29 2016-02-18 Danisco Us Inc. Variant enzymes
WO2020039183A1 (en) * 2018-08-20 2020-02-27 Ucl Business Plc Factor ix encoding nucleotides
CN113993888A (en) * 2019-06-28 2022-01-28 豪夫迈·罗氏有限公司 Method for producing antibody
CN114774391B (en) * 2022-03-09 2023-03-14 华南农业大学 Bacteriophage lysin for resisting escherichia coli and application thereof
CN114507682A (en) * 2022-03-16 2022-05-17 华南农业大学 Gene for coding recombinant NlpC/P60 endopeptidase protein and application thereof

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4816567A (en) 1983-04-08 1989-03-28 Genentech, Inc. Recombinant immunoglobin preparations
US5010182A (en) 1987-07-28 1991-04-23 Chiron Corporation DNA constructs containing a Kluyveromyces alpha factor leader sequence for directing secretion of heterologous polypeptides
AU4005289A (en) 1988-08-25 1990-03-01 Smithkline Beecham Corporation Recombinant saccharomyces
US5082767A (en) 1989-02-27 1992-01-21 Hatfield G Wesley Codon pair utilization
US5959177A (en) 1989-10-27 1999-09-28 The Scripps Research Institute Transgenic plants expressing assembled secretory antibodies
DK0628639T3 (en) 1991-04-25 2000-01-24 Chugai Pharmaceutical Co Ltd Reconstituted human antibody to human interleukin-6 receptor
JP3951062B2 (en) 1991-09-19 2007-08-01 ジェネンテック・インコーポレーテッド Expression of antibody fragments with cysteine present at least as a free thiol in E. coli for the production of bifunctional F (ab ') 2 antibodies
US5795737A (en) 1994-09-19 1998-08-18 The General Hospital Corporation High level expression of proteins
US5789199A (en) 1994-11-03 1998-08-04 Genentech, Inc. Process for bacterial production of polypeptides
US5840523A (en) 1995-03-01 1998-11-24 Genetech, Inc. Methods and compositions for secretion of heterologous polypeptides
US6040498A (en) 1998-08-11 2000-03-21 North Caroline State University Genetically engineered duckweed
US6291245B1 (en) 1998-07-15 2001-09-18 Roche Diagnostics Gmbh Host-vector system
EP0972838B1 (en) 1998-07-15 2004-09-15 Roche Diagnostics GmbH Escherichia coli host/vector system based on antibiotic-free selection by complementation of an auxotrophy
DE60022369T2 (en) 1999-10-04 2006-05-18 Medicago Inc., Sainte Foy PROCESS FOR REGULATING THE TRANSCRIPTION OF FOREIGN GENES IN THE PRESENCE OF NITROGEN
US7125978B1 (en) 1999-10-04 2006-10-24 Medicago Inc. Promoter for regulating expression of foreign genes
EP1156112B1 (en) 2000-05-18 2006-03-01 Geneart GmbH Synthetic gagpol genes and their uses
GB0014288D0 (en) * 2000-06-10 2000-08-02 Smithkline Beecham Biolog Vaccine
DE10037111A1 (en) * 2000-07-27 2002-02-07 Boehringer Ingelheim Int Production of a recombinant protein in a prokaryotic host cell
EP1383884A4 (en) * 2001-03-22 2004-12-15 Dendreon Corp Nucleic acid molecules encoding serine protease cvsp14, the encoded polypeptides and methods based thereon
WO2003070957A2 (en) 2002-02-20 2003-08-28 Novozymes A/S Plant polypeptide production
US20040005600A1 (en) 2002-04-01 2004-01-08 Evelina Angov Method of designing synthetic nucleic acid sequences for optimal protein expression in a host cell
EP1501863A4 (en) * 2002-05-03 2007-01-24 Sequenom Inc Kinase anchor protein muteins, peptides thereof, and related methods
CA2443365C (en) 2002-11-19 2010-01-12 F. Hoffmann-La Roche Ag Methods for the recombinant production of antifusogenic peptides
GB0308988D0 (en) * 2003-04-17 2003-05-28 Univ Singapore Molecule
JP4228072B2 (en) * 2003-06-04 2009-02-25 独立行政法人農業・食品産業技術総合研究機構 Artificial synthetic gene encoding avidin
WO2005116270A2 (en) 2004-05-18 2005-12-08 Vical Incorporated Influenza virus vaccine composition and method of use
JP2009538622A (en) 2006-05-30 2009-11-12 ダウ グローバル テクノロジーズ インコーポレイティド Codon optimization method
CN101627053A (en) * 2006-06-12 2010-01-13 西福根有限公司 Pan-cell surface receptor- specific therapeutics
EP2423315B1 (en) 2006-06-29 2015-01-07 DSM IP Assets B.V. A method for achieving improved polypeptide expression
CN114107352A (en) 2012-04-17 2022-03-01 弗·哈夫曼-拉罗切有限公司 Methods of expressing polypeptides using modified nucleic acids

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Bezerra, et al. "Non-Standard Genetic Codes Define New Concepts for Protein Engineering", Life, 5, pp. 1610-28. *
Donkor (2013) "Sequencing of Bacterial Genomes: Principles and Insights into Pathogenesis and Development of Antibiotics", Gene, 4: 556-72. *
Michael Anissimov, Wisegeek.com "How many species of bacteria are there", accessed 21 January 2014, No Journal, no issue, 2 pages printed. *
Sharp, et al. (2005) "Variation in the strength of selected codon usage bias among bacteria", Nucleic Acids Research, 33(4): 1141-53. *

Also Published As

Publication number Publication date
EP3138917A1 (en) 2017-03-08
EP3138917B1 (en) 2019-08-21
CA2865676A1 (en) 2013-10-24
HK1205186A1 (en) 2015-12-11
WO2013156443A1 (en) 2013-10-24
RU2014144881A (en) 2016-06-10
KR102064025B1 (en) 2020-01-08
JP6224077B2 (en) 2017-11-01
CA2865676C (en) 2020-03-10
EP3138917B2 (en) 2024-10-23
CN104245937B (en) 2021-09-21
ES2599386T3 (en) 2017-02-01
KR20140146105A (en) 2014-12-24
EP2839011B1 (en) 2016-09-14
CN104245937A (en) 2014-12-24
CN114107352A (en) 2022-03-01
JP2015514406A (en) 2015-05-21
BR112014025693B1 (en) 2021-12-07
US20180100006A1 (en) 2018-04-12
BR112014025693A2 (en) 2018-08-07
MX349596B (en) 2017-08-04
MX2014011805A (en) 2014-12-08
EP2839011A1 (en) 2015-02-25
JP2018038408A (en) 2018-03-15

Legal Events

Date Code Title Description
AS Assignment

Owner name: HOFFMANN-LA ROCHE INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:F. HOFFMANN-LA ROCHE AG;REEL/FRAME:033975/0081

Effective date: 20130822

Owner name: F. HOFFMANN-LA ROCHE AG, SWITZERLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ROCHE DIAGNOSTICS GMBH;REEL/FRAME:033975/0610

Effective date: 20130116

Owner name: ROCHE DIAGNOSTICS GMBH, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KLOSTERMANN, STEFAN;KOPETZKI, ERHARD;SCHWARZ, URSULA;SIGNING DATES FROM 20121017 TO 20121203;REEL/FRAME:033975/0292

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION