MOLECULAR SWITCHES II
FIELD OF THE INVENTION
This invention relates to molecular switches and methods for identifying and selecting such switches. Particular molecular switches include gene switches that use nucleic acid binding molecules capable of binding a specific nucleic acid sequence (for example, a DNA sequence) in a ligand-dependent manner, and protein switches in which two protein binding partners bind in a manner which is modulatable by a ligand. Such methods optionally make use of array technology. Moreover, this invention relates to methods for the identification of the ligand-dependent binding molecules as well as identification of ligands. The invention in particular relates to screening of arrays of nucleic acid targets with a known nucleic acid binding molecule, and a known ligand or a library of ligands, for identification of new molecules which potentially modulate the interaction between nucleic acid and nucleic acid binding molecule.
BACKGROUND TO THE INVENTION
Protein-protein interactions are crucial to almost every physiological and pharmacological process. These interactions often are characterized by very high affinity, with dissociation constants in the low nanomolar to subpicomolar range. Such strong affinity between proteins is possible when a high level of specificity allows subtle discrimination among closely related structures. Proteins can bind to each other through several types of interface, for example, a "surface string" where a portion of the surface of one protein contacts an extended loop of polypeptide chain on a second protein, a "helix-helix" configuration involving two alpha helices, and a "surface-surface" configuration involving the matching of one surface to another. For example, it is known that the SH2 domain binds tightly to a region of a polypeptide chain that contains a phosphorylated tyrosine side chain.
Polypeptides can form higher order tertiary structures with like polypeptides (homo-oligomers) or with unlike polypeptides (hetero-oligomers). In the simplest scenario, two identical polypeptides associate to form an active homodimer. An example of this type of association is the natural association of myosin II molecules in the assembly of myosin into filaments. Protein-protein association may be mediated by several factors, including post-translational modifications, by means of which enzymatic activity may be biologically controlled. For example, the phosphorylation state of a protein may cause it to associate with or dissociate from another protein. The phosphorylation state of a protein is thought to be determined by the relative activity of protein kinases, which add phosphate, and protein phosphatases, which remove the phosphate moiety from the protein. For example, it is thought that phosphorylation of myosin II by protein kinases is involved in the priming event leading to dimerization of myosin II monomers and subsequent formation of myosin filaments.
Ligand-mediated association and dissociation of proteins is also known, in which the ability of a protein to interact with another protein is dependent on the binding of a ligand to one or both proteins. An example of ligand-mediated heterodimer association is described in patent application number WO92/00388. This publication describes an adenosine 3',5'-cyclic monophosphate (cAMP) dependent protein kinase which is a four-subunit enzyme composed of two catalytic polypeptides (C) and two regulatory polypeptides (R). In nature the polypeptides associate in a stoichiometry of R2C2. In the absence of cAMP the R and C subunits associate and the enzyme complex is inactive. In the presence of cAMP the R subunit functions as a ligand for cAMP resulting in dissociation of the complex and the release of active protein kinase. The invention described in WO92/00388 exploits this association by adding fluorochromes to the R and C subunits.
The present invention seeks to describe methods of identifying tripartite systems comprising protein switches.
Proteins can also interact with nucleic acids, for example, DNA and RNA. Many important biological interactions involve the binding of proteins to nucleic acids. Such interactions are important, for example, in the control of transcription and translation, and include the control and execution of nucleic acid replication, recombination, modification,
cleavage, degradation, ligation, splicing, packaging etc. In addition, nucleic acids (aptamers) have been engineered which are capable of binding to proteins and affecting the function of these proteins.
In many cases it would be highly advantageous to be able to modulate the binding of proteins to nucleic acids using a chemical moiety such as a ligand.
Zinc finger proteins are transcriptional regulators of gene expression which may be adapted to regulate a desired gene by modulating the binding specificity of the zinc finger for its target nucleic acid. A number of applications for zinc finger technology have been suggested, including the treatment of diseases, use as reagents for manipulating nucleic acids and the regulation of gene expression.
One of the drawbacks of zinc fingers is that they are relatively large polypeptides. In order to introduce zinc fingers into a cell, it is necessary either to express the zinc fingers in the cell by means of a transgene, or to modify them to include cellular uptake domains which successfully target the zinc finger to the nucleus. In either case, but particularly the former, regulation of zinc finger activity is difficult, because the amount of zinc finger present in each cell nucleus can only be controlled indirectly, by influencing the level of zinc finger expression, or by varying the amount of protein administered to the cell. In both cases, upregulation or downregulation of zinc finger activity is slow. Downregulation in particular is dependent on the natural turnover of zinc finger molecules within the cell.
One interesting application of this effect would be the ligand-dependent regulation of gene expression: in effect a 'gene switch'. Gene switches are currently of great interest to those wishing to control timing and/or dosage of gene expression. Various gene switches have been developed in the prior art. Examples of such switches include the tetracycline receptor system, the hormone receptor systems (e.g. those responsive to ecdysone) and the rapamycin-responsive dimerisation system. In these systems a promoter driving the target gene is engineered to contain binding sites for transcription regulatory molecules that comprise the gene switch.
Most of the above prior art switches are derived from naturally occurring gene regulatory proteins. In the natural context of these switches, the switching ligand is able to produce a physiological effect by affecting a protein-nucleic acid interaction. The ability to apply gene switch capability to any desired promoter is highly desirable. Most promoters of clinical or commercial significance, however, do not possess regulatory elements which are susceptible to gene switch regulation.
The present invention seeks to describe methods of identifying tripartite systems comprising gene switches.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention, we provide a method of selecting one or more components of a switching system, the switching system comprising: (i) a first molecule and (ii) a second molecule, in which the first molecule binds to the second molecule in a manner modulatable by a ligand, and (iii) a ligand, the method comprising the steps of: (a) determining the degree of binding between a candidate first molecule and a candidate second molecule in the presence of a candidate ligand; (b) determining the degree of binding between the candidate first molecule and the candidate second molecule in the absence of the candidate ligand; (c) identifying a first molecule / second molecule pair in which the binding of the first molecule to the second molecule differs in the presence and absence of a ligand; and (d) optionally isolating and/or identifying the first molecule, the second molecule or the ligand.
In a preferred embodiment of the invention, the degree of binding between each of a plurality of candidate first molecules and a single candidate second molecule is determined substantially simultaneously. Furthermore, the degree of binding between each of a plurality of candidate first molecules and each of a plurality of candidate second molecules may also be determined substantially simultaneously.
Preferably, the plurality of candidate second molecules is provided in the form of an array of candidate second molecules. Alternatively or in addition, the plurality of candidate first molecules is provided in the form of an array of candidate first molecules.
Preferably, the degree of binding between the or each candidate first molecule and the or each candidate second molecule is determined substantially simultaneously in the presence and/or absence of each of a plurality of candidate ligands.
Furthermore, the plurality of candidate ligands may be provided in the form of an array of candidate ligands.
In an alternative aspect, therefore, the invention provides for a method of selecting one or more components of a switching system, the switching system comprising: (i) a first molecule and (ii) a second molecule, in which the first molecule binds to the second molecule in a manner modulatable by a ligand, and (iii) a ligand, the method comprising the steps of: (a) determining the degree of binding between one or more candidate first molecules and one or more candidate second molecules in the presence of one or more candidate ligands; (b) determining the degree of binding between the one or more candidate first molecules and the one or more candidate second molecules in the absence of the candidate ligand(s); and (c) identifying a first molecule / second molecule pair in which the binding of the first molecule to the second molecule differs in the presence and absence of a ligand; and (d) optionally isolating and/or identifying a first molecule, a second molecule or a ligand, in which at least one of the candidate first molecule(s), candidate second molecule(s) and the candidate ligand(s) is in the form of an array of molecules or ligands.
The first molecule may have a higher affinity for the second molecule in the presence of the ligand than in the absence of the ligand. Alternatively, the first molecule component has a higher affinity for the second molecule in the absence of the ligand than in the presence of the ligand.
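The comparison of binding in the presence and absence of a ligand, and the two possible directions of the resulting switch, can be illustrated with a minimal sketch. The function name, the type of readout and the fold-change cut-off below are hypothetical and are not taken from the methods described herein; any quantitative binding measurement could be substituted.

```python
def classify_switch(bound_with_ligand, bound_without_ligand, min_fold=2.0):
    """Classify a first molecule / second molecule pair from two binding readouts.

    Both readouts are assumed to be positive, background-corrected signals.
    `min_fold` is an illustrative cut-off, not a value from the source text.
    """
    if bound_with_ligand <= 0 or bound_without_ligand <= 0:
        raise ValueError("binding readouts must be positive")
    ratio = bound_with_ligand / bound_without_ligand
    if ratio >= min_fold:
        # Higher affinity in the presence of the ligand.
        return "ligand-induced binding"
    if ratio <= 1.0 / min_fold:
        # Higher affinity in the absence of the ligand.
        return "ligand-induced dissociation"
    return "no switch behaviour"
```

A pair for which `classify_switch` returns either of the first two labels would correspond to a candidate switch identified in step (c); the choice of a ratio rather than a difference is only one possible design.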
The invention allows for the selection of one or more components of a gene switch. Thus, in one embodiment of the invention, one of the first molecule and the second molecule is a nucleic acid binding molecule and the other of the first molecule and the second molecule is a nucleic acid.
There is provided, according to a second aspect of the present invention, a method of selecting one or more components of a gene switch, which gene switch comprises (i) a target nucleic acid molecule; (ii) a nucleic acid binding molecule which binds to the target nucleic acid molecule in a manner modulatable by a ligand; and (iii) the ligand, which method comprises: (a) contacting one or more candidate target nucleic acid molecule(s) with one or more candidate nucleic acid binding molecules, in the presence of one or more ligands; (b) selecting a complex comprising a candidate target nucleic acid, a nucleic acid binding molecule and a ligand; (c) optionally isolating and/or identifying the unknown components of the complex; (d) comparing the binding of the nucleic acid binding molecule component of the complex to the target nucleic acid component of the complex in the presence and absence of the ligand component of the complex; and (e) identifying a first nucleic acid / nucleic acid binding molecule pair in which the binding between the nucleic acid and the nucleic acid binding molecule differs in the presence and absence of a ligand, in which at least one of the nucleic acid binding molecules, target nucleic acid and candidate ligands is provided in the form of an array.
We provide, according to a third aspect of the present invention, a method of selecting a gene switch, which gene switch comprises (i) a target nucleic acid molecule; (ii) a nucleic acid binding molecule which binds to the target nucleic acid molecule in a manner modulatable by a ligand; and (iii) the ligand, which method comprises: (a) identifying a candidate target nucleic acid and a candidate nucleic acid binding molecule, capable of binding to each other, in which: (i) the target nucleic acid is a modified target nucleic acid and the nucleic acid binding molecule is an unmodified nucleic acid binding molecule; or (ii) the target nucleic acid is an unmodified target nucleic acid and the nucleic acid binding molecule is a modified nucleic acid binding molecule; or (iii) the target nucleic acid is a modified target nucleic acid and the nucleic acid binding molecule is a modified nucleic acid binding molecule; (b) identifying an unmodified nucleic acid and/or
an unmodified nucleic acid binding molecule corresponding to the modified molecule(s) in step (a); (c) comparing the binding of the unmodified nucleic acid binding molecule to the unmodified nucleic acid in the presence and absence of a candidate ligand; and (d) selecting complexes where the binding differs in the presence and absence of a ligand.
Preferably, the candidate target nucleic acid and/or a candidate nucleic acid binding molecule are identified in step (a) by contacting one or more target nucleic acid molecule(s) with one or more nucleic acid binding molecules. A plurality of nucleic acid binding molecules, preferably an array of nucleic acid binding molecules, may be used. Furthermore, a plurality of nucleic acids, preferably an array of nucleic acids, more preferably related to one another by sequence homology, may be used.
In a preferred embodiment of the invention, the candidate target nucleic acid identified in step (a) is a modified nucleic acid molecule, and the candidate nucleic acid binding molecule identified in step (a) is an unmodified nucleic acid binding molecule.
Preferably, the binding of the unmodified nucleic acid binding molecule to the unmodified nucleic acid is compared in step (c) in the presence and absence of a plurality of candidate ligands, preferably a library of candidate ligands. One, two or all of the plurality of candidate target nucleic acids, the plurality of candidate nucleic acid binding molecules and the plurality of candidate ligands, where present, may be provided in the form of an array.
The modified nucleic acid, where present, may be selected from a methylated nucleic acid and a phosphorylated nucleic acid. Furthermore, the modified nucleic acid binding molecule, where present, may be a modified polypeptide selected from the group consisting of: a polypeptide modified with a ubiquitin moiety, a polypeptide modified with a glycosyl moiety, a polypeptide modified with a fatty acyl moiety, a polypeptide modified with a sentrin moiety, a polypeptide modified with an ADP-ribosyl and a polypeptide modified with a phosphate moiety.
Preferably, the modified nucleic acid is produced by the reaction of a nucleic acid capable of being derivatised together with a modifying moiety. More preferably, the nucleic acid capable of being derivatised contains an amino, thio, oxo or bromo group, or a group that can be chemically or photo-activated.
In a preferred embodiment of the invention, the nucleic acid binding molecule has a higher affinity for the target nucleic acid in the presence of the ligand by virtue of the ligand mimicking the interaction between the modified nucleic acid and the nucleic acid binding molecule, or the interaction between the nucleic acid and the modified nucleic acid binding molecule, or the interaction between the modified nucleic acid and the modified nucleic acid binding molecule, as the case may be.
Preferably, the nucleic acid binding molecule is a polypeptide, which is preferably at least partly derived from a transcription factor, preferably a zinc finger transcription factor. Preferably, the target nucleic acid is a DNA or RNA. Furthermore, at least one of the candidate nucleic acid binding molecules may comprise a non-naturally occurring nucleic acid binding domain. The candidate nucleic acid binding molecules may be provided as a phage display library.
The methods of our invention allow the identification of any or all of the first molecule, the second molecule, and the ligand. Thus, preferably, a ligand is isolated and/or identified. Alternatively or in addition, a nucleic acid binding molecule is isolated and/or identified. The ligand may be a nucleic acid binding ligand, preferably selected from Distamycin A, Actinomycin D and echinomycin.
We provide, according to a fourth aspect of the present invention, a method of selecting a ligand which is capable of modulating the interaction between a nucleic acid binding molecule and a target nucleic acid, the method comprising the steps of: (a) providing one or a plurality of candidate ligands; (b) determining the degree of binding between the nucleic acid binding molecule and the target nucleic acid sequence in the presence of the or each candidate ligand; (c) determining the degree of binding between the nucleic acid binding molecule and the target nucleic acid sequence in the absence of the or
each candidate ligand; and (d) identifying a ligand for which the binding of the nucleic acid binding molecule to the target nucleic acid sequence differs in the presence and absence of the ligand.
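Steps (a) to (d) of this ligand selection can be sketched as a simple screen over candidate ligands. This is an illustration only: the function name, the difference threshold and the placeholder ligand names and signal values in the usage note are hypothetical and do not represent measured data.

```python
def select_ligands(binding_with, binding_without, min_difference=0.5):
    """Return ligands whose presence changes binding by at least `min_difference`.

    binding_with: dict mapping a ligand name to the binding signal measured
        with that ligand present (step (b)).
    binding_without: binding signal measured with no ligand present (step (c)).
    The returned dict maps each selected ligand to its signed change in signal
    (positive = binding promoted, negative = binding reduced), as in step (d).
    """
    hits = {}
    for ligand, signal in binding_with.items():
        change = signal - binding_without
        if abs(change) >= min_difference:
            hits[ligand] = change
    return hits
```

For example, with made-up signals `{"ligand_A": 1.5, "ligand_B": 0.75, "ligand_C": 0.125}` and a ligand-free signal of `0.75`, the sketch would select `ligand_A` (binding promoted) and `ligand_C` (binding reduced) but not `ligand_B`.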
The degree of binding between each of a plurality of target nucleic acids and a single candidate transcription factor may be determined substantially simultaneously. Furthermore, the degree of binding between each of a plurality of target nucleic acids and each of a plurality of nucleic acid binding molecules may be determined substantially simultaneously. Furthermore, one, two or all of the plurality of candidate ligands, the plurality of nucleic acids and/or the plurality of nucleic acid binding molecules, where present, may be provided in the form of an array.
In a preferred embodiment of the invention, the nucleic acid binding molecule is a transcription factor and the target nucleic acid is a DNA sequence.
As a fifth aspect of the present invention, there is provided a nucleic acid binding molecule, a target nucleic acid or a ligand selected by a method according to the first, second, third or fourth aspects of the invention.
The invention allows the selection of one or more components of a protein switch. Thus, in an alternative embodiment of the invention, each of the first and second molecules comprises a polypeptide. One or both of the first and second molecules may comprise a polypeptide binding domain. One or both of the first and second molecules may comprise an immunoglobulin molecule, preferably an antibody molecule. The ligand may also comprise an immunoglobulin molecule, preferably an antibody molecule.
Preferably, one or both of the first and second molecules is a nucleic acid binding protein capable of binding to nucleic acid. More preferably, the nucleic acid binding protein binds to nucleic acid in a manner modulatable by the other of the first and second molecule.
The present invention, in a sixth aspect, provides a polypeptide or ligand selected by a method according to the above protein switch embodiment.
In a seventh aspect of the present invention, there is provided use of a nucleic acid binding molecule, a target nucleic acid, and/or a ligand according to the fifth or sixth aspect of the invention, or as selected by a method according to the first to fourth aspects of the invention, in a method of regulating a biological process, in which the biological process involves binding between a nucleic acid binding molecule and a target nucleic acid.
Preferably, the binding between a target nucleic acid and a nucleic acid binding molecule is dependent on the presence or absence of a ligand.
According to an eighth aspect of the present invention, we provide use of a nucleic acid binding molecule, a target nucleic acid, and/or a ligand according to the fifth or sixth aspect of the invention, or as selected by a method according to the first to fourth aspects of the invention, in a method of regulating transcription or translation from a nucleic acid sequence comprising a target nucleic acid to which a nucleic acid binding molecule binds in a manner modulatable by the ligand.
We provide, according to a ninth aspect of the invention, a method of modulating a biological process affecting one or more genes, the method comprising administering a nucleic acid binding molecule and/or a ligand according to the fifth or sixth aspect of the invention, or as selected by a method according to the first to fourth aspects of the invention, to a cell, in which the regulatory sequences of said genes comprise a target nucleic acid according to the fifth aspect of the invention, or as selected by a method according to the first to fourth aspects of the invention.
There is provided, in accordance with a tenth aspect of the present invention, a method of modulating a biological process affecting one or more nucleotide sequences of interest in a host cell, which host cell comprises a nucleic acid sequence capable of directing the expression of a nucleic acid binding molecule and a target nucleic acid sequence to which the nucleic acid binding molecule binds in a manner modulatable by a
ligand, which method comprises administering said ligand to the cell and wherein the nucleic acid binding molecule is heterologous or endogenous to the host cell, in which one or more of the nucleic acid binding molecule, the ligand and the target nucleic acid sequence is/are according to the fifth or sixth aspect of the invention, or as selected by a method according to the first to fourth aspects of the invention.
Preferably, the biological process is selected from the group consisting of: transcription, translation, phosphorylation, methylation, replication, restriction, modification, ligation, transport, degradation, editing, splicing, integration and recombination.
Preferably, the host cell is a plant cell. More preferably, the plant cell is part of a plant and the target sequence is part of a regulatory sequence to which the nucleotide sequence of interest is operably linked, said regulatory sequence being preferentially active in the male or female organs of the plant.
As an eleventh aspect of the invention, we provide a non-human transgenic organism comprising a target nucleic acid sequence and a nucleic acid sequence capable of directing the expression of a nucleic acid binding molecule which binds to the target nucleic acid in a manner modulatable by a ligand, wherein the target nucleic acid sequence and/or nucleic acid sequence are heterologous to the organism, in which one or more of the nucleic acid binding molecule, the ligand and the target nucleic acid sequence is/are according to the fifth or sixth aspect of the invention, or in which one or more of the nucleic acid binding molecule, the ligand and the target nucleic acid sequence is/are as selected by a method according to the first to fourth aspects of the invention. Preferably, the transgenic non-human organism is a plant.
According to a twelfth aspect of the invention, there is provided a method of selecting a ligand which is capable of modulating the interaction between a nucleic acid binding molecule and a nucleic acid target, the method comprising contacting a nucleic acid binding molecule with one or a plurality of nucleic acid targets in the form of an array, together with one or a plurality of candidate target nucleic acid molecules in
the form of an array; selecting a complex comprising a candidate target nucleic acid, a nucleic acid binding molecule and a ligand; optionally isolating and/or identifying the unknown components of the complex; comparing the binding of the nucleic acid binding molecule component of the complex to the target nucleic acid component of the complex in the presence and absence of the ligand component of the complex; and selecting complexes where said binding differs in the presence and absence of the ligand component.
According to a thirteenth aspect of the present invention, we provide a ligand selected by a method according to the twelfth aspect of the invention.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows a graph of the effect of Distamycin A concentration on binding of two different phage (clone 3 (3/2F) and clone 4 (4/5F)) to the DNA sequence AAAAAGGCG. In this case, the small molecule causes phage binding to DNA.
Figure 2 shows a graph of the effect of Actinomycin D concentration on binding of two different phage (AD clone 1 and 6) to the DNA sequence AGCTTGGCG. In this case, the small molecule causes phage binding to DNA.
Figure 3 shows four different phage (0.4/1, 0.4/2, 0.4/4 and 0.4/5) binding to the randomised DNA oligo YRYRYGGCG (where Y is C or T and R is G or A) in the presence, but not in the absence, of echinomycin (EM).
Figure 4 shows the binding site signature of phage 0.4/4 selected using the randomised DNA sequence (Y1)(R2)(Y3)(R4)(Y5)GGCG. The phage has a preference for the DNA sequence (T)(G/A)(C)(G/A)(T) in the presence of echinomycin.
Figure 5 shows binding of the phage 0.4/4 to three related DNA sequences, TACGTGGCG, TGTATGGCG and CGTACGGCG, as a function of echinomycin concentration. The first DNA site contains the optimal binding sequence as revealed by the binding site signature.
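The randomised oligo YRYRYGGCG referred to for Figures 3 to 5 stands for a defined set of concrete sequences, which can be enumerated as in the following sketch. The function and variable names are hypothetical; only the degenerate codes (Y is C or T, R is G or A) and the three sequences of Figure 5 come from the text above.

```python
from itertools import product

# Degenerate codes as defined for Figure 3: Y is C or T, R is G or A.
DEGENERATE = {"Y": "CT", "R": "GA"}


def expand_site(pattern):
    """List every concrete DNA sequence matched by a degenerate pattern."""
    # Fixed bases (A, C, G, T) are kept as single-option positions.
    choices = [DEGENERATE.get(base, base) for base in pattern]
    return ["".join(combo) for combo in product(*choices)]


library = expand_site("YRYRYGGCG")

# The three related sites of Figure 5 are all members of the randomised set.
figure_5_sites = ["TACGTGGCG", "TGTATGGCG", "CGTACGGCG"]
all_present = all(site in library for site in figure_5_sites)
```

With five two-fold degenerate positions, the expansion contains 2^5 = 32 distinct sequences.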
Figure 6 shows a graph of the effect of ligand concentration on binding of two different phage to specific DNA sequences. In this case, the respective phage are dissociated from the DNA in the presence of distamycin A or actinomycin D.
Figures 7 to 12 are referred to in Example 14. Figure 7 shows the layout of the 96-well stock plate of oligos used in the arrays.
Figure 8 shows the layout of the 96-well assays for a single zinc finger phage arrayed against a single drug.
Figure 9 shows a graph of the sensitivity of actinomycin D binding phage clone 1 to actinomycin D against the array of oligos.
Figure 10 shows a graph of the sensitivity of echinomycin binding phage 0.4/4 to echinomycin against the array of oligos.
Figure 11 shows the layout of the 384-well assay for a single zinc finger phage against multiple drugs. The white square in each quadrant contains no drug whereas the shaded squares contain drug. A different shade represents a different drug with the square in the top right of each quadrant being distamycin A (DA), bottom right of each quadrant being echinomycin (EM) and bottom left being actinomycin D (AD).
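One way to read the layout of Figure 11 is that each oligo from a 96-well position occupies a 2 x 2 block ("quadrant") of the 384-well plate, one well per condition. The sketch below encodes that reading. It is an assumption, as are the 16-row by 24-column plate geometry with rows lettered A to P, the placement of the no-drug well at the top left of each block, and all names used; only the positions of the three drugs within a quadrant come from the figure description.

```python
# Position inside each 2x2 block -> condition, following the Figure 11 description.
# Keys are (row offset, column offset); the no-drug well is ASSUMED to be top left.
CONDITIONS = {
    (0, 0): "no drug",
    (0, 1): "distamycin A",   # top right
    (1, 1): "echinomycin",    # bottom right
    (1, 0): "actinomycin D",  # bottom left
}

ROWS_384 = "ABCDEFGHIJKLMNOP"  # assumed 16 rows x 24 columns


def wells_for_oligo(row_96, col_96):
    """Map a 96-well position (row 0-7, column 0-11) to its four 384-well wells."""
    if not (0 <= row_96 < 8 and 0 <= col_96 < 12):
        raise ValueError("96-well position out of range")
    wells = {}
    for (d_row, d_col), condition in CONDITIONS.items():
        row_letter = ROWS_384[2 * row_96 + d_row]
        col_number = 2 * col_96 + d_col + 1  # plate columns are 1-based
        wells[condition] = f"{row_letter}{col_number}"
    return wells
```

Under these assumptions, the oligo in the first 96-well position would be assayed in wells A1 (no drug), A2 (distamycin A), B2 (echinomycin) and B1 (actinomycin D).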
Figure 12 shows a graph of the sensitivity of distamycin A binding phage clone 3 (3/2F) to the 3 drugs against the array of oligos in a 384-well assay. The array follows the layout of Figure 11.
Figure 13 shows a graph of the sensitivity of distamycin A binding phage clone 3 (3/2F) to distamycin A against the array of oligos. These data were extracted from the 384-well assay in Figure 12.
Figure 14 shows a graph of the sensitivity of distamycin A binding phage clone 3 (3/2F) to echinomycin against the array of oligos. These data were extracted from the 384-well assay in Figure 12.
Figure 15 shows a graph of the sensitivity of distamycin A binding phage clone 3 (3/2F) to actinomycin D against the array of oligos. These data were extracted from the 384-well assay in Figure 12.
DETAILED DESCRIPTION OF THE INVENTION
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic and biochemical methods (see generally, Sambrook et al, Molecular Cloning: A Laboratory Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Ausubel et al, Short Protocols in Molecular Biology (1999) 4th Ed, John Wiley & Sons, Inc. which are incorporated herein by reference), chemical methods, pharmaceutical formulations and delivery and treatment of patients.
The term 'modulatable by' is used to indicate that binding of the first molecule to the second molecule can be modulated or affected by the ligand. As applied to a gene switch, therefore, the ligand modulates or affects the binding of the nucleic acid binding molecule to the target nucleic acid, and (as applied to a protein switch), the binding of the two polypeptide molecules is modulated or affected by the ligand. In other words, the ligand can modulate, affect, regulate, adjust, alter, or vary the binding of the first molecule to the second molecule.
The term 'isolating' in the context of the invention, refers to the act of removing one or more components or molecules from a sample of candidate molecules which are used in the methods disclosed herein. Alternatively, 'isolating' means deducing the identity of the molecule, though it may not be physically separated from a mixture.
The term 'complex' is used to describe an association between a DNA and one or more molecules as defined herein, or between a polypeptide molecule and one or more molecules. In the case of a polypeptide, these molecules may include another polypeptide molecule and/or a ligand molecule.
The terms "DNA binding molecule", "DNA binding ligand" and "target DNA" are used extensively herein. However, types of nucleic acid other than DNA may be relevant. Consequently, it is intended that in general the above terms can be replaced with the terms "nucleic acid binding molecule", "nucleic acid binding ligand" and "target nucleic acid", respectively. Nucleic acids will in general be RNA or DNA, double stranded or single stranded. RNA may be at least partially double-stranded in the context of the present invention. However, in a preferred aspect of the invention, references to "DNA" mean deoxyribonucleic acid in a literal sense.
An array may be defined as an orderly arrangement of samples. Such samples may include nucleic acids (including oligonucleotides, double stranded and single stranded DNA, cDNAs, mRNAs, whole chromosomes, etc), proteins (including polypeptides), or any other molecules (such as ligands). Where reference is made to an "array" in this document, such references should be understood to include references to alternative terminologies for the same or similar technology. Thus, arrays are also known as: biochips, chips (e.g., DNA chips), microarrays, gene arrays, genome chips and GeneChip®s (Affymetrix, Inc). Thus, for example, in this document, the terms "DNA microarray(s)" and "DNA chip(s)" are used interchangeably. The term "array" also includes microfluidics-based chips or lab-on-chip systems (as disclosed in for example US Patent 5,750,015).
SELECTION OF GENE SWITCHES
This invention, according to one aspect, relates to the selection of one or more components of a gene switch. Preferably, the method involves the use of arrays of any or all of the components.
The term "gene switch" is used herein to describe a multiple component system comprising (i) a target nucleic acid molecule; (ii) a nucleic acid binding molecule which binds to the target nucleic acid molecule in a manner modulatable by a ligand; and (iii) the ligand. Preferably, the nucleic acid is a DNA, and preferably, the nucleic acid binding molecule is a polypeptide. The nucleic acid binding molecule may or may not comprise a transcriptional effector domain, especially when part of the assay procedure. However, since ultimately the gene switch will be used to regulate transcription from one or more promoters, the nucleic acid binding molecule may need to be modified to include a transcriptional activator or repressor domain, if one is not already present. It is noted however that other effector domains, e.g. a nuclease domain such as FokI or an integrase domain such as from HIVb, may be used to modulate gene expression indirectly, as described in further detail below.
Thus, we disclose methods for identifying one, two or all three of a nucleic acid binding molecule, a nucleic acid and a ligand, in which binding between the nucleic acid binding molecule and the nucleic acid is modulatable by a ligand. Some of the methods disclosed make use of single nucleic acid binding molecule species, single nucleic acid species and a single ligand species. Other methods make use of a plurality of nucleic acid binding molecule species, a plurality of nucleic acid species and a plurality of ligand species. Yet other methods make use of combinations of the above, e.g., a plurality of nucleic acid species, a plurality of nucleic acid binding molecule species and a single ligand species. Where a plurality of any of the components is used, the component is preferably provided in the form of a library. Various libraries may be used and are disclosed in detail below.
In highly preferred embodiments of the invention, where a plurality of a component is used, this is in the form of an array of that component, viz, an array of nucleic acids, an array of nucleic acid binding molecules and/or an array of ligands. Thus, the invention encompasses methods of selecting one or more components of a gene switch in which one, two or all three of the nucleic acid binding molecules, target nucleic acids and candidate ligands is in the form of an array. In other words, the methods may involve: arrayed nucleic acid binding molecules and arrayed target nucleic acids; arrayed nucleic acid binding molecules and arrayed candidate ligands; or arrayed target nucleic acids and arrayed candidate ligands. Furthermore, the invention includes the use of all three components in the form of arrays, i.e., arrayed nucleic acid binding molecules, arrayed target nucleic acids and arrayed candidate ligands.
The ligand may be capable of binding to the nucleic acid binding molecule. Alternatively, the ligand may be capable of binding to the nucleic acid. Furthermore, the ligand may be capable of binding to each of the nucleic acid and the nucleic acid binding molecule.
In preferred embodiments of the invention, the nucleic acid is a DNA and the nucleic acid binding molecule is a DNA binding polypeptide (DNA binding protein), such as a transcription factor. However, the nucleic acid may be an RNA, as disclosed in further detail below.
In preferred embodiments of the invention, one nucleic acid binding molecule species is used, in conjunction with a plurality of candidate ligand species in the form of an array. Either a single target nucleic acid species or a plurality of target nucleic acid species may be used; in the latter case, the target DNAs may be in the form of an array.
SELECTION OF GENE SWITCHES USING MODIFIED COMPONENT(S)
In an alternative embodiment of the invention, our methods make use of a modified nucleic acid or a modified polypeptide to select gene switches.
Such a method involves firstly identifying a candidate target nucleic acid and a candidate nucleic acid binding molecule which are capable of binding to each other. Either or both of the candidate target nucleic acid and the candidate nucleic acid binding molecule are in a modified form, as explained in further detail below. Thus, the target nucleic acid may be a modified target nucleic acid and the nucleic acid binding molecule an unmodified nucleic acid binding molecule. Alternatively, the target nucleic acid may be an unmodified target nucleic acid and the nucleic acid binding molecule a modified nucleic acid binding molecule. Finally, the target nucleic acid may be a modified target nucleic acid and the nucleic acid binding molecule a modified nucleic acid binding molecule.
Unmodified versions of the or each modified component in the binding pair are then identified. Finally, the method involves comparing the binding of the unmodified nucleic acid binding molecule to the unmodified nucleic acid in the presence and absence of a candidate ligand, and selecting complexes where the binding differs in the presence and absence of a ligand.
Either or both of the candidate target nucleic acid and the candidate nucleic acid binding molecule may be in the form of a single species. In particular, the candidate target nucleic acid and the candidate nucleic acid binding molecule may be molecules known to bind to each other. For example, the binding pair may comprise a nucleic acid binding molecule such as a polypeptide (for example, a transcription factor) known to bind to a modified nucleic acid sequence (such as a methylated DNA), or a modified polypeptide known to bind to a nucleic acid sequence. Alternatively, the binding pair may be identified through screening one or a plurality of candidate target nucleic acids against one or a plurality of candidate nucleic acid binding molecules, either or both of which may comprise known or unknown species. Where a plurality of a candidate target nucleic acid or a candidate nucleic acid binding molecule is involved, this is preferably in the form of a library of the component, more preferably in the form of an array of that component.
In vivo assays, such as a TRAP assay described in Paraskeva et al (1998), Proc. Natl. Acad. Sci. USA, 95, 951-956, have been devised for selecting RNA binding polypeptides. This assay is useful for determining whether a polypeptide interacts with RNA in vivo, and is based on translational repression of a reporter mRNA encoding green fluorescent protein by an RNA-binding protein for which a cognate binding site has been introduced into the 5' untranslated region. An in vitro variation of the TRAP assay, utilising coupled transcription and translation, may also be used. In vitro selection of RNA-BP variants that bind to a target RNA is also described in Laird-Offringa and Belasco, Methods Enzymol, 261.
In particular, a modified nucleic acid may be screened against a library, or an array, of randomised nucleic acid binding molecules. The modified nucleic acid may be modified (for example, covalently modified) with any suitable moiety, which may be a drug, a ligand, a chemical group or other small molecule. The moiety is thus attached to the nucleic acid in a permanent or semi-permanent manner. The nucleic acid binding molecule may be randomised by means known in the art and described in detail below. Binding between the modified nucleic acid and the nucleic acid binding molecules is assessed to select those nucleic acid binding molecules which bind to the modified nucleic acid (i.e., to select one or more binding pairs). An unmodified nucleic acid is then provided (i.e., a nucleic acid which lacks the attached moiety). Such an unmodified nucleic acid may or may not bind to those nucleic acid binding molecules identified previously as being capable of binding to the modified nucleic acid. More often than not, the unmodified nucleic acid does not bind to the nucleic acid binding molecules identified previously.
Binding of the unmodified nucleic acid to each of the nucleic acid binding molecules is then assessed in the presence of a ligand, and those nucleic acid binding molecules are selected which are capable of binding to the unmodified nucleic acid in the presence of ligand. Such an unmodified nucleic acid, together with the selected nucleic acid binding molecule and the ligand, forms a gene switch; i.e., the interaction between the nucleic acid and the nucleic acid binding molecule is modulatable by the presence or absence of a ligand.
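The selection steps described above may be sketched as a simple filter. The following Python fragment is a minimal illustration only: the function name, the binder labels and the use of sets of binder names to record binding results are hypothetical, and the requirement that a selected binder should not bind the unmodified nucleic acid in the absence of ligand is an assumption made here to emphasise ligand dependence.

```python
def select_switch_binders(binds_modified, binds_unmodified_alone,
                          binds_unmodified_with_ligand):
    """Return binders whose binding to the unmodified nucleic acid needs ligand.

    Each argument is a set of (hypothetical) binder names observed to bind
    under the stated condition.
    """
    selected = []
    # Step 1: start from binders that recognise the modified nucleic acid.
    for binder in sorted(binds_modified):
        # Step 2: check binding to the unmodified nucleic acid without ligand.
        binds_alone = binder in binds_unmodified_alone
        # Step 3: check binding to the unmodified nucleic acid with ligand.
        binds_with_ligand = binder in binds_unmodified_with_ligand
        # Assumption: keep only binders that bind with ligand but not without.
        if binds_with_ligand and not binds_alone:
            selected.append(binder)
    return selected


# Illustrative data only; these are not experimental results.
candidates = select_switch_binders(
    binds_modified={"ZF1", "ZF2", "ZF3"},
    binds_unmodified_alone={"ZF2"},
    binds_unmodified_with_ligand={"ZF1", "ZF2"},
)
print(candidates)
```

In this illustration ZF2 is discarded because it binds irrespective of ligand, and ZF3 because it does not bind the unmodified nucleic acid at all.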
It will be appreciated that a similar selection for one or more components of a gene switch may be performed using a modified nucleic acid binding molecule and a library of randomised nucleic acids. Furthermore, a plurality of modified nucleic acid species may be screened against a single nucleic acid binding molecule species, or a single modified nucleic acid species may be screened against a plurality of nucleic acid binding molecule species. Likewise, a plurality of nucleic acid species may be screened against a single modified nucleic acid binding molecule species, or a single nucleic acid species may be screened against a plurality of modified nucleic acid binding molecule species. Finally, a plurality of modified nucleic acid species may be screened against a plurality of nucleic acid binding molecule species, or a plurality of nucleic acid species may be screened against a plurality of modified nucleic acid binding molecule species. A plurality of modified nucleic acid species may be screened against a plurality of modified nucleic acid binding molecule species. Where a plurality of species is involved, this is preferably in the form of a randomised library, such as a combinatorial library, and preferably in the form of an array of molecules.
Similarly, although the above description refers to determination of binding in the presence of a single ligand, it will be appreciated that a plurality of ligands may be used. Preferably, the plurality of ligands is in the form of a library of ligands, more preferably in the form of a combinatorial library of ligands, most preferably in the form of an array of ligands.
The modification to the nucleic acid may be any known modification, such as methylation or phosphorylation. Similarly, the modification to the nucleic acid binding molecule may be any known modification such as ubiquitination, glycosylation, fatty acylation, sentrinisation, ADP-ribosylation or phosphorylation.
SELECTION OF POLYPEPTIDE SWITCHES
According to a further embodiment, the invention relates to protein switches and their components, and to the selection of a protein switch or of any or all of its components. Array technology may be utilised in this aspect of the invention, in that one, two or all three of the first polypeptide, second polypeptide and ligand may be in the form of an array.
The term "protein switch" is used herein to describe a multiple component system comprising (i) a first polypeptide molecule; (ii) a second polypeptide molecule which binds to the first polypeptide molecule in a manner modulatable by a ligand; and (iii) the ligand.
Thus, one or both of the first and second polypeptide molecules may be in the form of a plurality of polypeptide molecules, preferably in the form of an array of polypeptide molecules. The first and second polypeptides may be screened against one or a plurality of ligands, preferably a library of ligands, most preferably an array of ligands. Thus, this aspect of the invention encompasses the use of: arrayed first polypeptide and arrayed second polypeptide; and arrayed first and/or second polypeptide and arrayed candidate ligands. Furthermore, the invention includes the use of all three components in the form of arrays, i.e., arrayed first polypeptide, arrayed second polypeptide and arrayed candidate ligands.
ARRAYS
Array technology and the various techniques and applications associated with it are described generally in numerous textbooks and documents. These include Lemieux et al, 1998, Molecular Breeding 4, 277-289; Schena and Davis, Parallel Analysis with Biological Chips, in PCR Methods Manual (eds. M. Innis, D. Gelfand, J. Sninsky); Schena and Davis, 1999, Genes, Genomes and Chips, in DNA Microarrays: A Practical Approach (ed. M. Schena), Oxford University Press, Oxford, UK, 1999; The Chipping Forecast (Nature Genetics special issue; January 1999 Supplement); Mark Schena (Ed.), Microarray Biochip Technology (Eaton Publishing Company); Cortese, 2000, The Scientist 14[17]:25; Gwynne and Page, Microarray analysis: the next revolution in molecular biology, Science, 1999 August 6; Ekins and Chu, 1999, Trends in Biotechnology, 17, 217-218; and also at http://www.gene-chips.com.
Array technology overcomes the disadvantages of traditional methods in molecular biology, which generally work on a "one gene in one experiment" basis, resulting in low throughput and the inability to appreciate the "whole picture" of gene function. Currently, the major applications for array technology include the identification of sequence (gene / gene mutation) and the determination of expression level (abundance) of genes. Gene expression profiling may make use of array technology, optionally in combination with proteomics techniques (Celis et al, 2000, FEBS Lett, 480(1):2-16; Lockhart and Winzeler, 2000, Nature 405(6788):827-836; Khan et al, 1999, 20(2):223-9). Other applications of array technology are also known in the art; for example, gene discovery, cancer research (Marx, 2000, Science 289: 1670-1672; Scherf et al, 2000, Nat Genet 24(3):236-44; Ross et al, 2000, Nat Genet 24(3):227-35), SNP analysis (Wang et al, 1998, Science, 280(5366): 1077-82), drug discovery, pharmacogenomics, disease diagnosis (for example, utilising microfluidics devices: Chemical & Engineering News, February 22, 1999, 77(8):27-36), toxicology (Rockett and Dix (2000), Xenobiotica, 30(2):155-77; Afshari et al., 1999, Cancer Res 59(19):4759-60) and toxicogenomics (a hybrid of functional genomics and molecular toxicology). The goal of toxicogenomics is to find correlations between toxic responses to toxicants and changes in the genetic profiles of the objects exposed to such toxicants (Nuwaysir, et al (1999), Molecular Carcinogenesis, 24:153-159).
In general, any library may be arranged in an orderly manner into an array, by spatially separating the members of the library. Examples of suitable libraries for arraying include nucleic acid libraries (including DNA, cDNA, oligonucleotide, etc libraries), peptide, polypeptide and protein libraries, as well as libraries comprising any molecules, such as ligand libraries, among others. Accordingly, where reference is made to a "library" in this document, unless the context dictates otherwise, such reference should be taken to include reference to a library in the form of an array.
The samples (e.g., members of a library) are generally fixed or immobilised onto a solid phase, preferably a solid substrate, to limit diffusion and admixing of the samples. In a preferred embodiment, libraries of ligands may be prepared. In particular, the libraries may be immobilised to a substantially planar solid phase, including membranes and non-porous substrates such as plastic and glass. Furthermore, the samples are preferably arranged in such a way that indexing (i.e., reference or access to a particular sample) is facilitated. Typically the samples are applied as spots in a grid formation. Common assay systems may be adapted for this purpose. For example, an array may be immobilised on the surface of a microplate, either with multiple samples in a well, or with a single sample in each well. Furthermore, the solid substrate may be a membrane, such as a nitrocellulose or nylon membrane (for example, membranes used in blotting experiments). Alternative substrates include glass, or silica based substrates. The samples may be immobilised by any suitable method known in the art, for example, by charge interactions, or by chemical coupling to the walls or bottom of the wells, or the surface of the membrane. Other means of arranging and fixing may be used, for example, pipetting, drop-touch, piezoelectric means, ink-jet and bubblejet technology, electrostatic application, etc. In the case of silicon-based chips, photolithography may be utilised to arrange and fix the samples on the chip.
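The indexing of samples applied as spots in a grid formation may be illustrated as follows. This is a minimal sketch assuming a rectangular grid filled row by row; the function names and the 12-column layout in the usage lines are hypothetical and chosen only for illustration.

```python
def spot_position(member_index, n_columns):
    """Map a zero-based library member index to a (row, column) grid position."""
    if member_index < 0 or n_columns <= 0:
        raise ValueError("index must be non-negative and n_columns positive")
    # Row-major layout: fill each row left to right before starting the next.
    return divmod(member_index, n_columns)


def member_index(row, column, n_columns):
    """Inverse mapping: recover the library member index from a grid position."""
    if not 0 <= column < n_columns or row < 0:
        raise ValueError("position lies outside the grid")
    return row * n_columns + column


# Assumed layout with 12 columns per row.
print(spot_position(13, 12))
print(member_index(1, 1, 12))
```

Because the mapping is invertible, a signal observed at a given grid position identifies the library member deposited there.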
The samples may be arranged by being "spotted" onto the solid substrate; this may be done by hand or by making use of robotics to deposit the sample. In general, arrays may be described as macroarrays or microarrays, the difference being the size of the sample spots. Macroarrays typically contain sample spot sizes of about 300 microns or larger and may be easily imaged by existing gel and blot scanners. The sample spot sizes in microarrays are typically less than 200 microns in diameter and these arrays usually contain thousands of spots. Thus, microarrays may require specialized robotics and imaging equipment, which may need to be custom made. Instrumentation is described generally in a review by Cortese, 2000, The Scientist 14[11]:26.
Techniques for producing immobilised libraries of DNA molecules have been described in the art. Generally, most prior art methods describe how to synthesise single-stranded nucleic acid molecule libraries, using for example masking techniques to build up various permutations of sequences at the various discrete positions on the solid substrate. U.S. Patent No. 5,837,832, the contents of which are incorporated herein by reference, describes an improved method for producing DNA arrays immobilised to silicon substrates based on very large scale integration technology. In particular, U.S. Patent No. 5,837,832 describes a strategy called "tiling" to synthesize specific sets of probes at spatially-defined locations on a substrate which may be used to produce the immobilised DNA libraries of the present invention. U.S. Patent No. 5,837,832 also provides references for earlier techniques that may also be used.
Arrays of peptides (or peptidomimetics) may also be synthesised on a surface in a manner that places each distinct library member (e.g., unique peptide sequence) at a discrete, predefined location in the array. The identity of each library member is determined by its spatial location in the array. The locations in the array where binding interactions between a predetermined molecule (e.g., a target or probe) and reactive library members occur are determined, thereby identifying the sequences of the reactive library members on the basis of spatial location. These methods are described in U.S. Patent No. 5,143,854; WO90/15070 and WO92/10092; Fodor et al. (1991) Science, 251: 767; Dower and Fodor (1991) Ann. Rep. Med. Chem., 26: 271.
To aid detection, targets and probes may be labelled with any readily detectable reporter, for example, a fluorescent, bioluminescent, phosphorescent, radioactive, etc reporter. Such reporters, their detection, coupling to targets/probes, etc are discussed elsewhere in this document. Labelling of probes and targets is also disclosed in Shalon et al., 1996, Genome Res 6(7):639-45.
Specific examples of DNA arrays are as follows:
Format I: probe cDNA (500-5,000 bases long) is immobilized to a solid surface such as glass using robot spotting and exposed to a set of targets either separately or in a mixture. This method is widely considered as having been developed at Stanford University (Ekins and Chu, 1999, Trends in Biotechnology, 17, 217-218).
Format II: an array of oligonucleotide (20~25-mer oligos) or peptide nucleic acid (PNA) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences are determined. Such a DNA chip is sold by Affymetrix, Inc., under the GeneChip® trademark.
Examples of some commercially available microarray formats are set out in Table 1 below (see also Marshall and Hodgson, 1998, Nature Biotechnology, 16(1), 27-31).
Table 1. Examples of currently available hybridization microarray formats
Data analysis is also an important part of an experiment involving arrays. The raw data from a microarray experiment typically are images, which need to be transformed into gene expression matrices: tables in which rows represent, for example, genes, columns represent, for example, various samples such as tissues or experimental conditions, and the number in each cell characterizes, for example, the expression level of the particular gene in the particular sample. These matrices have to be analyzed further if any knowledge about the underlying biological processes is to be extracted. Methods of data analysis (including supervised and unsupervised data analysis as well as bioinformatics approaches) are disclosed in Brazma and Vilo (2000) FEBS Lett 480(1): 17-24.
As disclosed above, proteins, polypeptides, etc may also be immobilised in arrays. For example, antibodies have been used in microarray analysis of the proteome using protein chips (Borrebaeck CA, 2000, Immunol Today 21(8):379-82). Polypeptide arrays are reviewed in, for example, MacBeath and Schreiber, 2000, Science, 289(5485): p. 1760-1763.
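By way of illustration, the assembly of a gene expression matrix from quantified spot intensities may be sketched as below. The record layout (gene, sample, intensity), the function name and the averaging of replicate spots are assumptions made for this example; image quantification itself is not shown.

```python
def build_expression_matrix(spot_records):
    """Build a genes-by-samples matrix from (gene, sample, intensity) records.

    Replicate spots for the same gene and sample are averaged; combinations
    with no measurement are reported as None.
    """
    genes = sorted({gene for gene, _, _ in spot_records})
    samples = sorted({sample for _, sample, _ in spot_records})
    totals, counts = {}, {}
    for gene, sample, intensity in spot_records:
        key = (gene, sample)
        totals[key] = totals.get(key, 0.0) + intensity
        counts[key] = counts.get(key, 0) + 1
    matrix = [
        [totals[(g, s)] / counts[(g, s)] if (g, s) in totals else None
         for s in samples]
        for g in genes
    ]
    return genes, samples, matrix


# Illustrative intensities only; these are not experimental data.
records = [
    ("geneA", "liver", 120.0),
    ("geneA", "liver", 80.0),    # replicate spot
    ("geneA", "kidney", 30.0),
    ("geneB", "liver", 10.0),
]
genes, samples, matrix = build_expression_matrix(records)
for gene, row in zip(genes, matrix):
    print(gene, dict(zip(samples, row)))
```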
The methods of selecting one or more components of a gene switch or a protein switch according to the invention, in highly preferred embodiments, make use of array technology. Thus, any, some or all of the components of the gene switch and/or the protein switch may be provided in the form of an array. This is done by, for example, immobilising one of the components (for example, a nucleic acid) in an array, and exposing the members of the array to the other component (in this example, the nucleic acid binding molecule). Duplicates are set up, in which in one set the candidate ligand(s) are present, and in the other set the candidate ligand(s) are absent. Where the second component binds to the first component on the array, it may be detected by means of a suitable probe (e.g., an antibody against the nucleic acid binding molecule). For this purpose, the nucleic acid binding molecule may incorporate a suitable tag, whose presence may easily be detected. Comparison of the pattern of bound probe in the duplicates allows the determination of binding in the presence and absence of the ligand.
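The comparison of duplicate arrays may be sketched as follows. This is a minimal illustration assuming that each array has already been reduced to a signal value per spot; the fold-change threshold, the background offset, the function name and the spot labels are all hypothetical choices and not parameters taken from this document.

```python
def ligand_dependent_spots(signal_with_ligand, signal_without_ligand,
                           min_fold_change=2.0, background=1.0):
    """Flag spots whose bound-probe signal differs between duplicate arrays.

    Both arguments map a spot label to a signal value. The background offset
    (an assumed constant) avoids division by zero for empty spots.
    """
    hits = {}
    for spot, with_ligand in signal_with_ligand.items():
        without_ligand = signal_without_ligand[spot]
        ratio = (with_ligand + background) / (without_ligand + background)
        if ratio >= min_fold_change:
            hits[spot] = "binding increased by ligand"
        elif ratio <= 1.0 / min_fold_change:
            hits[spot] = "binding decreased by ligand"
    return hits


# Illustrative signals only; these are not experimental data.
with_ligand = {"A1": 99.0, "A2": 9.0, "A3": 50.0}
without_ligand = {"A1": 9.0, "A2": 99.0, "A3": 49.0}
hits = ligand_dependent_spots(with_ligand, without_ligand)
print(hits)
```

Spots flagged in either direction correspond to candidate switches, since the ligand may either promote or abolish binding; spot A3 is not flagged because its signal is essentially unchanged.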
It will be appreciated that, with the use of arrays of one, several or all of the components of the gene switch, at no time is it necessary to isolate or select complexes of candidate nucleic acid, candidate nucleic acid binding molecule and candidate ligand, in order to identify a gene switch. Similarly, at no time is it necessary to isolate complexes of first and second polypeptide and ligand in order to identify a protein switch. Thus, the strength of binding between the partners may be determined in the presence and absence of the candidate ligand(s) directly.
COMBINATORIAL LIBRARIES
Libraries according to the invention, in particular, libraries of ligands, may suitably be in the form of combinatorial libraries (also known as combinatorial chemical libraries).
A "combinatorial library", as the term is used in this document, is a collection of multiple species of chemical compounds that consist of randomly selected subunits. According to the invention, combinatorial libraries may be screened for ligands that affect the binding between a nucleic acid binding molecule and a target nucleic acid. The target nucleic acid may be a known nucleotide sequence of interest, for example, a transcription regulatory element. Furthermore, the nucleic acid binding molecule may be a known nucleic acid binding molecule, for example, a transcription factor.
Various combinatorial libraries of chemical compounds are currently available, including libraries active against proteolytic and nonproteolytic enzymes, libraries of agonists and antagonists of G-protein coupled receptors (GPCRs), libraries active against non-GPCR targets (e.g., integrins, ion channels, domain interactions, nuclear receptors, and transcription factors) and libraries of whole-cell oncology and antiinfective targets, among others. A comprehensive review of combinatorial libraries, in particular their construction and uses is provided in Dolle and Nelson (1999), Journal of Combinatorial Chemistry, Vol 1 No 4, 235-282.
Further references describing chemical combinatorial libraries, their production and use include those available from the URL http://www.netsci.org/Science/Combichem/, including The Chemical Generation of Molecular Diversity, Michael R. Pavia, Sphinx Pharmaceuticals, A Division of Eli Lilly (Published July, 1995); Combinatorial Chemistry: A Strategy for the Future - MDL Information Systems discusses the role its Project Library plays in managing diversity libraries (Published July, 1995); Solid Support Combinatorial Chemistry in Lead Discovery and SAR Optimization, Adnan M. M. Mjalli and Barry E. Toyonaga, Ontogen Corporation (Published July, 1995); Non-Peptidic Bradykinin Receptor Antagonists From a Structurally Directed Non-Peptide Library, Sarvajit Chakravarty, Babu J. Mavunkel, Robin Andy, Donald J. Kyle, Scios Nova Inc. (Published July, 1995); Combinatorial Chemistry Library Design using Pharmacophore Diversity, Keith Davies and Clive Briant, Chemical Design Ltd. (Published July, 1995); A Database System for Combinatorial Synthesis Experiments, Craig James and David Weininger, Daylight Chemical Information Systems, Inc. (Published July, 1995); An Information Management Architecture for Combinatorial Chemistry, Keith Davies and Catherine White, Chemical Design Ltd. (Published July, 1995); Novel Software Tools for Addressing Chemical Diversity, R. S. Pearlman, Laboratory for Molecular Graphics and Theoretical Modeling, College of Pharmacy, University of Texas (Published June/July, 1996); Opportunities for Computational Chemists Afforded by the New Strategies in Drug Discovery: An Opinion, Yvonne Connolly Martin, Computer Assisted Molecular Design Project, Abbott Laboratories (Published June/July, 1996); Combinatorial Chemistry and Molecular Diversity Course at the University of Louisville: A Description, Arno F. Spatola, Department of Chemistry, University of Louisville (Published June/July, 1996); Chemically Generated Screening Libraries: Present and Future, Michael R. Pavia, Sphinx Pharmaceuticals, A Division of Eli Lilly (Published June/July, 1996); Chemical Strategies For Introducing Carbohydrate Molecular Diversity Into The Drug Discovery Process, Michael J. Sofia, Transcell Technologies Inc. (Published June/July, 1996); Data Management for Combinatorial Chemistry, Maryjo Zaborowski, Chiron Corporation and Sheila H. DeWitt, Parke-Davis Pharmaceutical Research, Division of Warner-Lambert Company (Published November, 1995); and The Impact of High Throughput Organic Synthesis on R&D in Bio-Based Industries, John P. Devlin (Published March, 1996).
Techniques in combinatorial chemistry are gaining wide acceptance among modern methods for the generation of new pharmaceutical leads (Gallop, M. A. et al., 1994, J. Med. Chem. 37:1233-1251; Gordon, E. M. et al., 1994, J. Med. Chem. 37:1385-1401). One combinatorial approach in use is based on a strategy involving the synthesis of libraries containing a different structure on each particle of the solid phase support, interaction of the library with a soluble receptor, identification of the 'bead' which interacts with the macromolecular target, and determination of the structure carried by the identified 'bead' (Lam, K. S. et al, 1991, Nature 354:82-84). An alternative to this approach is the sequential release of defined aliquots of the compounds from the solid support, with subsequent determination of activity in solution, identification of the particle from which the active compound was released, and elucidation of its structure by direct sequencing (Salmon, S. E. et al, 1993, Proc.Natl.Acad.Sci.USA 90:11708-11712), or by reading its code (Kerr, J. M. et al., 1993, J.Am.Chem.Soc. 115:2529-2531; Nikolaiev, V. et al, 1993, Pept. Res. 6:161-170; Ohlmeyer, M. H. J. et al, 1993, Proc.Natl.Acad.Sci.USA 90:10922-10926).
Soluble random combinatorial libraries may be synthesized using a simple principle for the generation of equimolar mixtures of peptides which was first described by Furka (Furka, A. et al, 1988, Xth International Symposium on Medicinal Chemistry, Budapest 1988; Furka, A. et al, 1988, 14th International Congress of Biochemistry, Prague 1988; Furka, A. et al, 1991, Int. J. Peptide Protein Res. 37:487-493). The construction of soluble libraries for iterative screening has also been described (Houghten, R. A. et al, 1991, Nature 354:84-86). K. S. Lam disclosed the novel and unexpectedly powerful technique of using insoluble random combinatorial libraries. Lam synthesized random combinatorial libraries on solid phase supports, so that each support had a test compound of uniform molecular structure, and screened the libraries without prior removal of the test compounds from the support by solid phase binding protocols (Lam, K. S. et al, 1991, Nature 354:82-84).
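The principle by which each support comes to carry a single compound may be illustrated by a small simulation. The sketch below assumes the procedure commonly referred to as split-and-mix (portioning-mixing) synthesis, which is not spelled out in this document; the function name, the three single-letter building blocks and the bead count are hypothetical.

```python
import random


def split_and_mix(building_blocks, n_rounds, n_beads, seed=0):
    """Simulate split-and-mix synthesis; return the compound carried by each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    n_blocks = len(building_blocks)
    for _ in range(n_rounds):
        # Pool and mix all beads.
        rng.shuffle(beads)
        # Split the pool into one portion per building block.
        portions = [beads[i::n_blocks] for i in range(n_blocks)]
        # Couple a single building block to every bead of a portion, then re-pool.
        beads = [compound + block
                 for block, portion in zip(building_blocks, portions)
                 for compound in portion]
    return beads


blocks = ["A", "G", "V"]        # assumed building blocks
beads = split_and_mix(blocks, n_rounds=3, n_beads=300)
print(len(blocks) ** 3, "possible compounds;", len(set(beads)), "observed")
```

Each bead passes through exactly one coupling reaction per round, so it carries one defined sequence, while the pooled beads together sample the full set of combinations in roughly equal amounts.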
Thus, a library of candidate ligands or nucleic acid binding molecules may be a synthetic combinatorial library (e.g., a combinatorial chemical library), a cellular extract, a bodily fluid (e.g., urine, blood, tears, sweat, or saliva), or other mixture of synthetic or natural products (e.g., a library of small molecules or a fermentation mixture).
A library of candidate ligands, nucleic acid binding molecules or target nucleic acids may include, for example, amino acids, oligopeptides, polypeptides, proteins, or fragments of peptides or proteins; nucleic acids (e.g., antisense; DNA; RNA; or peptide nucleic acids, PNA); aptamers; or carbohydrates or polysaccharides. Each member of the library can be singular or can be a part of a mixture (e.g., a compressed library). The library may contain purified compounds or can be "dirty" (i.e., containing a significant quantity of impurities).
Commercially available libraries (e.g., from Affymetrix, ArQule, Neose Technologies, Sarco, Ciddco, Oxford Asymmetry, Maybridge, Aldrich, Panlabs, Pharmacopoeia, Sigma, or Tripos) may also be used with the methods described here.
In addition to libraries as described above, special libraries called diversity files can be used to assess the specificity, reliability, or reproducibility of the new methods. Diversity files contain a large number of compounds (e.g., 1000 or more small molecules) representative of many classes of compounds that could potentially result in nonspecific detection in an assay. Diversity files are commercially available or can also be assembled from individual compounds commercially available from the vendors listed above.
NUCLEIC ACID AND POLYPEPTIDE BINDING MOLECULES
The term "nucleic acid binding molecule" includes any molecule which is capable of binding or associating with nucleic acid. Similarly, the term "polypeptide binding molecule" includes any molecule capable of binding or associating with a polypeptide. This binding or association may be via covalent bonding, via ionic bonding, via hydrogen bonding, via van der Waals bonding, or via any other type of reversible or irreversible association.
In the context of the present invention, "nucleic acid" is usually DNA. Reference to "nucleic acid binding molecule" is to be taken to include reference to "DNA binding molecule". Accordingly, the term "DNA binding molecule" is to be construed as including any molecule which is capable of binding or associating with DNA. However, the nucleic acid may be RNA, as set out below, or any other form of nucleic acid, including completely or partially synthetic nucleic acids. Preferably, the nucleic acid binding molecule is a polypeptide.
Polypeptides
As used herein, the terms "peptide", "polypeptide" and "protein" refer to a polymer in which the monomers are amino acids and are joined together through peptide or disulfide bonds. "Polypeptide" refers to either a full-length naturally-occurring amino acid chain or a "fragment thereof" or "peptide", such as a selected region of the polypeptide that binds to another protein, peptide or polypeptide in a manner modulatable by a ligand, or to an amino acid polymer, or a fragment or peptide thereof, which is partially or wholly non-natural. "Fragment thereof" thus refers to an amino acid sequence that is a portion of a full-length polypeptide, between about 8 and about 500 amino acids in length, preferably about 8 to about 300, more preferably about 8 to about 200 amino acids, and even more preferably about 10 to about 50 or 100 amino acids in length. "Peptide" refers to a short amino acid sequence that is 10-40 amino acids long, preferably 10-35 amino acids. Additionally, unnatural amino acids, for example, β-alanine, phenyl glycine and homoarginine may be included. Commonly-encountered amino acids which are not gene-encoded may also be used in the present invention. All of the amino acids used in the present invention may be either the D- or L- optical isomer. The L-isomers are preferred. In addition, other peptidomimetics are also useful, e.g. in linker sequences of polypeptides of the present invention (see Spatola, 1983, in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Weinstein, ed. Marcel Dekker, New York, p. 267). A "polypeptide binding molecule" is a molecule, preferably a polypeptide, protein or peptide, which has the ability to bind to another polypeptide, protein or peptide. Preferably, this binding ability is modulatable by a ligand.
The term "synthetic", as used herein, is defined as that which is produced by in vitro chemical synthesis.
As used herein, the term "domain" refers to a linear sequence of amino acids which exhibits biological function, such as the ability to bind another molecule (for example, another polypeptide or fragment thereof). This linear sequence includes full-length amino acid sequences (e.g. those encoded by a full-length gene or polynucleotide), or a portion or fragment thereof, provided the biological function, in particular binding ability, is maintained by that portion or fragment. The term "domain" also may refer to polypeptides and peptides having biological function. A polypeptide useful in the invention will at least have a binding capability, i.e., with respect to binding as or to a binding partner, and also may have another biological function that is a biological function of a protein or domain from which the peptide sequence is derived.
Molecule
The term 'molecule' is used herein to refer to any atom, ion, molecule, macromolecule (for example polypeptide), or combination of such entities. The term 'ligand' is used interchangeably with the term 'molecule'. Molecules according to the invention may be free in solution, or may be partially or fully immobilised. They may be present as discrete entities, or may be complexed with other molecules. Preferably, molecules according to the invention include polypeptides displayed on the surface of bacteriophage particles. Alternatively or in addition, the polypeptides may be arranged in arrays. More preferably, molecules according to the invention include libraries of polypeptides presented as integral parts of the envelope proteins on the outer surface of bacteriophage particles. Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention.
Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids, and not for stop codons.
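By way of illustration only, the following sketch shows how a partially randomised codon can be checked for stop codons. The NNK degenerate codon (N = any base, K = G or T) is a scheme commonly used in library construction, but it is an assumption made here for the example and is not prescribed by this specification; the function names are likewise hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate base symbols needed for this sketch.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """Return every concrete codon matched by a degenerate codon such as 'NNK'."""
    choices = [DEGENERATE[symbol] for symbol in degenerate_codon]
    return ["".join(bases) for bases in product(*choices)]


def split_sense_and_stop(codons):
    """Separate codons that encode amino acids from stop codons ('*')."""
    sense = [c for c in codons if CODON_TABLE[c] != "*"]
    stop = [c for c in codons if CODON_TABLE[c] == "*"]
    return sense, stop


codons = expand("NNK")
sense, stop = split_sense_and_stop(codons)
encoded = sorted({CODON_TABLE[c] for c in sense})
print(len(codons), "codons;", len(encoded), "amino acids; stop codons:", stop)
```

Under this assumed scheme, excluding the listed stop codon from the synthesis leaves only codons that encode amino acid options at the randomised position.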
Candidate Binding Molecules
The term 'candidate nucleic acid binding molecules' is used to describe any one or more molecule(s) as defined above which may or may not be capable of binding nucleic acids. Likewise, the term 'candidate DNA binding molecules' is used to describe any one or more molecule(s) as defined above which may or may not be capable of binding DNA. Similarly, the term 'candidate polypeptide binding molecules' is used to describe any one or more molecule(s) as defined above which may or may not be capable of binding polypeptides.
The capability of molecules to bind DNA or nucleic acids may or may not be modulatable by a ligand. The latter of these properties may be investigated by the methods of this invention. Preferably, candidate polypeptide, DNA or nucleic acid binding molecules such as DNA binding molecules comprise a plurality of, or a library of polypeptides. Preferably, the candidate polypeptide, DNA or nucleic acid binding molecules comprise an array of polypeptide, DNA or nucleic acid binding molecules.
Preferably, polypeptide binding molecules are, or are derived from known polypeptide binding proteins, preferably polypeptide binding domains. A polypeptide binding domain useful in the present invention will comprise a binding site which will permit it to bind to other polypeptides (binding partners) to form a complex. Polypeptides are known to be able to associate in a number of ways, and domains which mediate polypeptide association are known in the art. For example, coiled coils, acid patches, zinc fingers, calcium hands, WD40 motifs, SH2/SH3 domains and leucine zippers are all polypeptide domains known to mediate protein-protein interactions, as are other domains known to those skilled in the art. Preferably, the invention makes use of randomised polypeptides for the selection of protein switches. More preferably, the randomised polypeptides comprise any of the polypeptide binding domains described here; most preferably, the randomisation occurs at the domain (i.e., that part of the polypeptide responsible for its interaction with another polypeptide).
Preferably, DNA or nucleic acid binding polypeptides are, or are derived from,
DNA or nucleic acid binding proteins (including DNA binding proteins) such as DNA repair enzymes, polymerases, recombinases, methylases, restriction enzymes, replication factors, histones, or DNA binding structural proteins such as chromosomal scaffold proteins; even more preferably DNA binding polypeptides are derived from DNA binding proteins, preferably transcription factors.
Alternatively nucleic acid binding polypeptides are derived from RNA-binding proteins such as ribosomal proteins, components of the splicing machinery, viral or cellular regulatory or RNA packaging proteins, or RNA trafficking proteins. In one aspect of the
invention, nucleic acid binding molecules comprise molecules capable of binding to both RNA and DNA, for example ethidium bromide, or the transcription factor TFIIIA.
'Derived from' means that the candidate DNA binding molecules preferably comprise one or more of: DNA binding proteins, transcription factors, fragment(s) of DNA binding proteins or transcription factors, sequences homologous to DNA binding proteins or transcription factors, or polypeptides which have been fully or partially randomised from a starting sequence which is a DNA binding protein or a transcription factor, a fragment of a DNA binding protein or a transcription factor, or homologous to a DNA binding protein or a transcription factor. Similarly, candidate polypeptide binding molecules preferably comprise one or more of the known polypeptide binding molecules or domains listed above or known in the art.
Most preferably, candidate polypeptide, DNA or nucleic acid binding molecules comprise polypeptides which are at least 40% homologous, more preferably at least 60% homologous, even more preferably at least 75% homologous or even more, for example 85%, or 90%, or even more than 95% homologous to one or more DNA/polypeptide binding proteins (as the case may be), preferably transcription factors (in the case of DNA or nucleic acid binding molecules), using one of the homology calculation algorithms defined below.
Candidate DNA or nucleic acid binding molecules may comprise, among other things, DNA binding part(s) of any protein(s), for example zinc finger transcription factors, Zif268, ATF family transcription factors, ATF1, ATF2, bZIP proteins, CHOP, NF-κB, TATA binding protein (TBP), MDM, c-jun, elk, serum response factor (SRF), ternary complex factor (TCF); KRUPPEL, Odd Skipped, even skipped and other D. melanogaster transcription factors; yeast transcription factors such as GCN4, the GAL family of galactose-inducible transcription factors; bacterial transcription factors or repressors such as lacIq, or fragments or derivatives thereof. Derivatives would be considered by a person skilled in the art to be functionally and/or structurally related to the molecule(s) from which they are derived, for example through sequence homology of at least 40%.
The candidate polypeptide binding molecules or DNA or nucleic acid binding molecules may be non-randomised polypeptides, for example 'wild-type' or allelic variants of naturally occurring polypeptides, or may be specific mutant(s), or may be wholly or partially randomised polypeptides, preferably structurally related to protein, nucleic acid or DNA binding proteins as described herein.
The candidate DNA binding molecules may be displayed on the surface of bacteriophage particles. Such displayed nucleic acid and polypeptide binding molecules are preferably partially randomised zinc-finger type transcription factors, preferably retaining at least 40% homology (as described herein) to zinc-finger type transcription factors.
The bacteriophage particles may themselves be arranged in the form of an array.
In some cases, sequence homology may be considered in relation to structurally important residues, or those residues which are known or suspected of being evolutionarily conserved. In such instances, residues known to be variable or non-essential for a particular structural conformation may be discounted from the homology calculation. For example, as explained herein, zinc fingers are known to have certain residues which are important for the formation of the three-dimensional zinc finger structure. In these cases, homology may be considered over about seven of said important amino acid residues amongst approximately thirty residues which may comprise the whole finger structure.
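The idea of restricting the homology calculation to structurally important residues can be sketched as follows. The two 30-residue sequences and the seven chosen positions are illustrative values introduced for this example only; they are not taken from this specification, and the function name is hypothetical.

```python
def identity_over_positions(seq_a, seq_b, positions):
    """Percent identity restricted to the given 0-based positions.

    The two sequences are assumed to be already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not positions:
        raise ValueError("at least one position is required")
    matches = sum(1 for i in positions if seq_a[i] == seq_b[i])
    return 100.0 * matches / len(positions)


# Illustrative finger-like sequences of about thirty residues (assumed data).
finger_a = "PYKCPECGKSFSQKSNLQKHQRTHTGEKPF"
finger_b = "PYACPVCGKSFSRSDELTRHIRIHTGQKPF"

# Seven positions treated here as structurally important (assumed choice):
# the two Cys, the two His and three conserved hydrophobic residues.
conserved = [1, 3, 6, 10, 16, 19, 23]

over_conserved = identity_over_positions(finger_a, finger_b, conserved)
over_all = identity_over_positions(finger_a, finger_b, range(len(finger_a)))
print(f"conserved positions: {over_conserved:.1f}%  all positions: {over_all:.1f}%")
```

In this toy case the two fingers differ at many variable positions yet are identical at every position counted as structurally important, which is the situation the paragraph above describes.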
As used herein, the term homology may refer to structural homology. Structural homology may be estimated by comparing the structural RMS deviation of the main part of the carbon atom backbone of two or more molecules. Preferably, the molecules may be considered structurally homologous if the deviation is 5 Å or less, preferably 3 Å or less, more preferably 1.5 Å or less. Structurally homologous molecules will not necessarily show significant sequence homology.
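A minimal sketch of the RMS deviation calculation is given below. It assumes that the two sets of backbone atom coordinates have already been paired atom for atom and superimposed (in practice an optimal superposition step, for example the Kabsch algorithm, would precede this calculation); the function names and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def classify(deviation):
    """Map a deviation in angstroms onto the thresholds given in the text."""
    if deviation <= 1.5:
        return "structurally homologous (1.5 Å or less)"
    if deviation <= 3.0:
        return "structurally homologous (3 Å or less)"
    if deviation <= 5.0:
        return "structurally homologous (5 Å or less)"
    return "not structurally homologous by these thresholds"


# Two toy backbones displaced by 1 Å along z (assumed data).
backbone_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
backbone_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(backbone_a, backbone_b)
print(deviation, classify(deviation))
```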
Candidate nucleic acid or DNA binding molecules or polypeptide binding molecules, as defined above, may be pre-screened prior to being tested in the methods of the invention using routine assays known in the art for determining the binding of molecules to nucleic acids or polypeptides so as to eliminate molecules that do not have binding ability. For example, a candidate nucleic acid or DNA binding molecule, preferably a library of candidate nucleic acid or DNA binding molecules, is contacted with nucleic acid and binding determined. The nucleic acids may for example be labelled with a detectable label, such as a fluorophore/fluorochrome, such that after a wash step binding can be determined easily, for example by monitoring fluorescence. Similar methods may be used to pre-screen polypeptide binding molecules. Other methods for measuring binding to nucleic acid or DNA are set out below.
The nucleic acid with which the candidate nucleic acid binding molecules are contacted may be non-specific nucleic acids, such as a random oligonucleotide library or sonicated genomic DNA and the like. Alternatively, a specific sequence such as a specific DNA sequence, or a partially randomised library of sequences, may be used.
Preferably, the nucleic acid or DNA binding molecules of the invention may bind the target nucleic acid with different affinity in the presence or in the absence of a ligand. Similarly, the polypeptide binding molecules may bind their targets (i.e., where the first and second molecules are both polypeptides) with a different affinity in the presence or in the absence of ligand. The binding to the nucleic acid may be enhanced by the presence of the ligand (i.e. bind with a higher affinity in the presence of ligand), or may be reduced in the presence of ligand (i.e. bind with a lower affinity in the presence of ligand). In the case where association of the nucleic acid or DNA binding molecule(s) with the target nucleic acid (or the association of the polypeptide binding molecule(s) with their targets) is enhanced by the presence of ligand, said association may be additive with the binding of the ligand, or may be synergistic with the binding of the ligand, or may affect the binding in another way. If the binding is synergistic with the binding of the ligand, said binding may be either wholly or partly dependent on the presence of the ligand. Preferably, the characteristics of binding may be such that the nucleic acid or DNA binding molecule(s) or polypeptide binding molecule(s) may be eluted by addition of an excess of the ligand.
Nucleic acid/nucleic acid binding molecule and polypeptide/polypeptide binding assays are known in the art. Preferably, the strength of binding or degree of binding is measured by measuring the dissociation constant of the relevant complex.
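To illustrate how dissociation constants express ligand-modulated binding, the sketch below compares a complex measured with and without ligand. It assumes simple 1:1 binding with the target in excess, for which the fraction of binding molecule bound is [target] / (Kd + [target]); the Kd values, concentrations and function names are hypothetical and chosen only for the example.

```python
def fraction_bound(target_conc, kd):
    """Fraction of binding molecule in complex for simple 1:1 binding.

    target_conc and kd must be in the same units (for example nM), and the
    target is assumed to be in excess over the binding molecule.
    """
    if target_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return target_conc / (kd + target_conc)


def fold_change_in_affinity(kd_without_ligand, kd_with_ligand):
    """Values above 1 mean the ligand enhances binding; below 1, reduces it."""
    return kd_without_ligand / kd_with_ligand


# Hypothetical measurements, in nM.
kd_no_ligand = 100.0
kd_plus_ligand = 10.0
target = 10.0

print("bound without ligand:", fraction_bound(target, kd_no_ligand))
print("bound with ligand:   ", fraction_bound(target, kd_plus_ligand))
print("fold change:", fold_change_in_affinity(kd_no_ligand, kd_plus_ligand))
```

A lower dissociation constant in the presence of ligand corresponds to binding that is enhanced by the ligand, and a higher one to binding that is reduced.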
The term 'DNA binding molecule' includes any molecule which is capable of binding or associating with DNA. This binding or association may be via covalent bonding, via ionic bonding, via hydrogen bonding, via van der Waals bonding, or via any other type of reversible or irreversible association.
The term 'molecule' is used herein to refer to any atom, ion, molecule, macromolecule (for example polypeptide), or combination of such entities. The term 'ligand' is used interchangeably with the term 'molecule'. Molecules according to the invention may be free in solution, or may be partially or fully immobilised. They may be present as discrete entities, or may be complexed with other molecules. Preferably, molecules according to the invention include polypeptides displayed on the surface of bacteriophage particles. More preferably, molecules according to the invention include libraries of polypeptides presented as integral parts of the envelope proteins on the outer surface of bacteriophage particles. Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention. Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids, and not for stop codons.
The term 'candidate DNA binding molecules' is used to describe any one or more molecule(s) as defined above which may or may not be capable of binding DNA. The capability of said molecules to bind DNA may or may not be modulatable by a ligand. The latter of these properties may be investigated by the methods of this invention. Preferably, candidate DNA binding molecules comprise a plurality of, or a library of polypeptides. More preferably, these polypeptides are, or are derived from, DNA binding proteins such as DNA repair enzymes, polymerases, recombinases, methylases, restriction enzymes, replication factors, histones, or DNA binding structural proteins such as chromosomal scaffold proteins; even more preferably said polypeptides are derived from DNA binding proteins, preferably transcription factors. 'Derived from' means that the candidate DNA binding molecules preferably comprise one or more of: DNA binding proteins, transcription factors, fragment(s) of DNA binding proteins or transcription factors, sequences homologous to DNA binding proteins or transcription factors, or polypeptides which have been fully or partially randomised from a starting sequence which is a DNA binding protein or a transcription factor, a fragment of a DNA binding protein or a transcription factor, or homologous to a DNA binding protein or a transcription factor. Most preferably, candidate DNA binding molecules comprise polypeptides which are at least 40% homologous, more preferably at least 60% homologous, even more preferably at least 75% homologous or even more, for example 85%, or 90%, or even more than 95% homologous to one or more DNA binding proteins, preferably transcription factors, using one of the homology calculation algorithms defined below.
Candidate DNA binding molecules may comprise, among other things, DNA binding part(s) of any protein(s), for example zinc finger transcription factors, Zif268, ATF family transcription factors, ATF1, ATF2, bZIP proteins, CHOP, NF-κB, TATA binding protein (TBP), MDM, c-jun, elk, serum response factor (SRF), ternary complex factor (TCF); KRUPPEL, Odd Skipped, even skipped and other D. melanogaster transcription factors; yeast transcription factors such as GCN4, the GAL family of galactose-inducible transcription factors; bacterial transcription factors or repressors such as lacIq, or fragments or derivatives thereof. Derivatives would be considered by a person skilled in the art to be functionally and/or structurally related to the molecule(s) from which they are derived, for example through sequence homology of at least 40%.
The candidate DNA binding molecules may be non-randomised polypeptides, for example 'wild-type' or allelic variants of naturally occurring polypeptides, or may be specific mutant(s), or may be wholly or partially randomised polypeptides, preferably structurally related to DNA binding proteins as described herein.
In a highly preferred embodiment, these polypeptide candidate DNA binding molecules are displayed on the surface of bacteriophage particles, and are preferably partially randomised zinc-finger type transcription factors, preferably retaining at least 40% homology (as described herein) to zinc-finger type transcription factors.
In some cases, sequence homology may be considered in relation to structurally important residues, or those residues which are known or suspected of being evolutionarily conserved. In such instances, residues known to be variable or non-essential for a particular structural conformation may be discounted from the homology calculation. For example, as explained herein, zinc fingers are known to have certain residues which are important for the formation of the three-dimensional zinc finger structure. In these cases, homology may be considered over about seven of said important amino acid residues amongst approximately thirty residues which may comprise the whole finger structure.
As used herein, the term homology may refer to structural homology. Structural homology may be estimated by comparing the structural RMS deviation of the main part of the carbon atom backbone of two or more molecules. Preferably, the molecules may be considered structurally homologous if the deviation is 5 Å or less, preferably 3 Å or less, more preferably 1.5 Å or less. Structurally homologous molecules will not necessarily show significant sequence homology.
Candidate DNA binding molecules, as defined above, may be pre-screened prior to being tested in the methods of the invention using routine assays known in the art for determining the binding of molecules to nucleic acids so as to eliminate molecules that do not bind DNA. For example, a candidate DNA binding molecule, preferably a library of candidate DNA binding molecules, is contacted with nucleic acid and binding determined. The nucleic acids may for example be labelled with a detectable label, such as a fluorophore/fluorochrome, such that after a wash step binding can be determined easily, for example by monitoring fluorescence. Other methods for measuring binding to DNA are set out below.
The nucleic acid with which the candidate nucleic acid binding molecules are contacted may be non-specific nucleic acids, such as a random oligonucleotide library or sonicated genomic DNA and the like. Alternatively, a specific sequence or a partially randomised library of sequences may be used. The library of sequences may be in the form of an array.
HOMOLOGY
Nucleic acid binding molecules and polypeptide binding molecules according to the invention are preferably polypeptide sequences, optionally encoded by nucleic acid sequences. Fragments, mutants, alleles and other derivatives of the molecules of the invention preferably retain substantial homology with said sequence(s). As used herein, "homology" means that the two entities share sufficient characteristics for the skilled person to determine that they are similar. Preferably, homology is used to refer to sequence identity. Thus, the derivatives of said nucleic acid binding molecules of the invention and polypeptide binding molecules of the invention, preferably retain substantial sequence identity with said molecules.
In the context of the present invention, a homologous sequence is taken to include any sequence which is at least 60, 70, 80 or 90% identical, preferably at least 95 or 98% identical over at least 5, preferably 8, 10, 15, 20, 30, 40 or even more residues or bases with the molecules (i.e. the sequences thereof) of the invention, for example as shown in the sequence listing herein. In particular, homology should typically be considered with respect to those regions of the molecule(s) which may be known to be functionally important rather than non-essential neighbouring sequences. Although homology can also be considered in terms of similarity (i.e. amino acid residues having similar chemical properties/functions), in the context of the present invention it is preferred to express homology in terms of sequence identity.
Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate % homology between two or more sequences.
% homology may be calculated over contiguous sequences, i.e. one sequence is aligned with the other sequence and each amino acid in one sequence directly compared with the corresponding amino acid in the other sequence, one residue at a time. This is called an "ungapped" alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues (for example less than 50 contiguous amino acids).
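The ungapped comparison described above can be sketched in a few lines. The function name and the two example sequences are hypothetical and serve only to illustrate the residue-by-residue comparison.

```python
def ungapped_identity(seq_a, seq_b):
    """Percent identity of two equal-length sequences compared position by position."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Two toy 10-residue sequences differing at one position (assumed data).
print(ungapped_identity("ACDEFGHIKL", "ACDEYGHIKL"))
```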
Although this is a very simple and consistent method, it fails to take into consideration that, for example, in an otherwise identical pair of sequences, one insertion or deletion will cause the following amino acid residues to be put out of alignment, thus potentially resulting in a large reduction in % homology when a global alignment is performed. Consequently, most sequence comparison methods are designed to produce optimal alignments that take into consideration possible insertions and deletions without penalising unduly the overall homology score. This is achieved by inserting "gaps" in the sequence alignment to try to maximise local homology.
However, these more complex methods assign "gap penalties" to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible - reflecting higher relatedness between the two compared sequences - will achieve a higher score than one with many gaps. "Affine gap costs" are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties will of course produce optimised alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. However, it is preferred to use the default values when using such software for sequence comparisons. For example when using the GCG Wisconsin Bestfit package (see below) the default gap penalty for amino acid sequences is -12 for a gap and -4 for each extension.
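As a rough illustration of affine gap costs, the sketch below totals the penalty for the gaps in one row of an alignment using the default values quoted above (12 for a gap and 4 for each extension). It assumes the convention suggested by the text, in which the first gapped position is charged the gap cost and each subsequent position in the same gap is charged the extension cost; individual programs may count the extension differently, and the function name is hypothetical.

```python
import re


def affine_gap_penalty(aligned_row, gap_open=12, gap_extend=4):
    """Total affine penalty for every run of '-' characters in an aligned row."""
    penalty = 0
    for run in re.findall(r"-+", aligned_row):
        # One charge for the existence of the gap, then a smaller charge
        # for each further residue position in that same gap.
        penalty += gap_open + gap_extend * (len(run) - 1)
    return penalty


print(affine_gap_penalty("AC--GT"))   # one gap spanning two positions
print(affine_gap_penalty("A-C-GT"))   # two separate gaps of one position each
```

Under this convention a single two-residue gap is penalised less than two separate one-residue gaps, which is why alignments with fewer gaps score higher for the same number of identical residues.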
Calculation of maximum % homology therefore firstly requires the production of an optimal alignment, taking into consideration gap penalties. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A.; Devereux et al, 1984, Nucleic Acids Research 12:387). Examples of other software that can perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al, 1999 ibid - Chapter 18), FASTA (Altschul et al, 1990, J. Mol. Biol, 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). However it is preferred to use the GCG Bestfit program.
Although the final % homology can be measured in terms of identity, the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pairwise comparison based on chemical similarity or evolutionary distance. An example of such a matrix commonly used is the BLOSUM62 matrix - the default matrix for the BLAST suite of programs. GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table if supplied (see user manual for further details). It is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62.
Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
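The two steps, producing an optimal alignment and then calculating % identity from it, can be sketched as follows. This is not the Bestfit program: for brevity it uses a Needleman-Wunsch global alignment with simple match/mismatch scores and a linear gap penalty rather than a similarity matrix with affine gap costs. The scoring values, function names and example sequences are all assumptions made for the illustration.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                row_a.append(seq_a[i - 1])
                row_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(seq_a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def percent_identity(row_a, row_b):
    """Identical columns as a percentage of all alignment columns."""
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)


# Toy pair in which the second sequence lacks one residue (assumed data).
aligned_a, aligned_b = global_align("ACDEFGHIKL", "ACDEGHIKL")
print(aligned_a)
print(aligned_b)
print(percent_identity(aligned_a, aligned_b))
```

The denominator used for % identity (here, the full alignment length including gapped columns) is one of several conventions in use, so figures from different programs are not always directly comparable.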
NUCLEIC ACID BINDING
Nucleic acid binding molecules and polypeptide binding molecules according to the invention may include any atom, ion, molecule, macromolecule (for example polypeptide), or combination of such entities that are capable of binding to nucleic acids, such as DNA, or (in the case of polypeptide binding molecules) polypeptides. Advantageously, nucleic acid binding molecules according to the invention may include families of polypeptides with known or suspected nucleic acid binding motifs. These may include for example zinc finger proteins (see below). Molecules according to the invention may also include helix-turn-helix proteins, homeodomains, leucine zipper proteins, helix-loop-helix proteins or β-sheet motifs which are well known to a person skilled in the art.
Polypeptide binding molecules of the invention advantageously contain protein-binding motifs, such as protein dimerization motifs as known in the art. Examples of protein-binding motifs include the tetratricopeptide repeat (TPR) which is found in proteins associated with multiprotein complexes (Blatch and Lassie, 1999, Bioessays 21, 932-9), the Arg-Gly-Asp-Ser found in multimerin (Hayward 1997, Clin Invest Med 20, 176-87), the LXCXE motif found in SV40 Large T antigen necessary for binding to p53 protein (DeCaprio 1999, Biologicals 27, 23-8), the C-terminal VXI motif of ABP, which mediates binding of ABP to GluR2/3 through a Class I PDZ interaction to form homodimers and heteromultimers (Srivastava and Ziff, Ann N Y Acad Sci 868, 561-4), as well as the conserved Ran-binding motif found in species from yeasts to mammals (Seki et al, 1996, J Biochem (Tokyo), 120, 207-14). Other dimerisation motifs are known in the art, and are mentioned in, for example, WO92/00388 and WO00/50630, and may be used in the invention.
According to the invention, nucleic acid binding motifs such as DNA binding motifs of one or more known or suspected nucleic acid/DNA binding polypeptide(s), or polypeptide binding motifs, may advantageously be randomised, in order to provide libraries of candidate nucleic acid binding molecules.
The randomised libraries of candidate nucleic acid binding molecules may be in the form of an array.
Crystal structures may advantageously be used in selecting or predicting the relevant binding regions of nucleic acid and polypeptide binding proteins by methods known in the art. Nucleic acid binding regions of proteins within the same structural family are often conserved or homologous to one another, for example zinc finger α-helices, the leucine zipper basic region, homeodomain helix 3.
General considerations and rules governing the binding of several polypeptide families to nucleic acids are set out in the literature, e.g. in (Suzuki et al, 1994:PNAS vol 91 pp 12357-61). Nucleic acid binding criteria for zinc fingers as preferred nucleic acid binding molecules according to the present invention are set out in this application (see above).
It is also envisaged that the methods of the present invention could be advantageously applied to the selection of ligand-modulatable nucleic acid binding molecules from other families of transcription factors, for example from the helix-turn-helix (HTH) family and/or from the probe helix (PH) family, and/or from the C4 Zinc-binding family (which includes the hormone receptor (HR) family), from the Gal4 family, from the c-myb family, from other zinc finger families, or from any other family of nucleic acid binding proteins or polypeptide binding proteins known to one skilled in the art.
One or more polypeptides from one or more of these families could be advantageously randomised to provide a library of candidate molecules for use in the methods of the invention. Preferably, the amino acid residues known to be important for nucleic acid or polypeptide binding could be randomised. More preferably, the randomised library is in the form of an array of candidate molecules. However, it may be desirable to randomise other regions of the binding molecule since alterations to the amino acid sequence outside of those elements of secondary structure that present amino acids that contact the nucleic acid or polypeptide are likely to cause conformational changes that may affect the binding properties of the molecule.
For example, randomisation may involve alteration of zinc finger polypeptides, said alteration being accomplished at the DNA or protein level. Mutagenesis and screening of zinc finger polypeptides may be achieved by any suitable means. Preferably, the mutagenesis is performed at the nucleic acid level, for example by synthesising novel genes encoding mutant polypeptides and expressing these to obtain a variety of different proteins. Alternatively, existing genes can themselves be mutated, such as by site-directed or random mutagenesis, in order to obtain the desired mutant genes.
Mutations may be performed by any method known to those of skill in the art.
Preferred, however, is site-directed mutagenesis of a nucleic acid sequence encoding the protein of interest. A number of methods for site-directed mutagenesis are known in the art, from methods employing single-stranded phage such as M13 to PCR-based techniques (see "PCR Protocols: A guide to methods and applications", M.A. Innis, D.H. Gelfand, J.J. Sninsky, T.J. White (eds.). Academic Press, New York, 1990). Preferably, the commercially available Altered Site II Mutagenesis System (Promega) may be employed, according to the manufacturer's instructions.
Randomisation of the zinc finger binding motifs is preferably directed to those amino acid residues where the code provided herein gives a choice of residues (see below). For example, positions +1, +5 and +8 are advantageously randomised, whilst preferably avoiding hydrophobic amino acids; positions involved in binding to the nucleic acid, notably -1, +2, +3 and +6, may be randomised also, preferably within the choices provided by the rules of the present invention.
Screening of the proteins produced by mutant genes is preferably performed by expressing the genes and assaying the binding ability of the protein product. A simple and advantageously rapid method by which this may be accomplished is by phage display, in which the mutant polypeptides are expressed as fusion proteins with the coat proteins of filamentous bacteriophage, such as the minor coat protein pIII of bacteriophage M13 or gene III of bacteriophage fd, and displayed on the capsid of bacteriophage transformed with the mutant genes. The target nucleic acid sequence or target polypeptide is used as a probe to bind directly to the protein on the phage surface and select the phage possessing advantageous mutants, by affinity purification. The phage are then amplified by passage through a bacterial host, and subjected to further rounds of selection and amplification in order to enrich the mutant pool for the desired phage and eventually isolate the preferred clone(s). Detailed methodology for phage display is known in the art and set forth, for example, in US Patent 5,223,409; Choo and Klug, (1995) Current Opinion in Biotechnology 6:431-436; Smith, (1985) Science 228:1315-1317; and McCafferty et al, (1990) Nature 348:552-554; all incorporated herein by reference. Vector systems and kits for phage display are available commercially, for example from Pharmacia.
Specific peptide ligands such as zinc finger polypeptides may moreover be selected for binding to targets by affinity selection using large libraries of peptides linked to the C-terminus of the lac repressor LacI (Cull et al, (1992) Proc Natl Acad Sci U S A, 89, 1865-9). When expressed in E. coli the repressor protein physically links the ligand to the encoding plasmid by binding to a lac operator sequence on the plasmid.
An entirely in vitro polysome display system has also been reported (Mattheakis et al, (1994) Proc Natl Acad Sci U S A, 91, 9022-6) in which nascent peptides are physically attached via the ribosome to the RNA which encodes them. Furthermore, polypeptides may be partitioned in physical compartments, for example wells of an in vitro dish, or subcellular compartments, or in small fluid particles or droplets such as emulsions; further teachings on this topic may be found in Griffith et al, (see WO 99/02671).
A library for use in the invention may be randomised at those positions for which choices are given as set out below. The rules are intended to allow the person of ordinary skill in the art to make informed choices concerning the desired codon usage at the given positions. A library for use in the invention may be in the form of an array, as discussed above.
The recognition helix of PH family polypeptides contains conserved Arg/Lys residues which are important structural elements involved in the binding of phosphates in the nucleic acid. Base specificity is attributed to amino acids 1, 4, 5 and 8 of the helix. These residues could be advantageously varied, for example amino acid 1 could be selected from Asn, Asp, His, Val or Ile to provide the possibility of binding to A, C, G or T. Similarly, amino acid 4 could be selected from Asn, Asp, His, Val, Ile, Gln, Glu, Arg, Lys, Met or Leu to provide the possibility of binding to A, C, G or T. Preferably, the rules laid out in Suzuki et al, (1994) PNAS 91:12357-61 would be used in order to randomise those amino acids which affect interaction of the molecule with the nucleic acid, whether in a base-specific manner or via binding to the phosphate backbone, thereby producing a library of candidate nucleic acid binding molecules, which may be in the form of an array, for use in the methods of the invention.
Similarly, polypeptide molecules of the helix-turn-helix family could be randomised to produce a library of candidate molecules, optionally in the form of an array, at least some of which may preferably be capable of binding nucleic acid in a ligand-dependent manner when used in the methods of the present invention. In particular, amino acids 1, 2, 5 and 6 are known to be conserved and function in base-specific nucleic acid binding in HTH motifs. Therefore, at least amino acids 1, 2, 5 or 6 would preferably be randomised so as to produce molecules for use according to the present invention. More preferably, amino acids 1, 5 and 6 could be selected from Asn, Asp, His, Val, Ile, Glu, Gln, Arg, Met, Lys or Leu, and amino acid 2 could be selected from Asn, Asp, His, Val, Ile, Glu, Gln, Arg, Met, Lys, Leu, Cys, Ser, Thr, or Ala.
Another family of transcription factors which may be advantageously employed in the methods of the current invention are the C4 family which includes hormone receptor type transcription factors. It is envisaged that polypeptides of this family could advantageously be used to provide candidate molecules for use in selecting nucleic acid binding molecules whose association with nucleic acid is modulatable by a nucleic acid binding ligand. Amino acids 1, 4, 5 and 9 of the C4 motif are known to be involved in contacting the DNA, and therefore these residues would preferably be altered to provide a plurality of different molecules which may bind DNA in a ligand dependent manner.
Preferably, amino acids 1 and 5 could be selected from Asn, Asp, His, Val, Ile, Glu, Gln, Arg, Met, Lys or Leu, and amino acids 4 and 9 could be selected from Gln, Glu, Arg, Lys, Leu or Met.
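By way of illustration only, the number of distinct polypeptides in a library randomised according to the residue choices given above for the helix-turn-helix and C4 families may be estimated as in the following Python sketch. The dictionary layout and the function name `library_size` are assumptions made for the purpose of the example, and each listed position is assumed to be varied independently of the others.

```python
from math import prod

# Residue choices transcribed from the text; the data layout is illustrative only.
BROAD = ["Asn", "Asp", "His", "Val", "Ile", "Glu", "Gln", "Arg", "Met", "Lys", "Leu"]
HTH_POSITION_2 = BROAD + ["Cys", "Ser", "Thr", "Ala"]
C4_POSITIONS_4_9 = ["Gln", "Glu", "Arg", "Lys", "Leu", "Met"]

# Helix position -> permitted residues at that position.
HTH_CHOICES = {1: BROAD, 2: HTH_POSITION_2, 5: BROAD, 6: BROAD}
C4_CHOICES = {1: BROAD, 4: C4_POSITIONS_4_9, 5: BROAD, 9: C4_POSITIONS_4_9}


def library_size(choices):
    """Count distinct sequences when every listed position varies independently."""
    return prod(len(residues) for residues in choices.values())


print(library_size(HTH_CHOICES))  # 11 * 15 * 11 * 11 = 19965
print(library_size(C4_CHOICES))   # 11 * 6 * 11 * 6 = 4356
```

Libraries of this order of magnitude are well within the capacity of the display and array formats discussed herein.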
Many C4 transcription factors including hormone receptors bind DNA as homo or heterodimers in either inverted or direct repeat configurations through the action of a dimerisation domain or surface, and can therefore be used to exemplify a further embodiment of the invention. In this case randomisations or modifications may be introduced into one or both of the respective dimerisation domains or interfaces and optionally simultaneously into the DNA binding surfaces of these proteins, and a gene switch and/or a protein switch may be isolated by the methods described above.
Particularly preferred examples of DNA binding molecules are Cys2-His2 zinc finger binding proteins which, as is well known in the art, bind to target nucleic acid sequences via α-helical zinc metal atom co-ordinated binding motifs known as zinc fingers. Each zinc finger in a zinc finger nucleic acid binding protein is responsible for determining binding to a nucleic acid triplet, or an overlapping quadruplet, in a nucleic acid binding sequence. Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5 or 6 zinc fingers, in each binding protein. Advantageously, there are 3 zinc fingers in each zinc finger binding protein.
Thus, in one embodiment, the invention provides a method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger class capable of binding to a target DNA sequence, wherein binding is via a zinc finger DNA binding motif of the polypeptide, and wherein said binding is modulatable by a ligand.
All of the DNA binding residue positions of zinc fingers, as referred to herein, are numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1" refers to the residue in the framework structure immediately preceding the α-helix in a Cys2-His2 zinc finger polypeptide. Residues referred to as "++" are residues present in an adjacent (C-terminal) finger. Where there is no C-terminal adjacent finger, "++" interactions do not operate.
The present invention is in one aspect concerned with the production of what are essentially artificial nucleic acid binding proteins such as DNA binding proteins as well as polypeptide binding molecules such as proteins. In these proteins, artificial analogues of amino acids may be used, to impart the proteins with desired properties or for other reasons. Thus, the term "amino acid", particularly in the context where "any amino acid" is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. The nomenclature used herein therefore specifically comprises within its scope functional analogues or mimetics of the defined amino acids.
The α-helix of a zinc finger binding protein aligns antiparallel to the nucleic acid strand, such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond with the N-terminal to C-terminal sequence of the zinc finger. Since nucleic acid sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to C-terminus, the result is that when a nucleic acid sequence and a zinc finger protein are aligned according to convention, the primary interaction of the zinc finger is with the - strand of the nucleic acid, since it is this strand which is aligned 3' to 5'. These conventions are followed in the nomenclature used herein. It should be noted, however, that in nature certain fingers, such as finger 4 of the protein GLI, bind to the + strand of nucleic acid: see Suzuki et al, (1994) NAR 22:3397-3405 and Pavletich and Pabo, (1993) Science 261:1701-1707. The incorporation of such fingers into DNA binding molecules according to the invention is envisaged.
The present invention may be integrated with the rules set forth for zinc finger polypeptide design in our copending European or PCT patent applications having publication numbers WO 98/53057, WO 98/53060, WO 98/53058 and WO 98/53059, which describe improved techniques for designing zinc finger polypeptides capable of binding desired nucleic acid sequences. In combination with selection procedures, such as phage display, set forth for example in WO 96/06166, these techniques enable the production of zinc finger polypeptides capable of recognising practically any desired sequence.
In a preferred aspect, therefore, the invention provides a method for preparing a DNA binding polypeptide of the Cys2-His2 zinc finger class capable of binding to a target DNA sequence, wherein said binding is modulatable by a ligand, and wherein binding to each base of the triplet by an α-helical zinc finger DNA binding motif in the polypeptide is determined as follows:
(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg and/or position ++2 is Asp;
(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln or Glu and ++2 is not Asp;
(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp; or position +6 is a hydrophobic amino acid other than Ala;
(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp;
(e) if the central base in the triplet is G, then position +3 in the α-helix is His;
(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;
(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if the central base in the triplet is 5-meC, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;
(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln and position +2 is Ala;
(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn; or position -1 is Gln and position +2 is Ser;
(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp and position +1 is Arg.
Where the central residue of a target triplet is C, the use of Asp at position +3 of a zinc finger polypeptide allows preferential binding to C over 5-meC.
The foregoing represents a set of rules which permits the design of a zinc finger binding protein specific for any given target DNA sequence.
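The rules (a) to (l) above lend themselves to a simple lookup, and the following Python sketch is offered purely as an illustration of how they might be tabulated. The names `PLUS6_BY_5PRIME`, `PLUS3_BY_CENTRAL`, `CONTACTS_BY_3PRIME` and `helix_contacts` are hypothetical. The sketch returns the first-listed alternative of each rule, records the ++2 conditions and other provisos only as comments without enforcing them, and represents "any amino acid" by the placeholder string "any".

```python
# Permitted residues at position +6, keyed by the 5' base of the triplet.
PLUS6_BY_5PRIME = {
    "G": ["Arg"],         # (a) and/or Asp at ++2
    "A": ["Gln", "Glu"],  # (b) ++2 is not Asp
    "T": ["Ser", "Thr"],  # (c) with Asp at ++2; alternatively a hydrophobic residue other than Ala
    "C": ["any"],         # (d) provided ++2 is not Asp
}

# Permitted residues at position +3, keyed by the central base of the triplet.
PLUS3_BY_CENTRAL = {
    "G": ["His"],                                       # (e)
    "A": ["Asn"],                                       # (f)
    "T": ["Ala", "Ser", "Ile", "Leu", "Thr", "Val"],    # (g) Ala needs a small residue at -1 or +6
    "5-meC": ["Ala", "Ser", "Ile", "Leu", "Thr", "Val"],  # (h) same proviso as (g)
    "C": ["Asp"],                                       # Asp favours C over 5-meC
}

# Residues dictated by the 3' base of the triplet (first alternative of each rule).
CONTACTS_BY_3PRIME = {
    "G": {"-1": ["Arg"]},                 # (i)
    "A": {"-1": ["Gln"], "+2": ["Ala"]},  # (j)
    "T": {"-1": ["Asn"]},                 # (k) alternatively Gln at -1 with Ser at +2
    "C": {"-1": ["Asp"], "+1": ["Arg"]},  # (l)
}


def helix_contacts(five_prime, central, three_prime):
    """Return the permitted residues at the base-contacting helix positions."""
    contacts = {pos: list(res) for pos, res in CONTACTS_BY_3PRIME[three_prime].items()}
    contacts["+3"] = list(PLUS3_BY_CENTRAL[central])
    contacts["+6"] = list(PLUS6_BY_5PRIME[five_prime])
    return contacts


print(helix_contacts("G", "A", "T"))
```

For the triplet 5'-GAT-3', for example, the lookup yields Asn at -1, Asn at +3 and Arg at +6, reflecting the antiparallel alignment in which the 3' base is read by position -1 and the 5' base by position +6.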
A zinc finger binding motif is a structure well known to those in the art and defined in, for example, Miller et al, (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102; Lee et al, (1989) Science 245:635-637; see International patent applications WO 96/06166 and WO 96/32475, corresponding to USSN 08/422,107, incorporated herein by reference.
In general, a preferred zinc finger framework has the structure:
(A) X0-2 C X1-5 C X9-14 H X3-6 H/C
where X is any amino acid, and the numbers in subscript indicate the possible numbers of residues represented by X.
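As a non-limiting illustration, structure (A) may be expressed as a regular expression and used to test whether a candidate sequence contains the framework. The Python sketch below uses only the standard `re` module; the names `FRAMEWORK_A` and `has_zinc_finger_framework` are hypothetical, and the example sequence is one of the consensus fingers given later in this description.

```python
import re

# Structure (A): X0-2 C X1-5 C X9-14 H X3-6 H/C, where X is any amino acid.
FRAMEWORK_A = re.compile(r".{0,2}C.{1,5}C.{9,14}H.{3,6}[HC]")


def has_zinc_finger_framework(sequence):
    """Return True if the sequence contains a stretch matching structure (A)."""
    return FRAMEWORK_A.search(sequence.upper()) is not None


# Consensus finger from this description (one-letter amino acid code).
print(has_zinc_finger_framework("PYKCPECGKSFSQKSDLVKHQRTHTG"))
# The same sequence with both Cys residues replaced by Ala (hypothetical control).
print(has_zinc_finger_framework("PYKAPEAGKSFSQKSDLVKHQRTHTG"))
```

A match indicates only that the spacing of the zinc co-ordinating residues is compatible with structure (A); it is not in itself evidence of folding or of nucleic acid binding.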
In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may be represented as motifs having the following primary structure:
(B) Xa C X2-4 C X2-3 F Xc  X  X  X  X  L  X  X  H  X  X  Xb H - linker
                          -1  1  2  3  4  5  6  7  8  9
wherein X (including Xa, Xb and Xc) is any amino acid. X2-4 and X2-3 refer to the presence of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together co-ordinate the zinc metal atom, are marked in bold text and are usually invariant, as is the Leu residue at position +4 in the α-helix.
Modifications to this representation may occur or be effected without necessarily abolishing zinc finger function, by insertion, mutation or deletion of amino acids. For example it is known that the second His residue may be replaced by Cys (Krizek et al., (1991) J. Am. Chem. Soc. 113:4518-4523) and that Leu at +4 can in some circumstances be replaced with Arg. The Phe residue before Xc may be replaced by any aromatic other than Trp. Moreover, experiments have shown that departure from the preferred structure and residue assignments for the zinc finger are tolerated and may even prove beneficial in binding to certain nucleic acid sequences. Even taking this into account, however, the general structure involving an α-helix co-ordinated by a zinc atom which contacts four Cys or His residues, does not alter. As used herein, structures (A) and (B) above are taken as an exemplary structure representing all zinc finger structures of the Cys2-His2 type.
Preferably, Xa is F/Y-X or P-F/Y-X. In this context, X is any amino acid. Preferably, in this context X is E, K, T or S. Less preferred but also envisaged are Q, V, A and P. The remaining amino acids remain possible.
Preferably, X2-4 consists of two amino acids rather than four. The first of these amino acids may be any amino acid, but S, E, K, T, P and R are preferred. Advantageously, it is P or R. The second of these amino acids is preferably E, although any amino acid may be used.
Preferably, Xb is T or I. Preferably, Xc is S or T.
Preferably, X2-3 is G-K-A, G-K-C, G-K-S or G-K-G. However, departures from the preferred residues are possible, for example in the form of M-R-N or M-R.
Preferably, the linker is T-G-E-K or T-G-E-K-P.
As set out above, the major binding interactions occur with amino acids -1, +3 and +6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys. Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say are not Phe, Trp or Tyr. Preferably, position ++2 is any amino acid, and preferably serine, save where its nature is dictated by its role as a ++2 amino acid for an N-terminal zinc finger in the same nucleic acid binding molecule.
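The numbering of helix positions -1 to +9 may be illustrated by the following Python sketch, which locates the helix within a finger sequence using the spacing of structure (B) and labels each residue. The pattern `STRUCTURE_B` and the function `helix_positions` are hypothetical names; the sketch assumes that the aromatic residue preceding Xc is Phe or Tyr and that the first zinc co-ordinating His occupies position +7, as in structure (B).

```python
import re

# Structure (B): C X2-4 C X2-3 F/Y Xc [ -1 +1 ... +6, His at +7, +8 +9 ] Xb H/C
STRUCTURE_B = re.compile(r"C.{2,4}C.{2,3}[FY].(?P<helix>.{7}H.{2}).[HC]")

POSITIONS = ["-1", "+1", "+2", "+3", "+4", "+5", "+6", "+7", "+8", "+9"]


def helix_positions(finger):
    """Map helix position labels to residues, or return None if no match is found."""
    match = STRUCTURE_B.search(finger.upper())
    if match is None:
        return None
    return dict(zip(POSITIONS, match.group("helix")))


# Consensus finger from this description (one-letter amino acid code).
positions = helix_positions("PYKCPECGKSFSQKSDLVKHQRTHTG")
print(positions)
```

Applied to the consensus finger shown, the sketch places Leu at +4 and His at +7, in agreement with the statement that these positions are largely invariant, and identifies the residues at -1, +3 and +6 which are the principal candidates for mutation.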
In a most preferred aspect, therefore, bringing together the above, the invention allows the definition of every residue in a zinc finger DNA binding motif which will bind specifically to a given target DNA triplet. Arrays may be constructed which include some or all of the zinc fingers capable of binding to a given target DNA. Such arrays may be used in the methods of the invention.
The code provided by the present invention is not entirely rigid; certain choices are provided. For example, positions +1, +5 and +8 may have any amino acid allocation, whilst other positions may have certain options: for example, the present rules provide that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In its broadest sense, therefore, the present invention provides a very large number of proteins which are capable of binding to every defined target DNA triplet.
Preferably, however, the number of possibilities may be significantly reduced. For example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys, Thr and Gln respectively as a default option. In the case of the other choices, for example, the first-given option may be employed as a default. Thus, the code according to the present invention allows the design of a single, defined polypeptide (a "default" polypeptide) which will bind to its target triplet.
We also describe a method for preparing a DNA binding protein of the Cys2-His2 zinc finger class capable of binding to a target DNA sequence in a manner modulatable by a ligand, comprising the steps of: (a) selecting a model zinc finger domain from the group consisting of naturally occurring zinc fingers and consensus zinc fingers; and (b) mutating at least one of positions -1, +3, +6 (and ++2) of the finger as required by a method according to the present invention.
In general, naturally occurring zinc fingers may be selected from those fingers for which the DNA binding specificity is known. For example, these may be the fingers for which a crystal structure has been resolved: namely Zif 268 (Elrod-Erickson et al., (1996) Structure 4:1171-1180), GLI (Pavletich and Pabo, (1993) Science 261:1701-1707), Tramtrack (Fairall et al, (1993) Nature 366:483-487) and YY1 (Houbaviy et al, (1996) PNAS (USA) 93:13577-13582).
Although mutation of the DNA-contacting amino acids of the DNA binding domain allows selection of polypeptides which bind to desired target nucleic acids, and whose binding may be modulatable by a ligand which operates at the polypeptide-DNA interface, in a preferred embodiment residues which are outside the DNA-contacting region may be mutated. Mutations in such residues may affect the interaction between zinc fingers in a zinc finger polypeptide, and thus alter binding site specificity. Moreover, ligands which bind to a zinc finger polypeptide so as to influence zinc finger interaction and thus binding may be identified. Mutation of residues which affect the interaction between zinc fingers allows for selection of fingers which are modulatable by ligand binding at these sites.
The naturally occurring zinc finger 2 in Zif 268 makes an excellent starting point from which to engineer a zinc finger and is preferred.
Consensus zinc finger structures may be prepared by comparing the sequences of known zinc fingers, irrespective of whether their binding domain is known. Preferably, the consensus structure is selected from the group consisting of the consensus structure PYKCPECGKSFSQKSDLVKHQRTHTG, and the consensus structure PYKCSECGKAFSQKSNLTRHQRIHTGEKP.
The consensuses are derived from the consensus provided by Krizek et al., (1991) J. Am. Chem. Soc. 113:4518-4523 and from Jacobs, (1993) PhD thesis, University of Cambridge, UK. In both cases, the linker sequences described above for joining two zinc finger motifs together, namely TGEK or TGEKP, can be formed on the ends of the consensus. Thus, a P may be removed where necessary, or, in the case of the consensus terminating TG, EK(P) can be added.
When the nucleic acid specificity of the model finger selected is known, the mutation of the finger in order to modify its specificity to bind to the target DNA may be directed to residues known to affect binding to bases at which the natural and desired targets differ. Otherwise, mutation of the model fingers should be concentrated upon residues -1, +3, +6 and ++2 as provided for in the foregoing rules.
In order to produce a binding protein having improved binding, moreover, the rules provided by the present invention may be supplemented by physical or virtual modelling of the protein/DNA interface in order to assist in residue selection.
We describe a method for producing a zinc finger polypeptide capable of binding to a target DNA sequence, wherein said binding is modulatable by a ligand, comprising: (a) providing a nucleic acid library encoding a repertoire of zinc finger polypeptides, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix of the zinc finger polypeptides; (b) displaying the library in a selection system and screening it against a target DNA sequence; (c) isolating the nucleic acid members of the library encoding zinc finger polypeptides capable of binding to the target sequence in the presence/absence of ligand; (d) selecting those members of the library isolated in (c) which bind the target nucleic acid sequence with different affinities in the presence and absence of the ligand. The nucleic acid library encoding the zinc finger polypeptides may be (and preferably is) in the form of an array.
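Step (d) of the above method may be illustrated, without limitation, by the following Python sketch, which compares a binding signal measured for each library member in the absence and in the presence of ligand. The function name `select_ligand_modulated`, the two-fold threshold and the example values are all hypothetical; the method itself requires only that the affinities differ, and any suitable measure of binding and any suitable threshold may be used.

```python
def select_ligand_modulated(binding, min_fold=2.0):
    """Return (clone, fold_change, direction) for clones whose binding differs with ligand.

    binding maps a clone name to (signal_without_ligand, signal_with_ligand).
    """
    selected = []
    for clone, (without_ligand, with_ligand) in binding.items():
        low, high = sorted((without_ligand, with_ligand))
        if high <= 0:
            continue  # no detectable binding in either condition
        fold = float("inf") if low <= 0 else high / low
        if fold >= min_fold:
            if with_ligand > without_ligand:
                direction = "ligand-induced"
            else:
                direction = "ligand-released"
            selected.append((clone, fold, direction))
    return selected


# Hypothetical binding signals in arbitrary units: (without ligand, with ligand).
example = {"clone_A": (1.0, 8.0), "clone_B": (5.0, 5.5), "clone_C": (9.0, 1.5)}
hits = select_ligand_modulated(example)
print(hits)
```

In this invented data set, clone_A binds more strongly in the presence of ligand and clone_C more strongly in its absence, so both would be retained as candidate switches, whereas clone_B is essentially unaffected by the ligand and would be discarded.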
Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention. Construction of arrays incorporating such libraries is disclosed in detail elsewhere in this document. Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids as set forth in the rules above.
Zinc finger polypeptides may be designed which specifically bind to nucleic acids incorporating the base U, in preference to the equivalent base T.
We describe a method for producing a zinc finger polypeptide capable of binding to a target DNA sequence, wherein said binding is modulatable by a ligand, comprising: (a) providing a nucleic acid library encoding a repertoire of zinc finger polypeptides each possessing more than one zinc finger, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix in a first zinc finger and at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix in a further zinc finger of the zinc finger polypeptides; (b) displaying the library in a selection system and screening it against a target DNA sequence; (c) assessing the affinity of the DNA binding molecules for the target DNA in the presence and absence of the ligand; and (d) isolating the nucleic acid members of the library encoding zinc finger polypeptides capable of binding to the target sequence with different affinities in the presence and absence of ligand. The library may be in the form of an array of sequences, and the invention includes such an array and its uses.
In this aspect, the invention encompasses library technology described in our copending International patent application WO 98/53057, incorporated herein by reference in its entirety. WO 98/53057 describes the production of zinc finger polypeptide libraries in which each individual zinc finger polypeptide comprises more than one, for example two or three, zinc fingers; and wherein within each polypeptide partial randomisation occurs in at least two zinc fingers.
This allows for the selection of the "overlap" specificity, wherein, within each triplet, the choice of residue for binding to the third nucleotide (read 3' to 5' on the + strand) is influenced by the residue present at position +2 on the subsequent zinc finger, which displays cross-strand specificity in binding. The selection of zinc finger polypeptides incorporating cross-strand specificity of adjacent zinc fingers enables the selection of nucleic acid binding proteins more quickly, and/or with a higher degree of specificity than is otherwise possible.
Zinc finger binding motifs designed according to the invention may be combined into nucleic acid binding polypeptide molecules having a multiplicity of zinc fingers. Preferably, the proteins have at least two zinc fingers. In nature, zinc finger binding proteins commonly have at least three zinc fingers, although two-zinc finger proteins such as Tramtrack are known. The presence of at least three zinc fingers is preferred. Nucleic acid binding proteins may be constructed by joining the required fingers end to end, N-terminus to C-terminus. Preferably, this is effected by joining together the relevant nucleic acid sequences which encode the zinc fingers to produce a composite nucleic acid coding sequence encoding the entire binding protein.
We describe a method for producing a DNA binding protein as defined above, wherein the DNA binding protein is constructed by recombinant DNA technology, the method comprising the steps of: (a) preparing a nucleic acid coding sequence encoding two or more zinc finger binding motifs as defined above, placed N-terminus to C-terminus; (b) inserting the nucleic acid sequence into a suitable expression vector; and (c) expressing the nucleic acid sequence in a host organism in order to obtain the DNA binding protein.
A "leader" peptide may be added to the N-terminal finger. Preferably, the leader peptide is MAEEKP.
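The end-to-end joining of fingers may be illustrated at the amino acid level by the following Python sketch. The function name `assemble_fingers` is hypothetical. The sketch assumes that each finger is supplied as a core sequence running from Xa to the final zinc co-ordinating His, that is, without a leading P and without linker residues, that the preferred linker TGEKP is placed between adjacent fingers only, and that the preferred leader MAEEKP precedes the first finger. The core used in the example is derived from one of the consensus fingers of this description.

```python
LINKER = "TGEKP"   # preferred linker given in the text
LEADER = "MAEEKP"  # preferred leader peptide given in the text


def assemble_fingers(finger_cores, linker=LINKER, leader=LEADER):
    """Join finger cores N-terminus to C-terminus with a linker between adjacent fingers."""
    if len(finger_cores) < 2:
        raise ValueError("at least two zinc fingers are expected")
    return leader + linker.join(finger_cores)


# Core of the consensus finger PYKCPECGKSFSQKSDLVKHQRTHTG with the leading P
# and the trailing TG removed (an assumption made for this illustration).
CORE = "YKCPECGKSFSQKSDLVKHQRTH"

protein = assemble_fingers([CORE, CORE, CORE])
print(protein)
print(len(protein))
```

In practice the corresponding nucleic acid coding sequences would be joined, as described above, and the three cores would of course differ in their base-contacting residues according to the target sequence.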
The invention also provides zinc finger polypeptides which comprise more than three zinc fingers, such as four, five, six, seven, eight or nine zinc fingers. For example, the invention comprises a zinc finger polypeptide which includes zinc fingers of the natural proteins TFIIIA and Zif268. A zinc finger protein has been engineered that contains a novel linker sequence. In the present case, the novel linker joins two three-zinc-finger domains, but may be used to join multiple groups of zinc finger domains or other domains used in engineered transcription factors. This linker differs from previously described linkers in that it is structured and consists of a single non-DNA-binding zinc finger. We have found that structured linkers are more suitable for connecting zinc finger domains that bind to subsites separated by longer gaps of DNA sequence than linkers previously described, for example, the 8 and 11 amino acid linkers used to span 1 and 2 base pairs respectively. No linkers have been designed for spanning longer regions. The ability of structured linkers to span longer gaps is probably due to the fact that these linkers confine the relative freedom of the two domains, thus minimising the conformational search that precedes binding and also the entropy loss on binding.
The multiple zinc finger protein that we have engineered here is composed of zinc fingers 1-3 of TFIIIA and the three zinc fingers from Zif268 joined by zinc finger 4, including flanking sequences, of TFIIIA. We have called the zinc finger protein TFIIIAZif. Zinc finger 4 of TFIIIA does not bind DNA but acts as a linker in between the two sets of zinc fingers that are involved in DNA recognition. Despite the fact that this zinc finger does not make any base contacts within the major groove of the DNA, it is folded in the classical way for Cys2His2 zinc fingers, around a Zn(II) ion, and contains an alpha helix within its structure (Nolte et al, 1998). Although this particular finger was used in this example solely because it was a familiar structured polypeptide, we believe that other tertiary structures would also be suitable for use as structured linkers.
The DNA binding site for the TFIIIAZif protein contains the DNA recognition sites for zinc fingers 1-3 of TFIIIA and the three zinc fingers of Zif 268. These are the DNA sequences GGATGGGAGAC and GCGTGGGCGT, respectively, as shown in Sequence ID 3. The six base pair sequence GTACCT in Sequence ID 3 is a spacer region of DNA that separates the two binding sites, and the nucleotide composition of the DNA spacer appears to have no effect on binding of the protein. Therefore, this or other structured linkers could be used with other DNA spacers of different length and sequence.
The amino acid sequence of zinc finger 4 of TFIIIA, including the flanking sequences as used in the composite protein of the invention, is NIKICVYVCHFENCGKAFKKHNQLKVHQFSHTQQLP.
The nucleotide sequence of zinc finger 4 of TFIIIA, including the flanking sequences, is:
AACATCAAGATCTGCGTCTATGTGTGCCATTTTGAGAACTGTGGCAAAGCATTCAAGAAACACAATCAATTAAAGGTTCATCAGTTCAGTCACACACAGCAGCTGCCG.
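The correspondence between the nucleotide and amino acid sequences given above for zinc finger 4 of TFIIIA may be checked by translating the former with the standard genetic code, for example as in the following Python sketch. The names `CODON_TABLE` and `translate` are illustrative only, and the sequences are reproduced from this description with the spaces removed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring whitespace."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


FINGER4_DNA = (
    "AACATCAAGATCTGCGTCTATGTGTGCCATTTTGAGAACTGTGGCAAAGCATTC"
    "AAGAAACACAATCAATTAAAGGTTCATCAGTTCAGTCACACACAGCAGCTGCCG"
)
FINGER4_PROTEIN = "NIKICVYVCHFENCGKAFKKHNQLKVHQFSHTQQLP"

print(translate(FINGER4_DNA) == FINGER4_PROTEIN)
```

The 108 nucleotides encode the 36 amino acids of the linker finger and its flanking sequences.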
RNA BINDING MOLECULES
Various molecules capable of binding RNA are known in the art. Preferably, the RNA binding molecule is an RNA binding polypeptide. Preferred polypeptides capable of binding RNA may comprise one or more domains which facilitate interaction with RNA, for example, the RNA Recognition Motif (RRM).
RNA Recognition Motif (RRM)
The most widely found and best-characterised RNA-binding motif is the RNP motif. It is also referred to as RNA recognition motif (RRM) or consensus sequence RNA-binding domain (cs-RBD or RBD). It is composed of 90 to 100 amino acids which form an RNA-binding domain that is present in one or more copies in proteins that bind RNA. The identifying feature of the RNP motif is the RNP consensus sequence which is composed of two short sequences, RNP1 and RNP2, and a number of other, mostly hydrophobic, conserved amino acids interspersed throughout the motif. Detailed structural information for the RNP motif is available. The βαββαβ secondary structural elements of RBDs form a four-stranded antiparallel β sheet packed against the two perpendicularly oriented α helices. Amino acids of RNP1 and RNP2 are solvent exposed and make direct contact with bound RNA, probably through hydrogen bonds and ring stacking. A second role is structural: the aromatic side chain at the last position of RNP1 points to the interior of the folded domain and, along with other highly conserved hydrophobic amino acids in the two α helices, forms part of the hydrophobic core of the domain. Other structural features include the pronounced right-handed twist of the β sheets, a very small antiparallel β sheet between α2 and β4 with a type F turn, and bulges in β1 and β4. Highly conserved RNP1 and RNP2, although crucial for RNA binding, probably do not distinguish between different RNA sequences. Major determinants of RNA-binding specificity reside in the most variable regions of the RNP motif, particularly in the loops and the termini.
Polypeptides which are known to comprise RRM motifs are listed in the SMART database (Schultz, J, Milpetz, F, Bork, P., and Ponting, C.P. (1998) SMART, a simple modular architecture research tool: Identification of signalling domains Proc. Natl. Acad. Sci. USA 95, 5857-5864; Schultz, J, Copley, R.R, Doerks, T, Ponting, C.P. and Bork, P. (2000) SMART: A Web-based tool for the study of genetically mobile domains Nucleic Acids Res 28, 231-234).
Arginine-Rich Motif (ARM)
Short arginine-rich sequences (also called Basic domains) in viral, bacteriophage, and ribosomal proteins also mediate RNA binding. Apart from arginine-rich elements there is little identity between different ARM sequences, and the structures of the ARM regions of two proteins, Tat and Rev, are diverse.
Rev binds with high affinity to an internal bulged loop (Rev responsive element, RRE) found in all intron-containing viral mRNAs. Another HIV-encoded ARM protein, Tat, binds the trans-acting responsive element (TAR) of HIV mRNAs and functions in transcription. Amino acids outside Rev and Tat ARMs also contribute to RNA binding. Peptides encompassing the Rev ARM (TRQARRNRRRWRERQ) specifically bind RRE as an α-helix, and at least six amino acids, including four arginines, are essential for specificity of binding. In contrast, Tat ARM (ALGISYGRKKRRQRRRP) peptides are unstructured but adopt a stable conformation upon binding TAR. Again, amino acids outside ARM are required for wild-type binding activity.
Binding of proteins to TAR and RRE distorts the deep major groove of the RNA, thereby allowing access to required hydrogen bonding atoms. There are two general roles for arginine in RNA binding. Firstly, the positive charge of arginine increases nonspecific affinity for RNA. The second function is to make specific hydrogen bonding networks with the RNA sugar-phosphate backbone and bases. The RNA binding sites of ARM proteins are complex, and consist of stem-loops (N-proteins), internal loops (Rev) or bulges (Tat), and their structure, rather than particular sequence, may be the major binding determinant.
RGG Box
The RGG box is a 20-25 amino acid long RNA-binding motif typically found in combination with other types of RNA-binding domains. The motif is defined as closely spaced Arg-Gly-Gly (RGG) repeats interspersed with other, often aromatic amino acids. The high density of glycine and variations within the motif suggest that it is not a rigid protein structure, but spectroscopic modelling of nucleolin RGGF repeats (GRGGFGGRGGGRGGRGGFGGRGR) predicted a helical β-spiral structure.
KH Motif
K homology (KH) motif containing proteins have been associated with important biological functions. A single amino acid substitution that unfolds a human KH domain leads to the fragile-X syndrome. The definitive role of the KH motif in RNA binding is not clear, but the motif is essential for RNA binding and it probably binds RNA directly. KH motifs are found in one or multiple copies per protein. Their topology is βααββα.
Double-Stranded RNA-Binding Motif (DSRM or dsRBD)
The dsRBD domain (a region of around 70 amino acids) is a general double-stranded RNA-binding module with αβββα topology. Isolated domains bind double-stranded RNA of any sequence with little or no specificity, but multiple domains may specifically recognize certain RNA structures. Conserved positions, including many basic (Arg and Lys) and hydrophobic amino acids, are scattered throughout the DSRM. Some DSRM proteins bind unique RNA sequences and they do not bind dsDNA. DSRM proteins are involved in diverse functions and provide an example of post-transcriptional gene regulation by RNA-binding proteins. An important component of the response of mammalian cells to viral infection is the interferon-induced protein kinase (DAI), containing two DSRMs.
A cellular protein containing two DSRMs binds the TAR RNA element of HIV mRNA and may influence Tat-mediated activation. RNP, KH and DSRM represent αβ protein folds with an antiparallel β-sheet on one face of the protein packed by a hydrophobic core against an α-helical face. Although a number of all-helical RNA-binding proteins have been identified, the αβ structural theme is conserved in many RNA-binding proteins that do not share sequence homology with RNP, KH and DSRM motifs.
Other RNA-Binding Domains
A small number of RNA-binding proteins, including retroviral nucleocapsid proteins, RNA polymerases and yeast RNA-binding proteins, contain sequences with appropriately spaced cysteine-histidine residues that relate these proteins to the zinc finger family of DNA-binding proteins.
A generalized zinc-finger-knuckle motif can be written as CX2-5CX4-12C/HX2-4C/H. The best characterized example is TFIIIA, a nine zinc-finger protein that binds both the 5S rRNA gene and 5S rRNA. The middle three out of nine zinc fingers are primarily responsible for RNA binding. The amino acid sequences of several eukaryotic transcription factors (Y-box proteins) are related to the bacterial cold shock domain and they may have dual roles as DNA- and RNA-binding proteins. Most tRNA synthetases have motifs that are common to related groups of synthetases and whose amino acids directly contact RNA. They can facilitate or hinder the formation of specialized complexes at particular sites on the RNA. They can also directly modify RNA structure, either locally (the conformation of bound RNA) or globally (RNA secondary and tertiary structures). RNA binding proteins can serve as structural components and form, together with RNA, stable RNP particles, or they may serve to transport and localize RNAs.
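By way of illustration only, the generalized zinc-finger-knuckle motif given above can be expressed as a regular expression and used to scan a polypeptide sequence for candidate sites. The following is a minimal sketch: the function name and the test sequence are hypothetical, and X is assumed to stand for any single residue.

```python
import re

# Generalized motif from the text: C-X(2-5)-C-X(4-12)-C/H-X(2-4)-C/H,
# where X is assumed to mean any single amino acid (one-letter code).
ZINC_KNUCKLE = re.compile(r"C.{2,5}C.{4,12}[CH].{2,4}[CH]")


def find_zinc_knuckles(seq):
    """Return (1-based start, matched text) for each non-overlapping candidate site."""
    return [(m.start() + 1, m.group()) for m in ZINC_KNUCKLE.finditer(seq.upper())]


# Hypothetical toy sequence, used only to exercise the pattern.
print(find_zinc_knuckles("MKACFNCGKEGHIAKNCRA"))
```

A match indicates only that the spacing of Cys/His residues is compatible with the motif; it does not by itself establish zinc coordination or RNA binding.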
Any of these motifs known to bind RNA may be used in the nucleic acid binding molecules of our invention.
POLYPEPTIDE MODIFICATIONS
As explained above, some embodiments of the invention make use of modified nucleic acid or polypeptide to select gene switches. Polypeptide modifications are discussed in this section.
Table 1 lists some non-limiting examples of post-translational modifications known to affect polypeptides. Any of these modifications may be used in the methods disclosed here for selection of gene switches using modified entities.
Table 1
Phosphorylation - kinase and phosphatases
A particularly important polypeptide modification is phosphorylation and dephosphorylation. The art is replete with references to enzymes capable of effecting phosphorylation and dephosphorylation, i.e. protein kinases and phosphatases, and their targets, including consensus phosphorylation motifs (such as -SQ- or -TQ- for the DNA dependent protein kinase (DNA-PK)).
Some non-limiting examples of kinases and their sites for post-translational modification are presented in Table 2 (phosphorylation/dephosphorylation).
Table 2
X signifies any amino acid. Consensus sequences are taken from Trends Biochem. Sci. (1990) 15: 342-346.
Further examples of protein kinases identified to date include the protein tyrosine kinase subfamily (such as PDGF receptors, EGF receptors, src family kinases (see Brown and Cooper, 1996, Biochimica et Biophysica Acta 1287: 121-149 for a review), the JAK kinase family (such as JAK1, JAK2 and tyk2), Erb B2, Bcr-Abl, Alk, Trk, Res/Sky - for a detailed review see Al-Obeidi et al, 1998, Biopolymers (Peptide Science), Vol 47: 197-223), the MAP kinase pathway subfamily (such as the MAP family, the ERK family, the MEK family, the MEKK family, RAF-1 and JNK), the cyclin-dependent kinase subfamily (such as p34cdc2 and cdk2 - see Nigg, 1995, Bioessays 17: 471-480 for a review), Wee1/Myt1, polo-like kinases (such as Plk1, Plx1, POLO, Snk, Fnk/Prk, Sak-a, Sak-b - see Lane and Nigg, 1997, Trends in Cell Biol. 7: 63-68), the receptor serine kinase subfamily, protein kinase C (PK-C), cyclic-AMP dependent kinase (PK-A), cyclic-GMP dependent kinase (PKG), Ca2+/calmodulin dependent kinases (such as CaM kinase I, II and IV), DNA dependent protein kinase, phosphoinositide 3-kinases (PI3K), PDK-1, the p21-activated protein kinase family (PAKs, such as Pak1, Pak2 and Pak3 - see Sells and Chernoff, 1997, Trends in Cell Biol. 7: 162-167), p70 S6 kinase, IκB kinase, casein kinase II and glycogen-synthase kinases (GSK3).
A discussion of particular kinase pathways involved in signal transduction is given in chapter 35 of Lewin, 1997, Genes VI, Oxford University Press. Details of recognition and binding domains for a variety of kinases are given in Kuriyan and Cowburn, 1997, Annu. Rev. Biophys. Biomol. Struc. 26: 259-288.
Some specific examples of kinases include the src family tyrosine kinases Lck and Fyn, which phosphorylate the TCR ζ chain and are known to be involved in signal transduction associated with T cell receptor stimulation. The TCR ζ chain comprises specific tyrosine residues present in immunoreceptor tyrosine-based activation motifs (ITAMs) that are phosphorylated by Lck and Fyn (Kuriyan and Cowburn, 1997, ibid.). The SH2 domain of another tyrosine kinase, ZAP70, binds to phosphorylated TCR ζ. Thus TCR ζ ITAM and ZAP70 SH2 represent binding domains and binding partners that may be of interest in studying the activity of the kinases Lck and Fyn (see Elder et al, 1994, Science 264: 1596-1599 and Chan et al, 1994, Science 264: 1599-1601).
Another example is the IgE receptor γ subunit and the SH2 domain of Syk that may be used to study the activity of the Lyn kinase.
Examples of phosphatases identified to date fall into three main families (for review see Barford et al, 1998, Annu. Rev. Biophys. Biomol. Struc. 27: 133-164). The PPP family includes the following catalytic subunits: PP1c, PP2Ac, PP2B, PPP1, PPP2A and PPP5 and the following regulatory subunits: NIPP-1, RIPP-1, p53BP2, γ 4.5, PR65, PR55, PR72, PTPA, SV40 small T antigen, PPY, PP4, PP6 and PP5. The PPM family includes pyruvate dehydrogenase phosphatase and Arabidopsis ABI1.
The protein tyrosine phosphatase family includes PTP1B, SHP-1, SHP-2 (cytosolic non-receptor forms), CD45 (see Thomas and Brown, 1999, Trends in Immunol, 20: 406 and Ashwell and D'Oro, 1999, Trends in Immunol, 20: 412 for further details), RPTP (receptor-like, transmembrane forms) and cdc25, kinase-associated phosphatase and MAP kinase phosphatase-1 (dual-specificity phosphatases). PTP1B is known to associate with the insulin receptor in vivo (Bandyopadhyay et al, 1997, J. Biol. Chem. 272: 1639-1645).
Table 3 provides a non-limiting list of enzymes that are representative of some of the classes of modifying enzymes discussed herein which may be used to modify polypeptides.
Table 3
The several types of post-translational modification presented above will be discussed in some detail below.
ADP-ribosylation
Mono-ADP-ribosylation is a post-translational modification of proteins which is currently thought to play a fundamental role in cellular signalling. A number of mono-ADP-ribosyl-transferases have been identified, including endogenous enzymes from both bacterial and eukaryotic sources and bacterial toxins. A mono-ADP-ribosylating enzyme, using as substrates the protein to be modified and nicotinamide adenine dinucleotide (NAD+), is NAD:arginine ADP-ribosyltransferase (Zolkiewska et al, 1992, Proc. Natl. Acad. Sci. U.S.A., 89: 11352-11356). The reactions catalyzed by bacterial toxins such as cholera and pertussis toxin are well understood; the activities of these toxins result in the permanent modification of heterotrimeric G proteins. Endogenous transferases are also thought to modify G proteins and therefore to play a role in signal transduction in the cell (de Murcia et al, 1995, Trends Cell Biol, 5: 78-81). The extent of the effects that ADP-ribosylation can mediate in the cell is illustrated by the example of brefeldin A, a fungal toxin metabolite of palmitic acid. This toxin induces the mono-ADP-ribosylation of BARS-50 (a G protein involved in membrane transport) and glyceraldehyde-3-phosphate dehydrogenase. The cellular effects of brefeldin A include the blocking of constitutive protein secretion and the extensive disruption of the Golgi apparatus. Inhibitors of the brefeldin A mono-ADP-ribosyl-transferase reaction have been shown to antagonise the disassembly of the Golgi apparatus induced by the toxin (Weigert et al, 1997, J. Biol. Chem, 272: 14200-14207). A number of amino acid residues within proteins have been shown to function as ADP-ribose acceptors. Bacterial transferases have been identified which modify arginine, asparagine, cysteine and diphthamide residues in target proteins.
Endogenous eukaryotic transferases are known which also modify these amino acids. In addition, there is evidence that serine, threonine, tyrosine, hydroxyproline and histidine residues may act as ADP-ribose acceptors, but the relevant transferases have not yet been identified (Cervantes-Laurean et al, 1997, Methods Enzymol, 280: 275-287 and references therein).
Poly-ADP-ribosylation is thought to play an important role in events such as DNA repair, replication, recombination and packaging and also in chromosome decondensation. The enzyme responsible for the poly-ADP-ribosylation of proteins involved in these processes is poly(ADP-ribose) polymerase (PARP; for Drosophila melanogaster PARP, see Genbank Accession Nos. D13806, D13807 and D13808). The discovery of a leucine zipper in the self-poly(ADP-ribosyl)ation domain of the mammalian PARP (Uchida et al, 1993, Proc. Natl. Acad. Sci. U.S.A., 90: 3481-3485) suggested that this region may be important for the dimerization of PARP and also its interaction with other proteins (Mendoza-Alvarez et al, 1993, J. Biol. Chem, 268: 22575-22580).
Specific examples of ADP-ribosylation sites are those found at Cys3 and Cys4 (underlined) of the B-50 protein (Coggins et al, 1993, J. Neurochem, 60: 368-371; SwissProt Accession No. P06836): MLCCMRRTKQVEKNDDD and Pγ (the γ subunit of cyclic GMP phosphodiesterase; Bondarenko et al, 1997, J. Biol. Chem, 272: 15856-15864; Genbank Accession No. X04270): FKQRQTRQFK.
Glycosylation
N-linked glycosylation is a post-translational modification of proteins which occurs in the endoplasmic reticulum and Golgi apparatus and is utilized with some proteins en route for secretion or destined for expression on the cell surface or in another organelle.
The carbohydrate moiety is attached to Asn residues in the non-cytoplasmic domains of the target proteins, and the consensus sequence (Shakin-Eshleman, 1996, Trends Glycosci. Glycotech, 8: 115-130) for a glycosylation site is NxS/T, where x cannot be proline or aspartic acid.
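As an illustrative sketch only, the NxS/T consensus stated above (with x being neither proline nor aspartic acid) can be used to list candidate N-glycosylation sites in a polypeptide sequence. The function name and the test sequence are hypothetical; a lookahead is used so that overlapping sites are each reported.

```python
import re

# N, then any residue other than P or D, then S or T (consensus as stated in the text).
# The lookahead lets overlapping sites such as "NNST" be reported separately.
SEQUON = re.compile(r"(?=(N[^PD][ST]))")


def find_n_glyc_sites(seq):
    """Return (1-based position of the Asn, matched tripeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# Hypothetical toy sequence: "NPS" is skipped because x is proline.
print(find_n_glyc_sites("MNGTANPSNNSTK"))
```

The presence of the consensus is necessary but not sufficient for glycosylation, which also depends on the residue being located in a non-cytoplasmic domain as noted above.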
N-linked sugars have a common five-residue core consisting of two GlcNAc residues and three mannose residues due to the biosynthetic pathway. This core is modified by a variety of Golgi enzymes to give three general classes of carbohydrate known as oligomannosyl, hybrid and lactosamine-containing or complex structures (Zubay, 1998, Biochemistry, Wm. C. Brown Publishers). An enzyme known to mediate N-glycosylation at the initial step of synthesis of dolichyl-P-P-oligosaccharides is UDP-N-Acetylglucosamine-Dolichyl-phosphate-N-acetylglucosamine phosphotransferase (for mouse, Genbank Accession Nos. X65603 and S41875).
Oxygen-linked glycosylation also occurs in nature with the attachment of various sugar moieties to Ser or Thr residues (Hansen et al, 1995, Biochem. J, 308: 801-813). Complex O-linked glycosylation can be broken into at least six classes - mucin type, ser-1-GlcNAc; proteoglycan type, ser-Gal-Gal-Xyl core; collagen type, hydroxylys-Gal-Glc; clotting factor type, ser-Xyl-Glc or ser-Xyl-Xyl-Glc core; fungal type, ser-Man; plant type, hydroxypro-Ara or ser-Gal (where GlcNAc = N-acetylglucosamine, Gal = galactose, Xyl = xylose, Glc = glucose, Man = mannose and Ara = arabinose; Hansen et al, 1995, supra). Intracellular proteins are among the targets for O-glycosylation through the dynamic attachment and removal of O-N-Acetyl-D-glucosamine (O-GlcNAc; reviewed by Hart, 1997, Ann. Rev. Biochem, 66: 315-335). Proteins known to be O-glycosylated include cytoskeletal proteins, transcription factors, the nuclear pore protein complex, and tumor-suppressor proteins (Hart, 1997, supra). Frequently these proteins are also phosphoproteins, and there is a suggestion that O-GlcNAc and phosphorylation of a protein play reciprocal roles. Furthermore, it has been proposed that the glycosylation of an individual protein regulates protein:protein interactions in which it is involved.
Specific sites for the addition of O-GlcNAc are found, for example, at Ser277, Ser316 and Ser383 of p67SRF (Reason et al, 1992, J. Biol. Chem, 267: 16911-16921; Genbank Accession No. J03161). The recognition sequences encompassing these residues are shown below:
274-GTTSTIQTAP, 313-SAVSSADGTVLK, 374-DSSTDLTQTSSSGTVTLP
The identity of sites of O-GlcNAc is additionally known for a small number of proteins including c-myc (Thr58, also a phosphorylation site; Chou et al, 1995, J. Biol. Chem, 270: 18961-18965), the nucleopore protein p62 (see Reason et al, 1992, supra): MAGGPADTSDPL and band 4.1 of the erythrocyte (see Reason et al, 1992, supra): AQTITSETPSSTT.
The site at which modification occurs is, in each case, underlined. The reaction is mediated by O-GlcNAc transferase (Kreppel et al, 1997, J. Biol. Chem, 272: 9308-9315).
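Because the underlining of the modified residues may not survive reproduction of the text, the p67SRF sites can be recovered from the residue numbering instead. The sketch below assumes that the number preceding each p67SRF recognition sequence is the position of its first residue; the helper name is hypothetical.

```python
# p67SRF recognition sequences from the text, keyed by the (assumed)
# position of their first residue in the full-length protein.
FRAGMENTS = {
    274: "GTTSTIQTAP",
    313: "SAVSSADGTVLK",
    374: "DSSTDLTQTSSSGTVTLP",
}

# O-GlcNAc sites stated in the text.
SITES = [277, 316, 383]


def residue_at(position, fragments=FRAGMENTS):
    """Return the one-letter residue found at a full-length protein position."""
    for start, peptide in fragments.items():
        if start <= position < start + len(peptide):
            return peptide[position - start]
    raise ValueError(f"position {position} is not covered by any fragment")


for site in SITES:
    print(site, residue_at(site))
```

Under that numbering assumption, each of the three stated positions falls on a serine within its fragment.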
Prenylation (fatty acylation)
The post-translational modification of proteins with fatty acids includes the attachment of myristic acid to the primary amino group of an N-terminal glycine residue (Johnson et al, 1994, Ann. Rev. Biochem, 63: 869-914) and the attachment of palmitic acid to cysteine residues (Milligan et al, 1995, Trends Biochem. Sci, 20: 181-186).
Fatty acylation of proteins is a dynamic post-translational modification which is critical for the biological activity of many proteins, as well as their interactions with other proteins and with membranes. Thus, for a large number of proteins, the location of the protein within a cell can be controlled by its state of prenylation (fatty acid modification), as can its ability to interact with effector enzymes (ras and MAP kinase, Itoh et al, 1993, J. Biol. Chem, 268: 3025-; ras and adenylate cyclase in yeast; Horiuchi et al, 1992, Mol. Cell. Biol, 12: 4515) or with regulatory proteins (Shirataki et al, 1991, J. Biol. Chem, 266: 20672-20677). The prenylation status of ras is important for its oncogenic properties (Cox, 1995, Methods Enzymol, 250: 105-121); interference with the prenylation status of ras is therefore considered a valuable anti-cancer strategy (Hancock, 1993, Curr. Biol, 3: 770).
Sentrinization
Sentrin is a novel 101-amino acid protein which has 18% identity and 48% similarity with human ubiquitin (Okura et al, 1996, J. Immunol, 157: 4277-4281). This protein is known by a number of other names including SUMO-1, UBL1, PIC1, GMP1 and SMT3C and is one of a number of ubiquitin-like proteins that have recently been identified. Sentrin is expressed in all tissues (as shown by Northern blot analysis), but mRNA levels are higher in the heart, skeletal muscle, testis, ovary and thymus.
The sentrinization of proteins is thought to involve the Ubiquitin-conjugating enzyme Ubc9 (Gong et al, 1997, J. Biol. Chem, 272: 28198-28201). The interaction between these two proteins in the yeast two-hybrid screen is very specific, suggesting that this is a biologically relevant phenomenon. The interaction is dependent upon the presence of the conserved C-terminal Gly-Gly residues present in sentrin (Gong et al, 1997, supra).
The conjugation of sentrin to other proteins via Gly97 requires the cleavage of the C-terminal four amino acids of the protein, His-Ser-Thr-Val.
One important protein shown to be modified by the addition of sentrin is the Ran-specific GTPase-activating protein, RanGAP1, which is involved in nuclear import of proteins bearing nuclear-localization signals (Johnson and Hochstrasser, 1997, Trends Cell Biol, 7: 408-413). Conjugation of RanGAP1 by sentrin is essential both for the targeting of RanGAP1 to its binding partner on the nuclear pore complex (NPC) and for the nuclear import of proteins. Sentrin itself does not bind with high affinity to the NPC and it is, therefore, likely that it either provokes a conformational change in RanGAP1 that exposes a binding site or, alternatively, that the binding site is formed using both sentrin and RanGAP1 sequences. There is evidence to suggest that the conjugation of sentrin to RanGAP1 is necessary for the formation of other sentrinized proteins (Kamitani et al, 1997, J. Biol. Chem, 272: 14001-14004) and that the majority of these sentrinized proteins are found in the nucleus.
Sentrin has been shown in yeast two-hybrid screens to interact with a number of other proteins, including the death domains of Fas/APO1 and the TNF receptors, PML, RAD51 and RAD52 (Johnson and Hochstrasser, 1997, supra). These interactions implicate sentrin in a number of important processes. Fas/APO1 and TNF receptors are involved in transducing the apoptosis signal via their death domains. Ligation of Fas on the cell surface results in the formation of a complex via death domains and death-effector domains, triggering the induction of apoptosis. The overexpression of sentrin protects cells from both anti-Fas/APO1 and TNF-induced cell death (Okura et al, 1996, supra). It is not clear whether this protection is achieved simply by preventing the binding of other proteins to these death domains or whether a more complex process is involved, possibly one involving the ubiquitin pathway.
The interaction of sentrin with PML (a RING finger protein) is important, as it points to a disease state in which this protein may play a role. In normal myeloid cells, PML is found in a nuclear multiprotein complex known as a nuclear body. These nuclear bodies are disrupted in acute promyelocytic leukaemia, where a chromosomal translocation
generates a fusion between regions of the retinoic acid receptor and PML. This disruption can be reversed by treatment with retinoic acid. It has been shown that PML is covalently modified at multiple sites by members of the sentrin family of proteins (but not by ubiquitin or NEDD8). Two forms of the aberrant fusion protein have been identified, neither of which is modified by sentrin. It is, therefore, thought that differential sentrinization of the normal and aberrant forms of PML may be important in the processes underlying acute promyelocytic leukaemia and may help in the understanding of the biological role of the PML protein (Kamitani et al, 1998, J. Biol. Chem, 273: 3117-3120).
In general, a modified polypeptide comprises one or more modified amino acids. Examples of such modified amino acids include: 2-Aminoadipic acid, 3-Aminoadipic acid, beta-Alanine, beta-Aminopropionic acid, 2-Aminobutyric acid, 4-Aminobutyric acid, piperidinic acid, 6-Aminocaproic acid, 2-Aminoheptanoic acid, 2-Aminoisobutyric acid, 3-Aminoisobutyric acid, 2-Aminopimelic acid, 2,4-Diaminobutyric acid, Desmosine, 2,2'-Diaminopimelic acid, 2,3-Diaminopropionic acid, N-Ethylglycine, N-Ethylasparagine, Hydroxylysine, allo-Hydroxylysine, 3-Hydroxyproline, 4-Hydroxyproline, Isodesmosine, allo-Isoleucine, N-Methylglycine, sarcosine, N-Methylisoleucine, 6-N-Methyllysine, N-Methylvaline, Norvaline, Norleucine, and Ornithine.
In addition to the polypeptide modifications listed above, many other customised modifications may be effected to attach novel chemical moieties to polypeptides. Well-known methods in addition to custom synthesis include the use of cross-linking reagents and disulphide linkage.
NUCLEIC ACID MODIFICATIONS
Nucleic acid modifications, which are useful in the embodiments of the invention which make use of modified nucleic acid or polypeptide to select gene switches, are discussed in this section. The modified nucleic acids may comprise epigenetic modifications such as methylated nucleic acids, or comprise nucleotide analogues as described below, etc.
Nucleotides generally include a base, a sugar and a phosphate group, with the base generally located at the 1' position of a sugar moiety. Modified nucleic acids generally comprise one or more modified nucleotides (also referred to interchangeably as nucleotide analogs, modified nucleotides, non-natural nucleotides, non-standard nucleotides and others; see for example, Usman and McSwiggen, supra; Eckstein et al. International PCT Publication No. WO 92/07065; Usman et al. International PCT Publication No. WO 93/15187; all hereby incorporated by reference herein).
The modified nucleotides may be modified at the sugar, phosphate and/or base moiety. There are several examples of modified nucleic acid bases known in the art as recently summarized by Limbach et al, 1994, Nucleic Acids Res. 22, 2183. Some of the non-limiting examples of base modifications that can be introduced into nucleic acids include inosine, purine, pyridin-4-one, pyridin-2-one, phenyl, pseudouracil, 2,4,6-trimethoxy benzene, 3-methyl uracil, dihydrouridine, naphthyl, aminophenyl, 5-alkylcytidines (e.g., 5-methylcytidine), 5-alkyluridines (e.g., ribothymidine), 5-halouridine (e.g., 5-bromouridine) or 6-azapyrimidines or 6-alkylpyrimidines (e.g. 6-methyluridine) and others (Burgin et al, 1996, Biochemistry, 35, 14090). Other modified nucleotides include: 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, 2'-O-methylcytidine, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluridine, dihydrouridine, 2'-O-methylpseudouridine, beta-D-galactosylqueuosine, 2'-O-methylguanosine, inosine, N6-isopentenyladenosine, 1-methyladenosine, 1-methylpseudouridine, 1-methylguanosine, 1-methylinosine, 2,2-dimethylguanosine, 2-methyladenosine, 2-methylguanosine, 3-methylcytidine, 5-methylcytidine, N6-methyladenosine, 7-methylguanosine, 5-methylaminomethyluridine, 5-methoxyaminomethyl-2-thiouridine, beta-D-mannosylqueuosine, 5-methoxycarbonylmethyl-2-thiouridine, 5-methoxycarbonylmethyluridine, 5-methoxyuridine, 2-methylthio-N6-isopentenyladenosine, N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine, N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine, uridine-5-oxyacetic acid-methylester, uridine-5-oxyacetic acid, wybutoxosine, pseudouridine, queuosine, 2-thiocytidine, 5-methyl-2-thiouridine, 2-thiouridine, 4-thiouridine, 5-methyluridine, N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine, 2'-O-methyl-5-methyluridine, 2'-O-methyluridine, wybutosine, 3-(3-amino-3-carboxy-propyl)uridine and (acp3)u.
Methylation of DNA is an epigenetic modification that can play an important role in the control of gene expression in mammalian cells (reviewed in Momparler and Bovenzi 2000, J Cell Physiol 183(2): 145-54 and Robertson and Jones, 2000, Carcinogenesis 21, 461-467).
Spermine conjugated oligonucleotides may be made by linking a sperminyl moiety at N4 of dC. Optionally, terminal amino groups of spermidine may be derivatised into guanidinium function. Furthermore, the spermine appendage at C4 of 5-Me-dC may be replaced with 1,11-diamino-3,6,9-trioxaundecane to create 5-Me-dC-(N4-tetraethylene-glycolmonoamine) (teg)-ODNs. Spermine conjugated oligonucleotides are described in detail in Org. Chem, 1997, 62, 5169-5173; Tung, C. H, Breslauer, K. J. and Stein, S, Nucleic Acids Res, 1993, 21, 5489-5494; Maher, L. J, Wold, B. and Dervan, P. B, Science, 1989, 245, 725-729; Maher, L. J, Dervan, P. B. and Wold, B. J, Biochemistry, 1990, 29, 8820-8830; Prakash, T. P, Barawkar, D. A, Kumar, V. A. and Ganesh, K. N, BioMed. Chem. Lett, 1994, 4, 1733-1738; Barawkar, D. A, Kumar, V. A. and Ganesh, K. N, Biochem. Biophys. Res. Commun, 1994, 205, 1665-1670; Barawkar, D. A, Rajeev, K. G, Kumar, V. A. and Ganesh, K. N, Nucleic Acids Res, 1996, 24, 1229-1237; Rajeev, K. G, Jadhav, V. R. and Ganesh, K. N, Nucleic Acids Res, 1997, 25, 4187-4193.
5-Amino-dU oligonucleotides may be made by replacement of the 5-methyl group of T by an amino function to generate 5-amino-dU, a purine mimic. 5-amino-dU is described in detail in Barawkar, D. A, Krishna Kumar, R. and Ganesh, K. N, Tetrahedron, 1992, 48, 8505-8514; Barawkar, D. A. and Ganesh, K. N, BioMed. Chem. Lett, 1993, 3, 347-352; Rana, V. S, Barawkar, D. A. and Ganesh, K. N, J. Org. Chem, 1996, 61, 3578-3579; Trapane, T. L, Christopherson, M. S, Coby, C. D, Ts'O, P. and Wang, D, J. Am. Chem. Soc, 1994, 116, 8412-8420.
The 5-amino function in 5-Amino-dU may be used to append ligands such as fluorescent groups, metallocomplexes, peptides, etc. For example, dansyl and 5/6-carboxyfluorescein groups have been linked to this analogue to form 5-amidodansyl-dU, etc. Fluorescent oligonucleotides are described in detail in Singh, D, Kumar, V. A. and Ganesh, K. N, Nucleic Acids Res, 1990, 18, 3339-3345; Barawkar, D. A. and Ganesh, K. N, Nucleic Acids Res, 1995, 23, 159-164; Barawkar, D. A. and Ganesh, K. N, Biochem. Biophys. Res. Commun, 1994; Jadhav, V. R, Barawkar, D. A, Natu, A. A. and Ganesh, K. N, Nucleosides Nucleotides, 1997, 16, 107-114; Jadhav, V. R, Barawkar, D. A. and Ganesh, K. N, JCS Chem. Commun; Kochoyan, M. and Leroy, J. L, Curr. Opin. Struct. Biol, 1995, 5, 329-333.
Chloramphenicol backbone containing oligonucleotides are described in detail in Sanghvi, Y. S. and Cook, P. D, Carbohydrate Modification in Antisense Research, ACS Symposium Series, ACS, Washington DC, 1994; Rana, V. S, Kumar, V. A. and Ganesh, K. N, Bioorg. Med. Chem. Lett, 1997, 7, 2837-2842.
Peptide nucleic acids (PNAs, and their derivatives such as chiral, fluorescent and polyamine conjugates) may also be used as modified nucleic acids according to our invention. Peptide nucleic acids are a novel class of non-chiral, designed synthetic molecules and have an ethylenediamine-glycine backbone to which the nucleobases are linked (at N) through an acetyl chain. PNA is chemically stable, in contrast to natural nucleic acids and peptides. PNAs are described in detail in Nielsen, P. E, Egholm, M, Berg, R. H. and Buchardt, O, Science, 1991, 254, 1497-1501; Nielsen, P. E, Egholm, M. and Buchardt, O, Bioconj. Chem, 1994, 5, 3-8; Hyrup, B, Egholm, M. and Nielsen, P. E, BioMed. Chem. Lett, 1996, 4, 5-12; Eriksson, M. and Nielsen, P. E, Quart. Rev. Biophys, 1996, 29, 369-394; Good, L. and Nielsen, P. E, Antisense Nucleic Acids Drug Dev, 1997, 7, 431-440; Gangamani, B. P, Kumar, V. A. and Ganesh, K. N, Biochem. Biophys. Res. Commun, 1997, 240, 778-782; Gangamani, B. P, Kumar, V. A. and Ganesh, K. N, JCS Chem. Commun, 1997, 1913-1914; Gangamani, B. P, Kumar, V. A. and Ganesh, K. N, Tetrahedron, 1996, 52, 15017-15030; Gangamani, B. P, Kumar, V. A. and Ganesh, K. N, Tetrahedron, 1998.
Any nucleic acid may be modified by binding, conjugating or linking a ligand or moiety to the nucleic acid. Preferably, the modified nucleic acid is produced by the reaction of a nucleic acid capable of being derivatised together with a modifying moiety. More preferably, the nucleic acid capable of being derivatised contains an amino, thio, oxo or bromo group, or a group that can be chemically or photo-activated. Photochemically induced cross-linking is especially suitable for this purpose. For example, modified nucleic acids include derivatives of nucleic acids, for example, 5-bromo-pyrimidines, 5-iodo-pyrimidines, and 4-thiopyrimidines. Photoaffinity labelling using modified bases such as phosphoramidites of thionucleosides is also possible. For example, 4-thiothymidine and 6-thiodeoxyguanosine (S6-dG) are shown to cross-link effectively with EcoRV endonuclease and methyltransferase.
Meyer and Hanna, 1996, Bioconjug Chem Jul-Aug 7:4 401-12 describe the synthesis and characterization of a new 5-thiol-protected deoxyuridine phosphoramidite for site-specific modification of DNA. The thiol group in this phosphoramidite provides a unique site for the post-synthetic modification of that nucleotide with a variety of molecular tags, such as photo-cross-linkers and fluorescent or spin-label moieties.
The advantages of using the thio derivatives for photo cross-linking include their similarity to the natural structures, and the wavelength required, 340 nm, which is removed from the maxima of the regular bases and should cause no other damage. In addition to their photochemical reactivity, nucleobases containing thiocarbonyl groups can also be chemically modified selectively at the sulfur position by alkylating reagents.
6-Thiodeoxyguanosine has also been incorporated into G-rich triple helix-forming oligonucleotides. Replacement of all or some G residues in G-rich oligonucleotides with S6-dG has been shown to inhibit self association and formation of G tetrads, especially in potassium buffers. This allows triple helix formation to take place normally.
Deprotected bases may also be effectively used in the manufacture of modified nucleic acids. For example, a S6-dG monomer (Glen Research, www.glenres.com) in which the S6 position has been protected with cyanoethyl and N2 with trifluoroacetyl protecting groups may be used. After normal synthesis, the synthesis column with oligonucleotides containing S6-dG may be treated with 1 M 1,8-diazabicyclo[5.4.0]undec-7-ene (DBU) in anhydrous acetonitrile at room temperature for 5 hours to remove the S6-cyanoethyl protecting group. The oligo deprotection is completed with 50 mM sodium hydrosulfide (NaSH) in ammonium hydroxide at room temperature for 24 hours.
Ribonucleotides may also be derivatised or modified to produce modified RNA such as 2'-Amino-RNA. RNA modification is currently in vogue for such applications as antisense and ribozymes. Interesting changes in RNA activity can be effected by substituting the 2'-hydroxyl with 2'-fluoro or 2'-O-alkyl groups. A further substitution which may be used could include the 2'-amino group. The thermal stability of duplexes containing 2'-amino-RNA has been determined and it is reported that 2'-amino-C substitutions destabilized by about 4 relative to RNA C. It is also further reported that 2'-amino-RNA linkages are nuclease-resistant.
The pKa of the 2'-amino group is quite low at 6.2 but this retains sufficient nucleophilicity to allow conjugation reactions to take place. It is therefore possible to label a 2'-amino group with a fluorophore like rhodamine. This activity has been used to investigate thermal motion in a large ribozyme. The 2'-position within an RNA duplex is directed towards the outside of the helix in a location which is very amenable to interhelix contact. The researchers were able to conjugate a disulfide group to the 2'-amino group via an activated ester to yield intermediates. An exchange reaction between the activated disulfide and a thiol in the complementary section or strand neatly forms a disulfide cross-link.
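To illustrate why a pKa of 6.2 is compatible with conjugation, the Henderson-Hasselbalch relationship can be used to estimate the fraction of 2'-amino groups present as the unprotonated, nucleophilic free base. This is a sketch only: the reaction pH of 7.0 is an assumed illustrative value and is not taken from the text.

```python
def unprotonated_fraction(pka, ph):
    """Fraction of an amine present as the neutral free base at a given pH
    (Henderson-Hasselbalch: [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH)))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PKA_2_AMINO = 6.2  # value stated in the text
ASSUMED_PH = 7.0   # assumption: illustrative reaction pH, not from the text

fraction = unprotonated_fraction(PKA_2_AMINO, ASSUMED_PH)
print(f"{fraction:.2f}")  # prints 0.86
```

Under this assumption most of the 2'-amino groups are unprotonated at neutral pH, consistent with the statement that the group remains available for conjugation with an activated ester.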
Sequence Modifiers may be designed for use in automated synthesis of modified nucleic acids. The carboxy-dT is hydrolyzed during deprotection and may be coupled directly to a molecule containing a primary amino group by a standard peptide coupling or via the intermediate N-hydroxysuccinimide (NHS) ester. Both Amino-Modifier dT products can be added in place of a Thymidine residue during oligonucleotide synthesis. After deprotection, the primary amine on the C6 analogue is separated from the oligonucleotide by a spacer arm with a total of 10 atoms and can be labelled or attached to
an enzyme. The C2 analogue is more suitable for the attachment of molecules designed to react with the oligonucleotide.
Oligonucleotides containing pyrazolo[3,4-D]pyrimidines are described in detail in US Patent No. 6,127,121. Arrays of modified nucleic acid probes and methods using them are described in detail in US Patent No. 6,156,501.
Any of the modified nucleic acids, or modifications of nucleic acids described above, including modified DNA and modified RNA, and other known modifications of these, may be used in the methods of our invention for selecting switching systems such as protein or gene switches.
NUCLEIC ACID VECTORS
A nucleic acid encoding a polypeptide, including a nucleic acid binding protein (which may be a DNA binding protein) as well as a polypeptide binding protein according to the invention can be incorporated into vectors for further manipulation. As used herein, vector (or plasmid) refers to discrete elements that are used to introduce heterologous nucleic acid into cells for either expression or replication thereof. Selection and use of such vehicles are well within the skill of the person of ordinary skill in the art. Many vectors are available, and selection of appropriate vector will depend on the intended use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid expression, the size of the DNA to be inserted into the vector, and the host cell to be transformed with the vector. Each vector contains various components depending on its function (amplification of DNA or expression of DNA) and the host cell for which it is compatible. The vector components generally include, but are not limited to, one or more of the following: an origin of replication, one or more marker genes, an enhancer element, a promoter, a transcription termination sequence and a signal sequence.
Both expression and cloning vectors generally contain nucleic acid sequences that enable the vector to replicate in one or more selected host cells. Typically in cloning vectors, this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria, yeast and viruses. The origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40, polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the origin of replication component is not needed for mammalian expression vectors unless these are used in mammalian cells competent for high level DNA replication, such as COS cells.
Most expression vectors are shuttle vectors, i.e. they are capable of replication in at least one class of organisms but can be transfected into another class of organisms for expression. For example, a vector is cloned in E. coli and then the same vector is transfected into yeast, mammalian or plant cells even though it is not capable of replicating independently of the host cell chromosome. DNA may also be replicated by insertion into the host genome. However, the recovery of genomic DNA encoding the nucleic acid or polypeptide binding protein is more complex than that of episomally replicated vector because restriction enzyme digestion is required to excise nucleic acid binding protein
DNA. DNA can be amplified by PCR and be directly transfected into the host cells without any replication component.
Advantageously, an expression and cloning vector may contain a selection gene also referred to as selectable marker. This gene encodes a protein necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that confer resistance to antibiotics and other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement auxotrophic deficiencies, or supply critical nutrients not available from complex media.
Selectable markers which may be used in fungal cells, for example yeast cells, include wild-type genes which complement auxotrophic defects in for example the Uracil (e.g. URA3 gene), Lysine (e.g. LYS2 gene), Adenine (e.g. ADE2 gene), Methionine (e.g. MET3 gene), Histidine (e.g. HIS3 gene), Tryptophan (e.g. TRP1 gene), Leucine (e.g. LEU2 gene) or other metabolic pathways. In addition, counter-selection methods are well known in the art. These enable genes to be selected against by the action of a chemical precursor which is harmless unless converted to a toxic product by the action of one or more gene(s). Examples of these include: 5-fluoro-orotic acid, which is converted to a toxic compound by the action of the URA3 gene product; α-amino-adipic acid, which is converted to a toxic compound by the LYS2 gene product; allyl alcohol, which is converted to a toxic compound by alcohol dehydrogenase activity as encoded by the ADH genes, or any other suitable selective regime known to those skilled in the art. Other selective markers are based on the expression of a gene in a fungus such as yeast which overcomes the metabolic arrest induced by, or toxicity of, a chemical entity which may be added to the growth medium or otherwise presented to the cells. Examples of these may include the KAN gene(s) which confer resistance to antibiotics such as G418, the HIS3 gene which confers resistance to 3-amino-triazole, or the ADH2 gene which can confer resistance to heavy metal ions such as cadmium, or any other suitable genes which confer resistance to toxic or growth arresting regimes.
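The logic of positive (auxotrophic) selection and counter-selection described above can be summarised in a short Python sketch. This is an illustrative model only: the marker and compound names are taken from the passage, while the function name grows, its parameters and the data layout are hypothetical and are not part of any established tool.

```python
# Illustrative model only. Marker and compound names come from the text;
# the function, its parameters and the dictionaries are hypothetical.

# Auxotrophic markers: the wild-type gene allows growth without the nutrient.
AUXOTROPHIC = {
    "URA3": "uracil",
    "LYS2": "lysine",
    "ADE2": "adenine",
    "MET3": "methionine",
    "HIS3": "histidine",
    "TRP1": "tryptophan",
    "LEU2": "leucine",
}

# Counter-selection: the gene product converts a harmless precursor to a toxin.
COUNTER_SELECTION = {
    "5-fluoro-orotic acid": "URA3",
    "alpha-amino-adipic acid": "LYS2",
}


def grows(markers, missing_nutrients=(), precursors=()):
    """Predict whether a cell carrying `markers` grows on the given medium."""
    markers = set(markers)
    # Positive selection: every nutrient absent from the medium must be
    # supplied by the corresponding wild-type marker gene.
    required = {gene for gene, nutrient in AUXOTROPHIC.items()
                if nutrient in missing_nutrients}
    if not required <= markers:
        return False
    # Counter-selection: a cell carrying the converting gene is killed.
    for precursor in precursors:
        if COUNTER_SELECTION.get(precursor) in markers:
            return False
    return True
```

For instance, grows({"URA3"}, missing_nutrients={"uracil"}) evaluates to True, whereas grows({"URA3"}, precursors={"5-fluoro-orotic acid"}) evaluates to False, reflecting selection for and against the same marker.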
Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker and an E. coli origin of replication are advantageously included. These can be obtained from E. coli plasmids, such as pBR322, Bluescript© vector or a pUC plasmid, e.g. pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic marker conferring resistance to antibiotics, such as ampicillin.
Suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up nucleic acid binding protein or polypeptide binding protein nucleic acid, such as dihydrofolate reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to G418 or hygromycin. The mammalian cell transformants are placed under selection pressure that only those transformants which have taken up and are expressing the marker are adapted to survive. In the case of a DHFR or glutamine synthetase (GS) marker, selection pressure can be imposed by culturing the transformants under conditions in which the pressure is progressively increased, thereby leading to amplification (at its chromosomal integration site) of both the selection gene and the linked DNA that encodes the nucleic acid binding protein or the polypeptide binding protein. Amplification is the process by which genes in greater demand for the production of a protein critical for growth, together with closely associated genes which may encode a desired protein, are reiterated in tandem within the chromosomes of recombinant cells. Increased quantities of desired protein are usually synthesised from thus amplified DNA.
Expression and cloning vectors usually contain a promoter that is recognised by the host organism and is operably linked to nucleic acid encoding nucleic acid binding protein or the nucleic acid encoding polypeptide binding protein. Such a promoter may be inducible or constitutive. The promoters are operably linked to DNA encoding the binding protein by removing the promoter from the source DNA by restriction enzyme digestion and inserting the isolated promoter sequence into the vector. Both the native nucleic acid binding protein (or polypeptide binding protein, as the case may be) promoter sequence and many heterologous promoters may be used to direct amplification and/or expression of the binding protein.
Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system and hybrid promoters such as the tac promoter. Their nucleotide sequences have been published, thereby enabling the skilled worker operably to ligate them to DNA encoding nucleic acid or polypeptide binding protein, using linkers or adapters to supply any required restriction sites. Promoters for use in bacterial systems will also generally contain a Shine-Dalgarno sequence operably linked to the DNA encoding the nucleic acid or polypeptide binding protein.
Preferred expression vectors are bacterial expression vectors which comprise a promoter of a bacteriophage such as phage λ or T7 which is capable of functioning in the bacteria. In one of the most widely used expression systems, the nucleic acid encoding the fusion protein may be transcribed from the vector by T7 RNA polymerase (Studier et al, Methods in Enzymol. 185; 60-89, 1990). In the E. coli BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the control of the IPTG inducible lacUV5 promoter. This system has been employed successfully for over-production of many proteins. Alternatively the polymerase gene may be introduced on a lambda phage by infection with an int- phage such as the CE6 phage which is commercially available (Novagen, Madison, USA). Other vectors include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoters such as pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE) or vectors containing the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs, MA, USA).
Moreover, the nucleic acid binding protein or polypeptide binding protein gene according to the invention preferably includes a secretion sequence in order to facilitate secretion of the polypeptide from bacterial hosts, such that it will be produced as a soluble native peptide rather than in an inclusion body. The peptide may be recovered from the bacterial periplasmic space, or the culture medium, as appropriate.
Suitable promoting sequences for use with yeast hosts may be regulated or constitutive and are preferably derived from a highly expressed yeast gene, especially a Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating pheromone genes coding for the a- or α-factor or a promoter derived from a gene encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), 3-phosphoglycerate kinase (PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP) gene can be used. Furthermore, it is possible to use hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene and downstream promoter elements including a functional TATA box of another yeast gene, for example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid promoter). A suitable constitutive PHO5 promoter is e.g. a shortened acid phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as the PHO5 (-173) promoter element starting at nucleotide -173 and ending at nucleotide -9 of the PHO5 gene.
Binding protein gene transcription from vectors in mammalian hosts may be controlled by promoters derived from the genomes of viruses such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus and Simian Virus 40 (SV40), from heterologous mammalian promoters such as the actin promoter or a very strong promoter, e.g. a ribosomal protein promoter, and from the promoter normally associated with nucleic acid binding protein or polypeptide binding protein sequence, provided such promoters are compatible with the host cell systems.
Transcription of a DNA encoding nucleic acid binding protein or polypeptide binding protein by higher eukaryotes may be increased by inserting an enhancer sequence into the vector. Enhancers are relatively orientation and position independent. Many enhancer sequences are known from mammalian genes (e.g. elastase and globin). However, typically one will employ an enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270) and the CMV early promoter enhancer. The enhancer may be spliced into the vector at a position 5' or 3' to binding protein DNA, but is preferably located at a site 5' from the promoter.
Advantageously, a eukaryotic expression vector encoding a nucleic acid binding protein or polypeptide binding protein according to the invention may comprise a locus control region (LCR). LCRs are capable of directing high-level integration site independent expression of transgenes integrated into host cell chromatin, which is of importance especially where the binding protein gene is to be expressed in the context of a permanently-transfected eukaryotic cell line in which chromosomal integration of the vector has occurred, or in transgenic animals.
Eukaryotic vectors may also contain sequences necessary for the termination of transcription and for stabilising the mRNA. Such sequences are commonly available from the 5' and 3' untranslated regions of eukaryotic or viral DNAs or cDNAs. These regions contain nucleotide segments transcribed as polyadenylated fragments in the untranslated portion of the mRNA encoding nucleic acid or polypeptide binding protein.
An expression vector includes any vector capable of expressing nucleic acid binding protein nucleic acids and polypeptide binding protein nucleic acids that are operatively linked with regulatory sequences, such as promoter regions, that are capable of expression of such DNAs. Thus, an expression vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or other vector, that upon introduction into an appropriate host cell, results in expression of the cloned DNA. Appropriate expression vectors are well known to those with ordinary skill in the art and include those that are replicable in eukaryotic and/or prokaryotic cells and those that remain episomal or those which integrate into the host cell genome. For example, DNAs encoding relevant binding protein may be inserted into a vector suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector such as pEVRF (Matthias et al. (1989) NAR 17, 6418).
In a preferred embodiment, the nucleic acid binding protein and polypeptide binding protein constructs of the invention are expressed in plant cells under the control of transcriptional regulatory sequences that are known to function in plants. The regulatory sequences selected will depend on the required temporal and spatial expression pattern of the binding protein in the host plant. Many plant promoters have been characterised and would be suitable for use in conjunction with the invention. By way of illustration, some examples are provided below:
A large number of promoters are known in the art which direct expression in specific tissues and organs (e.g. roots, leaves, flowers) or in cell types (e.g. leaf epidermal cells, leaf mesophyll cells, root cortex cells). For example, the maize PEPC promoter from the phosphoenolpyruvate carboxylase gene (Hudspeth & Grula Plant Mol. Biol. 12: 579-589 (1989)) is green tissue-specific; the trpA gene promoter is pith cell-specific (WO 93/07278 to Ciba-Geigy); the TA29 promoter is pollen-specific (Mariani et al. Nature 347: 737-741 (1990); Mariani et al. Nature 357: 384-387 (1992)).
Other promoters direct transcription in the presence of light, in the absence of light, or in a circadian manner. For example, the GS2 promoter described by Edwards and Coruzzi, Plant Cell 1: 241-248 (1989) is induced by light, whereas the AS1 promoter described by Tsai and Coruzzi, EMBO J 9: 323-332 (1990) is expressed only in conditions of darkness.
Other promoters are wound-inducible and typically direct transcription not just on wound induction, but also at the sites of pathogen infection. Examples are described by Xu et al. (Plant Mol. Biol. 22: 573-588 (1993)); Logemann et al. (Plant Cell 1: 151-158 (1989)); and Firek et al. (Plant Mol. Biol. 22: 129-142 (1993)).
A number of constitutive promoters can be used in plants. These include the Cauliflower Mosaic Virus 35S promoter (US 5,352,605 and US 5,322,938, both to Monsanto) including minimal promoters (such as the -46 or -90 CaMV 35S promoter) linked to other regulatory sequences, the rice actin promoter (McElroy et al. Mol. Gen. Genet. 231: 150-160 (1991)), and the maize and sunflower ubiquitin promoters (Christensen et al. Plant Mol Biol. 12: 619-632 (1989); Binet et al. Plant Science 79: 87-94 (1991)).
Using promoters that direct transcription in the plant species of interest, the nucleic acid or polypeptide binding protein of the invention can be expressed in the required cell or tissue types. For example, if it is the intention to utilise the nucleic acid or polypeptide binding protein to regulate a gene in a specific cell or tissue type, then the appropriate promoter can be used to direct expression of the binding protein construct.
An appropriate terminator of transcription is fused downstream of the selected binding protein containing transgene and any of a number of available terminators can be used in conjunction with the invention. Examples of transcriptional terminator sequences that are known to function in plants include the nopaline synthase terminator found in the pBI vectors (Clontech catalog 1993/1994), the E9 terminator from the rbcS gene (ref), and the tml terminator from Cauliflower Mosaic Virus.
A number of sequences found within the transcriptional unit are known to enhance gene expression and these can be used within the context of the current invention. Such sequences include intron sequences which, particularly in monocotyledonous cells, are known to enhance expression. Both intron 1 of the maize Adh1 gene and the intron from the maize bronze 1 gene have been found to be effective in enhancing expression in maize cells (Callis et al. Genes Develop. 1: 1183-1200 (1987)) and intron sequences are frequently incorporated into plant transformation vectors, typically within the non-translated leader.
A number of virus-derived non-translated leader sequences have been found to enhance expression, especially in dicotyledonous cells. Examples include the "Ω" leader sequence of Tobacco Mosaic Virus, and similar leader sequences of Maize Chlorotic Mottle Virus and Alfalfa Mosaic Virus (Gallie et al. Nucl. Acids Res. 15: 8693-8711 (1987); Skuzeski et al. Plant Mol. Biol. 15: 65-79 (1990)).
The nucleic acid binding proteins of the current invention are targeted to the cell nucleus so that they are able to interact with host cell DNA and bind to the appropriate DNA target in the nucleus and regulate transcription. It may also be desirable to target the polypeptide binding proteins of the invention to the nucleus, if this is where the target polypeptides bound by the polypeptide binding proteins are located, and/or where the activity modulated by binding of the proteins to each other is to be expressed.
To effect this, a Nuclear Localisation Sequence (NLS) is incorporated in frame with the construct, for example the expressible zinc finger construct. The NLS can be fused either 5' or 3' to the protein encoding sequence.
The NLS of the wild-type Simian Virus 40 Large T-Antigen (Kalderon et al. Cell 37: 801-813 (1984); Markland et al. Mol. Cell Biol. 7: 4255-4265 (1987)) is an appropriate NLS and has previously been shown to provide an effective nuclear localisation mechanism in plants (van der Krol et al. Plant Cell 3: 667-675 (1991)). However, several alternative NLSs are known in the art and can be used instead of the SV40 NLS sequence. These include the Nuclear Localisation Signals of TGA-1A and TGA-1B (van der Krol et al. Plant Cell 3: 667-675 (1991)).
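By way of illustration, the in-frame fusion of an NLS either 5' or 3' to a protein encoding sequence can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the description: the function names codons and fuse_nls are hypothetical, the open reading frame is supplied as a DNA string beginning with ATG and ending with a stop codon, and the NLS coding sequence shown is merely one possible encoding of the SV40 Large T-Antigen NLS peptide PKKKRKV.

```python
# Minimal sketch; function names and the example NLS encoding are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One possible DNA encoding of the SV40 large T-antigen NLS peptide PKKKRKV.
SV40_NLS = "CCAAAAAAGAAGAGAAAGGTA"


def codons(dna):
    """Split a DNA string into codons, insisting on a whole number of them."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def fuse_nls(orf, nls=SV40_NLS, end="5'"):
    """Fuse `nls` in frame at the 5' or 3' end of an ATG...stop reading frame."""
    orf_codons = codons(orf)
    nls_codons = codons(nls)
    if len(orf_codons) < 2 or orf_codons[0] != "ATG" \
            or orf_codons[-1] not in STOP_CODONS:
        raise ValueError("ORF must start with ATG and end with a stop codon")
    if any(c in STOP_CODONS for c in orf_codons[:-1] + nls_codons):
        raise ValueError("internal stop codon would truncate the fusion")
    if end == "5'":
        # Initiator ATG, then the NLS, then the rest of the reading frame.
        fused = ["ATG"] + nls_codons + orf_codons[1:]
    elif end == "3'":
        # NLS inserted immediately before the stop codon.
        fused = orf_codons[:-1] + nls_codons + orf_codons[-1:]
    else:
        raise ValueError("end must be \"5'\" or \"3'\"")
    return "".join(fused)
```

Because the fusion is assembled codon by codon, an NLS whose length is not a multiple of three is rejected rather than silently shifting the downstream reading frame.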
A variety of transformation vectors are available for plant transformation and the nucleic acid or polypeptide binding protein encoding genes of the invention can be used in conjunction with any such vectors. The selection of vector will depend on the preferred transformation technique and the plant species which is to be transformed. For certain target species, different selectable markers may be preferred.
For Agrobacterium-mediated transformation, binary vectors or vectors carrying at least one T-DNA border sequence are suitable. A number of vectors are available including pBIN19 (Bevan, Nucl. Acids Res. 12: 8711-8721 (1984)), the pBI series of vectors, and pCIB10 and derivatives thereof (Rothstein et al. Gene 53: 153-161 (1987); WO 95/33818 to Ciba-Geigy).
Binary vector constructs prepared for Agrobacterium transformation are introduced into an appropriate strain of Agrobacterium tumefaciens (for example, LBA4404 or GV3101) either by triparental mating (Bevan, Nucl. Acids Res. 12: 8711-8721 (1984)) or direct transformation (Höfgen & Willmitzer, Nucl. Acids Res. 16: 9877 (1988)).
For transformation which is not Agrobacterium-mediated (i.e. direct gene transfer), any vector is suitable and linear DNA containing only the construct of interest may be preferred. Direct gene transfer can be undertaken using a single DNA species or multiple DNA species (co-transformation; Schroder et al. Biotechnology 4: 1093-1096 (1986)).
Particularly useful for practising several embodiments of the present invention are expression vectors that provide for the transient expression of DNA encoding a nucleic acid or polypeptide binding protein in plant cells or mammalian cells. Transient expression usually involves the use of an expression vector that is able to replicate efficiently in a host cell, such that the host cell accumulates many copies of the expression vector, and, in turn, synthesises high levels of nucleic acid or polypeptide binding protein. For the purposes of the present invention, transient expression systems are useful e.g. for identifying DNA binding protein mutants, to identify potential phosphorylation sites, or to characterise functional domains, for example domains which mediate protein-protein interaction, of the protein.
Construction of vectors according to the invention employs conventional ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. If desired, analysis to confirm correct sequences in the constructed plasmids is performed in a known fashion. Suitable methods for constructing expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and performing analyses for assessing DNA binding protein expression and function are known to those skilled in the art. Gene presence, amplification and/or expression may be measured in a sample directly, for example, by conventional Southern blotting, Northern blotting to quantitate the transcription of mRNA, dot blotting (DNA or RNA analysis), or in situ hybridisation, using an appropriately labelled probe which may be based on a sequence provided herein. Those skilled in the art will readily envisage how these methods may be modified, if desired.
In accordance with another embodiment of the present invention, there are provided cells containing the above-described nucleic acids. Such host cells such as prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and producing the DNA binding protein. Suitable prokaryotes include eubacteria, such as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains, DH5α and HB101, or Bacilli. Further hosts suitable for the nucleic acid or polypeptide binding protein encoding vectors include eukaryotic microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher eukaryotic cells include plant cells and animal cells such as insect and vertebrate cells, particularly mammalian cells including human cells, or nucleated cells from other multicellular organisms. In recent years propagation of vertebrate cells in culture (tissue culture) has become a routine procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well as cells that are within a multicellular host organism.
DNA may be stably incorporated into cells or may be transiently expressed using methods known in the art. Stably transfected cells may be prepared by transfecting cells with an expression vector having a selectable marker gene, and growing the transfected
cells under conditions selective for cells expressing the marker gene. To prepare transient transfectants, cells are transfected with a reporter gene to monitor transfection efficiency.
To produce such stably or transiently transfected cells, the cells should be transfected with a sufficient amount of the nucleic acid or polypeptide binding protein-encoding nucleic acid to form the relevant binding protein. The precise amounts of DNA encoding the nucleic acid or polypeptide binding protein may be empirically determined and optimised for a particular cell and assay.
Host cells are transfected or, preferably, transformed with the above-mentioned expression or cloning vectors of this invention and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants, or amplifying the genes encoding the desired sequences. Heterologous DNA may be introduced into host cells by any method known in the art, such as transfection with a vector encoding a heterologous DNA by the calcium phosphate coprecipitation technique or by electroporation. Numerous methods of transfection are known to the skilled worker in the field. Successful transfection is generally recognised when any indication of the operation of this vector occurs in the host cell. Transformation is achieved using standard techniques appropriate to the particular host cells used.
Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic cells with a plasmid vector or a combination of plasmid vectors, each encoding one or more distinct genes or with linear DNA, and selection of transfected cells are well known in the art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press).
Transfected or transformed cells are cultured using media and culturing methods known in the art, preferably under conditions whereby the nucleic acid or polypeptide binding protein encoded by the DNA is expressed. The composition of suitable media is known to those in the art, so that they can be readily prepared. Suitable culturing media are also commercially available.
Transformation of plant cells is normally undertaken with a selectable marker which may provide resistance to an antibiotic or to a herbicide. Selectable markers that are routinely used in transformation include the nptII gene which confers resistance to kanamycin (Messing & Vierra Gene 19: 259-268 (1982); Bevan et al. Nature 304: 184-187 (1983)), the bar gene which confers resistance to the herbicide phosphinothricin (White et al. Nucl. Acids Res. 18: 1062 (1990); Spencer et al. Theor. Appl. Genet. 79: 625-631 (1990)), the hph gene which confers resistance to the antibiotic hygromycin (Blochlinger & Diggelmann Mol. Cell Biol. 4: 2929-2931 (1984)), and the dhfr gene which confers resistance to methotrexate (Bourouis et al. EMBO J 2: 1099-1104 (1983)). More recently, a number of selection systems have been developed which do not rely on selection for resistance to antibiotic or herbicide. These include the inducible isopentenyl transferase system described by Kunkel et al. (Nature Biotechnology 17: 916-919 (1999)).
Although specific protocols may vary from species to species, transformation techniques are well known in the art for most commercial plant species.
In the case of dicotyledonous species, Agrobacterium-mediated transformation is generally a preferred technique as it has broad application to many dicotyledonous species and is generally very efficient. Agrobacterium-mediated transformation generally involves the co-cultivation of Agrobacterium with explants from the plant and follows procedures and protocols that are known in the art. Transformed tissue is generally regenerated on medium carrying the appropriate selectable marker. Protocols are known in the art for many dicotyledonous crops including (for example) cotton, tomato, canola and oilseed rape, poplar, potato, sunflower, tobacco and soybean (see for example EP 0 317 511, EP 0 249 432, WO 87/07299, US 5,795,855).
In addition to Agrobacterium-mediated transformation, various other techniques can be applied to dicotyledons. These include PEG and electroporation-mediated transformation of protoplasts, and microinjection (see for example Potrykus et al. Mol. Gen. Genet. 199: 169-177 (1985); Reich et al. Biotechnology 4: 1001-1004 (1986); Klein et al. Nature 327: 70-73 (1987)). As with Agrobacterium-mediated transformation, transformed tissue is generally regenerated on medium carrying the appropriate selectable marker using standard techniques known in the art.
Although Agrobacterium-mediated transformation has been applied successfully to monocotyledonous species such as rice and maize and protocols for these approaches are available in the art, the most widely used transformation techniques for monocotyledons remain particle bombardment, and PEG and electroporation-mediated transformation of protoplasts.
In the case of maize, Gordon-Kamm et al. (Plant Cell 2: 603-618 (1990)), Fromm et al. (Biotechnology 8: 833-839 (1990)) and Koziel et al. (Biotechnology 11: 194-200 (1993)) have published techniques for transformation using particle bombardment.
In the case of rice, protoplast-mediated transformation for both Japonica- and Indica-types has been described (Zhang et al. Plant Cell Rep. 7: 379-384 (1988); Shimamoto et al. Nature 338: 274-277; Datta et al. Biotechnology 8: 736-740 (1990)) and both types are also routinely transformable using particle bombardment (Christou et al. Biotechnology 9: 957-962 (1991)).
In the case of wheat, transformation by particle bombardment has been described for both type C long-term regenerable callus (Vasil et al. Biotechnology 10: 667-674 (1992)) and immature embryos and immature embryo-derived callus (Vasil et al. Biotechnology 11: 1553-1558 (1993); Weeks et al. Plant Physiol. 102: 1077-1084 (1993)). A further technique is described in published patent applications WO 94/13822 and WO 95/33818.
The nucleic acid and polypeptide binding protein constructs of the invention are suitable for expression in a variety of different organisms. However, to enhance the efficiency of expression it may be necessary to modify the nucleotide sequence encoding the nucleic acid or polypeptide binding protein to account for different frequencies of codon usage in different host organisms. Hence it is preferable that the sequences to be
introduced into organisms, such as plants, conform to preferred usage of codons in the host organism.
In general, high expression in plants is best achieved from coding sequences that have a GC content of at least 35% and preferably more than 45%. This is thought to be because ATTTA motifs destabilise messenger RNAs and AATAAA motifs may cause inappropriate polyadenylation, resulting in truncation of transcription. Murray et al. (Nucl. Acids Res. 17: 477-498 (1989)) have shown that even within plants, monocotyledonous and dicotyledonous species have differing preferences for codon usage, with monocotyledonous species generally preferring GC richer sequences. Thus, in order to achieve optimal high level expression in plants, gene sequences can be altered to accommodate such preferences in codon usage in such a manner that the amino acids encoded by the DNA are not changed.
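The sequence checks and the silent recoding referred to above can be illustrated with a short Python sketch. It is offered as a sketch under stated assumptions rather than as a definitive method: the function names are hypothetical, the standard genetic code is assumed, and the rule of choosing the most GC-rich synonymous codon is a simple stand-in for a species-specific codon usage table, which is not given here.

```python
# Sketch only: function names are hypothetical; the "most GC-rich synonym"
# rule stands in for a real, species-specific codon usage table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def split_codons(dna):
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def gc_content(dna):
    """Fraction of G and C bases in the sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_motifs(dna, motifs=("ATTTA", "AATAAA")):
    """Return the start positions (0-based) of each motif, overlaps included."""
    dna = dna.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            start = dna.find(motif, start + 1)
        hits[motif] = positions
    return hits


def raise_gc(dna):
    """Replace each codon by its most GC-rich synonym (protein unchanged)."""
    recoded = []
    for codon in split_codons(dna):
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[codon]]
        # Prefer higher GC; on a tie, keep the original codon if it qualifies.
        recoded.append(max(synonyms,
                           key=lambda c: (c.count("G") + c.count("C"),
                                          c == codon)))
    return "".join(recoded)
```

Recoding codon by codon does not by itself guarantee that every ATTTA or AATAAA motif is removed, since a motif can span a codon boundary; in such a sketch find_motifs would be run again on the recoded sequence and any remaining motif addressed by choosing a different synonymous codon.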
Plants also have a preference for certain nucleotides adjacent to the ATG encoding the initiating methionine and, for most efficient translation, these nucleotides may be modified. To facilitate translation in plant cells, it is preferable to insert, immediately upstream of the ATG representing the initiating methionine of the gene to be expressed, a "plant translational initiation context sequence". A variety of sequences can be inserted at this position. These include the sequence 5'-AAGGAGATATAACAATG-3' (Prasher et al. Gene 111: 229-233 (1992); Chalfie et al. Science 263: 802-805 (1994)), the sequence 5'-GTCGACCATG-3' (Clontech 1993/1994 catalog, page 210), and the sequence 5'-TAAACAATG-3' (Joshi et al. Nucl. Acids Res. 15: 6643-6653 (1987)). For any particular plant species, a survey of natural sequences available in any databank (e.g. GenBank) can be undertaken to determine preferred "plant translational initiation context sequences" on a species-by-species basis.
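By way of illustration only, each of the context sequences listed above terminates in the initiating ATG, so that placing one immediately upstream of a coding sequence amounts to replacing the ATG of the coding sequence with the full context sequence. A minimal Python sketch follows; the function name and the dictionary labels are hypothetical and are used only for the example.

```python
# Context sequences as quoted in the text; the dictionary keys are
# illustrative labels only.
CONTEXT_SEQUENCES = {
    "Prasher/Chalfie": "AAGGAGATATAACAATG",
    "Clontech": "GTCGACCATG",
    "Joshi": "TAAACAATG",
}


def add_initiation_context(cds: str, context: str) -> str:
    """Place a context sequence immediately upstream of the initiating ATG.

    Both the coding sequence and the context sequence are expected to carry
    the ATG, so the ATG is shared rather than duplicated.
    """
    cds, context = cds.upper(), context.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with ATG")
    if not context.endswith("ATG"):
        raise ValueError("context sequence must end with ATG")
    return context[:-3] + cds
```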
Any changes that are made to the coding sequence can be made using techniques that are well known in the art and include site directed mutagenesis, PCR, and synthetic gene construction. Such methods are described in published patent applications EP 0 385 962 (to Monsanto), EP 0 359 472 (to Lubrizol) and WO 93/07278 (to Ciba-Geigy). Well
known protocols for transient expression in plants can be used to check the expression of modified genes before their transfer to plants by transformation.
Any of the vectors, including expression vectors, described above, in particular those vectors comprising cloned members of a library of nucleic acid binding polypeptides, may be arranged in the form of an array. The invention therefore includes such arrays of vectors and uses of these arrays.
LIGANDS
A ligand according to the invention is typically any molecule capable of binding to any of the other components of a switching system.
For example, with regard to a gene switch, a ligand is typically capable of binding to nucleic acid such as DNA, the nucleic acid binding molecule or any other component of the gene expression machinery. A variety of nucleic acid binding ligands are known in the art and include acridine orange, 9-Amino-6-chloro-2-methoxyacridine, actinomycin D, 7-aminoactinomycin D, echinomycin, dihydroethidium, ethidium-acridine heterodimer, ethidium bromide, propidium iodide, hexidium iodide, Hoechst 33258, Hoechst 33342, hydroxystibamidine, psoralen, Distamycin A, calicheamicin oligosaccharides, triple helix forming oligos, PNA, pyrrole-imidazole polyamides and synthetic peptides or peptide derivatives such as described by Lescinier et al, Chem. Eur. J. 4:425-433 (1998).
A ligand, as described above, may be capable of binding to a nucleic acid. Thus, a nucleic acid binding ligand (for example, a DNA binding ligand) includes a polypeptide capable of binding to the nucleic acid (i.e., a DNA binding molecule such as a DNA binding polypeptide). Furthermore, the term ligand includes molecules which are themselves comprised of nucleic acids, for example, RNA aptamers, and are capable of binding to a polypeptide, a nucleic acid, or both.
Also included within the meaning of the term ligand and nucleic acid binding molecules are molecules capable of binding RNA and/or other nucleic acids. Ligands may
bind to the primary, secondary or tertiary structure of RNA. Examples of RNA binding ligands include aminoglycosides, which are capable of binding ribosomal RNA and causing misreading of the genetic code, paromomycin, which is capable of interacting with the RNA major groove within a pocket created by an A-A base pair and a bulged adenine, and paromycin.
However, the ligand may be capable of binding to the nucleic acid binding molecule, or indeed to both the nucleic acid and the nucleic acid binding molecule.
As applied to a protein switch, a ligand is any molecule capable of binding to the polypeptide binding molecule (including a polypeptide binding protein), or another protein. Protein binding ligands are known in the art, and include, for example, immunoglobulins, antibodies, ATP, cAMP, GABA, Fas ligand, CIDs (chemical inducers of dimerization), FK506 and FK1012 (as described in Spencer et al, 1993, Science 262, 1019), peptide hormone molecules, retinoic acid, acridine derivatives and other anticancer drugs as described in Finlay and Baguley (2000), Cancer Chemother Pharmacol 45, 417, etc. Furthermore, the ligand may be capable of binding to both members of the protein switch, i.e., the first polypeptide and the second polypeptide.
Ligand mediated protein-protein association is described for example in Lin et al (1998), Blood 91, 890-897, Spencer et al (1993), Science 262, 1019, Keenan et al. (1998) Bioorganic and Medicinal Chemistry 6, 1309-1335 and Fan et al. (1999), Human Gene Therapy 10, 2273-2285.
Derivatives of ligands are also included provided that they are capable of binding to the nucleic acid or polypeptide components, as the case may be, of the switching system as described herein.
The ligands according to the present invention for use in the switching systems described here are preferably capable of modulating the interaction between the components of the switching system. Thus, in a gene switch, a ligand component is capable of affecting the strength of binding between the nucleic acid binding molecule
component and the nucleic acid component of the switch. In a protein switch, the ligand is capable of affecting the strength of binding between the polypeptide members of the switch. Addition of a ligand to a switching system can therefore cause the association or disassociation between the components of the switching system. The ligand may do this by direct or indirect means. For example, the ligand may be capable of directly binding to one or more members of the switching system to disrupt the binding or to precipitate association. Furthermore, the ligand may be capable of binding to other entities associated with one or more members of the gene switch (including accessory proteins). For example, in the case of a gene switch, a nucleic acid binding molecule, such as a DNA binding protein (for example a transcription factor) may be in the form of a complex which binds to the nucleic acid (e.g., DNA). Only one or some of the members of the complex, however, may actually physically contact the nucleic acid. In this case, the ligand may bind to one or more members of the complex, not necessarily the nucleic acid binding molecule, to disrupt the binding between the nucleic acid binding molecule and the DNA (and hence the complex and the DNA). Likewise, association may be promoted by ligand binding to one or more members of a complex comprising the nucleic acid binding molecule. Similar considerations apply to protein switches.
In a preferred embodiment, a ligand according to the invention is capable of modulating the topology, locally or otherwise, of the nucleic acid or polypeptide to which it is bound. For example, a ligand according to the invention may be capable of modulating the topology and/or stereochemistry of a juxtaposed nucleic acid sequence motif to which it is desired to bind a DNA binding molecule according to the invention, or the topology and/or stereochemistry of a protein binding motif on a protein capable of binding to another protein.
Indeed, where we describe a method of selecting one or more components of a gene switch by use of modified nucleic acid, or modified nucleic acid binding molecule, or both, the ligand is preferably capable of mimicking the topology and/or stereochemistry of the modified component, most preferably, at or around the interface with another component (which itself may or may not be modified). For example, the initial steps of selection in this embodiment may involve, for example, selection of a polypeptide modified with a
moiety, which is capable of binding to a nucleic acid. A subsequent step compares the binding of the unmodified form of the polypeptide to the nucleic acid, in the presence of one or more candidate ligands, to select ligand(s) which are capable of causing the unmodified polypeptide to bind to the nucleic acid.
The ligand in this case is preferably capable of mimicking the topology or stereochemistry, etc of the (unmodified) polypeptide where it interacts with the nucleic acid. The ligand may do this by binding transiently to the polypeptide such that the resultant complex is topologically similar to the modified polypeptide. The ligand may be similar in shape to the moiety. Furthermore, the ligand bound to the unmodified polypeptide may adopt a similar charge or hydrophobicity, etc as the modified polypeptide. Thus, in general, the ligand and the unmodified polypeptide preferably adopt or mimic one or more characteristics of the modified polypeptide. More preferably, the ligand mimics one or more characteristics of the moiety with which the polypeptide is modified.
It may therefore be considered that addition of the ligand to the nucleic acid binding molecule / nucleic acid supplies the moiety in trans, allowing binding or disruption of binding to take place.
Exemplary ligands for nucleic acid binding have shape and charge characteristics that allow them to reside along the DNA in either the minor or major groove, to intercalate, or to do a combination of these.
Suitable ligands in addition to those known in the art may be selected by the use of nucleic acid or polypeptide binding assays. For example, a candidate ligand, preferably a plurality of candidate ligands, is contacted with target nucleic acid or polypeptides and binding determined. The targets may for example be labelled with a detectable label, such as a fluorophore/fluorochrome, such that after a wash step binding can be determined easily, for example by monitoring fluorescence. The target with which the candidate binding ligands are contacted may be non-specific, such as random polypeptide or nucleic acid libraries or sonicated genomic DNA and the like. Alternatively, a specific
sequence may be used, or a partially randomised library of sequences. In particular, a ligand library may be in the form of a combinatorial chemical library.
It is particularly preferred that ligands of the invention bind to polypeptides or DNA in a sequence and/or topology dependent manner so that binding can be restricted to a particular target, thus enhancing the specificity of the gene or protein switch. Specificity of binding may be determined, for example, by comparing the binding of the ligand to a target sequence with binding to a mixture of non-specific molecules.
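By way of illustration only, the comparison of binding to a target sequence with binding to a mixture of non-specific molecules may be expressed as a simple ratio of background-corrected signals. The Python sketch below is an assumption made for the example: the function name, the background correction and the handling of a zero non-specific signal are not prescribed by the methods described herein.

```python
def specificity_ratio(target_signal: float,
                      nonspecific_signal: float,
                      background: float = 0.0) -> float:
    """Ratio of ligand binding to the target over binding to non-specific competitor.

    Signals are any consistent readout (for example fluorescence after a wash
    step). A ratio well above 1 suggests sequence- or topology-dependent binding.
    """
    target = max(target_signal - background, 0.0)
    nonspecific = max(nonspecific_signal - background, 0.0)
    if nonspecific == 0.0:
        # No measurable non-specific binding: report infinity if the target
        # shows any signal at all, otherwise zero.
        return float("inf") if target > 0.0 else 0.0
    return target / nonspecific
```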
Ligands according to the invention may bind conditionally to their targets. For example, psoralen is a ligand that can bind DNA covalently if illuminated at wavelengths of about 400 nm or less. Ligands capable of binding their targets in more than one manner may be employed in the current invention. Such ligands may bind or associate with the target via any one or more mechanism(s) such as outlined above.
In a preferred embodiment, libraries of ligands may be prepared. In particular, libraries of ligands may be immobilised to a solid phase, such as a substantially planar solid phase, including membranes and non-porous substrates such as plastic and glass. The resulting immobilised library may conveniently be used in high throughput screening procedures.
Particularly preferred ligands are those which are substantially non toxic to plant and/or animal cells such that they may be administered to said cells and modulate binding of the nucleic acid or polypeptide binding molecule without having an adverse effect on the cells. Thus it may be desirable to pre-screen compounds to exclude toxic compounds.
Furthermore, given that ligands should typically be capable of being taken up by or otherwise entering the cells of animals or plants, preferred compounds are suitable for administration to animals and plants. For example, preferred compounds are capable of being taken up via the leaves (for foliar application) or roots of plants (for application to the soil) or of permeating seeds (for use in seed treatment). It may also be preferred to use compounds that can be taken up by bacteria, yeast and/or fungi that can themselves be
delivered to the target host organism. The compounds should also preferably be stable in the soil and/or plant for prolonged periods. In the case of animals, preferred compounds are suitable for topical, systemic, or oral administration.
TARGET NUCLEIC ACID
The term 'target nucleic acid' refers to any DNA or other nucleic acid for use in the methods of the invention. This nucleic acid may be of known sequence, or may be of unknown sequence. This nucleic acid may be prepared artificially in a laboratory, or may be a naturally occurring nucleic acid. This nucleic acid may be in substantially pure form, or may be in a partially purified form, or may be part of an unpurified or heterogeneous sample. Preferably, the target nucleic acid is a putative promoter or other transcription regulatory region such as an enhancer. More preferably, the target nucleic acid is in substantially pure form. Even more preferably, the target nucleic acid is of known sequence. In a most preferred embodiment, the target nucleic acid is purified nucleic acid of known sequence of a promoter from a gene of interest, for example from a gene suspected of being associated with a disease state, more preferably from a gene useful in gene therapy.
Examples of target sequences of interest include sequence motifs that are bound by transcription factors, such as zinc fingers. Particular examples include the promoters of genes involved in the biosynthesis and catabolism of gibberellins (Phillips et al, Plant Physiol 108: 1049-1057 (1995), MacMillan et al, Plant Physiol 113: 1369-1377 (1997), Williams et al, Plant Physiol 117: 559-563 (1998); Thomas et al, PNAS 96: 4698-4703 (1999)); the promoters of genes whose products are responsible for ripening (such as polygalacturonase and ACC oxidase); the promoters of genes involved in the biosynthesis of volatile esters, which are important flavour compounds in fruits and vegetables (Dudareva et al, Plant Cell 8: 1137-1148 (1996); Dudareva et al, Plant J. 14: 297-304 (1998); Ross et al, Arch. Biochem. Biophys. 367: 9-16 (1999)); the promoters of genes involved in the biosynthesis of pharmaceutically important compounds; and the promoters of genes encoding allergens such as the peanut allergens Arah1, Arah2 and Arah3 (Rabjohn et al, J. Clin. Invest 103: 535-542).
Other plant promoters of interest are the bronze promoter (Ralston et al, Genetics 119: 185-197 (1988) and GenBank Accession No. X07937.1) that directs expression of UDP-glucose flavonoid glycosyl-transferase in maize, the patatin-1 gene promoter (Jefferson et al, Plant Mol. Biol. 14: 995-1006 (1990)) that contains sequences capable of directing tuber-specific expression, and the phenylalanine ammonia lyase promoter (Bevan et al, Embo J. 8: 1899-1906 (1989)) thought to be involved in responses to mechanical wounding and normal development of the xylem and flower.
Target nucleic acid may also be provided as a plurality of sequences, for example where one or more residues in the nucleic acid sequence are varied or random. Examples of a plurality of sequences are libraries of nucleic acid sequences comprising putative zinc finger binding sites. In a highly preferred embodiment of the invention, the plurality of target nucleic acid sequences is in the form of an array of target nucleic acid sequences, as described above. Thus, the invention encompasses arrays of nucleic acids comprising putative zinc finger binding sites, and their use in screening for gene switches. Other sequence motifs that bind the nucleic acid binding domain of a transcription factor may also be included in the plurality of sequences, typically varied or randomised at one or more positions. For example the chemically inducible promoter fragments described above may be randomised to produce a plurality of target nucleic acid sequences for use in the screening methods of the present invention. Accordingly, the invention includes arrays of randomised chemically inducible promoter fragments, and their use.
ASSAYS
With respect to gene switches, the methods of the present invention typically involve using a tripartite configuration of one or more nucleic acid binding molecules, one or more ligands and one or more target nucleic acid sequences as described above to screen for (i) nucleic acid binding molecules that bind to a target nucleic acid in a manner that is modulatable by a ligand (ii) ligands that modulate binding of a nucleic acid binding molecule to a target nucleic acid and/or (iii) a target nucleic acid that is bound modulatably by a nucleic acid binding molecule as a result of an interaction with a ligand. With regard to protein switches, the methods of the present invention typically involve using a tripartite
configuration of one or more first polypeptide molecules, one or more ligands and one or more second polypeptides as described above to screen for (i) polypeptide binding molecules that bind to a (another) target polypeptide in a manner that is modulatable by a ligand and/or (ii) ligands that modulate binding of two polypeptides to each other.
In other words the methods of the invention may be used to screen for any or all of the components of the gene switch system or protein switch system of the present invention.
Typically, one or two of the components is a known constant while two or one, respectively, of the other components are screened. For example, a given nucleic acid binding molecule and target nucleic acid may be used to screen a plurality of ligands or candidate ligands. Alternatively, a plurality of nucleic acid binding molecules and of ligands may be screened against a given target nucleic acid for a gene switch, and a plurality of polypeptide binding molecules and of ligands may be screened against a given target polypeptide for a protein switch. Other combinations are also envisaged.
Each component may be one individual molecular species or a plurality of molecular species. Where a plurality of species is used, they may be substantially all known, partially randomised or fully randomised. For example, the plurality of nucleic acid binding molecules may be a randomised zinc finger library and the plurality of target nucleic acid may be a library of nucleic acid molecules randomised at one or more, typically three or more contiguous, residues. Alternatively, for a protein switch screen, the plurality of polypeptide molecules may be a library of polypeptides randomised at one or more locations. Preferably, the library corresponding to the plurality of species is in the form of an array.
However, all three components may be screened for simultaneously. Thus, in a preferred embodiment, the invention provides a method for isolating multiple nucleic acid or polypeptide binding molecules in the presence of multiple ligands, said nucleic acid or polypeptide binding molecules being selected using multiple target nucleic acid sequences (or target polypeptides as the case may be) in a single selection (isolation) procedure.
Preferably, however, in the selection of a gene switch, at least one of the nucleic acid binding molecules, target nucleic acids and candidate ligands is provided in the form of an array. Thus, the invention encompasses methods of selecting a gene switch in which one, two or all three of the nucleic acid binding molecules, target nucleic acids and candidate ligands is in the form of an array. In other words, the methods may involve: arrayed nucleic acid binding molecules and arrayed target nucleic acids; arrayed nucleic acid binding molecules and arrayed candidate ligands; or arrayed target nucleic acids and arrayed candidate ligands. Furthermore, the invention includes the use of all three components in the form of arrays, i.e., arrayed nucleic acid binding molecules, arrayed target nucleic acids and arrayed candidate ligands. Similarly, in the selection of a protein switch, at least one of the candidate first and second polypeptides and the candidate ligand is in the form of an array.
Any or all of the components not in the form of an array may comprise a single species, or a plurality of species (such as a library).
The library of candidate nucleic acid or polypeptide binding molecules is preferably a phage display library. In the case of nucleic acid phage libraries, individual candidate molecules of the library optionally are structurally related to zinc finger transcription factors (for example see Choo and Klug, (1994) PNAS (USA) 91:11163-67, which describes aspects of such libraries and is incorporated herein by reference). This library is preferably constructed with DNA sequences of the form GCGNNNGCG (where all 64 middle triplets are represented in the mixture).
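By way of illustration only, the 64 members of a target mixture of the form GCGNNNGCG may be enumerated as follows. The Python sketch merely lists the sequences; the function name is hypothetical.

```python
from itertools import product


def gcg_nnn_gcg_library() -> list[str]:
    """Return all target sites of the form GCGNNNGCG, one per middle triplet."""
    return ["GCG" + "".join(triplet) + "GCG"
            for triplet in product("ACGT", repeat=3)]
```

Each of the 4 x 4 x 4 = 64 possible middle triplets appears exactly once in the returned list, so the list may also serve as the layout for an array with one target sequence per position.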
One or more ligands means at least one ligand, preferably two, three or four ligands, more preferably five, six, or seven ligands, most preferably a mixture of eight ligands, or even more. The ligands may be in any molar ratio to one another within the mixture, but will preferably be approximately equimolar with one another. The ligands may be provided in the form of a library of ligands.
In our selection method as applied to a protein switch in which the protein components are single species and the ligand is provided in the form of a ligand library, the
methods of our invention as described herein allow the selection of potential ligand molecules of interest as a first step, i.e., those which form complexes with the first and second polypeptides. Thus, ligands of interest are selected which are capable of binding to one or both of the polypeptides. As a second step, the strength of binding of the polypeptides to each other is tested in the absence or presence of the ligand component of the complex to select those complexes in which the binding between the polypeptides differs in the presence or absence of the ligand component. Our selection method therefore directly selects ligands which bind to one or both of the polypeptides, without the need for any further screen to determine whether an individual ligand molecule is capable of forming a complex with the polypeptides.
The method of our invention may preferably be carried out over at least 3, 4, 5 or 6 rounds of selection, preferably about 6 rounds of selection.
Nucleic acid or polypeptide binding molecules (such as phage clones) isolated by the above methods are preferably individually assayed (for example in microtitre plates as described below, preferably in the form of an array) for binding to the target nucleic acid (such as a GCGNNNGCG mixture) or a target polypeptide (as the case may be) in the presence and absence of a mixture of the ligands to identify clones which are capable of ligand-modulatable binding.
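By way of illustration only, clones assayed individually in the presence and absence of the ligand mixture may be sorted according to the change in binding signal. In the Python sketch below, the function name, the two-fold threshold and the small floor value used to avoid division by zero are assumptions made for the example and are not prescribed by the methods described herein.

```python
def ligand_modulated_clones(readings: dict,
                            min_fold: float = 2.0,
                            floor: float = 1e-9) -> dict:
    """Classify clones whose binding differs with and without ligand.

    readings maps a clone name to (signal_with_ligand, signal_without_ligand).
    Clones whose signals differ by less than min_fold are omitted.
    """
    selected = {}
    for clone, (with_ligand, without_ligand) in readings.items():
        high = max(with_ligand, without_ligand)
        low = max(min(with_ligand, without_ligand), floor)
        if high / low >= min_fold:
            if with_ligand > without_ligand:
                selected[clone] = "ligand-dependent"   # ligand promotes binding
            else:
                selected[clone] = "ligand-disrupted"   # ligand reduces binding
    return selected
```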
Those phage clones which are capable of ligand-modulatable binding are preferably tested in the presence of a mixture of the ligands, in order to deduce the optimum target nucleic acid or polypeptide sequence, for example using different or variant target sequences, or by the binding site signature method for nucleic acid binding proteins (see Choo and Klug, (1994) PNAS (USA) 91:11163-67). As described above, array technology may be employed in such a screen.
Where candidate nucleic acid or polypeptide binding molecules are used rather than molecules known or determined to have nucleic acid or polypeptide binding properties, the method of the invention preferably features a pre-selection step to remove candidate binding molecules which do not require ligand to bind the nucleic acid or polypeptide.
Association of the candidate nucleic acid or polypeptide binding molecule with the target nucleic acid or polypeptide may be assessed by any suitable means known to those skilled in the art. For example, the nucleic acid or polypeptide may be immobilised by biotinylation and linking to beads such as streptavidin coated beads (Dynal). Preferably, the nucleic acid or polypeptide is immobilised on an array, so that a high throughput screen may be carried out. In a preferred embodiment wherein the nucleic acid or polypeptide binding molecules are phage displayed polypeptides, binding of said molecules to the nucleic acid or polypeptide may be assessed by eluting those phage which bind, and infecting logarithmic phase E. coli TG1 cells. The presence of infective particles eluted from the nucleic acid indicates that association of the nucleic acid binding molecule(s) with the nucleic acid has occurred, or that association of the polypeptide binding molecule(s) with the polypeptide has occurred in the case of a protein switch. Alternatively, association of the candidate nucleic acid or polypeptide binding molecule(s) with the target nucleic acid or polypeptide may be assessed by Scintillation Proximity Assay (SPA). For example, the target nucleic acid or polypeptide could be biotinylated and immobilised to streptavidin coated SPA beads, and the candidate nucleic acid or polypeptide binding molecules may be radioactively labelled, for example with 35S-methionine where the molecules are polypeptides. Association of the candidate nucleic acid or polypeptide binding molecules with the target nucleic acid or polypeptide could then be assessed by monitoring the readout of the SPA. Alternatively, the association could be monitored by fluorescence resonance energy transfer (FRET). In this case, the target nucleic acid or polypeptide could be labelled with a donor fluor, and the nucleic acid binding molecule(s) or polypeptide(s) could be labelled with a suitable acceptor fluor.
Whilst the two entities are separated, no FRET would be observed, but if association (binding) took place, then there would be a change in the amount of FRET observed, thus allowing assessment of the degree of association. Array techniques may be employed in such assessment: thus, the candidate nucleic acid binding molecules may be formed into an array, and binding of the labelled target nucleic acid to the arrayed molecules assayed as described.
Furthermore, an in vivo assay, such as a TRAP assay described in Paraskeva et al (1998), Proc. Natl. Acad. Sci. USA, 95, 951-956, may be used for determining whether a polypeptide interacts with RNA in vivo.
Association of the candidate nucleic acid or polypeptide binding molecule with the target nucleic acid or polypeptide may also be assessed by bandshift assays. Bandshift assays are conducted by measuring the mobility of one or more of the components of the assay, for example the mobility of the nucleic acid or polypeptide, as it is electrophoresed through a suitable gel such as a polyacrylamide or agarose gel, as is well known to those skilled in the art. In order to assess the association of the candidate nucleic acid or polypeptide binding molecule with the target nucleic acid or polypeptide, the mobility of the nucleic acid or polypeptide (as the case may be) could be measured in the presence and absence of the candidate binding molecule. If the mobility of the target nucleic acid or polypeptide is essentially the same in the presence or absence of the candidate binding molecule, then it may be inferred that the molecules do not associate, or that the association is weak. If the mobility of the nucleic acid or polypeptide is retarded in the presence of the candidate binding molecule, then it may be inferred that the candidate molecule is associating with or binding to the nucleic acid or polypeptide.
Association of the candidate nucleic acid or polypeptide binding molecule with the target nucleic acid or polypeptide may also be assessed using filter binding assays. For example, the target nucleic acid or polypeptide molecule may be immobilised on a suitable filter, such as a nitrocellulose filter. The candidate binding molecule may then be labelled, for example radioactively labelled, and contacted with the immobilised target nucleic acid or polypeptide. The binding of or association with the target nucleic acid or polypeptide may be assessed by comparing the amount of labelled candidate nucleic acid or polypeptide binding molecule which associates with the filter only to the amount of labelled candidate nucleic acid or polypeptide binding molecule which associates with the filter-immobilised target. If more labelled candidate nucleic acid or polypeptide binding molecule associates with the immobilised nucleic acid or polypeptide than with the filter only, it may be inferred that the target molecule does indeed associate with the candidate binding molecule.
Binding affinities may be estimated by any suitable means known to those skilled in the art. Binding affinities for the purposes of this invention may be absolute or may be relative. Binding affinities may be determined biochemically, or may simply be estimated
by assessing the association of the candidate nucleic acid or polypeptide binding molecule with the target nucleic acid or polypeptide as described above. As used herein, the term binding affinity may refer to a simple estimation of the association of one component of the system with another.
Another suitable detection method for nucleic acid binding proteins is the use of target nucleic acid sequences linked to reporter constructs, such as bacterial luciferase or lacZ. Preferably, the reporter gene product can be measured using optical detection techniques. By way of example, a multiarray format could be used with a different candidate ligand in each position in the array (such as a microtitre plate well) and the same library of zinc finger proteins and target nucleic acid sequences at each position. The zinc finger proteins will generally be fused to a transcriptional activation domain such as the GAL4 acidic activation domain. Transcription may then be compared in the various wells and wells showing a variation in transcription compared to a control well with no ligand may be selected and the ligand further tested to identify specific target sequences/zinc finger proteins whose interaction is affected. These further tests may again be performed using an array format in which this time the ligand is kept constant and the target sequence/zinc fingers varied. Phage display techniques as described above may be used to simplify the isolation of suitable zinc finger proteins. Although described in the context of zinc fingers, this method could be applied to other nucleic acid binding molecules.
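By way of illustration only, the comparison of reporter output in ligand-containing wells with a no-ligand control well may be sketched as follows. The function name and the threshold of a 50% change relative to the control are assumptions made for the example; any suitable criterion for a variation in transcription could be substituted.

```python
def rank_wells(reporter_by_ligand: dict,
               control_signal: float,
               min_change: float = 0.5) -> list:
    """Return (ligand, relative_change) pairs for wells that differ from control.

    relative_change is (signal - control) / control, so +2.0 means a three-fold
    signal and -0.8 means an 80% reduction. Results are ordered by the size of
    the change, largest first.
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    hits = []
    for ligand, signal in reporter_by_ligand.items():
        change = (signal - control_signal) / control_signal
        if abs(change) >= min_change:
            hits.append((ligand, change))
    hits.sort(key=lambda item: abs(item[1]), reverse=True)
    return hits
```

Ligands returned by such a comparison would then be carried forward to the second stage, in which the ligand is held constant and the target sequences and zinc fingers are varied.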
Particular assays to determine if and the extent to which a ligand modulates protein-protein interactions include a fluorescence polarization assay, as described in Keenan et al (1998) Bioorganic and Medicinal Chemistry 6, 1309-1335. Other assays described in Keenan et al (supra) include assays for inducible Fas activation and for inducible transcriptional activation.
Briefly, in the inducible Fas activation assay, two fusion proteins are constructed, each comprising amino acids 175 to 304 of human Fas together with a first polypeptide or a second polypeptide respectively. Cell line clones expressing both constructs are plated in 96-well plates and treated the next day with serial dilutions of compound (ligand or candidate ligand) at typically 1 μM maximum concentration. Wells are assayed the next day
for viability with for example Alamar Blue or Trypan Blue. Controls can include, for example, untransfected cells. In the transcriptional activation assay, transcription factor fusions are expressed from the tricistronic vector pCGNN-F3p65/ZIF3 Neo. An HT1080 cell line (ATCC CCL-121) which contains an integrated secreted alkaline phosphatase (SEAP) target gene under control of a minimal interleukin 2 gene promoter and 12 ZFHD1 binding sites is generated as described in Rivera et al (1996) Nature Med 2, 1028. This cell line is transiently transfected with the fusion protein expressing construct, with and without incubation for 18-24 hours with the ligand or candidate ligand. Cell supernatant is removed and assayed for SEAP activity using any suitable phosphatase assay, for example, the assay described in Rivera et al (supra), taking into account background SEAP activity (as measured from mock transfected HT1080 cells).
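By way of illustration only, the serial dilutions of compound used in the inducible Fas activation assay may be laid out as follows. The 1 μM maximum concentration is taken from the description above; the three-fold dilution factor, the number of dilution points and the function name are assumptions made for the example.

```python
def serial_dilutions(top_um: float = 1.0,
                     factor: float = 3.0,
                     points: int = 8) -> list:
    """Return compound concentrations (in micromolar), highest first.

    Each concentration is the previous one divided by `factor`.
    """
    if top_um <= 0 or factor <= 1 or points < 1:
        raise ValueError("need top_um > 0, factor > 1 and points >= 1")
    return [top_um / factor ** i for i in range(points)]
```

For example, `serial_dilutions(1.0, 2.0, 4)` gives 1.0, 0.5, 0.25 and 0.125 μM, one concentration per well or column of the plate.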
Ligand mediated protein-protein interaction may also be assayed by way of a modified two hybrid assay. Thus, two fusion protein constructs are made, one comprising one of a pair of protein binding partners and the GAL4 binding domain, and the other comprising the other of the pair of protein binding partners and the VP16 activation domain. Expression of a reporter gene, for example, beta-galactosidase, is measured in the presence and absence of the candidate ligand.
It is envisaged that the methods of the invention may be applied in vivo; for example, they could be applied to the selection or isolation of nucleic acid or polypeptide binding molecules capable of associating with target nucleic acid or polypeptide in vivo inside one or more cells, in a manner analogous to the one-hybrid system.
It is envisaged that the methods of the invention may be practised in parallel. For example, multiple target nucleic acids or polypeptides could be used in a single selective step, thereby enabling multiple nucleic acid or polypeptide binding molecules to be isolated simultaneously, even in the same physical vessel. The multiple nucleic acid or polypeptide binding molecules may preferably be different from one another. The multiple nucleic acid or polypeptide binding molecules may have similar or identical binding specificities, or may preferably have different binding specificities.
The invention may be worked using multiple ligands, either separately or in combination. For example, a target nucleic acid or polypeptide sequence may be used to isolate binding molecules according to the methods essentially as disclosed above, with the modification that more than one ligand may be present. In this way, it is possible to isolate multiple nucleic acid or polypeptide binding molecules which require different ligands to bind to the same target nucleic acid or polypeptide sequence(s).
By way of example, a particular embodiment of the method of the invention is as follows:
1. Bacterial colonies containing phage libraries that express a library of zinc fingers randomised at one or more nucleic acid binding residues (see above) are transferred from plates to culture medium. Bacterial cultures are grown overnight at 30°C. Culture supernatant containing phages is obtained by centrifugation.
2. 10 pmol of biotinylated target nucleic acid immobilised on 50 mg streptavidin beads (Dynal) is incubated with 1 ml of the bacterial culture supernatant diluted 1 : 1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween for 1 hour at 20°C on a rolling platform as a preselection step to remove phage that bind to the target nucleic acid in the absence of a ligand.
3. After this time, 0.5 ml of phage solution is transferred to a streptavidin coated tube and incubated with biotinylated nucleic acid target site in the presence of a candidate ligand and 4 μg poly [d(I-C)]. After a one hour incubation the tubes are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween, and 3 times with PBS containing 50 μM ZnCl2 to remove non-binding phage.
4. The remaining phage are eluted using 0.1 ml 0.1 M triethylamine and the solution is neutralised with an equal volume of 1 M Tris-Cl (pH 7.4).
5. Logarithmic-phase E. coli TG1 cells are infected with eluted phage, and grown overnight, as described above, to prepare phage supernatants for subsequent rounds of selection.
6. After 4 rounds of selection (steps 1 to 5), bacteria are plated and phage prepared from 96 colonies are screened for binding to the nucleic acid target site in the presence and absence of the ligand. Binding reactions are carried out in wells of a streptavidin-coated microtitre plate (Boehringer Mannheim) and contain 50 μl of phage solution (bacterial culture supernatant diluted 1:1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween), 0.15 pmol nucleic acid target site and 0.25 μg poly [d(I-C)]. When added, the ligand is present at a concentration of about 1 μM.
7. After a one hour incubation the wells are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween (and also ligand at a concentration of 1 μM where appropriate), and 3 times with PBS containing 50 μM ZnCl2.
8. Bound phage are detected by ELISA (carried out in the presence of the ligand at a concentration of about 1 μM where appropriate) with horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices).
9. Single colonies of transformants obtained after four rounds of selection as described are grown overnight in culture. Single-stranded nucleic acid is prepared from phage in the culture supernatant and sequenced using the Sequenase 2.0 kit (U.S. Biochemical Corp.). The amino acid sequences of the zinc finger clones are deduced.
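The ELISA screen in steps 6 to 8 yields, for each clone, one signal measured with ligand and one without. A minimal Python sketch of how such paired readings might be sorted into classes is given below. The thresholds `min_signal` and `min_ratio`, the class names and the absorbance values are all assumptions made for illustration; they are not taken from this document.

```python
def classify_clones(elisa, min_signal=0.5, min_ratio=3.0):
    """Classify phage clones from paired ELISA readings.

    elisa: dict mapping clone name -> (signal with ligand, signal without ligand)
    min_signal, min_ratio: hypothetical cut-offs chosen for this example
    """
    result = {}
    for clone, (plus, minus) in elisa.items():
        if plus >= min_signal and plus >= min_ratio * minus:
            result[clone] = "ligand-dependent"      # binds only with ligand
        elif minus >= min_signal and minus >= min_ratio * plus:
            result[clone] = "ligand-inhibited"      # binds only without ligand
        elif plus >= min_signal and minus >= min_signal:
            result[clone] = "ligand-independent"    # binds in both conditions
        else:
            result[clone] = "unclassified"          # weak or ambiguous signal
    return result


# Illustrative absorbance values only (not measured data).
elisa = {
    "clone_1": (1.8, 0.1),
    "clone_2": (1.5, 1.4),
    "clone_3": (0.05, 0.04),
    "clone_4": (0.1, 1.2),
}
classes = classify_clones(elisa)
print(classes)
```

Clones in the first two classes are the ones whose binding is modulated by the ligand and would be the natural candidates for sequencing in step 9.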
By way of example, yet another particular embodiment of the method of the invention is as follows:
1. Bacteria containing a phage display library of zinc fingers randomised at one or more DNA binding residues (see section A.) are transferred from plates to culture medium. Bacterial cultures are grown overnight at 30°C. Culture supernatant containing phage is obtained by centrifugation and 200 μl is aliquoted into each well of a 96 well microtitre plate.
2. 96 different target DNA sequences are arrayed into the wells of a microtitre plate. At least 8 replica arrays (array plates) comprising 10 pmol DNA in wells of a streptavidin coated microtitre plate are produced for use in different rounds of selections. In this step and steps 3-4 of each round it may be convenient to use a robotic liquid handling system such as the Biomek FX (Beckman Coulter), in order to add reagents or transfer contents of a well to the corresponding well on a different plate (replica-transfer).
3. The bacterial culture supernatant in each well (from step 1 or step 5) is diluted 1:1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween. The phage mixture is replica-transferred into an array plate and binding to DNA is allowed to proceed for 1 hour at 20°C as a preselection step to remove phage that bind the target DNA in the absence of a ligand.
4. 90 μl of phage solution (i.e. non-retained phage) from each well is then replica-transferred into an array plate, but this time in the presence of a candidate DNA binding ligand and 1 μg poly [d(I-C)]. After a one hour incubation the wells are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween, and 3 times with PBS containing 50 μM ZnCl2 to remove non-binding phage. These washes may be performed by an automated plate washing system such as the Elx405 (Bio-Tek). After washing, the plate wells are emptied of all solutions.
5. The retained phage are eluted using 10 μl 0.1 M triethylamine, replica-transferred to a microtitre plate and the solutions neutralised with an equal volume of 1 M Tris-Cl (pH 7.4).
6. Eluted phage are replica-transferred to a deep-well/96-well plate in which wells contain logarithmic-phase E. coli TG1 cells. Infections are allowed to proceed for 1 h at 37°C, and cells are collected by centrifugation. The supernatant is discarded and replaced with 1 ml 2xTY medium containing Tet, and the bacteria are grown overnight to prepare phage supernatants for subsequent rounds of selection.
7. After 4 rounds of selection (steps 3 to 5), bacteria from each well (i.e. each different selection) are plated and phage prepared from 36 colonies are screened for binding to the corresponding DNA target site in the presence and absence of the ligand. Binding reactions are carried out in wells of a streptavidin-coated microtitre plate (Boehringer Mannheim) and contain 50 μl of phage solution (bacterial culture supernatant diluted 1:1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween), 0.15 pmol DNA target site and 0.25 μg poly [d(I-C)]. When added, the DNA binding ligand is present at a concentration of about 1 μM.
8. After a one hour incubation the wells are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween (and also ligand at a concentration of 1 μM where appropriate), and 3 times with PBS containing 50 μM ZnCl2.
9. Bound phage are detected by ELISA (carried out in the presence of the ligand at a concentration of about 1 μM where appropriate) with horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices).
10. Single colonies of transformants obtained after four rounds of selection as described are grown overnight in culture. Single-stranded DNA is prepared from phage in the culture supernatant and sequenced using the Sequenase™ 2.0 kit (U.S. Biochemical Corp.). The amino acid sequences of the zinc finger clones are deduced.
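The arrayed format above depends on each target DNA sequence keeping the same well position on every replica plate, so that replica-transfer always pairs a phage pool with its own target. The bookkeeping can be sketched in Python as below. The helper names and the placeholder target names are hypothetical; the plate dimensions (8 rows by 12 columns) and the figure of 8 replica plates follow the 96-well format and step 2 above.

```python
from string import ascii_uppercase


def plate_wells(rows=8, cols=12):
    """Well IDs of a microtitre plate in row-major order: A1..A12, B1..H12."""
    return [f"{ascii_uppercase[r]}{c}" for r in range(rows) for c in range(1, cols + 1)]


def array_targets(targets):
    """Assign each target sequence to one well of a 96-well plate."""
    wells = plate_wells()
    if len(targets) > len(wells):
        raise ValueError("more targets than wells")
    return dict(zip(wells, targets))


# Placeholder names standing in for 96 different target DNA sequences.
targets = [f"target_{i:02d}" for i in range(1, 97)]
layout = array_targets(targets)

# Replica arrays share the same well-to-target mapping, so a transfer from
# well X on one plate to well X on another always stays on the same target.
replica_plates = [dict(layout) for _ in range(8)]
print(layout["A1"], layout["B1"], layout["H12"])
```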
A modification of the above examples may be used to select polypeptide binding proteins. Briefly, bacteria containing phage libraries expressing a library of polypeptide binding proteins randomised at one or more residues as described above are screened against a biotinylated target polypeptide or protein, which has been immobilised on streptavidin coated beads, essentially as described above. Unbound phage are washed away, and bound phage are eluted and used to infect E. coli cells. After several rounds of selection, each round involving the above steps, phage are prepared and screened for binding to the target polypeptide or protein in the presence and absence of the ligand. Bound phage are detected by ELISA and identified, the corresponding colonies are amplified, and the DNA sequences of the polypeptide binding proteins are deduced. A modification of the above example using arrays of sequences in the wells of a microtitre plate may also be used to select protein switches.
In the above examples, only one target nucleic acid or polypeptide sequence was used. Where a library of nucleic acid or polypeptide sequences is used, the library of sequences can be screened using the ligand and selected phage expressing the zinc finger or other protein of interest to identify specific target nucleic acid or polypeptide sequences. This may conveniently be carried out with the nucleic acid or polypeptide sequences arrayed onto a solid substrate.
In the above example, the nucleic acid or polypeptide binding molecules (e.g., zinc fingers) are present on phage. However, alternative methods for displaying the nucleic acid or polypeptide binding molecules could be used. As described above, an entirely in vitro polysome display system has also been reported (Mattheakis et al, (1994) Proc Natl Acad Sci U S A, 91, 9022-6) in which nascent peptides are physically attached via the ribosome to the RNA which encodes them. Using a library of RNA/ribosomes expressing the nucleic acid or polypeptide binding molecules, screening is performed in a similar manner to the phage display method except that typically, after an initial preselection step to remove nucleic acid or polypeptide binding molecules that bind in the absence of the ligand, only one selection step is performed and the resulting nucleic acid or polypeptide binding molecules are identified by cloning the RNA from the RNA/ribosome complexes and sequencing the clones obtained.
To assist in isolating and/or identifying complexes comprising a target nucleic acid, a nucleic acid binding molecule and a ligand (or in the case of protein switches, a target protein, a polypeptide binding protein and a ligand), it may be desirable to label one or more of the components with a detectable label. For example, the nucleic acid or polypeptide may be labelled with a fluorescent tag and the nucleic acid (or polypeptide) binding molecule labelled with biotin, such that an enzyme conjugate such as streptavidin-horseradish peroxidase (HRP), that catalyses an optically detectable change in a substrate (different from the fluorescent tag) can be used. If the ligand is attached to a bead, then tripartite complexes can be detected because they will both fluoresce and give HRP activity. Labelling of one or more of the components with a detectable label is especially useful where screening is being performed on array(s) of the(se) component(s).
A further method which is useful where multiple candidate ligands are to be screened involves the use of beads to which are attached different peptide tags. Known combinatorial chemistry techniques are used to produce a library of beads whereby the peptide tag can be used to identify unambiguously the ligand attached to the same bead. Complexes comprising the ligand, a target nucleic acid and a nucleic acid binding molecule (or a ligand, a target polypeptide and a polypeptide binding protein) can be identified by the use of labelled target and binding molecules as described above. Beads comprising a tripartite complex can then be selected and the identity of the tag determined by spectroscopy techniques which will then give the identity of the ligand.
In general, a bead format is advantageous since it allows easier isolation of productive tripartite complexes and prescreening.
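The bead-based readout described above reduces to two checks per bead (fluorescence and HRP activity) followed by a lookup from peptide tag to ligand. A minimal Python sketch follows. The record layout, the tag strings and the ligand names are invented for illustration, and the sketch assumes that the tag on each bead has already been read.

```python
def decode_hits(beads, tag_to_ligand):
    """Return the ligands on beads that carry a tripartite complex.

    beads: list of dicts with keys 'tag', 'fluorescent' and 'hrp_active'
    tag_to_ligand: dict mapping each peptide tag to the ligand on the same bead
    A bead is scored as carrying a tripartite complex only if it both
    fluoresces (labelled target) and shows HRP activity (biotinylated binder).
    """
    hits = []
    for bead in beads:
        if bead["fluorescent"] and bead["hrp_active"]:
            hits.append(tag_to_ligand[bead["tag"]])
    return hits


# Hypothetical tags, ligands and readouts.
tag_to_ligand = {"TAG-1": "ligand_A", "TAG-2": "ligand_B", "TAG-3": "ligand_C"}
beads = [
    {"tag": "TAG-1", "fluorescent": True, "hrp_active": False},
    {"tag": "TAG-2", "fluorescent": True, "hrp_active": True},
    {"tag": "TAG-3", "fluorescent": False, "hrp_active": False},
]
found = decode_hits(beads, tag_to_ligand)
print(found)
```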
We describe a method by which nucleic acid binding molecules according to the invention may be advantageously used to determine the sequence composition of a sample of target nucleic acid. For example, a nucleic acid binding molecule according to the invention may be prepared which binds to a known target nucleic acid sequence. By applying this molecule to, or contacting it with, one or more test nucleic acid samples and monitoring its binding thereto, it is possible to determine whether said nucleic acid sample(s) contain the cognate nucleic acid recognition site of the nucleic acid binding molecule, and therefore derive information about the nucleotide composition of said nucleic acid test sample(s). Such analyses may be advantageously conducted using the binding site signature method (see Choo and Klug, (1994) PNAS (USA) 91:11163-67). Where a plurality of nucleic acid samples is being tested for possession of the cognate sequence, they may usefully be disposed in the form of an array for high throughput analysis.
Individual phage clones could advantageously be assayed for binding of their cognate nucleic acid sequence(s) in the presence or absence of individual ligands, to monitor which particular ligand modulates binding, i.e., binding between the nucleic acid and the nucleic acid binding molecule, or binding between a protein and a polypeptide binding molecule such as a protein.
Clearly, it may be that more than one ligand modulates binding of nucleic acid or polypeptide binding molecules to their cognate nucleic acid or polypeptide sequence(s). Preferably, individual nucleic acid or polypeptide binding molecules (i.e., phage clones) may be assayed for binding to target sequence(s) in the presence of discrete ligand mixtures, wherein each ligand mixture preferably contains a unique mixture of ligands. In this way, the particular ligands which may modulate binding of a particular nucleic acid or polypeptide binding molecule to its cognate target sequence may advantageously be determined. For example, if it is found that two mixtures - one lacking ligand X and the other lacking ligand Y - are incapable of inducing binding, then a mixture of ligands X and Y may have the effect of modulating the binding. This could advantageously be further investigated according to the methods of the invention as described herein.
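The reasoning in the X and Y example can be written as a simple leave-one-out deconvolution. The Python sketch below assumes, purely for illustration, that the complete mixture of all ligands does induce binding and that each test mixture omits exactly one ligand; the function name and the ligand labels are hypothetical.

```python
def required_ligands(all_ligands, binding_without):
    """Infer which ligands are needed for binding from leave-one-out mixtures.

    all_ligands: iterable of ligand names present in the complete mixture
    binding_without: dict mapping ligand -> True if binding is still observed
                     when that ligand alone is omitted from the mixture
    A ligand whose omission abolishes binding is scored as required.
    """
    return sorted(lig for lig in all_ligands if not binding_without[lig])


# Hypothetical outcome: omitting X or omitting Y abolishes binding.
ligands = ["W", "X", "Y", "Z"]
binding_without = {"W": True, "X": False, "Y": False, "Z": True}
needed = required_ligands(ligands, binding_without)
print(needed)
```

A result such as this only suggests that X and Y together may modulate binding; that combination would still need to be tested directly.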
It is envisaged that this invention may be advantageously used in the isolation of a ligand that is capable of modulating the association of a particular nucleic acid binding molecule or a particular polypeptide binding molecule with its target nucleic acid or polypeptide sequence.
We describe a method for isolating one or more ligands, said ligands each binding one or more target nucleic acid or polypeptide sequence(s), wherein said binding to one or more target nucleic acid or polypeptide sequence(s) modulates the binding of one or more nucleic acid or polypeptide binding molecules respectively, and wherein said nucleic acid or polypeptide binding molecule(s) and said ligands are different, said method comprising: (a) providing one or more target nucleic acid or polypeptide molecule(s); (b) contacting the target nucleic acid or polypeptide molecule(s) with one or more nucleic acid or polypeptide binding molecule(s); (c) providing a library of candidate ligands; (d) assessing the ability of candidate ligands to modulate the association of the nucleic acid or polypeptide binding molecule(s) with the respective target nucleic acid or polypeptide molecule(s); and (e) isolating those candidate ligands which modulate the association of the nucleic acid or polypeptide binding molecule(s) with the target nucleic acid or polypeptide molecule(s). One, two, or all three of the ligands, target nucleic acids and nucleic acid binding molecules may be in the form of an array.
In order to remove nucleic acid or polypeptide binding molecules (for example phage displayed polypeptides) which bind nucleic acid or polypeptide in a ligand-independent manner from a library, a pre-selection step may optionally be performed in the absence of ligand prior to each round of selection. This step removes from the library those clones which do not require ligand for nucleic acid or polypeptide binding. Optionally, candidate molecules selected in this manner may be screened by ELISA for binding to the nucleic acid or polypeptide target in the presence or absence of the ligand(s).
It is envisaged that the methods of the current invention may be advantageously applied to the selection of nucleic acid binding molecules capable of binding nucleic acids other than DNA, for example RNA. Structural considerations of RNA binding molecules are discussed in Afshar et al (Afshar et al, 1999: Curr. Op. Biotech, vol 10 pages 59-63). In particular, ligands suitable for use in the methods of the invention as applied to RNA include those ligands described above, or may be selected from aminoglycosides and their derivatives such as paromomycin, neomycin (for examples see Park et al, 1996: J. Am. Chem. Soc. vol 118 pp 10150-10155); aminoglycoside mimetics (Tok and Rando 1998: J. Am. Chem. Soc. vol 120 pp 8279-8280); acridine derivatives (for examples see Hamy et al, 1998: Biochemistry vol 37 pp 5086-5095); small peptides ('aptamers'); polycationic compounds (for examples see Wang et al, 1998: Tetrahedron 54 pp 7955-7976) or any other nucleic acid binding molecules known to those skilled in the art. In a preferred embodiment, derivatives or libraries of said nucleic acid binding ligands may be prepared.
Accordingly, we describe a method for isolating an RNA binding molecule which binds to a target RNA molecule in a manner modulatable by an RNA-binding ligand, wherein said RNA-binding ligand and said RNA-binding molecule are different, said method comprising: (a) providing a target RNA molecule; (b) contacting the target RNA molecule with an RNA-binding ligand, to produce an RNA-ligand complex; (c) assessing the ability of candidate RNA-binding molecules to bind the target RNA molecule and the RNA-ligand complex; and (d) isolating those candidate RNA-binding molecules which bind the target RNA molecule and RNA-ligand complex with different binding affinities. The candidate RNA-binding molecules may be in the form of an array.
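The final isolation step, which keeps candidates that bind the free RNA and the RNA-ligand complex with different affinities, can be illustrated in Python as follows. The use of dissociation constants as the affinity measure, the five-fold cut-off, and all names and values are assumptions made for this sketch; this document does not specify how large a difference in affinity is required.

```python
def differential_binders(kd_free, kd_complex, min_fold=5.0):
    """Select candidates whose affinity differs between free RNA and complex.

    kd_free:    dict mapping candidate -> Kd (nM) for the target RNA alone
    kd_complex: dict mapping candidate -> Kd (nM) for the RNA-ligand complex
    min_fold:   hypothetical cut-off on the Kd ratio in either direction
    Returns candidate -> Kd(free) / Kd(complex); a ratio above 1 means the
    candidate binds the RNA-ligand complex more tightly than the free RNA.
    """
    selected = {}
    for name, free in kd_free.items():
        fold = free / kd_complex[name]
        if fold >= min_fold or fold <= 1.0 / min_fold:
            selected[name] = fold
    return selected


# Illustrative values only (not measured data).
kd_free = {"m1": 500.0, "m2": 40.0, "m3": 10.0}
kd_complex = {"m1": 5.0, "m2": 50.0, "m3": 200.0}
switches = differential_binders(kd_free, kd_complex)
print(sorted(switches))
```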
It is further envisaged that the methods of the invention may be advantageously used to select nucleic acid sequences which allow binding of a particular ligand/nucleic acid binding molecule combination, or alternatively to select polypeptide sequences which allow binding of a particular ligand/polypeptide binding protein combination. For example, one may wish to isolate particular nucleic acid sequences to which a given nucleic acid binding molecule is able to bind, or particular polypeptide sequences to which a given polypeptide binding molecule is able to bind, or to isolate only those nucleic acid or polypeptide sequences which depend on the presence of ligand for the nucleic acid or polypeptide binding molecule to associate with them.
Accordingly, we describe a method for isolating target nucleic acid sequences to which a particular nucleic acid binding molecule will bind, said method comprising providing a library of target nucleic acid molecule(s); contacting said nucleic acid molecules with a nucleic acid binding molecule in the presence or absence of ligand; assessing the ability of the candidate target nucleic acid molecule(s) to bind the nucleic acid binding molecule; and isolating those target nucleic acid molecules which bind the nucleic acid binding molecule. We also describe a method for isolating target polypeptide sequences to which a particular polypeptide binding molecule will bind, said method comprising providing a library of target polypeptide molecule(s); contacting said polypeptide molecules with a polypeptide binding molecule in the presence or absence of ligand; assessing the ability of the candidate target polypeptide molecule(s) to bind the polypeptide binding molecule; and isolating those target polypeptide molecules which bind the polypeptide binding molecule.
A library of target nucleic acid or polypeptide molecule(s) according to the invention may preferably comprise a plurality of different nucleic acid or polypeptide molecules; preferably said nucleic acid or polypeptide molecules may be related to one another in terms of sequence homology.
The library of target nucleic acid molecule(s) may advantageously be in the form of an array of target nucleic acid molecule(s).
A library of candidate nucleic acid or polypeptide binding molecule(s) according to the invention may preferably comprise a plurality of different candidate nucleic acid or polypeptide binding proteins; preferably said candidate nucleic acid or polypeptide binding proteins may be related to one another in terms of amino acid sequence homology. The library of candidate nucleic acid molecule(s) may advantageously be in the form of an array of candidate nucleic acid molecule(s).
It is envisaged that this method could be advantageously used in order to isolate nucleic acid or polypeptide sequences which require ligand to associate with a known nucleic acid or polypeptide binding molecule. For example, there may be a nucleic acid or polypeptide sequence which is bound by a known nucleic acid or polypeptide binding molecule in a ligand-independent manner, and it may be desirable to find a nucleic acid or polypeptide sequence(s) which can also associate with the same wild-type nucleic acid or polypeptide binding molecule, but which do so in a ligand-modulatable manner. Preferably, this may be accomplished according to the above method of the present invention.
USES
The assay methods of the invention may be used to identify nucleic acid or polypeptide binding molecules, ligands and/or target nucleic acid or polypeptide where the binding of the binding molecule to the target is modulatable by the ligand.
These components, such as nucleic acid binding proteins according to the invention and identified by the assay methods of the invention, may be used individually or in combination in a wide variety of applications.
Thus, nucleic acid or polypeptide binding proteins according to the invention and identified by the assay methods of the invention may be employed in a wide variety of applications, including diagnostics and as research tools. Advantageously, they may be employed as diagnostic tools for identifying the presence of particular nucleic acid or polypeptide molecules in a complex mixture. Nucleic acid or polypeptide binding molecules according to the invention can preferably differentiate between different target nucleic acid or polypeptide molecules, and their binding affinities for the nucleic acid or polypeptide target sequences are preferably modulated by ligand(s). Nucleic acid or polypeptide binding molecules according to the invention are useful in switching or modulating gene expression, especially in gene therapy applications and agricultural biotechnology applications as described below.
In general, the polypeptides, nucleic acids, nucleic acid binding molecules and ligands may be used generally in regulating any biological process. In particular, the polypeptides, etc are suitable for regulating any biological process which is dependent on interaction between a nucleic acid binding molecule and a nucleic acid, or which is dependent on interaction between a polypeptide and another polypeptide. Examples of such biological processes include enzyme functions, signal transduction, protein and nucleic acid trafficking, macromolecular assembly, antibody-antigen interactions, DNA / gene transcription, translation, phosphorylation, methylation, replication, restriction, modification, ligation, transport, degradation, editing, splicing, integration and recombination, etc.
Specifically, targeted nucleic acid or polypeptide binding molecules, such as zinc fingers, according to the invention may moreover be employed in the regulation of gene transcription, for example by specific cleavage of nucleic acid sequences using a fusion polypeptide comprising a zinc finger targeting domain and a nucleic acid cleavage domain, or by fusion of a transcriptional effector domain to a zinc finger, to activate or repress transcription from a gene which possesses the zinc finger binding sequence in its upstream sequences.
A polypeptide binding protein according to the invention fused to a transcriptional effector domain may be used to target proteins bound to particular gene regulatory sequences such as promoters or enhancers, to turn on or off transcription of a gene. Gene transcription may also be increased or decreased from a promoter or enhancer containing zinc finger binding sequences, by making use of a fusion protein comprising a zinc finger fused to a polypeptide binding protein and another fusion protein comprising a protein which binds to the polypeptide binding protein fused to a transcriptional effector domain, for example, VP16.
Preferably, activation or repression only occurs in the presence of the ligand, since in a preferred embodiment the zinc fingers or polypeptide binding proteins will not bind their targets in the absence of the ligand. Alternatively, activation only occurs in the absence of the ligand, since the zinc fingers or polypeptide binding proteins may not bind their target nucleic acid or polypeptide sequences in the presence of the ligand. Zinc fingers capable of differentiating between U and T may be used to preferentially target RNA or nucleic acid, as required. Where RNA-targeting polypeptides are intended, these are included in the term "nucleic acid binding molecule".
Thus nucleic acid or polypeptide binding molecules according to the invention will typically require the presence of a transcriptional effector domain, such as an activation domain or a repressor domain. Examples of transcriptional activation domains include the VP16 and VP64 transactivation domains of Herpes Simplex Virus. Alternative transactivation domains are various and include the maize C1 transactivation domain sequence (Sainz et al, 1997, Mol. Cell. Biol. 17: 115-22) and PI (Goff et al, 1992, Genes Dev. 6: 864-75; Estruch et al, 1994, Nucleic Acids Res. 22: 3983-89) and a number of other domains that have been reported from plants (see Estruch et al, 1994, ibid).
Instead of incorporating a transactivator of gene expression, a repressor of gene expression can be fused to the nucleic acid binding protein or polypeptide binding protein and used to down regulate the expression of a gene contiguous with or incorporating the nucleic acid binding protein target sequence, or a gene bound by the target polypeptide of the polypeptide binding protein as described above. Such repressors are known in the art and include, for example, the KRAB-A domain (Moosmann et al., Biol. Chem. 378: 669-677 (1997)), the engrailed domain (Han et al, Embo J. 12: 2723-2733 (1993)) and the snag domain (Grimes et al, Mol. Cell. Biol. 16: 6263-6272 (1996)). These can be used alone or in combination to down-regulate gene expression.
Another possible application is the use of zinc fingers fused to nucleic acid cleavage moieties, such as the catalytic domain of a restriction enzyme, to produce a restriction enzyme capable of cleaving only target nucleic acid of a specific sequence (see Kim et al, (1996) Proc. Natl. Acad. Sci. USA 93:1156-1160). Using such approaches, different nucleic acid binding domains can be used to create restriction enzymes with any desired recognition nucleotide sequence, but which cleave nucleic acid conditionally dependent on the presence or absence of a particular ligand, for instance Distamycin A. It may also be possible to use enzymes other than those that cleave nucleic acids for a variety of purposes.
Moreover, many catalytic polypeptides are known, including naturally occurring proteins such as enzymes and engineered proteins such as catalytic antibodies. In addition, many catalytic RNAs are being developed by processes such as SELEX. It is a further application of this technology that tripartite systems can be isolated comprising a catalytically active first molecule and a substrate second molecule wherein the reaction between these components is modulated by a ligand.
In a preferred embodiment, the zinc finger polypeptides of the invention may be employed to detect the presence or absence of a particular target nucleic acid sequence in a sample. Similarly, the polypeptide binding proteins of the invention may be used to detect the presence or absence of a particular target polypeptide sequence in a sample.
We therefore describe a method for determining the presence of a target nucleic acid molecule, comprising the steps of: (a) preparing a nucleic acid binding protein by the method set forth above which is specific for the target nucleic acid molecule; (b) exposing a test system which may comprise the target nucleic acid molecule to the nucleic acid binding protein under conditions which promote binding, and removing any nucleic acid binding protein which remains unbound; (c) detecting the presence of the nucleic acid binding protein in the test system. To detect the presence of a target protein in a sample, the following steps may be taken: (a) preparing a polypeptide binding protein by the method set forth above which is specific for the target polypeptide molecule; (b) exposing a test system which may comprise the target polypeptide molecule to the polypeptide binding protein under conditions which promote binding, and removing any polypeptide binding protein which remains unbound; (c) detecting the presence of the polypeptide binding protein in the test system.
Other uses of the components of the gene switches may be envisaged, for example as described elsewhere in this document.
Thus, the methods disclosed here are suitable for screening of arrays of DNA with a known transcription factor, and a known or library of ligands for identification of new molecules which potentially modulate DNA-transcription factor interaction. Where one target DNA is used, this is preferably a natural or known target of the DNA binding molecule; thus, our method is capable of selecting for molecules which alter or modulate the interaction between a transcription factor and its DNA target.
In particular, RNA switches (i.e., gene switches comprising an RNA component, an RNA binding component and a ligand) may be used to modulate gene expression. For example, an RNA switch may be used to disrupt translation of an mRNA which comprises a binding site for the RNA binding component. Addition of ligand may cause the RNA binding molecule to bind to the RNA, and hence prevent or inhibit translation by, for example, sterically hindering ribosomes or tRNA binding. The binding site may be located in the coding sequence, or at the 5' or 3' UTR.
Furthermore, it has been demonstrated that RNA interactions with small molecules (e.g., ligands) may be used to control gene expression (Werstuck and Green, 1998, Science 282, 296-298). In this paper, short RNA aptamers that specifically bind to a wide variety of ligands in vitro are isolated from randomised pools of RNA, and are shown to bind their ligands in vivo. Insertion of a small molecule aptamer into the 5' UTR of an mRNA allows its translation to be repressible by ligand addition in vitro as well as in vivo. Accordingly, we envisage that RNA binding ligands isolated according to the methods of our invention may be used in controlling gene regulation in the system described in Werstuck and Green, as well as similar systems. Indeed, the ligands isolated according to our invention would be expected to be particularly useful, as in some embodiments they are capable of binding to both RNA and RNA binding molecules (e.g., RNA binding proteins). The binding of the ligands to the mRNA via the binding site at the 5' UTR allows recruitment of one or more RNA binding proteins, thus increasing steric hindrance.
The methods of selecting gene switches, particularly using arrays, may also be used in the treatment of diseases. It is known that certain diseases are associated with up-regulation or down-regulation of particular genes. An array comprising DNA sequences (including regulatory sequences, for example, promoters and enhancers) of genes of interest may be made. Such an array may be contacted with a cellular extract, for example, a nuclear extract, from a diseased patient, and also with a corresponding extract from a normal, undiseased individual. Changes in promoter occupancy in the diseased patient may be identified, for example, by probing the arrays with suitable probes against proteins of interest (for example, antibodies against particular transcription factors), and detecting transcription factors bound to the DNA sequences (e.g., promoter or enhancer sequences, etc). The transcription factor / DNA sequence binding pairs may additionally or separately be screened against one candidate ligand, or a library of candidate ligands, capable of modulating the interaction between the transcription factor and the DNA. Such ligands are suitable candidates for treatment of the particular diseases.
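By way of illustration only, the comparison of promoter occupancy between a diseased and a normal extract may be sketched as a simple calculation on array signal intensities. The sketch below is not part of the disclosed methods: the function name, the promoter labels, the signal values, the two-fold threshold and the background floor are all hypothetical assumptions chosen for the example.

```python
import math


def occupancy_changes(diseased, normal, min_fold=2.0, floor=1.0):
    """Return log2(diseased/normal) for promoters whose antibody signal
    differs by at least min_fold between the two extracts.

    diseased, normal: dicts mapping a promoter name to a signal intensity
    (hypothetical units). Signals below `floor` are raised to `floor`
    so that the ratio is always defined.
    """
    changes = {}
    # Only promoters measured on both arrays can be compared.
    for promoter in diseased.keys() & normal.keys():
        d = max(diseased[promoter], floor)
        n = max(normal[promoter], floor)
        log2_ratio = math.log2(d / n)
        if abs(log2_ratio) >= math.log2(min_fold):
            changes[promoter] = log2_ratio
    return changes


# Hypothetical signals from arrays probed with one transcription factor antibody.
diseased = {"promoter_A": 800.0, "promoter_B": 120.0, "promoter_C": 50.0}
normal = {"promoter_A": 200.0, "promoter_B": 110.0, "promoter_C": 400.0}

changes = occupancy_changes(diseased, normal)
for name in sorted(changes):
    print(name, round(changes[name], 2))
```

A positive value indicates increased occupancy in the diseased extract and a negative value indicates decreased occupancy. In practice the threshold would be set with reference to replicate arrays and suitable controls.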
Components of protein switches, and protein switches themselves, may be used in various ways. Protein-protein interactions underpin a wide variety of biological processes. These biological processes include signal transduction, which may be brought about by dimerisation of receptors. Thus, the methods of our invention may be used to select ligands which are capable of modulating the association between a first receptor molecule and a second receptor molecule, to regulate any process involved in signal transduction. Furthermore, protein-protein interactions are also involved in intracellular trafficking of proteins, nucleic acids, etc, and the methods of our invention may be used to identify ligands which are capable of modulating these processes. These ligands may be used as therapeutics and administered to a cell, tissue, organ or patient, to regulate processes involving protein-protein interactions, which processes are associated with a diseased state. For example, a disease may be identified which is caused by or associated with decreased signal transduction. A ligand may be identified which is capable of promoting dimerisation of receptors. Such a ligand may be used to treat that disease.
Many other processes involving protein-protein interactions are known in the art, and will be apparent to the skilled reader. Diseases associated with reduced, enhanced, or otherwise aberrant protein-protein interactions in these processes are also known, or readily identified. Assays for protein-protein interactions are also known in the art. Examples of systems to measure such interaction include, inter alia, the yeast two-hybrid system (see, e.g., Fields, Nature 340(6230):245-6 (1989) and Finley, R. L. Jr. & Brent R. (1996) in DNA Cloning - Expression Systems: A Practical Approach, eds. Glover D. & Hames B. D. (Oxford University Press, Oxford, England), pp. 169-203), immunoprecipitation (see, e.g., Current Protocols in Molecular Biology Volumes 2, §10.16, John Wiley & Sons, Inc. (1994-1998)), or the use of various sequence tags (e.g., TAG, His, etc.) that allow for the isolation of a polypeptide under nondenaturing conditions (see, e.g., Chen & Hai Gene 139(1):73-5 (1994); and Current Protocols in Molecular Biology Volumes 2, §§10.1A-B, 10.15, John Wiley & Sons, Inc. (1994-1998)).
REGULATION OF GENE EXPRESSION IN VIVO
In a particularly preferred embodiment of the present invention, nucleic acid binding molecules capable of binding to a target nucleic acid in a manner modulatable by a ligand, as well as the polypeptide binding molecules capable of binding to a target polypeptide in a manner modulatable by a ligand, are used to regulate expression from a gene in vivo.
The target gene may be endogenous to the genome of the cell or may be heterologous. However, in either case it will comprise a target nucleic acid sequence, such as a target nucleic acid sequence described above, to which a nucleic acid binding molecule of the invention binds in a manner modulatable by a ligand, or which is bound by the complex consisting of a polypeptide and a polypeptide binding protein. Where the nucleic acid binding molecule is a polypeptide, it may typically be expressed from a nucleic acid construct present in the host cell comprising the target sequence. A polypeptide binding protein may similarly be expressed. Such a nucleic acid construct is preferably stably integrated into the genome of the host cell, but this is not essential.
Thus in the case of polypeptide nucleic acid binding molecules, a host cell according to the invention comprises a target nucleic acid sequence and a construct capable of directing expression of the nucleic acid binding molecule in the cell. If a polypeptide binding protein is used, the host cell may comprise a target nucleic acid sequence and a construct capable of directing expression of the polypeptide binding molecule in the cell.
Suitable constructs for expressing the nucleic acid or polypeptide binding molecule are known in the art and are described above. The coding sequence may be expressed constitutively or be regulated. Expression may be ubiquitous or tissue-specific. Suitable regulatory sequences are known in the art and are also described above. Thus the nucleic acid construct will comprise a nucleic acid sequence encoding a nucleic acid binding molecule or a polypeptide binding molecule operably linked to a regulatory sequence capable of directing expression of the nucleic acid or polypeptide binding molecule in a host cell.
It may also be desirable to use target nucleic acid sequences that include operably linked neighbouring sequences that bind transcriptional regulatory proteins, such as transactivators. Preferably the transcriptional regulatory proteins are endogenous to the cell. If not, they typically will need to be introduced into the host cell using suitable nucleic acid constructs.
Techniques for introducing nucleic acid constructs into host cells are known in the art for both prokaryotic and eukaryotic cells, including yeast, fungi, plant and animal cells. Many of these techniques are mentioned below in the section on the production of transgenic organisms.
Regulation of expression of the gene of interest, which comprises a second coding sequence operably linked to the target nucleic acid sequence, is typically achieved by administering to the cell a ligand according to the invention. Typically, the ligand is a molecule such as Distamycin A which may be administered exogenously to the cell and taken up by the cell, whereupon it may contact the nucleic acid or polypeptide binding molecule and modulate its binding, directly or indirectly, to the target sequence. For example, two proteins may interact in such a way that they bind to the target sequence only when bound to each other (i.e., dimerised), in which case an antibody which modulates the interaction between two protein binding partners may be used to modulate binding of the proteins to the target sequence. Such antibody ligands may be identified by screening a library of randomised antibodies with the methods of our invention. However, polypeptide ligands may also be introduced into the cell either directly or by introducing suitable nucleic acid vectors, including viruses.
The target nucleic acid sequence and the nucleic acid construct encoding the nucleic acid or polypeptide binding molecule are preferably stably integrated into the genome of the host cell. Where the host cell is a single celled organism or part of a multicellular organism, the resulting organism may be termed transgenic. The target nucleic acid may, in a preferred embodiment, be a naturally occurring sequence for which a corresponding nucleic acid or polypeptide binding molecule and ligand have been identified using the screening methods of the invention.
The term "multicellular organism" here denotes all multicellular plants, fungi and animals except humans, i.e. prokaryotes and unicellular eukaryotes are excluded specifically. The term also includes an individual organism in all stages of development, including embryonic and fetal stages. A "transgenic" multicellular organism is any multicellular organism containing cells that bear genetic information received, directly or indirectly, by deliberate genetic manipulation at the subcellular level, such as by microinjection or infection with recombinant virus. Preferably, the organism is transgenic by virtue of comprising at least a heterologous nucleotide sequence encoding a nucleic acid binding molecule (or a polypeptide binding molecule) or target nucleic acid as herein defined.
"Transgenic" in the present context does not encompass classical crossbreeding or in vitro fertilization, but rather denotes organisms in which one or more cells receive a recombinant nucleic acid molecule. Transgenic organisms obtained by subsequent classical crossbreeding or in vitro fertilization of one or more transgenic organisms are included within the scope of the term "transgenic".
The term "germline transgenic organism" refers to a transgenic organism in which the genetic information has been taken up and incorporated into a germline cell, therefore conferring the ability to transfer the information to offspring. If such offspring, in fact, possess some or all of that information, then they, too, are transgenic multicellular organisms within the scope of the present invention.
The information to be introduced into the organism is preferably foreign to the species of animal to which the recipient belongs (i.e., "heterologous"), but the information may also be foreign only to the particular individual recipient, or genetic information already possessed by the recipient. In the last case, the introduced gene may be differently expressed than is the native gene.
"Operably linked" refers to polynucleotide sequences which are necessary to effect the expression of coding and non-coding sequences to which they are ligated. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence; in eukaryotes, generally, such control sequences include promoters and a transcription termination sequence. The term "control sequences" is intended to include, at a minimum, components whose presence can influence expression, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.
Since the nucleic acid constructs are typically to be integrated into the host genome, it is important to include sequences that will permit expression of polypeptides in a particular genomic context. One possible approach would be to use homologous recombination to replace all or part of the endogenous gene whose expression it is desired to regulate with equivalent sequences comprising a target nucleic acid in its regulatory sequences. This should ensure that the gene is subject to the same transcriptional regulatory mechanisms as the endogenous gene, with the exception of the target nucleic acid sequence. Alternatively, homologous recombination may be used in a similar manner but with the regulatory sequences also replaced so that the gene is subject to a different form of regulation.
However, if the construct encoding either the nucleic acid binding molecule (or polypeptide binding molecule) or target nucleic acid is placed randomly in the genome, it is possible that the chromatin in that region will be transcriptionally silent and in a condensed state. If this occurs, then the polypeptide will not be expressed; these are termed position-dependent effects. To overcome this problem, it may be desirable to include locus control regions (LCRs) that maintain the intervening chromatin in a transcriptionally competent open conformation. LCRs (also known as scaffold attachment regions (SARs) or matrix attachment regions (MARs)) are well known in the art, an example being the chicken lysozyme A element (Stief et al, 1989, Nature 341: 343), which can be positioned around an expressible gene of interest to effect an increase in overall expression of the gene and diminish position-dependent effects upon incorporation into the organism's genome (Stief et al, 1989, supra). Another example is the CD2 gene LCR described by Lang et al, 1991, Nucl. Acid. Res. 19: 5851-5856.
Thus, a polynucleotide construct for use in the present invention, to introduce a nucleotide sequence encoding a nucleic acid or polypeptide binding molecule into the genome of a multicellular organism, typically comprises a nucleotide sequence encoding the nucleic acid or polypeptide binding molecule operably linked to a regulatory sequence capable of directing expression of the coding sequence. In addition, the polynucleotide construct may comprise flanking sequences homologous to the host cell organism genome to aid in integration. An alternative approach would be to use viral vectors that are capable of integrating into the host genome, such as retroviruses.
Preferably, a nucleotide construct for use in the present invention further comprises flanking LCRs.
CONSTRUCTION OF TRANSGENIC ORGANISMS EXPRESSING NUCLEIC ACID BINDING MOLECULES
A transgenic organism of the invention is preferably a multicellular eukaryotic organism, such as an animal, a plant or a fungus. Animals include animals of the phyla cnidaria, ctenophora, platyhelminthes, nematoda, annelida, mollusca, chelicerata, uniramia, crustacea and chordata. Uniramians include the subphylum hexapoda that includes insects such as the winged insects. Chordates include vertebrate groups such as mammals, birds, reptiles and amphibians. Particular examples of mammals include non-human primates, cats, dogs, ungulates such as cows, goats, pigs, sheep and horses and rodents such as mice, rats, gerbils and hamsters.
Plants include the seed-bearing plants (angiosperms) and conifers. Angiosperms include dicotyledons and monocotyledons. Examples of dicotyledonous plants include tobacco (Nicotiana plumbaginifolia and Nicotiana tabacum), arabidopsis (Arabidopsis thaliana), Brassica napus, Brassica nigra, Datura innoxia, Vicia narbonensis, Vicia faba, pea (Pisum sativum), cauliflower, carnation and lentil (Lens culinaris). Examples of monocotyledonous plants include cereals such as wheat, barley, oats and maize.
PRODUCTION OF TRANSGENIC ANIMALS
Techniques for producing transgenic animals are well known in the art. A useful general textbook on this subject is Houdebine, Transgenic animals - Generation and Use (Harwood Academic, 1997) - an extensive review of the techniques used to generate transgenic animals from fish to mice and cows.
Advances in technologies for embryo micromanipulation now permit introduction of heterologous nucleic acid into, for example, fertilized mammalian ova. For instance, totipotent or pluripotent stem cells can be transformed by microinjection, calcium phosphate mediated precipitation, liposome fusion, retroviral infection or other means; the transformed cells are then introduced into the embryo, and the embryo then develops into a transgenic animal. In a highly preferred method, developing embryos are infected with a retrovirus containing the desired nucleic acid, and transgenic animals produced from the infected embryo. In a most preferred method, however, the appropriate nucleic acids are coinjected into the pronucleus or cytoplasm of embryos, preferably at the single cell stage, and the embryos allowed to develop into mature transgenic animals. These techniques are well known. See reviews of standard laboratory procedures for microinjection of heterologous nucleic acids into mammalian fertilized ova, including Hogan et al, Manipulating the Mouse Embryo, (Cold Spring Harbor Press 1986); Krimpenfort et al, Bio/Technology 9:844 (1991); Palmiter et al, Cell, 41: 343 (1985); Kraemer et al, Genetic manipulation of the Mammalian Embryo, (Cold Spring Harbor Laboratory Press 1985); Hammer et al, Nature, 315: 680 (1985); Wagner et al, U.S. Pat. No. 5,175,385; Krimpenfort et al, U.S. Pat. No. 5,175,384, the respective contents of which are incorporated herein by reference.
Another method used to produce a transgenic animal involves microinjecting a nucleic acid into pro-nuclear stage eggs by standard methods. Injected eggs are then cultured before transfer into the oviducts of pseudopregnant recipients.
Transgenic animals may also be produced by nuclear transfer technology as described in Schnieke, A.E. et al, 1997, Science, 278: 2130 and Cibelli, J.B. et al, 1998, Science, 280: 1256. Using this method, fibroblasts from donor animals are stably transfected with a plasmid incorporating the coding sequences for a binding domain or binding partner of interest under the control of regulatory sequences. Stable transfectants are then fused to enucleated oocytes, cultured and transferred into female recipients.
Analysis of animals which may contain transgenic sequences would typically be performed by either PCR or Southern blot analysis following standard methods.
By way of a specific example for the construction of transgenic mammals, such as cows, nucleotide constructs comprising a sequence encoding a nucleic acid binding molecule are microinjected using, for example, the technique described in U.S. Pat. No. 4,873,191, into oocytes which are obtained from ovaries freshly removed from the mammal. The oocytes are aspirated from the follicles and allowed to settle before fertilization with thawed frozen sperm capacitated with heparin and prefractionated by Percoll gradient to isolate the motile fraction.
The fertilized oocytes are centrifuged, for example, for eight minutes at 15,000 g to visualize the pronuclei for injection and then cultured from the zygote to morula or blastocyst stage in oviduct tissue-conditioned medium. This medium is prepared by using luminal tissues scraped from oviducts and diluted in culture medium. The zygotes must be placed in the culture medium within two hours following microinjection.
Oestrus is then synchronized in the intended recipient mammals, such as cattle, by administering coprostanol. Oestrus is produced within two days and the embryos are transferred to the recipients 5-7 days after oestrus. Successful transfer can be evaluated in the offspring by Southern blot.
Alternatively, the desired constructs can be introduced into embryonic stem cells (ES cells) and the cells cultured to ensure modification by the transgene. The modified cells are then injected into the blastula embryonic stage and the blastulas replaced into pseudopregnant hosts. The resulting offspring are chimeric with respect to the ES and host cells, and nonchimeric strains which exclusively comprise the ES progeny can be obtained using conventional cross-breeding. This technique is described, for example, in WO91/10741.
PRODUCTION OF TRANSGENIC PLANTS
Techniques for producing transgenic plants are well known in the art. Typically, either whole plants, cells or protoplasts may be transformed with a suitable nucleic acid construct encoding a nucleic acid binding molecule or a polypeptide binding molecule or target nucleic acid (see above for examples of nucleic acid constructs). There are many methods for introducing transforming nucleic acid constructs into cells, but not all are suitable for delivering nucleic acid to plant cells. Suitable methods include Agrobacterium infection (see, among others, Turpen et al, 1993, J. Virol. Methods, 42: 227-239) or direct delivery of nucleic acid such as, for example, by PEG-mediated transformation, by electroporation or by acceleration of nucleic acid coated particles. Acceleration methods are generally preferred and include, for example, microprojectile bombardment. A typical protocol for producing transgenic plants (in particular monocotyledons), taken from U.S. Patent No. 5,874,265, is described below.
An example of a method for delivering transforming nucleic acid segments to plant cells is microprojectile bombardment. In this method, non-biological particles may be coated with nucleic acids and delivered into cells by a propelling force. Exemplary particles include those comprised of tungsten, gold, platinum, and the like.
A particular advantage of microprojectile bombardment, in addition to it being an effective means of reproducibly stably transforming both dicotyledons and monocotyledons, is that neither the isolation of protoplasts nor the susceptibility to Agrobacterium infection is required. An illustrative embodiment of a method for delivering nucleic acid into plant cells by acceleration is a Biolistics Particle Delivery System, which can be used to propel particles coated with nucleic acid through a screen, such as a stainless steel or Nytex screen, onto a filter surface covered with plant cells cultured in suspension. The screen disperses the tungsten-nucleic acid particles so that they are not delivered to the recipient cells in large aggregates. It is believed that without a screen intervening between the projectile apparatus and the cells to be bombarded, the projectiles aggregate and may be too large for attaining a high frequency of transformation. This may be due to damage inflicted on the recipient cells by projectiles that are too large.
For the bombardment, cells in suspension are preferably concentrated on filters. Filters containing the cells to be bombarded are positioned at an appropriate distance below the macroprojectile stopping plate. If desired, one or more screens are also positioned between the gun and the cells to be bombarded. Through the use of techniques set forth herein one may obtain up to 1000 or more clusters of cells transiently expressing a marker gene ("foci") on the bombarded filter. The number of cells in a focus which express the exogenous gene product 48 hours post-bombardment often ranges from 1 to 10 and averages 2 to 3.
After effecting delivery of exogenous nucleic acid to recipient cells by any of the methods discussed above, a preferred step is to identify the transformed cells for further culturing and plant regeneration. This step may include assaying cultures directly for a screenable trait or by exposing the bombarded cultures to a selective agent or agents.
An example of a screenable marker trait is the red pigment produced under the control of the R-locus in maize. This pigment may be detected by culturing cells on a solid support containing nutrient media capable of supporting growth at this stage, incubating the cells at, e.g., 18°C and greater than 180 μE m⁻² s⁻¹, and selecting cells from colonies (visible aggregates of cells) that are pigmented. These cells may be cultured further, either in suspension or on solid media.
An exemplary embodiment of methods for identifying transformed cells involves exposing the bombarded cultures to a selective agent, such as a metabolic inhibitor, an antibiotic, herbicide or the like. Cells which have been transformed and have stably integrated a marker gene conferring resistance to the selective agent used, will grow and divide in culture. Sensitive cells will not be amenable to further culturing.
To use the bar-bialaphos selective system, bombarded cells on filters are resuspended in nonselective liquid medium, cultured (e.g. for one to two weeks) and transferred to filters overlaying solid medium containing from 1-3 mg/l bialaphos. While ranges of 1-3 mg/l will typically be preferred, it is proposed that ranges of 0.1-50 mg/l will find utility in the practice of the invention. The type of filter for use in bombardment is not believed to be particularly crucial, and can comprise any solid, porous, inert support.
Cells that survive the exposure to the selective agent may be cultured in media that supports regeneration of plants. Tissue is maintained on a basic media with hormones for about 2-4 weeks, then transferred to media with no hormones. After 2-4 weeks, shoot development will signal the time to transfer to another media.
Regeneration typically requires a progression of media whose composition has been modified to provide the appropriate nutrients and hormonal signals during sequential developmental stages from the transformed callus to the more mature plant. Developing plantlets are transferred to soil, and hardened, e.g., in an environmentally controlled chamber at about 85% relative humidity, 600 ppm CO2, and 250 μE m⁻² s⁻¹ of light. Plants are preferably matured either in a growth chamber or greenhouse. Regeneration will typically take about 3-12 weeks. During regeneration, cells are grown on solid media in tissue culture vessels. An illustrative embodiment of such a vessel is a petri dish. Regenerating plants are preferably grown at about 19°C to 28°C. After the regenerating plants have reached the stage of shoot and root development, they may be transferred to a greenhouse for further growth and testing.
Genomic DNA may be isolated from callus cell lines and plants to determine the presence of the exogenous gene through the use of techniques well known to those skilled in the art such as PCR and/or Southern blotting.
Several techniques exist for inserting the genetic information, the two main principles being direct introduction of the genetic information and introduction of the genetic information by use of a vector system. A review of the general techniques may be found in articles by Potrykus (Annu Rev Plant Physiol Plant Mol Biol [1991] 42:205-225) and Christou (Agro-Food-Industry Hi-Tech March/April 1994 17-27).
Thus, in one aspect, the present invention relates to a vector system which carries a construct encoding a nucleic acid or polypeptide binding molecule or target nucleic acid according to the present invention and which is capable of introducing the construct into the genome of an organism, such as a plant.
The vector system may comprise one vector, but it can comprise at least two vectors. In the case of two vectors, the vector system is normally referred to as a binary vector system. Binary vector systems are described in further detail in Gynheung An et al. (1980), Binary Vectors, Plant Molecular Biology Manual A3, 1-19.
One extensively employed system for transformation of plant cells with a given promoter or nucleotide sequence or construct is based on the use of a Ti plasmid from Agrobacterium tumefaciens or a Ri plasmid from Agrobacterium rhizogenes (An et al. (1986), Plant Physiol. 81, 301-305 and Butcher D.N. et al. (1980), Tissue Culture Methods for Plant Pathologists, eds.: D.S. Ingrams and J.P. Helgeson, 203-208).
Several different Ti and Ri plasmids have been constructed which are suitable for the construction of the plant or plant cell constructs described above.
EXAMPLES OF SPECIFIC APPLICATIONS
The nucleic acid (or polypeptide) binding molecule/ target nucleic acid (or polypeptide)/ ligand combination may be used to regulate the expression of a nucleotide sequence of interest, such as in a cell of an organism, including prokaryotes, yeasts, fungi, plants and animals, for example mammals, including humans.
Nucleotide sequences of interest include genes associated with disease in humans and animals and therapeutic genes. Thus a nucleic acid or polypeptide binding molecule may be used in conjunction with a target nucleic acid or polypeptide sequence and ligand in a method of treating or preventing disease in an animal or human patient.
Alternatively, a switching system, whether a gene switch or a protein switch, may be used to regulate expression of a nucleotide sequence of interest in a plant. Examples of specific applications include the following:
1. Improvement of ripening characteristics in fruit. A number of genes have been identified that are involved in the ripening process (such as in ethylene biosynthesis). Control of the ripening process via regulation of the expression of those genes will help reduce significant losses via spoilage.
2. Modification of plant growth characteristics through intervention in hormonal pathways. Many plant characteristics are controlled by hormones. Regulation of the genes involved in the production of and response to hormones will enable the production of crops with altered characteristics.
3. Improvement of other characteristics by manipulation of plant gene expression. Overexpression of the Na+/H+ antiport gene has resulted in enhanced salt tolerance in Arabidopsis. Targeted zinc fingers could be used to regulate the endogenous gene.
4. Improvement of plant aroma and flavour. Pathways leading to the production of aroma and flavour compounds in vegetables and fruit are currently being elucidated allowing the enhancement of these traits using gene switch technology.
5. Improving the pharmaceutical and nutraceutical potential of plants. Many pharmaceutically active compounds are known to exist in plants, but in many cases production is limited due to insufficient biosynthesis in plants. Gene switch technology could be used to overcome this limitation by upregulating specific genes or biochemical pathways. Other uses include regulating the expression of genes involved in biosynthesis of commercially valuable compounds that are toxic to the development of the plant.
6. Reducing harmful plant components. Some plant components lead to adverse allergic reaction when ingested in food. Gene switch technology could be used to overcome this problem by downregulating specific genes responsible for these reactions.
7. As well as modulating the expression of endogenous genes, heterologous genes may be introduced whose expression is regulated by a gene switch of the invention.
For example, a nucleotide sequence of interest may encode a gene product that is preferentially toxic to cells of the male or female organs of the plant such that the ability of the plant to reproduce can be regulated. Alternatively, or in addition, the regulatory sequences to which the nucleotide sequence is operably linked may be tissue-specific such that expression when induced only occurs in male or female organs of the plant. Suitable sequences and/or gene products are described in WO89/10396, WO92/04454 (the TA29 promoter from tobacco) and EP-A-344,029, EP-A-412,006 and EP-A-412,911.
Particularly, the methods according to our invention allow the screening of ligands that affect the binding of a nucleic acid binding molecule, for example, a DNA binding protein such as a transcription factor, to its binding sites (e.g., DNA). For example, a transcription factor such as c-myc may be used to bind to an array of all the putative promoter sites in the genome in the presence or absence of one or more ligands. The transcription factor may be recombinant, or purified from a cell extract, or present in a cell extract or extract from a subcellular compartment, e.g. the nucleus. For this purpose, the array may contain for example the promoters of therapeutically important genes, or a subsection of these, for example, cytokine genes. Thus, ligands that are capable of affecting transcription factor binding to certain genes (including genes of interest) may be isolated and used for example in therapy.
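Purely as an illustrative sketch, the comparison of transcription factor binding in the presence and absence of candidate ligands may be expressed as a ratio of array signals per promoter. The function name, ligand and promoter labels, signal values and two-fold threshold below are hypothetical assumptions made for the example and are not taken from the Examples.

```python
def classify_ligand_effects(baseline, with_ligand, min_fold=2.0, floor=1.0):
    """Classify each (ligand, promoter) pair by its effect on binding.

    baseline: dict mapping promoter name to signal without ligand.
    with_ligand: dict mapping ligand name to a dict of promoter signals
    measured in the presence of that ligand.
    Pairs whose signal changes by less than min_fold are omitted.
    """
    effects = {}
    for ligand, signals in with_ligand.items():
        for promoter, signal in signals.items():
            # Raise very low signals to the floor so the ratio is defined.
            ratio = max(signal, floor) / max(baseline[promoter], floor)
            if ratio >= min_fold:
                effects[(ligand, promoter)] = "enhanced"
            elif ratio <= 1.0 / min_fold:
                effects[(ligand, promoter)] = "inhibited"
    return effects


# Hypothetical signals for one transcription factor on a small promoter array.
baseline = {"IL2_promoter": 400.0, "IL6_promoter": 100.0}
with_ligand = {
    "ligand_1": {"IL2_promoter": 90.0, "IL6_promoter": 105.0},
    "ligand_2": {"IL2_promoter": 410.0, "IL6_promoter": 350.0},
}

effects = classify_ligand_effects(baseline, with_ligand)
for pair in sorted(effects):
    print(pair, effects[pair])
```

A ligand that changes binding at only a subset of promoters, as in this hypothetical data, would be a candidate for selective modulation of the corresponding genes.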
The Examples demonstrate the above as well as other uses. In some of the Examples, the proteins and ligands used are known to affect each other and the DNA sequences are quite similar. However, it should be appreciated that all three components could have been chosen arbitrarily. The Examples describe experiments in which the protein (nucleic acid binding molecule) was displayed on phage, but the nucleic acid binding molecule could have been identified by other means. For example, the nucleic acid binding molecule (e.g., protein) could have been epitope tagged, or an antibody to the native protein could have been used, or the protein could have been fluorescently tagged or made as a GFP fusion, etc. Likewise, although the protein is expressed by E. coli on phage in the Examples, it could have been produced by any means known in the art including in vitro translation or, importantly, isolation from a relevant cell type.
The protein used in the Examples is relatively pure, but our methods do not require this. Thus, for example, a cell extract could have been used in the assays, preferably a mammalian cell extract in which the protein is expressed, e.g. HeLa cell extract (for example, for Sp1 protein). Proteins isolated from relevant cell extracts may be modified (including being glycosylated, phosphorylated, or otherwise post-translationally modified) and the drug (candidate ligand) could affect the modification. The relevant protein in the cell extract may be capable of binding to the DNA together with other proteins present in the extract. The DNA array can be stained with antibodies to any protein thought to bind the extract, for example, an Sp1 accessory protein. Moreover, the test chemical (ligand) may bind and 'knock out' any component of the DNA binding complex, not just the target (epitope tagged) protein. Cells may be pre-treated with chemicals or modified in other ways (for example, by altering growth conditions) prior to the preparation of extracts and testing on DNA arrays.
Arrays may be RNA, and RNA-binding proteins may be assayed according to our invention. The RNAs may be short synthetic fragments or full length mRNAs. Not only RNA binding, but also RNA processing, editing and splicing may be assayed. This enables the discovery of drugs which, for example, selectively inhibit the splicing of a particular exon pair in a single mRNA or a family of RNAs. Furthermore, other processes such as RNA trans splicing may be assayed instead of protein binding.
Assays according to our methods involving wild-type proteins may be used to identify protein-drug-DNA combinations for use as gene switches, as well as to discover therapeutics.
The present invention will now be described by way of the following examples, which are illustrative only and non-limiting. The examples refer to the figures:
EXAMPLES
Example 1 - Preparation and Screening of a Zinc Finger Phage Display Library
Selection Of Zinc Finger Phage Binding DNA Targets In The Presence Of Small Molecules
Example 1.1 Selection of Zinc Finger Phage that Bind DNA In The Presence Of Distamycin A
A powerful method of selecting DNA binding proteins is the cloning of peptides (Smith (1985) Science 228, 1315-1317), or protein domains (McCafferty et al, (1990) Nature 348:552-554; Bass et al, (1990) Proteins 8:309-314), as fusions to the minor coat protein (pIII) of bacteriophage fd, which leads to their expression on the tip of the capsid. A phage display library is created comprising variants of the middle finger from the DNA binding domain of Zif268.
Materials And Methods
Construction And Cloning Of Genes.
In general, procedures and materials are in accordance with guidance given in Sambrook et al, Molecular Cloning. A Laboratory Manual, Cold Spring Harbor, 1989. The gene for the Zif268 fingers (residues 333-420) is assembled from 8 overlapping synthetic oligonucleotides (see Choo and Klug, (1994) PNAS (USA) 91:11163-67), giving SfiI and NotI overhangs. The genes for fingers of the phage library are synthesised from 4 oligonucleotides by directional end to end ligation using 3 short complementary linkers, and amplified by PCR from the single strand using forward and backward primers which contain sites for NotI and SfiI respectively. Backward PCR primers in addition introduce Met-Ala-Glu as the first three amino acids of the zinc finger peptides, and these are followed by the residues of the wild type or library fingers as required. Cloning overhangs are produced by digestion with SfiI and NotI where necessary. Fragments are ligated to 1 μg similarly prepared Fd-Tet-SN vector. This is a derivative of fd-tet-DOG1 (Hoogenboom et al, (1991) Nucleic Acids Res. 19, 4133-4137) in which a section of the pelB leader and a restriction site for the enzyme SfiI (underlined) have been added by site-directed mutagenesis using the oligonucleotide:
5' CTCCTGCAGTTGGACCTGTGCCATGGCCGGCTGGGCCGCATAGAATGGAACAACTAAAGC 3' (SEQ ID No. 1)
which anneals in the region of the polylinker. Electrocompetent DH5α cells are transformed with recombinant vector in 200 ng aliquots, grown for 1 hour in 2xTY medium with 1% glucose, and plated on TYE containing 15 μg/ml tetracycline and 1% glucose.
The zinc finger phage display library of the present invention contains amino acid randomisations in putative base-contacting positions from the second and third zinc fingers of the three-finger DNA binding domain of Zif268, and contains members that bind DNA of the sequence XXXXXGGCG where X is any base. Further details of the library used may be found in WO 98/53057, which is incorporated herein by reference. The DNA sequences AAAAAAGGCG and AAAAAAGGCGAAAAAA are used as selection targets in this example because short runs of adenines can cause intrinsic DNA bending - moreover, the structure of the bend can be disrupted by binding of the antibiotic distamycin A.
Phage Selection.
Bacterial colonies containing zinc finger phage libraries are transferred from plates to 200 ml 2xTY medium (16 g/litre Bactotryptone, 10 g/litre Bactoyeast extract, 5 g/litre NaCl) containing 50 μM ZnCl2 and 15 μg/ml tetracycline. Bacterial cultures are grown overnight at 30°C. Culture supernatant containing phages is obtained by centrifuging at 1500xg for 5 minutes.
Phage selection is over 4 rounds. Before each round, a pre-selection step is included comprising binding of 10 pmol of biotinylated DNA target sites immobilised on 50 mg streptavidin coated beads (Dynal) to 1 ml of phage solution (bacterial culture supernatant diluted 1:1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween), for 1 hour at 20°C on a rolling platform. After this time, 0.5 ml of phage solution is transferred to a streptavidin coated tube and incubated with 2 pmol biotinylated DNA target site in the presence of 2 μM distamycin A (Sigma) and 4 μg poly [d(I-C)]. After a one hour incubation the tubes are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween, and 3 times with PBS containing 50 μM ZnCl2. Phage are eluted using 0.1 ml 0.1 M triethylamine and the solution is neutralised with an equal volume of 1 M Tris-Cl (pH 7.4). Logarithmic-phase E. coli TG1 cells are infected with eluted phage, and grown overnight, as described above, to prepare phage supernatants for subsequent rounds of selection.
After 4 rounds of selection, bacteria are plated and phage prepared from 96 colonies are screened for binding to the DNA target site in the presence and absence of distamycin A. Binding reactions are carried out in wells of a streptavidin-coated microtitre plate (Boehringer Mannheim) and contain 50 μl of phage solution (bacterial culture supernatant diluted 1:1 with PBS containing 50 μM ZnCl2, 4% Marvel, 2% Tween), 0.15 pmol DNA target site and 0.25 μg poly [d(I-C)]. When added, distamycin A is present at a concentration of 2 μM. After a one hour incubation the wells are washed 20 times with PBS containing 50 μM ZnCl2 and 1% Tween (and also distamycin A at a concentration of 2 μM where appropriate), and 3 times with PBS containing 50 μM ZnCl2. Bound phage are detected by ELISA (carried out in the presence of distamycin A at a concentration of 2 μM where appropriate) with horseradish peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices).
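The amounts of target DNA above are given in pmol, whereas Examples 1.2 and 1.3 quote nM concentrations. The following minimal sketch converts between the two so the conditions can be compared. It assumes that the reaction volumes are 0.5 ml for the selection and approximately 50 μl for the screening wells; the function name `molar_conc_nM` is illustrative only and does not come from the described method.

```python
def molar_conc_nM(amount_pmol: float, volume_ul: float) -> float:
    """Convert an amount (pmol) in a volume (microlitres) to a concentration in nM."""
    # 1 pmol per microlitre equals 1 micromolar, i.e. 1000 nM.
    return amount_pmol / volume_ul * 1000.0


# Selection step: 2 pmol biotinylated target in 0.5 ml of phage solution.
selection_nM = molar_conc_nM(2.0, 500.0)

# Screening step: 0.15 pmol target in an assumed 50 microlitre reaction.
screen_nM = molar_conc_nM(0.15, 50.0)

print(round(selection_nM, 3), round(screen_nM, 3))
```

Under these assumptions the target DNA is present at roughly 4 nM during selection and roughly 3 nM during screening.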
Sequencing Of Selected Phage.
Single colonies of transformants obtained after four rounds of selection as described are grown overnight in 2xTY/Zn/Tet. Small aliquots of the cultures are stored in 15% glycerol at -20°C, to be used as an archive. Single-stranded DNA is prepared from phage in the culture supernatant and sequenced using the Sequenase 2.0 kit (U.S. Biochemical Corp.). The amino acid sequences of the zinc finger clones are deduced.
Amino acid sequences from helical regions of zinc fingers selected to bind DNA in the presence of distamycin
Clones 1-4 were selected to bind the oligo:
tataAAAAAAGGCGTGtcacagtcagtccacacgtc
Clones 5-8 were selected to bind the oligo:
tataAAAAAAGGCGAAAAAAtcacagtcagtccacacgtc
Zinc finger phage clones are isolated according to this method which bind the target with higher affinity in the presence of ligand than in the absence of ligand (see Figure 1). This method also selected certain clones that bound DNA in the absence of the ligand but were displaced from the DNA in the presence of the ligand (see Example 1.4 below).
Example 1.2 - Selection of Zinc Finger Phage Binding DNA In The Presence of Actinomycin D
An adaptation of the method outlined in Example 1.1 was used to isolate phage that bound DNA in the presence of a different small molecule, actinomycin D. In this example the DNA target was AGCTTGGCG.
Phage Selection
Essentially the method was the same as used in the previous section, using four rounds of a preselection step followed by a selection step, washing and elution. Differences in the method are described. The preselection step comprised 7.5 pmol of biotinylated DNA target site immobilised on 18.75 μl streptavidin coated beads (Dynal) in a 100 μl mixture containing 4 μl phage library, 96 μl PBS, 2% Marvel, 1% Tween-20, 50 μM ZnCl2 for 1 hour at room temperature with constant mixing. Phage selections were made in streptavidin coated tubes with the phage supernatant, 5 nM biotinylated target DNA, 10 μM actinomycin D in the presence of 1 μg poly [d(I-C)] competitor. The selections were incubated for 1 hour at room temperature. The bound phage were washed and eluted as described above.
ELISA was performed as described above but using 5 nM biotinylated target DNA, 0.25 μg poly[d(I-C)] competitor in the assay and 10 μM actinomycin D where appropriate. Phage were sequenced using Big Dye Terminator Cycle Sequencing Kit (Perkin Elmer Biosystems) and automated sequencing.
The amino acid sequences from the helical regions of the selected zinc fingers were sequenced as:
These two clones were selected using the oligo:
tatacaAGCTTGGCGatcacagtcagtccacacgtc
These zinc finger clones bind to the target oligo with higher affinity in the presence of actinomycin D than in the absence of ligand (see Figure 2).
Example 1.3 - Selection of Zinc Finger Phage Using Randomised DNA In The Presence Of Echinomycin, And Subsequent Deconvolution of Binding Partners
In this experiment the library of DNA binding molecules was sorted using a library of DNA sequences in the presence of a small molecule. After DNA binding molecules that bound to DNAs in the presence of the small molecule had been selected, the optimal binding site(s) for each DNA binding molecule were determined using the binding site signature.
a) Selections
In this experiment, 50 pmol of DNA target library of sequence YRYRYGGCG (where Y is C or T and R is G or A) was bound to 125 μl of streptavidin coated beads (Dynal) and the beads were used to preselect 0.4 μl of phage library in 100 μl of PBS, 2% Marvel, 1% Tween-20, 50 μM ZnCl2 for 1 hour at room temperature with constant mixing. Phage selections were made in streptavidin coated tubes with the phage supernatant, 30 nM biotinylated target DNA, 10 μM echinomycin in the presence of 1 μg poly [d(I-C)] competitor. The selections were incubated for 1 hour at room temperature. The bound phage were washed and eluted as described above.
ELISA was performed as described above but using 30 nM biotinylated target DNA, 0.5 μg poly[d(I-C)] competitor in the assay and 10 μM echinomycin where appropriate. Phage were sequenced using Big Dye Terminator Cycle Sequencing Kit (Perkin Elmer Biosystems) and automated sequencing.
Four different clones were selected using the DNA library tatagt YRYRYGGCG atcacagtcagtccacacgtc in the presence of echinomycin (see Figure 3).
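The composition of the degenerate DNA library YRYRYGGCG can be made explicit by enumeration. The sketch below is illustrative only: it expands the degenerate positions using the definitions given above (Y is C or T, R is G or A), and the names `DEGENERATE` and `expand` are introduced here for the illustration.

```python
from itertools import product

# Degenerate base codes as defined in the text: Y is C or T, R is G or A.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "GA"}


def expand(site: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate site."""
    choices = (DEGENERATE[base] for base in site)
    return ["".join(combo) for combo in product(*choices)]


library = expand("YRYRYGGCG")
print(len(library))  # five two-fold degenerate positions give 2**5 members
```

The library therefore contains 32 distinct 9-base binding sites, each sharing the fixed GGCG triplet region.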
The amino acid sequences from the helical regions of the selected zinc fingers were sequenced as:
b) Binding site signature
The signature of the clone 0.4/4 was determined using a modified binding site signature assay. For each of the 5 randomised positions of the oligo, a base was fixed at one of the five positions whilst the remaining 4 positions contained defined mixtures of bases. For the pyrimidine positions the base was fixed as either C or T, and for the purine positions the base was fixed as either G or A, so that by testing each position in turn an optimal sequence or binding site signature could be determined.
In each well of a streptavidin-coated microtitre plate, 2 μl of phage solution (overnight E. coli culture supernatant containing phage) were mixed with 48 μl of 2% Marvel, 1% Tween-20, 0.5 μg poly [d(I-C)], 10 μM echinomycin and between 8 and 16 nM of biotinylated target DNA. The reaction was incubated for 1 hour at room temperature, followed by 6 washes with PBS containing 1% Tween-20, 50 μM ZnCl2 and 3 washes with PBS containing 0.05% Tween-20, 50 μM ZnCl2. 100 μl of PBS containing 1% Marvel, 0.05% Tween-20, 50 μM ZnCl2 and 1/5000 dilution of anti-M13 horseradish peroxidase antibody conjugate (Amersham Pharmacia Biotech) was added to each well and incubated for 1 hour at room temperature. The ELISA plate was washed 3 times with PBS containing 0.05% Tween-20, 50 μM ZnCl2, followed by 3 washes with PBS containing 50 μM ZnCl2. The assay was developed with BCIP/NBT substrates and quantified using a plate reader.
This method determined the binding site sequence of clone 0.4/4 to be (T1)(G/A2)(C3)(G/A4)(T5), where the number denotes the position in the randomised region (see Figure 4).
c) Verification of the target DNA sequence
The optimal target DNA sequence, as determined by the binding site signature, was synthesised together with two other related DNA sequences that were present in the original random DNA library but differed in some of the optimal base positions of the binding site.
These oligonucleotides had the sequence:
tatagtTACGTGGCGatcacagtcagtccacacgtc
tatagtTGTATGGCGatcacagtcagtccacacgtc
tatagtCGTACGGCGatcacagtcagtccacacgtc
Binding of the phage clone was tested as a function of DNA concentration (from 5 nM to 0.312 nM) in the presence of 10 μM echinomycin. A phage ELISA was set up using 20 μl phage supernatant, 0.5 μg poly[d(I-C)], 10 μM echinomycin in PBS containing 1% Marvel, 1% Tween-20, 50 μM ZnCl2. The total volume of the assay was 50 μl. The assay was washed and developed as described for the binding site signature assay.
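The concentration range of 5 nM to 0.312 nM is consistent with a two-fold serial dilution, although the dilution scheme is not stated. The following sketch lists the concentrations such a series would contain; the two-fold factor, the number of steps and the name `twofold_series` are assumptions made for illustration.

```python
def twofold_series(start_nM: float, steps: int) -> list[float]:
    """Concentrations obtained by halving the starting concentration at each step."""
    return [start_nM / 2 ** i for i in range(steps)]


# Assumed series: five points starting from 5 nM.
series = twofold_series(5.0, 5)
print(series)
```

On this assumption the fifth point is 0.3125 nM, which agrees with the lower limit of 0.312 nM quoted above.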
This method showed that the clone 0.4/4 bound preferentially to the sequence determined from the binding site signature, i.e. TACGTGGCG, in the presence of the small molecule (see Figure 5).
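The relationship between the binding site signature and the three verification oligonucleotides can be checked directly. The sketch below is an illustration under stated assumptions: it takes the signature (T)(G/A)(C)(G/A)(T) and the oligonucleotide sequences from the text, and assumes that the five randomised bases immediately follow the 5' flank tatagt. The names `SIGNATURE`, `FLANK_5` and `matches_signature` are introduced here only for the example.

```python
# Allowed bases at each of the five randomised positions, from the signature.
SIGNATURE = ["T", "GA", "C", "GA", "T"]
FLANK_5 = "tatagt"  # fixed 5' flank preceding the randomised region


def matches_signature(oligo: str) -> bool:
    """True if the randomised region of the oligo satisfies every signature position."""
    start = len(FLANK_5)
    site = oligo[start:start + len(SIGNATURE)].upper()
    return all(base in allowed for base, allowed in zip(site, SIGNATURE))


oligos = [
    "tatagtTACGTGGCGatcacagtcagtccacacgtc",
    "tatagtTGTATGGCGatcacagtcagtccacacgtc",
    "tatagtCGTACGGCGatcacagtcagtccacacgtc",
]
results = [matches_signature(o) for o in oligos]
print(results)
```

Only the first oligonucleotide (TACGTGGCG) satisfies the signature at all five positions; the other two differ at one or more positions, in line with their use as related but non-optimal controls.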
Example 1.4 Selection of Zinc Finger Phage that are dissociated from their DNA Targets In The Presence of Distamycin A or Actinomycin D
This example describes phage that bound DNA targets with higher affinity in the absence of ligand. These phage were isolated using either: (a) the same method as in example 1.1, or (b) by selection in the absence of small molecule and phage elution from DNA using a small molecule.
In this latter case (b) the method was as follows.
Phage selection is over 4 rounds. Binding reactions contain 10 pmol biotinylated DNA site immobilised on 50 mg streptavidin coated beads (Dynal) and a 1 ml solution of zinc finger phage library (as described in 1.1). Reactions were incubated for 1 h on a rolling platform. After this time, beads were washed 20 times as described in 1.1 and finally phage were eluted from the beads over 5 minutes using a solution containing ligand (10 μM Distamycin A, or 1 μM Actinomycin D in PBS/Zn).
Some phage isolated by either of the above methods (a or b) bound DNA in the absence of ligand but could be displaced by concentrations of distamycin A at 10 μM and actinomycin D at 1 μM. The distamycin sensitive clone was selected using the DNA target AAAAAGCGGAAAAA and its helices were sequenced as:
QSRSLIQ QRDSLSR RSDERKR
The actinomycin D sensitive clone was selected with the DNA target AGCTTGGCG and its helices were sequenced as:
RSDELTR RSDVLST TRSSRKK
Figure 6 demonstrates the sensitivity of each clone to the respective drug.
Example 2 - Modulation Of Binding Of Polypeptides To Target DNA By Ligand
Individual phage clones are assayed for modulation of target DNA binding by ligand in a phage ELISA binding assay.
Binding assay reactions are carried out in wells of a streptavidin-coated microtitre plate (Boehringer Mannheim) as in Example 1, except that the distamycin concentration is varied while the DNA concentration is kept constant at 2 nM.
Induction of higher affinity DNA binding is observed when distamycin is added to the binding reaction at 10⁻⁶ M to 10⁻⁷ M.
Binding of the zinc finger phage to DNA in the absence of ligand, or at ligand concentrations of 10⁻⁹ M or lower, results in phage retention close to background level, i.e. lower affinity binding than in the presence of ligand.
Background level affinity binding is defined as the phage retention in binding reactions that contain no DNA binding site.
Example 3 - DNA-ligand Modulatable Restriction Enzyme
Phage-selected or rationally designed zinc finger domains which bind target DNA sequences in a manner modulatable by a ligand can be converted to restriction enzymes which cleave DNA containing said target sequences in a manner modulatable by ligand. This is achieved by coupling an appropriate zinc finger, as isolated in Example 1 above, to a cleavage domain of a restriction enzyme or other nucleic acid cleaving moiety.
A method of converting zinc finger DNA binding domains to chimaeric restriction endonucleases has been described in Kim, et al, (1996) Proc. Natl. Acad. Sci. USA 93:1156-1160. In order to demonstrate the applicability of DNA ligand-modulatable zinc fingers to restriction enzymes, a fusion is made between the catalytic domain of Fok I as described by Kim et al. and a zinc finger of Example 1. Fusion of the zinc finger nucleic acid-binding domain to the catalytic domain of Fok I restriction enzyme results in a novel endonuclease which cleaves DNA adjacent to the DNA recognition sequence of the zinc finger (AAAAAAGGCG or AAAAAAGGCGAAAAAA).
The oligonucleotides AAAAAAGGCG and AAAAAAGGCGAAAAAA are synthesised and ligated to arbitrary DNA sequences. After incubation with the zinc finger restriction enzyme, the nucleic acids are analysed by gel electrophoresis. Bands indicating cleavage of the nucleic acid at a position corresponding to the location of the oligonucleotide(s) (AAAAAAGGCG / AAAAAAGGCGAAAAAA) are visible.
In a further experiment, the zinc finger is fused to an amino terminal copper/nickel binding motif. Under the correct redox conditions (Nagaoka, M, et al, (1994) J. Am. Chem. Soc. 116:4085-4086), sequence-specific DNA cleavage is observed only in the presence of DNA incorporating oligonucleotide AAAAAAGGCG or AAAAAAGGCGAAAAAA.
Example 4 - Modulation Of Transcriptional Activity In Vivo
A reporter system is produced which produces a reporter signal conditionally depending on the binding of the zinc finger DNA binding molecule to its target DNA sequence. This binding, and hence transcription from the reporter system, is modulated by the ligand Distamycin A.
A transient transfection system using zinc finger transcription factors is produced as described in Choo, Y, et al, (1997) J. Mol. Biol. 273:525-532. This system comprises an expression plasmid which produces a phage-selected zinc finger fused to the activation domain of HSV VP16, and a reporter plasmid which contains the recognition sequence of the zinc finger upstream of a CAT reporter gene.
Thus, a zinc finger which recognises the DNA sequence AAAAAAGGCG is selected by phage display as described in Example 1. This zinc finger domain is used to construct a multifinger protein. By the method of the preceding examples, said zinc finger is used to construct transcription factors as described above.
A transient expression experiment is conducted, wherein the CAT reporter gene on the reporter plasmid is placed downstream of a ten copy repeat of the sequence AAAAAAGGCG. The reporter plasmid is cotransfected with a plasmid vector expressing the zinc finger-HSV fusion under the control of a constitutive promoter. No activation of CAT gene expression is observed.
However, when the same experiment is conducted in the presence of Distamycin A, CAT expression is observed as a result of the binding of the zinc finger transcription factor to its recognition sequence AAAAAAGGCG.
Example 5 - Isolation of cognate target nucleic acids
Using a known DNA binding molecule, target DNA sequences to which it can bind are isolated.
The 434 repressor is a gene regulatory protein of phage 434. It binds to a 14 bp operator site (see Koudelka et al, 1987, Nature vol 326 pp 886-888). This operator site consists of five conserved bp (1-5), then four variable bp (6-9), then five more conserved bp (10-14) as shown below:
Site:  1   2   3   4   5    6   7   8   9   10   11  12  13  14
Base:  A   C   A   A   G/T  X   X   X   X   A/T  T   T   G   T
wherein X is any base.
The conserved bases contact the 434 repressor protein. The four variable bases are thought not to contact the 434 repressor protein. However, the four bases which do not contact the 434 repressor protein may affect the affinity of binding of the repressor to the operator site.
The 434 repressor protein (i.e. the DNA binding molecule) is contacted with a library of different target DNA sequences in the presence and absence of ligand. The target DNA sequences are synthesized using an Applied Biosystems 380A DNA synthesizer and are purified by gel electrophoresis. The four variable bases ('X' as shown above) are randomised, producing a library of 256 different target DNA molecules, position 5 being T, and position 10 being A. The different target molecules are arranged in an array spotted onto a glass slide by means of a polypeptide linker.
Structure of target DNA sequence library:
5'-GTCGGATCCTGTCTGAGGTGAG ACAATXXXXATTGT GTCTTCCGACGTCGAATTCGCG-3'
wherein X is any base, and the partially randomised 434 operator (positions 1 to 14, with the variable bases at positions 6 to 9) is the central block set off by spaces.
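The size of this target library follows from randomising four positions over four bases. As a minimal sketch, the library can be enumerated as below; the flanking and operator sequences are taken from the structure shown above, while the names `LEFT_FLANK`, `RIGHT_FLANK` and `operator_library` are illustrative.

```python
from itertools import product

LEFT_FLANK = "GTCGGATCCTGTCTGAGGTGAG"
RIGHT_FLANK = "GTCTTCCGACGTCGAATTCGCG"


def operator_library() -> list[str]:
    """All 14 bp operators with positions 1-5 = ACAAT, 6-9 random, 10-14 = ATTGT."""
    return ["ACAAT" + "".join(var) + "ATTGT" for var in product("ACGT", repeat=4)]


operators = operator_library()
targets = [LEFT_FLANK + op + RIGHT_FLANK for op in operators]
print(len(targets), len(targets[0]))
```

The enumeration gives 4^4 = 256 target molecules of 58 bases each, and includes the operator ACAATAAATATTGT used in Example 6.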
The 434 repressor protein is added to the library of target DNA sequences, in the presence and absence of 2 μM distamycin A (Sigma) ligand in 200 μl binding buffer (9 mM Tris-HCl pH 8.0, 90 mM KCl, 90 μM ZnSO4) and incubated for 30 min.
Binding of the 434 repressor to the array is visualised by immunosorbent techniques, for instance using a fluorescently labelled antibody.
Target DNA sequences are selected which bind the 434 repressor with higher affinity in the presence of ligand than in the absence of ligand. Furthermore, DNA sequences are selected which bind the 434 repressor in the absence of ligand with a higher affinity than in the presence of ligand.
Example 6 - Isolation of ligands which affect the binding of a DNA binding molecule to its cognate DNA target
The 434 repressor protein of Example 5 is used in conjunction with a target operator DNA sequence to which it binds.
The operator sequence used is
5'-ACAATAAATATTGT-3'
A library of ligands is used in place of the 2 μM distamycin A (Sigma) ligand of Example 5.
Ligands are isolated which are capable of increasing the affinity of the 434 repressor for its cognate DNA target sequence. Ligands are also isolated which are capable of decreasing the affinity of the 434 repressor for its cognate DNA target sequence.
Example 7: Zinc Finger Protein/Drug/DNA Microarrays
Oligonucleotide Array
95 oligonucleotides are made containing 13 base binding regions on a 37 base oligo template TATANN123456789NNTCACAGTCAGTCCACACGTC, where NN are the two base flanking sequences and 123456789 is the specific 9 base binding sequence. 47 oligos have the binding sequence 12345GGCG, where bases 1-5 are of a particular sequence, and 46 oligos have the sequence GCGG56789, where bases 5-9 are of a particular sequence. Two oligos are used that have the binding sequences AAAAAAGGCGAAAAAA and AAAAAAGCGGAAAAAA in place of NN123456789NN. The oligos are annealed to a biotinylated oligo GACGTGTGGACTGACTGTGA and filled-in using dideoxynucleotides and Klenow polymerase. The oligonucleotides are diluted to a concentration of 0.1 pmol/μl and arrayed randomly into single wells on a 96-well plate as shown in Figure 7. Position 95 in Figure 7 does not have any oligo arrayed into the well and is used as a negative (background) control. This plate serves as a stock oligonucleotide plate.
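The design of the oligonucleotide template can be checked with a short script. The sketch below confirms that the biotinylated oligo is the reverse complement of the constant 3' region of the template, and that the stated lengths and counts are consistent. The sequences and numbers are taken from the text; the helper `reverse_complement` and the variable names are introduced only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


TEMPLATE = "TATANN123456789NNTCACAGTCAGTCCACACGTC"
BIOTIN_OLIGO = "GACGTGTGGACTGACTGTGA"

# Constant region at the 3' end of the template, where the biotinylated oligo anneals.
anneal_site = TEMPLATE[-len(BIOTIN_OLIGO):]
binding_region = TEMPLATE[4:17]  # NN + nine specific bases + NN

print(reverse_complement(BIOTIN_OLIGO) == anneal_site)
print(len(TEMPLATE), len(binding_region), 47 + 46 + 2)
```

The biotinylated oligo pairs with the last 20 bases of the template, leaving the 13 base binding region and the TATA leader single-stranded until the fill-in step.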
Drugs and Zinc finger Phage
Three drugs and zinc finger phage are chosen to assay. The drugs are distamycin A, a minor groove binding drug, actinomycin D, an intercalating drug, and echinomycin, a bis-intercalating antibiotic. The phage chosen are distamycin binding phage clone 3 (Dist3/2F), actinomycin D binding phage clone 1 (Ad1) and echinomycin binding phage 0.4/4 (EM0.4/4).
Single Zinc finger protein/Single Drug/Multiple Sites Binding Array
Two 96-well streptavidin coated plates are preblocked with 150 μl 4% Marvel, PBS, 50 μM ZnCl2 for 1 hour at room temperature. Following blocking, the solution is discarded and 45 μl of Assay Mixture (2% Marvel, 1% Tween 20, PBS, 50 μM ZnCl2, 20 μg salmon sperm DNA) with and without 10 μM drug are arrayed so that in the 96-well plates there are adjacent columns of no drug and drug solutions. Binding sites are added to a concentration of 8 nM as in Figure 8 so that the same oligo is arrayed into adjacent wells without drug and containing drug. 5 μl of overnight zinc finger phage culture supernatant are then added to each well in each plate and the assay is incubated at room temperature for 1 hour.
The plates are washed seven times by flooding them with 1% Tween 20, PBS, 50 μM ZnCl2 followed by three washes with PBS, 50 μM ZnCl2. For detection of bound phage, 100 μl of anti-M13 HRP conjugated antibody (1/5000 dilution) in 2% Marvel, 0.05% Tween 20, PBS, 50 μM ZnCl2 are incubated in each well for 1 hour at room temperature and then the plates are washed with three washes of 0.05% Tween 20, PBS, 50 μM ZnCl2 and three washes with PBS, 50 μM ZnCl2. The assay is developed with TMB substrate and stopped by the addition of 100 μl 0.1 M H2SO4. The assays are read at OD450 and subtracted from absorbance readings at OD695. Readings from the zinc finger phage incubated in the absence of drug are subtracted from readings in the presence of drug to give values for drug/nucleic acid binding.
Positive absorbance values indicate that the zinc finger protein binds DNA in the presence of a small molecule drug ('ON' switch). Negative absorbance readings indicate that the zinc finger protein binds to a DNA sequence preferentially in the absence of that drug ('OFF' switch). A reading close to zero indicates that the phage does not bind the sequence in the presence or absence of drug, or binds equally in both circumstances (see Figures 9 and 10). The results are summarised in Table 2 for two zinc finger phage and demonstrate the utility of the method in identifying both 'ON' and 'OFF' switches for nucleic acid binding proteins and small molecule combinations.
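The interpretation of the difference readings can be expressed as a simple rule. The following is a sketch only: the text does not state how close to zero a reading must be to count as no switch, so the `threshold` value, the function name `classify_switch` and the example absorbance values are all hypothetical.

```python
def classify_switch(od_with_drug: float, od_without_drug: float,
                    threshold: float = 0.1) -> str:
    """Classify a binding site from paired absorbance readings.

    threshold is a hypothetical cut-off for readings regarded as close to zero.
    """
    delta = od_with_drug - od_without_drug  # no-drug reading subtracted from drug reading
    if delta > threshold:
        return "ON"    # binding favoured in the presence of drug
    if delta < -threshold:
        return "OFF"   # binding favoured in the absence of drug
    return "no switch"


# Hypothetical paired readings (with drug, without drug) for three wells.
examples = {"site A": (0.90, 0.10), "site B": (0.10, 0.80), "site C": (0.50, 0.50)}
calls = {site: classify_switch(*ods) for site, ods in examples.items()}
print(calls)
```

In practice the cut-off would be chosen with reference to the background control well, which contains no oligonucleotide.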
Table 2: Examples of ON/OFF Switches Identified by Zinc Finger/Drug/DNA Microarrays
Single Zinc Finger Proteins/Multiple Drugs/Multiple Site Binding Array
The binding site oligos are arrayed on 384-well plates so that each site is dispensed into 4 adjacent wells. For each binding site, one well remains free of drug, whilst distamycin A, actinomycin D and echinomycin are dispensed into each of the three remaining wells (see Figure 11). Binding sites and drug mixtures are prepared in stock plates and 18 μl of the Assay Mixture (see above) with or without 10 μM drug are dispensed into a 384-well plate. After 5 minutes, 2 μl phage are then added to each well and incubated for 1 hour at room temperature. The wells are washed seven times with 100 μl 1% Tween 20, PBS, 50 μM ZnCl2 followed by three washes with 100 μl PBS, 50 μM ZnCl2. For detection of bound phage, 40 μl of anti-M13 HRP conjugated antibody (1/5000 dilution) in 2% Marvel, 0.05% Tween 20, PBS, 50 μM ZnCl2 are incubated in each well for 1 hour at room temperature and then the plates are washed with three washes of 100 μl 0.05% Tween 20, PBS, 50 μM ZnCl2 and three washes with 100 μl PBS, 50 μM ZnCl2. The assay is developed with TMB substrate and stopped by the addition of 40 μl 0.1 M H2SO4. The assays are read as previously described.
For analysis, the absorbance readings for the phage against the DNA/no drug are subtracted from the DNA/drug assays. Again, this provides a measure of relative DNA/drug binding. Positive absorbance values show that the zinc finger phage binds the DNA site preferentially when the drug is present ('ON' switch), and negative values show that the DNA site is bound preferentially when the drug is absent and that the presence of the drug reduces binding to the DNA site ('OFF' switch). Figure 12 shows the 384-well assay results for the distamycin binding phage Dist3/2F. These are rearranged, for ease of analysis, into Figures 13, 14 and 15 for the individual drugs distamycin A, echinomycin and actinomycin D.
The results demonstrate a number of 'ON' and 'OFF' sequences with distamycin A and the distamycin zinc finger phage Dist3/2F and the method has identified different responses of the zinc finger protein in the presence of different drugs. In the presence of other drugs it can be seen that either the zinc fingers may have an opposite effect on the same site with a different drug or that a different drug may cause the zinc fingers to have a different DNA site specificity. By comparison of Figures 13, 14 and 15 it can clearly be seen that the phage binds to different patterns of oligos in response to the three drugs. Two notable examples are oligos 35 and 86. Both of these oligos act as 'ON' sequences with distamycin A. However, in the presence of echinomycin the zinc fingers do not bind to oligo 35 and bind less tightly in the presence of actinomycin D. Oligo 86 is an 'ON' sequence with distamycin A, however, this becomes an 'OFF' switch with either echinomycin or actinomycin D.
These experiments have shown methods to identify different responses of nucleic acid binding proteins to different small molecules on different DNA sites. In the above examples, the DNA sites are short lengths of DNA but these may be longer stretches of DNA such as cDNA libraries or libraries of subgenomic fragments of DNA such as promoter regions. The following are examples of how the nucleic acid binding proteins used in these assays and their binding characteristics in response to small molecules are to be utilised.
Example 8
We have collected data that show zinc finger domain Dist3/2F will bind DNA site AAAAAAGGCGAAAAAA in the presence of distamycin A (oligo 35), but will dissociate from DNA site ATGTTAGGGCGTG (oligo 55) in the presence of distamycin A. The same drug will therefore cause this protein to re-locate to a different DNA sequence, which has broad utility in molecular biology, for example in differential regulation of two genes.
In order to demonstrate this we create a 6-finger protein that comprises a dimer of domain Dist3/2F (protein Dist3/2F-Dist3/2F). We also create a DNA test molecule that comprises two copies of sequence AAAAAAGGCGAAAAAA (molecule AA), and a separate DNA test molecule that comprises two copies of sequence ATGTTAGGGCGTG (molecule BB), where the lengths of molecules AA and BB differ by 15 bp such that they can be resolved by non-denaturing polyacrylamide gel electrophoresis.
We carry out a series of in vitro reactions in which the protein Dist3/2F-Dist3/2F is incubated together with DNAs AA and BB in the absence of distamycin A and also in the presence of the drug in increasing concentrations in 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 μM ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. The binding of protein to either or both of DNAs AA and BB is analysed using non-denaturing polyacrylamide gel electrophoresis (band shift assay). In the absence of distamycin A, protein Dist3/2F-Dist3/2F is seen to bind only to site BB. In the presence of increasing concentrations of drug, the protein no longer binds to BB, but rather begins to bind AA. Control reactions showing each DNA free or bound to the protein separately are included as size standards.
Example 9
We have collected data that show zinc finger domain Dist3/2F will bind DNA site GAGCTGGGGCGTG (oligo 86) in the presence of distamycin A, but will bind DNA site GCGCCGCGGCGTG (oligo 94) in the presence of echinomycin. Different drugs will therefore direct the binding of this protein to different DNA sequences, which has broad utility in molecular biology, for example in differential regulation of two genes.
In order to demonstrate this we create a 6-finger protein that comprises a dimer of domain Dist3/2F (protein Dist3/2F-Dist3/2F). We also create a DNA test molecule that comprises two copies of sequence GAGCTGGGGCGTG (molecule CC), and a separate DNA test molecule that comprises two copies of sequence GCGCCGCGGCGTG (molecule DD), where the lengths of molecules CC and DD differ by 15 bp such that they can be resolved by non-denaturing polyacrylamide gel electrophoresis.
We carry out a series of in vitro reactions in which the protein Dist3/2F-Dist3/2F is incubated together with a mix of DNAs CC and DD in the absence of drugs and also in the presence of either drug in 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 μM ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. The binding of protein to either or both of DNAs CC and DD is analysed using non-denaturing polyacrylamide gel electrophoresis (band shift assay). The protein is seen to bind DNA CC in the presence of distamycin A, but to bind DNA DD in the presence of echinomycin. Control reactions showing each DNA free or bound to the protein separately are included as size standards.
Example 10
We have collected data that show zinc finger domain EM0.4/4 will bind to DNA site CCGGAATGGCGTG (oligo 87) in the presence of echinomycin, while zinc finger domain Dist3/2F will dissociate from DNA site GGGTTGGGGCGTG (oligo 3) in the presence of the same drug. Therefore if DNA sequences CCGGAATGGCGTG and GGGTTGGGGCGTG are present on the same DNA, echinomycin can cause different zinc finger proteins to occupy the same DNA molecule in a mutually exclusive manner. This has broad utility in molecular biology, for example in regulation of a single gene by two different proteins, e.g. a transcriptional repressor and a separate transcriptional activator protein.
In order to demonstrate this we create a 6-finger protein that comprises a dimer of domain EM0.4/4 (protein EM0.4/4-EM0.4/4) and a 6-finger protein that comprises a dimer of domain Dist3/2F (protein Dist3/2F-Dist3/2F), where the sizes of the two proteins differ such that they can be resolved by non-denaturing polyacrylamide gel electrophoresis. We also create a single DNA test molecule that comprises two copies of sequence CCGGAATGGCGTG and two copies of sequence GGGTTGGGGCGTG.
We carry out a series of in vitro reactions in which the two proteins (EM0.4/4-EM0.4/4 and Dist3/2F-Dist3/2F) are incubated together with the test DNA in the absence of echinomycin and also in the presence of the drug in increasing concentrations in 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 μM ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. The binding of either protein to DNA is analysed using non-denaturing polyacrylamide gel electrophoresis (band shift assay). In the absence of echinomycin, only protein Dist3/2F-Dist3/2F is seen to bind the DNA. In the presence of increasing concentrations of drug, the Dist3/2F-Dist3/2F protein no longer binds the DNA, but rather protein EM0.4/4-EM0.4/4 is seen to bind the DNA. Control reactions showing each protein bound to the DNA separately, as well as the free DNA, are included as size standards.
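For convenience, the qualitative drug responses stated in Examples 8 to 10 can be collected into a small lookup structure. The following sketch records only what the text states (domain, site, oligo number, drug and direction of the response); the record type `Response` and the two query functions are hypothetical names introduced here, and no quantitative binding data are implied.

```python
from typing import List, NamedTuple, Tuple


class Response(NamedTuple):
    domain: str   # zinc finger domain
    site: str     # DNA site sequence
    oligo: int    # oligo number given in the text
    drug: str     # small molecule present
    effect: str   # "binds" or "dissociates" in the presence of the drug


# Qualitative observations as stated in Examples 8 to 10.
RESPONSES = [
    Response("Dist3/2F", "AAAAAAGGCGAAAAAA", 35, "distamycin A", "binds"),
    Response("Dist3/2F", "ATGTTAGGGCGTG", 55, "distamycin A", "dissociates"),
    Response("Dist3/2F", "GAGCTGGGGCGTG", 86, "distamycin A", "binds"),
    Response("Dist3/2F", "GCGCCGCGGCGTG", 94, "echinomycin", "binds"),
    Response("EM0.4/4", "CCGGAATGGCGTG", 87, "echinomycin", "binds"),
    Response("Dist3/2F", "GGGTTGGGGCGTG", 3, "echinomycin", "dissociates"),
]


def bound_in_presence_of(drug: str) -> List[Tuple[str, str]]:
    """(domain, site) pairs reported to bind when `drug` is present."""
    return [(r.domain, r.site) for r in RESPONSES
            if r.drug == drug and r.effect == "binds"]


def released_in_presence_of(drug: str) -> List[Tuple[str, str]]:
    """(domain, site) pairs reported to dissociate when `drug` is present."""
    return [(r.domain, r.site) for r in RESPONSES
            if r.drug == drug and r.effect == "dissociates"]


print(bound_in_presence_of("echinomycin"))
print(released_in_presence_of("echinomycin"))
```

Querying by drug makes the switching behaviour explicit: for echinomycin, one domain is recruited to its site while the other is released from its site, which is the mutually exclusive occupancy that Example 10 sets out to demonstrate.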
Example 11. Selection of a Gene Switch Using Modified Nucleic Acid
This example describes the use of the methods of our invention to select gene switches, and their components, using modified DNA.
The creation of the modified DNA is prophetic. Protocols for linking benzo[a]pyrene (BP) to DNA exist, but are not disclosed here.
A zinc finger library is constructed according to the protocols described in US Patent Nos. 6,013,453 and 6,007,988 and International Patent Publication Nos. WO 98/53060, WO 98/53057, and WO 98/53058; selection of zinc finger proteins capable of binding to specific sequences is also described in these documents. These documents also describe phage ELISA screening in presence and absence of a drug, formation of multifinger proteins, and bandshift assays, and the relevant protocols described in those documents are used here.
A zinc finger protein is selected to bind the modified DNA sequence d(G1-C2-G3-G4-C5-[BP]G6-C7-T8-A9-C10-C11) by selections from the library lib12, which is designed to recognise DNA sequences of the form GCGGXXXXX. The [BP]dG moiety is derived from the covalent linkage of a benzo[a]pyrenyl moiety to the guanine. Following selections, the recovered zinc finger phage clones are tested for binding to the unmodified DNA sequence (i.e., d(G1-C2-G3-G4-C5-G6-C7-T8-A9-C10-C11)). Those clones which do not bind the unmodified DNA are re-tested in the presence of compounds derived from a combinatorial library of benzo[a]pyrenyl derivatives.
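As an illustrative check, the relationship between the 11-nucleotide target and the library recognition form GCGGXXXXX can be verified with a short sketch. It assumes that X denotes any of the four bases and that the form is read on the written strand in the written orientation; the names `form_to_regex` and `matching_windows` are hypothetical.

```python
import re
from typing import List, Tuple

TARGET = "GCGGCGCTACC"        # d(G1-C2-G3-G4-C5-G6-C7-T8-A9-C10-C11), unmodified
MODIFIED_POSITION = 6         # 1-based position of the [BP]-linked guanine
LIBRARY_FORM = "GCGGXXXXX"    # recognition form stated in the text


def form_to_regex(form: str) -> "re.Pattern[str]":
    """Translate a recognition form into a regex, treating X as any base
    (an assumption of this sketch)."""
    return re.compile("".join("[ACGT]" if c == "X" else c for c in form))


def matching_windows(seq: str, form: str) -> List[Tuple[int, str]]:
    """Return (1-based start, window) for every window of `seq` matching `form`."""
    rx = form_to_regex(form)
    width = len(form)
    return [(i + 1, seq[i:i + width])
            for i in range(len(seq) - width + 1)
            if rx.fullmatch(seq[i:i + width])]


windows = matching_windows(TARGET, LIBRARY_FORM)
print(windows)

# Report whether the modified guanine lies inside a matching window.
for start, window in windows:
    inside = start <= MODIFIED_POSITION < start + len(window)
    print(f"window at {start}: modified G{MODIFIED_POSITION} inside = {inside}")
```

Under these assumptions the only matching window begins at G1, so the modified guanine at position 6 falls within the variable (X) portion of the recognition form rather than in the fixed GCGG portion.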
Gene switches are identified comprising combinations of (1) the unmodified DNA sequence, (2) zinc fingers selected by the above method and (3) library-derived compounds capable of promoting the association of (1) and (2). Optionally, any selected zinc finger domain can be homo- or hetero-multimerised to form a polydactyl protein using appropriate linkers and tested for binding to a suitable head-to-tail multimer of a composite cognate DNA sequence, in the presence and absence of the corresponding BP derivative. In this way, gene switches that bind DNA with greater affinity and that display a greater responsiveness to the BP derivative are identified.
Each of the applications and patents mentioned above, and each document cited or referenced in each of the foregoing applications and patents, including during the prosecution of each of the foregoing applications and patents ("application cited documents") and any manufacturer's instructions or catalogues for any products cited or mentioned in each of the foregoing applications and patents and in any of the application cited documents, are hereby incorporated herein by reference. Furthermore, all documents cited in this text, and all documents cited or referenced in documents cited in this text, and any manufacturer's instructions or catalogues for any products cited or mentioned in this text, are hereby incorporated herein by reference. In particular, we hereby incorporate by reference International Patent Application Numbers PCT/GB00/02080, PCT/GB00/02071, PCT/GB00/03765, United Kingdom Patent Application Numbers GB0001582.6, GB0001578.4, and GB9912635.1 as well as US09/478513.
Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.