TITLE OF THE INVENTION
METHOD FOR CONTROLLING THE DISTRIBUTION OF DNA SEQUENCING TERMINATION PRODUCTS
BACKGROUND OF THE INVENTION
DNA sequencing has become a very widely used technique since its introduction in 1977, and it is applied in many ways. For example, it can be used to determine the presence or absence of mutations known to be associated with disease states, thereby allowing a physician to make a diagnosis or prognosis. It is expected that physicians will in future use genotyping to determine which drugs to prescribe. Sequencing is used to find new genes and their encoded proteins and to compare species for evolutionary analysis. It can also be used as part of a paternity test or forensic analysis. DNA sequencing is also the technique being used to determine the sequence of the human genome, as well as the genomes of other organisms such as bacteria, yeast, slime mold, roundworms and fruit flies.
Two different methods of sequencing DNA were introduced in 1977. Maxam and Gilbert (1977) introduced a method in which four sets of chemical reactions are performed, each reaction preferentially cleaving next to one or two of the four types of nucleotide that occur naturally in DNA. These reactions are carried out only long enough to allow partial cleavage of the many molecules present; the DNA is then electrophoresed on a gel, which separates the cleavage products by size. The sequence of nucleotides in the DNA can be determined from the results.
Also in 1977, Sanger et al. (1977) published a completely different method of sequencing DNA. In this method a DNA template was mixed with a primer, dNTPs, dideoxy NTPs (ddNTPs), DNA polymerase and a buffer. Four separate reactions were performed, each containing only a single ddNTP. The DNA polymerase synthesized a new strand of DNA complementary to the template. At each position where, e.g., an A was to be added, the polymerase chose randomly between dATP and ddATP. Whenever a ddNTP was inserted, the strand being synthesized could no longer be elongated. Some strands were terminated quickly because a ddNTP was inserted early, whereas others were longer because by chance no ddNTP was incorporated for a long stretch. Because many strands were synthesized, strands ending at every position occupied by that nucleotide were formed, each one ending in a ddNTP. Usually one of the dNTPs or the ddNTP was radioactively labeled. After the four separate reactions were performed, the products were electrophoresed side by side on a polyacrylamide gel and an autoradiogram of the gel was made. The resulting bands allowed one to easily determine the sequence of the template DNA. The Sanger method is the more widely used of the two methods. Originally the method allowed approximately 200 bases of sequence to be determined from a single set of reactions. Many improvements have since been made, and sequences of over 1000 bases can now be read from a single sequencing reaction. The buffer system has been improved and a wide variety of DNA polymerases from different species have been introduced, allowing many more bases to be determined from a single reaction. Modified dNTPs are also sometimes used to read through unusual stretches of DNA, such as long runs of G. Another major improvement was the introduction of fluorescent labels, which allow all of the reactions to be run in a single lane of a gel rather than side by side in four separate lanes. This is possible because four differently colored fluorescent labels that can be distinguished from each other are used. This has in turn allowed automation to become very widely used.
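The random choice between a dNTP and its ddNTP analog described above can be illustrated with a minimal Monte Carlo sketch. It assumes a fixed, position-independent chance `p_term` that the ddNTP is chosen; the function name and parameters are hypothetical and are not part of the original method. It shows the two properties the method relies on: every fragment ends at a position occupied by the dideoxy base, and shorter fragments are more abundant than longer ones.

```python
import random


def sanger_fragments(product: str, dd_base: str, p_term: float,
                     n_strands: int, seed: int = 0) -> list:
    """Simulate one Sanger reaction that contains a single type of ddNTP.

    `product` is the sequence of the strand being synthesized, read from the
    primer outward, so every occurrence of `dd_base` is a position where the
    polymerase chooses between the dNTP and the ddNTP. `p_term` is an assumed,
    position-independent chance that it picks the ddNTP.
    Returns the length (bases past the primer) of every terminated strand.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_strands):
        for pos, base in enumerate(product, start=1):
            if base == dd_base and rng.random() < p_term:
                lengths.append(pos)  # chain ends here: the ddNMP has no 3'-OH
                break
        # strands that never pick up a ddNTP run off the end and are not counted
    return lengths


product = "GATTACAGATTACA"
lengths = sanger_fragments(product, "A", p_term=0.1, n_strands=10000)
# Every length is a position occupied by A; short fragments outnumber long ones.
print(sorted(set(lengths)))
print(lengths.count(2), lengths.count(14))
```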
The introduction of automated DNA synthesizers was a further important advance because it made it easy to manufacture primers, which can be used to walk along a piece of DNA that is too long to be sequenced in a single reaction. The introduction of cycle sequencing further increased the usefulness and sensitivity of DNA sequencing. Despite the many improvements made to the Sanger method since it was introduced over 20 years ago, there is still room for, and a desire for, further improved techniques. The instant invention addresses one of the problems that still exists with the Sanger method of sequencing: it remains difficult to achieve high signal intensity far from the primer. Most DNA polymerases use ddNTPs much less efficiently than dNTPs, whereas some polymerases, such as AmpliTaq® DNA Polymerase, FS (Perkin-Elmer), have been engineered to use dNTPs and ddNTPs about equally. Wild-type bacteriophage T7 DNA polymerase efficiently incorporates both dNTPs and ddNTPs, whereas wild-type E. coli DNA polymerase I and wild-type Thermus aquaticus DNA polymerase (Taq) discriminate strongly against ddNTPs. This difference in discrimination has been attributed to a single amino acid residue of the polymerases. Replacing Tyr-526 of T7 DNA polymerase with phenylalanine increases the discrimination against the four ddNTPs by 2000-fold, while replacing the phenylalanine at the homologous position in E. coli DNA polymerase I or T. aquaticus DNA polymerase with tyrosine decreases discrimination against the four ddNTPs by 250- to 8000-fold (Tabor and Richardson, 1995). Brandis et al. (1996) report that Taq DNA polymerase is biased in favor of dATP over ddATP by about 700 to 1, whereas T7 DNA polymerase showed a preference of 4 to 1.
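The discrimination values reported by Brandis et al. (1996) can be turned into a per-site termination probability with a simple competitive-incorporation model. The model and the `p_termination` helper are assumptions made for illustration, not taken from the cited work. At the same 1:10 ddNTP:dNTP ratio, this sketch predicts that Taq terminates roughly 170 times less often per eligible site than T7 DNA polymerase.

```python
def p_termination(dd_to_d_ratio: float, discrimination: float) -> float:
    """Chance that the ddNTP, rather than the dNTP, is inserted at one site.

    Assumed competitive model: at equal concentrations the polymerase prefers
    the dNTP by `discrimination`-fold, so the odds of inserting the ddNTP are
    (ddNTP:dNTP concentration ratio) / discrimination.
    """
    odds = dd_to_d_ratio / discrimination
    return odds / (1.0 + odds)


# Preferences reported by Brandis et al. (1996): Taq ~700:1, T7 ~4:1 (dATP over ddATP)
for name, bias in [("Taq", 700.0), ("T7", 4.0)]:
    print(name, p_termination(0.1, bias))
```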
The problem addressed by this disclosure is how to obtain both short and long DNA fragments from the sequencing reactions. If the ratio of ddNTPs to dNTPs is too high, most synthesized strands will be terminated quickly and it will not be possible to read very much sequence. In practice, the shorter strands are commonly produced in greater numbers and give a much higher signal than the longer strands synthesized in the sequencing reactions. This is especially true when the label is incorporated in the primer or in the ddNTP, so that each strand is labeled to an equal degree regardless of its length. Consequently, the longer strands give a much weaker signal, making it difficult to read the sequence as far as desired. This invention overcomes this problem by using a combination of DNA polymerases, wherein a first polymerase incorporates dNTPs at a much higher efficiency than ddNTPs and a second polymerase incorporates dNTPs and ddNTPs with approximately equal efficiency or incorporates ddNTPs at a higher efficiency than dNTPs. The combination of DNA polymerases, used in either a single-temperature or a cycle sequencing method, allows more even synthesis of terminated DNA products over a long range of sizes. As a result, long stretches of sequence can be determined more accurately in a single reaction than with the prior art methods.
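The effect of label position on signal can be sketched with a simple model in which every position terminates with the same probability. The per-position probability of 0.005 and the assumption that an internal label scales with strand length are hypothetical. Under these assumptions the end-labeled signal at 500 bases is about one tenth of that at 50 bases, while the internally labeled signal stays roughly level.

```python
def relative_signal(n: int, p: float, internal_label: bool) -> float:
    """Expected signal from strands that terminate at position n (1-based).

    Simplified model (an assumption): every position terminates with the same
    probability p, so a fraction p * (1 - p) ** (n - 1) of strands ends at n.
    A primer or ddNTP label puts one label on each strand; an internal dNTP
    label is assumed to put a number of labels proportional to strand length.
    """
    fraction_ending_here = p * (1.0 - p) ** (n - 1)
    labels_per_strand = n if internal_label else 1
    return fraction_ending_here * labels_per_strand


p = 0.005  # hypothetical per-position termination probability
for internal in (False, True):
    ratio = relative_signal(500, p, internal) / relative_signal(50, p, internal)
    print("internal label" if internal else "end label", round(ratio, 3))
```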
SUMMARY OF THE INVENTION
The present invention is a modification of the Sanger method of DNA sequencing. The modification allows the determination of long stretches of DNA sequence with a single reaction.
This is accomplished by using a combination of DNA polymerases rather than the single DNA polymerase used in the prior art. The combination of DNA polymerases allows more equal amounts of products of all sizes, from short to long, to be synthesized than with the prior art methods. Because relatively more of the longer products are synthesized than with the prior art methods, longer stretches of sequence can be determined more accurately in a single reaction than was previously possible.
DEFINITIONS
"Different DNA polymerases" means that the polymerases discriminate differently from each other as to their relative ability to incorporate dNTPs vs. ddNTPs into DNA.
"To incorporate more efficiently" means that, at a specific site, the polymerase incorporates the favored nucleotide at higher efficiency than the disfavored nucleotide. For example, a statement that the DNA polymerase incorporates dNTPs into DNA at least 20-fold more efficiently than it incorporates ddNTPs means that, under the reaction conditions being used, the polymerase is at least 20 times more likely to bind and insert a dNTP than a ddNTP at a specific site.
"Relatively equal efficiency" or "similar efficiency", as applied to the relative ability of a DNA polymerase to insert a dNTP vs. a ddNTP, means that the polymerase inserts a dNTP no more than 10-fold more efficiently than it inserts a ddNTP, and inserts a ddNTP no more than 10-fold more efficiently than it inserts a dNTP.
DETAILED DESCRIPTION OF THE INVENTION
Sanger dideoxynucleotide termination chemistry (the Sanger method) is the most widely used method to determine the sequence of bases along a length of DNA. The components of the Sanger method include dideoxynucleotides, deoxynucleotides, DNA polymerase, reaction buffer, primer and template. DNA polymerase extends DNA from a primer when given the appropriate template, buffer and deoxynucleotides. The basic mechanism by which the Sanger method works is termination of elongation at a specific position through the incorporation, by the DNA polymerase, of a dideoxynucleotide in place of a deoxynucleotide. Typically only a single type of dideoxynucleotide is used in a reaction, so four different reactions are required to determine the DNA sequence. However, if the four dideoxynucleotides are labeled with different fluors, the reaction may be performed in a single tube.
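Reading a single-tube, four-color reaction amounts to sorting fragments by length and translating each fragment's dye into its terminal base. The sketch below assumes peaks have already been detected and assigned to one of four dyes; the dye names and the `call_bases` function are placeholders, not a real instrument's base caller.

```python
# Hypothetical dye names; real four-dye terminator sets differ by vendor.
DYE_TO_BASE = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}


def call_bases(peaks):
    """Read a single-tube, four-color reaction.

    `peaks` is an iterable of (fragment_length, dye) pairs. Each fragment ends
    in a dye-labeled ddNTP, so its dye identifies the last base added;
    ordering fragments from shortest to longest spells out the newly
    synthesized strand, which is complementary to the template.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


peaks = [(3, "dye_T"), (1, "dye_G"), (2, "dye_A"), (4, "dye_C")]
print(call_bases(peaks))  # -> GATC
```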
The ratio of dideoxynucleotide to deoxynucleotide determines the probability that termination will occur at a particular base. This probability also depends on the efficiency with which the DNA polymerase incorporates the dideoxynucleotide. Most DNA polymerases cannot incorporate dideoxynucleotides efficiently, and thermostable DNA polymerases such as native Taq DNA polymerase inherently do not use them efficiently. Therefore, to use native Taq DNA polymerase in a DNA sequencing reaction, the dideoxynucleotide concentration is increased to an empirically determined level. DNA polymerases such as AmpliTaq® DNA Polymerase, FS have been engineered to discriminate very little between dideoxynucleotides and deoxynucleotides.
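Inverting a competitive-incorporation model shows why the ddNTP concentration must be raised for native Taq. This is a sketch under the assumption p = r / (r + D); the target probability and the zero-bias comparison enzyme are hypothetical, and in practice the level is set empirically, as stated above.

```python
def required_dd_to_d_ratio(target_p: float, discrimination: float) -> float:
    """ddNTP:dNTP ratio giving a per-site termination probability target_p.

    Assumes the competitive model p = r / (r + D), where r is the ddNTP:dNTP
    concentration ratio and D the fold preference for dNTP over ddNTP at equal
    concentration. Solving for r gives r = D * p / (1 - p).
    """
    if not 0.0 < target_p < 1.0:
        raise ValueError("target_p must lie strictly between 0 and 1")
    return discrimination * target_p / (1.0 - target_p)


target = 0.01  # hypothetical: one termination per 100 eligible sites
native_taq = required_dd_to_d_ratio(target, 700.0)  # bias from Brandis et al. (1996)
low_bias = required_dd_to_d_ratio(target, 1.0)      # assumed enzyme with no bias
print(native_taq, low_bias, native_taq / low_bias)
```

Because the required ratio scales linearly with D, an enzyme with a 700-fold bias needs about 700 times more ddNTP relative to dNTP than an unbiased one to reach the same termination frequency.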
DNA elongation termination is controlled by the relative effective concentration of dideoxynucleotides to deoxynucleotides and by the processivity of the DNA polymerase. The probability that the DNA polymerase will incorporate a dideoxynucleotide rather than a deoxynucleotide at a given position depends on their relative effective concentration. Because a strand can terminate at a position only if it has not already terminated earlier, the probability that a strand terminates at a given position is highest at the base immediately adjacent to the 3' end of the primer and decays exponentially with distance from the primer. Therefore most of the signal (i.e., termination) will be close to the sequencing primer, which makes it difficult to obtain sequence information farther from the primer. Being able to read more sequence from a single reaction is highly desirable, yet without any special treatment of the sequencing reaction, the farther the reaction proceeds from the sequencing primer, the higher the probability that a termination event will already have occurred. As a result, the signal intensity for sequences far from the sequencing primer is low. The Sequenase kit from U.S. Biochemical (USB), introduced in the late 1980s, addressed this problem by adding an elongation step before the termination step. In a first step, which includes dNTPs but no ddNTPs, the sequencing primer is extended for a short period of time. This step produces elongated DNA strands from the sequencing primer. The 3' end of each of these elongated strands is a deoxynucleotide, so all of these strands remain substrates for further elongation. The length of elongation depends on the processivity of the DNA polymerase, which may be modified by varying the ionic strength. After the elongation step, a second step, the termination reaction, is performed. Dideoxynucleotides are used in this second step and cause sequence-defined termination. However, because the Sequenase kit uses a two-step process that requires a second addition of reagents, it cannot be used in a high-throughput environment.
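The exponential decay of signal and the effect of a ddNTP-free elongation step can be compared with a short calculation. This is a simplified model (constant per-position termination probability, hypothetical values p = 0.01 and a 200-base pre-extension), not a description of the Sequenase chemistry. With these numbers about 5% of terminations fall beyond base 300 in a one-step reaction, compared with about 37% when every strand is first extended by 200 bases.

```python
def termination_profile(length: int, p: float, pre_extension: int = 0) -> list:
    """Fraction of all strands that terminate at each position 1..length.

    Simplified model (an assumption): every position beyond the pre-extension
    terminates with the same probability p. A ddNTP-free elongation step, as in
    the two-step protocol described above, is modeled by extending every strand
    `pre_extension` bases before termination becomes possible.
    """
    profile = []
    surviving = 1.0  # fraction of strands not yet terminated
    for pos in range(1, length + 1):
        if pos <= pre_extension:
            profile.append(0.0)
            continue
        profile.append(surviving * p)
        surviving *= 1.0 - p
    return profile


one_step = termination_profile(1000, 0.01)
two_step = termination_profile(1000, 0.01, pre_extension=200)
# Share of terminations landing beyond base 300 (list index 300 is position 301)
print(sum(one_step[300:]), sum(two_step[300:]))
```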
The disclosed method takes advantage of the differing abilities of DNA polymerases to incorporate dideoxynucleotides to produce a system in which the distribution of DNA sequencing products can be controlled. This new sequencing system uses two different DNA polymerases. The first DNA polymerase incorporates dideoxynucleotides inefficiently or may be incapable of incorporating them at all. The second DNA polymerase readily incorporates dideoxynucleotides; it may incorporate dNTPs and ddNTPs with relatively equal efficiency, or it may favor ddNTPs. Both of these DNA polymerases may also be thermostable and may be used in cycle-sequencing reactions.
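The two polymerase types can be expressed in terms of the efficiency definitions given earlier. The 20-fold cutoff for the first type is borrowed from the example in the definition of "to incorporate more efficiently" and is an assumption rather than a limit; the function name is hypothetical. Using the biases reported by Brandis et al. (1996), Taq falls into the first type and T7 DNA polymerase into the second.

```python
def classify_polymerase(dntp_efficiency: float, ddntp_efficiency: float,
                        first_type_min_fold: float = 20.0) -> str:
    """Place a polymerase into the first or second type.

    Efficiencies are relative incorporation efficiencies measured under the
    same reaction conditions. The 20-fold cutoff for the first type is taken
    from the example in the definitions and is an assumption.
    """
    fold = dntp_efficiency / ddntp_efficiency  # > 1 means dNTPs are favored
    if fold >= first_type_min_fold:
        return "first"         # strongly favors dNTPs over ddNTPs
    if fold <= 10.0:
        return "second"        # within 10-fold either way, or favors ddNTPs
    return "unclassified"      # 10- to 20-fold preference for dNTPs


print(classify_polymerase(700, 1))  # -> first (Taq bias, Brandis et al. 1996)
print(classify_polymerase(4, 1))    # -> second (T7 bias, Brandis et al. 1996)
```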
Because the first DNA polymerase does not incorporate dideoxynucleotides efficiently, the effective concentration of dideoxynucleotides for this polymerase is low, and it acts essentially as if it only extends the sequencing primer. The 3' ends of these extended products will usually be deoxynucleotides, so they remain substrates for extension by DNA polymerase. The second DNA polymerase incorporates dideoxynucleotides and deoxynucleotides with more similar efficiency than the first DNA polymerase, or alternatively it may favor ddNTPs, either slightly or to a large extent, even to the exclusion of incorporating dNTPs. The second DNA polymerase therefore generates termination products. Both DNA polymerases can be thermostable, so that the extension-termination steps may be repeated several times. The thermocycle profile includes denaturation of the template at an elevated temperature, annealing of the sequencing primer, and extension-termination. This method distributes the signal intensity more evenly throughout the sequence and allows one to shift and control the distribution of the signal intensity (e.g., move the distribution farther away from the sequencing primer). The distribution of the signal intensity depends on the ability of the two DNA polymerases to use the dideoxynucleotides and on their processivity (the number of nucleotides added per enzyme-substrate (primer:template) binding event). The signal intensity along the DNA sequence is controlled by manipulating the concentration of the first DNA polymerase relative to the second DNA polymerase, the total concentration of the DNA polymerases in the sequencing reaction, and the concentration of the dideoxynucleotides relative to the deoxynucleotides.
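How the relative amount of the first polymerase shifts the termination distribution can be explored with a Monte Carlo sketch. Every parameter here (binding in proportion to relative concentration with equal affinity, fixed processivities of 50 and 20 bases, no termination by the first polymerase, 2% per base by the second) is a hypothetical assumption, not a measured property of any enzyme. With these settings, adding the first polymerase moves the mean termination position well away from the primer, in line with the control described above.

```python
import random
from collections import Counter


def simulate_two_polymerases(n_strands, max_len, frac_pol1,
                             p_term1, p_term2, proc1, proc2, seed=0):
    """Monte Carlo sketch of extension-termination with two polymerases.

    Hypothetical model: at each binding event the enzyme is polymerase 1 with
    probability frac_pol1 (both enzymes assumed to bind equally well). The
    bound enzyme adds up to proc1 or proc2 bases (its processivity, >= 1) and,
    at each added base, terminates the strand with probability p_term1 or
    p_term2. Returns a Counter of termination positions; strands that run off
    the end of the max_len-base template are not counted.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(n_strands):
        pos, terminated = 0, False
        while pos < max_len and not terminated:
            if rng.random() < frac_pol1:
                p_term, processivity = p_term1, proc1
            else:
                p_term, processivity = p_term2, proc2
            for _step in range(processivity):
                pos += 1
                if rng.random() < p_term:
                    ends[pos] += 1
                    terminated = True
                    break
                if pos >= max_len:
                    break
    return ends


def mean_position(ends):
    """Average termination position weighted by strand count."""
    return sum(pos * n for pos, n in ends.items()) / sum(ends.values())


second_only = simulate_two_polymerases(1000, 1000, 0.0, 0.0, 0.02, 50, 20)
mixed = simulate_two_polymerases(1000, 1000, 0.8, 0.0, 0.02, 50, 20)
print(mean_position(second_only), mean_position(mixed))
```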
In another aspect of the invention, the sequencing reactions with two different polymerases can be performed in separate tubes and then combined before gel analysis.
The present invention is further detailed in the following Example, which is offered by way of illustration and is not intended to limit the invention in any manner. Standard techniques well known in the art or the techniques specifically described below are utilized.
EXAMPLE
Protocol for Obtaining More Even Signal Intensities Along the Sequencing Reaction
Prepare a 5X PCR buffer of 50% sucrose, 50 mM Tris pH 8.1, 250 mM KCl, 0.05% Tween 20, 5 mM EDTA and 3 mM MgCl2. Prepare four separate deoxynucleotide/dideoxynucleotide (d/ddNTP) mixes, each containing 7.2 mM of each of the four deoxynucleotides (dATP, dGTP, dCTP, dTTP) and 56 μM of a single dideoxynucleotide.
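The buffer and nucleotide figures above can be checked with simple arithmetic. This sketch assumes the buffer is used at 1X in the final termination reactions; the helper name is hypothetical. Because dilution affects dNTPs and ddNTP equally, the ddNTP:dNTP ratio of each mix (about 1:129) is preserved in the reaction.

```python
# 5X buffer as listed in the protocol: component -> (value, unit)
FIVE_X_BUFFER = {
    "sucrose": (50.0, "%"),
    "Tris pH 8.1": (50.0, "mM"),
    "KCl": (250.0, "mM"),
    "Tween 20": (0.05, "%"),
    "EDTA": (5.0, "mM"),
    "MgCl2": (3.0, "mM"),
}


def at_one_x(stock, fold=5.0):
    """Concentrations after diluting a fold-X stock to 1X (assumed final strength)."""
    return {name: (value / fold, unit) for name, (value, unit) in stock.items()}


# ddNTP:dNTP ratio in one d/ddNTP mix; dilution changes both equally.
DNTP_EACH_UM = 7.2 * 1000.0   # 7.2 mM of each dNTP, in μM
DDNTP_UM = 56.0               # 56 μM of the single ddNTP
dd_to_d = DDNTP_UM / DNTP_EACH_UM

for name, (value, unit) in at_one_x(FIVE_X_BUFFER).items():
    print(f"{name}: {value:g} {unit}")
print(f"ddNTP:dNTP = 1:{1 / dd_to_d:.0f}")  # -> 1:129
```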
Mix the following reagents in four tubes labeled AF, CF, GF and TF for the four termination reactions (sufficient for eight reactions). Native Taq DNA polymerase, the Stoffel fragment or any other DNA polymerase of the first type (those that favor dNTPs over ddNTPs) may be substituted for Platinum Taq. Thermosequenase or any other DNA polymerase of the second type (those that use dNTPs and ddNTPs relatively equally or that favor ddNTPs) may be substituted for AmpliTaq® DNA Polymerase, FS.
*FET stands for fluorescence energy transfer. These are sold by Amersham.
Dispense 2 μL of the AF and CF mixes and 4 μL of the GF and TF mixes into individual wells of a 384-well microtiter plate. Add 2 μL of the sequencing template (pGEM 3ZF(+) at 0.2 μg/μL) to each AF and CF well and 4 μL of the same template to each GF and TF well. Overlay each well with 4 μL of silicone oil and centrifuge to bring the reagents to the bottom of the well. Cover the plate with a silicone gasket and thermocycle. The thermocycle profile is 94°C for 20 seconds, 56°C for 30 seconds and 72°C for 1 minute for 32 cycles, followed by a hold at 72°C for 1 minute, then storage at 4°C.
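Well volumes and template amounts follow directly from the dispensing step. This sketch only does that arithmetic; it assumes the silicone oil overlay is not counted as reaction volume, and the names are hypothetical. The G and T wells receive twice the mix and twice the template of the A and C wells, giving 24 μL of aqueous reaction per template in total.

```python
TEMPLATE_UG_PER_UL = 0.2  # pGEM 3ZF(+) stock concentration, per the protocol

# Per-well volumes from the protocol: reaction -> (mix μL, template μL)
WELLS = {"AF": (2.0, 2.0), "CF": (2.0, 2.0), "GF": (4.0, 4.0), "TF": (4.0, 4.0)}


def well_summary(wells, template_ug_per_ul=TEMPLATE_UG_PER_UL):
    """Aqueous volume (μL, oil overlay excluded) and template mass (μg) per well."""
    return {
        name: (mix_ul + template_ul, template_ul * template_ug_per_ul)
        for name, (mix_ul, template_ul) in wells.items()
    }


summary = well_summary(WELLS)
for name, (volume_ul, template_ug) in summary.items():
    print(f"{name}: {volume_ul:g} μL, {template_ug:.1f} μg template")
print("per template:", sum(v for v, _ in summary.values()), "μL")
```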
After thermocycling, consolidate the AF, CF, GF and TF reactions for each template. Add 0.1 volume of 7 M NH4OAc and 2.5 volumes of 95% ethanol. Mix well and pellet the precipitated sequencing reactions by centrifugation. Wash the pelleted sequencing product with 70% ethanol and allow it to dry. The sequencing product can then be resuspended in formamide and loaded onto an automated fluorescent sequencer.
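The precipitation volumes for one pooled template can be worked out as follows. This sketch assumes both the salt and the ethanol volumes are measured against the pooled aqueous reaction volume and that the oil overlay is not carried over; the helper name is hypothetical. For a 24 μL pool it gives 2.4 μL of 7 M NH4OAc, 60 μL of 95% ethanol and about 0.64 M NH4OAc before the ethanol is added.

```python
def precipitation_plan(pooled_ul, salt_stock_molar=7.0,
                       salt_volumes=0.1, ethanol_volumes=2.5):
    """Volumes for the NH4OAc/ethanol precipitation of pooled reactions.

    Assumption: both the 0.1 volume of salt and the 2.5 volumes of ethanol are
    measured against the pooled reaction volume (the protocol does not say
    whether the ethanol volume should include the added salt).
    """
    salt_ul = salt_volumes * pooled_ul
    ethanol_ul = ethanol_volumes * pooled_ul
    # NH4OAc concentration after the salt is mixed in, before ethanol is added
    salt_molar = salt_stock_molar * salt_ul / (pooled_ul + salt_ul)
    return salt_ul, ethanol_ul, salt_molar


pooled = 2 + 2 + 2 + 2 + 4 + 4 + 4 + 4  # AF, CF, GF, TF mix + template volumes (μL)
salt_ul, ethanol_ul, salt_molar = precipitation_plan(pooled)
print(f"{salt_ul:.1f} μL 7 M NH4OAc, {ethanol_ul:.0f} μL 95% ethanol, "
      f"{salt_molar:.2f} M NH4OAc before ethanol")
```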
While the invention has been disclosed in this patent application by reference to the details of preferred embodiments of the invention, it is to be understood that the disclosure is intended in an illustrative rather than in a limiting sense, as it is contemplated that modifications will readily occur to those skilled in the art, within the spirit of the invention and the scope of the appended claims.
LIST OF REFERENCES
Brandis JW et al. (1996). Biochemistry 35:2189-2200.
Maxam AM and Gilbert W (1977). Proc. Natl. Acad. Sci. USA 74:560-564.
Sanger F, Nicklen S and Coulson AR (1977). Proc. Natl. Acad. Sci. USA 74:5463-5467.
Tabor S and Richardson CC (1995). Proc. Natl. Acad. Sci. USA 92:6339-6343.