WO2004067764A2

WO2004067764A2 - Nucleic acid sequencing using nicking agents

Info

Publication number: WO2004067764A2
Application number: PCT/US2004/002719
Authority: WO
Inventors: Jeffrey Van Ness; David J. Galas; Lori K. Van Ness
Original assignee: Keck Graduate Institute
Priority date: 2003-01-29
Filing date: 2004-01-29
Publication date: 2004-08-12
Also published as: WO2004067764A3

Abstract

Nucleotide base sequence information is obtained upon combining a template nucleic acid comprising a primer strand and a target nucleic acid strand, the target nucleic acid strand comprising an interrogation sequence, with a nicking agent that recognizes a double-stranded sequence present in the template nucleic acid and then nicks the primer strand of the template nucleic acid at a location relative to the nicking agent recognition sequence, with a distributive polymerase that will extend the primer strand from the site where the nicking agent nicked the primer strand, in the presence of suitably-reactive mononucleotides, to provide a plurality of oligonucleotides that are complements of the sequence present in the interrogation sequence, where the plurality differ in having different nucleotide lengths, so that aligning the plurality of oligonucleotides provides nucleotide base sequence information about the interrogation sequence.

Description

NUCLEIC ACID SEQUENCING USING NICKING AGENTS

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention is generally directed to compositions and methods for obtaining information about the nucleotide base sequence present in a polynucleotide.

Description of the Related Art

The technological capability to detect the presence of a particular nucleic acid molecule in a biological sample or to detect a particular sequence in a nucleic acid molecule is of substantial importance in forensics, epidemiology and public health, and in medicine. Assays can be used, for example, to identify tissue samples, detect microbial contaminants in water, identify the causal agent of an infectious disease, or to predict the likelihood that an individual would suffer from a genetic disease. A clinician who can obtain a molecular analysis of a patient's cells or tissues may be able to predict or diagnose a disease, which, in turn, may guide the clinician toward choosing the most appropriate treatment for prevention or therapy.

Mutations, whether occurring in a host somatic or germline cell or in a microorganism causing a disease, can result in a substantial alteration of protein function that impacts morbidity and mortality of the host. Many of the 150-200 common human genetic diseases and about 600-800 of the more rare genetic diseases are associated with one or more defective genes. More than 200 of these genetic diseases are caused by a defect in a single gene, often resulting in a change of only a single amino acid residue. (Olsen, Biotechnology: An Industry Comes of Age (National Academic Press, 1986)). Other mutations may occur in host cells as part of a response to a particular treatment of a disease, for example, resulting in resistance to a chemotherapeutic drug used for treatment of cancer. Mutations in infectious disease organisms also can occur in response to drugs or antibiotics, requiring adjustments in the course of treatment. Methods to identify mutations are therefore needed so that a clinician can determine the presence of or the risk of presence of a disease and appropriately design a course of treatment for prevention or therapy.

Methods for determining the presence of a mutation include various methods for mismatch analysis. One method involves passive hybridization of numerous probes of known sequence to a target nucleic acid under varying stringency conditions to determine which probe binds to the target nucleic acid under the most stringent conditions. (See, e.g., Guo et al., Nucleic Acids Res. 22:5456-65, 1994; Kozal et al., Nature Medicine 2:753-59, 1996). Because the hybridization rate is proportional to the initial concentration of target fragments in solution, high concentrations of single-stranded nucleic acid target fragments are required. Stringency and rate of hybridization are generally controlled by temperature and salt concentration. Because all probes are exposed to the same conditions simultaneously, the probes are necessarily limited in length, GC content, and secondary structure. Because of difficulties controlling hybridization conditions, single base discrimination is generally restricted to capture oligomer sequences of 20 bases or less with centrally placed differences. (See, e.g., Guo et al. and Kozal et al., supra).

Other mismatch methods include enzymatic or chemical cleavage techniques. For example, an RNA protection method detects mismatches between an RNA-RNA or RNA-DNA hybrid. (See, e.g., Winter et al., Proc. Natl. Acad. Sci. USA 82:7575, 1985; Meyers et al., Science 230:1241, 1985). Other methods use DNA probes rather than RNA to detect mismatches via enzymatic or chemical cleavage. (See, e.g., Cotton et al., Proc. Natl. Acad. Sci. USA 85:4397, 1988; Shenk et al., Proc. Natl. Acad. Sci. USA 72:989, 1975). However, all these methods are less sensitive than sequencing.

The above mismatch methods may also be limited by the need for additional amplification steps and conversion of a double-stranded molecule to a single strand. A technique that incorporates amplification as a step and uses double-stranded DNA is the "di-oligonucleotide" method. According to this method, two (or more) oligonucleotides anneal to complementary sequences at adjacent sites in a target nucleic acid and are ligated in the presence of a nucleic acid having the sequence of the resulting "di-oligonucleotide," thereby amplifying the di-oligonucleotide (Wu et al., Genomics 4:560, 1989). If a single base pair mismatch exists between the target nucleic acid serving as template and the annealed oligonucleotides, the latter are not ligated, thus distinguishing between DNA templates. This method provides only the limited information that a mismatch is present but does not identify the nucleotide base or bases of the mismatch.

DNA sequencing, whether for research or diagnostics, is generally performed using techniques based on the "chain termination" method described by Sanger et al., Proc. Nat'l Acad. Sci. (USA) 74(12):5463-5467, 1977. Basically, in this process, DNA to be tested is isolated, rendered single stranded, and placed into four vessels. In each vessel are the necessary components to replicate the DNA strand, i.e., a template-dependent DNA polymerase, a short primer molecule complementary to a known region of the DNA to be sequenced, and the standard deoxynucleotide triphosphates (dNTP's) commonly represented by A, C, G and T, in buffers compatible with hybridization between the primer and the DNA to be sequenced and chain extension of the hybridized primer. In addition, each vessel contains a small quantity of one type (i.e., one species) of dideoxynucleotide triphosphate (ddNTP), e.g., dideoxyadenosine triphosphate (ddA).

In each vessel, the primer hybridizes to a specific site on the isolated DNA. The primers are then extended, one base at a time to form a new nucleic acid polymer complementary to the isolated pieces of DNA. When a dideoxynucleotide triphosphate is incorporated into the extending polymer, this terminates the polymer strand and prevents it from being further extended. Accordingly, in each vessel, a set of extended polymers of specific lengths are formed which are indicative of the positions of the nucleotide corresponding to the dideoxynucleotide in that vessel. These sets of polymers are then evaluated using gel electrophoresis to determine the sequence.
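
The read-out logic of the chain termination method described above can be illustrated with a minimal Python sketch. It assumes idealized data in which each of the four vessels yields the set of extension lengths (counted in nucleotides added to the primer) at which its ddNTP terminated the chain. The function name and the example lane data are hypothetical and are not taken from the cited references.

```python
def read_sequence_from_lanes(lanes):
    """Reconstruct the newly synthesized strand from four termination lanes.

    `lanes` maps a base letter to the set of extension lengths at which
    fragments terminated in the vessel containing the matching ddNTP.
    """
    position_to_base = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in position_to_base:
                raise ValueError(f"length {length} appears in more than one lane")
            position_to_base[length] = base

    longest = max(position_to_base)
    missing = [n for n in range(1, longest + 1) if n not in position_to_base]
    if missing:
        raise ValueError(f"no fragment observed for lengths {missing}")

    # Reading the fragments from shortest to longest gives the strand 5' to 3'.
    return "".join(position_to_base[n] for n in range(1, longest + 1))


# Hypothetical lane data for an extension product reading 5'-GATTACA-3'.
example_lanes = {
    "A": {2, 5, 7},
    "C": {6},
    "G": {1},
    "T": {3, 4},
}
print(read_sequence_from_lanes(example_lanes))  # GATTACA
```

The sequence obtained this way is that of the extended strand, which is the complement of the template being sequenced.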

It is usually the case that DNA sequencing procedures have dealt with this complexity by adding steps which substantially purify the DNA of interest relative to other DNA species present in the sample. This purification has been accomplished by cloning of the DNA to be sequenced prior to sequencing, or by amplification of a selected portion of the genetic material in a sample to enrich the concentration of a region of interest relative to other DNA. For example, it is possible to amplify a selected portion of a gene using a polymerase chain reaction (PCR) as described in U.S. Pat. Nos. 4,683,194, 4,683,195 and 4,683,202, which are incorporated herein by reference. This process involves the use of pairs of primers, one for each strand of the duplex DNA, that will hybridize at a site located near a region of interest in a gene. Chain extension polymerization (without a chain terminating nucleotide) is then carried out in repetitive cycles to increase the number of copies of the region of interest many times. The amplified polynucleotides are then separated from the reaction mixture and used as the starting sample for the sequencing reaction. Gelfand et al. have described a thermostable enzyme, "Taq polymerase," derived from the organism Thermus aquaticus, which is useful in this amplification process. (See U.S. Pat. Nos. 4,889,818; 5,352,600 and 5,079,352). Taq polymerase has also been disclosed as useful in sequencing DNA when certain special conditions are met (U.S. Pat. No. 5,075,216).

Improvements to the original technique described by Sanger et al. have included improvements to the enzyme used to extend the primer chain. Reeve et al. have described a thermostable enzyme preparation, called Thermo Sequenase™, with improved qualities for DNA sequencing. Nature 376:796-797, 1995; EP-A-0 655 506. For sequencing, the Thermo Sequenase™ product is used by dividing an amplified DNA sample containing 0.5-2 micrograms of single stranded DNA (or 0.5 to 5 micrograms of double stranded DNA) into four aliquots, and combining each aliquot with the Thermo Sequenase™ enzyme preparation, one dideoxynucleotide termination mixture containing one ddNTP and all four dNTP's, and one dye-labeled primer which will hybridize to the DNA to be sequenced. The mixture is placed in a thermocycler and run for 20-30 cycles of annealing, extension and denaturation to produce measurable amounts of dye-labeled extension products of varying lengths which are then evaluated by gel electrophoresis.

Notwithstanding the observations in the art that enzymes useful for amplification can also be used for sequencing, and vice versa, efforts to combine the amplification reaction and the sequencing reaction into a single step have been limited. Ruano and Kidd, Proc. Nat'l Acad. Sci. (USA) 88:2815-2819, 1991, and U.S. Pat. No. 5,427,911 describe a process which they call "coupled amplification and sequencing" (CAS) for sequencing of DNA. In this process, a sample is treated in a first reaction stage with two primers and amplified for a number of cycles to achieve 10,000 to 100,000-fold amplification. A ddNTP is then added during the exponential phase of the amplification reaction, and the reaction is processed for additional thermal cycles to produce chain-terminated sequencing fragments. The CAS process does not achieve the criteria set forth above for an ideal diagnostic assay because it requires an intermediate addition of reagents (the ddNTP reagents). This introduces an opportunity for error or contamination and increases the complexity of any apparatus which would be used for automation.

The invention of the automated fluorescence DNA sequencer (Smith, et al., "The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis," Nucleic Acids Res. 13:2399-2412, 1985; Smith, et al., "Fluorescence detection in automated DNA sequence analysis," Nature 321:674-679, 1986; Hood, et al., "Automated DNA sequencing and analysis of the human genome," Genomics 1:201-212, 1987; Hunkapiller et al., "Large-scale and automated DNA sequence determination," Science 254:59-67, 1991) has advanced the level of DNA sequencing. Until recently, the most commonly used format was a horizontal or vertical slab gel. Years of research on capillary sequencers have yielded several recent commercial systems that significantly increase the throughput and decrease the time required to sequence. Some of the efforts in slab- and capillary-based methods are described below. An excellent review of all sequencing technologies is provided by Meldrum (Meldrum, D., "Automation for Genomics, Part Two: Sequencers, Microarrays, and Future Trends," Genome Research 10(9):1288-1303, 2000). A review of automated DNA sequencing operation is provided in Huang, G.M., "High-throughput DNA sequencing: A genomic data manufacturing process," DNA Seq. 10:149-153, 1999.

Pyrosequencing is another DNA sequencing method that utilizes four enzymes cooperating in a single tube to determine the nucleotide composition of a DNA fragment in real time. Detection is based on the amount of visible light produced by coupling the pyrophosphate that is released during nucleotide incorporation with the enzymes sulfurylase and luciferase. Unincorporated nucleotides are degraded in the reaction mixture by the enzyme apyrase. A fully automated instrument called the PSQ96 System has been developed primarily for SNP analysis by Pyrosequencing AB (Uppsala, Sweden) (http://www.pyrosequencing.com) to perform pyrosequencing. It uses ink-jet technology to dispense submicroliter volumes of the four nucleotides into a single tube coupled with simultaneous detection of all samples by a single CCD unit (Hyman 1988; Ronaghi, M. et al., "Automated pyrosequencing in screening of a cDNA library," Microb. Comparat. Genomics 3: C-33, 1998a; Ronaghi, M. et al., "A sequencing method based on real-time pyrophosphate," Science 281:363-365, 1998b; Ronaghi, M., et al., "Analyses of secondary structures in DNA by pyrosequencing," Anal. Biochem. 267:65-71, 1999; Ahmadian, A., et al., "Single-nucleotide polymorphism analysis by pyrosequencing," Anal. Biochem. 280:103-110, 2000; Andersson et al. 2000; Nordstrom, T., et al., "Method enabling pyrosequencing on double-stranded DNA," Anal. Biochem. 282:186-193, 2000).

The usefulness of a detection assay, including sequencing, is often limited by the concentration of the particular target nucleic acid molecule present in a sample. Thus, methods for amplifying the number of molecules in a sample have been developed as adjuncts to detection assays. Recombinant DNA methods for in vivo amplification of purified nucleic acid fragments have long been used. Typically, such methods involve introducing a nucleic acid fragment of interest into a DNA or RNA vector, transfecting or transforming a host cell, such as E. coli, with the vector, amplifying a clone of the vector by culturing the host cell, and recovering the amplified nucleic acid fragment. (See, e.g., Cohen et al., U.S. Patent No. 4,237,224; Maniatis et al., Molecular Cloning (A Laboratory Manual), Cold Spring Harbor Laboratory (1982)). In many instances in clinical medicine, however, the target nucleic acid fragment cannot be readily cloned because the concentration of a target species in a sample is too low. In these situations, in vitro methods of nucleic acid amplification employing template-directed extension may provide sufficient amounts of the target nucleic acid for analysis.

The most widely used template extension method for nucleic acid amplification is the polymerase chain reaction (PCR). (See, e.g., Mullis et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-73, 1986; Erlich et al., EP 50,424, EP 84,796, EP 258,017, EP 237,362; Mullis, EP 201,184; Mullis et al., U.S. Patent No. 4,683,202; Erlich, U.S. Patent No. 4,582,788; Saiki et al., U.S. Patent No. 4,683,194; and Higuchi, PCR Technology, Ehrlich, H. (ed.), Stockton Press, N.Y., p 61-68 (1989)). By PCR the concentration of the target nucleic acid can be selectively increased even when the target molecule has not been purified or when only a single copy is present in a sample.

Specific amplification of the target nucleic acid molecule by PCR requires the presence of two oligonucleotides that serve as primers for the template-dependent, polymerase-mediated replication of the target molecule. The two primers are designed to comprise sequences identical to or complementary to sequences that flank the target nucleic acid sequence to be amplified. The extension product of the "first" primer contains a sequence that is complementary to the sequence or a portion of the sequence of the "second" primer, such that the extension product of the "first" primer serves as the template for production of the extension product of the "second" primer. Similarly, the extension product of the "second" primer contains a sequence complementary to the "first" primer and thus serves as template for the "first" primer. By repeating cycles of hybridization, polymerization, and denaturation steps, the concentration of the target nucleic acid can be exponentially increased. (See, e.g., Mullis, Cold Spring Harbor Symp. Quant. Biol. 51:263-73, 1986; and Mullis et al., Meth. Enzymol. 155:335-50, 1987).

Generally, rapid and extensive amplification of a target nucleic acid molecule can be achieved by PCR methods. The salient deficiencies of the method, however, can affect the yield and may reduce the usefulness of PCR in nucleic acid detection assays, such as sequencing methods. Because two different primers are required for PCR and therefore by necessity have different nucleotide sequences, the efficiency of the reaction may be affected by the concentration of each primer and by reaction conditions, such as buffers and temperatures, which can affect the efficiency of hybridization. An additional disadvantage of PCR is the requirement for thermocycling. If extension of either primer has not been completed prior to the heating step of the following cycle, the rate of amplification is impaired.

Other known nucleic acid amplification procedures include transcription-based amplification systems (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173, 1989; Gingeras et al., PCT application WO 88/10315; Davey et al., European Patent Application Publication No. 329,822; Miller et al., PCT WO 89/06700); "RACE" (Frohman, In: PCR Protocols: A Guide to Methods and Applications, Academic Press, N.Y. (1990)); and "one-sided PCR" (Ohara et al., Proc. Natl. Acad. Sci. USA 86:5673-77, 1989). Another exponential amplification method is "strand displacement amplification" (SDA). SDA is an isothermal method that uses a restriction enzyme to nick the unmodified strand of a hemiphosphorothioate form of its recognition site in the presence of a DNA polymerase that extends the 3'-end at the nick, which displaces the downstream DNA strand. (See, e.g., Walker et al., Proc. Natl. Acad. Sci. USA 89:392-96, 1992; Walker et al., Nucleic Acids Res. 20:1691-96, 1992). All of the above amplification procedures depend on the principle that an end-product of a cycle is structurally identical to the starting material. Therefore, amplification depends on whether the temperature, time, and reaction components are optimal for the hybridization and extension steps, which, as noted above for PCR, may be difficult to achieve when more than one oligonucleotide is used as a primer.

New advances in medicine are being made at a rapidly increasing pace, largely due to recent developments in genome analysis. DNA sequence analysis provides a clearer understanding of how genetic variation leads to disease. Three major milestones have made it possible to streamline and automate most of the processes required for DNA analysis: the invention of sequencing reactions (Maxam, A.M. and Gilbert, W., "A new method for sequencing DNA," Proc. Natl. Acad. Sci. 74:560-564, 1977; Sanger, F., et al., "DNA sequencing with chain-terminating inhibitors," Proc. Natl. Acad. Sci. 74:5463-5467, 1977), the Polymerase Chain Reaction (PCR) (Mullis, K.B., et al., "Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction," Cold Spring Harbor Symp. Quant. Biol. 51:263-273, 1986; Mullis, K.B. and Faloona, F.A., "Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction," Meth. Enzymol. 155:335-350, 1987), and automated fluorescent DNA sequencers (Smith, L.M., et al., "The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis," Nucleic Acids Res. 13:2399-2412, 1985; Smith, L.M., et al., "Fluorescence detection in automated DNA sequence analysis," Nature 321:674-679, 1986; Hood, L.E., et al., "Automated DNA sequencing and analysis of the human genome," Genomics 1:201-212, 1987; Hunkapiller, T., et al., "Large-scale and automated DNA sequence determination," Science 254:59-67, 1991).

DNA sequencing is one of the most important platforms for the study of biological systems today. Sequence determination is most commonly performed using dideoxy chain termination technology. There is presently an exponential growth in the number of sequencing experiments being performed and the volume of data collected. This growth, driven by the Human Genome Project (Collins, F.S., et al., "New goals for the U.S. Human Genome Project," Science 282:682-689, 1998) and other large-scale sequencing projects, is made possible by advances in sequencing methods and laboratory automation. The development of DNA sequence determination techniques with enhanced speed, sensitivity, and throughput is of utmost importance for the study of biological systems. Conventional DNA sequencing relies on the elegant principle of the dideoxy chain termination technique first described more than two decades ago (Sanger, F., et al., "DNA sequencing with chain-terminating inhibitors," Proc. Natl. Acad. Sci. 74:5463-5467, 1977).

Many research groups around the world have put effort into the development of alternative principles for DNA sequencing. Three general alternative methods are sequencing by hybridization (Bains and Smith 1988; Drmanac et al. 1989; Khrapko et al. 1989; Southern 1989), parallel signature sequencing based on ligation and cleavage (Brenner et al. 2000), and pyrosequencing (Ronaghi et al. 1996; Ronaghi, M., et al., "A sequencing method based on real-time pyrophosphate," Science 281:363-365, 1998b). Pyrosequencing has been successful for both confirmatory sequencing and de novo sequencing. This technique has not been used for genome sequencing due to the limitation in the read length, but it has been employed for applications such as genotyping (Ahmadian et al. 2000a; Alderborn et al. 2000; Ekström et al. 2000; Nordstrom et al. 2000b), resequencing of diseased genes (Garcia et al. 2000), and sequence determination of difficult secondary DNA structure (Ronaghi, M., et al., "Analyses of secondary structures in DNA by pyrosequencing," Anal. Biochem. 267:65-71, 1999).

The term 'genotyping' describes the genetic characterization of a genome. The genotype analysis is performed to identify mutations that differentiate one individual or strain from another. The mutations may confer resistance to specific antiviral drugs or they may simply allow classification of a strain as to 'type' and 'subtype'. There are four human viruses for which genotype information is clinically useful and well illustrates the need for accurate sequencing and genotyping. Hepatitis B virus (HBV) infections are being treated with antiretroviral drugs and resistance after prolonged treatment is common. Since HBV cannot be cultured, the only method of detecting resistance-conferring mutations in the genome is through genotypic analysis. Hepatitis C virus (HCV) infection can be treated with the combination of interferon and ribavirin but certain strains of virus are more resistant to treatment than others. Since interferon treatment may have significant side effects, the determination of HCV genotype is an important aspect of this therapeutic regimen. Treatment of cytomegalovirus (CMV) disease with nucleoside analogues occasionally results in resistant virus with mutations in the phosphotransferase gene (UL97) and/or the DNA polymerase gene (UL54) that can be tested with phenotypic or genotypic assays. Since CMV grows very slowly, it may be more clinically useful to perform a rapid genotypic assay although only the UL97 gene can be efficiently genotyped. Finally, the virus for which genotyping has become the standard of care, human immunodeficiency virus type 1 (HIV-1) can now be genotyped routinely by many clinical virology labs experienced with molecular amplification methods and automated DNA sequencing technology. All currently-available antiretroviral drugs are directed against either the protease or reverse transcriptase genes of HIV-1 and the mutations within these genes that confer resistance have been well described. 
Sequence-based genotyping methods are not necessarily the best approach for routine genotyping of these four viruses, but sequencing is the gold standard from which other methods are developed and against which they are compared.

There are currently two widely used methods for amplifying specific DNA sequences, PCR and the rolling circle amplification method. The PCR method is the simpler and more flexible of these and has the added advantage of being geometric rather than linear in character so that amplification levels of 10⁶ or more can be achieved. It is by far the most widely used amplification method in biology. It has the disadvantage relative to the isothermal rolling-circle amplification method, however, of needing a temperature cycling protocol to achieve amplification. This imposes instrumentation constraints on the PCR method that make it more complex and limit the rate of the amplification to the temperature cycling schedule. Another limitation of the rate of PCR derives from the nature of the reaction itself in that a maximum two-fold amplification can be achieved in each cycle. Advances in speed, accuracy and sensitivity of amplification, in addition to simplicity, would be important and most welcome for many applications in biology and medicine.
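
The two-fold-per-cycle ceiling noted above translates directly into a minimum cycle count for a given amplification level. The following Python sketch works through that arithmetic under the idealized assumption that every cycle achieves the same gain; the function name and the `per_cycle_gain` parameter are illustrative and do not appear in the source.

```python
def min_cycles_for_fold(target_fold, per_cycle_gain=2.0):
    """Smallest number of cycles whose cumulative gain reaches target_fold.

    Assumes each cycle multiplies the amount of product by per_cycle_gain,
    with 2.0 being the theoretical maximum for PCR.
    """
    if per_cycle_gain <= 1.0:
        raise ValueError("per_cycle_gain must exceed 1 for amplification")
    cycles = 0
    amount = 1.0
    while amount < target_fold:
        amount *= per_cycle_gain
        cycles += 1
    return cycles


# A 10^6-fold amplification needs at least 20 perfect doublings,
# because 2**19 = 524,288 falls short and 2**20 = 1,048,576 suffices.
print(min_cycles_for_fold(1e6))  # 20
```

Any per-cycle efficiency below the ideal doubling raises the required cycle count, which is one reason the rate of PCR is tied to the temperature cycling schedule.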

Because of the limitations of the amplification methods, sequencing techniques that incorporate these different amplification methods often add more complexity. One sequencing method that generates fragments for sequencing by strand displacement amplification uses a duplex probe as a primer, which requires a ligation step, and incorporation of terminating deoxynucleotides to generate different length fragments. (See, e.g., Fu et al., Nucleic Acids Res. 25:677-79, 1997). Critical to another method is the addition of two polymerases, each with differing affinities for dideoxynucleotides, to facilitate the amplification reaction. (See, e.g., Koster et al., U.S. Patent No. 5,928,906).

Many diseases are associated with particular DNA sequences. The DNA sequences are often referred to as DNA sequence polymorphisms to indicate that the DNA sequence associated with a diseased state differs from the corresponding DNA sequence in non-afflicted individuals. DNA sequence polymorphisms can include, e.g., insertions, deletions, or substitutions of nucleotides in one sequence relative to a second sequence. An example of a particular DNA sequence polymorphism is 5'-ATCG-3', relative to the sequence 5'-ATGG-3'. The first nucleotide "G" in the latter sequence has been replaced by the nucleotide "C" in the former sequence. The former sequence is associated with a particular disease state, whereas the latter sequence is found in individuals not suffering from the disease. Thus, the presence of the nucleotide sequence 5'-ATCG-3' indicates the individual has the particular disease. This particular type of sequence polymorphism is known as a single-nucleotide polymorphism, or SNP, because the sequence difference is due to a change in one nucleotide. Techniques which enable the rapid detection of as little as a single DNA base change are therefore important methodologies for use in genetic analysis. Because the size of the human genome is large, on the order of 3 billion base pairs, techniques for identifying polymorphisms must be sensitive enough to specifically identify the sequence containing the polymorphism in a potentially large population of nucleic acids.
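
The ATCG/ATGG example above can be expressed as a simple position-by-position comparison. This is a minimal Python sketch that handles substitutions only, between sequences of equal length; the function name is hypothetical, and insertions or deletions would require an alignment step that is not shown.

```python
def find_substitutions(reference, variant):
    """List (position, reference_base, variant_base) for each mismatch.

    Positions are 1-based and counted from the 5' end. Only equal-length
    sequences are supported, so insertions and deletions are out of scope.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


# The example from the text: 5'-ATGG-3' (unaffected) versus 5'-ATCG-3'.
print(find_substitutions("ATGG", "ATCG"))  # [(3, 'G', 'C')]
```

A single reported difference, as here, corresponds to a single-nucleotide polymorphism.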

See also the following references that are relevant to the technology of the present invention: Butler, J.M., Li, J., Shaler, T.A., Monforte, J.A., and Becker, C.H. 1999a. Reliable genotyping of short tandem repeat loci without an allelic ladder using time-of-flight mass spectrometry. Int. J. Legal Med. 112:45-49; Butler, J., Shaler, T., Royer, S., and Monforte, J. 1999b. High-throughput multiplexed SNP genotyping by mass spectrometry. Microb. Comparat. Genomics 4:111; Ekstrand, G., Holmquist, C., Orlefors, A.E., Hellman, B., Larsson, A., and Andersson, P. 2000. Microfluidics in a rotating CD. In Micro total analysis systems 2000 (ed. A. van den Berg, W. Olthuis and P. Bergveld), pp. 311-314. Kluwer Academic Publishers, Boston, MA; Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A., and Solas, D. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251:767-773; Gevaert, K. and Vandekerckhove, J. 2000. Protein identification methods in proteomics. Electrophoresis 21:1145-1154; Li, J., Butler, J.M., Tan, Y., Lin, H., Royer, S., Ohler, L., Shaler, T.A., Hunter, J.M., Pollart, D.J., Monforte, J.A. 1999. Single nucleotide polymorphism determination using primer extension and time-of-flight mass spectrometry. Electrophoresis 20:1258-1265; Monforte, J.A. and Becker, C.H. 1997. High-throughput DNA analysis by time-of-flight mass spectrometry. Nat. Med. 3:360-362; Shaler, T.A., Tan, Y., Wickham, J.N., Wu, K.J., and Becker, C.H. 1995. Analysis of enzymatic DNA sequencing reactions by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom. 9:942-947; and Smith, T.M., Abajian, C., and Hood, L. 1997. Hopper: software for automating data tracking and flow in DNA sequencing. CABIOS 13:175-182.

A need exists in the art for rapid, sensitive, and uncomplicated methods to obtain base sequence information from a target polynucleotide. Such information may, for example, identify mutations associated with human diseases and disorders, both in the host cells and in organisms that infect the host. The present invention fulfills this and related needs by providing a sequencing method that can be performed with increased speed and convenience and at less cost.

BRIEF SUMMARY OF THE INVENTION

We disclose here a new class of isothermal reactions for sequencing short stretches of DNA that overcomes the disadvantages of traditional sequencing methods that employ tags or labels. This class includes a linear amplification method and several versions of an exponential amplification scheme. These reactions are simple, flexible, and require no special cycling of conditions. The reactions depend entirely for their rate of amplification on the molecular parameters governing the interactions of the molecules in the reaction. Because of the balance between the thermal properties of the DNA oligonucleotides and the enzymes used, the optimum temperature of the reaction with these enzymes is about 60°C. The exponential version of the method, designated the exponential amplification reaction (EXPAR), is an isothermal, molecular chain reaction in that the products of one reaction catalyze further reactions that copy and sequence triggering oligonucleotides. The linear version of the method is the basic sequencing reaction upon which EXPAR is based.

In short, a method has been devised in which a ladder of oligonucleotides is linearly amplified from genomic DNA, cDNA, mRNA or other nucleic acid without necessarily using tags or labels. In some cases and when needed, the ladder of oligonucleotides can be used to trigger their own exponential amplification. Once an amplification template has been generated, ladders of oligonucleotides are linearly amplified by coupling a nicking enzyme (e.g., N.BstNBI) and a polymerase in an isothermal reaction at about 55-65°C. The members of the ladder from the linear amplification differ by a single base and can usefully be separated by sequence or length using liquid chromatography, which can be coupled to electrospray ionization Time-of-Flight (ESI-TOF) mass spectrometry. In one aspect, the read-out is performed by mass spectrometry using LC-ESI-TOF or MALDI. Foreknowledge of the sequence of the individual or organism is not necessary as it is possible to generate the fragments de novo from genomic DNA. The methods described here permit the creation of an assay panel of diagnostic sequences that can identify any organism or individual.

In some cases, the ladder of oligonucleotides from the linear sequencing reaction can then be coupled to an isothermal method for exponentially amplifying the triggering sequences in true chain reactions. The triggering and amplification reaction can be made a homogeneous assay in which 10⁸- to 10⁹-fold amplification can be achieved in as little as 3 minutes. The exponential or string reaction is composed of two steps: a copying reaction that replicates the triggering oligonucleotide and a second copying reaction that copies the complement of the original trigger. In the chain amplification step, two template oligonucleotides are preferably used, one that is partially or completely complementary to the trigger, and the other which is partially or completely complementary to the copy of the trigger. Since both templates generate triggers, once the reaction is initiated, a chain reaction ensues.

In one aspect, the present invention provides a method for obtaining base sequence information of a polynucleotide. The method includes: a) providing a polynucleotide comprising a nicking agent recognition sequence; b) combining the polynucleotide of a) with components comprising: i) a nicking agent that recognizes the recognition sequence; ii) a distributive polymerase; and iii) a deoxyribonucleoside triphosphate; under conditions that form a plurality of oligonucleotides, where members of the plurality differ by the number of nucleotides in the oligonucleotides; and c) characterizing the plurality of oligonucleotides to thereby obtain base sequence information about the polynucleotide.

In various optional embodiments of the invention that are contemplated by the inventors: the nicking agent is a nicking endonuclease; and/or the nicking endonuclease is N.BstNBI; and/or the distributive polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North. Optionally, the components further comprise: trehalose; and/or dATP; and/or dCTP; and/or dTTP; and/or dGTP, where these are a few of many possible optional components. Typically, and in one aspect of the invention, the plurality of oligonucleotides will have lengths within the range of 4-24 nucleotides. This plurality of oligonucleotides may be characterized, at least in part, by mass spectrometry, and/or they may additionally or alternatively be characterized by liquid chromatography, where these are two of many possible characterization techniques. In one embodiment, the conditions comprise isothermal conditions, i.e., the production of the oligonucleotides is achieved at a constant temperature rather than with the temperature cycling commonly used in PCR. For instance, in one embodiment, the isothermal conditions are 60°C +/- 5°C. These and other embodiments are described in further detail herein.

In another aspect, the invention provides a method for determining the nucleotide sequence of an interrogation sequence located within a target polynucleotide. This method comprises:

(a) forming a double-stranded template polynucleotide comprising the target polynucleotide, the double-stranded template polynucleotide further comprising at least one nicking agent recognition sequence (NARS) and at least one nicking site (NS) susceptible to the nuclease activity of a nicking agent (NA);

(b) contacting the double-stranded template polynucleotide comprising the interrogation polynucleotide with the NA;

(c) nicking the template at the NS to provide a new 3' terminus at the NS;

(d) extending the nicked template of step (c) from the new 3' terminus at the NS;

(e) amplifying two or more single-stranded polynucleotide fragments by repeating steps (c) and (d), wherein each fragment is extended by a differing number of nucleotides, to provide a plurality of amplified fragments; and

(f) aligning the plurality of amplified fragments and identifying the 3' terminal nucleotide of each of the fragments, thereby determining the nucleotide sequence of the interrogation region of the polynucleotide.

In another aspect, the present invention provides a method for determining the nucleotide sequence of an interrogation sequence located in a double-stranded target polynucleotide having a first and second strand. This method includes:

(a) linking a first oligonucleotide adaptor comprising a nucleotide sequence of the sense strand of a NERS to the first strand of the target polynucleotide that comprises the complement of the interrogation polynucleotide sequence at a location 3' to the complement of the interrogation polynucleotide sequence;

(b) linking a second oligonucleotide adaptor comprising a nucleotide sequence of one strand of a Type IIs restriction endonuclease recognition sequence (TRERS) to the second strand of the target polynucleotide that comprises the interrogation polynucleotide sequence at a location 3' to the interrogation polynucleotide sequence;

(c) extending the first and second oligonucleotide adaptors to produce a double-stranded template polynucleotide comprising the NERS and the TRERS;

(d) digesting the double-stranded template polynucleotide with a restriction endonuclease that recognizes the TRERS to produce a digestion product;

(e) contacting the digestion product with a nicking enzyme (NE) and nicking the digestion product to provide a new 3' terminus at the NS;

(f) extending the nicked digestion product of step (e) from the new 3' terminus at the NS in the presence of a distributive DNA polymerase;

(g) amplifying two or more single-stranded polynucleotide fragments by repeating steps (e) and (f), wherein each fragment is extended by a differing number of nucleotides but no more than 50 nucleotides; and

(h) aligning the amplified fragments of differing lengths and identifying the 3' terminal nucleotide of each fragment, thereby determining the nucleotide sequence of the interrogation polynucleotide.

In another aspect, the present invention provides a method for determining the nucleotide sequence of an interrogation nucleotide sequence located in a target polynucleotide, where the method comprises: (a) forming a mixture of a first oligonucleotide primer (ODNP), a second ODNP, and the target polynucleotide, under conditions and for a time sufficient to allow the ODNPs to anneal to the target polynucleotide, wherein

(i) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence of the sense strand of a nicking endonuclease recognition sequence (NERS) and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide and located 3' to the interrogation polynucleotide, and optionally comprises a sequence of one strand of a restriction endonuclease recognition sequence (RERS), or

(ii) if the target polynucleotide is a single-stranded polynucleotide, then the first ODNP comprises a nucleotide sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprises a RERS;

(b) extending the first and the second ODNPs to produce a double-stranded template polynucleotide comprising an NERS and a RERS;

(c) digesting the double-stranded template polynucleotide of step (b) with a restriction endonuclease that recognizes the RERS to produce a digestion product;

(d) contacting the digestion product with a nicking enzyme (NE) and nicking the digestion product to provide a new 3' terminus at the NS;

(e) extending the nicked digestion product of step (d) from the new 3' terminus at the NS using a distributive DNA polymerase;

(f) repeating steps (d) and (e) to amplify two or more single-stranded polynucleotide fragments, wherein each fragment is extended by a differing number of nucleotides but no more than 36 nucleotides;

(g) separating each single-stranded polynucleotide fragment and measuring the molecular mass of each fragment by a method comprising, at least in part, liquid chromatography and mass spectrometry; and

(h) aligning the single-stranded polynucleotide fragments according to the differences in molecular mass of each fragment and identifying the 3' terminal nucleotide of each of the fragments, and thereby determining the nucleotide sequence of the interrogation polynucleotide.

In various optional embodiments of these aspects of the invention, as contemplated by the inventors: the NA is a nicking endonuclease (NE), e.g., the NE is N.BstNBI or the NE is a N.BstNBI mutant that retains nicking activity at temperatures greater than 65°C; and/or step (d) is performed in the presence of a distributive DNA polymerase, e.g., a distributive DNA polymerase selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North. Optionally, steps (c), (d), and (e) are performed under an identical isothermal condition. While temperature cycling may be used, as is commonly done when polymerases are used in polymerase chain reaction, the methods of the present invention have the advantage that temperature cycling is not necessary. For example, in one aspect of the invention, steps (c), (d), and (e) are performed at a temperature of about 50°C to about 70°C, preferably at about 60°C.

The oligonucleotides produced by this method are typically not very long. For example, in various aspects, the longest extended single-stranded polynucleotide fragment of step (e) is no more than 40 nucleotides; or the longest extended single-stranded polynucleotide fragment of step (e) is no more than 28 nucleotides; or the longest extended single-stranded polynucleotide fragment of step (e) is no more than 16 nucleotides; or the longest extended single-stranded polynucleotide fragment of step (e) is no more than 12 nucleotides; or the longest extended single-stranded polynucleotide fragment of step (e) is no more than 7 nucleotides. The length of the longest extended single-stranded polynucleotide fragment is determined, in part, by the melting temperature of the fragment and the target polynucleotide, where this temperature is determined, in part, by the base composition of the fragment and the chemicals included in the reaction mixture that generates the polynucleotide fragment. The temperature at which the polynucleotide fragments are generated also affects the length of the longest extended single-stranded polynucleotide fragments, where higher temperatures tend to produce shorter fragments.

Optionally, in the methods of the present invention, a technique selected from the group consisting of luminescence spectroscopy, fluorescence spectroscopy, mass spectrometry, liquid chromatography, fluorescence polarization, electron ionization, gel electrophoresis, gas chromatography, and capillary electrophoresis is used to characterize the ladder of oligonucleotides (sometimes referred to as single-stranded polynucleotide fragments) that are generated by the method of the present invention. For instance, when mass spectrometry is utilized as a characterization method, the method of the present invention may further include measuring the molecular mass of each single-stranded polynucleotide fragment, aligning the single-stranded polynucleotide fragments according to the difference in molecular mass of each fragment, and thereby determining the nucleotide sequence of the interrogation polynucleotide.
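As a rough illustration of the mass-based alignment described above, the following Python sketch orders a set of measured fragment masses and assigns a base to each step of the ladder from the mass difference between neighbouring fragments. The residue masses are standard average values for nucleotides within a DNA chain; the function names, the 0.5 Da tolerance, and the example starting mass are assumptions made for illustration and are not taken from the present disclosure.

```python
# Average masses (Da) of the four deoxynucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(masses, tolerance=0.5):
    """Return the bases added between successive ladder members, 5' to 3'.

    `tolerance` (Da) is an illustrative assumption, not a value from the text.
    """
    ordered = sorted(masses)  # align the fragments by increasing mass
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        # Pick the nucleotide whose residue mass is closest to the mass step.
        base, residue = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        if abs(residue - delta) > tolerance:
            raise ValueError(f"mass step {delta:.2f} Da matches no single nucleotide")
        calls.append(base)
    return "".join(calls)


def ladder_masses(start_mass, extension):
    """Hypothetical helper: masses of a ladder grown one base at a time."""
    masses = [start_mass]
    for base in extension:
        masses.append(masses[-1] + RESIDUE_MASS[base])
    return masses


if __name__ == "__main__":
    simulated = ladder_masses(1800.0, "GATTC")  # 1800.0 Da is an arbitrary start
    print(read_ladder(simulated))
```

Because only the differences between successive masses are used, the contribution of the 5' and 3' end groups cancels, so the sketch does not need to know the exact chemistry of the fragment termini.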

In the methods of the present invention, obtaining base sequence information about a target polynucleotide may be desirable for any of a large number of reasons. For instance, the interrogation polynucleotide may comprise a genetic variation, where the researcher or clinician wants to determine the identity of that genetic variation. The genetic variation may be, for example, a single nucleotide polymorphism. The base sequence information contained in the interrogation sequence may be associated with a disease, so that having information about the base sequence of the interrogation sequence can aid in diagnosis and/or treatment of the disease. The disease may be, for example, a human genetic disease, or the disease may be cancer. In one aspect of the invention, the target polynucleotide is isolated from at least one cell having or suspected of having a mutation in a target polynucleotide sequence, wherein said mutation is associated with tumorigenesis. In another aspect, the interrogation sequence of the polynucleotide is associated with drug resistance of a microorganism. To these and other ends, the target polynucleotide may be any of genomic nucleic acid, a cDNA, a mRNA, a ribosomal RNA, a mitochondrial DNA, and a mitochondrial RNA, or other forms and sources of nucleic acid. The target polynucleotide may be derived from an infectious agent, where exemplary infectious agents are a virus, a bacterium, a fungus, and a parasite.

In another aspect, the present invention provides kits useful in performing methods of the present invention. In one embodiment, the present invention provides a kit for determining the nucleotide sequence of an interrogation nucleotide sequence within a target polynucleotide, where the contents of the kit depend on the way in which the method will be performed, and include the following: (a) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand,

(i) a first oligonucleotide primer (ODNP) comprising a sequence of a sense strand of a nicking endonuclease recognition sequence (NERS) and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located at 3' to the complement of the interrogation polynucleotide, and

(ii) a second ODNP comprising a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprising a sequence of one strand of a restriction endonuclease recognition sequence (RERS); or,

(b) if the target polynucleotide is a single-stranded polynucleotide,

(i) the first ODNP comprising a sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide and located 5' to the interrogation polynucleotide, and

(ii) the second ODNP comprising a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprising a sequence of one strand of a RERS.

The present invention also provides compositions useful in performing the methods of the present invention, and compositions generated according to the present invention that are useful in obtaining nucleotide sequence information about a target polynucleotide. For example, in one aspect the present invention provides a composition comprising a nicking endonuclease, a distributive DNA polymerase, and one or more deoxyribonucleoside triphosphate(s). In optional embodiments: the nicking endonuclease is N.BstNb I; and/or the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North. The composition may and preferably does include an inorganic salt, e.g., potassium chloride, sodium chloride, ammonium sulfate, magnesium sulfate, etc. The composition may also include Tris-HCl, and/or trehalose. These and other aspects of the present invention are described in further detail herein.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)

Figure 1a. The cycle of the synthesis and release of the amplified oligonucleotide is shown schematically. On the upper strand is indicated the recognition site for the enzyme N.BstNBI (5'-GAGTC-3') and the specific nicking site four bases downstream on this strand. The oligonucleotide produced is indicated in blue, the primer in green and the template in red. The lengths of the template and amplified oligo are shown in the upper left drawing.

Figure 1b. The results of a linear amplification reaction where the primer-template produces a 12-mer as the full length product. The primer-template was present at 1 μM in a 50 μl reaction and the yield of the reaction products is shown in the figure. The duplex used a top strand (ITAtop) of 16 nucleotides and a bottom strand (NBbt12) of 28 nucleotides that produced a 12-mer.

Figure 2a. The relative yields of 12-mer in a 30 minute reaction are shown as a function of different enzyme concentrations. Yield of a 12-mer from a duplex template as a function of polymerase concentration at 5 different nicking enzyme concentrations. The yield increases as nicking enzyme amounts increase; the optimum polymerase concentration is about 0.05 units per microliter. The nicking enzyme was N.BstNBI and the polymerase was Vent exo-, both from NEB. The sequences of the template and primer oligonucleotides were NBbt12 and ITAtop (as described in detail in Example 1) respectively.

Figure 2b. The yield of various products as a function of time for a higher concentration of nicking enzyme (0.8 units/μl) and a DNA polymerase concentration of 0.02 units/μl. Yield of all possible fragments from a template designed to produce a 16-mer oligonucleotide. The dotted lines represent linear, least squares fits to the data. All possible fragments were produced, albeit at low abundances compared to the full length product. The high concentration of nicking enzyme allowed linearity over about a 60 minute period. The sequences of the template and primer oligonucleotides were NBbt16 and ATAtop (as described in Example 1) respectively.

Figure 3a. Exponential amplification reactions. Diagram of the reaction scheme for the exponential amplification of oligonucleotides. The segments in red represent the sequence complement of the oligonucleotide sequence to be amplified, the signal sequence (shown in blue). The amplification template, τ, consists of two copies of the signal complement flanking the nicking enzyme recognition site shown as a light blue box, and a spacer sequence, shown as a green segment. The signal oligonucleotide (labeled σ ) is produced in the linear amplification cycle for each amplification template created. The labels on each structure in the figure correspond to the symbols used for their concentrations in the equations.

Figure 3b. This Figure shows mass spectrometry measurement results for an exponential reaction as described herein.

Figure 3c. This Figure shows a solution of the differential equations describing the mass-action kinetics of the reaction scheme shown in Figure 3a. The kinetic parameters used for this solution were: r = 0.4 sec⁻¹; a = 2×10⁻⁵; c = 2. The parameter c was chosen to give a reasonable fit to the data, though the curve is not very sensitive to this parameter. The initial (trigger) concentrations were chosen to match the curves in panel b. The curves for variables other than s correspond to the lower curve (green).

Figures 4, 5 and 6 present the extracted ion chromatograms for a sequence ladder generated according to the present invention. In Figure 4 is presented, from top to bottom, panels 1-5 (Figures 4a, 4b, 4c, 4d and 4e, respectively). In Figure 5 is presented, from top to bottom, panels 6-11 (Figures 5a, 5b, 5c, 5d, 5e and 5f, respectively). In Figure 6 is presented, from top to bottom, panels 12-17 (Figures 6a, 6b, 6c, 6d, 6e and 6f, respectively).

Figures 7, 8, 9, 10 and 11 provide information pertaining to a sequence experiment performed according to the present invention. Figure 7 shows the oligonucleotide products as a function of nucleotide length. Figures 8 and 9 provide the extracted ion chromatograms for a sequence ladder generated according to this experiment. In Figure 8 is presented, from top to bottom, panels 1-7 (Figures 8a, 8b, 8c, 8d, 8e, 8f and 8g, respectively), while Figure 9 presents, from top to bottom, panels 8-14 (Figures 9a, 9b, 9c, 9d, 9e, 9f and 9g, respectively). Figure 10 shows a spectrum of yield for the various oligomeric products as a function of time. Figure 11 shows the yield of 16-mer vs. nicking enzyme.

Figures 12a and 12b show the distribution of lengths of oligonucleotides produced in a linear amplification reaction using Nbtop/NBt20 duplex (see the Examples) and N.BstNB I and Vent exo- DNA polymerase. Incubation was at either 55°C (Figure 12a) or 60°C (Figure 12b) for 30 minutes. Figures 13 and 14 illustrate steps in preparing an oligonucleotide ladder according to the present invention. Figure 13 shows a duplex being formed in which nicking enzyme recognition sites (NERS) are present in the top strand and the bottom strand of the duplex. The nicking enzyme is then added and the duplex is nicked. The duplex falls apart at 60°C in the nicking buffer. As shown in Figure 14, the top and bottom strands (designated sequence A) are amplified.

DETAILED DESCRIPTION OF THE INVENTION

A. Overview

The present invention is generally directed to compositions and methods for obtaining base sequence information about a polynucleotide. The method of the present invention utilizes a linear amplification reaction that generates a ladder of oligonucleotides each differing by a single base. A ladder of oligonucleotides of known mass/charge ratios is produced, from which a nucleic acid sequence can be deduced that unambiguously identifies an organism or individual. The readout of this new sequencing assay is, for example, matrix-assisted-laser-desorption ionization (MALDI) or liquid chromatography time-of-flight (LC-TOF) mass spectrometry. More specifically, the present invention utilizes a nicking agent to nick double-stranded nucleic acids to generate an amplification template that has the following properties: 1) a recessed 3' hydroxyl group that can be extended by a DNA or RNA polymerase, and 2) a single-stranded template that extends 6 to 100 bases, preferably 6 to 24 nucleotides. The amplification template is generated by the presence of two nicking sites (or one nicking site and one double-stranded site) in a nucleic acid structure. The structure in the nucleic acid can be visualized as follows:

5' GAGTCNNNNNNNNNNNNGACTC 3'

3' CTCAGNNNNNNNNNNNNCTGAG 5'

where the number of Ns can be from about 12 to 24 nucleotides, the "GAGTC" in the top strand is the recognition site for the nicking enzyme N.BstNBI which reaches 4 nucleotides to the 3' end and makes a single strand nick (top strand) and the "GAGTC" in the bottom strand is the recognition site for the nicking enzyme N.BstNBI which reaches 4 nucleotides to the 3' end and makes a single strand nick (bottom strand). Other nicking agents having different recognition sequences may also be used in the present invention; N.BstNBI is merely exemplary. After the nicking step, which takes place at about 55°C to 65°C, the duplex dissociates and forms two amplification templates.

5' GAGTCNNNN

3' CTCAGNNNNNNNNNNNNNNNNNNNN

plus

NNNNNNNNNNNNNNNNNNNNGACTC 3'

NNNNCTGAG 5'
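The location of the nick relative to the recognition site can be sketched in a few lines of Python. The sketch below takes the recognition sequence (GAGTC) and the 4-nucleotide offset from the description above; the function names and the 12-N example strand are illustrative assumptions, and the code only models where the nick falls, not the enzymology.

```python
RECOGNITION = "GAGTC"  # N.BstNBI recognition sequence, as stated in the text
OFFSET = 4             # the nick falls 4 nucleotides 3' of the recognition site


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3' (N pairs with N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def nick_positions(strand):
    """Indices i such that a nick falls between strand[i-1] and strand[i].

    `strand` is written 5' to 3'; only sites on this strand are reported.
    """
    positions = []
    start = strand.find(RECOGNITION)
    while start != -1:
        cut = start + len(RECOGNITION) + OFFSET
        if cut < len(strand):  # ignore sites too close to the 3' end
            positions.append(cut)
        start = strand.find(RECOGNITION, start + 1)
    return positions


# Example duplex from the text, with 12 unspecified nucleotides (N).
top = "GAGTC" + "N" * 12 + "GACTC"
bottom = reverse_complement(top)  # bottom strand, also written 5' to 3'

if __name__ == "__main__":
    for name, strand in (("top", top), ("bottom", bottom)):
        for cut in nick_positions(strand):
            print(name, strand[:cut], "|", strand[cut:])
```

Each strand is scanned separately in its own 5' to 3' orientation, which mirrors the statement that the top-strand and bottom-strand sites are each nicked on the strand that carries the GAGTC sequence.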

In the presence of a polymerase, the recessed 3'-hydroxyl is filled in by the polymerase, the nicking enzyme then again cleaves the strand, the newly created strand immediately dissociates, and the cycle of nicking and filling continues, thereby generating a linear amplification of oligonucleotide ladders:

5' -NNNNNN-3'

5' -NNNNNNN-3'

5' -NNNNNNNN-3'

5' -NNNNNNNNN-3'

5' -NNNNNNNNNN-3'

5' -NNNNNNNNNNN-3'

5' -NNNNNNNNNNNN-3'

5' -NNNNNNNNNNNNN-3'

5'-NNNNNNNNNNNNNN-3'

5'-NNNNNNNNNNNNNNN-3'
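The ladder shown above can be mimicked with a short Python sketch. It assumes an idealized reaction in which extension stops after every possible number of added nucleotides, so that one product of each length is present; the function names and the 8-nucleotide example template are hypothetical and chosen only for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extension_ladder(template_overhang, min_len=1):
    """All extension products, shortest first, each written 5' to 3'.

    `template_overhang` is the single-stranded template region (5' to 3')
    that the polymerase copies starting from the nick.
    """
    full_product = reverse_complement(template_overhang)
    return [full_product[:n] for n in range(min_len, len(full_product) + 1)]


def read_three_prime_ends(fragments):
    """Align fragments by length and read the 3' terminal base of each one."""
    ordered = sorted(fragments, key=len)
    # The shortest fragment is taken whole; each longer one adds its last base.
    return ordered[0] + "".join(fragment[-1] for fragment in ordered[1:])


if __name__ == "__main__":
    ladder = extension_ladder("GCATTGAC")  # hypothetical template overhang
    for fragment in ladder:
        print("5'-" + fragment + "-3'")
    print(read_three_prime_ends(ladder))
```

Reading the 3' terminal nucleotide of each successively longer fragment recovers the complement of the template, which is the alignment step recited in the methods above.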

In brief, the present invention overcomes drawbacks of the present state of the art through the discovery of novel methods and kits for rapidly sequencing DNA. In some cases, the method does not require the use of probes or primers to initiate the reaction, but rather the creation of oligonucleotides from naturally occurring nicking enzyme recognition sites using nicking agents in the presence of a polymerase.

In one aspect, the present invention provides a composition comprising polynucleotide, a nicking agent and a polymerase. The inventive compositions have unique properties that render them particularly useful in a wide variety of methods. In another aspect, the present invention provides methods for determining the nucleotide sequence of an interrogation region of a target polynucleotide that comprises a particular unique area or an area of variability or hypervariability. In certain embodiments of the invention, the method of sequencing an interrogation region may be used for genotyping an organism. In other embodiments of the invention, the sequencing method may provide the nucleotide sequence of a genetic variation at a defined location in a target nucleic acid or detect the identity of a single nucleotide polymorphism (SNP).

B. Advantages

A desirable DNA sequencing procedure for use in a diagnostic environment would have the following characteristics: 1) it would be able to utilize a DNA-containing sample which had been subjected to only minimal pretreatment to make the DNA accessible for sequencing; 2) it would allow combining this sample with only a single reaction mixture, thus reducing risk of error and contamination, and increasing the ease with which the procedure can be automated; 3) it would require only a short amount of time to perform the sequence determination, thus decreasing the marginal costs in terms of equipment and labor for performing the test; and 4) it would not require any tags or labels. The sequencing compositions and methods disclosed here fulfill those requirements.

Other advantages of the sequencing method of the present invention include the following: it does not require a single-stranded template for the amplification step; it uses a single-stranded or double-stranded template equally well; its amplification process is isothermal; and it is able to read the sequence of shorter fragments than other methods.

C. Definitions

The following conventions and definition of terms as used herein may be helpful to an understanding of the detailed description of the invention. Additional definitions may be found throughout the description of the present invention.

The term "a" refers to one or more of the indicated items. For example, "a polynucleotide" refers to one or more polynucleotides; "a nicking agent" refers to one or more nicking agents; "a distributive polymerase" refers to one or more distributive polymerases; and "a mononucleotide" refers to one or more mononucleotides.

The terms "3'" and "5'" are used herein to describe the location of a particular site within a single strand of nucleic acid. When a location in a nucleic acid is "3' to" or "3' of" a reference nucleotide or a reference nucleotide sequence, the location is between the 3' terminus of the reference nucleotide or the reference nucleotide sequence and the 3' hydroxyl of that strand of nucleic acid. Similarly, when a location in a nucleic acid is "5' to" or "5' of" a reference nucleotide or a reference nucleotide sequence, the location is between the 5' terminus of the reference nucleotide or the reference nucleotide sequence and the 5' phosphate of that strand of nucleic acid. Further, when a subject nucleotide sequence is "directly 3' to" or "directly 3' of" a reference nucleotide or a reference nucleotide sequence, the subject nucleotide sequence is immediately next to the 3' terminus of the reference nucleotide or the reference nucleotide sequence. Similarly, when a subject nucleotide sequence is "directly 5' to" or "directly 5' of" a reference nucleotide or a reference nucleotide sequence, the subject nucleotide sequence is immediately next to the 5' terminus of the reference nucleotide or the reference nucleotide sequence.

"Complexity" in reference to a population of polynucleotides means the number of different species of molecule present in the population.

"Corresponding to": A nucleotide in one strand (the first strand) of a double-stranded polynucleotide that is located at a position "corresponding to" a position (i.e., a defined position) in the other strand (the second strand) of the double-stranded polynucleotide refers to the nucleotide in the first strand that hydrogen bonds with and is complementary to the nucleotide at the defined position in the second strand when the first and second strands hybridize to one another to form the double-stranded polynucleotide. Similarly, a position in one strand (the first strand) of a double-stranded polynucleotide "corresponding to" a nicking site within the other strand (the second strand) of the double-stranded polynucleotide refers to the position between the two nucleotides in the first strand that are complementary to the two nucleotides that flank the nicking site in the second strand.

A "deoxyribonucleoside triphosphate" (dNTP) is a molecule composed of a nitrogenous base, for example, adenine, guanine, cytosine, or thymine, attached to the five-carbon sugar deoxyribose, which also has three phosphate groups attached to it. To synthesize a DNA strand complementary to a target polynucleotide, each of the dNTPs must be available to the polymerase so that the complementary strand may be extended.

"Extending" a template polynucleotide refers to the polymerization of nucleoside triphosphates into a strand that forms a complement to the template.

"Mutations": Genetic variations may or may not have effects on gene expression, including expression levels and expression products (i.e., encoded peptides). Genetic variations that affect gene expression are also referred to as "mutations," including point mutations, frameshift mutations, regulatory mutations, nonsense mutations, and missense mutations. A "point mutation" refers to a mutation in which a wild type base (i.e., A, C, G, or T) is replaced with one of the other standard bases at a defined nucleotide locus within a nucleic acid sample. A "frameshift mutation" is caused by deletion or insertion of one or a few nucleotides that cause the reading frame(s) of a gene to be shifted, resulting in potential transcription and translation of a protein different than that coded by the wild type gene. A "regulatory mutation" refers to a mutation in a non-coding region, e.g., an intron or a 5'- or 3'-flanking region, that affects correct gene expression (e.g., amount of product, localization of protein, timing of expression). A "nonsense mutation" is a single nucleotide change that at the point of mutation results in a triplet codon which is read as a "STOP" codon, causing premature termination of peptide elongation, that is, a truncated peptide. A "missense mutation" is a mutation that results in one amino acid being exchanged for a different amino acid. The substituted amino acid may directly affect a biological function, for example, enzymatic activity, or may affect function by causing a change in structure, such as the folding (3-dimensional structure) of the peptide or its proper association with other peptides in a multimeric protein.

A "native nucleotide" refers to adenylic acid, guanylic acid, cytidylic acid or thymidylic acid. A "modified nucleotide" is a nucleotide other than a native nucleotide.
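The nonsense and missense classes of point mutation defined above can be illustrated with a minimal Python sketch based on the standard genetic code. The function name, the zero-based position argument, and the label "silent" (used here for a substitution that leaves the encoded amino acid unchanged) are assumptions introduced for this example and are not terms defined in the present description.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution within one sense codon.

    `position` is 0, 1 or 2 within the codon (an illustrative convention).
    """
    before = CODON_TABLE[codon]
    if before == "*":
        raise ValueError("expected a sense codon, got a stop codon")
    mutant = codon[:position] + new_base + codon[position + 1:]
    after = CODON_TABLE[mutant]
    if after == before:
        return "silent"    # encoded amino acid unchanged (label is an assumption)
    if after == "*":
        return "nonsense"  # new STOP codon, truncated peptide
    return "missense"      # one amino acid exchanged for another


if __name__ == "__main__":
    print(classify_point_mutation("TGG", 2, "A"))  # TGG (Trp) to TGA (stop)
    print(classify_point_mutation("GAG", 1, "T"))  # GAG (Glu) to GTG (Val)
```

Frameshift and regulatory mutations are outside the scope of this sketch, since they cannot be classified from a single codon.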

"Nicking" refers to the cleavage at a specific position of only one strand of a double-stranded polynucleotide or of a double-stranded portion of a partially double-stranded polynucleotide. The specific position where the polynucleotide is nicked is referred to as the "nicking site" (NS).

A "nicking agent" (NA) is an agent that recognizes a particular double-stranded sequence, the "recognition sequence," of a double-stranded polynucleotide or a partially double-stranded polynucleotide and cleaves only one strand of the double stranded polynucleotide at a specific position relative to the recognition sequence. Nicking agents include, but are not limited to, a nicking endonuclease (e.g., N.BstNB I) and a restriction endonuclease (e.g., Hinc II) when the double-stranded (or partially double-stranded) polynucleotide contains a hemimodified recognition/cleavage sequence in which one strand contains at least one derivatized nucleotide, thus preventing cleavage of that strand by the restriction endonuclease.

"Nicking Agent Recognition Sequence" (NARS): The particular nucleotide sequence of a double-stranded (or partially double-stranded) polynucleotide that a NA recognizes is referred to as the "nicking agent recognition sequence" (NARS). Similarly, the particular double-stranded sequence of a double-stranded (or partially double-stranded) polynucleotide that a NE recognizes is referred to as the "nicking endonuclease recognition sequence" (NERS), whereas the specific sequence that a RE recognizes is referred to as the "restriction endonuclease recognition sequence" (RERS). A "hemimodified RERS," as used herein, refers to a double-stranded RERS in which one strand of the recognition sequence contains at least one derivatized nucleotide (e.g., α-thio deoxynucleotide), which prevents a RE that recognizes the RERS from cleaving the strand containing the derivatized nucleotide.

A "nicking endonuclease" (NE) refers to an endonuclease that recognizes a particular double-stranded polynucleotide sequence (the "recognition sequence") of a double-stranded polynucleotide, or a particular double-stranded portion of a partially double-stranded polynucleotide, and that cleaves only one strand of the polynucleotide at a specific location relative to the recognition sequence. Unlike a restriction endonuclease (RE), which functions as a nicking agent only when its recognition sequence contains at least one derivatized nucleotide, a NE typically recognizes a double-stranded polynucleotide sequence composed of only native nucleotides and cleaves only one strand of the double-stranded (or partially double-stranded) polynucleotide. "Nucleoside" refers to the natural nucleosides, including 2' -deoxy and 2' -hydroxyl forms, e.g., as described in Komberg and Baker, DNA

Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g., described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman, Chemical Reviews 90:543-584, 1990, or the like, with the only proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce complexity, increase specificity, and the like.

"Oligonucleotide" refers to linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g., 3-4, to several tens of monomeric units. Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5' -3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually oligonucleotides of the invention comprise the four natural nucleotides; however, they may also comprise non- natural nucleotide analogs. The term oligonucleotide is often used herein to refer to the products produced by action of a nicking agent and a polymerase in combination with monomeric nucleotides and a target polynucleotide. In some embodiments of the present invention, an oligonucleotide is a single-stranded nucleic acid fragment that is at most 4, 5, 8, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length. The short length of such a fragment facilitates determination of the fragment nucleotide sequence by various techniques, including liquid chromatography and mass spectrometry, as described in detailed below.

"Perfectly matched" in the context of a double stranded nucleic acid molecule (duplex) means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one other such that every nucleotide in each strand undergoes Watson-Crick basepairing with a nucleotide in the other strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding. The terms "polymorphism" and "genetic variation" refer to the occurrence of two or more genetically determined alternative sequences or alleles in a small region (i.e., one to several (e.g., 2, 3, 4, 5, 6, 7, or 8) nucleotides in length) in a population. The allelic form occurring most frequently in a selected population is referred to as the wild type form. Other allelic forms are designated as variant forms. Diploid organisms may be homozygous or heterozygous for allelic forms.

"Polynucleotide" or "polynucleotide molecule" refers to any of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Polynucleotides can be composed of two or more monomers that are naturally-occurring nucleotides (such as deoxyribonucleotides and ribonucleotides), or analogs of naturally-occurring nucleotides (e.g., α- enantiomeric forms of naturally-occurring nucleotides or base analogs), or a combination of both. Modified nucleotides can have modifications in sugar moieties and/or in pyrimidine or purine base moieties. Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups, or sugars can be functionalized as ethers or esters. Moreover, the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs. Examples of modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes. Nucleotides can be linked by phosphodiester bonds or analogs of such linkages. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like. The term "polynucleotide" also includes oligonucleotides (i.e., polynucleotides having less than 100 nucleotides) incorporating one or more specificity spacers (as defined herein) where abasic residue and base analog residues are exemplary specificity spacers. The term "polynucleotide" also includes so-called "peptide nucleic acids," which comprise naturally-occurring or modified nucleic acid bases attached to a polyamide backbone. Polynucleotides can be either single stranded or double stranded. 
In a preferred aspect of the invention, the polynucleotide is single-stranded. "Sequence determination" or "determining a nucleotide sequence" in reference to polynucleotides refers to the determination of partial as well as full sequence information of the polynucleotide. That is, the term includes sequence comparisons, fingerprinting, and like levels of information about a target polynucleotide, as well as the express identification and ordering of nucleosides, usually each nucleoside, in a target polynucleotide. The term also includes the determination of the identification, ordering, and locations of one, two, or three of the four types of nucleotides within a target polynucleotide. For example, in some embodiments sequence determination may be effected by identifying the ordering and locations of a single type of nucleotide, e.g., cytosines, within the target polynucleotide "CATCGC . . . " so that its sequence is represented as a binary code, e.g., "100101 . . . " for "C-(not C)-(not C)-C- (not C)-C . . . " and the like. Sequence determination is an example of obtaining base sequence information from the polynucleotide.
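The binary representation described above can be illustrated with a short sketch. The function name `encode_single_base` is hypothetical and is used only for illustration; the example input and expected code are the ones given in the text.

```python
def encode_single_base(sequence: str, base: str = "C") -> str:
    """Mark each position of `sequence` with 1 if it is `base`, else 0."""
    return "".join("1" if nt == base else "0" for nt in sequence.upper())


# Example from the text: cytosines within "CATCGC"
print(encode_single_base("CATCGC", "C"))  # 100101
```

Running the same function once for each of one, two, or three nucleotide types would give the partial levels of sequence information that the definition includes.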

Sense and Antisense Strands: The asymmetric nucleotide sequence of a NARS that is present in the strand containing the NS that is nickable by (i.e., susceptible to the nuclease activity of) the NA that recognizes the NARS is referred to as "the sequence of the sense strand of the NARS" and is present in the "sense strand of the NARS." The nucleotide sequence of the NARS in the strand that does not contain the NS is referred to as "the sequence of the antisense strand of the NARS" and is present in the "antisense strand of the NARS." Similarly, the nucleotide sequence of a NERS that is present in the strand containing the NS nickable by the NE that recognizes the NERS is referred to as "the sequence of the sense strand of the NERS" and is located in the "sense strand of the NERS." The nucleotide sequence of the NERS in the strand that does not contain the NS is referred to as "the sequence of the antisense strand of the NERS" and is located in the "antisense strand of the NERS." For example, the recognition sequence and the nicking site of an exemplary nicking endonuclease, N.BstNB I, are shown below, with "♦" to indicate the nicking site.

5'-GAGTCNNNN♦N-3'

3'-CTCAGNNNNN-5'

Thus, the sequence of the sense strand of the N.BstNB I recognition sequence is 5'-GAGTC-3', whereas that of the antisense strand is 5'-GACTC-3'. The nucleotide sequence of a hemimodified RERS that is present in the strand containing the NS nickable by the RE that recognizes the hemimodified RERS (i.e., the strand that does not contain any derivatized nucleotides within the RERS) is referred to as "the sequence of the sense strand of the hemimodified RERS" and is present in the "sense strand of the hemimodified RERS." The nucleotide sequence of the hemimodified RERS in the strand that does not contain the NS (i.e., the strand that contains derivatized nucleotide(s) in the RERS) is referred to as "the sequence of the antisense strand of the hemimodified RERS" and is present in the "antisense strand of the hemimodified RERS."
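The relationship between the sense and antisense strand sequences can be checked with a minimal sketch. The helper names `reverse_complement` and `sense_strand` are hypothetical, and the sketch assumes that strands are given as plain 5'->3' strings over A, C, G, T, and N.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sense_strand(top: str, sense_site: str = "GAGTC"):
    """Report which strand of a duplex carries the sense-strand sequence.

    `top` is one strand written 5'->3'; the other strand is taken to be its
    exact reverse complement. Returns "top", "bottom", "both", or None.
    """
    in_top = sense_site in top.upper()
    in_bottom = sense_site in reverse_complement(top)
    if in_top and in_bottom:
        return "both"
    if in_top:
        return "top"
    if in_bottom:
        return "bottom"
    return None


print(reverse_complement("GAGTC"))     # GACTC, the antisense strand sequence
print(sense_strand("AAGAGTCNNNNNTT"))  # top
print(sense_strand("AAGACTCTT"))       # bottom
```

Because 5'-GAGTC-3' is not its own reverse complement, the recognition sequence is asymmetric, which is why the two strands can be told apart.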

D. Assay Steps

In one aspect, the present invention provides a method for obtaining base sequence information from a polynucleotide. The method includes the following actions:

a) providing a polynucleotide comprising a nicking agent recognition sequence;

b) combining the polynucleotide of a) with: i) a nicking agent that recognizes the recognition sequence; ii) a distributive polymerase; and iii) a mononucleotide; under conditions that form a plurality of oligonucleotides, where members of the plurality differ by the number of nucleotides in the oligonucleotides; and

c) characterizing the plurality of oligonucleotides and thereby obtaining base sequence information about the polynucleotide.
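One way the characterizing action might work, when the read-out is by mass, is sketched below. This is an illustration under stated assumptions, not the method of the invention: the residue masses are approximate average values for the four deoxynucleotide residues and are not taken from this document, the tolerance is arbitrary, and the names `call_base`, `read_ladder`, and `simulate_ladder` are hypothetical. The idea is that consecutive members of the plurality differ by one nucleotide, so each mass difference identifies one base.

```python
# Approximate average residue masses (Da); assumed values, not from the source.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def call_base(delta: float, tol: float = 0.5) -> str:
    """Return the base whose residue mass matches `delta`, or "?" if none does."""
    matches = [b for b, m in RESIDUE_MASS.items() if abs(delta - m) <= tol]
    return matches[0] if len(matches) == 1 else "?"


def read_ladder(masses, tol: float = 0.5) -> str:
    """Read bases from the differences between successive ladder masses."""
    ordered = sorted(masses)
    return "".join(call_base(b - a, tol) for a, b in zip(ordered, ordered[1:]))


def simulate_ladder(seq: str):
    """Build idealized masses for fragments of length 1..len(seq)."""
    total, masses = 0.0, []
    for nt in seq:
        total += RESIDUE_MASS[nt]
        masses.append(round(total, 2))
    return masses


ladder = simulate_ladder("CCAGTCGTAGG")
print(read_ladder(ladder))  # bases 2..11 of the simulated fragment
```

Any constant contribution from the fragment ends cancels in the differences, which is why the sketch omits it; the first base would have to be called from the absolute mass of the shortest fragment.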

In another aspect, the present invention provides a method for determining the nucleotide sequence of an interrogation sequence located within a target polynucleotide. The method comprises:

(b) contacting the double-stranded template polynucleotide comprising the target polynucleotide with the NA;

(c) nicking the template at the NS to provide a new 3' terminus at the NS;

(d) extending the nicked template of step (c) from the new 3' terminus at the NS;

(e) amplifying two or more single-stranded oligonucleotide fragments by repeating steps (c) and (d) to provide a plurality of amplified fragments, wherein member fragments of the plurality are extended by a differing number of nucleotides; and

E. Target Nucleic Acids

The target nucleic acid of the present invention that comprises an interrogation fragment (also sometimes equivalently referred to herein as the interrogation sequence) of interest may include any nucleic acid molecule. The target nucleic acid may be single-stranded or double-stranded DNA. Single-stranded DNA may be derived from any double-stranded DNA or may exist as a single-stranded DNA. Other DNA target nucleic acids may include, but are not limited to, genomic DNA, mitochondrial DNA, viral DNA, and cDNA. The target nucleic acid may be RNA, including mRNA, ribosomal RNA, tRNA, and viral RNA.

The target nucleic acid may be naturally occurring or derived from a naturally occurring nucleic acid. A naturally occurring target nucleic acid may be obtained from a biological sample. Biological samples may be provided by obtaining a blood sample, biopsy specimen, tissue explant, organ culture, biological fluid, or any other tissue or cell preparation from a subject or a biological source. The subject or biological source may be a human or non-human animal, a plant, a primary cell culture, or a culture adapted cell line including but not limited to genetically engineered cell lines that may contain chromosomally integrated or episomal recombinant nucleic acid sequences, immortalized or immortalizable cell lines, somatic cell hybrid cell lines, differentiated or differentiatable cell lines, transformed cell lines and the like. In certain preferred embodiments of the invention, the subject or biological source may be suspected of having or being at risk for having a disease or disorder as provided herein, and in certain other preferred embodiments of the invention the subject or biological source may be known to be free of a risk or presence of such disease. In certain embodiments the target nucleic acid originates from an organism that is a pathogen or opportunistic pathogen, such as a bacterium, virus, parasite, or fungus. Methods for isolating populations of nucleic acids from biological samples are well known and readily available to those skilled in the art of the present invention. Exemplary techniques are described, for example, in the following laboratory research manuals: Sambrook et al., Molecular Cloning (Cold Spring Harbor Press, 3rd Edition, 2001) and Ausubel et al., Short Protocols in Molecular Biology (1999) (incorporated herein by reference in their entireties). Nucleic acid isolation kits are also commercially available from numerous companies and may be used to simplify and accelerate the isolation process.

A target nucleic acid may be synthetic, that is, prepared by human action, or may be a combination of a synthetic and naturally occurring nucleic acid. Many companies manufacture and sell synthetic nucleic acids that may be useful as the target nucleic acid or a portion thereof, or as controls for sequencing reactions, or as oligonucleotide primers in the present invention. See, e.g., Applied Bio Products Bionexus (www.bionexus.net); Commonwealth Biotechnologies, Inc. (Richmond, VA; www.cbi-biotech.com); Gemini Biotech (Alachua, Florida; www.geminibio.com); INTERACTIVA Biotechnologie GmbH (Ulm, Germany; www.interactiva.de); Microsynth (Balgach, Switzerland; www.microsynth.ch); Midland Certified Reagent Company (Midland, TX; www.mcrc.com); Oligos Etc. (Wilsonville, OR; www.oligosetc.com); Operon Technologies, Inc. (Alameda, CA; www.operon.com); Scandinavian Gene Synthesis AB (Köping, Sweden; www.sgs.dna); Sigma-Genosys (The Woodlands, Texas; www.genosys.com); Synthetic Genetics (San Diego, CA; www.syntheticgenetics.com, owned by Epoch Biosciences, Inc. (Bothell, WA; www.epochbio.com)); and many others.

In certain embodiments, a synthetic target nucleic acid may be prepared using an amplification reaction, for example, the polymerase chain reaction (PCR). (See supra Mullis et al., Erlich, Mullis, Saiki et al., and Higuchi). Alternatively, the synthetic nucleic acid target may be prepared using recombinant DNA techniques to produce the synthetic nucleic acid in one or more prokaryotic or eukaryotic systems such as, e.g., E. coli, yeast, Drosophila, baculovirus/insect cells, or mammalian tissue culture cell lines.

The target nucleic acid molecule may, and typically will, contain one or more of the natural bases present in nucleotides, that is, adenine (A), guanine (G), cytosine (C), thymine (T) or, in the case of an RNA, uracil (U). In addition, and particularly when the nucleic acid is a synthetic molecule, the target nucleic acid may include "unnatural" nucleotides. Unnatural nucleotides are chemical moieties that can be substituted for one or more natural nucleotides in a nucleotide chain without causing the nucleic acid to lose its ability to serve as a template for a primer extension reaction. The substitution may include sugar or phosphate substitutions or both, in addition to base substitutions. Such moieties are very well known in the art, and are known by a large number of names including, for example, abasic nucleotides, which do not contain a commonly recognized nucleotide base, such as adenine, guanine, cytosine, uracil or thymine (see, e.g., Takeshita et al., J. Biol. Chem. 262:10171-179, 1987; Iyer et al., Nucleic Acids Res. 78:2855-59, 1990; and U.S. Patent No. 6,117,657); base or nucleotide analogs (see, e.g., Ma et al., Nucleic Acids Res. 27:2585, 1993). Some bases are known as universal mismatch base analogs, such as the abasic 3-nitropyrrole; convertides (see, e.g., Hoops et al., Nucleic Acids Res. 25:4866-71, 1997); modified nucleotides (see, e.g., Millican et al., Nucleic Acids Res. 72:7435-53, 1984); nucleotide mimetics; nucleic acid related compounds; spacers (see, e.g., Nielsen et al., Science 254:1497-1500, 1991); and specificity spacers (see, e.g., PCT International Publication No. WO 98/13527). Additional examples of non-natural nucleotides are set forth in Augustyns, K. A. et al., Nucleic Acids Res. 79:2587-93, 1991; Jaschke et al., Tetrahedron Lett. 34:301, 1993; Seela and Kaiser, Nucleic Acids Res. 75:3113, 1990, and Nucleic Acids Res. 78:6353, 1990; Usman et al., PCT International Patent Application No. PCT/US 93/00833; Eckstein, PCT International Patent Application No. PCT/EP91/01811; Sproat et al., U.S. Patent No. 5,334,711; Buhr et al., PCT International Publication No. WO 91/06556; and U.S. Patent Nos. 5,959,099 and 5,840,876.

When either the target nucleic acid molecule or the primer used in the present method contains a non-natural nucleotide, a base-pair mismatch will occur between the target and the primer. The term "base-pair mismatch" refers to all single and multiple nucleotide substitutions that perturb the hydrogen bonding between conventional base pairs, e.g., G:C, A:T, or A:U, by substitution of a nucleotide with a moiety that does not hybridize according to the standard Watson-Crick model to a corresponding nucleotide on the opposite strand of the oligonucleotide duplex. Such base-pair mismatches include, e.g., G:G, G:T, G:A, G:U, C:C, C:A, C:T, C:U, T:T, T:U, U:U and A:A. Also included within the definition of base-pair mismatches are single or multiple nucleotide deletions or insertions that perturb the normal hydrogen bonding of a perfectly base-paired duplex. In addition, base-pair mismatches arise when one or both of the nucleotides in a base pair has undergone a covalent modification (e.g., methylation of a base) that disrupts the normal hydrogen bonding between the bases. Base-pair mismatches also include non-covalent modifications, for example, mismatches resulting from incorporation of intercalating agents such as ethidium bromide and the like that perturb hydrogen bonding by altering the helicity or base stacking of a nucleic acid duplex.
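The substitution type of base-pair mismatch can be located in an aligned duplex with a short sketch. It covers only base substitutions among A, C, G, T, and U; deletions, insertions, covalent modifications, and intercalators are outside its scope. The function name `find_mismatches` and the input convention (first strand written 5'->3', opposite strand written 3'->5' and aligned position by position) are assumptions made for illustration.

```python
# Conventional base pairs named in the text: G:C, A:T, and A:U.
WATSON_CRICK = {
    ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
}


def find_mismatches(strand_5to3: str, opposite_3to5: str):
    """Return (position, "X:Y") for every pair that is not a conventional pair."""
    if len(strand_5to3) != len(opposite_3to5):
        raise ValueError("strands must be aligned and of equal length")
    pairs = zip(strand_5to3.upper(), opposite_3to5.upper())
    return [
        (i, f"{a}:{b}")
        for i, (a, b) in enumerate(pairs)
        if (a, b) not in WATSON_CRICK
    ]


print(find_mismatches("GAGTC", "CTCAG"))  # [] for a perfectly matched duplex
print(find_mismatches("GAGTC", "CTTAG"))  # [(2, 'G:T')]
```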

The target, in addition to optionally containing known nucleotides or analogs thereof, may also comprise one or more natural bases of unknown identity (i.e., potential genetic variations). The present invention provides compositions and methods whereby the unknown nucleotide(s) are identified by sequencing, and thereby the unknown sequence of a genetic variation can be identified. The base or bases of unknown identity are present within the interrogation sequence, which may also be referred to herein as the "nucleotide locus" or the "defined position" or the "defined location", all of which refer to a specific nucleotide or region encompassing one or more nucleotides that has a precise location on a target nucleic acid.

F. Providing a polynucleotide comprising a nicking agent recognition sequence

As indicated elsewhere herein, a nicking agent recognition sequence is a double-stranded sequence of nucleotides that is recognized by a nicking agent. When the NARS is part of a template nucleic acid, the nicking agent will, under the correct conditions, nick (i.e., cleave one strand of) the duplex template nucleic acid at a defined position relative to the NARS. That is, the nicking agent does not randomly nick the duplex template, but instead preferentially nicks the template nucleic acid at a specific location relative to the NARS.

The polynucleotide comprising a NARS will include a target polynucleotide. The polynucleotide is a single-stranded molecule that contains a base sequence of interest, where this particular base sequence is sometimes referred to herein as the interrogation sequence or the interrogation fragment. According to the present invention, the target polynucleotide must anneal to another polynucleotide in order to form a duplex. This duplex, which will contain a NARS, is referred to herein as the template nucleic acid.

Thus, in order to perform the assay of the present invention, as an initial matter, a target polynucleotide must be present in aqueous solution as a template nucleic acid, where the template nucleic acid contains the NARS. The template nucleic acid may be provided in many different ways. For example, it may conveniently occur that the sample of interest is already at least partially double-stranded and contains a nicking agent recognition sequence. Alternatively, the sample of interest may be double stranded but not contain a NARS; however, upon extension of one of the strands of the double stranded molecule, a NARS may be formed. If the nucleotide sequence of the target polynucleotide is partially known, then it is a simple matter to convert the target polynucleotide into a template nucleic acid having a NARS. This conversion is readily accomplished by various means, depending on the sequence of the target polynucleotide. For example, if the target polynucleotide contains the essential bases for nicking agent recognition, and only lacks having those bases in double-stranded form, then a second polynucleotide may be obtained (e.g., synthesized) and added to the target polynucleotide, where the second polynucleotide should be designed so that it has a base sequence which will anneal to the target polynucleotide at a location that "covers" the essential bases and thus converts those essential bases to a double-stranded form (and thus into a form which will interact with the nicking agent and allow nicking of the template to occur). Alternatively, if the target polynucleotide has the essential bases for nicking agent recognition, a second polynucleotide may be added to the target polynucleotide which anneals upstream of the essential bases, and then extension of the second polynucleotide by polymerase will generate the nicking agent recognition sequence.

As yet another possibility, if the target polynucleotide does not contain the bases that are essential for recognition by a nicking agent, but some base sequence information about the target polynucleotide is known, then a second polynucleotide can be designed that both (1) has base sequence(s) that will anneal to the target nucleic acid and (2) includes the essential bases for nicking agent recognition. In this case, the template nucleic acid will certainly contain one or more mismatched bases within the NARS. However, when nicking enzymes are used in the assay, those nicking enzymes will recognize the NARS even though there are base mismatches in the NARS, so long as the essential bases are present in one of the strands of the NARS. This aspect of the present invention is extremely important because it provides enormous flexibility in locating the NARS. Further discussion about mismatches in a duplex NARS is provided below. However, for present purposes it may be noted that the NARS can be located relative to any sequence present in the target polynucleotide simply by designing a second polynucleotide that includes both (1) the essential bases of a NARS and (2) the bases which will cause the second polynucleotide to anneal to the target polynucleotide at a desired location of the target polynucleotide.
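The design idea in this paragraph can be sketched as follows. This is an illustration only, under several assumptions that the text does not specify: the second polynucleotide is modeled as the perfect complement of a known stretch of the target with the essential bases written over a chosen window, the essential bases are taken to be 5'-GAGTC-3', and the names `reverse_complement` and `design_second_polynucleotide` are hypothetical. The mismatch count shows how many of the essential bases would be unpaired in the resulting NARS.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_second_polynucleotide(target: str, site_start: int, site: str = "GAGTC"):
    """Place `site` at `site_start` within the perfect complement of `target`.

    Returns (designed_5to3, mismatches), where `mismatches` counts positions
    at which the designed strand differs from the perfect complement.
    """
    perfect = reverse_complement(target)
    end = site_start + len(site)
    if site_start < 0 or end > len(perfect):
        raise ValueError("site does not fit within the annealing region")
    designed = perfect[:site_start] + site + perfect[end:]
    mismatches = sum(1 for a, b in zip(designed, perfect) if a != b)
    return designed, mismatches


# A made-up target region that lacks the essential bases.
print(design_second_polynucleotide("AAAAACCCCCTTTTT", 5))
# ('AAAAAGAGTCTTTTT', 3)
```

In this toy case, three of the five essential bases face non-complementary bases in the target, and the flanking stretches on either side supply the annealing sequence.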

It will sometimes happen that little or no base sequence information is known about the target polynucleotide. In this case, one can design an adapter polynucleotide which can be ligated to either the 3' or 5' end of the target polynucleotide. Methods to ligate one polynucleotide to the end of another polynucleotide are well known in the art, and may be utilized in the context of the present invention. The adapter will have a known base sequence, and particularly in those cases where the adapter is a synthesized polynucleotide, it will have a completely known and designed base sequence. In this case, the adapter will have a known base sequence, which may or may not include the essential bases of a NARS, or the complement thereof, and the ligation product of the adapter and the target polynucleotide can be treated as described above as far as forming a double stranded NARS which will be recognized by a nicking agent and allow that nicking agent to nick the template nucleic acid. In this way the sequence of the first 15 or so bases of the target polynucleotide can be determined. After this determination, some base sequence information about the target polynucleotide becomes known, and the methods discussed above can be applied to the target polynucleotide. Essentially, it is possible according to the methods of the present invention to perform an iterative sequencing procedure, where the sequence of the first 15 or so bases of a target polynucleotide is determined, and from this information a second polynucleotide is designed which will anneal to one or more of those 15 bases and thereby create a NARS which, in conjunction with a nicking agent and a polymerase, allows the identification of the next 15 or so bases. If desired, this process can be repeated iteratively in order to determine the entire base sequence of the target polynucleotide.
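The repeated read-and-redesign procedure has the shape of a simple loop, sketched below. Each round stands in for a wet-lab experiment; here it is replaced by a simulated reader over a made-up hidden sequence. The names `iterative_sequencing` and `make_simulated_reader`, the fixed window of 15 bases, and the stopping rule are all assumptions for illustration.

```python
def iterative_sequencing(read_next, window: int = 15, max_rounds: int = 100) -> str:
    """Accumulate sequence by repeatedly reading the next `window` bases.

    `read_next(known, window)` represents one round: a second polynucleotide is
    designed from `known`, and the assay returns the bases that follow it.
    """
    known = ""
    for _ in range(max_rounds):
        new_bases = read_next(known, window)
        if not new_bases:  # nothing further could be read
            break
        known += new_bases
    return known


def make_simulated_reader(hidden: str):
    """Return a stand-in for the assay that reveals `hidden` window by window."""
    def read_next(known: str, window: int) -> str:
        return hidden[len(known):len(known) + window]
    return read_next


hidden_target = "ACGTTGCAAGGCTTAACCGGATCGATCGTTAGCAATGCCA"  # made-up example
result = iterative_sequencing(make_simulated_reader(hidden_target))
print(result == hidden_target)  # True
```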

As another option, the double-stranded template nucleic acid may be provided by first forming a mixture of a first oligonucleotide primer (ODNP), a second ODNP, and a target nucleic acid that contains an interrogation sequence, where the interrogation sequence is the portion of the target nucleic acid to be sequenced. The target nucleic acid may be double-stranded or single-stranded. When the target nucleic acid is double stranded, the first ODNP comprises a nucleotide sequence of a sense strand of a NERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of one strand of the target nucleic acid located 3' to the complement of the interrogation fragment. The second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence 3' to the interrogation fragment located on the other strand of the target nucleic acid, and optionally comprises a nucleotide sequence of one strand of a restriction endonuclease recognition sequence (RERS). When the target nucleic acid is single-stranded, the first ODNP comprises a nucleotide sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to the target nucleic acid located 5' to the interrogation sequence; the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target nucleic acid located 3' to the interrogation sequence and optionally comprises a nucleotide sequence of one strand of a RERS.

In this case, the extension of the first and the second ODNPs using the target nucleic acid as template produces an extension product having a NERS and optionally having a RERS where the interrogation sequence is located between the NERS and RERS. This extension product is the desired double-stranded template nucleic acid that comprises an interrogation sequence of a target nucleic acid and a NS located 5' to the interrogation sequence. The extension product, if it contains a RERS, may be cleaved with a restriction endonuclease that recognizes the RERS. The digested extension product then may be used as template for amplifying at least two single-stranded nucleic acid molecules of differing lengths in the presence of a NE that recognizes the NERS and a DNA polymerase that lacks 5'->3' exonuclease activity. Otherwise, the extension product may be directly used as a template for amplifying a single-stranded nucleic acid molecule.

In another embodiment of the invention, a method for sequencing an interrogation sequence comprises nicking one strand of a double-stranded template with a restriction endonuclease. The double-stranded template nucleic acid may be provided by first forming a mixture of a first ODNP, a second ODNP, and a single-stranded or double-stranded target nucleic acid. In the instance when the target nucleic acid is double-stranded, the first ODNP comprises a sequence of one strand of a first restriction endonuclease recognition sequence (RERS) and a nucleotide sequence at least substantially complementary to a nucleotide sequence of one strand of the target nucleic acid located 3' to the complement of the interrogation fragment, while the second ODNP comprises a sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target nucleic acid located 3' to the interrogation fragment. In the instance when the target nucleic acid is single-stranded, the first ODNP comprises a sequence of one strand of a first RERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target nucleic acid located 5' to the complement of the interrogation fragment; the second ODNP comprises a sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target nucleic acid located 3' to the interrogation fragment. The extension of the first and second ODNPs in the presence of deoxyribonucleoside triphosphates and at least one modified deoxyribonucleoside triphosphate produces a double-stranded template having both the first and the second hemimodified RERSs. The double-stranded template is then digested with a RE that recognizes the second RERS. The digested extension product then may be used as a template for amplifying at least two single-stranded nucleic acid molecules of differing lengths in the presence of a RE that recognizes the first RERS and a DNA polymerase that lacks 5'->3' exonuclease activity.

G. Base-pair mismatches in the nicking agent recognition sequence

One method to create sequencing ladders allows the amplification of an oligonucleotide at almost any site in a polynucleotide. This method makes use of the unusual ability of the nicking enzyme N.BstNB I to recognize a mismatched recognition sequence in a duplex. That is, the 5'-GAGTC-3' sequence does not need to be entirely base-paired for the nicking enzyme to bind and nick a duplex. The nicking enzyme has no proclivity to cleave single-stranded oligonucleotides containing the recognition site 5'-GAGTC-3'. The ability of N.BstNB I to bind to a mismatched recognition sequence was tested by constructing a series of oligonucleotide duplexes in which only the recognition site was mismatched to various extents and then amplifying oligonucleotides in a linear reaction (see methods). The duplex forms a primer-template for linear amplification when cleaved by the nicking enzyme:

3'-GG ATG CTG ACC TTG TCT GAG TGG ATG CTG ACC T-5'
5'-CC TAC GAC TGG AAC AGA CTC ACC TAC GAC TGG A-3'

where the fragment 3' -GG ATG CTG ACC-5' is generated in the amplification reaction. Duplexes containing 1-5 mismatched were tested and the read-out was by mass spectrometry using LC-TOF (see Examples). This observation has been coupled with the concept of using a distributive enzyme to create triggers almost anywhere in an organism's genome. Again, it is important to note that when Vent exo- or 9°-North polymerase, which are both distributive enzymes, are used in the linear amplification reaction, that a sequencing ladder appears in the chromatography and mass spectrum when analyzed by LC-TOF. To illustrate the point a primer-template was used to generate oligonucleotides from a 20-mer template. The distribution of products was characterized, and it was observed that very few full-length products (20-mers) were formed. 20- mers are relatively stable under the isothermal (60°C) conditions and so not readily dissociate after the duplex is nicked. It has been reported that for Vent exo- the average chain length generated per initiation was about 7 nucleotides for Vent exo-. It may be the case that at 60°C the average chain length is 6 nucleotides. The notable point however is that the sequencing ladder can be produced on a template nucleic acid and that a well-defined 5' -terminus on the template does not need to be present. For the nicking endonuclease N.BstNb I, the essential bases for recognition purposes are the base sequence 5' -GAGTC-3' . These bases are preferably part of a longer oligonucleotide, where this longer oligonucleotide is annealed to the target polynucleotide. Preferably, these bases (5' -GAGTC-3' ) are across from other bases in the duplex. In other words, in the duplex, it is preferred that none of the bases "loop out" from the duplex and thus do not have a matching base in the template nucleic acid. 
However, all five of these essential bases, i.e., the bases that are essential in order for the nicking agent to recognize the NARS, may be mismatched, or only four, or only three, or only two, or only one of the essential bases may be mismatched.
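The relationship between the recognition sequence, its pairing partner, and the amplified fragment in the example duplex above can be illustrated with a minimal sketch. The helper names are hypothetical, both strands are assumed to be of equal length and fully aligned, and the nick is assumed to fall four nucleotides 3' of 5'-GAGTC-3', an assumption that is consistent with the fragment stated above but is not itself recited in this section.

```python
# Illustrative sketch only; names and the nick offset are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RECOGNITION = "GAGTC"   # essential bases named in the text
NICK_OFFSET = 4         # assumed: nick falls 4 nt 3' of the recognition sequence

# Strands of the example duplex, both stored 5'->3'.
TOP = "GGATGCTGACCTTGTCTGAGTGGATGCTGACCT"[::-1]  # shown 3'->5' in the text
BOTTOM = "CCTACGACTGGAACAGACTCACCTACGACTGGA"


def recognition_mismatches(nick_strand, other_strand):
    """Count positions of GAGTC that are not Watson-Crick paired.

    Both strands are 5'->3' and antiparallel, so base i of one strand
    faces base (length - 1 - i) of the other.
    """
    start = nick_strand.find(RECOGNITION)
    if start < 0:
        raise ValueError("recognition sequence not found")
    last = len(nick_strand) - 1
    return sum(
        COMPLEMENT[nick_strand[i]] != other_strand[last - i]
        for i in range(start, start + len(RECOGNITION))
    )


def displaced_fragment(nick_strand):
    """Return the portion of the nicked strand 3' of the assumed nick site."""
    start = nick_strand.find(RECOGNITION)
    if start < 0:
        raise ValueError("recognition sequence not found")
    nick = start + len(RECOGNITION) + NICK_OFFSET
    return nick_strand[nick:]


if __name__ == "__main__":
    print(recognition_mismatches(TOP, BOTTOM))
    print(displaced_fragment(TOP))
```

Under these assumptions the fragment returned for the example duplex is 5'-CCAGTCGTAGG-3', which is the same molecule as 3'-GG ATG CTG ACC-5'. Substituting bases in the lower strand opposite the recognition sequence raises the mismatch count without moving the assumed nick site.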

H. Using primers to prepare a double-stranded template nucleic acid

As noted above, a single-stranded nucleic acid molecule containing an interrogation sequence from a target nucleic acid may be amplified in the presence of a nicking agent (e.g., a NE or a RE) and a DNA polymerase. For the amplification of the single-stranded nucleic acid molecule using a NE as the nicking agent, the template nucleic acid for the amplification must contain a NERS (sometimes referred to as the first NERS). In addition, the template nucleic acid may also contain a RERS or a second NERS. The presence of either the RERS or the second NERS in addition to the first NERS allows for a cleavage in the strand of the template that does not contain the NS, near the site corresponding to the nicking site produced by a NE that binds to the first NERS. Such a cleavage allows the single-stranded nucleic acid molecule amplified from the template to be relatively short, which facilitates the characterization of the interrogation fragment incorporated into the single-stranded nucleic acid molecule. For the amplification of the single-stranded nucleic acid molecule that uses a RE as the nicking agent, the template nucleic acid for the amplification must contain a hemimodified RERS and may contain an additional RERS. If two hemimodified RERSs are present, they may or may not be the same. Similar to the template that contains two NERSs, the presence of the second hemimodified RERS also allows the amplified single-stranded nucleic acid molecule to be relatively short.

The template nucleic acid of the present invention may be provided by amplifying a fragment of a target nucleic acid containing an interrogation fragment using specifically designed ODNPs. The term "oligonucleotide primer" (ODNP) refers to any polymer having two or more nucleotides used in a hybridization, extension, and/or amplification reaction. The ODNP may be comprised of deoxyribonucleotides, ribonucleotides, or an analog of either. As used herein for hybridization, extension, and amplification reactions, ODNPs are generally between 8 and 200 nucleotides in length. More preferred are ODNPs of between 12 and 50 nucleotides in length and still more preferred are ODNPs of between 18 and 32 nucleotides in length.

In one embodiment, the present invention provides an ODNP pair (referred to as "the first ODNP pair") useful for producing a template nucleic acid containing a NERS, and optionally containing an RERS, where the interrogation fragment is located between the NERS and the optionally present RERS. For convenience, a double-stranded target nucleic acid is used in the following description of the ODNP pair. One ODNP of the first ODNP pair (referred to as "the first ODNP") comprises (1) a sequence of a sense strand of a NERS and (2) a nucleotide sequence at least substantially complementary to a nucleotide sequence of the strand of the target nucleic acid that contains the complement nucleotide(s) of the interrogation fragment. The nucleotide sequence of the target to which a portion of the first ODNP is complementary is located 3' to the complement nucleotide(s) of the genetic variation. This design allows the extension product of the first ODNP to incorporate the interrogation fragment. The phrase "at least substantially complementary" refers to a degree of complementarity between a portion of the ODNP and a target nucleic acid sufficient to allow the ODNP to specifically anneal to the target and to function as a primer for extension/amplification. In a preferred embodiment, the complementarity is exact. The sequence of the ODNP that complements the target can be located either 3' or 5' to the sequence of the sense strand of the NERS. Preferably, sequences exist that are located both 5' and 3' to the sequence of the sense strand of the NERS and that are at least substantially complementary to the target. The presence of a substantially or exactly complementary sequence located 3' to the sequence of the sense strand of the NERS facilitates annealing of the primer to the template at a pre-defined location and increases extension/amplification efficiency.
The presence of a substantially or exactly complementary sequence located 5' to the sequence of the sense strand of the NERS reduces the number of nucleotides located 3' to the sequence of the sense strand of the NERS that are needed for successful and efficient annealing and extension, and also shortens the length of the subsequently amplified single-stranded nucleic acid molecule as described in detail below. The complete sequence of the sense strand of the NERS of the ODNP may or may not be complementary to the corresponding region of the target. In one aspect, the complete sequence of the sense strand of the NERS of the ODNP is not exactly complementary to the corresponding region of the target. Generally, the ODNP contains at least 6, preferably 8, more preferably 10, most preferably 12, 14, or 16 nucleotides that are exactly complementary to the target nucleic acid.

The other ODNP of the first ODNP pair (referred to as "the second ODNP") comprises (1) a nucleotide sequence at least substantially complementary to a nucleotide sequence of the strand of the target nucleic acid that contains the interrogation fragment and, optionally, (2) a sequence of one strand of a RERS. The nucleotide sequence of the ODNP to which a portion of the target is at least substantially complementary is designed to be located 3' to the genetic variation. This design allows the extension product of the second ODNP to incorporate the complement of the interrogation fragment. Similar to the first ODNP, the complementarity between the annealing portion of the second ODNP and the corresponding portion of the target need not be exact, but must be sufficient to allow the second ODNP to specifically anneal to the target at a desired location and to function as a primer for extension/amplification. The sequence of the second ODNP that is at least substantially complementary to the target can be located either 3' or 5' to the sequence of the one strand of the RERS if a RERS is present. Preferably, sequences exist that are located both 5' and 3' to the sequence of the RERS and that are at least substantially complementary to the target. The sequence of the RERS of the second ODNP may or may not be complementary to the corresponding region of the target, and typically it will not be exactly complementary to the corresponding region of the target. Generally, the second ODNP contains at least 6, preferably 8, more preferably 10, most preferably 12, 14, or 16 nucleotides that are exactly complementary to the target nucleic acid.

While the first ODNP pair has been described above in connection with a target nucleic acid that is double-stranded, one of ordinary skill in the art could, with the guidance provided herein, readily design this ODNP pair as well as other ODNP pairs (i.e., the second ODNP pair and the third ODNP pair as described below) for situations in which the target nucleic acid is single-stranded. Briefly, if the target nucleic acid is single-stranded, the first ODNP comprises a nucleotide sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target nucleic acid located 5' to the interrogation fragment, while the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target nucleic acid located 5' to the interrogation fragment and optionally comprises a sequence of one strand of a RERS. The phrase "at least substantially identical" refers to a degree of identity between a portion of the nucleotide sequence of an ODNP and a corresponding portion of a target nucleic acid sufficient to allow the ODNP to specifically anneal to a nucleic acid comprising a sequence that is exactly complementary to the corresponding portion of the target nucleic acid, and to function as a primer for extension/amplification with the nucleic acid as a template. In a certain preferred embodiment, a portion of the nucleotide sequence of the first ODNP is exactly identical to a portion of a target nucleic acid.

In another embodiment, the present invention provides an ODNP pair (referred to as "the second ODNP pair") useful for producing a template nucleic acid containing one or two NERSs, where the interrogation fragment from the target is located between the two NERSs if a second NERS is present. The first ODNP of the second ODNP pair is the same as the first ODNP of the first ODNP pair described above. The second ODNP of the second ODNP pair is the same as the second ODNP of the first ODNP pair except that the sequence of one strand of the RERS in the second ODNP of the first ODNP pair is replaced with a sequence of a sense strand of a NERS in the second ODNP of the second ODNP pair.

In yet another aspect, the present invention provides an ODNP pair (referred to as "the third ODNP pair") useful for producing a template nucleic acid containing one or two hemimodified RERSs, with an interrogation fragment located between the two hemimodified RERSs if a second hemimodified RERS is present. The first ODNP of the third ODNP pair is the same as the first ODNP of the first ODNP pair except that the sequence of the sense strand of the NERS in the first ODNP of the first ODNP pair is replaced with a sense strand of the hemimodified RERS. The second ODNP of the third ODNP pair is the same as the second ODNP of the first ODNP pair except that the sequence of one strand of the RERS in the second ODNP of the first ODNP pair may be replaced with a sequence of a sense strand of a hemimodified RERS in the second ODNP of the third ODNP pair. The sequence of the sense strand of the hemimodified RERS in the first ODNP may or may not be identical to that in the second ODNP of the third ODNP pair. Preferably, they are the same.

In a further aspect, the present invention provides an ODNP pair (referred to as "the fourth ODNP pair") useful for producing a portion of a single-stranded nucleic acid containing an interrogation fragment to be identified at a defined location. For convenience, a single-stranded target nucleic acid is used in the following description of the ODNP pair. However, designing ODNP pairs, wherein the target nucleic acid is double-stranded, using the guidance provided herein is within the skill of an ordinary artisan. One primer of the fourth ODNP pair ("the first ODNP") comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of a target nucleic acid at a location 3' to the defined position ("the first region of the target nucleic acid"), whereas the other primer of the fourth ODNP pair ("the second ODNP") comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the complement of the target nucleic acid at a location 3' to the complementary nucleotide of the nucleotide at the defined position ("the first region of the complement"). The complementarity between the ODNPs and their corresponding target nucleic acid, or the complement thereof, need not be exact, but must be sufficient for the ODNPs to selectively hybridize with the target nucleic acid, or the complement thereof, such that the ODNPs are able to function as primers for extension and/or amplification using the target nucleic acid, or the complement thereof, as a template. Generally, each ODNP contains at least 6, preferably 8, more preferably 10, most preferably 12, 14, or 16 nucleotides that are exactly complementary to the target nucleic acid or the complement thereof.
Because each ODNP of the fourth ODNP pair hybridizes to a target nucleic acid, or the complement thereof, at a location 3' to the defined position in the target or the complementary position in the complement of the target, the resulting extension and/or amplification products from the fourth ODNP pair incorporate the nucleotide to be identified at the defined position or the complement thereof. Each ODNP in the fourth ODNP pair of the present invention further comprises a partial sequence of one strand of an interrupted restriction endonuclease recognition sequence (IRERS), but not a complete sequence of that strand of the IRERS, located 3' to, and preferably located directly 3' to, the nucleic acid sequence described above (i.e., the sequence complementary to the target nucleic acid or the complement thereof). As described in more detail below, a complete IRERS is a double-stranded nucleotide sequence comprising a first constant recognition sequence (CRS) and a second CRS linked with a variable recognition sequence (VRS). Generally, the first ODNP and the second ODNP comprise the first CRS of the first strand of the IRERS and the second CRS of the second strand of the IRERS, respectively. In addition, the first ODNP and the second ODNP are so spaced that (1) the extension and/or amplification product with the fourth ODNP pair as primers and the target nucleic acid as a template contains a complete IRERS and (2) the nucleic acid to be identified is incorporated within the VRS. In other words, the number of nucleotides between the first and the second CRS is the exact number of nucleotides in the VRS so that the extension and/or amplification product from the fourth ODNP pair can be digested by a RE that recognizes the complete IRERS. The partial IRERS in each ODNP may or may not be complementary to the target nucleic acid.
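The spacing requirement described above, in which the two CRSs must be separated by exactly the number of nucleotides in the VRS, can be illustrated with a minimal sketch. The function name and the example values (a two-nucleotide first CRS "CC", a seven-nucleotide VRS, and a two-nucleotide second CRS "GG") are hypothetical placeholders chosen for illustration and are not taken from this description.

```python
import re


def find_vrs(product, first_crs, vrs_length, second_crs):
    """Return the VRS of the first complete IRERS in `product`, or None.

    A complete IRERS is modeled on one strand as:
    first CRS + exactly `vrs_length` arbitrary bases + second CRS.
    """
    pattern = "%s([ACGT]{%d})%s" % (
        re.escape(first_crs), vrs_length, re.escape(second_crs)
    )
    match = re.search(pattern, product)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Correct spacing: seven nucleotides between the two CRSs.
    print(find_vrs("ATCCAAATGCAGGTT", "CC", 7, "GG"))
    # Incorrect spacing: eight nucleotides, so no complete IRERS is formed.
    print(find_vrs("ATCCAAATGCATGGTT", "CC", 7, "GG"))
```

In this simplified model, a product whose CRSs are spaced one nucleotide too far apart contains no complete IRERS and would therefore not be expected to be digested by the corresponding RE.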

In a preferred embodiment, each ODNP of the fourth ODNP pair further contains one or more nucleotides complementary to the target nucleic acid or the complement thereof ("the second region of the target nucleic acid" and "the second region of the complement," respectively) at a location 3' to, or preferably the 3' terminus of, the CRS. Such nucleotides are a portion of the VRS. The number of nucleotides between the first and second regions of the target nucleic acid or the complement thereof may be larger or smaller, but preferably are equal to, the number of nucleotides of the ODNPs located between the corresponding first and second regions that are complementary to the target nucleic acids or the complement thereof. In addition, in one embodiment the first ODNP further comprises a sequence of a sense strand of a NERS at a location 5' to the first CRS of the first strand of an IRERS.

Alternatively, the second ODNP further contains a sequence of a sense strand of a NERS at a location 5' to the second CRS of the second strand of the IRERS. The presence of such a NERS allows the production of short single-stranded oligonucleotides upon digestion of extension/amplification products from the ODNPs using a NE that recognizes the NERS and a RE that recognizes the IRERS as described below.

General techniques for designing sequence-specific primers are well known. For instance, such techniques are described in books, such as PCR Protocols: Current Methods and Applications (Bruce A. White, ed., 1993); PCR Primer: A Laboratory Manual (Carl W. Dieffenbach & Gabriela S. Dveksler, eds., 1995); McPherson et al., PCR (Basics: From Background to Bench); PCR Applications: Protocols for Functional Genomics (Michael A. Innis, ed., 1999); Newton et al., PCR: Introduction to Biotechniques Series (1997); Gelfand et al., PCR Protocols: A Guide to Methods and Applications (1990); Innis, PCR Strategies; Griffin et al., PCR Technology: Current Innovations (1994); and PCR: Essential Techniques (J. F. Burke, ed.). In addition, software programs for designing primers are also available, including Primer Master (see, Proutski et al., Comput. Appl. Biosci. 12:253-55, 1996) and OLIGO Primer Analysis Software from Molecular Biology Insights, Inc. (Cascade, CO, USA). The above reference books and description of software programs are incorporated herein by reference in their entireties.

ODNPs according to the invention can be synthesized by any method known in the art for oligonucleotide synthesis such as methods disclosed in U.S. Patent Nos. 6,166,198, 6,043,353, 6,040,439, and 5,945,524 (incorporated herein in their entireties by reference). For instance, solid phase oligonucleotide synthesis can be performed by sequentially linking 5'-blocked nucleotides to a nascent oligonucleotide attached to a resin, followed by oxidizing and unblocking to form phosphate diester linkages. ODNPs of the present invention may then be isolated. The term "isolated" as used herein refers to a molecule that is substantially free of undesired contaminants, such as molecules having other sequences.

Thus, in one aspect, the present invention provides a method of forming the double-stranded template polynucleotide by a method comprising:

(a) forming a mixture of a first oligonucleotide primer (ODNP), a second ODNP, and the target polynucleotide comprising the interrogation polynucleotide, under conditions and for a time sufficient to allow the ODNPs to hybridize to the target polynucleotide, wherein

(i) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence of a sense strand of a NARS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, wherein the NARS is a nicking endonuclease recognition sequence (NERS); and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide, and optionally comprises a sequence of one strand of a restriction endonuclease recognition sequence (RERS), or

(ii) if the target polynucleotide is a single-stranded polynucleotide, then the first ODNP comprises a nucleotide sequence of a sense strand of a NARS and a nucleotide sequence at least substantially identical to the target polynucleotide located 5' to the interrogation polynucleotide, wherein the NARS is a nicking endonuclease recognition sequence (NERS); and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide, and optionally comprises a sequence of one strand of a restriction endonuclease recognition sequence (RERS);

(b) extending the first and the second ODNPs to produce the double-stranded template having the NERS and the RERS if the sequence of one strand of the RERS is present in the second ODNP; and

(c) optionally cleaving the extension product of step (b) with a restriction endonuclease that recognizes the RERS.

Thus, in another aspect, the present invention provides a method of forming the double-stranded template polynucleotide by a method comprising:

(a) forming a mixture of a first ODNP, a second ODNP, and the target polynucleotide comprising an interrogation polynucleotide, under conditions and for a time sufficient to allow the ODNPs to hybridize to the target polynucleotide, wherein

(i) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence of one strand of a NARS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, wherein the NARS is a first restriction endonuclease recognition sequence (RERS), and the second ODNP comprises a sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide, or

(ii) if the target polynucleotide is a single-stranded polynucleotide then the first ODNP comprises a nucleotide sequence of one strand of a NARS, wherein the NARS is a first restriction endonuclease recognition sequence (RERS), and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide;

(b) extending the first and the second ODNPs in the presence of at least one modified deoxyribonucleoside triphosphate to produce the double-stranded template comprising both the first RERS and the second RERS.

In yet another aspect, the present invention provides a method of forming the double-stranded template polynucleotide by a method comprising:

(a) forming a mixture of a first ODNP, a second ODNP, and the target polynucleotide comprising the interrogation polynucleotide, under conditions and for a time sufficient to allow the ODNPs to hybridize to the target polynucleotide, wherein

(i) if the target polynucleotide is a double-stranded polynucleotide molecule having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide, or

(ii) if the target polynucleotide is a single-stranded polynucleotide, then the first ODNP comprises a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide;

the first and the second ODNPs each further comprise the sequence of a sense strand of a NARS, wherein the NARS is a nicking endonuclease recognition sequence (NERS);

(b) extending the first and the second ODNPs to produce the double-stranded template having two NERSs.

In a further aspect, the present invention provides a method of forming the double-stranded template polynucleotide by a method comprising:

(a) mixing an ODNP and the target polynucleotide comprising the interrogation polynucleotide under conditions and for a time sufficient to allow the ODNP and the target polynucleotide to hybridize, wherein the target polynucleotide is a single-stranded polynucleotide, and wherein the ODNP comprises a sequence of the sense strand of an NERS and a nucleotide sequence that is substantially complementary to a sequence of the target polynucleotide that is located 3' to the interrogation polynucleotide; and

(b) extending the ODNP to produce the double-stranded template polynucleotide comprising the NERS.

These and other methods may be used to form a double-stranded template nucleic acid molecule useful in the present methods.

I. Combining the target polynucleotide with nicking agent, etc.

After the target polynucleotide has been placed in a condition in which it contains a double-stranded NARS, the target polynucleotide can be combined with the nicking agent, the polymerase and any necessary mononucleotides, in order to allow formation of a plurality of oligonucleotides that reflect the sequence of the interrogation sequence of the target polynucleotide. This plurality of oligonucleotides will preferably form a "ladder" of various lengths, i.e., the first member of the plurality will be 5 nucleotides in length, while a second member will be 6 nucleotides in length, and a third member will be 7 nucleotides in length, etc. The sequence of the first member will be repeated in the sequence of the second member, with the second member having an additional nucleotide at the 3' end, where this additional nucleotide is the complement to a nucleotide present in the interrogation sequence. Likewise, the third member will include the same base sequence present in the second member; however, the third member will include one additional nucleotide at its 3' end, where this additional nucleotide is the complement of a nucleotide present in the interrogation sequence, and where this additional nucleotide will be adjacent to the complement of the additional nucleotide discussed previously that is present in the second member. Thus, a plurality of oligonucleotides are formed, where members of the plurality differ in the number of nucleotides they contain.
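The nested structure of the ladder described above can be illustrated with a minimal sketch. The function names are hypothetical, the interrogation sequence is assumed to be supplied in the 3'->5' order in which it is copied, and the shortest member is set to 5 nucleotides only because that is the example length given above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_ladder(interrogation_3to5, shortest=5):
    """Return ladder members (5'->3'), each one nucleotide longer than the last.

    Each member is the complement of a progressively longer stretch of the
    interrogation sequence, read in the direction the polymerase copies it.
    """
    full = "".join(COMPLEMENT[base] for base in interrogation_3to5)
    return [full[:n] for n in range(shortest, len(full) + 1)]


def read_ladder(ladder):
    """Align ladder members by length and read off the extended sequence."""
    ordered = sorted(ladder, key=len)
    sequence = ordered[0]
    for shorter, longer in zip(ordered, ordered[1:]):
        # Each member must repeat the previous one plus one 3' nucleotide.
        if len(longer) != len(shorter) + 1 or not longer.startswith(shorter):
            raise ValueError("ladder has a gap or an inconsistent member")
        sequence += longer[-1]
    return sequence


if __name__ == "__main__":
    ladder = build_ladder("GGTCAGCATCC")
    for member in ladder:
        print(len(member), member)
    print(read_ladder(ladder))
```

Reading the 3'-terminal nucleotide of each successive member gives the complement of the interrogation sequence one base at a time, which is the sense in which aligning the ladder provides base sequence information.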

J. Distributive polymerase

It appears that all distributive enzymes such as 9-Degree North and Vent exo- DNA polymerases will produce a ladder of oligonucleotides from which a sequence can be deduced. Accordingly, distributive enzymes are the preferred polymerase in the practice of the assay of the present invention. A sequencing ladder will not be produced by processive DNA polymerases such as Bst DNA polymerase.

Accordingly, in one aspect, the present invention provides a composition comprising a distributive polymerase, a nicking endonuclease, and one or more nucleotides in a form that allows the nucleotide to be incorporated into a growing nucleic acid molecule. This composition is extremely useful as a reagent system that may be in admixture with a template nucleic acid molecule as described herein, to provide sequence information about the template nucleic acid molecule. In one embodiment, the composition includes a buffer within which the distributive polymerase and nicking endonuclease are active.

In another embodiment, the present invention provides a composition comprising a distributive polymerase, a nicking endonuclease, one or more nucleotides in a form that allows the nucleotide to be incorporated into a growing nucleic acid molecule, and a template nucleic acid molecule as described herein, e.g., a template nucleic acid molecule that includes a recognition sequence for the nicking endonuclease and also includes a target polynucleotide that comprises an interrogation sequence. This composition is useful for obtaining base sequence information about the target polynucleotide.
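The contrast between distributive and processive polymerases can be illustrated with a simple probabilistic sketch. It assumes, purely for illustration, that the polymerase dissociates with a fixed probability after each incorporated nucleotide, so that chain lengths follow a geometric distribution truncated at the end of the template; this model and the function name are assumptions and are not part of the described method. The mean of about 7 nucleotides is the value reported for Vent exo- elsewhere in this description, and the mean of 1000 used for a processive enzyme is an arbitrary placeholder.

```python
def length_distribution(mean_length, template_length):
    """Return {chain length: probability} under a geometric dissociation model.

    After each incorporated nucleotide the polymerase is assumed to
    dissociate with probability p = 1 / mean_length; chains that reach
    the end of the template are counted as full-length products.
    """
    p = 1.0 / mean_length
    distribution = {}
    for n in range(1, template_length):
        distribution[n] = (1.0 - p) ** (n - 1) * p
    # Everything that has not dissociated earlier ends as full length.
    distribution[template_length] = (1.0 - p) ** (template_length - 1)
    return distribution


if __name__ == "__main__":
    distributive = length_distribution(mean_length=7, template_length=20)
    processive = length_distribution(mean_length=1000, template_length=20)
    print("distributive, full length: %.3f" % distributive[20])
    print("processive, full length:  %.3f" % processive[20])
```

Under this assumed model, a distributive enzyme spreads its products over many intermediate lengths, giving a ladder with only a small fraction of full-length product, whereas a processive enzyme yields almost exclusively full-length product and therefore no ladder.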

K. Hybridization conditions

Methods, kits, and compositions of the present invention may involve or include hybridizing two nucleic acids so as to form a duplex, e.g., ODNPs may be hybridized to the target nucleic acid, where the ODNP facilitates the production and/or amplification of a defined interrogation fragment within the target nucleic acid. The nucleic acid molecules are thus preferably combined under base-pairing conditions that allow for hybridization and/or amplification. Selection of suitable nucleic acid hybridization and amplification conditions is within the skill of one of ordinary skill in the art, and may be assisted by reference to, for example, the following laboratory research manuals: Sambrook et al., Molecular Cloning (Cold Spring Harbor Press, 1989) and Ausubel et al., Short Protocols in Molecular Biology (1999) (incorporated herein by reference in their entirety).

Depending on the application envisioned, the artisan can vary conditions of hybridization to achieve the desired degree of selectivity of an ODNP or other nucleic acid molecule towards a target sequence. For applications requiring high selectivity, relatively stringent conditions may be employed to form the hybrids, such as low salt and/or high temperature conditions (e.g., from about 0.02 M to about 0.15 M salt at temperatures of from about 50 °C to about 70 °C). These selective conditions are relatively intolerant of large mismatches between the ODNP or other nucleic acid molecule and the target nucleic acid. Alternatively, hybridization of the ODNPs or other nucleic acid molecule may be achieved under moderately stringent conditions, for example, in 10 mM Tris, pH 8.3; 50 mM KCl; 1.5 mM MgCl₂ at 60 °C, which conditions permit the hybridization of an ODNP or other nucleic acid molecule comprising nucleotide mismatches with the target nucleic acid. The design of alternative hybridization conditions is well within the expertise of the skilled artisan.

L. Amplification

In one aspect, the sequencing method of the present invention comprises amplifying an interrogation nucleic acid sequence and sequencing the single-stranded DNA products that result from the amplification process. Amplification is performed by combining a template double-stranded nucleic acid, comprising a nicking agent recognition sequence (NARS) and a nicking site (NS), with a nicking agent (NA) that recognizes the NARS. The double-stranded nucleic acid template is then nicked at the NS by the nicking agent thereby producing a 3' terminus at the NS. In the presence of a 5'→3' exonuclease-deficient DNA polymerase, the 3' terminus at the NS is extended, replacing the downstream single-stranded DNA fragment (i.e., the single-stranded DNA fragment having a 5' terminus that was at the NS). The extension product is then nicked by the nicking agent, again creating a 3' terminus at the NS, which is again extended by the DNA polymerase to form another extension product. This nicking-extension process is repeated multiple times, resulting in the amplification of the single-stranded DNA fragment having a 5' terminus that is created by the nicking agent at the NS. Surprisingly, the fragments generated by this process vary in length. By aligning the single-stranded fragments according to methods that differentiate fragments that differ by a single nucleotide, the nucleotide sequence of the target nucleic acid can be determined.
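Where the fragments are differentiated by mass, as in the LC-TOF read-out mentioned elsewhere in this description, the base added at each step of the ladder can in principle be inferred from the mass difference between neighboring fragments. The following minimal sketch illustrates that idea. The residue masses are approximate average masses of the four deoxynucleotide residues, and the function names, the constant end-group term, and the 0.5 Da tolerance are assumptions made for illustration only.

```python
# Approximate average residue masses (Da) of deoxynucleotides within a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(sequence, shortest=5, end_group=18.02):
    """Simulate masses of ladder members of increasing length.

    `end_group` is an assumed constant for the chain termini; it cancels
    when differences between neighboring members are taken.
    """
    return [
        end_group + sum(RESIDUE_MASS[base] for base in sequence[:n])
        for n in range(shortest, len(sequence) + 1)
    ]


def bases_from_masses(masses, tolerance=0.5):
    """Call one base per step from differences between sorted masses."""
    ordered = sorted(masses)
    called = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - delta))
        if abs(RESIDUE_MASS[base] - delta) > tolerance:
            raise ValueError("mass step %.2f matches no single nucleotide" % delta)
        called.append(base)
    return "".join(called)


if __name__ == "__main__":
    masses = ladder_masses("CCAGTCGTAGG")
    print(bases_from_masses(masses))
```

Only the nucleotides added beyond the shortest ladder member are called by this difference method; the composition of the shortest member itself would have to be established separately.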

In one aspect of the invention, in order to obtain a template nucleic acid that comprises both a portion of a target nucleic acid containing a genetic variation at a defined position, and various combinations of recognition sequences (e.g., a NERS and a RERS (including an IRERS), two NERSs, or two hemimodified RERSs), a pair of primers is hybridized to the target nucleic acid, and each primer of the ODNP pair is extended using various methodologies known in the art, such as the polymerase chain reaction (PCR) and modified ligase chain reaction (LCR). Generally, at least three runs of extension reactions from the ODNP pairs described above need be carried out. Briefly, the first run of extension is for the first primer having a sequence of a sense strand of an enzyme recognition sequence (ERS) ("the first ERS") (e.g., hemimodified RERS, NERS) to incorporate the interrogation fragment of the target nucleic acid into the first extension product. The second primer, optionally having an additional sequence of one strand of another ERS ("the second ERS"), which may or may not be the same as the first ERS, then hybridizes to and extends using the first extension product as a template and thereby incorporates the complement of the interrogation fragment and the complement of the first ERS into a second extension product. An unextended first primer then hybridizes to and extends using the second extension product as a template to form, in combination with the second extension product, a double-stranded nucleic acid fragment that contains the interrogation fragment and the complement thereof, as well as two complete ERSs. While three runs of extension reactions are sufficient to produce a fragment containing an interrogation fragment of a target nucleic acid and two ERSs, preferably, more than three extension reactions are conducted to amplify the template.
As one of ordinary skill in the art will appreciate, in the subsequent runs of extension, the first primer can hybridize to and extend using any of the target nucleic acid, the second extension product, and the complement of the third extension product as a template. Similarly, in the subsequent runs of extension, the second primer can hybridize to and extend using either the first extension product or the third extension product as a template. However, because the third extension product and the complement thereof are shorter than any one of the target nucleic acid, the first extension product, and the second extension product, the third extension products are the preferred templates for subsequent extension reactions from either the first or the second ODNPs. This is because the extension efficiency using a short fragment as a template is higher than the efficiency using a large fragment as a template. With an increase in the number of extension reactions, the double-stranded fragments containing an interrogation fragment, the first ERS, and the optional second ERS accumulate more quickly than other molecules in the reaction mixture. Such accumulation increases the sensitivity of subsequent characterization of a single-stranded nucleic acid amplified using the above template nucleic acid as a template.

The extension/amplification reaction can be carried out by any method known in the art. For instance, PCR methods described in U.S. Patent Nos. 4,683,195, 4,683,202, and 4,800,159 may be used. Other PCR methods may also be used as described in books, e.g., Gelfand et al., PCR Protocols: A Guide to Methods and Application (1990); PCR: Essential Techniques (Burke, ed.); and McPherson et al. PCR (Basic: From Background to Bench). Each of the above references is incorporated herein by reference in its entirety. Briefly, in PCR, two ODNPs are prepared that are complementary to regions on opposite complementary strands of the target nucleic acid sequence. An excess of deoxyribonucleoside triphosphates is added to a reaction mixture along with a DNA polymerase (e.g., Taq or Pfu polymerase). If the target nucleic acid sequence is present in a sample, the ODNPs will bind to the target and the polymerase will cause the ODNPs to be extended along the target nucleic acid sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended ODNPs will dissociate from the target to form reaction products, excess ODNPs will bind to the target and to the reaction product and the process is repeated.

Exemplary PCR conditions according to the present invention may include, but are not limited to, the following. PCR reactions (100 μl) comprise 100 ng target nucleic acid; 0.5 μM of each of the first ODNP and the second ODNP; 10 mM Tris, pH 8.3; 50 mM KCl; 1.5 mM MgCl₂; 200 μM each dNTP; 4 units Taq™ DNA Polymerase (Boehringer Mannheim; Indianapolis, IN), and 880 ng TaqStart™ Antibody (Clontech, Palo Alto, CA). Exemplary thermocycling conditions may be as follows: 94°C for 5 minutes initial denaturation; 45 cycles of 94°C for 30 seconds, 60°C for 30 seconds, 72°C for 1 minute; final extension at 72°C for 5 minutes. Exemplary nucleic acid polymerases may include one of the thermostable DNA polymerases that are readily available in the art such as, e.g., Taq™, Vent™, or PFU™. Depending on the particular application contemplated, it may be preferred to employ one of the nucleic acid polymerases having a defective 3' to 5' exonuclease activity.
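By way of illustration only, the arithmetic implied by the exemplary conditions above can be sketched as follows. The function names are hypothetical, and the run time is an assumption-laden lower bound: it counts only the programmed hold times and ignores the ramp time a real thermocycler needs between temperatures.

```python
def thermocycling_seconds(initial_denaturation_s, cycles, step_seconds, final_extension_s):
    """Total programmed hold time in seconds (ramp times between steps are ignored)."""
    return initial_denaturation_s + cycles * sum(step_seconds) + final_extension_s


def amount_pmol(concentration_uM, volume_ul):
    """Amount of a reagent in pmol: 1 uM x 1 ul = 1 pmol."""
    return concentration_uM * volume_ul


# Exemplary program: 5 min at 94 C; 45 cycles of 30 s / 30 s / 60 s; 5 min at 72 C.
total_s = thermocycling_seconds(5 * 60, 45, (30, 30, 60), 5 * 60)
print("programmed hold time:", total_s / 60, "minutes")

# Exemplary 100 ul reaction: 0.5 uM of each ODNP and 200 uM of each dNTP.
print("each ODNP:", amount_pmol(0.5, 100), "pmol")
print("each dNTP:", amount_pmol(200, 100), "pmol")
```

Under these assumptions the 45-cycle program holds for 6000 seconds in total, and a 100 μl reaction contains 50 pmol of each ODNP.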

M. Gap-LCR

An alternative way to make and/or amplify a fragment containing a nucleotide to be identified is by a modified ligase chain reaction, referred to herein as the gap-LCR (Abravaya et al., Nucleic Acids Res. 23:675-682 (1995)). Briefly, in the presence of the target sequence, each pair of the set will bind to the target, or the complement thereof, located 5' and 3' of (on either side of) the nucleotide of interest in the target nucleic acid. In the presence of a polymerase and a ligase, the gap between the two ODNPs of each pair will be filled in and the ODNPs of each pair ligated to form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and then serve as "target sequences" for ligation of excess ODNP pairs. Thus, LCR uses both a nucleic acid polymerase enzyme and a nucleic acid ligase enzyme to drive the reaction. Exemplary nucleic acid polymerases may include one of the thermostable DNA polymerases that are readily available in the art such as, e.g., Taq™, Vent™ or PFU™. Exemplary nucleic acid ligases may include T4 DNA ligase, or the thermostable Tsc or Pfu DNA ligases. U.S. Patent No. 4,883,750, incorporated herein by reference in its entirety, describes an alternative method of amplification similar to LCR for binding ODNP pairs to a target sequence. Exemplary gap-LCR conditions may include, but are not limited to, the following. LCR reactions (50 μl) comprise 500 ng DNA; a buffer containing 50 mM EPPS, pH 7.8, 30 mM MgCl₂, 20 mM K⁺, 10 μM NAD, 1-10 μM gap filling nucleotides, 30 nM each oligonucleotide primer, 1 U Thermus flavus DNA polymerase lacking 3' to 5' exonuclease activity (MBR, Milwaukee, WI), and 5000 U T. thermophilus DNA ligase (Abbott Laboratories). Cycling conditions may consist of a 30 second incubation at 85°C and a 30 second incubation at 60°C for 25 cycles and may be carried out in a standard PCR machine such as a Perkin Elmer 9600 thermocycler.

N. When the NARS is a RERS

As explained herein, depending on the specific bases contained within the nicking agent recognition sequence (NARS), the NARS may be a restriction enzyme recognition sequence (RERS). To prevent the template nucleic acid that contains two RERSs from being cleaved in both strands, extension/amplification from the third ODNP pair may be performed in the presence of a modified deoxyribonucleoside triphosphate (e.g., α-thio deoxyribonucleoside triphosphate). The incorporation of the modified deoxynucleotide into one strand of the RERS blocks the cleavage by the RE that recognizes the RERS in the modified strand. Consequently, only the sense strand of the RERS that is a portion of the ODNP pair may be cleaved. Any modified deoxyribonucleoside triphosphate that contributes to the inhibition of cleavage of one of the two DNA strands comprising the RERS may be used in the present invention. Exemplary modified deoxyribonucleoside triphosphates include, but are not limited to, 2'-deoxycytidine 5'-O-(1-thiotriphosphate) [i.e., dCTP(αS)], 2'-deoxyguanosine 5'-O-(1-thiotriphosphate), thymidine-5'-O-(1-thiotriphosphate), 2'-deoxyuridine 5'-triphosphate, 5-methyldeoxycytidine 5'-triphosphate, and 7-deaza-2'-deoxyguanosine 5'-triphosphate. In certain embodiments, more than one modified deoxyribonucleoside triphosphate may be used. In some other embodiments, post-synthesis modification of certain appropriate nucleotide(s) provides an alternative to the use of a modified deoxyribonucleoside triphosphate. The amplified template nucleic acids as described above, if containing a RERS (not including a hemimodified RERS), are typically digested by a RE that recognizes the RERS.
The term "restriction endonuclease" (RE) refers to the class of nucleases that recognize unique double-stranded nucleic acid sequences and that generate a cleavage in the double-stranded nucleic acid, resulting in either blunt double-stranded ends, or single-stranded ends with either a 5' or a 3' overhang. REs are usually classified into three types: type I, type II and type III, among which type II are most commonly used in molecular biological manipulation. The RE that may be used in the present invention includes any type II RE, such as (1) the vast majority of type II REs that recognize specific sequences that are four, five, or six nucleotides in length and display twofold symmetry (e.g., EcoR I, BamH I); (2) type IIs REs (TSREs); and (3) interrupted REs (IREs). A "type IIs restriction endonuclease" is a restriction endonuclease that cleaves between two nucleotides that are not part of the enzyme's recognition sequence. Exemplary TSREs include, but are not limited to, Fok I, Bsg I, and Bpm I. An "interrupted restriction endonuclease" is a type II restriction endonuclease that recognizes an interrupted restriction endonuclease recognition sequence (IRERS). An IRERS is defined as a restriction endonuclease recognition sequence that is comprised of a "first constant recognition sequence" (CRS), a "second CRS," and a "variable recognition sequence" (VRS) that links the first and second CRSs. All three recognition sequences of an IRERS are double-stranded. According to the present invention, "first CRS" is defined as that region of the IRERS that contains the constant (not variable) nucleotides of the IRERS flanking the VRS of the IRERS at one side, whereas "second CRS" is defined as that region of the IRERS that contains the constant (not variable) nucleotides of the IRERS flanking the VRS of the IRERS at the other side. According to the present invention, the VRS is defined as the region of one or more variable nucleotides that is located between the first and second CRSs. Exemplary IREs include, but are not limited to, Bsl I, EcoN I, Dra III, and Dde I.

The RE useful for the present invention may be purchased from various companies such as New England Biolabs Inc. (Beverly, MA; www.neb.com), Stratagene (La Jolla, CA; www.stratagene.com), Promega (Madison, WI; www.promega.com), and Clontech (Palo Alto, CA; www.clontech.com). Non-commercially available restriction enzymes may be isolated and/or purified based on the teaching available in the art. For instance, the following articles describe the isolation and/or purification of several non-commercially available restriction enzymes suitable for the present invention and are incorporated herein by reference in their entireties: for restriction enzyme ApaB I, Grones et al., Biochim. Biophys. Acta 7762:323-25, 1993, and Grones et al., Biologia (Bratisl) 46:1103-08, 1991; for EcoH I, Glatman et al., Mol. Gen. Mikrobiol. Virusol. 3:32, 1990; for Fmu I, Rebentish et al., Biotekhnologiya 3:15-16, 1994; for HpyB 11, FEMS Microbiol. Lett. 179:175-80, 1999; for Sse8647 I, Nomura et al., European Patent Application No. 0698663 A1, and Ishino et al., Nucleic Acids Res. 23:742-44, 1995; for Unb I, Kawalec et al., Acta Biochim. Pol. 44:849-852, 1997; and for VpaK11A I, Miyahara et al., J. Food. Hyg. Sci. Japan 35:605-609, 1994.

Descriptions of conditions for storage and use of restriction endonucleases that are used according to the present invention are readily available in the art and are found, for example, in laboratory manuals such as Sambrook et al., supra and Ausubel et al., supra. Briefly, the number of units of RE added to a reaction may be calculated and adjusted according to the varying cleavage rates of nucleic acid substrates. One unit of restriction endonuclease will digest 1 μg of substrate nucleic acid in a 50 μl reaction in 60 minutes. Generally, to be cleaved completely, fragments, such as those generated by amplification of the target nucleic acid, may require more than 1 unit/μg. The restriction enzyme buffer is typically used at 1X concentration in the reaction. Some restriction endonucleases require bovine serum albumin (BSA), usually incorporated in a reaction at a final concentration of 100 μg/ml, for optimal activity. Restriction endonucleases that do not require BSA for optimal activity are not adversely affected if BSA is present in the reaction.

Most restriction endonucleases are stable when stored at -20°C in the recommended storage buffer. All restriction endonucleases should be kept on ice when not otherwise being stored in the freezer to minimize exposure to temperatures above -20°C. Enzymes should always be the last component added to a reaction.

The recommended incubation temperature for most restriction endonucleases is about 37°C. Restriction endonucleases isolated from thermophilic bacteria require higher incubation temperatures, typically ranging from 50°C to 65°C. Incubation time may often be shortened if an excess of restriction endonuclease is added to the reaction. Longer incubation times may be used to allow a reaction to proceed to completion with fewer units of restriction endonuclease.
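The trade-off between enzyme units and incubation time described above can be illustrated with a minimal sketch. It assumes, as an idealization not stated in this description, that digestion scales linearly with both units and time, starting from the unit definition of 1 μg digested per unit in 60 minutes. In practice enzymes lose activity during long incubations, so the result is only a starting estimate. The function name and the `fold_excess` parameter are hypothetical.

```python
def units_required(substrate_ug, incubation_min, fold_excess=1.0):
    """Estimate units of restriction endonuclease needed for a digest.

    Assumes ideal linear scaling from the unit definition
    (1 unit digests 1 ug of substrate in 60 minutes).
    fold_excess > 1 adds a safety margin for slowly cleaved substrates.
    """
    if substrate_ug <= 0 or incubation_min <= 0 or fold_excess <= 0:
        raise ValueError("substrate, time, and fold_excess must be positive")
    return fold_excess * substrate_ug * 60.0 / incubation_min


# Shorter incubation needs more enzyme; longer incubation needs less.
print(units_required(2.0, 30))    # 2 ug digested in 30 minutes
print(units_required(1.0, 120))   # 1 ug digested in 2 hours
```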

When two REs are used simultaneously, reaction conditions, including incubation temperature and reaction buffer, need to be optimized to be suitable for both enzymes. However, if no digestion condition can be found that is suitable for both of two different REs, digestion with one RE may be carried out prior to digestion with the other RE. In the case that double digestion is carried out using a NE and a RE, the resulting products from an extension/amplification product using the primer pair of the present invention (e.g., the fourth ODNP pair) are two double-stranded oligonucleotides and a single-stranded oligonucleotide.

In a preferred embodiment of the invention, the NARS is recognized by a nicking endonuclease, so that the NARS is a nicking endonuclease recognition sequence (NERS).

O. Preferred Embodiments of the Assay

To produce a sequencing reaction that amplifies the fragments, it is necessary to devise a cyclic chain of reactions that will restore the reactants to their initial state after each synthesis of the molecules to be amplified. The linear amplification reaction described here provides such a cycle, whose sequence specificity derives from template-dependent synthesis of the ladder of oligonucleotides to be amplified. The reaction synthesizes short oligonucleotides, and its cycle depends on the principle that, at the reaction temperature, oligonucleotides above a certain length form stable duplexes, while those below this critical length form unstable duplexes that dissociate readily. By arranging a specific, single-strand nicking site and nicking enzyme and a compatible DNA polymerase, as described in Figure 1a, a cycle of polymerization and subsequent oligonucleotide release can be set up. This cycle depends on the nicking reaction cleaving a phosphodiester bond to create an oligonucleotide that is below the threshold of stability in a duplex, and is thereby released from the duplex, thus regenerating the initial primer template. The synthesized oligonucleotide is fully stable at 60°C when it is covalently joined to the rest of the upper strand, as it is immediately after its synthesis, but is only transiently stable as a 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16-mer, which it becomes after the nicking reaction. Therefore, when the bond is cleaved at the nicking site, the oligonucleotide soon dissociates, recreating a primer template ready for elongation. This cycle thus creates oligonucleotides that are complementary to the template beyond the nicking site. The general steps are outlined as follows:

1) A duplex is formed in which nicking enzyme recognition sites (NERS) are present in the top strand and the bottom strand of the duplex. This is shown in Figure 13.

2) The nicking enzyme is then added and the duplex is nicked.

3) The duplex falls apart at 60°C in the nicking buffer.

4) The top and bottom strands (designated sequence A) are amplified. This is shown in Figure 14.
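The idea that short nicked oligonucleotides dissociate at the reaction temperature while longer strands remain paired can be illustrated with a rough estimate. The sketch below uses the Wallace rule of thumb (2°C per A/T and 4°C per G/C), which is not part of this description; it is a crude approximation for short oligonucleotides that ignores salt, concentration, and nearest-neighbor effects, and is shown only to make the length threshold concrete. The function name is hypothetical.

```python
def wallace_tm_celsius(oligo):
    """Crude melting-temperature estimate: 2 C per A/T plus 4 C per G/C."""
    s = oligo.lower()
    at = s.count("a") + s.count("t")
    gc = s.count("g") + s.count("c")
    if at + gc != len(s):
        raise ValueError("sequence may contain only a, c, g, t")
    return 2 * at + 4 * gc


REACTION_TEMP_C = 60  # reaction temperature given in this description

# Prefixes of an illustrative sequence: shorter products fall further below 60 C.
sequence = "cagctgctcaggtgcc"
for length in (6, 10, 16):
    fragment = sequence[:length]
    tm = wallace_tm_celsius(fragment)
    print(length, fragment, tm, "dissociates" if tm < REACTION_TEMP_C else "stable")
```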

Another general method for signature sequencing is as follows.

1) Isolate the target nucleic acid; if the target nucleic acid is double stranded, cut it into pieces with nicking agents or restriction endonucleases, and denature.

2) Hybridize an oligonucleotide probe to a sequence containing the complement of the nicking enzyme recognition site "GAGTC". The probe is 18 to 50 nucleotides long, preferably 24-36 nt.

3) Add nicking enzyme and cut the oligonucleotide 4 bases to the 3' side.

4) Carry out the sequencing reaction with a distributive polymerase.
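Locating the recognition site and the nick position described in the steps above can be sketched as follows. This is a minimal illustration only: the function and constant names are hypothetical, the search considers just the top strand as written 5' to 3', and the test sequence is the top strand of the example sequencing template given later in this description.

```python
RECOGNITION_SITE = "GAGTC"  # nicking enzyme recognition site named in the text
NICK_OFFSET = 4             # the nick falls 4 bases to the 3' side of the site


def nick_positions(top_strand):
    """Return 0-based indices of the first base 3' of each nick on the given strand."""
    s = top_strand.upper()
    positions = []
    start = s.find(RECOGNITION_SITE)
    while start != -1:
        cut = start + len(RECOGNITION_SITE) + NICK_OFFSET
        if cut <= len(s):
            positions.append(cut)
        start = s.find(RECOGNITION_SITE, start + 1)
    return positions


template = "ccgatctagtgagtcgctccagctgctcaggtgcctcgcaccggctgga"
cut = nick_positions(template)[0]
print("primer side:    ", template[:cut])
print("downstream side:", template[cut:])
```

For this template the downstream side is the 30-nucleotide stretch that the polymerase copies after each nick.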

The general procedure can be described as follows:

1) An amplicon is generated that contains the genetic variation of interest, or the nucleic acid sequence of interest. The middle of the fragment contains the genetic variation of interest, or the nucleic acid sequence of interest. This area is referred to as the "interrogation fragment".

5' xxxxxxxxxxxxxxxxxxxxx 3'

3' xxxxxxxxxxxxxxxxxxxxx 5'

(interrogation fragment area)

2) The 5' end of the top strand of the amplicon contains the recognition sequence for N.BstNBI, which is a single strand nicking enzyme that nicks 4 bases downstream (towards the 3' end of the top strand in the amplicon) of the recognition site GAGTC. The recognition site for N.BstNBI is incorporated into the forward primer during synthesis of the oligonucleotide.

3) The reverse primer is constructed such that it contains a non-palindromic recognition sequence for a type II restriction endonuclease (RE) that reaches 1/4 bases to the 3' end of the fragment and makes a staggered double strand cut. Such enzymes, commonly known as type II restriction endonucleases, include Fok I, Bst I, etc.

4) The genetic variation of interest, or the nucleic acid sequence of interest is amplified using these two sets of primers, to the extent necessary.

5) This step amplifies the interrogation fragment using an isothermal process. Three enzymes are added: the nicking enzyme N.BstNBI, a type II RE like Bst I, and a DNA polymerase like Bst or Taq. Also added are an appropriate buffer including Mg++, KCl, NaCl, and Tris-HCl; additives such as trehalose, betaine, etc.; and dNTPs (deoxynucleotide triphosphates).

6) The reaction is incubated at 55 to 65°C, preferably at 60°C when the N.BstNBI nicking enzyme is used.

7) The type II RE cleaves off the 3' end of the amplicon.

-xxxxxxxxxxxxxxx + -xxxxxxxxxxxxxxx +

8) The nicking enzyme nicks the top strand only, and the top strand of the interrogation fragment is released from the duplex. This leaves a 3' recessed duplex that can be filled in by the DNA polymerase:

5' + 5'xxxxxxxxxxxxxxxxx3'

3' xxxxxxxxxxxxxxxx5'

9) The polymerase now fills in the recessed 3' end of the top fragment.

5' xxxxxxxxxxxxxxxx3'

3' xxxxxxxxxxxxxxxx5'

Each fill-in terminates at a different distance from the recessed 3'-terminus.

10) The duplex is now nicked by N.BstNBI again, as the polymerase has fallen off the fragment.

5' + 5' xxxxxxxxxxxxxxxxx3'

3' xxxxxxxxxxxxxxxx5'

+ 5' xxxxxxxxxxxxxxxx3'

+ 5' xxxxxxxxxxxxxxx3'

+ 5' xxxxxxxxxxxxxx3'

+ 5' xxxxxxxxxxxxx3'

+ 5' xxxxxxxxxxxx3'

11) Each step produces fragments ranging in size from 4 to 24 nucleotides in length, which are easily measured by LC/MS.

12) This isothermal amplification sequencing reaction is possible because the nicking enzyme dissociates after cutting and is catalytic, and the DNA polymerase dissociates after extension and is catalytic. The amplification is linear, but the cycling can be conducted for hours since both enzymes are thermostable. A thousand-fold to 100,000-fold amplification can be achieved without sacrificing the integrity of the product.

13) This amplification can be achieved by ligating two sets of adapters to any double strand nucleic acid. One adapter contains the N.BstNBI site and the other adapter contains the type II RE site. This would be especially useful for making probes for arrays on solid supports.

14) This amplification can also be used for detecting SNPs on solid supports or arrays. An adapter containing the N.BstNBI recognition sequence and an 8 to 36 base single-strand region specific for the allele or sequence of interest is arrayed onto a solid support. The allele is captured by hybridization. The nicking enzyme is added, along with the DNA polymerase, and the interrogation fragment is produced, which is detectable by mass spectrometry.

The amplification method is particularly amenable to high throughput gene expression analysis. In this format, RNA is captured by polyA/T hybridization, the RNA is converted to cDNA, the cDNA is converted to double strand DNA, and the double strand DNA is cut with a RE to minimize the length of the fragment. The nicking enzyme and DNA polymerase are then added, generating small DNA fragments which can be interrogated by mass spectrometry.

When the nicking enzyme is also present with a compatible polymerase, the reaction proceeds around the cycle shown in Figure 1a, and linear amplification of the product oligonucleotides occurs, creating a sequencing ladder. In Figure 1b we show the results of one of these reactions. The experiment was devised to produce a series of oligomers as its amplified product.
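The nick, extend, and release cycle described above can be pictured with a toy simulation. The sketch below is not a kinetic model of the reaction: it assumes, purely for illustration, that in every cycle the distributive polymerase stops at a uniformly random position between 4 and 24 nucleotides from the nick (the size range stated above), after which the nicking enzyme releases the newly made fragment. All names and the uniform stopping rule are hypothetical.

```python
import random
from collections import Counter


def simulate_ladder(downstream, cycles, min_len=4, max_len=24, seed=0):
    """Count released fragments after repeated nick/extend/release cycles.

    downstream: sequence synthesized from the nick (5' to 3').
    Each cycle releases a prefix of `downstream`; all fragments share the
    5' end created at the nicking site and differ at the 3' end.
    """
    rng = random.Random(seed)
    longest = min(max_len, len(downstream))
    if longest < min_len:
        raise ValueError("downstream sequence is shorter than min_len")
    released = Counter()
    for _ in range(cycles):
        stop = rng.randint(min_len, longest)  # assumed uniform termination point
        released[downstream[:stop]] += 1
    return released


ladder = simulate_ladder("cagctgctcaggtgcctcgcaccggctgga", cycles=1000)
for fragment in sorted(ladder, key=len):
    print(len(fragment), fragment, ladder[fragment])
```

Because every fragment is a prefix of the same sequence, sorting the products by length lines them up as a sequencing ladder.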

P. Characterizing Oligonucleotides

The sequencing assay of the present invention prepares a family of oligonucleotides. These oligonucleotides are characterized, and based on this characterization, information about a base sequence present in a target polynucleotide is obtained. The following are suitable methods for characterizing the oligonucleotides.

Matrix-Assisted Laser Desorption-Ionization (MALDI), Time-of-Flight (TOF), and Electrospray Ionization (ESI) mass spectrometers are useful characterization techniques according to the present invention because they permit rapid analysis in the size-separation step of DNA sequence analysis. Mass spectrometry can significantly increase the speed of the separation, detection, and data-acquisition processes for sequence analysis over conventional gel electrophoresis methods (Fitzgerald, M.C. and Smith, L.M., "Mass spectrometry of nucleic acids: The promise of matrix-assisted laser desorption-ionization (MALDI) mass spectrometry," Annu. Rev. Biophys. Biomol. Struct. 24:117-140, 1995). With MALDI-TOF mass spectrometry, the molecular weights of different molecules are measured directly, combining separation, detection, and characterization into a single step. In 1998, SEQUENOM (San Diego, CA) (http://www.sequenom.com) sequenced 670 bases of the p53 gene using MALDI-TOF (Fu, D.J., et al., "Sequencing exons 5 to 8 of the p53 gene by MALDI-TOF mass spectrometry," Nat. Biotechnol. 76:381-384, 1998). For review articles on mass spectrometry, see Clark, M.D., et al., "Construction and analysis of arrayed cDNA libraries," Methods Enzymol. 303:205-233, 1999; Griffin, T.J., et al., "Direct genetic analysis by matrix-assisted laser desorption/ionization mass spectrometry," Proc. Natl. Acad. Sci. 96:6301-6306, 1999; Aebersold, R., et al., "Equipping scientists for the new biology," Nat. Biotechnol. 78:35, 2000; Deforce and Van den Eeckhout (2000); Fei, Z. and Smith, L.M., "Analysis of single nucleotide polymorphisms by primer extension and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry," Rapid Commun. Mass Spectrom. 74:950-959, 2000; Gatlin, C.L., et al., "Mass spectrometry. From genomics to proteomics," Trends Genet. 76:5-8, 2000; "Automated identification of amino acid sequence variations in proteins by HPLC/microspray tandem mass spectrometry," Anal. Chem. 72:757-763, 2000; Gevaert, K. and Vandekerckhove, J., "Protein identification methods in proteomics," Electrophoresis 27:1145-1154, 2000; Griffin, T.J. and Smith, L.M., "Single-nucleotide polymorphism analysis by MALDI-TOF mass spectrometry," Trends Biotechnol. 78:77-8, 2000; Griffiths, W.J., "Nanospray mass spectrometry in protein and peptide chemistry," EXS 88:69-79, 2000; Guilhaus, M., et al., "Orthogonal acceleration time-of-flight mass spectrometry," Mass Spectrom. Rev. 79:65-107, 2000; Jackson, P.E., et al., "Mass spectrometry for genotyping: An emerging tool for molecular medicine," Mol. Med. Today 6:271-276, 2000; Johnston, M.V., "Sampling and analysis of individual particles by aerosol mass spectrometry," J. Mass Spectrom. 35:585-595, 2000; Li, L., et al., "Single-cell MALDI: A new tool for direct peptide profiling," Trends Biotechnol. 78:151-160, 2000; Roepstorff, P., "MALDI-TOF mass spectrometry in protein chemistry," EXS 88:81-97, 2000; and Yates, J.R., 3rd, "Mass spectrometry. From genomics to proteomics," Trends Genet. 76:5-8, 2000. A new mass spectrometric technique, charge reduction electrospray mass spectrometry (CREMS), is described in Scalf, M., et al., "Charge reduction electrospray mass spectrometry," Anal. Chem. 72:52-60, 2000.

1. Mass spectroscopy

MS is particularly advantageous in those applications in which it is desirable to eliminate a size separation step prior to molecular weight determination. Quite surprisingly, as part of the present invention it was discovered that small DNA fragments are amenable to detection by MS. Sensitivities may be achieved to at least 1 amu. The smallest mass difference between nucleic acid bases is that between adenine and thymine, which is 9 Daltons.

Particularly preferred methodologies according to the present invention employ Liquid Chromatography-Time-of-Flight Mass Spectrometry (LC-TOF-MS). LC-TOF-MS is composed of an orthogonal acceleration Time-of-Flight (TOF) MS detector for atmospheric pressure ionization (API) analysis using electrospray (ES) or atmospheric pressure chemical ionization (APCI). LC-TOF-MS provides high mass resolution (5000 FWHM), high mass measurement accuracy (to within 5 ppm) and very good sensitivity (ability to detect femtomolar amounts of DNA polymer) compared to scanning quadrupole instruments. TOF instruments are generally more sensitive than quadrupoles, but are correspondingly more expensive.

LC-TOF-MS has a more efficient duty cycle than scanning instruments, which sequentially analyze one mass at a time while rejecting all others (this is referred to as single ion monitoring (SIM)). LC-TOF-MS samples all of the ions passing into the TOF analyzer at the same time. This provides quantitative data and improves the sensitivity between 10- and 100-fold. Enhanced resolution (5000 FWHM) and mass measurement accuracy of better than 5 ppm imply that differences between nucleosides as small as 9 amu (Daltons) can be accurately measured. The TOF mass analyzer performs very high frequency sampling (10 spectra/sec) of all ions simultaneously across the full mass range of interest. The duty cycle of the LC-TOF-MS allows high sensitivity spectra to be recorded in quick succession, making the instrument compatible with more efficient separation techniques such as narrow bore LC, capillary electrophoresis (CE) and capillary electrochromatography (CEC). The ions are pulsed into the analyzer, effectively taking a 'snapshot' of the ions present at any time.
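The claim that 5 ppm accuracy and 5000 FWHM resolution suffice to distinguish a 9 Dalton difference can be checked with simple arithmetic. The sketch below treats the oligonucleotide as a single neutral mass, as would be obtained after charge deconvolution, and takes roughly 7400 Da as an illustrative mass for a 24-mer; both are assumptions of this example, and the function names are hypothetical.

```python
def mass_error_da(mass_da, accuracy_ppm):
    """Absolute mass uncertainty in Daltons for a given ppm accuracy."""
    return mass_da * accuracy_ppm * 1e-6


def peak_width_da(mass_da, resolution_fwhm):
    """Peak width (full width at half maximum) in Daltons: m / (m / delta_m)."""
    return mass_da / resolution_fwhm


SMALLEST_BASE_DIFFERENCE_DA = 9.0  # adenine versus thymine, as stated in the text
oligo_mass_da = 7400.0             # assumed approximate mass of a 24-mer

print("mass error at 5 ppm:", mass_error_da(oligo_mass_da, 5), "Da")
print("peak width at 5000 FWHM:", peak_width_da(oligo_mass_da, 5000), "Da")
```

Under these assumptions both figures are well below the 9 Dalton difference that must be resolved, even for the longest fragments in the ladder.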

In the first stage the ES or APCI aerosol spray is directed perpendicularly past the sampling cone, which is displaced from the central axis of the instrument. Ions are extracted orthogonally from the spray into the sampling cone aperture, leaving large droplets, involatile materials, particulates and other unwanted components to collect in the vent port that is protected with an exchangeable liner. The second orthogonal step enables the volume of gas (and ions) sampled from atmosphere to be increased compared with conventional API sources. Gas at atmospheric pressure sampled through an aperture into a partial vacuum forms a freely expanding jet, which represents a region of high pressure compared to the surrounding vacuum. When this jet is directed into the second aperture of a conventional API interface it increases the flow of gas through the second aperture. Maintaining a suitable vacuum in the MS-TOF therefore places a restriction on the maximum diameter of the apertures in such an LC interface. Ions in the partial vacuum of the ion block are extracted electrostatically into the hexapole ion bridge, which efficiently transports ions to the analyzer.

The coupling of the TOF mass analyzers with MUX-technology allows the connection of up to 8 HPLC columns in parallel to a single LC-TOF-MS (Micromass, Manchester, UK). A multiplexed electrospray (ESI) interface is used for on-line LC-MS utilizing an indexed stepper motor to sequentially sample from up to 8 HPLC columns or liquid inlets operated in parallel.

Use of LC-TOF-MS is sometimes preferred over use of MALDI-TOF because LC-TOF-MS is a quantitative method for analysis of the molecular weight of polymers. LC-TOF-MS does not fragment the polymers and it employs a very gentle ionization process compared to matrix-assisted laser desorption-ionization (MALDI). Because every MALDI blast is different, the ionization is not quantitative. LC-TOF-MS does, however, produce different m/z values for polymers.

2. Liquid chromatography

High-Performance Liquid Chromatography (HPLC) is a chromatographic technique for separation of compounds dissolved in solution. HPLC instruments consist of a reservoir of mobile phase, a pump, an injector, a separation column, and a detector. Compounds are separated by injecting an aliquot of the sample mixture onto the column. The different components in the mixture pass through the column at different rates due to differences in their partitioning behavior between the mobile liquid phase and the stationary phase. The pumps provide a steady high pressure with no pulsating, and can be programmed to vary the composition of the solvent during the course of the separation.

Exemplary detectors useful within the methods of the present invention include UV-VIS absorption, or fluorescence after excitation with a suitable wavelength, mass spectrometers and IR spectrometers. Recently, IP-RP-HPLC on non-porous PS/DVB particles with chemically bonded alkyl chains has been shown to be a rapid alternative to capillary electrophoresis in the analysis of both single- and double-strand nucleic acids, providing similar degrees of resolution (Huber et al., Anal. Biochem. 272:351, 1993; Huber et al., Nuc. Acids Res. 27:1061, 1993; Huber et al., Biotechniques 76:898, 1993). In contrast to ion-exchange chromatography, which does not always retain double-strand DNA as a function of strand length (since AT base pairs interact with the positively charged stationary phase more strongly than GC base pairs), IP-RP-HPLC enables a strictly size-dependent separation.

In a method developed using 100 mM triethylammonium acetate as ion-pairing reagent, phosphodiester oligonucleotides were successfully separated on alkylated non-porous 2.3 μm poly(styrene-divinylbenzene) particles by means of high performance liquid chromatography (Oefner, et al., Anal. Biochem. 223:39, 1994). The technique described allows the separation of PCR products differing by only 4 to 8 base pairs in length within a size range of 50 to 200 nucleotides. Denaturing HPLC (DHPLC) is an ion-pair reversed-phase high performance liquid chromatography methodology (IP-RP-HPLC) that uses a non-porous C-18 column as the stationary phase. The column is comprised of a polystyrene-divinylbenzene copolymer. The mobile phase is comprised of an ion-pairing agent, triethylammonium acetate (TEAA), which mediates binding of DNA to the stationary phase, and acetonitrile (ACN) as an organic agent to achieve subsequent separation of the DNA from the column. A linear gradient of acetonitrile allows separation of fragments based on size and/or presence of heteroduplexes (this is the traditional use of the DHPLC technology). DHPLC identifies mutations and polymorphisms based on detection of heteroduplex formation between mismatched nucleotides in double stranded PCR amplified DNA. Sequence variation creates a mixed population of heteroduplexes and homoduplexes during reannealing of wild type and mutant DNA. When this mixed population is analyzed by HPLC under partially denaturing temperatures, the heteroduplexes elute from the column earlier than the homoduplexes because of their reduced melting temperature. Analysis can be performed on individual samples to determine heterozygosity, or on mixed samples to identify sequence variation between individuals.

In certain applications, it may be preferred to use the DHPLC column in a non-denaturing mode in order to separate identically sized DNA fragments which possess a different nucleotide composition. For example, the non-denaturing mode may be applicable where a 6-mer contains a C -> T single nucleotide polymorphism (SNP), such as where the wild-type single strand DNA fragment has the nucleotide sequence 5'-AACCCC-3' and the mutant single strand DNA fragment has the nucleotide sequence 5'-AATCCC-3'. Fragments as short as 1-mers, and 2-, 3-, 4-, 5-, 6-, 7-, 8-, to 16-mers, show different mobilities (retention times) on the DHPLC instrument. Thus it is possible to detect mutations at a defined nucleotide locus using NPREs to create very short fragments that can be discriminated by DHPLC. As an alternative to non-porous materials for the chromatography of the small DNA fragments generated by NPRE cleavage, both sizing HPLC and DHPLC applications work on a wide-pore silica-based material. Porous materials have the advantage of high sample capacity for semipreparative work.

Q. Determining sequence information after characterizing product oligonucleotides

It is straightforward to deduce the sequence of the interrogation sequence of the target nucleic acid by subtracting the masses of adjacent oligonucleotides in the oligonucleotide sequencing ladder produced according to the method of the present invention. The following example will be used for purposes of illustration. Consider the oligonucleotide sequencing template:

5'-ccgatctagtgagtcgctccagctgctcaggtgcctcgcaccggctgga-3'
3'-ggctagatcactcagcgaggtcgacgagtccacggagcgtggccgacct-5'

The fragments shown in Table 1 can be generated from this template: 5' -cagctgctcaggtgcctcgcaccggctgga-3' and all n-1 products.

TABLE 1

To calculate the sequence from the base masses (m/z = 1), the masses of fragments of sequential length are subtracted from one another. The base masses are in amu (atomic mass units or Daltons) as shown in Table 2.

TABLE 2
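The subtraction described above can be sketched in Python as follows. This is a minimal illustration, not the procedure of the specification: the residue masses are standard average masses for unmodified DNA nucleotide residues and are an assumption (they are not copied from Table 2), the function names and the tolerance are hypothetical, and the ladder masses are simulated from the example fragment rather than measured.

```python
# Average masses (Da) of the nucleotide residue added at each extension step.
# Assumption: standard values for unmodified DNA, not copied from Table 2.
RESIDUE_MASS = {"a": 313.21, "c": 289.18, "g": 329.21, "t": 304.20}


def ladder_masses(fragment, offset=0.0):
    """Simulate the mass of every product that shares the 5' end of `fragment`.

    `offset` stands in for the constant end-group mass; it cancels on subtraction.
    """
    masses, total = [], offset
    for base in fragment:
        total += RESIDUE_MASS[base]
        masses.append(round(total, 2))
    return masses


def read_sequence(masses, tol=0.5):
    """Call one base for each mass difference between adjacent ladder members."""
    ordered = sorted(masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        diff = heavier - lighter
        base, err = min(
            ((b, abs(diff - m)) for b, m in RESIDUE_MASS.items()),
            key=lambda pair: pair[1],
        )
        if err > tol:
            raise ValueError(f"mass difference {diff:.2f} matches no base")
        calls.append(base)
    return "".join(calls)


fragment = "cagctgctcaggtgcctcgcaccggctgga"
called = read_sequence(ladder_masses(fragment, offset=18.02))
```

Because only differences are used, the first base of the shortest ladder member is not called by this sketch; `called` covers the second base onward.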

R. Solid Supports

The methods of the present invention can be performed on, or with use of, a solid support. For example, in one embodiment, the method includes attaching the single-stranded target polynucleotide to a solid support prior to forming the template nucleic acid molecule. In another embodiment, an ODNP is attached to a solid surface prior to mixing the ODNP and the single-stranded target polynucleotide. In either case, the template nucleic acid molecule becomes bound to a solid support.

Methods to affix a polynucleotide to a solid support are well known in the art. In various optional embodiments, the solid support is selected from the group consisting of a particle, a microwell, a microfabricated device, and a dipstick.

S. Amplification rate and yield

The products of the amplification reaction may be analyzed on an LC-MS system after the indicated incubation times at 60°C, as described in the Examples. When the exact masses of all of the relevant molecules are known, the relative concentrations of all the components, including the amplified oligonucleotide ladder, can be directly measured. The yield of oligonucleotide is perhaps best characterized in this case as the number of molecules produced per primer-template, per second. In some instances, this initial rate is about one molecule per primer-template every 2.5 seconds, or a rate of ~0.4 molecules/primer-template • sec. Typically, the reaction slows down noticeably after 10 minutes or so. This effect is consistent with the reaction rate declining exponentially, as if an essential component of the reaction is being inactivated. We expect that inactivation of the nicking enzyme is responsible, as the optimum temperature (~55°C) of the enzyme is lower than the 60°C of the reaction, and preliminary experiments show a clear difference in the rate decline between different starting nicking enzyme (NE) concentrations: more enzyme makes the reaction stay linear longer. An extensive set of experiments shows that the absolute initial rate of the reaction is proportional to the primer-template concentration, as expected, over a wide range of concentrations. The balance between the nicking enzyme and the DNA polymerase is more complex.

The dependence of the reaction yield (e.g., the 6- to 16-mer products) on the amounts of the two enzymes has been investigated. It is clear that the reaction is completely dependent on the presence of both enzymes (nicking agent and polymerase), the target, and the primer oligonucleotide, but the yield is a complex function of the amounts of both enzymes. What we see from Figure 2a is that for small amounts of nicking enzyme (NE) there is a broad range of similar, albeit low, reaction yields. At higher NE concentrations there is a sharper maximum as a function of polymerase concentration. What is particularly striking is that the DNA polymerase sharply decreases the yield when present at higher concentrations, while the yield plateaus as the nicking enzyme increases and does not decrease except at rather extreme concentrations. In addition, it is clear from the data that we can modulate, to some extent, the yield of partial products by changing the ratio of the enzymes. While we do not know precisely how the enzymes interact, cooperate or compete with one another, it is clear that there are optimal concentration ranges of both enzymes for full size product yield under the conditions used here. It is important to note that the oligonucleotides of the sequencing ladder are products that are seen by their masses to be the result of incomplete elongation of the primer to the full length of the template. It happens that the reaction favors 12mers as partial products for reasons that are not well understood but may have to do with the structural details of the DNA-polymerase complex and the distributive nature of the polymerase. This phenomenon has been seen with several different template sequences.

T. Pre-Amplification

Target nucleic acids may be amplified before being combined with ODNPs or otherwise being subjected to a sequencing assay according to the present invention. Any of the known methods for amplifying nucleic acids may be used. Exemplary methods include, but are not limited to, PCR (see, e.g., supra Mullis et al., (1986); Erlich et al; EP 50,424, EP 84,796, EP 258,017, EP 237,362; Mullis, EP 201,184; Mullis et al., U.S. Patent No. 4,683,202; Erlich, U.S. Patent No. 4,582,788; Saiki et al., U.S. Patent No. 4,683,194; and Higuchi (1989)); the use of Qbeta Replicase; strand displacement amplification (Walker et al., Nucleic Acid Res. 20:1691-96 (1995)); transcription-mediated amplification (Kwoh et al., PCT International Patent Application Publication No. WO88/10315); RACE (Frohman, Methods Enzymol. 218:340-56, (1993)); one-sided PCR (Ohara et al., Proc. Natl. Acad. Sci. USA 86:5673-77, (1989)); and gap-LCR (Abravaya et al., Nucleic Acids Res. 23:675-82, (1995)). The cited articles and the PCT international patent application are incorporated herein by reference in their entireties.

U. Post-amplification

Rather than, or in addition to, increasing the concentration of the target polynucleotide or the template nucleic acid molecule by an amplification process, it is also possible to amplify the members of the amplification ladder produced by the method of the present invention.

In order to exponentially amplify the oligonucleotides produced in the sequencing reaction, a two-step copying reaction has been devised. This isothermal chain reaction can be conceptualized in three steps as outlined in Figure 3. The first step generates a trigger oligonucleotide (an oligonucleotide from the sequencing reaction) from the target sequence of interest. The second step uses a synthetic oligonucleotide template that binds the trigger oligonucleotide and upon which the trigger is elongated by the polymerase. The elongated trigger creates the double-stranded recognition site (5'-GAGTC-3') for the nicking enzyme. The nicking enzyme then binds and cleaves the template oligonucleotide, creating a primer template that allows the trigger to be copied. At a rate of about 2 triggers/minute, the primer template produces the trigger S' in a linear reaction. In the third step, the copy of the trigger binds to the second oligonucleotide template and is elongated by the polymerase, again creating a recognition site for N.BstNBI. This primer template produces the trigger S. The trigger S' primes the "S" template to generate more S' and the trigger S primes the S' template to generate more S. After the first round, the reaction becomes a chain reaction and exponential amplification ensues. It is apparent from inspection of Figure 3 that the triggers S and S' are complementary and that the trigger-binding regions of the two template oligonucleotides are complementary. This complementarity does not adversely affect the reaction, as the short oligonucleotides are unstable in the reaction buffers we employ at 60°C. The differential between hybrid (oligonucleotide/oligonucleotide) stability and the ability of a polymerase to extend a triggering oligonucleotide during a priming step is the premise of the isothermal chain reaction. The oligonucleotide duplexes are unstable at 60°C while the priming is relatively efficient.
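The contrast between the linear copying step and the chain reaction can be illustrated with a toy model in Python. This is only a sketch under strong assumptions that are not part of the specification: every trigger is taken to prime a free template immediately, templates are unlimited, enzyme inactivation is ignored, and each primed template releases copies at the rate of about 2 triggers per minute mentioned above. The function names and the time step are hypothetical.

```python
def linear_yield(primed_templates, rate_per_min, minutes):
    """Copies released by a fixed number of primed templates (no chain reaction)."""
    return primed_templates * rate_per_min * minutes


def chain_reaction(s0, rate_per_min, minutes, dt=0.01):
    """Euler integration of mutual priming: S makes S', and S' makes S.

    Returns the (S, S') counts after `minutes`, starting from s0 triggers S.
    """
    s, s_prime = float(s0), 0.0
    for _ in range(int(round(minutes / dt))):
        new_s_prime = rate_per_min * s * dt   # templates primed by S release S'
        new_s = rate_per_min * s_prime * dt   # templates primed by S' release S
        s, s_prime = s + new_s, s_prime + new_s_prime
    return s, s_prime


linear = linear_yield(1, 2.0, 10)
s, s_prime = chain_reaction(1, 2.0, 10)
```

In this idealized model the total of S and S' grows geometrically with time, whereas `linear` grows only in proportion to time; real reactions would saturate once templates, nucleotides or active enzyme become limiting.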

V. Kits

The present invention provides kits that may be used in performing assays according to the present invention.

In one aspect, the present invention provides a kit for determining the nucleotide sequence of an interrogation polynucleotide within a target polynucleotide, comprising: (a) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand,

(b) if the target polynucleotide is a single-stranded polynucleotide,

(i) the first ODNP comprising a sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide and located 5' to the interrogation polynucleotide, and (ii) the second ODNP comprising a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprising a sequence of one strand of a RERS.

In one embodiment, the ODNPs are designed such that when the target polynucleotide is amplified using the first and the second ODNPs as primers to generate a double-stranded polynucleotide template, the distance between a nicking site (NS) produced by a nicking endonuclease (NE) that recognizes the NERS in one strand and the location corresponding to a cleavage site produced by a restriction endonuclease (RE) that recognizes the RERS in the other strand is no more than 50 nucleotides. In one embodiment, the kit contains a nicking endonuclease (NE) and a nicking endonuclease buffer. For example, the NE may be N.BstNB I and the buffer may be suitable for N.BstNB I.

In one embodiment, the kit contains a restriction endonuclease (RE) that recognizes the RERS, and a RE buffer.

In one embodiment, the kit contains a distributive DNA polymerase and a DNA polymerase buffer, i.e., a buffer that is suitable for the distributive DNA polymerase. In one embodiment, the kit contains a distributive DNA polymerase selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

In one embodiment, the kit includes instructions for using the kit.

In one embodiment, the kit includes a liquid chromatography column, a first buffer that comprises water and an ammonium salt of a secondary or tertiary amine complexed with an organic or inorganic acid, and a second buffer that comprises the first buffer and an organic solvent. Optionally, the amine is selected from the group consisting of triethylamine, diallylamine, diisopropylamine, N,N-dimethyl-N-cyclohexylamine, N,N-dimethyl-N-isopropylamine, and N,N-dimethyl-N-butylamine. Optionally, the amine is complexed with an organic acid, and the organic acid is selected from the group consisting of acetic acid, propionic acid, formic acid, carbonic acid, and halogenated versions thereof. Optionally, the organic solvent is methanol or acetonitrile. Optionally, the chromatography column is a reversed-phase chromatography column.

In one embodiment, the kit includes at least one deoxyribonucleoside triphosphate. In one embodiment, the kit includes at least one modified deoxyribonucleoside triphosphate.

In one embodiment, the kit includes a control template polynucleotide and a control ODNP pair.

In one embodiment, the kit includes trehalose. In one embodiment, the kit includes an oligonucleotide standard. In one embodiment, the kit includes an access code for a software used in designing or ordering the ODNPs.

W. Compositions

In another aspect, the present invention provides various compositions that may be used or generated by the methods of the present invention. For example, in one aspect, the present invention provides a composition comprising a nicking endonuclease, a distributive DNA polymerase, and one or more deoxyribonucleoside triphosphates. In various optional embodiments contemplated by the inventors: the nicking endonuclease is N.BstNB I; and/or the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North; and/or the composition further includes an enzyme-compatible inorganic salt, e.g., potassium chloride, ammonium sulfate, and/or magnesium sulfate; and/or the composition further includes Tris-HCl; and/or the composition further includes trehalose. In one embodiment, the composition is sterile. Methods for preparing sterile compositions are well known in the art, and are established by various government authorities.

X. Applications

The present invention provides methods, compositions, and kits for sequencing single-stranded nucleic acid fragments that are amplified by a method using nicking agents. The sequencing method is useful for determining the nucleotide sequence of short single-stranded or double-stranded fragments, for example, for sequencing genetic variations; genotyping microorganisms and viruses; genotyping mitochondrial genomes; sequencing single-stranded oligonucleotides; and detecting the presence or absence of a genetic mutation in a cell isolated from a biological sample. Such applications are useful in many areas, including genetic analysis for hereditary diseases; cancer diagnosis; infectious disease diagnosis; identifying disease predisposition; monitoring antibiotic sensitivity profiles of microorganisms; monitoring drug susceptibility of tumor cells; monitoring vaccine production; forensics; paternity determination; creating and monitoring crop cultivation or animal superior breeding programs; expression profiling of cell function or disease marker genes; and identification and characterization of infectious organisms that cause diseases in plants or animals, including those that may be related to food safety.

In one aspect of the invention, the sequencing method identifies a genetic variation, e.g., a "single-nucleotide polymorphism" (SNP), which refers to any single nucleotide sequence variation, preferably one that is common in a population of organisms and is inherited in a Mendelian fashion. Typically, the SNP is either of two possible bases without a possibility of finding a third or fourth nucleotide identity at the SNP site.

The genetic variation may be associated with or cause diseases or disorders. The term "associated with," as used herein, refers to the presence of a positive correlation between the occurrence of the genetic variation and the presence of a disease or a disorder in the host. Such diseases or disorders may be human genetic diseases or disorders and include, but are not limited to, bladder carcinoma, colorectal, ovarian, breast, or other cancers, sickle-cell anemia, thalassemias, α1-antitrypsin deficiency, Lesch-Nyhan syndrome, cystic fibrosis/mucoviscidosis, Duchenne/Becker muscular dystrophy, Alzheimer's disease, X-chromosome-dependent mental deficiency, Huntington's chorea, phenylketonuria, galactosemia, Wilson's disease, hemochromatosis, severe combined immunodeficiency, alpha-1-antitrypsin deficiency, albinism, alkaptonuria, lysosomal storage diseases, Ehlers-Danlos syndrome, hemophilia, glucose-6-phosphate dehydrogenase disorder, agammaglobulinemia, diabetes insipidus, Wiskott-Aldrich syndrome, Fabry's disease, fragile X-syndrome, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, Marfan's syndrome, von Willebrand's disease, neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, myotonic dystrophy, osteogenesis imperfecta, acute intermittent porphyria, and von Hippel-Lindau disease. In other embodiments of the invention, the target nucleic acid may comprise an interrogation fragment to be sequenced that, rather than being associated with the occurrence or risk of occurrence of a disease or disorder, relates to modulation of the disease or disorder. By way of example, resistance of a cancer cell to a chemotherapeutic agent may result from a mutation in a cellular protein, which in some manner prevents the chemotherapeutic agent from having a toxic effect on the cancer cell.
By sequencing a target nucleic acid prior to treatment and during treatment to monitor any change in drug susceptibility, a subject may receive more effective treatment for the disease or disorder. In certain embodiments, the target nucleic acid is isolated from a microorganism, such as a bacterium, virus, parasite, or fungus. The target nucleic acid may be sequenced, for example, to identify the organism infecting a host, to determine the species, subspecies, or strain of a microorganism, and to evaluate and monitor drug resistance of the organism. A target nucleic acid may be isolated from organisms contained in a biological sample, or the organism may be removed from a biological sample and subcultured to isolate and propagate the organism, followed by preparation of the target nucleic acid from the isolated organisms.

Any source of bacterial nucleic acid in purified or non-purified form can be utilized as starting material, provided it contains or is suspected of containing a bacterial genome of interest. Thus, the bacterial nucleic acids may be obtained from any source which can be contaminated by bacteria. When looking for bacterial infection or in distinguishing bacteria from human or animal subjects, the sample to be tested can be selected or extracted from any bodily sample such as blood, urine, spinal fluid, tissue, vaginal swab, stool, amniotic fluid or buccal mouthwash.

In other applications the sample can come from a variety of sources. For example: (1) in horticulture and agricultural testing the sample can be a plant, fertilizer, soil, liquid or other horticultural or agricultural product; (2) in food testing the sample can be fresh food or processed food (for example infant formula, seafood, fresh produce and packaged food); (3) in environmental testing the sample can be liquid, soil, sewage treatment, sludge and any other sample in the environment which is considered or suspected of being contaminated by bacteria.

When the sample is a mixture of material, for example blood, soil and sludge, it can be treated with an appropriate reagent which is effective to open the cells and expose or separate the strands of nucleic acids. Although not necessary, this lysing and nucleic acid denaturing step will allow amplification to occur more readily. Further, if desired, the bacteria can be cultured prior to analysis and thus a pure sample obtained. In the areas of horticulture and agriculture a variety of uses of this method are found. One can monitor bacterial inoculations of plants or bacterial disease of plants. It can also be used to monitor the distribution of recombinant bacteria added to the environment. Samples can come from the soil where bacteria have been added, or from fertilizer to make sure that the fertilizer has the appropriate bacteria. It can be used to monitor pest control where bacteria are added in order to kill pests such as insects. This procedure allows quick and accurate monitoring of the application of the bacterial insecticide and the activity of the insecticide. Thus, in any horticulture or agriculture procedure which requires the addition of bacteria, the bacteria can be monitored throughout the procedure.

Another application of this method is in the manufacturing process. A number of manufacturing processes for instance drugs, microorganism-aided synthesis, food manufacturing, chemical manufacturing and fermentation process all rely either on the presence or absence of bacteria. In either case the method of the present invention can be used. It can monitor bacterial contamination or test that strain purity is being maintained.

This method can also be used to test stored blood for bacterial contamination. This would be important in blood banking where bacteria such as Yersinia enterocolitica can cause serious infection and death if it is in transfused blood. The procedure can also be used for quality assurance and quality control in monitoring bacterial contamination in laboratory tests. For example the Guthrie bacterial inhibition assay uses a specific strain of bacteria to measure phenylalanine in newborn screening. If this strain changes it could affect test results and thus affect the accuracy of the newborn screening program. This method of the present invention can be used to monitor the strain's purity. Any other laboratory test which uses or relies on bacteria in the assay can be monitored. The laboratory or test environment can also be monitored for bacterial contamination by sampling the lab and testing for specific strains of bacteria.

This procedure will also be useful in hospitals for tracing the origin and distribution of bacterial infections. It can show whether or not the infection of the patient is a hospital-specific strain. The type of treatment and specific anti-bacterial agent can depend on the source and nature of the bacteria.

The following examples are provided by way of illustration and not limitation.

EXAMPLES

Example 1

Oligonucleotides and enzymes: To produce an amplification reaction we devised a cyclic chain of reactions that will restore the reactants to their initial state after each synthesis of the molecule to be amplified. The linear amplification reaction described here provides such a cycle whose sequence specificity derives from template-dependent synthesis of the oligonucleotide to be amplified. The reaction synthesizes short oligonucleotides whose cycle of reactions depends on the idea that, at the reaction temperature, oligonucleotides above a certain length form stable duplexes, while those below this critical length form unstable duplexes that dissociate readily. By arranging a specific, single-strand nicking site and nicking enzyme and a compatible DNA polymerase, as described in figure 1a, a cycle of polymerization and subsequent oligonucleotide release can be set up. This cycle depends on the nicking reaction cleaving a phosphodiester bond to create an oligonucleotide that is below the threshold of stability, and is thereby released from the duplex, regenerating the initial primer template. The synthesized oligonucleotide is fully stable at 60°C when it is covalently joined to the rest of the upper strand, as it is immediately after its synthesis, but is only transiently stable as a 12mer, which it becomes after the nicking reaction. Therefore, when the bond is cleaved at the nicking site the oligonucleotide soon dissociates, recreating a primer template ready for elongation. This cycle thus creates oligonucleotides that are complementary to the template beyond the nicking site (as shown in figure 1a).

When the nicking enzyme is also present with a compatible polymerase the reaction proceeds around the cycle shown in figure 1a, and amplification of the product oligonucleotide occurs. In figure 1b we show the results of one of these reactions. The experiment was devised to produce an 11mer as its amplified product. The products of the reaction were analyzed on an LC-MS system after the indicated incubation times at 60°C. Since the exact masses of all of the relevant molecules are known, the relative concentrations of all the components, including the amplified oligonucleotide, can be directly measured. The yield of oligonucleotide is perhaps best characterized in this case as the number of molecules produced per primer-template, per second. For the experiment shown in the figure this initial rate is about one molecule (11mer) per primer-template every 2.5 seconds, or a rate of ~0.4 molecules/primer-template • sec. Note that the reaction slows down noticeably after 10 minutes or so. This effect is consistent with the reaction rate declining exponentially, as if an essential component of the reaction is being inactivated. We expect that inactivation of the nicking enzyme is responsible, as the optimum temperature (~55°C) of the enzyme is lower than the 60°C of the reaction, and preliminary experiments show a clear difference in the rate decline between different starting nicking enzyme (NE) concentrations: more enzyme makes the reaction stay linear longer. Further experiments, however, are needed to verify this hypothesis. An extensive set of experiments (data not shown) shows that the absolute initial rate of the reaction is proportional to the primer-template concentration, as expected, over a wide range of concentrations. The balance between the nicking enzyme and the DNA polymerase is more complex.
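The rate behavior described in this paragraph can be written as a simple first-order inactivation model, sketched here in Python. The initial rate of 0.4 molecules per primer-template per second comes from the text; the 600-second half-life is purely a hypothetical placeholder (the text gives no decay constant), and the function name is likewise an assumption.

```python
import math


def products_per_template(t_sec, r0=0.4, half_life_sec=600.0):
    """Cumulative product per primer-template if the rate decays exponentially.

    Integrates r(t) = r0 * exp(-k * t) from 0 to t_sec, with k = ln(2) / half-life.
    The half-life default is a hypothetical value, not a measured one.
    """
    k = math.log(2) / half_life_sec
    return (r0 / k) * (1.0 - math.exp(-k * t_sec))


early = products_per_template(1.0)      # close to r0 * 1 s while the rate is undiminished
at_half_life = products_per_template(600.0)
ceiling = 0.4 / (math.log(2) / 600.0)   # limiting yield as t grows large
```

Under this model the yield is nearly linear at early times and approaches `ceiling` at long times, which is the qualitative shape the text attributes to loss of an essential reaction component.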

To investigate this relationship we examined the dependence of the reaction yield (11mer product) on the amounts of the two enzymes. It is clear that the reaction is completely dependent on the presence of both of the enzymes, the template, and the primer oligonucleotide (data not shown), but the yield is a complex function of the amounts of both enzymes. What we see from figure 2a is that for small amounts of NE there is a broad range of similar, albeit low, reaction yields. At higher NE concentrations there is a sharper maximum as a function of polymerase concentration. What is particularly striking is that the DNA polymerase sharply decreases the yield when present at higher concentrations, while the yield plateaus as the nicking enzyme increases and does not decrease except at rather extreme concentrations (data not shown). In addition, it is clear from the data that we can modulate, to some extent, the yield of partial products by changing the ratio of the enzymes. It is apparent that there are optimal concentration ranges of both enzymes for full size product yield under the conditions used here. If we amplify an oligonucleotide of length 16 bases, and examine the yield of full and partial products at the elevated NE concentrations, we see that the extension leads to some partial products (see for example the 12mer yield in figure 2b). These products are seen by their masses to be the result of incomplete elongation of the primer to the full length of the template. It happens that the reaction favors 12mers as partial products for reasons that are not well understood but may have to do with the structural details of the distributive nature of the polymerase. This phenomenon has been seen with several different template sequences. Tuning the reaction conditions, including the enzyme concentrations, thus appears to be important for maximizing the yield of any particular product. In the present experiments full-length product is optimal.
The following four oligonucleotides were used in a linear amplification reaction to generate the data shown in Figures 1 b, 2a and 2b.

Primer oligonucleotide: ITAtop: 5'-CCGATCTAGTGAGTCgctc-3'

Target oligonucleotides:

NBbt12: 5'-ACGACTGGAACTgagcGACTCACTAGATCGG-3'
NBbt16: 5'-ACCTACGACTGGAACTgagcGACTCACTAGATCGG-3'
NBbt20: 5'-TGAAACCTACGACTGGAACTgagcGACTCACTAGATCGG-3'
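The relationship between the primer and the target oligonucleotides listed above (written here as NBbt12, NBbt16 and NBbt20) can be checked with a short Python sketch. It confirms that the 3' end of each target is the reverse complement of ITAtop and lists the extension products expected from the 5' overhang. The helper names are hypothetical, and the sketch assumes that the nick falls exactly at the 3' end of the primer sequence, i.e. four bases 3' of GAGTC, so that released products begin with the first newly added base.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement, preserving upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMER = "CCGATCTAGTGAGTCgctc"  # ITAtop
TARGETS = {
    "NBbt12": "ACGACTGGAACTgagcGACTCACTAGATCGG",
    "NBbt16": "ACCTACGACTGGAACTgagcGACTCACTAGATCGG",
    "NBbt20": "TGAAACCTACGACTGGAACTgagcGACTCACTAGATCGG",
}


def extension_products(primer, target):
    """Return every 5'-anchored product of copying the target's 5' overhang.

    Assumes the nick is at the primer's 3' end, so products start at that point.
    """
    if not target.upper().endswith(revcomp(primer).upper()):
        raise ValueError("primer does not anneal to the 3' end of the target")
    overhang = target[: len(target) - len(primer)]
    full_length = revcomp(overhang).upper()
    return [full_length[:n] for n in range(1, len(full_length) + 1)]


ladders = {name: extension_products(PRIMER, seq) for name, seq in TARGETS.items()}
```

The last entry of each ladder is the full-length product; the shorter entries correspond to the incomplete elongation products discussed in the text.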

The following two oligonucleotides were used in an exponential amplification reaction to generate the data shown in Figures 3b and 3c:

Template oligo ceap: 5'-CCTACGACTGGaacaGACTCaCCTACGACTGGAP-3'

Trigger seqS: 5'-ACCAGTCGTAGG-3'

In the above oligonucleotides, spacer bases are indicated in lower case, and P indicates a phosphate group. The NE site or its complement is indicated by an underline in all above sequences. Note that the trigger oligo above (seqS) is one base longer than that produced by the primer template. This makes it simple to distinguish the initial trigger from the amplified sequence, and does not affect its ability to prime effectively.

Oligonucleotides were synthesized by Midland Certified Reagent Company, Inc. (Midland, TX), MWG Biotech, Inc. (High Point, NC), or Sigma-Genosys (The Woodlands, TX). The oligonucleotides were routinely checked by time-of-flight mass spectrometry (using LCT from Micromass, see below). All enzymes were purchased from New England Biolabs. The DNA polymerase used was Vent exo- (Kong, H. et al. (1993) J. Biol. Chem. 268(3):1965-1967). The nicking enzyme (N.BstNBI) has a specific activity of approximately 10⁶ units/mg (H.-M. Kong, unpublished). All HPLC components (water and acetonitrile) were purchased from Fisher Scientific (Pittsburgh, PA). The dimethyl-butylamine was purchased from Sigma-Aldrich Corp. (St. Louis, MO) and a salt was made by the addition of acetic acid (Sigma-Aldrich) to pH 8.4. The 2 molar stock solution was filtered using a 0.2 micron nylon filter.

Linear Sequencing and Amplification Reaction:

The conditions for the linear reaction were: 85 mM KCl, 25 mM Tris-HCl (pH 8.8 at 25°C), 2.0 mM MgSO₄, 5 mM MgCl₂, 10 mM (NH₄)₂SO₄, 0.1% (vol/vol) Triton X-100, 0.5 mM DTT, 0.4 U/μl N.BstNBI nicking enzyme, 0.05 U/μl Vent exo- polymerase, 400 μM dNTPs (EpiCentre, Madison, WI), 10 μg/ml BSA, 1.0 μM template and primer oligonucleotides (ITAtop and NBbt12 (equimolar)) in ultra-pure water that is nuclease-free (Ambion). These conditions correspond to 1 part Thermopol buffer and 0.5 parts N.BstNBI buffer as supplied by New England Biolabs. Reactions were assembled at 4°C, initiated by transferring to a preheated thermocycler at 60°C, and stopped by incubation at 4°C. No further manipulations were performed prior to placement on the auto-injector for the LC-MS, which is held at 4°C.

Chromatography and Mass Spectrometry.

The chromatography system was an Agilent 1100 Series HPLC composed of a binary pump, degasser, a column oven, a diode array detector, and thermostatted microwell plate autoinjector (Agilent Technologies, Palo Alto, CA). The column was a Waters Xterra MS C18, incorporating C18 packing with 3.5 μm particle size and 125 Angstrom pore size, 2.1 mm x 20 mm (Waters Inc., Milford, MA). The column was run at 30°C with a gradient of acetonitrile in 5 mM dimethyl-butylamine acetate (DMBAA). As a check on the complete release of the signal oligo during the chromatography and injection, the column was run at 50°C after incubating the sample briefly at 95°C. This procedure did not produce any observable increase in the oligo yield over the standard conditions. Buffer A was 5 mM DMBAA; buffer B was 5 mM DMBAA and 50% (V/V) acetonitrile. The gradient began at 10%B and ramped to 15%B over 0.3 minute, to 30%B over 2 minutes, to 90%B over 0.5 minute, to 10%B over 0.25 minute, then was held at 10%B for 1.25 minutes. The column temperature was held constant at 30°C. The flow rate was 0.25 ml/minute. The injection volume was 10 μl. Flow rate into the mass spectrometer was also 0.25 ml/min. The mass spectrometer was a Micromass LCT Time-of-Flight with an electrospray inlet (Micromass Inc., Manchester, UK). The samples were run in electrospray negative mode with a range from 800 to 2000 amu using a 1 second scan time. The source parameters were: desolvation gas 450 L/hr, capillary 2225V, sample cone 30V, RF lens 400V, extraction cone 7V, desolvation temperature 275°C, source temperature 120°C. Analysis of the LC-mass spectrometry data made use of the software supplied by the manufacturer.
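The gradient program stated above can be expressed as a small Python table with linear interpolation between set points. This is an illustrative sketch only: the segment durations and %B values are taken from the text, while the function names and the assumption of strictly linear ramps between set points are introduced here for illustration.

```python
START_PERCENT_B = 10.0
# (segment duration in minutes, %B reached at the end of the segment)
GRADIENT = [(0.3, 15.0), (2.0, 30.0), (0.5, 90.0), (0.25, 10.0), (1.25, 10.0)]
ACETONITRILE_IN_B = 0.5  # buffer B contains 50% (V/V) acetonitrile


def percent_b(t_min):
    """%B at time t_min, assuming linear ramps within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start, level = 0.0, START_PERCENT_B
    for duration, target in GRADIENT:
        end = start + duration
        if t_min <= end:
            return level + (target - level) * (t_min - start) / duration
        start, level = end, target
    return level  # after the program ends, hold the final value


def percent_acetonitrile(t_min):
    """Acetonitrile content of the mobile phase (% V/V) at time t_min."""
    return percent_b(t_min) * ACETONITRILE_IN_B


run_time = sum(duration for duration, _ in GRADIENT)
```

With these values the whole program lasts about 4.3 minutes, and the mobile phase never exceeds 45% acetonitrile because buffer B is itself only 50% acetonitrile.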

Oligonucleotides are seen to exhibit different ionization efficiencies, which in our measurements is translated into sequence-specific differences in measured oligo concentration. A survey of a range of more than 80 different 12mers indicated that the variation between sequences attributable to this difference is less than 30%. Almost all relevant quantitative comparisons are with the same oligo sequence. It is necessary, however, to calibrate for ionization efficiencies for quantitative comparisons between different sequences.

Real-time Fluorescence Measurement

All fluorescence measurements reported here were made on an MJ Opticon instrument (MJ Research Ltd., Waltham, MA) using software supplied by the manufacturer. The real-time measurements on this instrument were made using an isothermal protocol using a 30 second interval read beginning 10 seconds after the lid and chamber reached 60°C.

Determining the sequence of a nucleic acid sample using a nicking enzyme and a DNA polymerase

ITAtop has the following sequence:

ITAtop: 5'-ccgatctagtgagtcgctc-3'

The sequence of NBbt16 is as follows:

NBbt16: 5'-acctacgactggaactgagcgactcactagatcgg-3'

The oligonucleotides were obtained commercially from MWG Biotech (NC). The following reaction mixture was assembled at room temperature:

75 μl water

10 μl 10x Thermopol buffer (NEB Biolabs, Beverly MA)

5 μl 10x N.BstNBI buffer (NEB Biolabs, Beverly MA)

5 μl ITAtop at 0.2 nanomoles/μl

5 μl ITAbt12 at 0.2 nanomoles/μl

The mixture was heated to 95°C and then cooled to 50°C and held at 50°C for 10 minutes. The following duplex was formed:

5'-ccg atc tag tga gtc gct c-3'

3'-ggc tag atc act cag cga gtcaaggtcagcatcca-5'

The duplex mixture was diluted into a reaction mixture containing the following:

25 μl 10x Thermopol buffer (NEB Biolabs, Beverly MA)

12.5 μl 10x N.BstNBI buffer (NEB Biolabs, Beverly MA)

0.5 μl of the duplex mixture described above

10 μl 25 mM dNTPs (NEB Biolabs, Beverly MA)

100 μl 1 M trehalose (Sigma, St. Louis MO)

25 units N.BstNBI nicking enzyme (NEB Biolabs, Beverly MA)

5 units Vent exo- DNA polymerase (NEB Biolabs, Beverly MA)

102 μl water

10x ThermoPol buffer, used at 1x concentration: 100 mM KCl, 100 mM (NH4)2SO4, 200 mM Tris-HCl (pH 8.8 at 25°C), 1% Triton X-100, 2.0 mM MgSO4

10x N.BstNBI buffer, used at 0.5x concentration: 1500 mM KCl, 100 mM Tris-HCl (pH 7.5 at 25°C), 100 mM MgCl2, 10 mM DTT

9-degree North Thermostable DNA polymerase was used at 5-50 units/ml depending on the assay. N.BstNBI enzyme was used at 10-50 units per ml depending on the assay. Some assays employed 0.2 M trehalose as an additive (final concentration). The reaction was incubated at 60°C for 30 minutes. After 30 minutes, 10 μl of the reaction was sampled and subjected to mass spectrometry.
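As a check on the stated working strengths of the two buffers, the final concentrations in the second reaction mixture can be computed from the listed volumes. The sketch below is a minimal illustration. It assumes that the "10x N.BstNBI" entry refers to the 10x N.BstNBI buffer and that the enzyme volumes (given only in units) are negligible. The function name `final_concentrations` is hypothetical.

```python
def final_concentrations(components):
    """components: list of (name, volume in microliters, stock concentration or None).

    Returns (total volume in microliters, {name: final concentration}),
    where each final concentration is in the same units as its stock.
    """
    total = sum(volume for _, volume, _ in components)
    finals = {
        name: stock * volume / total
        for name, volume, stock in components
        if stock is not None
    }
    return total, finals


# Volumes as listed for the second (sequencing) reaction mixture.
# Enzyme additions are omitted because only their units are given.
sequencing_mix = [
    ("ThermoPol buffer (x)", 25.0, 10.0),
    ("N.BstNBI buffer (x)", 12.5, 10.0),
    ("duplex mixture", 0.5, None),
    ("dNTPs (mM)", 10.0, 25.0),
    ("trehalose (M)", 100.0, 1.0),
    ("water", 102.0, None),
]

total_ul, finals = final_concentrations(sequencing_mix)
for name, value in finals.items():
    print(f"{name}: {value:g}")
```

With these inputs the listed volumes sum to 250 μl, which reproduces the 1x ThermoPol and 0.5x N.BstNBI buffer strengths given above.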

The following duplex was filled in by the action of the DNA polymerase:

5'-ccg atc tag tga gtc gct cagttccagtcgtaggt-3'

3'-ggc tag atc act cag cga gtcaaggtcagcatcca-5'

The nicking enzyme cuts the upper strand of the duplex and releases the following fragments:

16-mer: 5'-agttccagtcgtaggt-3'

15-mer: 5'-agttccagtcgtagg-3'

14-mer: 5'-agttccagtcgtag-3'

13-mer: 5'-agttccagtcgta-3'

12-mer: 5'-agttccagtcgt-3'

11-mer: 5'-agttccagtcg-3'

10-mer: 5'-agttccagtc-3'

9-mer: 5'-agttccagt-3'

8-mer: 5'-agttccag-3'

7-mer: 5'-agttcca-3'

6-mer: 5'-agttcc-3'

5-mer: 5'-agttc-3'

4-mer: 5'-agtt-3'

3-mer: 5'-agt-3'

+

5'-ccg atc tag tga gtc gct c-3'

3'-ggc tag atc act cag cga gtcaaggtcagcatcca-5'
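The relationship between the two oligonucleotides, the filled-in duplex and the released fragments can be traced computationally. The following Python sketch is illustrative only. It assumes that the nick falls four nucleotides 3' of the 5'-gagtc-3' recognition sequence on the primer strand, and that the ladder consists of 3'-truncated versions of the full-length released fragment. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

ITA_TOP = "ccgatctagtgagtcgctc"                    # primer strand, 5'->3'
NBBT16 = "acctacgactggaactgagcgactcactagatcgg"     # template strand, 5'->3'


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def nick_products(primer, template, site="gagtc", offset=4, min_len=3):
    """Return (retained primer strand, list of released fragments, longest first).

    The fully extended primer strand is the reverse complement of the template.
    The nick is assumed to lie `offset` nucleotides 3' of the recognition site.
    """
    full_top = reverse_complement(template)
    if not full_top.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    site_start = full_top.find(site)
    if site_start < 0:
        raise ValueError("recognition site not found on the primer strand")
    nick = site_start + len(site) + offset
    released = full_top[nick:]
    # Incomplete extensions share the same 5' end and differ at the 3' end.
    ladder = [released[:n] for n in range(len(released), min_len - 1, -1)]
    return full_top[:nick], ladder


retained, ladder = nick_products(ITA_TOP, NBBT16)
for fragment in ladder:
    print(f"{len(fragment)}-mer: 5'-{fragment}-3'")
```

Under these assumptions the retained strand is identical to ITAtop, so the nicked duplex can serve again as a primer-template for the next round of extension.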

TABLE 3

ANTICIPATED MASS/CHARGE RATIOS FOR THE SEQUENCING LADDER.

The results from the mass spectrometry are shown in Figures 4, 5 and 6. Figure 4 presents, from top to bottom, panels 1-5. Figure 5 presents, from top to bottom, panels 6-11. Figure 6 presents, from top to bottom, panels 12-17. The results are summarized in Tables 4A and 4B.

Panel 1 is the extracted ion chromatogram for the 16-mer with a m/z of 1247.1 with a peak area of 28.1.

Panel 2 is the extracted ion chromatogram for the 15-mer with a m/z of 1170.9+936.6 with a peak area of 3.4.

Panel 3 is the extracted ion chromatogram for the 14-mer with a m/z of 1088.7 with a peak area of 3.5.

Panel 4 is the extracted ion chromatogram for the 13-mer with a m/z of 1342.2+1006.4 with a peak area of 104.2.

Panel 5 is the total ion chromatogram.

Panel 6 is the extracted ion chromatogram for the 12-mer with a m/z of 1237.8 with a peak area of 129.2.

Panel 7 is the extracted ion chromatogram for the 11-mer with a m/z of 1136.4 with a peak area of 41.6.

Panel 8 is the extracted ion chromatogram for the 10-mer with a m/z of 1026.7 with a peak area of 11.9.

Panel 9 is the extracted ion chromatogram for the 9-mer with a m/z of 1395.9+930.3 with a peak area of 16.1.

Panel 10 is the extracted ion chromatogram for the 8-mer with a m/z of 1243.8 with a peak area of 8.5.

Panel 11 is the total ion chromatogram.

Panel 12 is the extracted ion chromatogram for the 7-mer with a m/z of 1079.2 with a peak area of 5.7.

Panel 13 is the extracted ion chromatogram for the 6-mer with a m/z of 922.6 with a peak area of 14.2.

Panel 14 is the extracted ion chromatogram for the 5-mer with a m/z of 1557 with a peak area of 0.

Panel 15 is the extracted ion chromatogram for the 4-mer with a m/z of 1267.8 with a peak area of 0.

Panel 16 is the extracted ion chromatogram for the 3-mer with a m/z of 936.6 with a peak area of 0.

Panel 17 is the total ion chromatogram.

TABLE 4A

N/a = not applicable

TABLE 4B

Therefore, the deduced sequence is 5'-tccagtcgtaggt-3'. Note that there is no signal from the 5-mer, 4-mer and 3-mer, which probably indicates that the site is covered by the nicking enzyme.
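The way a base sequence follows from the ladder can be illustrated with the mass/charge values listed for the panels above. Each fragment is one nucleotide longer than the next shorter one, so the difference in neutral mass between neighbours identifies the added nucleotide. The following Python sketch is a hedged illustration, not the analysis actually used. The charge states are not stated in the text and are assumed here. The residue masses are standard average values for deoxynucleotide monophosphate residues. Only the fragments with a non-zero peak area (6-mer to 16-mer) are used, and where two m/z values are listed the first is taken.

```python
# Average masses (Da) of deoxynucleotide monophosphate residues within a chain.
RESIDUE_MASS = {"a": 313.21, "c": 289.18, "g": 329.21, "t": 304.20}
PROTON = 1.008

# (fragment length, m/z from the panel descriptions, ASSUMED negative charge state)
LADDER = [
    (6, 922.6, 2), (7, 1079.2, 2), (8, 1243.8, 2), (9, 1395.9, 2),
    (10, 1026.7, 3), (11, 1136.4, 3), (12, 1237.8, 3), (13, 1342.2, 3),
    (14, 1088.7, 4), (15, 1170.9, 4), (16, 1247.1, 4),
]


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed in negative mode as [M - zH]z-."""
    return charge * (mz + PROTON)


def call_bases(ladder, tolerance=1.5):
    """Read the bases added between successive fragments, shortest to longest.

    A step whose mass difference matches no residue within `tolerance` Da
    is reported as 'n'.
    """
    masses = sorted((length, neutral_mass(mz, z)) for length, mz, z in ladder)
    calls = []
    for (_, lighter), (_, heavier) in zip(masses, masses[1:]):
        delta = heavier - lighter
        base, expected = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(base if abs(expected - delta) <= tolerance else "n")
    return "".join(calls)


print(call_bases(LADDER))
```

Under the assumed charge states, the called bases correspond to the 3' portion of the deduced sequence given above, since the ladder used here starts at the 6-mer.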

The plate was loaded onto the LC/MS (Micromass LTD, Manchester UK and Beverly, MA, USA) which is a LCT time-of-flight using electrospray in the negative mode. The conditions were as described above.

Example 2

Following the procedures described in Example 1, the sequencing reaction was also demonstrated by using the ITAtop/NBbt16 duplex:

ITAtop: 5'-ccgatctagtgagtcgctcagttccagtcgtaggt-3'

NBbt16: 3'-ggctagatcactcagcgagtcaaggtcagcatcca-5'

The sequencing reaction products are shown in Table 5.

TABLE 5

Figures 7, 8, 9, 10 and 11 provide information about this experiment. Figure 7 shows the oligonucleotide products as a function of nucleotide length. Figures 8 and 9 provide the extracted ion chromatograms for a sequence ladder generated according to this experiment. Figure 8 presents, from top to bottom, panels 1-7 (Figures 8a, 8b, 8c, 8d, 8e, 8f and 8g, respectively), while Figure 9 presents, from top to bottom, panels 8-14 (Figures 9a, 9b, 9c, 9d, 9e, 9f and 9g). Figure 10 shows a spectrum of yield for the various oligomeric products as a function of time. Figure 11 shows the yield of 16-mer vs. nicking enzyme.

Example 3

This example employed NBbt20 and Vent exo- at the temperatures presented in Tables 6A, 6B and 6C. The values are in relative mass units (RMUs), and the top line is the mass/charge value for the indicated oligonucleotide. These results are also shown in Figure 12 for 55°C (Figure 12a) and 60°C (Figure 12b).

TABLES 6A, 6B AND 6C

[Table contents not recoverable from the source text.]

Example 4

A method to create triggers from the sequencing ladders allows the amplification of an oligonucleotide ladder at almost any site in a genome. This method makes use of the unusual ability of N.BstNBI to recognize a mismatched recognition sequence in a duplex. That is, the 5'-GAGTC-3' sequence does not need to be entirely base-paired for the nicking enzyme to bind and nick a duplex. The nicking enzyme has no proclivity to cleave single-stranded oligonucleotides containing the recognition site 5'-GAGTC-3' (data not shown). The ability of N.BstNBI to bind to a mismatched recognition sequence was tested by constructing a series of oligonucleotide duplexes in which only the recognition site was mismatched to various extents and then amplifying oligonucleotides in a linear reaction (see Methods). The duplex forms a primer-template for linear amplification when cleaved by the nicking enzyme:

3'-GG ATG CTG ACC TTG TCT GAG TGG ATG CTG ACC T- 5'

5'-CC TAC GAC TGG AAC AGA CTC ACC TAC GAC TGG A- 3'

where, above and in Table 7, the amplified fragment is shown in bold, the recognition site for the nicking enzyme is shown as underlined text, and the reach-over is shown in italicized text and mismatches are indicated with italicized underlined bolded text.

TABLE 7 MISMATCHED DUPLEXES IN THE RECOGNITION SITE FOR N.BSTNBI.

3' GG ATG CTG ACC TTG TCT GAG TGG ATG CTG ACC T- 5'

5' CC TAC GAC TGG AAC AGA CTC ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC A T AAA ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC AGA 7TC ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC AGA G C ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC AGJ CTC ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC AGA AAC ACC TAC GAC TGG A- 3'

5' CC TAC GAC TGG AAC AGJ_A C ACC TAC GAC TGG A- 3'

5' T CCA GTC GTA GGT GAG TCT GTT CCA GTC GTA GG- 3'

The fragment 3' -GG ATG CTG ACC-5' is generated in the amplification reaction. Duplexes containing 1-5 mismatched base pairs were tested and the read-out was by mass spectrometry using LC-TOF (see Methods). The results are summarized in Table 8 below.
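The number of mismatched base pairs within the recognition site, and the fragment expected from nicking, can be derived from the duplex sequences. The following Python sketch is an illustration under stated assumptions. It uses the fully matched duplex and one variant from Table 7 whose sequence is legible (the variant with AAC in place of CTC). It assumes the nick lies four nucleotides 3' of 5'-GAGTC-3' on the strand carrying that sequence. The function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

BOTTOM_3TO5 = "GGATGCTGACCTTGTCTGAGTGGATGCTGACCT"   # written 3'->5', as in Table 7
TOP_MATCHED = "CCTACGACTGGAACAGACTCACCTACGACTGGA"   # written 5'->3'
TOP_VARIANT = "CCTACGACTGGAACAGAAACACCTACGACTGGA"   # AAC in place of CTC


def site_mismatches(top_5to3, bottom_3to5, site="GAGTC"):
    """Count mismatched base pairs within the recognition site.

    The two strands are aligned position by position, with the top strand
    written 5'->3' and the bottom strand written 3'->5'.
    """
    if len(top_5to3) != len(bottom_3to5):
        raise ValueError("strands must be the same length")
    bottom_5to3 = bottom_3to5[::-1]
    pos = bottom_5to3.find(site)
    if pos < 0:
        raise ValueError("recognition site not found on the bottom strand")
    # Convert 5'->3' coordinates back to the 3'->5' alignment.
    end = len(bottom_3to5) - pos
    start = end - len(site)
    return sum(
        1 for i in range(start, end) if PAIR[bottom_3to5[i]] != top_5to3[i]
    )


def released_fragment(bottom_3to5, site="GAGTC", offset=4):
    """Return the fragment 3' of the assumed nick, written 5'->3'."""
    bottom_5to3 = bottom_3to5[::-1]
    pos = bottom_5to3.find(site)
    if pos < 0:
        raise ValueError("recognition site not found on the bottom strand")
    return bottom_5to3[pos + len(site) + offset:]


print(site_mismatches(TOP_MATCHED, BOTTOM_3TO5))
print(site_mismatches(TOP_VARIANT, BOTTOM_3TO5))
print(released_fragment(BOTTOM_3TO5))
```

The released fragment is printed 5' to 3'; read in the opposite direction it is the 3'-GG ATG CTG ACC-5' fragment named above.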

TABLE 8

ABILITY OF N.BSTNBI TO GENERATE OLIGONUCLEOTIDE FRAGMENTS IN A LINEAR AMPLIFICATION WHEN THE RECOGNITION SITE (5'-GAGTC-3') IS WHOLLY OR PARTIALLY MISMATCHED.

*Signal is in Relative Mass Units (RMUs)

This observation has been coupled with the concept of using a distributive enzyme to create triggers almost anywhere in an organism's genome. It has been noted that when Vent exo- or 9°-North polymerase, which are both distributive enzymes, are used in the linear amplification reaction, truncation products appear in the chromatography and mass spectrum when analyzed by LC-TOF. To illustrate the point, a primer-template was used to generate oligonucleotides from a 20-mer template. The distribution of products was shown above. Note that very few full-length products (20-mers) were formed. 20-mers are relatively stable under the isothermal (60°C) conditions and so do not readily dissociate after the duplex is nicked. However, a preponderance of 6-mers and 12-mers is formed. It has been reported that for Vent exo- the average chain length generated per initiation was about 7 nucleotides. It may be the case that at 60°C the average chain length is 6 nucleotides. The notable point, however, is that triggers can be produced on a template nucleic acid and that a well-defined 5'-terminus on the template does not need to be present.

All of the above U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, are incorporated herein by reference, in their entirety.

From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

Claims

1. A method for obtaining base sequence information of a polynucleotide, the method comprising: a) providing a polynucleotide comprising a nicking agent recognition sequence; b) combining the polynucleotide of a) with components comprising: i) a nicking agent that recognizes the recognition sequence; ii) a distributive polymerase; and iii) a deoxyribonucleoside triphosphate; under conditions that form a plurality of oligonucleotides, where members of the plurality differ by the number of nucleotides in the oligonucleotides; c) characterizing the plurality of oligonucleotides and thereby obtaining base sequence information about the polynucleotide.

2. The method of claim 1 wherein the nicking agent is a nicking endonuclease.

3. The method of claim 2 wherein the nicking endonuclease is N.BstNBI.

4. The method of claim 1 wherein the distributive polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

5. The method of claim 1 wherein the components further comprise trehalose.

6. The method of claim 1 wherein the components comprise each of dATP, dCTP, dTTP and dGTP.

7. The method of claim 1 wherein the plurality of oligonucleotides have lengths within the range of 4-24 nucleotides.

8. The method of claim 1 wherein the plurality of oligonucleotides are characterized, at least in part, by mass spectroscopy.

9. The method of claim 1 wherein the plurality of oligonucleotides are characterized, at least in part, by liquid chromatography.

10. The method of claim 1 wherein the conditions comprise isothermal conditions.

11. The method of claim 10 wherein the isothermal conditions are 60°C +/- 5°C.

12. A method for determining the nucleotide sequence of an interrogation sequence located within a target polynucleotide, comprising:

(b) contacting the double-stranded template polynucleotide comprising the interrogation polynucleotide with the NA;

(c) nicking the template at the NS to provide a new 3' terminus at the NS;

(e) amplifying two or more single-stranded polynucleotide fragments by repeating steps (c) and (d), wherein each fragment is extended by a differing number of nucleotides, to provide a plurality of amplified fragments; and

(f) aligning the plurality of amplified fragments and identifying the 3' terminal nucleotide of each of the fragments, thereby determining the nucleotide sequence of the interrogation region of the polynucleotide.

13. The method of claim 12 wherein the NA is a nicking endonuclease (NE).

14. The method of claim 13 wherein the NE is N.BstNB I.

15. The method of claim 13 wherein the NE is a N.BstNB I mutant that retains nicking activity at temperatures greater than 65 °C.

16. The method of claim 12 wherein the NA is a restriction endonuclease (RE).

17. The method of claim 12 wherein step (d) is performed in the presence of a distributive DNA polymerase.

18. The method of claim 17 wherein the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

19. The method of claim 18 wherein the distributive DNA polymerase is VentR® (exo-) polymerase.

20. The method of claim 12 wherein steps (c), (d), and (e) are performed under an identical isothermal condition.

21. The method of claim 20 wherein steps (c), (d), and (e) are performed at a temperature of about 50 °C to about 70 °C.

22. The method of claim 20 wherein steps (c), (d), and (e) are performed at about 60 °C.

23. The method of claim 20 wherein the isothermal condition is performed at temperatures between a highest temperature and a lowest temperature, wherein the highest temperature is within 20°C of the lowest temperature.

24. The method of claim 23 wherein the highest temperature is within 15 °C of the lowest temperature.

25. The method of claim 23 wherein the highest temperature is within 10 °C of the lowest temperature.

26. The method of claim 23 wherein the highest temperature is within 5 °C of the lowest temperature.

27. The method of claim 12 wherein the longest extended single-stranded polynucleotide fragment of step (e) is no more than 40 nucleotides.

28. The method of claim 12 wherein the longest extended single-stranded polynucleotide fragment of step (e) is no more than 28 nucleotides.

29. The method of claim 12 wherein the longest extended single-stranded polynucleotide fragment of step (e) is no more than 16 nucleotides.

30. The method of claim 12 wherein the longest extended single-stranded polynucleotide fragment of step (e) is no more than 12 nucleotides.

31. The method of claim 12 wherein the longest extended single-stranded polynucleotide fragment of step (e) is no more than 7 nucleotides.

32. The method of claim 12 wherein determining the nucleotide sequence is performed at least partially by a technique selected from the group consisting of luminescence spectroscopy, fluorescence spectroscopy, mass spectrometry, liquid chromatography, fluorescence polarization, electron ionization, gel electrophoresis, gas chromatography, and capillary electrophoresis.

33. The method of claim 32 wherein determining the nucleotide sequence comprises the technique of mass spectrometry.

34. The method of claim 33 wherein determining the nucleotide sequence further comprises liquid chromatography.

35. The method of claim 12 further comprising measuring the molecular mass of each single-stranded polynucleotide fragment, aligning the single-stranded polynucleotide fragments according to the difference in molecular mass of each fragment, and thereby determining the nucleotide sequence of the interrogation polynucleotide.

36. The method of claim 12 wherein the double-stranded template polynucleotide is formed by a method comprising:

(a) forming a mixture of a first oligonucleotide primer (ODNP), a second ODNP, and the target polynucleotide comprising the interrogation polynucleotide, under conditions and for a time sufficient to allow the ODNPs to hybridize to the target polynucleotide, wherein

37. The method of claim 12 wherein the double-stranded template polynucleotide is formed by a method comprising:

(i) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence of one strand of a NARS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, wherein the NARS is a first restriction endonuclease recognition sequence (RERS), and the second ODNP comprises a sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide, or (ii) if the target polynucleotide is a single-stranded polynucleotide then the first ODNP comprises a nucleotide sequence of one strand of a NARS, wherein the NARS is a first restriction endonuclease recognition sequence (RERS), and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence of one strand of a second RERS and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide;

(b) extending the first and the second ODNPs in the presence of at least one modified deoxyribonucleoside triphosphate to produce the double-stranded template comprising both the first RERS and the second RERS.

38. The method of claim 12 wherein the double-stranded template polynucleotide is formed by a method comprising:

(a) forming a mixture of a first ODNP, a second ODNP, and the target polynucleotide comprising the interrogation polynucleotide, under conditions and for a time sufficient to allow the ODNPs to hybridize to the target polynucleotide, wherein (i) if the target polynucleotide is a double-stranded polynucleotide molecule having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide, or (ii) if the target polynucleotide is a single-stranded polynucleotide, then the first ODNP comprises a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide; the first and the second ODNPs each further comprises the sequence of a sense strand of a NARS, wherein the NARS is a nicking endonuclease recognition sequence (NERS);

(b) extending the first and the second ODNPs to produce the double-stranded template having two NERSs.

39. The method of claim 12 wherein the double-stranded template polynucleotide is formed by a method comprising:

(a) mixing an ODNP and the target polynucleotide comprising the interrogation polynucleotide under conditions and for a time sufficient to allow the ODNP and the target polynucleotide to hybridize, wherein the target polynucleotide is a single-stranded polynucleotide, and wherein the ODNP comprises a sequence of the sense strand of an NERS and a nucleotide sequence that is substantially complementary to a sequence of the target polynucleotide that is located 3' to the interrogation polynucleotide; and

(b) extending the ODNP to produce the double-stranded template polynucleotide comprising the NERS.

40. The method of claim 39 further comprising attaching the single-stranded target polynucleotide or the ODNP to a solid surface prior to mixing the ODNP and the single-stranded target polynucleotide.

41. The method of claim 40 wherein the solid surface is selected from the group consisting of a particle, a microwell, a microfabricated device, and a dipstick.

42. A method for determining the nucleotide sequence of an interrogation polynucleotide located in a double-stranded target polynucleotide having a first and second strand, comprising:

(a) linking a first oligonucleotide adaptor comprising a nucleotide sequence of the sense strand of a NERS to the first strand of the target polynucleotide that comprises the complement of the interrogation polynucleotide sequence at a location 3' to the complement of the interrogation polynucleotide sequence; and

(b) linking a second oligonucleotide adaptor comprising a nucleotide sequence of one strand of a Type IIs restriction endonuclease sequence (TRERS) to the second strand of the target polynucleotide that comprises the interrogation polynucleotide sequence at a location 3' to the interrogation polynucleotide sequence;

(c) extending the first and second oligonucleotide adaptors to produce a double-stranded template polynucleotide comprising the NERS and the TRERS;

(d) digesting the double-stranded template polynucleotide with a restriction endonuclease that recognizes the TRERS to produce a digestion product;

(e) contacting the digestion product with a nicking enzyme (NE) and nicking the digestion product to provide a new 3' terminus at the NS;

43. A method for determining the nucleotide sequence of an interrogation polynucleotide located in a target polynucleotide, comprising:

(a) forming a mixture of a first oligonucleotide primer (ODNP), a second ODNP, and the target polynucleotide, under conditions and for a time sufficient to allow the ODNPs to anneal to the target polynucleotide, wherein

(i) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand, then the first ODNP comprises a nucleotide sequence of the sense strand of a nicking endonuclease recognition sequence (NERS) and a nucleotide sequence at least substantially complementary to a nucleotide sequence of the first strand of the target polynucleotide and located 3' to the complement of the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide and located 3' to the interrogation polynucleotide, and optionally comprises a sequence of one strand of a restriction endonuclease recognition sequence (RERS), or (ii) if the target polynucleotide is a single-stranded polynucleotide, then the first ODNP comprises a nucleotide sequence of a sense strand of a NERS and a nucleotide sequence at least substantially identical to a nucleotide sequence of the target polynucleotide located 5' to the interrogation polynucleotide, and the second ODNP comprises a nucleotide sequence at least substantially complementary to a nucleotide sequence of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprises a RERS; and

(b) extending the first and the second ODNPs to produce a double-stranded template polynucleotide comprising an NERS and a RERS;

(c) digesting the double-stranded template polynucleotide of step (b) with a restriction endonuclease that recognizes the RERS to produce a digestion product;

(d) contacting the digestion product with a nicking enzyme (NE) and nicking the digestion product to provide a new 3' terminus at the NS;

(g) separating each single-stranded polynucleotide fragment and measuring the molecular mass of each fragment by a method comprising at least partially liquid chromatography and mass spectrometry;

(h) aligning the single-stranded polynucleotide fragments according to the differences in molecular mass of each fragment and identifying the 3' terminal nucleotide of each of the fragments, and thereby determining the nucleotide sequence of the interrogation polynucleotide.

44. The method of either claim 42 or claim 43 wherein the NE is N.BstNB I.

45. The method of either claim 42 or claim 43 wherein the NE is a N.BstNB I mutant that retains nicking activity at temperatures greater than 65 °C.

46. The method of either claim 42 or claim 43 wherein the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

47. The method of claim 46 wherein the distributive DNA polymerase is VentR® (exo-) polymerase.

48. The method of either claim 42 or claim 43 wherein the longest extended single-stranded polynucleotide fragment is no more than 40 nucleotides.

49. The method of either claim 42 or claim 43 wherein the longest extended single-stranded polynucleotide fragment is no more than 28 nucleotides.

50. The method of either claim 42 or claim 43 wherein the longest extended single-stranded polynucleotide fragment is no more than 16 nucleotides.

51. The method of either claim 42 or claim 43 wherein the longest extended single-stranded polynucleotide fragment is no more than 12 nucleotides.

52. The method of either claim 42 or claim 43 wherein the longest extended single-stranded polynucleotide fragment is no more than 7 nucleotides.

53. The method of claim 42 wherein determining the sequence is performed at least partially by a technique selected from the group consisting of luminescence spectroscopy, fluorescence spectroscopy, mass spectrometry, liquid chromatography, fluorescence polarization, electron ionization, gel electrophoresis, gas chromatography, and capillary electrophoresis.

54. The method of claim 53 wherein determining the sequence comprises the technique of mass spectrometry.

55. The method of claim 42, further comprising measuring the molecular mass of each single-stranded polynucleotide fragment and aligning the single-stranded polynucleotide fragments according to the difference in molecular mass of each fragment, thereby determining the nucleotide sequence of the interrogation polynucleotide.

56. The method of either claim 42 or claim 43 wherein the nicking, extension, and amplification steps are performed under an isothermal condition.

57. The method of claim 56 wherein the nicking, extension, and amplification steps are performed at a temperature of about 50 °C to about 70 °C.

58. The method of claim 56 wherein the nicking, extension, and amplification steps are performed at about 60 °C.

59. The method of claim 56 wherein the isothermal condition is performed at temperatures between a highest temperature and a lowest temperature, wherein the highest temperature is within 20 °C of the lowest temperature.

60. The method of claim 59 wherein the highest temperature is within 15 °C of the lowest temperature.

61. The method of claim 59 wherein the highest temperature is within 10 °C of the lowest temperature.

62. The method of claim 59 wherein the highest temperature is within 5 °C of the lowest temperature.

63. The method of any one of claims 1, 12, 42, or 43 wherein the interrogation polynucleotide comprises a genetic variation.

64. The method of claim 63 wherein the genetic variation is a single nucleotide polymorphism.

65. The method of any one of claims 1, 12, 42, or 43 wherein the interrogation polynucleotide is associated with a disease.

66. The method of claim 65 wherein the disease is a human genetic disease.

67. The method of claim 65 wherein the disease is cancer.

68. The method of any one of claims 1, 12, 42, or 43 wherein the target polynucleotide is isolated from at least one cell having or suspected of having a mutation in a target polynucleotide sequence, wherein said mutation is associated with tumorigenesis.

69. The method of any one of claims 1, 12, 42, or 43 wherein the interrogation polynucleotide is associated with drug resistance of a microorganism.

70. The method of any one of claims 1, 12, 42, or 43 wherein the target polynucleotide is selected from the group consisting of a genomic nucleic acid, a cDNA, a mRNA, a ribosomal RNA, a mitochondrial DNA, and a mitochondrial RNA.

71. The method of any one of claims 1, 12, 42, or 43 wherein the target polynucleotide is derived from an infectious agent.

72. The method of claim 71 wherein the infectious agent is selected from the group consisting of a virus, a bacterium, a fungus, and a parasite.

73. The method of any one of claims 1, 12, 42, or 43 wherein the target polynucleotide is present in a biological sample.

74. A kit for determining the nucleotide sequence of an interrogation polynucleotide within a target polynucleotide, comprising:

(a) if the target polynucleotide is a double-stranded polynucleotide having a first strand and a second strand,

(ii) a second ODNP comprising a nucleotide sequence at least substantially complementary to a nucleotide sequence of the second strand of the target polynucleotide located 3' to the interrogation polynucleotide and optionally comprising a sequence of one strand of a restriction endonuclease recognition sequence (RERS); or,

75. The kit of claim 74 wherein the target polynucleotide is amplified using the first and the second ODNPs as primers to generate a double-stranded polynucleotide template, such that the distance between a nicking site (NS) produced by a nicking endonuclease (NE) that recognizes the NERS in one strand and the location corresponding to a cleavage site produced by a restriction endonuclease (RE) that recognizes the RERS in the other strand is no more than 50 nucleotides.

76. The kit of claim 74 further comprising a nicking endonuclease (NE) and a nicking endonuclease buffer.

77. The kit of claim 76 wherein the NE is N.BstNB I and the nicking endonuclease buffer is an N.BstNB I buffer.

78. The kit of claim 74 further comprising a restriction endonuclease (RE) that recognizes the RERS and a RE buffer.

79. The kit of claim 74 further comprising a distributive DNA polymerase and a DNA polymerase buffer.

80. The kit of claim 79 wherein the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

81. The kit of claim 74 further comprising instructions for using the kit.

82. The kit of claim 74 further comprising a liquid chromatography column, and a first buffer that comprises water and an ammonium salt of a secondary or tertiary amine complexed with an organic or inorganic acid, and a second buffer that comprises the first buffer and an organic solvent.

83. The kit of claim 82 wherein the amine is selected from the group consisting of triethylamine, diallylamine, diisopropylamine, N,N-dimethyl-N-cyclohexylamine, N,N-dimethyl-N-isopropylamine, and N,N-dimethyl-N-butylamine.

84. The kit of claim 82 wherein the amine is complexed with an organic acid, and wherein the organic acid is selected from the group consisting of acetic acid, propionic acid, formic acid, carbonic acid, and halogenated versions thereof.

85. The kit of claim 82 wherein the organic solvent is methanol or acetonitrile.

86. The kit of claim 82 wherein the chromatography column is a reversed-phase chromatography column.

87. The kit of claim 74 further comprising at least one deoxyribonucleoside triphosphate.

88. The kit of claim 74 further comprising at least one modified deoxyribonucleoside triphosphate.

89. The kit of claim 74 further comprising a control template polynucleotide and a control ODNP pair.

90. The kit of claim 74 further comprising trehalose.

91. The kit of claim 74 further comprising an oligonucleotide standard.

92. The kit of claim 74 further comprising an access code for software used in designing or ordering the ODNPs.

93. A composition comprising a nicking endonuclease, a distributive DNA polymerase, and one or more deoxyribonucleoside triphosphates.

94. The composition of claim 93 wherein the nicking endonuclease is N.BstNB I.

95. The composition of claim 93 wherein the distributive DNA polymerase is selected from the group consisting of VentR® (exo-), Deep VentR® (exo-), and 9° North.

96. The composition of claim 93 further comprising potassium chloride.

97. The composition of claim 93 further comprising ammonium sulfate.

98. The composition of claim 93 further comprising magnesium sulfate.

99. The composition of claim 93 further comprising Tris-HCl.

100. The composition of claim 93 further comprising trehalose.

101. The composition of claim 93 in sterile form.